# 📋 Task 2: Domain Generalization via Invariant & Robust Learning

In this task, we explore Domain Generalization (DG), where a model is trained on multiple source domains and must generalize to a completely unseen target domain. We will implement and compare four methods: ERM, IRM, GroupDRO, and SAM.

Our setup will use the **PACS dataset**. We will train on the **Art, Cartoon, and Photo** domains, holding out the **Sketch** domain as our unseen test environment, as suggested in the assignment manual.
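
DomainBed-style runners refer to each domain by an integer index rather than by name. The following is a minimal sketch of that mapping, assuming the alphabetical folder ordering used by upstream DomainBed (art painting, cartoon, photo, sketch). `PACS_DOMAINS` and `split_domains` are hypothetical helpers for illustration, not part of the library.

```python
# Assumed ordering: upstream DomainBed sorts the PACS domain folders alphabetically.
PACS_DOMAINS = ["art_painting", "cartoon", "photo", "sketch"]


def split_domains(test_env):
    """Return (source_domains, target_domain) for a held-out domain index."""
    sources = [name for i, name in enumerate(PACS_DOMAINS) if i != test_env]
    return sources, PACS_DOMAINS[test_env]


sources, target = split_domains(test_env=3)
print(f"train on: {sources}")
print(f"held out: {target}")
```

Under this ordering, index 3 corresponds to Sketch. If the domain folders in `./data/` are named differently, the indices may shift and should be checked.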

---

## **Part 1: Empirical Risk Minimization (ERM) Baseline**

### **1.1. Overview**

We begin by establishing a baseline with standard **Empirical Risk Minimization (ERM)**. ERM pools the data from all source domains into a single dataset and trains an ordinary classifier on it, ignoring which domain each sample came from. Its accuracy on the unseen target domain is the benchmark against which we compare the more advanced DG techniques.
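
Written out, the pooled objective looks as follows. The notation is ours, not the assignment's: $\mathcal{E}_{\text{train}}$ is the set of source domains, $n_e$ the number of samples in domain $e$, $f_\theta$ the classifier, and $\ell$ the per-sample loss (typically cross-entropy for classification).

$$
\hat{\theta}_{\text{ERM}} = \arg\min_{\theta} \; \frac{1}{\sum_{e \in \mathcal{E}_{\text{train}}} n_e} \sum_{e \in \mathcal{E}_{\text{train}}} \sum_{i=1}^{n_e} \ell\big(f_\theta(x_i^e),\, y_i^e\big)
$$

The domain index $e$ only determines which samples enter the sum; the loss itself never uses it. This is what separates ERM from methods such as IRM and GroupDRO, which treat the per-domain risks separately.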

### **1.2. Environment Setup**

First, we add our `code/` directory to the Python path so that the notebook can find and import the DomainBed library.

In [None]:
import json
import os
import sys

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

os.environ["TQDM_NOTEBOOK"] = "0"
from tqdm import tqdm

# Add the 'code' directory to the Python path
# This allows us to import 'domainbed'
module_path = os.path.abspath(
    os.path.join(".", "code")
)  # Adjusted path for root notebook
if module_path not in sys.path:
    sys.path.append(module_path)
    print(f"✅ Added '{module_path}' to Python path.")

# Import the main training function from DomainBed
try:
    from domainbed.scripts import train

    print("✅ Successfully imported DomainBed.")
except ImportError as e:
    print(
        "❌ Error importing DomainBed. Make sure the repository is in the './code/domainbed' directory."
    )
    print(e)

# Set plotting style for later
sns.set_theme(style="whitegrid")

### **1.3. Experiment Runner Function**

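To keep our code clean, we define a helper function that launches any DomainBed experiment from a dictionary of arguments. The function mimics a command-line invocation: it temporarily rewrites `sys.argv` with the dictionary's contents, calls `train.main()`, and restores the original `sys.argv` afterwards.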

In [None]:
def run_experiment(args_dict):
    """Launches a DomainBed training run with arguments from a dictionary."""
    original_argv = sys.argv
    try:
        # Create the argument list for the train.main() function
        sys.argv = ['']  # argv[0] is the script name; it can be empty
        for key, value in args_dict.items():
            sys.argv.append(f'--{key}')
            sys.argv.append(str(value))

        print(f"🚀 Starting training for algorithm: {args_dict.get('algorithm', 'N/A')}")
        print(f"   Log and model output will be saved to: {args_dict.get('output_dir', 'N/A')}")
        
        # Call the main training function from DomainBed
        train.main()
        
        print(f"🎉 Training finished for {args_dict.get('algorithm', 'N/A')}.")

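    except SystemExit as e:
        # argparse calls sys.exit() on invalid arguments; SystemExit is not
        # a subclass of Exception, so it needs its own handler.
        print(f"❌ Training exited via sys.exit (code {e.code}). Check the argument names and values.")
    except Exception as e:
        print(f"❌ An error occurred during the experiment: {e}")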
    finally:
        # Restore the original sys.argv to not interfere with other notebook cells
        sys.argv = original_argv
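
To make the argument conversion concrete, here is a small stand-alone sketch of the same flattening step, followed by a round trip through `argparse`. The helper `dict_to_argv` and the demo parser are hypothetical and exist only for illustration; DomainBed defines its own parser, whose option types may differ.

```python
import argparse


def dict_to_argv(args_dict):
    """Flatten {'key': value} pairs into ['--key', 'value', ...], as run_experiment does."""
    argv = []
    for key, value in args_dict.items():
        argv += [f"--{key}", str(value)]
    return argv


demo_args = {"dataset": "PACS", "test_env": 3, "progress_bar": True}
argv = dict_to_argv(demo_args)
print(argv)

# Hypothetical parser, used only to show the round trip.
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str)
parser.add_argument("--test_env", type=int)
parser.add_argument("--progress_bar", type=str)  # arrives as the string "True"
parsed = parser.parse_args(argv)
print(parsed)
```

Every value is stringified, so a boolean arrives as the text `True`. A parser that declares the option with `action='store_true'` expects the bare flag and would reject the extra token, so it is worth checking how each option is declared in the training script.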

### **1.4. Run ERM Training**

Now, we define the specific parameters for our ERM baseline experiment and launch the training.

In [None]:
# --- ERM Experiment Configuration ---
erm_args = {
    'data_dir': './data/',
    'dataset': 'PACS',
    'algorithm': 'ERM',
    'test_env': 3,  # The index for the 'Sketch' domain in PACS, our unseen target
    'output_dir': './results/erm',
    'hparams_seed': 0,
    'trial_seed': 0,
    'seed': 0,
    'progress_bar': True,
}

# Launch the experiment
run_experiment(erm_args)
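
Once training finishes, the logged metrics can be read back for analysis. The sketch below assumes the upstream DomainBed convention of writing one JSON record per evaluation step to `results.jsonl` inside `output_dir`, with accuracy keys of the form `env3_in_acc`. If our copy in `code/` logs differently, the file name and key need adjusting. `load_results` is a hypothetical helper, not a DomainBed function.

```python
import json
import os


def load_results(output_dir, filename="results.jsonl"):
    """Read a JSON-lines log into a list of dicts; return [] if the file is missing."""
    path = os.path.join(output_dir, filename)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in f if line.strip()]


records = load_results("./results/erm")

# 'env3_in_acc' is an assumed key name for accuracy on the held-out Sketch domain.
metric = "env3_in_acc"
accs = [r[metric] for r in records if metric in r]
if accs:
    print(f"Last logged Sketch accuracy: {accs[-1]:.3f}")
else:
    print("No results found yet.")
```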