<a href="https://colab.research.google.com/github/kaminglui/Domain-Adaptation-with-ME-IIS/blob/main/ME_IIS_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ME-IIS Domain Adaptation

This notebook runs the **Domain-Adaptation-with-ME-IIS** code on Google Colab.

It will:

1. Clone / update the GitHub repo.
2. Install Python dependencies.
3. Download **Office-Home** and/or **Office-31** via `kagglehub` when needed.
4. Train a **source-only ResNet‑50** baseline.
5. Run **ME‑IIS** (with optional BIC‑based GMM selection and pseudo‑labels).
6. Save checkpoints and CSV results in the repo directory.

In [None]:
# === 1. (Optional) Mount Google Drive so checkpoints & results persist ===

import os

USE_DRIVE = True   # set to False if you do NOT want to use Google Drive

if USE_DRIVE:
    from google.colab import drive  # type: ignore
    drive.mount('/content/drive')
    PROJECT_ROOT = "/content/drive/MyDrive/MEIIS-Colab"
else:
    PROJECT_ROOT = "/content"

os.makedirs(PROJECT_ROOT, exist_ok=True)
os.chdir(PROJECT_ROOT)
print("PROJECT_ROOT:", PROJECT_ROOT)
print("Current working dir:", os.getcwd())

Mounted at /content/drive
PROJECT_ROOT: /content/drive/MyDrive/MEIIS-Colab
Current working dir: /content/drive/MyDrive/MEIIS-Colab


In [None]:
# === 2. Clone / update the ME-IIS repo and install dependencies ===

import os, subprocess, sys

REPO_URL = "https://github.com/kaminglui/Domain-Adaptation-with-ME-IIS.git"
REPO_DIR = "Domain-Adaptation-with-ME-IIS"

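# If this cell is re-run while already inside the repo, step back out first
# so the clone/pull below does not create a nested copy.
if os.path.basename(os.getcwd()) == REPO_DIR:
    os.chdir("..")

if not os.path.isdir(REPO_DIR):
    print("[Repo] Cloning repository...")
    subprocess.run(["git", "clone", REPO_URL, REPO_DIR], check=True)
else:
    print("[Repo] Repository already exists, pulling latest changes...")
    subprocess.run(["git", "-C", REPO_DIR, "pull"], check=False)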

os.chdir(REPO_DIR)
print("[Repo] Now in:", os.getcwd())

# Install dependencies (Colab: prefer the colab-specific requirements if present)
req_path = None
if os.path.exists("env/requirements_colab.txt"):
    req_path = "env/requirements_colab.txt"
elif os.path.exists("env/requirements.txt"):
    req_path = "env/requirements.txt"

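if req_path is not None:
    print(f"[Deps] Installing from {req_path} ...")
    # Use the kernel's own interpreter so packages land in the active environment.
    result = subprocess.run([sys.executable, "-m", "pip", "install", "-r", req_path], check=False)
    if result.returncode != 0:
        print("[Deps] WARNING: pip exited with a non-zero status; check the log above.")
else:
    print("[Deps] WARNING: No env/requirements*.txt found; assuming runtime has needed packages.")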

[Repo] Repository already exists, pulling latest changes...
[Repo] Now in: /content/drive/MyDrive/MEIIS-Colab/Domain-Adaptation-with-ME-IIS
[Deps] Installing from env/requirements_colab.txt ...


In [None]:
# === 3. Dataset helpers (Office-Home & Office-31 via KaggleHub) ===

import os, pathlib, subprocess, sys

try:
    import kagglehub  # type: ignore
except ImportError:
    print("[Data] Installing kagglehub...")
    subprocess.run([sys.executable, "-m", "pip", "install", "kagglehub"], check=True)
    import kagglehub  # type: ignore


def _find_office_home_root(base_dir: str) -> str:
    """Search recursively for a folder that has Art / Clipart / Product / Real* subfolders."""
    base = pathlib.Path(base_dir)
    candidates = [base] + list(base.rglob("*"))
    for c in candidates:
        if not c.is_dir():
            continue
        names = {d.name for d in c.iterdir() if d.is_dir()}
        if {"Art", "Clipart", "Product"} <= names and any(n.lower().startswith("real") for n in names):
            print("[Data] Found Office-Home root:", c)
            return str(c)
    raise RuntimeError(f"Could not find Office-Home root under {base_dir}")


def _find_office31_root(base_dir: str) -> str:
    """Search recursively for a folder that has amazon / dslr / webcam subfolders."""
    base = pathlib.Path(base_dir)
    candidates = [base] + list(base.rglob("*"))
    for c in candidates:
        if not c.is_dir():
            continue
        names = {d.name.lower() for d in c.iterdir() if d.is_dir()}
        if {"amazon", "dslr", "webcam"} <= names:
            print("[Data] Found Office-31 root:", c)
            return str(c)
    raise RuntimeError(f"Could not find Office-31 root under {base_dir}")


def resolve_data_root(dataset_name: str) -> str:
    """Download and locate the dataset root for the given benchmark."""
    dataset_name = dataset_name.lower()
    if dataset_name == "office_home":
        print("[Data] Downloading Office-Home (lhrrraname/officehome) via KaggleHub...")
        root = kagglehub.dataset_download("lhrrraname/officehome")
        print("  Raw Office-Home KaggleHub root:", root)
        try:
            return _find_office_home_root(root)
        except Exception as e:
            print("  WARNING:", e)
            fallback = os.path.join("datasets", "Office-Home")
            print("  Falling back to:", fallback)
            return fallback
    elif dataset_name == "office31":
        print("[Data] Downloading Office-31 (xixuhu/office31) via KaggleHub...")
        root = kagglehub.dataset_download("xixuhu/office31")
        print("  Raw Office-31 KaggleHub root:", root)
        try:
            return _find_office31_root(root)
        except Exception as e:
            print("  WARNING:", e)
            fallback = os.path.join("datasets", "Office-31")
            print("  Falling back to:", fallback)
            return fallback
    else:
        raise ValueError(f"Unknown dataset_name {dataset_name}")
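
The root-finding helpers above can be tried without downloading anything. The following is a minimal sketch that rebuilds the same kind of search on a throwaway directory tree. The helper name `find_root_with_subdirs` and the fake folder layout are illustrative assumptions, not part of the repo, and the matching is case-insensitive like the Office-31 variant.

```python
import pathlib
import tempfile


def find_root_with_subdirs(base_dir, required):
    """Return the first directory under base_dir whose subfolders include `required`.

    Hypothetical helper mirroring the recursive search in the cell above.
    """
    base = pathlib.Path(base_dir)
    wanted = {name.lower() for name in required}
    for candidate in [base, *sorted(base.rglob("*"))]:
        if not candidate.is_dir():
            continue
        names = {d.name.lower() for d in candidate.iterdir() if d.is_dir()}
        if wanted <= names:
            return str(candidate)
    raise RuntimeError(f"No folder under {base_dir} contains {sorted(wanted)}")


def demo_office31_layout():
    """Build a fake nested Office-31 layout and return the relative path parts found."""
    with tempfile.TemporaryDirectory() as tmp:
        nested = pathlib.Path(tmp) / "versions" / "1" / "Office31"
        for domain in ("amazon", "dslr", "webcam"):
            (nested / domain).mkdir(parents=True)
        found = pathlib.Path(find_root_with_subdirs(tmp, ["amazon", "dslr", "webcam"]))
        return found.relative_to(tmp).parts


print(demo_office31_layout())
```

The search returns the innermost folder that directly contains all three domain folders, which is why a KaggleHub download with extra wrapper directories can still be resolved.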

## 4. Configure a single experiment

Edit the variables in the next cell to choose:

* Benchmark (`office_home` or `office31`)
* Source / target domains
* Training hyper‑parameters
* ME‑IIS + BIC + pseudo‑label settings

Then run the **Train source-only** cell, followed by the **Adapt with ME‑IIS** cell.

In [None]:
# === 4.1 Main experiment configuration ===

# Dataset / domains
DATASET_NAME = "office_home"   # "office_home" or "office31"
SOURCE_DOMAIN = "Ar"           # Office-Home: Ar/Cl/Pr/Rw;  Office-31: A/D/W
TARGET_DOMAIN = "Cl"

# Global seed / determinism
SEED = 0
DETERMINISTIC = True

# Data loader controls
BATCH_SIZE = 32
NUM_WORKERS = 2          # use 0 if you see DataLoader issues
DRY_RUN_MAX_SAMPLES = 0  # 0 = use full dataset; >0 = limit per domain
DRY_RUN_MAX_BATCHES = 0  # 0 = no cap; >0 = limit batches (quick smoke test)

# Source-only training hyper-parameters
NUM_EPOCHS_SRC = 100
LR_BACKBONE = 1e-3
LR_CLASSIFIER = 1e-2
WEIGHT_DECAY = 1e-3
SAVE_EVERY = 20          # 0 = only final checkpoint; >0 = also save every N epochs

# ME-IIS adaptation hyper-parameters
ADAPT_EPOCHS = 15
FINETUNE_BACKBONE = True
BACKBONE_LR_SCALE = 0.1       # only used if FINETUNE_BACKBONE is True
CLASSIFIER_LR_ADAPT = 1e-2
WEIGHT_DECAY_ADAPT = 1e-3

# GMM / ME-IIS settings
GMM_SELECTION_MODE = "bic"     # "fixed" or "bic"
NUM_LATENT_STYLES = 5          # only used for "fixed" mode
GMM_BIC_MIN_COMPONENTS = 3     # lower bound for BIC search
GMM_BIC_MAX_COMPONENTS = 10    # upper bound for BIC search
FEATURE_LAYERS = "layer3,layer4,avgpool"

IIS_ITERS = 20
IIS_TOL = 5e-4

# Pseudo-label settings (ME-IIS + PL)
USE_PSEUDO_LABELS = True
PSEUDO_CONF_THRESH = 0.9   # keep only targets with max prob >= this
PSEUDO_MAX_RATIO = 1.0     # max pseudo-labeled targets per source sample
PSEUDO_LOSS_WEIGHT = 0.5   # weight for pseudo-target loss vs source loss
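
Before launching a long run, it can help to sanity-check the settings. The sketch below is one possible check, not something the repo provides: the function name `check_config` is hypothetical, and the allowed domain codes are taken only from the comment next to `SOURCE_DOMAIN`, so the scripts themselves may accept other spellings.

```python
# Domain codes copied from the comment in the configuration cell (assumed complete).
VALID_DOMAINS = {
    "office_home": {"Ar", "Cl", "Pr", "Rw"},
    "office31": {"A", "D", "W"},
}


def check_config(dataset_name, source, target, bic_min, bic_max, conf_thresh):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    domains = VALID_DOMAINS.get(dataset_name)
    if domains is None:
        problems.append(f"unknown dataset {dataset_name!r}")
    else:
        for role, dom in (("source", source), ("target", target)):
            if dom not in domains:
                problems.append(f"{role} domain {dom!r} not in {sorted(domains)}")
    if source == target:
        problems.append("source and target domains are identical")
    if bic_min > bic_max:
        problems.append("GMM_BIC_MIN_COMPONENTS exceeds GMM_BIC_MAX_COMPONENTS")
    if not 0.0 <= conf_thresh <= 1.0:
        problems.append("PSEUDO_CONF_THRESH must lie in [0, 1]")
    return problems


print(check_config("office_home", "Ar", "Cl", 3, 10, 0.9))   # the defaults used here
print(check_config("office31", "Ar", "A", 10, 3, 1.5))       # deliberately broken
```

In the notebook, the same function could be called with `DATASET_NAME`, `SOURCE_DOMAIN`, and the other variables from the configuration cell.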

## 5. Train source-only baseline (ResNet‑50)

This calls `scripts/train_source.py` with the configuration above.

It will:

* Resolve the dataset root (via KaggleHub if needed).
* Train the source‑only model.
* Save a checkpoint to `checkpoints/source_only_{SOURCE}_to_{TARGET}_seed{SEED}.pth`.
* Append results to `results/office_home_me_iis.csv`. This file is currently used for Office‑31 runs as well.
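
The training cell assembles its command by string concatenation, with hand-written quotes around `--data_root`. As a hedged alternative, the sketch below builds the argument list first and lets `shlex.join` do the quoting, which also covers paths containing spaces or quotes. The function name `build_command` and the example path are made up for illustration; the flag names are the ones used in this notebook.

```python
import shlex


def build_command(script, options, flags=()):
    """Assemble a shell-safe command string from option/value pairs and bare flags."""
    args = ["python", script]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    args += [f"--{flag}" for flag in flags]
    return shlex.join(args)  # quotes any argument that needs it


cmd = build_command(
    "scripts/train_source.py",
    {
        "dataset_name": "office_home",
        "data_root": "/content/My Data/Office-Home",  # made-up path containing a space
        "source_domain": "Ar",
        "target_domain": "Cl",
        "seed": 0,
    },
    flags=["deterministic"],
)
print(cmd)
```

The resulting string can be passed to `!{cmd}` in the same way as the concatenated version.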

In [None]:
# === 5. Train source-only baseline ===

DATA_ROOT = resolve_data_root(DATASET_NAME)
print("[Config] Using DATA_ROOT:", DATA_ROOT)

cmd = (
    "python scripts/train_source.py "
    f"--dataset_name {DATASET_NAME} "
    f"--data_root \"{DATA_ROOT}\" "
    f"--source_domain {SOURCE_DOMAIN} "
    f"--target_domain {TARGET_DOMAIN} "
    f"--num_epochs {NUM_EPOCHS_SRC} "
    f"--batch_size {BATCH_SIZE} "
    f"--lr_backbone {LR_BACKBONE} "
    f"--lr_classifier {LR_CLASSIFIER} "
    f"--weight_decay {WEIGHT_DECAY} "
    f"--num_workers {NUM_WORKERS} "
    f"--seed {SEED} "
)

# Optional flags
if DETERMINISTIC:
    cmd += "--deterministic "

if DRY_RUN_MAX_SAMPLES > 0:
    cmd += f"--dry_run_max_samples {DRY_RUN_MAX_SAMPLES} "

if DRY_RUN_MAX_BATCHES > 0:
    cmd += f"--dry_run_max_batches {DRY_RUN_MAX_BATCHES} "

if SAVE_EVERY > 0:
    cmd += f"--save_every {SAVE_EVERY} "

print("\n[Run] train_source.py command:\n")
print(cmd)
print("\n[Run] Launching...\n")

# This will stream training output in real time
!{cmd}

[Data] Downloading Office-Home (lhrrraname/officehome) via KaggleHub...
Using Colab cache for faster access to the 'officehome' dataset.
  Raw Office-Home KaggleHub root: /kaggle/input/officehome
[Data] Found Office-Home root: /kaggle/input/officehome/datasets/OfficeHomeDataset_10072016
[Config] Using DATA_ROOT: /kaggle/input/officehome/datasets/OfficeHomeDataset_10072016

[Run] train_source.py command:

python scripts/train_source.py --dataset_name office_home --data_root "/kaggle/input/officehome/datasets/OfficeHomeDataset_10072016" --source_domain Ar --target_domain Cl --num_epochs 100 --batch_size 32 --lr_backbone 0.001 --lr_classifier 0.01 --weight_decay 0.001 --num_workers 2 --seed 0 --deterministic 

[Run] Launching...

2025-12-07 06:02:05.337619: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment varia

## 6. Adapt with ME‑IIS (optionally with BIC + pseudo‑labels)

This calls `scripts/adapt_me_iis.py` with:

* GMM selection mode: `fixed` or `bic` (BIC chooses components per layer).
* Optional pseudo‑labels on confident target samples.
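
To make the pseudo-label option concrete, here is a small pure-Python sketch of confidence-based selection. It is an illustration under stated assumptions, not the code in `scripts/adapt_me_iis.py`: the function name is hypothetical, the cap is read as `PSEUDO_MAX_RATIO` times the number of source samples, and keeping the most confident samples when the cap applies is a guess.

```python
def select_pseudo_labels(probs, conf_thresh=0.9, max_ratio=1.0, num_source=None):
    """Pick confident target samples and their predicted classes.

    probs: list of per-sample class-probability lists (one row per target sample).
    Returns (sample_index, predicted_class, confidence) tuples, most confident first.
    """
    picked = []
    for idx, row in enumerate(probs):
        confidence = max(row)
        if confidence >= conf_thresh:
            picked.append((idx, row.index(confidence), confidence))
    picked.sort(key=lambda item: item[2], reverse=True)
    if num_source is not None:
        # Assumed reading of the ratio: at most max_ratio * num_source pseudo-labels.
        picked = picked[: int(max_ratio * num_source)]
    return picked


target_probs = [
    [0.95, 0.03, 0.02],   # confident, class 0
    [0.40, 0.35, 0.25],   # ambiguous, dropped by the threshold
    [0.05, 0.92, 0.03],   # confident, class 1
    [0.01, 0.01, 0.98],   # confident, class 2
]
print(select_pseudo_labels(target_probs, conf_thresh=0.9, max_ratio=0.5, num_source=4))
```

With a threshold of 0.9, three of the four samples qualify, and the ratio cap of `0.5 * 4 = 2` then keeps only the two most confident ones.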

The source‑only checkpoint written by the training cell in Section 5 must already exist before this cell is run.
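
A quick existence check can surface a missing checkpoint before the adaptation script starts. The sketch below is a minimal example with hypothetical helper names; the file-name pattern is the one used for `SOURCE_CKPT` in this notebook.

```python
import os


def source_checkpoint_path(source, target, seed, ckpt_dir="checkpoints"):
    """Build the source-only checkpoint path using this notebook's naming pattern."""
    return os.path.join(ckpt_dir, f"source_only_{source}_to_{target}_seed{seed}.pth")


def require_checkpoint(path):
    """Raise a clear error early if the checkpoint file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Source-only checkpoint not found: {path}. Run the training cell first."
        )
    return path


ckpt = source_checkpoint_path("Ar", "Cl", 0)
print(ckpt)
```

Calling `require_checkpoint(ckpt)` from inside the repo directory, just before launching the adaptation, fails fast with a readable message instead of an error deep inside the script.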

In [None]:
# === 6. Adapt with ME-IIS (BIC + optional pseudo-labels) ===

DATA_ROOT = resolve_data_root(DATASET_NAME)
print("[Config] Using DATA_ROOT:", DATA_ROOT)

SOURCE_CKPT = f"checkpoints/source_only_{SOURCE_DOMAIN}_to_{TARGET_DOMAIN}_seed{SEED}.pth"

cmd = (
    "python scripts/adapt_me_iis.py "
    f"--dataset_name {DATASET_NAME} "
    f"--data_root \"{DATA_ROOT}\" "
    f"--source_domain {SOURCE_DOMAIN} "
    f"--target_domain {TARGET_DOMAIN} "
    f"--checkpoint {SOURCE_CKPT} "
    f"--batch_size {BATCH_SIZE} "
    f"--num_workers {NUM_WORKERS} "
    f"--num_latent_styles {NUM_LATENT_STYLES} "
    f"--feature_layers {FEATURE_LAYERS} "
    f"--gmm_selection_mode {GMM_SELECTION_MODE} "
    f"--gmm_bic_min_components {GMM_BIC_MIN_COMPONENTS} "
    f"--gmm_bic_max_components {GMM_BIC_MAX_COMPONENTS} "
    f"--iis_iters {IIS_ITERS} "
    f"--iis_tol {IIS_TOL} "
    f"--adapt_epochs {ADAPT_EPOCHS} "
    f"--backbone_lr_scale {BACKBONE_LR_SCALE} "
    f"--classifier_lr {CLASSIFIER_LR_ADAPT} "
    f"--weight_decay {WEIGHT_DECAY_ADAPT} "
    f"--pseudo_conf_thresh {PSEUDO_CONF_THRESH} "
    f"--pseudo_max_ratio {PSEUDO_MAX_RATIO} "
    f"--pseudo_loss_weight {PSEUDO_LOSS_WEIGHT} "
    f"--seed {SEED} "
)

if FINETUNE_BACKBONE:
    cmd += "--finetune_backbone "

if USE_PSEUDO_LABELS:
    cmd += "--use_pseudo_labels "

if DETERMINISTIC:
    cmd += "--deterministic "

if DRY_RUN_MAX_SAMPLES > 0:
    cmd += f"--dry_run_max_samples {DRY_RUN_MAX_SAMPLES} "

if DRY_RUN_MAX_BATCHES > 0:
    cmd += f"--dry_run_max_batches {DRY_RUN_MAX_BATCHES} "

print("\n[Run] adapt_me_iis.py command:\n")
print(cmd)
print("\n[Run] Launching ME-IIS adaptation...\n")

!{cmd}

[Data] Downloading Office-Home (lhrrraname/officehome) via KaggleHub...
Using Colab cache for faster access to the 'officehome' dataset.
  Raw Office-Home KaggleHub root: /kaggle/input/officehome
[Data] Found Office-Home root: /kaggle/input/officehome/datasets/OfficeHomeDataset_10072016
[Config] Using DATA_ROOT: /kaggle/input/officehome/datasets/OfficeHomeDataset_10072016

[Run] adapt_me_iis.py command:

python scripts/adapt_me_iis.py --dataset_name office_home --data_root "/kaggle/input/officehome/datasets/OfficeHomeDataset_10072016" --source_domain Ar --target_domain Cl --checkpoint checkpoints/source_only_Ar_to_Cl_seed0.pth --batch_size 32 --num_workers 2 --num_latent_styles 5 --feature_layers layer3,layer4,avgpool --gmm_selection_mode bic --gmm_bic_min_components 3 --gmm_bic_max_components 10 --iis_iters 20 --iis_tol 0.0005 --adapt_epochs 15 --backbone_lr_scale 0.1 --classifier_lr 0.01 --weight_decay 0.001 --pseudo_conf_thresh 0.9 --pseudo_max_ratio 1.0 --pseudo_loss_weight 0.5 --s

## 7. Where to find outputs

* **Checkpoints**
  * Source-only baseline: `checkpoints/source_only_{SOURCE}_to_{TARGET}_seed{SEED}.pth`
  * ME‑IIS adaptation: `checkpoints/me_iis_{SOURCE}_to_{TARGET}_<layers>_seed{SEED}.pth`

* **IIS weights / diagnostics**
  * `results/me_iis_weights_{SOURCE}_to_{TARGET}_<layers>_seed{SEED}.npz`

* **Aggregated CSV**
  * `results/office_home_me_iis.csv` – each run appends one row with:
    * dataset, source, target, seed
    * method (`source_only`, `me_iis`, `me_iis_bic`, `me_iis_pl`, etc.)
    * accuracies and ME‑IIS hyper‑parameters
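
To inspect the aggregated CSV, a small sketch like the following groups rows by method using only the standard library. The column name `method` and the demo rows are assumptions for illustration; the real header written by the scripts may differ, so check the first line of the file before relying on it.

```python
import csv
import io
from collections import defaultdict


def rows_by_method(handle, method_column="method"):
    """Group CSV rows by the value in `method_column` (assumed column name)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row.get(method_column, "unknown")].append(row)
    return dict(grouped)


# Made-up rows for illustration only; real files contain more columns.
demo = io.StringIO(
    "dataset,source,target,seed,method\n"
    "office_home,Ar,Cl,0,source_only\n"
    "office_home,Ar,Cl,0,me_iis_bic\n"
)
for method, rows in rows_by_method(demo).items():
    print(method, len(rows))

# With a real run, open the file instead of the in-memory demo:
# with open("results/office_home_me_iis.csv", newline="") as f:
#     grouped = rows_by_method(f)
```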

To try a new configuration, edit the variables in **Section 4**, then re‑run:

1. The **Train source-only baseline** cell.
2. The **Adapt with ME‑IIS** cell.