# C-SFDA Stage 2 - Kaggle Notebook

## Setup Instructions
1. Upload this notebook to Kaggle
2. Turn ON GPU: Settings → Accelerator → GPU P100
3. Run cells in order

**Expected time:** 4-6 hours  
**Cost:** $0 (free tier)

## Step 1: Setup Environment

In [None]:
# Clone repository
!git clone https://github.com/nazmul-karim170/C-SFDA_Source-Free-Domain-Adaptation.git
%cd C-SFDA_Source-Free-Domain-Adaptation

In [None]:
# Uninstall any existing PyTorch and reinstall with CUDA support
!pip uninstall -y torch torchvision
!pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
!pip install hydra-core omegaconf scikit-learn tqdm wandb matplotlib
!pip install 'numpy<2'  # Fix NumPy compatibility with torchvision 0.15.2
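
To confirm that the pins above took effect, the installed versions can be read back from package metadata. This is a minimal sketch, not part of the repository; `installed_versions` and `numpy_is_pre_2` are hypothetical helper names.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "numpy")):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def numpy_is_pre_2(version):
    """True when a version string such as '1.26.4' has a major version below 2."""
    return version is not None and int(version.split(".")[0]) < 2


versions = installed_versions()
print(versions)
print("NumPy < 2:", numpy_is_pre_2(versions["numpy"]))
```

If a pin did not apply, restarting the kernel after the `pip install` cell is usually the first thing to try, since an already-imported package keeps its old version in memory.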

In [2]:
# Verify GPU
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU! Go to Settings → Accelerator → GPU P100")

PyTorch version: 2.0.1
CUDA available: False
⚠️ No GPU! Go to Settings → Accelerator → GPU P100
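
The output above shows `2.0.1` without a `+cu118` suffix, which may point to a CPU-only PyTorch build rather than a missing accelerator. The sketch below separates the two cases; `diagnose_cuda` is a hypothetical helper, and it takes plain values so it can be read without PyTorch installed.

```python
def diagnose_cuda(cuda_build, cuda_available):
    """Classify the result of a GPU check.

    cuda_build: the value of torch.version.cuda (None on CPU-only builds).
    cuda_available: the value of torch.cuda.is_available().
    """
    if cuda_available:
        return "GPU ready"
    if cuda_build is None:
        return "CPU-only PyTorch build: reinstall the CUDA wheel"
    return "CUDA build but no GPU visible: enable the accelerator in Settings"


# Values are passed by hand here. In the notebook, call
# diagnose_cuda(torch.version.cuda, torch.cuda.is_available()) instead.
print(diagnose_cuda(None, False))
```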


## Step 2: Download VisDA-C Dataset

**About VisDA-C:**
- Training domain: Synthetic object images (rendered from CAD models)
- Validation domain: Real object images (cropped from COCO dataset)
- 12 categories: aeroplane, bicycle, bus, car, horse, knife, motorcycle, person, plant, skateboard, train, truck

**For Stage 2, you only need the VALIDATION set (target domain)**

Choose one of the options below:

In [3]:
# ==================== OPTION A: Direct Download (Recommended) ====================
# Official VisDA-C dataset from Boston University server

!mkdir -p data/VisDA-C

# Download validation set (target domain); the server reported 976M in the run logged below
!wget -P data/VisDA-C http://csr.bu.edu/ftp/visda17/clf/validation.tar

# Extract
!tar -xvf data/VisDA-C/validation.tar -C data/VisDA-C/

# Download image list files (already in repo but good to have latest)
!wget -P data/VisDA-C https://raw.githubusercontent.com/VisionLearningGroup/taskcv-2017-public/master/classification/data/image_list.txt

# Optional: Download training set if you want to run Stage 1 later (~25GB)
# !wget -P data/VisDA-C http://csr.bu.edu/ftp/visda17/clf/train.tar
# !tar -xvf data/VisDA-C/train.tar -C data/VisDA-C/

# Clean up tar files to save space
!rm -f data/VisDA-C/*.tar

print("\n✓ Dataset downloaded!")


# ==================== OPTION B: Google Drive Download ====================
# If the wget links don't work, use Google Drive

# !pip install -q gdown
# !mkdir -p data/VisDA-C

# # Validation set (~5GB)
# !gdown --id 0BwcIeDbwQ0XmUEVJRjl4Tkd4bTA -O data/VisDA-C/validation.tar
# !tar -xvf data/VisDA-C/validation.tar -C data/VisDA-C/
# !rm data/VisDA-C/validation.tar

# # Optional: Training set if needed (~25GB)
# # !gdown --id 0BwcIeDbwQ0XmdENwQ3R4TUVTMHc -O data/VisDA-C/train.tar
# # !tar -xvf data/VisDA-C/train.tar -C data/VisDA-C/
# # !rm data/VisDA-C/train.tar


# ==================== OPTION C: Upload from Kaggle Dataset ====================
# If you've already uploaded VisDA-C as a Kaggle dataset

# !mkdir -p data
# !cp -r /kaggle/input/visda-c/VisDA-C ./data/
# # OR
# !cp -r /kaggle/input/visda-c/* ./data/VisDA-C/


# ==================== Verify Dataset ====================
print("\nChecking dataset structure...")
!ls -lh data/VisDA-C/

print("\nExpected structure:")
print("data/VisDA-C/")
print("  ├── validation/          (target domain images)")
print("  ├── validation_list.txt  (provided in repo)")
print("  └── train_list.txt       (provided in repo)")
print("\nIf you see 'validation/' folder, you're ready!")

--2025-11-11 22:11:40--  http://csr.bu.edu/ftp/visda17/clf/validation.tar
Resolving csr.bu.edu (csr.bu.edu)... 128.197.11.70
Connecting to csr.bu.edu (csr.bu.edu)|128.197.11.70|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1023758336 (976M) [application/x-tar]
Saving to: ‘data/VisDA-C/validation.tar’

validation.tar        0%[                    ]  30.76K  --.-KB/s    eta 7d 1h  

OSError: [Errno 5] Input/output error
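
The transfer above stopped after 30.76K with an I/O error, and the log does not show why. If the cause is a dropped connection, resuming from the partial file avoids starting over. The following is a sketch using only the standard library; `resume_headers` and `download_with_resume` are hypothetical names, and it assumes the server honors HTTP Range requests.

```python
import os
import urllib.request


def resume_headers(dest):
    """Build a Range header that continues from the bytes already on disk."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def download_with_resume(url, dest, chunk=1 << 20):
    """Download url to dest, appending to a partial file when the server allows it.

    A file that is already complete makes the server answer 416, which
    urllib raises as HTTPError; callers can treat that as "nothing to do".
    """
    request = urllib.request.Request(url, headers=resume_headers(dest))
    with urllib.request.urlopen(request) as response:
        # 206 means the Range request was honored; any other status restarts.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                block = response.read(chunk)
                if not block:
                    break
                out.write(block)


# Example call (not executed here):
# download_with_resume("http://csr.bu.edu/ftp/visda17/clf/validation.tar",
#                      "data/VisDA-C/validation.tar")
```

On the command line, `wget -c` gives the same resume behavior without any Python.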

## Step 3: Download Pre-trained Checkpoint

In [1]:
# Download checkpoint from Google Drive
# Link: https://drive.google.com/drive/folders/16vTNNzzAt4M1mmeLsOxSFDRzBogaNkJw

!pip install -q gdown

# Download specific file (replace FILE_ID with actual ID from Drive link)
# Get FILE_ID by: Right-click file in Drive → Get link → Copy ID
# Example: https://drive.google.com/file/d/1a2B3c4D5e6F7g8H9i0J/view → FILE_ID = 1a2B3c4D5e6F7g8H9i0J

# !gdown --id FILE_ID_FOR_best_train_2020 -O checkpoint/best_train_2020.pth.tar

# Alternative: Download entire folder
!mkdir -p checkpoint
# !gdown --folder https://drive.google.com/drive/folders/16vTNNzzAt4M1mmeLsOxSFDRzBogaNkJw --output checkpoint/
!gdown --folder https://drive.google.com/drive/folders/1gJhqu00z536tPB3wwBw6zcWIxPjbh5Ri --output checkpoint/

# Verify checkpoint
!ls -lh checkpoint/

Retrieving folder contents
Processing file 17Jy9I55-bldXLmcPt-QVKA2stc8nNzLI best_train_2020.pth.tar
Processing file 1fxufBP0NS_yUdmDoYB1hHlsZCBAtFAiX best_train_2021.pth.tar
Processing file 1WHVkgB9DRm7LhIoY3yciWI3O9TeBR-bD best_train_2022.pth.tar
Retrieving folder contents completed
Building directory structure
Building directory structure completed
^C
  File "/Users/cuongnguyen/Documents/Academic/C-SFDA/.venv/lib/python3.9/site-packages/requests/sessions.py", line 724, in <listcomp>
    history = [resp for resp in gen]
  File "/Users/cuongnguyen/Documents/Academic/C-SFDA/.venv/lib/python3.9/site-packages/requests/sessions.py", line 265, in resolve_redirects
    resp = self.send(
  File "/Users/cuongnguyen/Documents/Academic/C-SFDA/.venv/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/Users/cuongnguyen/Documents/Academic/C-SFDA/.venv/lib/python3.9/site-packages/requests/adapters.py", line 644, in send
    resp = co
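
The `FILE_ID` mentioned in the cell above can also be pulled out of a share link programmatically. This is a small sketch; `drive_id` is a hypothetical helper and the pattern covers only the link shapes that appear in this notebook plus the `?id=` form.

```python
import re

# Matches .../file/d/<id>/..., .../folders/<id> and ...?id=<id>
_ID_PATTERN = re.compile(
    r"/(?:file/d|folders)/([A-Za-z0-9_-]+)|[?&]id=([A-Za-z0-9_-]+)"
)


def drive_id(url):
    """Extract the file or folder ID from a Google Drive share link."""
    match = _ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no Drive ID found in {url!r}")
    return match.group(1) or match.group(2)


print(drive_id("https://drive.google.com/file/d/1a2B3c4D5e6F7g8H9i0J/view"))
# 1a2B3c4D5e6F7g8H9i0J
```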

## Step 4A: Quick Test Run (10-15 minutes)

**Run this first to verify everything works!**

This runs 1 epoch to check that:
- The dataset loads correctly
- The checkpoint loads
- The model initializes
- The training loop runs
- Output is saved properly

Once this succeeds, continue to Step 4B for full training.

In [None]:
# Quick test run - just 1 epoch to verify everything works
# Use sys.executable to ensure we're using the notebook's Python with correct PyTorch
import sys
python_path = sys.executable
print(f"Using Python: {python_path}")

!{python_path} main_csfda.py \
    train_source=false \
    seed=2022 \
    data.dataset="VISDA-C" \
    data.data_root="./data/" \
    data.source_domains="[train]" \
    data.target_domains="[validation]" \
    data.batch_size=64 \
    data.workers=4 \
    model_src.arch="resnet101" \
    model_tta.src_log_dir="./checkpoint/" \
    learn.epochs=1 \
    optim.lr=2e-4 \
    multiprocessing_distributed=false \
    use_wandb=false

## Step 4B: Full Training Run (4-6 hours)

**Only run this after Step 4A succeeds!**

This will take 4-6 hours. An interactive Kaggle session can be shut down after it sits idle, so for a run this long use Save Version → Save & Run All (Commit), which executes the notebook in the background after you close the browser.

In [None]:
# Full training - 25 epochs (4-6 hours)
# Same interpreter as the quick test, so the reinstalled PyTorch is the one used
import sys
!{sys.executable} main_csfda.py \
    train_source=false \
    seed=2020 \
    data.dataset="VISDA-C" \
    data.data_root="./data/" \
    data.source_domains="[train]" \
    data.target_domains="[validation]" \
    data.batch_size=64 \
    data.workers=4 \
    model_src.arch="resnet101" \
    model_tta.src_log_dir="./checkpoint/" \
    learn.epochs=25 \
    optim.lr=2e-4 \
    multiprocessing_distributed=false \
    use_wandb=false

## Step 5: Check Results

In [None]:
# List output files
!find output/ -name "*.pth.tar" -o -name "*.txt" -o -name "*.yaml" | head -20

In [None]:
# Read final results (adjust path based on actual output)
!tail -50 output/VISDA-C/*/logs.txt 2>/dev/null || echo "Check output/ directory structure"
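
When several runs have written to `output/`, the glob above prints all of them. The sketch below picks only the most recently modified log. `latest_log` and `tail` are hypothetical helpers, and the file name `logs.txt` is taken from the cell above, which itself notes that the path may need adjusting.

```python
from pathlib import Path


def latest_log(root="output", name="logs.txt"):
    """Return the most recently modified file called `name` under `root`, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    candidates = list(root.rglob(name))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def tail(path, n=50):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()[-n:]


log = latest_log()
if log is None:
    print("No logs.txt found under output/")
else:
    print(f"== {log} ==")
    print("\n".join(tail(log)))
```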

In [None]:
# Download results
!zip -r stage2_results.zip output/
from IPython.display import FileLink
FileLink('stage2_results.zip')

## Expected Results

According to the paper, on VisDA-C you should see:
- **Test Accuracy:** ~85%
- **Per-class Average:** ~83-85%

If you get similar numbers, congrats! The adaptation worked.
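
The two numbers above can differ when classes are imbalanced. How the repository computes them is not shown here; the sketch below uses the usual definitions (overall accuracy over all samples, and the unweighted mean of per-class recall) on a made-up two-class confusion matrix.

```python
def overall_accuracy(confusion):
    """Correct predictions divided by all samples (rows = true class)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_average(confusion):
    """Unweighted mean of per-class recall, skipping classes with no samples."""
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Toy data for illustration only: 100 samples of class 0, 10 of class 1.
toy = [
    [90, 10],  # true class 0: 90 correct, 10 predicted as class 1
    [5, 5],    # true class 1: 5 predicted as class 0, 5 correct
]
print(f"overall:   {overall_accuracy(toy):.3f}")
print(f"per-class: {per_class_average(toy):.3f}")
```

Here the frequent class dominates the overall figure (95 of 110 correct), while the per-class average weights both recalls equally (0.9 and 0.5).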

## Troubleshooting

### GPU Out of Memory
Reduce the batch size in the training command: `data.batch_size=32` or `data.batch_size=16`
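
To retry with a smaller batch without retyping every override, the command can be assembled in Python. This is a sketch: `build_command` and `halved_batch_sizes` are hypothetical helpers, and the overrides are copied from the training cells in this notebook.

```python
import sys


def build_command(epochs, seed, batch_size=64, python=sys.executable):
    """Return the argv list for main_csfda.py with this notebook's overrides."""
    return [
        python, "main_csfda.py",
        "train_source=false",
        f"seed={seed}",
        "data.dataset=VISDA-C",
        "data.data_root=./data/",
        "data.source_domains=[train]",
        "data.target_domains=[validation]",
        f"data.batch_size={batch_size}",
        "data.workers=4",
        "model_src.arch=resnet101",
        "model_tta.src_log_dir=./checkpoint/",
        f"learn.epochs={epochs}",
        "optim.lr=2e-4",
        "multiprocessing_distributed=false",
        "use_wandb=false",
    ]


def halved_batch_sizes(start=64, floor=16):
    """Yield start, start/2, ... down to floor, e.g. 64, 32, 16."""
    sizes = []
    size = start
    while size >= floor:
        sizes.append(size)
        size //= 2
    return sizes


for size in halved_batch_sizes():
    print(" ".join(build_command(epochs=1, seed=2022, batch_size=size)))
    # To actually launch: subprocess.run(build_command(...), check=True)
```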

### Checkpoint Not Found
Check: `!ls checkpoint/` should show `best_train_2020.pth.tar`
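
The same check can be scripted for all three files listed in the download log. This is a sketch; `missing_checkpoints` is a hypothetical helper, and whether the training script selects a checkpoint by seed is an assumption, not something this notebook confirms.

```python
from pathlib import Path


def missing_checkpoints(directory="checkpoint", seeds=(2020, 2021, 2022)):
    """List expected checkpoint files that are absent or empty."""
    folder = Path(directory)
    missing = []
    for seed in seeds:
        path = folder / f"best_train_{seed}.pth.tar"
        # An interrupted download can leave a zero-byte file behind.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing


print("Missing or empty:", missing_checkpoints())
```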

### Dataset Not Found
Check: `!ls data/VisDA-C/` should show the `validation/` folder (and `train/` only if you also downloaded the training set for Stage 1)
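
A more detailed check counts the images in each class folder. This is a sketch assuming the extracted split contains one subfolder per category; `images_per_class` is a hypothetical helper.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_class(split_dir):
    """Map each class subfolder name to the number of image files it holds."""
    split = Path(split_dir)
    if not split.is_dir():
        raise FileNotFoundError(f"{split} does not exist or is not a directory")
    counts = {}
    for class_dir in sorted(p for p in split.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


target = Path("data/VisDA-C/validation")
if target.is_dir():
    for name, count in images_per_class(target).items():
        print(f"{name:12s} {count}")
else:
    print(f"{target} not found")
```

Twelve non-empty folders matching the category list in Step 2 would suggest the archive extracted completely.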