# C-SFDA Stage 2 - Kaggle Notebook

## Setup Instructions
1. Upload this notebook to Kaggle
2. Turn ON GPU: Settings → Accelerator → GPU P100
3. Run cells in order

**Expected time:** 4-6 hours  
**Cost:** $0 (free tier)

## Step 1: Setup Environment

In [None]:
# # Clone repository
# !git clone https://github.com/nazmul-karim170/C-SFDA_Source-Free-Domain-Adaptation.git
# %cd C-SFDA_Source-Free-Domain-Adaptation

In [1]:
import os
import sys

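# Put the active conda env's lib/ (where libstdcxx-ng from Step 4A is installed)
# first on LD_LIBRARY_PATH. This affects processes started afterwards, such as
# `!python main_csfda.py`; the running kernel has already loaded its libraries.
conda_lib = os.path.join(sys.prefix, 'lib')  # on the original machine: /home/ec2-user/anaconda3/envs/python3/lib
if 'LD_LIBRARY_PATH' in os.environ:
    os.environ['LD_LIBRARY_PATH'] = f"{conda_lib}:{os.environ['LD_LIBRARY_PATH']}"
else:
    os.environ['LD_LIBRARY_PATH'] = conda_lib

# Optional: restart the kernel process so the kernel itself picks up the new path
# os.execv(sys.executable, ['python'] + sys.argv)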
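Because the dynamic loader reads `LD_LIBRARY_PATH` only when a process starts, the change above reaches processes launched afterwards, not the running kernel. A minimal stdlib-only sketch to confirm that a child process (like the later `!python main_csfda.py` runs) inherits the new value; the helper name `child_ld_library_path` is hypothetical:

```python
import subprocess
import sys

def child_ld_library_path() -> str:
    """Return LD_LIBRARY_PATH as seen by a freshly started child Python process."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LD_LIBRARY_PATH', ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Children inherit os.environ from this process, so this reflects the change above.
print(child_ld_library_path())
```

The first entry printed should be the env's `lib/` directory set in the cell above.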

In [1]:
# Install PyTorch 2.0.1 / torchvision 0.15.2 built for CUDA 11.8 (pip replaces any existing PyTorch install)
!pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
!pip install hydra-core omegaconf scikit-learn tqdm wandb matplotlib
!pip install 'numpy<2'  # Fix NumPy compatibility with torchvision 0.15.2

Looking in indexes: https://download.pytorch.org/whl/cu118
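Before starting a multi-hour run it is worth confirming that the pins above actually took effect. A small sketch using `importlib.metadata` from the standard library; the `PINS` dictionary simply restates the versions requested above. The metadata reflects what is installed on disk, so if the kernel imported an older `torch` before this install, restart the kernel before relying on it.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions requested by the pip commands above ("+cu118" marks the CUDA 11.8 build).
PINS = {"torch": "2.0.1+cu118", "torchvision": "0.15.2+cu118"}

def pin_mismatches(pins: dict) -> dict:
    """Map each package whose installed version differs from its pin to the installed version (None if absent)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches

def numpy_is_1x() -> bool:
    """True if the installed NumPy has major version 1, as required by the 'numpy<2' pin."""
    try:
        return version("numpy").split(".")[0] == "1"
    except PackageNotFoundError:
        return False

print("Pin mismatches:", pin_mismatches(PINS) or "none")
print("NumPy 1.x:", numpy_is_1x())
```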


In [2]:
# Verify GPU
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU! Go to Settings → Accelerator → GPU P100")

PyTorch version: 2.0.1+cu118
CUDA available: True
GPU: NVIDIA A10G
VRAM: 23.7 GB


## Step 2: Download VisDA-C Dataset

**About VisDA-C:**
- Training domain: Synthetic object images (rendered from CAD models)
- Validation domain: Real object images (cropped from COCO dataset)
- 12 categories: aeroplane, bicycle, bus, car, horse, knife, motorcycle, person, plant, skateboard, train, truck

**For Stage 2, you only need the VALIDATION set (target domain)**

Choose one of the options below:

In [None]:
# ==================== OPTION A: Direct Download (Recommended) ====================
# Official VisDA-C dataset from Boston University server

!mkdir -p data/VISDA-C

# Download validation set (target domain) - ~5GB
!wget -P data/VISDA-C http://csr.bu.edu/ftp/visda17/clf/validation.tar

# Extract
!tar -xvf data/VISDA-C/validation.tar -C data/VISDA-C/

# Download the image list file (already in the repo, but this fetches the latest copy)
!wget -O data/VISDA-C/validation_list.txt https://raw.githubusercontent.com/VisionLearningGroup/taskcv-2017-public/master/classification/data/image_list.txt

# Optional: Download the training set if you want to run Stage 1 later (~25GB)
# !wget -P data/VISDA-C http://csr.bu.edu/ftp/visda17/clf/train.tar
# !tar -xvf data/VISDA-C/train.tar -C data/VISDA-C/

# Clean up tar files to save space
!rm -f data/VISDA-C/*.tar

print("\n✓ Dataset downloaded!")


# ==================== OPTION B: Google Drive Download ====================
# If the wget links don't work, use Google Drive

# !pip install -q gdown
# !mkdir -p data/VISDA-C

# # Validation set (~5GB); newer gdown releases drop --id, so pass the file ID directly
# !gdown 0BwcIeDbwQ0XmUEVJRjl4Tkd4bTA -O data/VISDA-C/validation.tar
# !tar -xvf data/VISDA-C/validation.tar -C data/VISDA-C/
# !rm data/VISDA-C/validation.tar

# # Optional: Training set if needed (~25GB)
# # !gdown 0BwcIeDbwQ0XmdENwQ3R4TUVTMHc -O data/VISDA-C/train.tar
# # !tar -xvf data/VISDA-C/train.tar -C data/VISDA-C/
# # !rm data/VISDA-C/train.tar


# ==================== OPTION C: Upload from Kaggle Dataset ====================
# If you've already uploaded VisDA-C as a Kaggle dataset

# !mkdir -p data/VISDA-C
# !cp -r /kaggle/input/visda-c/VisDA-C/. ./data/VISDA-C/
# # OR, if the dataset's files sit at its top level:
# !cp -r /kaggle/input/visda-c/. ./data/VISDA-C/


# ==================== Verify Dataset ====================
print("\nChecking dataset structure...")
!ls -lh data/VISDA-C/

print("\nExpected structure:")
print("data/VISDA-C/")
print("  ├── validation/          (target domain images)")
print("  ├── validation_list.txt  (provided in repo)")
print("  └── train_list.txt       (provided in repo)")
print("\nIf you see 'validation/' folder, you're ready!")
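Seeing `validation/` does not guarantee that the list file and the extracted images agree. A minimal sketch that counts list entries whose image is missing on disk, under a loud assumption: each line is `<relative image path> <integer label>`, with paths relative to `image_root`. The helper `check_image_list` is hypothetical, and the correct root folder depends on how the list file was written.

```python
from pathlib import Path

def check_image_list(list_file, image_root, max_report=5):
    """Count list entries whose image file is missing under image_root.

    ASSUMPTION: each non-empty line is "<relative/path/to/image> <integer label>".
    Adjust image_root if your list stores paths relative to a different folder.
    """
    root = Path(image_root)
    total, missing, labels = 0, [], set()
    for line in Path(list_file).read_text().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        total += 1
        rel_path, label = parts[0], int(parts[-1])
        labels.add(label)
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    print(f"{total} entries, {len(missing)} missing, {len(labels)} distinct labels")
    for rel_path in missing[:max_report]:
        print("  missing:", rel_path)
    return total, len(missing), labels

# Example with this notebook's paths (the root folder is an assumption):
# check_image_list("data/VISDA-C/validation_list.txt", "data/VISDA-C")
```

For VisDA-C, 12 distinct labels and zero missing entries would indicate the list and images line up.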

In [5]:
!wget -O data/VISDA-C/validation_list.txt https://raw.githubusercontent.com/VisionLearningGroup/taskcv-2017-public/master/classification/data/image_list.txt

--2025-11-14 15:10:33--  https://raw.githubusercontent.com/VisionLearningGroup/taskcv-2017-public/master/classification/data/image_list.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3415515 (3.3M) [text/plain]
Saving to: ‘data/VISDA-C/validation_list.txt’


2025-11-14 15:10:33 (196 MB/s) - ‘data/VISDA-C/validation_list.txt’ saved [3415515/3415515]



## Step 3: Download Pre-trained Checkpoint

In [6]:
# Download checkpoint from Google Drive
# Link: https://drive.google.com/drive/folders/16vTNNzzAt4M1mmeLsOxSFDRzBogaNkJw

!pip install -q gdown

# Download specific file (replace FILE_ID with actual ID from Drive link)
# Get FILE_ID by: Right-click file in Drive → Get link → Copy ID
# Example: https://drive.google.com/file/d/1a2B3c4D5e6F7g8H9i0J/view → FILE_ID = 1a2B3c4D5e6F7g8H9i0J

# !mkdir -p checkpoint/VISDA-C
# !gdown FILE_ID_FOR_best_train_2020 -O checkpoint/VISDA-C/best_train_2020.pth.tar

# Alternative: download the entire folder (this is the command that runs; files land in checkpoint/VISDA-C/)
!mkdir -p checkpoint
# !gdown --folder https://drive.google.com/drive/folders/16vTNNzzAt4M1mmeLsOxSFDRzBogaNkJw --output checkpoint/
!gdown --folder https://drive.google.com/drive/folders/1gJhqu00z536tPB3wwBw6zcWIxPjbh5Ri --output checkpoint/

# Verify checkpoint
!ls -lh checkpoint/VISDA-C/

Retrieving folder contents
Processing file 17Jy9I55-bldXLmcPt-QVKA2stc8nNzLI best_train_2020.pth.tar
Processing file 1fxufBP0NS_yUdmDoYB1hHlsZCBAtFAiX best_train_2021.pth.tar
Processing file 1WHVkgB9DRm7LhIoY3yciWI3O9TeBR-bD best_train_2022.pth.tar
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From (original): https://drive.google.com/uc?id=17Jy9I55-bldXLmcPt-QVKA2stc8nNzLI
From (redirected): https://drive.google.com/uc?id=17Jy9I55-bldXLmcPt-QVKA2stc8nNzLI&confirm=t&uuid=e4f7d70f-2970-4883-b8ee-f109d75e2ef1
To: /home/ec2-user/SageMaker/C-SFDA/checkpoint/VISDA-C/best_train_2020.pth.tar
100%|████████████████████████████████████████| 173M/173M [00:02<00:00, 63.2MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1fxufBP0NS_yUdmDoYB1hHlsZCBAtFAiX
From (redirected): https://drive.google.com/uc?id=1fxufBP0NS_yUdmDoYB1hHlsZCBAtFAiX&confirm=t&uuid=a318113e-f064-4b03-b62e-00d1cc742447
To: /home/ec2-user/S
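Judging from the Step 4A log (`Loaded from ./checkpoint/VISDA-C/best_train_2022.pth.tar` with `seed=2022` and `model_tta.src_log_dir="./checkpoint/VISDA-C"`), the run appears to load `best_<source domain>_<seed>.pth.tar` from `src_log_dir`. This pattern is inferred from that log line, not from the C-SFDA source code. The hypothetical helper below uses it to check that a checkpoint exists for each seed:

```python
from pathlib import Path

def expected_checkpoint(src_log_dir, source_domain, seed):
    """Checkpoint path a run is expected to load.

    ASSUMPTION: pattern inferred from the Step 4A log, e.g.
    src_log_dir=./checkpoint/VISDA-C, seed=2022 -> best_train_2022.pth.tar
    """
    return Path(src_log_dir) / f"best_{source_domain}_{seed}.pth.tar"

for seed in (2020, 2021, 2022):
    ckpt = expected_checkpoint("./checkpoint/VISDA-C", "train", seed)
    status = f"{ckpt.stat().st_size / 1e6:.0f} MB" if ckpt.is_file() else "MISSING"
    print(f"seed={seed}: {ckpt} -> {status}")
```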

## Step 4B: Full Training Run (4-6 hours)

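**Only run this after Step 4A succeeds!**

This takes 4-6 hours; uncomment the cell below to run it. On Kaggle, an interactive session may be shut down by the idle timeout after you close the browser, so for a run this long use **Save Version → Save & Run All (Commit)**, which runs the notebook in the background.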

In [None]:
# # Full training - 25 epochs (4-6 hours)
# # Same settings as the Step 4A test except seed and epochs
# import sys
# python_path = sys.executable  # the notebook's own Python, as in Step 4A
# !{python_path} main_csfda.py \
#     train_source=false \
#     seed=2020 \
#     data.dataset="VISDA-C" \
#     data.data_root="./data/" \
#     data.source_domains="[train]" \
#     data.target_domains="[validation]" \
#     data.batch_size=64 \
#     data.workers=4 \
#     model_src.arch="resnet101" \
#     model_tta.src_log_dir="./checkpoint/VISDA-C" \
#     learn.epochs=25 \
#     optim.lr=2e-4 \
#     multiprocessing_distributed=false \
#     use_wandb=false
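The quick test and the full run differ only in `seed` and `learn.epochs`, so one option is to build the override list once in Python and launch it with the notebook's own interpreter (`sys.executable`). A sketch: the helper names are hypothetical, the `key=value` overrides mirror the Step 4A command, and nothing is launched unless `dry_run=False`.

```python
import subprocess
import sys

def csfda_overrides(epochs, seed, batch_size=64, lr=2e-4):
    """key=value overrides for main_csfda.py, mirroring the Step 4A command."""
    return [
        "train_source=false",
        f"seed={seed}",
        "data.dataset=VISDA-C",
        "data.data_root=./data/",
        "data.source_domains=[train]",
        "data.target_domains=[validation]",
        f"data.batch_size={batch_size}",
        "data.workers=4",
        "model_src.arch=resnet101",
        "model_tta.src_log_dir=./checkpoint/VISDA-C",
        f"learn.epochs={epochs}",
        f"optim.lr={lr}",
        "multiprocessing_distributed=false",
        "use_wandb=false",
    ]

def run_csfda(overrides, dry_run=True):
    """Print the command; only launch it when dry_run=False."""
    cmd = [sys.executable, "main_csfda.py", *overrides]
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# Quick test (Step 4A) and full run (Step 4B), printed but not launched:
run_csfda(csfda_overrides(epochs=1, seed=2022))
run_csfda(csfda_overrides(epochs=25, seed=2020))
```

Because the arguments are passed as a list, no shell is involved: the quotes used in the `!` commands are unnecessary, and `[train]` cannot be glob-expanded.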

## Step 4A: Quick Test Run (10-15 minutes)

**Run this first to verify everything works!**

This runs a single epoch to check that:
- The dataset loads correctly
- The checkpoint loads
- The model initializes
- The training loop runs
- Output is saved properly

Once this succeeds, go back to Step 4B (the cell above) for the full training run.

In [8]:
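# Install a newer libstdc++ and Pillow from conda-forge into the active conda env
# (Step 1 puts this env's lib/ first on LD_LIBRARY_PATH for the training subprocess)
!conda install -c conda-forge libstdcxx-ng -y
!conda install -c conda-forge pillow -y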

Retrieving notices: done
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/python3

  added / updated specs:
    - libstdcxx-ng


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2025.11.12 |       hbd8a1cb_0         149 KB  conda-forge
    certifi-2025.11.12         |     pyhd8ed1ab_0         153 KB  conda-forge
    openssl-3.6.0              |       h26f9b46_0         3.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.3 MB

The following packages will be UPDATED:

  ca-certificates                      2025.10.5-hbd8a1cb_0 --> 2025.11.12-hbd8a1cb_0 
  certifi                            2025.10.5-pyhd8ed1ab_0 --> 2025.11.12-pyhd8ed1ab_0 
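The `libstdcxx-ng` install is typically done to fix import errors of the form `version 'GLIBCXX_3.4.xx' not found`. To see which `GLIBCXX` versions the env's `libstdc++.so.6` actually provides (similar to `strings libstdc++.so.6 | grep GLIBCXX`), a stdlib-only sketch; the library location under `sys.prefix/lib` is an assumption that holds for conda envs.

```python
import re
import sys
from pathlib import Path

def glibcxx_versions(lib_path):
    """Return the GLIBCXX_x.y[.z] version tags embedded in a libstdc++ shared object."""
    data = Path(lib_path).read_bytes()
    tags = set(re.findall(rb"GLIBCXX_\d+(?:\.\d+)+", data))
    # Sort numerically so GLIBCXX_3.4.9 comes before GLIBCXX_3.4.30.
    return sorted((t.decode() for t in tags),
                  key=lambda s: tuple(int(p) for p in s.split("_")[1].split(".")))

env_lib = Path(sys.prefix) / "lib" / "libstdc++.so.6"
if env_lib.exists():
    print(env_lib, "->", glibcxx_versions(env_lib)[-3:])  # newest three versions
else:
    print("No libstdc++.so.6 in", env_lib.parent)
```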

In [5]:
# Quick test run - just 1 epoch to verify everything works
# Use sys.executable to ensure we're using the notebook's Python with correct PyTorch
import sys
python_path = sys.executable
print(f"Using Python: {python_path}")

!{python_path} main_csfda.py \
    train_source=false \
    seed=2022 \
    data.dataset="VISDA-C" \
    data.data_root="./data/" \
    data.source_domains="[train]" \
    data.target_domains="[validation]" \
    data.batch_size=64 \
    data.workers=4 \
    model_src.arch="resnet101" \
    model_tta.src_log_dir="./checkpoint/VISDA-C" \
    learn.epochs=1 \
    optim.lr=2e-4 \
    multiprocessing_distributed=false \
    use_wandb=false

Using Python: /home/ec2-user/anaconda3/envs/python3/bin/python
[INFO] 2025-11-14 15:40:39 main_csfda.py:96 Dataset: VISDA-C, Source domains: ['train'], Target domains: ['validation'], Pipeline: target
[INFO] 2025-11-14 15:40:39 target_csfda.py:160 Start target training on train-validation...
  nn.init.orthogonal(m.weight.data)   # Initializing with orthogonal rows
[INFO] 2025-11-14 15:40:40 classifier.py:67 Loaded from ./checkpoint/VISDA-C/best_train_2022.pth.tar; missing params: []
[INFO] 2025-11-14 15:40:40 classifier.py:67 Loaded from ./checkpoint/VISDA-C/best_train_2022.pth.tar; missing params: []
[INFO] 2025-11-14 15:40:41 target_csfda.py:195 1 - Created target model
[INFO] 2025-11-14 15:40:41 target_csfda.py:49 Eval and labeling...
100%|█████████████████████████████████████████| 217/217 [01:24<00:00,  2.56it/s]
[INFO] 2025-11-14 15:42:05 target_csfda.py:82 Accuracy of direct prediction: 51.89
[INFO] 2025-11-14 15:42:05 utils.py:319 Accuracy per class: [75.45 15.31 47.85 64.89 59.

## Step 5: Check Results

In [None]:
# List output files
!find output/ -name "*.pth.tar" -o -name "*.txt" -o -name "*.yaml" | head -20

In [None]:
# Read final results (adjust path based on actual output)
!tail -50 output/VISDA-C/*/logs.txt 2>/dev/null || echo "Check output/ directory structure"
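To pull the accuracy numbers out of the log instead of scrolling through `tail`, a small regex sketch. Only the `Accuracy of direct prediction: 51.89` line format is confirmed by the Step 4A output; it assumes later accuracy lines also end in `Accuracy ...: <number>`, and that logs live at `output/VISDA-C/*/logs.txt` as in the cell above. The per-class line (a bracketed list) is deliberately skipped.

```python
import re
from pathlib import Path

# Matches lines such as
# "[INFO] 2025-11-14 15:42:05 target_csfda.py:82 Accuracy of direct prediction: 51.89"
ACC_LINE = re.compile(r"(Accuracy[^:]*):\s*([0-9]+(?:\.[0-9]+)?)\s*$")

def accuracy_lines(text):
    """Return (description, value) pairs for every 'Accuracy ...: <number>' line."""
    found = []
    for line in text.splitlines():
        match = ACC_LINE.search(line)
        if match:
            found.append((match.group(1), float(match.group(2))))
    return found

for log_file in sorted(Path("output").glob("VISDA-C/*/logs.txt")):
    for desc, value in accuracy_lines(log_file.read_text()):
        print(f"{log_file.parent.name}: {desc} = {value:.2f}")
```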

In [None]:
# Download results
!zip -r stage2_results.zip output/
from IPython.display import FileLink
FileLink('stage2_results.zip')

## Expected Results

According to the paper, on VisDA-C you should see:
- **Test Accuracy:** ~85%
- **Per-class Average:** ~83-85%

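If you get similar numbers, congrats! The adaptation worked. As a baseline, the `Accuracy of direct prediction` line logged before training starts (51.89 in the Step 4A run) shows where the unadapted source model begins.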
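VisDA-C results are usually quoted as the mean of the per-class accuracies, which can differ noticeably from overall accuracy when classes have different numbers of images. A pure-Python sketch of both metrics, assuming labels 0-11 follow the category order listed in Step 2; `accuracy_summary` is a hypothetical helper, not part of the C-SFDA code.

```python
from collections import Counter

# ASSUMPTION: label index i corresponds to VISDA_CLASSES[i] (order from Step 2).
VISDA_CLASSES = ["aeroplane", "bicycle", "bus", "car", "horse", "knife",
                 "motorcycle", "person", "plant", "skateboard", "train", "truck"]

def accuracy_summary(y_true, y_pred):
    """Return (overall %, {class: per-class %}, mean per-class %) for label lists.

    Only classes present in y_true enter the per-class mean.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = 100.0 * sum(hits.values()) / len(y_true)
    per_class = {VISDA_CLASSES[c]: 100.0 * hits[c] / totals[c] for c in sorted(totals)}
    mean_per_class = sum(per_class.values()) / len(per_class)
    return overall, per_class, mean_per_class

# Toy example: class 0 has 3 images (all correct), class 1 has 1 image (wrong).
overall, per_class, mean_pc = accuracy_summary([0, 0, 0, 1], [0, 0, 0, 0])
print(f"overall={overall:.1f}%  mean per-class={mean_pc:.1f}%")
# prints: overall=75.0%  mean per-class=50.0%
```

The toy example shows why the two numbers diverge: a rare class that is always wrong barely moves overall accuracy but pulls the per-class mean down hard.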

## Troubleshooting

### GPU Out of Memory
Reduce the batch size, e.g. `data.batch_size=32` or `data.batch_size=16`.

### Checkpoint Not Found
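Check: `!ls checkpoint/VISDA-C/` should show `best_train_2020.pth.tar`, `best_train_2021.pth.tar` and `best_train_2022.pth.tar`. Also make sure `model_tta.src_log_dir` points to `./checkpoint/VISDA-C` and that `seed` matches one of these files (the Step 4A run with `seed=2022` loaded `best_train_2022.pth.tar`).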

### Dataset Not Found
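Check: `!ls data/VISDA-C/` should show the `validation/` folder (and `train/` only if you downloaded the training set). Paths are case-sensitive: the folder is `VISDA-C`, not `VisDA-C`.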