This notebook:
- takes the last checkpoint of each model trained by `finetune-clip-tinyImageNet/finetune-eval_model_soup.ipynb` and continues fine-tuning it with a constant learning rate.
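
To make the difference between the two schedules concrete, here is a minimal sketch of both as plain functions of the step index. It assumes linear warmup followed by half-cosine decay to zero, which is a common form; the actual `cosine_lr` in `src.utils` is not shown in this notebook and may differ in detail. The names `cosine_with_warmup` and `constant` are hypothetical.

```python
import math


def cosine_with_warmup(step: int, base_lr: float, warmup_length: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards 0 (assumed form)."""
    if step < warmup_length:
        return base_lr * (step + 1) / warmup_length
    progress = (step - warmup_length) / max(1, total_steps - warmup_length)
    return 0.5 * (1.0 + math.cos(math.pi * progress)) * base_lr


def constant(step: int, base_lr: float) -> float:
    """Constant schedule: the step index is ignored."""
    return base_lr


# Illustrative numbers: 10 epochs of 352 batches, 500 warmup steps, lr=3e-5.
total_steps = 10 * 352
for step in (0, 499, 500, total_steps // 2, total_steps - 1):
    cos_lr = cosine_with_warmup(step, 3e-5, 500, total_steps)
    print(step, f"{cos_lr:.3g}", f"{constant(step, 3e-5):.3g}")
```

Under this assumed form, a cosine run ends with a learning rate close to zero, so continuing with a constant schedule raises the learning rate back to its base value from the first step of the continued run.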

## Setup Environment

In [11]:
LOCAL = True

# if run locally:
if LOCAL:
    ROOT_DIR = "/Users/Yang/Desktop/research-model-merge/playground/merge_soup-clip-tinyImageNet"
    DATA_DIR = f"{ROOT_DIR}/dataset"
    CODE_DIR = f"{ROOT_DIR}/src"
# on Colab
else:
    ROOT_DIR = "/content"
    DATA_DIR = "/content"
    CODE_DIR = "./clip_TinyImageNet"



In [None]:
import os, sys

sys.path.insert(0, os.path.abspath(ROOT_DIR))
sys.path.insert(0, os.path.abspath(CODE_DIR))

If you use Colab, save the output results to Google Drive so they persist after the session ends.

In [None]:
if not LOCAL:
    from google.colab import drive
    drive.mount('/content/drive')
    DRIVE_DIR = "drive/MyDrive/research-model_merge"
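
Copying checkpoints into the mounted Drive folder can be done with the standard library. The sketch below is one possible way to do it; `backup_checkpoints` is a hypothetical helper, not part of the cloned repository, and the `*.pt` pattern assumes checkpoints carry that extension, as they do in the fine-tuning code of this notebook.

```python
import shutil
from pathlib import Path


def backup_checkpoints(src_dir: str, dst_dir: str, pattern: str = "*.pt") -> list:
    """Copy files matching `pattern` from src_dir to dst_dir; return the copied file names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for path in sorted(src.glob(pattern)):
        shutil.copy2(path, dst / path.name)  # copy2 also preserves timestamps
        copied.append(path.name)
    return copied
```

For example, `backup_checkpoints(".", f"{DRIVE_DIR}/checkpoints")` would copy every `.pt` file in the working directory. Writing checkpoints directly under the mounted folder avoids the extra copy.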

We work in the same directory as this notebook.

To clone the code for fine-tuning CLIP on Tiny ImageNet, run:

In [4]:
if not LOCAL:
    !git clone https://github.com/nbzy1995/clip_TinyImageNet.git

Download the Tiny ImageNet dataset:

In [5]:
if not LOCAL:
    !wget -q http://cs231n.stanford.edu/tiny-imagenet-200.zip
    !unzip -q tiny-imagenet-200.zip

This creates a directory called `tiny-imagenet-200` containing the dataset.
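
A quick check that the archive was extracted where the later cells expect it can catch path mistakes early. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the list of expected entries assumes the standard Tiny ImageNet layout (`train/`, `val/`, `wnids.txt` at the top level).

```python
from pathlib import Path

# Entries expected at the top level of the extracted archive
# (an assumption based on the standard Tiny ImageNet layout).
EXPECTED = ("train", "val", "wnids.txt")


def missing_entries(root: str, expected=EXPECTED) -> list:
    """Return the expected entries that are absent under `root`."""
    base = Path(root)
    return [name for name in expected if not (base / name).exists()]
```

Calling `missing_entries(f"{DATA_DIR}/tiny-imagenet-200")` returns an empty list when all three entries are present.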


Next, copy the pre-computed index that splits the `train/` folder into 90% for training and 10% for validation. The `val/` folder is used as the test set.
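
For intuition, a 90/10 split like this can be sketched with a seeded shuffle. The contents and format of the actual `.npy` file are not shown in this notebook, so the shuffle, the seed, and the helper name `split_indices` are all assumptions made for illustration.

```python
import math
import random


def split_indices(n_items: int, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle range(n_items) with a fixed seed and split it into (train, val) index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_items * val_fraction))
    return indices[n_val:], indices[:n_val]


# Tiny ImageNet's train/ folder holds 200 classes x 500 images = 100,000 images.
train_idx, val_idx = split_indices(100_000)
print(len(train_idx), len(val_idx))     # 90000 10000
print(math.ceil(len(train_idx) / 256))  # 352 batches per epoch at batch size 256
```

The 352 batches per epoch agree with the `0/352` counter in the training logs of the original runs.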

In [6]:
if not LOCAL:
    !cp $CODE_DIR/dataset/tiny_imagenet_train_val_indices.npy /content/tiny_imagenet_train_val_indices.npy

Install the requirements for fine-tuning CLIP on Tiny ImageNet:

In [7]:
if not LOCAL:
    !pip install --quiet --upgrade pip
    !pip install -q -r clip_TinyImageNet/requirements.txt
    print("✅ Core packages installed!")

In [8]:
# Check GPU availability and system info
import torch
import subprocess

print("🔍 System Information:")
print(f"Python version: {subprocess.check_output(['python', '--version']).decode().strip()}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print(f"CUDA version: {torch.version.cuda}")
    DEVICE = torch.device("cuda")
else:
    print("❌ No GPU available! Please enable GPU runtime in Colab.")
    print("Runtime > Change runtime type > Hardware accelerator > GPU")
    DEVICE = torch.device("cpu")

🔍 System Information:
Python version: Python 3.11.5
PyTorch version: 2.8.0
CUDA available: False
❌ No GPU available! Please enable GPU runtime in Colab.
Runtime > Change runtime type > Hardware accelerator > GPU


## Fine-tuning


In [None]:
import os
import time
from typing import List, Dict, Any, Sequence, Optional

import torch
import clip
from tqdm import tqdm
from timm.data.transforms_factory import transforms_imagenet_train

from dataset.tiny_imagenet import TinyImageNet
from src.utils import ModelWrapper, maybe_dictionarize_batch, cosine_lr


def finetune_clip(
    data_location: str = '.',
    start_checkpoint_path: str = 'x.pt',
    model_save_location: str = '.',
    batch_size: int = 256,
    workers: int = 8,
    epochs: int = 10,
    warmup_length: int = 500,
    lr: float = 2e-5,
    wd: float = 0.1,
    model_name: str = 'ViT-B/32',
    name: str = 'config1',
    timm_aug: bool = False,
    scheduler_type: str = 'cosine',  # 'cosine' or 'constant'
    save_every: int = 1,
    log_interval: int = 20,
    grad_clip: float = 1.0,
) -> Dict[str, Any]:
    """Finetune CLIP on TinyImageNet inside notebook.

    Parameters
    ----------
    data_location : str
        Root directory containing Tiny ImageNet data (expects tiny-imagenet-200 folder or dataset loader handles path).
    start_checkpoint_path : str
        Path to the model state_dict (.pt) that training resumes from.
    model_save_location : str
        Directory to save checkpoints.
    batch_size : int
        Train batch size.
    workers : int
        DataLoader worker processes.
    epochs : int
        Number of epochs.
    warmup_length : int
        Warmup steps (only for cosine scheduler).
    lr : float
        Learning rate.
    wd : float
        Weight decay.
    model_name : str
        CLIP model name passed to clip.load.
    name : str
        Base filename prefix for checkpoints.
    timm_aug : bool
        Use timm ImageNet augmentation pipeline for training.
    scheduler_type : str
        'cosine' for cosine decay after warmup, 'constant' for fixed LR.
    save_every : int
        Currently unused; a checkpoint is saved after every epoch.
    log_interval : int
        Print training progress every N batches.
    grad_clip : float
        Maximum gradient norm passed to clip_grad_norm_.

    Returns
    -------
    result : dict
        'history': train_loss (one value per training step), val_loss and
        val_acc (one value per epoch); 'config': the main arguments of the run.
    """
    os.makedirs(model_save_location, exist_ok=True)

    # # Prompt template
    # template = openai_imagenet_template

    # ---- Prepare dataset

    clip_model, preprocess = clip.load(model_name, DEVICE, jit=False)

    if timm_aug:
        train_preprocess = transforms_imagenet_train(
            img_size=clip_model.visual.input_resolution,
            mean=(0.48145466, 0.4578275, 0.40821073),
            std=(0.26862954, 0.26130258, 0.27577711),
        )
    else:
        train_preprocess = preprocess

    dset = TinyImageNet(
        eval_preprocess=preprocess,
        train_preprocess=train_preprocess,
        location=data_location,
        batch_size=batch_size,
        num_workers=workers,
    )

    num_classes = len(dset.classnames)
    feature_dim = clip_model.visual.output_dim

    # ---- Load model

    # build image classifier model from clip model
    model = ModelWrapper(clip_model, feature_dim, num_classes, normalize=True)
    for p in model.parameters():
        p.data = p.data.float()

    print(f'Loading model state_dict from {start_checkpoint_path}')
    checkpoint = torch.load(start_checkpoint_path, map_location=DEVICE)
    model.load_state_dict(checkpoint)
    model = model.to(DEVICE)

    if DEVICE.type == 'cuda' and torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))

    # ---- Optimizer
    model_parameters = [p for p in model.parameters() if p.requires_grad]
    print(f"Training {sum(p.numel() for p in model_parameters):,} parameters")
    optimizer = torch.optim.AdamW(model_parameters, lr=lr, weight_decay=wd)

    # ---- LR scheduler
    num_batches = len(dset.train_loader)
    if scheduler_type == 'cosine':
        scheduler = cosine_lr(optimizer, lr, warmup_length, epochs * num_batches)
    elif scheduler_type == 'constant':
        def scheduler(step):
            return  # no-op
    else:
        raise ValueError("scheduler_type must be 'cosine' or 'constant'")
    
    # ---- Loss function
    loss_fn = torch.nn.CrossEntropyLoss()

    # ---- Training loop
    history = {
        'train_loss': [],
        'val_loss': [],
        'val_acc': [],
        # 'delta_w': [], # magnitude of weights updates this epoch
    }

    # # Save initial weights (epoch 0 pre-training state)
    # model_path = os.path.join(model_save_location, f'{name}_0.pt')
    # model_state = model.module.state_dict() if hasattr(model, 'module') else model.state_dict()
    # torch.save(model_state, model_path)
    # checkpoints.append(model_path)
    # print('Saved initial model to', model_path)

    for epoch in range(epochs):
        model.train()
        end = time.time()
        for i, batch in enumerate(dset.train_loader):
            step = i + epoch * num_batches
            scheduler(step)
            optimizer.zero_grad()
            batch = maybe_dictionarize_batch(batch)
            inputs, labels = batch['images'].to(DEVICE), batch['labels'].to(DEVICE)
            data_time = time.time() - end

            logits = model(inputs)
            loss = loss_fn(logits, labels)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            
            optimizer.step()

            batch_time = time.time() - end
            end = time.time()

            history['train_loss'].append(float(loss.item()))

            if i % log_interval == 0:
                percent_complete = 100.0 * i / num_batches
                current_lr = optimizer.param_groups[0]['lr']
                print(
                    f"Epoch {epoch} [{percent_complete:5.1f}% {i}/{num_batches}] \t"
                    f"Loss {loss.item():.4f} \tLR {current_lr:.4g} \tData {data_time:.3f}s Batch {batch_time:.3f}s",
                    flush=True,
                )

        # ---- Eval on Val set
        model.eval()
        with torch.no_grad():
            print('*'*80)
            print('Starting eval on validation split')
            correct, count = 0.0, 0.0
            val_loss_accum = 0.0
            pbar = tqdm(dset.val_loader, desc=f'Val Epoch {epoch}')
            for batch in pbar:
                batch = maybe_dictionarize_batch(batch)
                inputs, labels = batch['images'].to(DEVICE), batch['labels'].to(DEVICE)
                logits = model(inputs)
                loss = loss_fn(logits, labels)
                val_loss_accum += loss.item() * len(labels)

                pred = logits.argmax(dim=1, keepdim=True)
                correct += pred.eq(labels.view_as(pred)).sum().item()
                count += len(logits)
                pbar.set_description(
                    f"Val loss: {loss.item():.4f}   Acc: {100*correct/count:.2f}")
                
            top1 = correct / count
            val_loss_mean = val_loss_accum / count
        history['val_acc'].append(top1)
        history['val_loss'].append(val_loss_mean)
        print(f'Val acc at epoch {epoch}: {100*top1:.2f}% | Val loss: {val_loss_mean:.4f}')

        # if (epoch + 1) % save_every == 0 or epoch == epochs - 1:
        model_path = os.path.join(model_save_location, f'{name}_{epoch + 1}.pt')
        model_state = model.module.state_dict() if hasattr(model, 'module') else model.state_dict()
        torch.save(model_state, model_path)
        print('Saved model to', model_path)

    result = {
        'history': history,
        'config': {
            'data_location': data_location,
            'model_save_location': model_save_location,
            'batch_size': batch_size,
            'workers': workers,
            'epochs': epochs,
            'warmup_length': warmup_length,
            'lr': lr,
            'wd': wd,
            'model_name': model_name,
            'name': name,
            'timm_aug': timm_aug,
            'scheduler_type': scheduler_type,
        },
    }
    return result
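
Because `history['train_loss']` holds one value per training step while `val_loss` and `val_acc` hold one value per epoch, it can help to average the training loss per epoch before comparing the curves. The following is a minimal sketch; `epoch_means` is a hypothetical helper and the numbers in `toy_history` are made up for illustration, not results from this notebook.

```python
def epoch_means(step_values, steps_per_epoch: int) -> list:
    """Average a flat per-step list into one value per epoch (the last chunk may be shorter)."""
    means = []
    for start in range(0, len(step_values), steps_per_epoch):
        chunk = step_values[start:start + steps_per_epoch]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy history with 2 epochs of 3 steps each (made-up numbers, not real results).
toy_history = {"train_loss": [1.0, 0.8, 0.6, 0.5, 0.4, 0.3], "val_acc": [0.70, 0.75]}
print(epoch_means(toy_history["train_loss"], steps_per_epoch=3))

# Index of the epoch with the highest validation accuracy.
best_epoch = max(range(len(toy_history["val_acc"])), key=toy_history["val_acc"].__getitem__)
print(best_epoch)
```

With a real run, `steps_per_epoch` would be the number of training batches per epoch.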

In [None]:
# Continue training with a constant learning rate, starting from the checkpoints of the
# earlier cosine-LR runs and keeping each run's hyperparameters.
checkpt_dir = f"{DRIVE_DIR}/checkpoints" if not LOCAL else './checkpoints'
configs = [
    dict(lr=3e-5, wd=0.1, name='config1_10', start_checkpoint_path=f"{checkpt_dir}/config1_10.pt"),
    dict(lr=1e-5, wd=0.1, name='config2_10', start_checkpoint_path=f'{checkpt_dir}/config2_10.pt'),
    dict(lr=3e-6, wd=0.1, name='config3_10', start_checkpoint_path=f'{checkpt_dir}/config3_10.pt'),
    dict(lr=2e-5, wd=1e-3, timm_aug=True, name='config4_10', start_checkpoint_path=f'{checkpt_dir}/config4_10.pt'),  # config 4 was originally trained with --timm-aug
    dict(lr=1e-6, wd=1e-4, name='config5_10', start_checkpoint_path=f'{checkpt_dir}/config5_10.pt'),
]

common = dict(
    data_location=DATA_DIR,
    model_save_location= checkpt_dir,
    batch_size=256,
    epochs=10,
    workers=2,
    scheduler_type='constant',  # fixed LR; use 'cosine' for warmup followed by cosine decay
)


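sweep_results = []  # one result dict (history + config) per run
for config in configs:
    run_config = {**common, **config}
    print(f"Running with config: {run_config['name']}")
    print(run_config)
    result = finetune_clip(**run_config)
    sweep_results.append(result)

    print(f"✅ {run_config['name']} completed; checkpoints saved to {checkpt_dir}")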

Starting run with config: {'lr': 3e-05, 'wd': 0.1, 'name': 'config1_10', 'start_checkpoint_path': 'checkpoints/config1_10.pt'}
Run config: {'data_location': '/Users/Yang/Desktop/research-model-merge/playground/merge_soup-clip-tinyImageNet/dataset', 'model_save_location': 'checkpoints', 'batch_size': 256, 'epochs': 10, 'workers': 8, 'scheduler_type': 'constant', 'lr': 3e-05, 'wd': 0.1, 'name': 'config1_10', 'start_checkpoint_path': 'checkpoints/config1_10.pt'}
Loading model state_dict from checkpoints/config1_10.pt


KeyboardInterrupt: 

In [None]:
# import pandas as pd
# summary = []
# for r in sweep_results:
#     cfg = r['config']
#     summary.append({
#         'name': cfg['name'],
#         'lr': cfg['lr'],
#         'wd': cfg['wd'],
#         'timm_aug': cfg['timm_aug'],
#         'final_val_acc': r['val_acc_final'],
#         'final_ckpt': r['final_checkpoint']
#     })
# df = pd.DataFrame(summary)
# display(df)
# df

We use the same setup as `finetune-eval_model_soup.ipynb`.

Hyperparameter configurations:
1. **Config 1**: lr=3e-5, wd=0.1, epochs=10, batch_size=256
2. **Config 2**: lr=1e-5, wd=0.1, epochs=10, batch_size=256
3. **Config 3**: lr=3e-6, wd=0.1, epochs=10, batch_size=256
4. **Config 4**: lr=2e-5, wd=1e-3, epochs=10, batch_size=256, timm_aug=True
5. **Config 5**: lr=1e-6, wd=1e-4, epochs=10, batch_size=256
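
Each continued run starts from the epoch-10 checkpoint of one of these configurations and writes new checkpoints with the `f'{name}_{epoch + 1}.pt'` pattern used in `finetune_clip`. The sketch below lists the file names that pattern produces; `checkpoint_paths` is a hypothetical helper written for illustration, and it joins paths with `/` rather than `os.path.join`.

```python
def checkpoint_paths(save_dir: str, name: str, epochs: int) -> list:
    """Paths following the f'{name}_{epoch + 1}.pt' naming pattern, one per epoch."""
    return [f"{save_dir}/{name}_{epoch + 1}.pt" for epoch in range(epochs)]


paths = checkpoint_paths("checkpoints", "config1_10", epochs=10)
print(paths[0])   # checkpoints/config1_10_1.pt
print(paths[-1])  # checkpoints/config1_10_10.pt
```

With `name='config1_10'`, the continued run therefore does not overwrite the starting checkpoint `config1_10.pt`.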



In [None]:
# Configuration 1: lr=3e-5, wd=0.1, epochs=10, batch_size=256, timm_aug=False
!python $CODE_DIR/finetune.py --lr 3e-5 --wd 0.1 --epochs 10 --batch-size 256 --data-location $DATA_DIR --name "config1"

# Backup model to Google Drive
!cp config1_*.pt "/content/drive/MyDrive/Colab Notebooks/"
print("✅ Configuration 1 completed and backed up to Drive!")

Building zero-shot classifier.
100% 200/200 [00:10<00:00, 18.22it/s]
Saving model to ./config1_0.pt
Train Epoch: 0 [0% 0/352]	Loss: 1.442926	Data (t) 11.733	Batch (t) 15.286
Train Epoch: 0 [6% 20/352]	Loss: 1.261179	Data (t) 2.040	Batch (t) 2.307
Train Epoch: 0 [11% 40/352]	Loss: 0.963566	Data (t) 1.796	Batch (t) 2.055
Train Epoch: 0 [17% 60/352]	Loss: 1.013636	Data (t) 2.045	Batch (t) 2.076
Train Epoch: 0 [23% 80/352]	Loss: 0.916063	Data (t) 2.007	Batch (t) 2.073
Train Epoch: 0 [28% 100/352]	Loss: 0.836356	Data (t) 1.816	Batch (t) 2.080
Train Epoch: 0 [34% 120/352]	Loss: 1.102863	Data (t) 2.034	Batch (t) 2.066
Train Epoch: 0 [40% 140/352]	Loss: 0.925082	Data (t) 2.032	Batch (t) 2.065
Train Epoch: 0 [45% 160/352]	Loss: 1.065236	Data (t) 2.039	Batch (t) 2.070
Train Epoch: 0 [51% 180/352]	Loss: 0.848087	Data (t) 2.037	Batch (t) 2.069
Train Epoch: 0 [57% 200/352]	Loss: 0.978953	Data (t) 1.815	Batch (t) 1.847
Train Epoch: 0 [62% 220/352]	Loss: 0.807766	Data (t) 1.821	Batch (t) 1.855
Train 

In [None]:
# Configuration 2: lr=1e-5, wd=0.1, epochs=10, batch_size=256, timm_aug=False
!python $CODE_DIR/finetune.py --lr 1e-5 --wd 0.1 --epochs 10 --batch-size 256 --data-location $DATA_DIR --name "config2"

# Backup model to Google Drive
!cp config2_*.pt "/content/drive/MyDrive/Colab Notebooks/"
print("✅ Configuration 2 completed and backed up to Drive!")

Building zero-shot classifier.
100% 200/200 [00:11<00:00, 17.68it/s]
Saving model to ./config2_0.pt
Train Epoch: 0 [0% 0/352]	Loss: 1.667826	Data (t) 14.185	Batch (t) 20.316
Train Epoch: 0 [6% 20/352]	Loss: 1.226475	Data (t) 1.959	Batch (t) 2.005
Train Epoch: 0 [11% 40/352]	Loss: 1.347273	Data (t) 2.024	Batch (t) 2.088
Train Epoch: 0 [17% 60/352]	Loss: 1.134597	Data (t) 2.014	Batch (t) 2.060
Train Epoch: 0 [23% 80/352]	Loss: 0.934149	Data (t) 2.039	Batch (t) 2.070
Train Epoch: 0 [28% 100/352]	Loss: 0.970417	Data (t) 2.030	Batch (t) 2.078
Train Epoch: 0 [34% 120/352]	Loss: 1.104056	Data (t) 2.029	Batch (t) 2.062
Train Epoch: 0 [40% 140/352]	Loss: 1.082346	Data (t) 2.034	Batch (t) 2.065
Train Epoch: 0 [45% 160/352]	Loss: 0.980819	Data (t) 2.032	Batch (t) 2.063
Train Epoch: 0 [51% 180/352]	Loss: 0.824228	Data (t) 2.032	Batch (t) 2.063
Train Epoch: 0 [57% 200/352]	Loss: 1.084069	Data (t) 2.031	Batch (t) 2.061
Train Epoch: 0 [62% 220/352]	Loss: 1.013503	Data (t) 2.032	Batch (t) 2.063
Train 

In [None]:
# Configuration 3: lr=3e-6, wd=0.1, epochs=10, batch_size=256, timm_aug=False
!python $CODE_DIR/finetune.py --lr 3e-6 --wd 0.1 --epochs 10 --batch-size 256 --data-location $DATA_DIR --name "config3"

# Backup model to Google Drive
!cp config3_*.pt "/content/drive/MyDrive/Colab Notebooks/"
print("✅ Configuration 3 completed and backed up to Drive!")

Building zero-shot classifier.
100% 200/200 [00:11<00:00, 17.26it/s]
Saving model to ./config3_0.pt
Train Epoch: 0 [0% 0/352]	Loss: 1.600170	Data (t) 10.423	Batch (t) 17.322
Train Epoch: 0 [6% 20/352]	Loss: 1.583610	Data (t) 2.006	Batch (t) 2.053
Train Epoch: 0 [11% 40/352]	Loss: 1.211902	Data (t) 2.009	Batch (t) 2.065
Train Epoch: 0 [17% 60/352]	Loss: 1.175887	Data (t) 2.009	Batch (t) 2.053
Train Epoch: 0 [23% 80/352]	Loss: 1.156675	Data (t) 2.027	Batch (t) 2.253
Train Epoch: 0 [28% 100/352]	Loss: 1.083810	Data (t) 2.030	Batch (t) 2.061
Train Epoch: 0 [34% 120/352]	Loss: 1.012528	Data (t) 2.031	Batch (t) 2.299
Train Epoch: 0 [40% 140/352]	Loss: 1.187462	Data (t) 2.035	Batch (t) 2.067
Train Epoch: 0 [45% 160/352]	Loss: 0.988308	Data (t) 2.035	Batch (t) 2.067
Train Epoch: 0 [51% 180/352]	Loss: 1.009607	Data (t) 2.036	Batch (t) 2.069
Train Epoch: 0 [57% 200/352]	Loss: 0.782348	Data (t) 2.033	Batch (t) 2.064
Train Epoch: 0 [62% 220/352]	Loss: 1.010485	Data (t) 2.034	Batch (t) 2.070
Train 

In [None]:
# Configuration 4: lr=2e-5, wd=1e-3, epochs=10, batch_size=256, timm_aug=True
!python $CODE_DIR/finetune.py --lr 2e-5 --wd 1e-3 --epochs 10 --batch-size 256 --timm-aug --data-location $DATA_DIR --name "config4"

# Backup model to Google Drive
!cp config4_*.pt "/content/drive/MyDrive/Colab Notebooks/"
print("✅ Configuration 4 completed and backed up to Drive!")

Building zero-shot classifier.
100% 200/200 [00:12<00:00, 16.02it/s]
Saving model to ./config4_0.pt
Train Epoch: 0 [0% 0/352]	Loss: 3.308727	Data (t) 15.836	Batch (t) 23.652
Train Epoch: 0 [6% 20/352]	Loss: 2.737835	Data (t) 1.325	Batch (t) 1.400
Train Epoch: 0 [11% 40/352]	Loss: 2.778366	Data (t) 1.858	Batch (t) 2.029
Train Epoch: 0 [17% 60/352]	Loss: 2.296328	Data (t) 1.971	Batch (t) 2.070
Train Epoch: 0 [23% 80/352]	Loss: 2.271421	Data (t) 1.982	Batch (t) 2.036
Train Epoch: 0 [28% 100/352]	Loss: 2.170868	Data (t) 2.007	Batch (t) 2.054
Train Epoch: 0 [34% 120/352]	Loss: 2.146047	Data (t) 1.986	Batch (t) 2.038
Train Epoch: 0 [40% 140/352]	Loss: 1.899604	Data (t) 1.807	Batch (t) 1.855
Train Epoch: 0 [45% 160/352]	Loss: 2.142196	Data (t) 1.999	Batch (t) 2.049
Train Epoch: 0 [51% 180/352]	Loss: 2.240858	Data (t) 1.990	Batch (t) 2.039
Train Epoch: 0 [57% 200/352]	Loss: 2.136600	Data (t) 1.904	Batch (t) 1.955
Train Epoch: 0 [62% 220/352]	Loss: 2.052347	Data (t) 1.987	Batch (t) 2.036
Train 

In [None]:
# Configuration 5: lr=1e-6, wd=1e-4, epochs=10, batch_size=256, timm_aug=False
!python $CODE_DIR/finetune.py --lr 1e-6 --wd 1e-4 --epochs 10 --batch-size 256 --data-location $DATA_DIR --name "config5"

# Backup model to Google Drive
!cp config5_*.pt "/content/drive/MyDrive/Colab Notebooks/"
print("✅ Configuration 5 completed and backed up to Drive!")

Building zero-shot classifier.
100% 200/200 [00:12<00:00, 16.65it/s]
Saving model to ./config5_0.pt
Train Epoch: 0 [0% 0/352]	Loss: 1.430876	Data (t) 13.988	Batch (t) 19.614
Train Epoch: 0 [6% 20/352]	Loss: 1.671773	Data (t) 2.032	Batch (t) 2.335
Train Epoch: 0 [11% 40/352]	Loss: 1.435908	Data (t) 2.023	Batch (t) 2.070
Train Epoch: 0 [17% 60/352]	Loss: 1.195043	Data (t) 2.033	Batch (t) 2.081
Train Epoch: 0 [23% 80/352]	Loss: 1.295531	Data (t) 2.033	Batch (t) 2.066
Train Epoch: 0 [28% 100/352]	Loss: 1.165548	Data (t) 1.821	Batch (t) 1.852
Train Epoch: 0 [34% 120/352]	Loss: 1.359950	Data (t) 1.803	Batch (t) 1.834
Train Epoch: 0 [40% 140/352]	Loss: 1.352595	Data (t) 1.812	Batch (t) 1.844
Train Epoch: 0 [45% 160/352]	Loss: 1.147962	Data (t) 1.799	Batch (t) 1.831
Train Epoch: 0 [51% 180/352]	Loss: 1.303125	Data (t) 2.039	Batch (t) 2.071
Train Epoch: 0 [57% 200/352]	Loss: 0.973293	Data (t) 1.796	Batch (t) 1.830
Train Epoch: 0 [62% 220/352]	Loss: 1.075627	Data (t) 2.036	Batch (t) 2.067
Train 