
# Contrastive Language-Image Pretraining with SogCLR

### Introduction

In this tutorial, you will learn how to perform contrastive language-image pretraining by optimizing the [Global Contrastive Loss](https://arxiv.org/abs/2202.12387) (GCL) on a subset of the [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/) dataset. You will also learn how to evaluate the model on a retrieval task using the [MSCOCO](https://cocodataset.org/#home) dataset and on a zero-shot classification task using the [ImageNet](https://www.image-net.org/challenges/LSVRC/index.php) dataset. The code is based on the [iSogCLR](https://github.com/zhqiu/contrastive-learning-iSogCLR) codebase, which includes implementations of CLIP, SogCLR, and iSogCLR.

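To make the "global" in GCL concrete, here is a minimal NumPy sketch that evaluates the loss exactly on a tiny dataset of precomputed embeddings. It assumes L2-normalized image and text embeddings and a bimodal GCL of the form tau * log(mean over all j != i of exp((s(x_i, t_j) - s(x_i, t_i)) / tau)), averaged over anchors and summed over the image-anchored and text-anchored directions; any extra constants in the paper's definition are omitted. The function name `global_contrastive_loss` is hypothetical and not part of the iSogCLR codebase. The point it illustrates is that every other sample in the dataset acts as a negative, not only the other samples in a mini-batch.

```python
import numpy as np

def global_contrastive_loss(img_emb, txt_emb, tau=0.01):
    """Hypothetical helper: exact bimodal GCL over a small dataset of n pairs.

    img_emb, txt_emb: (n, d) arrays, assumed L2-normalized; row i of each is a positive pair.
    """
    n = img_emb.shape[0]
    sim = img_emb @ txt_emb.T              # sim[i, j] = s(image_i, text_j)
    pos = np.diag(sim)                     # positive-pair similarities s(image_i, text_i)
    neg = ~np.eye(n, dtype=bool)           # every j != i in the whole dataset is a negative

    # Image-anchored term: compare image i against all other texts.
    g_img = np.where(neg, np.exp((sim - pos[:, None]) / tau), 0.0).sum(axis=1) / (n - 1)
    # Text-anchored term: compare text j against all other images.
    g_txt = np.where(neg, np.exp((sim - pos[None, :]) / tau), 0.0).sum(axis=0) / (n - 1)

    return float(np.mean(tau * np.log(g_img) + tau * np.log(g_txt)))

# Toy data: texts are noisy copies of the images, so row i of each forms a matching pair.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = img + 0.1 * rng.normal(size=img.shape)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

loss_aligned = global_contrastive_loss(img, txt, tau=0.1)
loss_shifted = global_contrastive_loss(img, np.roll(txt, 1, axis=0), tau=0.1)  # pairs broken
print(loss_aligned, loss_shifted)
```

With the shifted pairing, each image's true caption becomes a negative, so `loss_shifted` comes out much larger than `loss_aligned`.
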
### Preparation

First, we:

1. Download the source code and data
2. Install required packages

In [None]:
!git clone -b project https://github.com/xywei00/csce689_iSogCLR.git iSogCLR

%env PYTHONPATH=$PYTHONPATH:./iSogCLR/bimodal_exps
%env HUGGINGFACE_HUB_CACHE=./checkpoints/huggingface
!mkdir -p checkpoints

!gdown 142xxRoMaHxX3BIfCw_1b_G_dgu-02Yq3    # clip_train.tar.gz
!gdown 142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT    # cc3m_subset_100k.tar.gz
!gdown 142tMsnclHTTPpnTXHSeNgTUlBk4She6o    # ms_coco_val.tar.gz
!gdown 1NXhfhwFy-nhdABACkodgYqm9pomDKE39    # val.tar

!mkdir -p datasets/imagenet
!tar xf clip_train.tar.gz
!tar xf cc3m_subset_100k.tar.gz -C datasets
!tar xf mscoco_val.tar.gz -C datasets
!tar xf val.tar -C datasets/imagenet

!pip install -r ./iSogCLR/requirements_colab.txt    # pip may print dependency warnings or errors here; they can usually be ignored

Cloning into 'iSogCLR'...
remote: Enumerating objects: 314, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 314 (delta 17), reused 13 (delta 13), pack-reused 289 (from 2)
Receiving objects: 100% (314/314), 152.51 KiB | 21.79 MiB/s, done.
Resolving deltas: 100% (145/145), done.
Downloading...
From: https://drive.google.com/uc?id=142xxRoMaHxX3BIfCw_1b_G_dgu-02Yq3
To: /content/clip_train.tar.gz
100% 4.06M/4.06M [00:00<00:00, 250MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT
From (redirected): https://drive.google.com/uc?id=142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT&confirm=t&uuid=5731e7b2-ecc7-460a-8bbf-fdab8ae8e21b
To: /content/cc3m_subset_100k.tar.gz
100% 3.07G/3.07G [00:37<00:00, 81.5MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=142tMsnclHTTPpnTXHSeNgTUlBk4She6o
From (redirected): https://drive.google.com/uc?id=142tMsnclHTTPpnTXHSeNgTUlBk4

In [None]:
!gcloud auth login
!gcloud compute ssh instance-20251111-213126 --project=atomic-bird-475203-g6 --zone=us-central1-c --troubleshoot --tunnel-through-iap
!gcloud compute ssh --zone "us-central1-c" "instance-20251111-213126" --project "atomic-bird-475203-g6"


In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
project_root = '/content/drive/MyDrive/csce633/project'
os.makedirs(project_root, exist_ok=True)
%cd $project_root

Mounted at /content/drive
/content/drive/MyDrive/csce633/project


In [None]:
import os

if not os.path.exists("iSogCLR"):
    !git clone -b project https://github.com/xywei00/csce689_iSogCLR.git iSogCLR

if not os.path.exists("checkpoints"):
    !mkdir -p checkpoints

if not os.path.exists("datasets/imagenet"):
    !mkdir -p datasets/imagenet

files = {
    "clip_train.tar.gz": "142xxRoMaHxX3BIfCw_1b_G_dgu-02Yq3",
    "cc3m_subset_100k.tar.gz": "142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT",
    "mscoco_val.tar.gz": "142tMsnclHTTPpnTXHSeNgTUlBk4She6o",
    "val.tar": "1NXhfhwFy-nhdABACkodgYqm9pomDKE39"
}

for fname, fid in files.items():
    if not os.path.exists(fname):
        !gdown {fid}

if not os.path.exists("clip_train"):
    !tar xf clip_train.tar.gz

if not os.path.exists("datasets/cc3m_subset_100k"):
    !tar xf cc3m_subset_100k.tar.gz -C datasets

if not os.path.exists("datasets/ms_coco_val"):
    !tar xf mscoco_val.tar.gz -C datasets

if not os.path.exists("datasets/imagenet/val"):
    !tar xf val.tar -C datasets/imagenet


Cloning into 'iSogCLR'...
remote: Enumerating objects: 314, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 314 (delta 17), reused 13 (delta 13), pack-reused 289 (from 2)
Receiving objects: 100% (314/314), 152.51 KiB | 5.08 MiB/s, done.
Resolving deltas: 100% (145/145), done.
Downloading...
From: https://drive.google.com/uc?id=142xxRoMaHxX3BIfCw_1b_G_dgu-02Yq3
To: /content/drive/MyDrive/csce633/project/clip_train.tar.gz
100% 4.06M/4.06M [00:00<00:00, 231MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT
From (redirected): https://drive.google.com/uc?id=142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT&confirm=t&uuid=ce6340c1-630c-4580-be5c-8da86af2533e
To: /content/drive/MyDrive/csce633/project/cc3m_subset_100k.tar.gz
100% 3.07G/3.07G [00:14<00:00, 218MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=142tMsnclHTTPpnTXHSeNgTUlBk4She6o
From (redirected): h
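
Because the setup cell skips each download and extraction step when its output already exists, it can be useful to confirm that everything is in place before training. A minimal sketch, assuming the archives extract to the directory names the setup cell checks for (`clip_train`, `datasets/cc3m_subset_100k`, `datasets/ms_coco_val`, `datasets/imagenet/val`); the helper `missing_paths` is hypothetical:

```python
from pathlib import Path

# Paths the setup cells create, relative to the project directory (taken from the checks above).
EXPECTED_PATHS = [
    "iSogCLR/bimodal_exps",
    "clip_train",
    "datasets/cc3m_subset_100k",
    "datasets/ms_coco_val",
    "datasets/imagenet/val",
    "checkpoints",
]

def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]

missing = missing_paths()
if missing:
    print("Missing (re-run the download/extract cell):", missing)
else:
    print("All expected code and data directories are present.")
```
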

In [None]:
%env PYTHONPATH=$PYTHONPATH:./iSogCLR/bimodal_exps
%env HUGGINGFACE_HUB_CACHE=./checkpoints/huggingface

!pip install -r ./iSogCLR/requirements_colab.txt
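
In Colab, each `!` command runs in its own subshell, so `!export VAR=...` does not affect later cells; `%env`, like assigning to `os.environ` in Python, changes the notebook process's environment, which later `!python ...` commands inherit. A minimal pure-Python alternative to the two `%env` lines above, using a hypothetical helper `append_to_path_var` that skips empty and duplicate entries:

```python
import os

def append_to_path_var(name, entry):
    """Append `entry` to an os.pathsep-separated environment variable, skipping duplicates."""
    parts = [p for p in os.environ.get(name, "").split(os.pathsep) if p]  # drop empty entries
    if entry not in parts:
        parts.append(entry)
    os.environ[name] = os.pathsep.join(parts)
    return os.environ[name]

append_to_path_var("PYTHONPATH", "./iSogCLR/bimodal_exps")
os.environ["HUGGINGFACE_HUB_CACHE"] = "./checkpoints/huggingface"
print(os.environ["PYTHONPATH"])
```
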

### CLIP Training

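The CLIP baseline trained below contrasts each pair only against the other pairs in the same mini-batch. A minimal NumPy sketch of that symmetric loss, assuming L2-normalized embeddings and a fixed temperature tau (`--tau_init 0.01` in the commands); `clip_loss` is a hypothetical helper, and the codebase's PyTorch implementation may differ in details such as how tau is stored or learned:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)          # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, tau=0.01):
    """Mini-batch CLIP loss for B pairs; row i of img_emb and txt_emb is a matching pair."""
    logits = img_emb @ txt_emb.T / tau                # (B, B) similarities scaled by 1/tau
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> its own caption
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # caption -> its own image
    return float((loss_i2t + loss_t2i) / 2)

# Orthonormal toy embeddings: correct pairs vs. pairs shifted by one position.
print(clip_loss(np.eye(4), np.eye(4), tau=0.1))                      # close to 0
print(clip_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0), tau=0.1))  # roughly 10
```

The normalizer of each softmax runs over the batch only, which is the dependence on batch size that the global loss is designed to remove.
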
In [None]:
epochs = 2            # the benchmark results below use 30-epoch pretraining
ita_type = "clip"

# One run per optimizer. Output directories keep the "_e30" suffix so that the
# evaluation and plotting cells find the same paths.
for optimizer_type in ["adamw", "adam", "momentum"]:
    print(f"=== Training {ita_type} with {optimizer_type} ===")
    !CUDA_VISIBLE_DEVICES=0 python ./iSogCLR/bimodal_exps/clip.py \
        --data_path ./datasets \
        --ann_path ./clip_train \
        --train_file cc3m_train_subset.json \
        --train_image_root cc3m_subset_100k \
        --output_dir {project_root}/output/{optimizer_type}/{ita_type}_cc3m_g0.8_e30 \
        --init_model \
        --use_amp \
        --ita_type {ita_type} \
        --tau_init 0.01 \
        --sogclr_gamma 0.8 \
        --eta_init 0.03 --sched cosine \
        --no-distributed \
        --epochs {epochs} \
        --opt {optimizer_type}



### SogCLR Training

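The following cell runs the training script to train a ResNet50 image encoder (pretrained on ImageNet) and a DistilBERT text encoder (pretrained on BookCorpus and English Wikipedia) on the CC3M subset using the SogCLR loss with temperature 0.01 (`--tau_init`) and γ = 0.8 (`--sogclr_gamma`), once for each optimizer (`adamw`, `adam`, and `momentum`). The benchmark results below use 30 epochs of pretraining; here `epochs = 2`, so these runs are much shorter.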

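SogCLR optimizes the global loss with ordinary batch sizes by keeping, for each training sample i, a running estimate u_i of its negative term g_i and refreshing it only when i appears in a batch: u_i <- (1 - gamma) * u_i + gamma * g_hat_i, where g_hat_i is the mini-batch estimate. Each negative is then weighted by exp((s(x_i, t_j) - s(x_i, t_i)) / tau) / u_i instead of being normalized within the batch. The NumPy sketch below covers only the image-anchored half and assumes that gamma corresponds to `--sogclr_gamma`; the class `SogCLRImageSide` is hypothetical, and its first-visit initialization is an assumption of this sketch rather than a documented detail of the codebase.

```python
import numpy as np

class SogCLRImageSide:
    """Hypothetical sketch of SogCLR's per-sample estimator (image-anchored half only)."""

    def __init__(self, num_samples, gamma=0.8, tau=0.01):
        self.u = np.zeros(num_samples)   # running estimate u_i for every training image
        self.gamma = gamma               # weight on the fresh mini-batch estimate
        self.tau = tau

    def step(self, sim, ids):
        """sim: (B, B) image-text similarities of one batch; ids: dataset indices of its images.

        Returns (weights, surrogate). The weights are meant to be treated as constants
        (no gradient through them), so only `diffs` would carry gradient in a real model.
        """
        b = sim.shape[0]
        neg = ~np.eye(b, dtype=bool)
        diffs = sim - np.diag(sim)[:, None]                     # s(x_i, t_j) - s(x_i, t_i)
        exp_diffs = np.where(neg, np.exp(diffs / self.tau), 0.0)
        g_batch = exp_diffs.sum(axis=1) / (b - 1)               # mini-batch estimate of g_i

        # Moving-average update for the sampled indices only. Assumption: an index seen
        # for the first time (u == 0) is initialized with its mini-batch estimate.
        old = self.u[ids]
        new = np.where(old == 0, g_batch, (1 - self.gamma) * old + self.gamma * g_batch)
        self.u[ids] = new

        weights = exp_diffs / new[:, None]                      # exp(diff / tau) / u_i
        surrogate = float(np.mean((weights * diffs).sum(axis=1) / (b - 1)))
        return weights, surrogate

rng = np.random.default_rng(0)
est = SogCLRImageSide(num_samples=100, gamma=0.8, tau=0.1)
batch_sim = rng.uniform(-1, 1, size=(4, 4))   # stand-in for one batch's similarity matrix
batch_ids = np.array([3, 17, 42, 99])
weights, surrogate = est.step(batch_sim, batch_ids)
```

On a first visit u_i equals the batch estimate, so each row of `weights` averages to 1; on later visits u_i also carries information from negatives seen in earlier batches.
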
In [None]:
epochs = 2            # the benchmark results below use 30-epoch pretraining
ita_type = "sogclr"

# One run per optimizer, each writing to its own output directory.
for optimizer_type in ["adamw", "adam", "momentum"]:
    print(f"=== Training {ita_type} with {optimizer_type} ===")
    !CUDA_VISIBLE_DEVICES=0 python ./iSogCLR/bimodal_exps/clip.py \
        --data_path ./datasets \
        --ann_path ./clip_train \
        --train_file cc3m_train_subset.json \
        --train_image_root cc3m_subset_100k \
        --output_dir {project_root}/output/{optimizer_type}/{ita_type}_cc3m_g0.8_e30 \
        --init_model \
        --use_amp \
        --ita_type {ita_type} \
        --tau_init 0.01 \
        --sogclr_gamma 0.8 \
        --eta_init 0.03 --sched cosine \
        --no-distributed \
        --epochs {epochs} \
        --opt {optimizer_type}



### iSogCLR Training

In [None]:
epochs = 2            # the benchmark results below use 30-epoch pretraining
ita_type = "isogclr_new_v2"

# One run per optimizer, each writing to its own output directory.
for optimizer_type in ["adamw", "adam", "momentum"]:
    print(f"=== Training {ita_type} with {optimizer_type} ===")
    !CUDA_VISIBLE_DEVICES=0 python ./iSogCLR/bimodal_exps/clip.py \
        --data_path ./datasets \
        --ann_path ./clip_train \
        --train_file cc3m_train_subset.json \
        --train_image_root cc3m_subset_100k \
        --output_dir {project_root}/output/{optimizer_type}/{ita_type}_cc3m_g0.8_e30 \
        --init_model \
        --use_amp \
        --ita_type {ita_type} \
        --tau_init 0.01 \
        --sogclr_gamma 0.8 \
        --eta_init 0.03 --sched cosine \
        --no-distributed \
        --epochs {epochs} \
        --opt {optimizer_type}


### Evaluation

The following cell evaluates every trained model (three losses × three optimizers): retrieval performance on the MSCOCO validation set and zero-shot classification performance on the ImageNet validation set. The evaluation command is obtained by appending `--evaluate --checkpoint /path/to/your/checkpoint --zs_dataset imagenet --zs_datafolder /path/to/imagenet/val` to the training command.

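The reported metrics can be illustrated with a small NumPy sketch on a precomputed similarity matrix. It assumes MSCOCO-style data where each caption describes exactly one image and an image may have several captions, and, for zero-shot classification, one text embedding per class; the helpers `retrieval_recall_at_1` and `zero_shot_top1` are hypothetical and may not match the script's exact ranking code.

```python
import numpy as np

def retrieval_recall_at_1(sim, txt2img):
    """sim: (n_images, n_texts) similarity matrix; txt2img[k] is the image that caption k describes.

    Returns (TR@1, IR@1) in percent.
    """
    txt2img = np.asarray(txt2img)
    # Text retrieval: is each image's top-ranked caption one of its own captions?
    tr1 = np.mean(txt2img[sim.argmax(axis=1)] == np.arange(sim.shape[0])) * 100
    # Image retrieval: is each caption's top-ranked image the image it describes?
    ir1 = np.mean(sim.argmax(axis=0) == txt2img) * 100
    return float(tr1), float(ir1)

def zero_shot_top1(img_emb, class_txt_emb, labels):
    """Top-1 accuracy (%) when each image is assigned its most similar class text embedding."""
    pred = (img_emb @ class_txt_emb.T).argmax(axis=1)
    return float(np.mean(pred == np.asarray(labels)) * 100)

# Toy example: 2 images with 2 captions each (captions 0-1 -> image 0, captions 2-3 -> image 1).
sim = np.array([[0.9, 0.1, 0.2, 0.3],
                [0.8, 0.2, 0.7, 0.6]])
print(retrieval_recall_at_1(sim, [0, 0, 1, 1]))   # (50.0, 75.0)
```
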
In [None]:
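import glob
import os

for ita_type in ["clip", "sogclr", "isogclr_new_v2"]:
    for optimizer_type in ["adamw", "adam", "momentum"]:
        out_dir = f"{project_root}/output/{optimizer_type}/{ita_type}_cc3m_g0.8_e30"
        # Assumption: clip.py saves checkpoints as *.pth files in out_dir; use the newest one.
        ckpts = sorted(glob.glob(f"{out_dir}/*.pth"), key=os.path.getmtime)
        if not ckpts:
            print(f"No checkpoint found in {out_dir}; skipping")
            continue
        checkpoint = ckpts[-1]
        print(f"=== Evaluating {ita_type} with {optimizer_type} ({checkpoint}) ===")
        # Training command plus the evaluation flags described above.
        !CUDA_VISIBLE_DEVICES=0 python ./iSogCLR/bimodal_exps/clip.py \
            --data_path ./datasets \
            --ann_path ./clip_train \
            --train_file cc3m_train_subset.json \
            --train_image_root cc3m_subset_100k \
            --output_dir {out_dir} \
            --init_model \
            --use_amp \
            --ita_type {ita_type} \
            --tau_init 0.01 \
            --sogclr_gamma 0.8 \
            --eta_init 0.03 --sched cosine \
            --no-distributed \
            --epochs {epochs} \
            --opt {optimizer_type} \
            --evaluate --checkpoint {checkpoint} \
            --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val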

In [None]:
import json
import matplotlib.pyplot as plt

ita_type = "clip"     # change to "sogclr" or "isogclr_new_v2" to compare the other losses
optimizers = ["adamw", "adam", "momentum"]

plt.figure(figsize=(7, 5))

for opt in optimizers:
    log_path = f"{project_root}/output/{opt}/{ita_type}_cc3m_g0.8_e30/coco_log.txt"
    records = []
    with open(log_path, "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and the non-JSON "best epoch" summary line.
            if not line or line.startswith("best epoch"):
                continue
            record = json.loads(line)
            # Keep only records that contain the training loss (evaluation entries may not).
            if "train_loss_ita" in record:
                records.append(record)
    # Separate name so the global `epochs` used by the training cells is not overwritten.
    epoch_ids = [r.get("epoch", i) for i, r in enumerate(records)]
    loss_ita = [float(r["train_loss_ita"]) for r in records]
    plt.plot(epoch_ids, loss_ita, marker="o", label=opt.upper())

plt.xlabel("Epoch")
plt.ylabel("train_loss_ita")
plt.title(f"{ita_type.upper()}: Optimizer Comparison")
plt.legend()
plt.grid(True)
plt.show()


### Benchmarks

The following table reports recall@1 and top-1 accuracy on the provided MSCOCO and ImageNet datasets. The first row is from the model trained with the CLIP loss and the second from the model trained with the SogCLR loss; both use a batch size of 128 and 30 epochs of pretraining. TR@1 denotes the recall@1 of text retrieval on MSCOCO, IR@1 the recall@1 of image retrieval on MSCOCO, and ACC@1 the top-1 zero-shot accuracy on ImageNet. Average is the mean of the three metrics.

| Method | MSCOCO TR@1 | MSCOCO IR@1 | ImageNet ACC@1 | Average |
|:----------:|:--------:|:--------:|:--------:|:--------:|
| CLIP | 12.0 | 9.32 | 21.35 | 14.22 |
| SogCLR |  14.38  |  10.73  | 24.54 | 16.55 |
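
As a quick consistency check, the Average column can be recomputed as the arithmetic mean of the three metrics in each row:

```python
# (TR@1, IR@1, ACC@1) copied from the benchmark table above.
results = {
    "CLIP": (12.0, 9.32, 21.35),
    "SogCLR": (14.38, 10.73, 24.54),
}
averages = {method: sum(m) / len(m) for method, m in results.items()}
for method, avg in averages.items():
    print(f"{method}: average = {avg:.2f}")   # CLIP: 14.22, SogCLR: 16.55
```
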