# **Self-Supervised Learning with SogCLR**

**Author**: Zhuoning Yuan



### **Introduction**

In this tutorial, you will learn how to train a self-supervised model by optimizing the [Global Contrastive Loss](https://arxiv.org/abs/2202.12387) (GCL) on CIFAR10/CIFAR100. The original GCL was implemented in TensorFlow and run on TPUs. This tutorial re-implements GCL in PyTorch for GPUs, based on the [MoCo](https://github.com/facebookresearch/moco) codebase. We recommend running this notebook in a GPU-enabled environment, e.g., [Google Colab](https://colab.research.google.com/).

### **Preparation**

Download the [source code](https://github.com/Optimization-AI/SogCLR/tree/cifar) and extract it to a local folder.

In [None]:
#%% mount drive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import os
path='/content/gdrive/MyDrive/CSCE_689_Optimization_Project' 
os.chdir(path)
print(os.getcwd())

/content/gdrive/.shortcut-targets-by-id/1SVUbV5-MTm32TAMzKmmJcktD_v7coGlR/CSCE_689_Optimization_Project


In [None]:
#!git clone -b cifar https://github.com/Optimization-AI/SogCLR.git
#!mv ./SogCLR/* ./
#!rm -r /content/gdrive/MyDrive/CSCE_689_Optimization_Project/SogCLR

Cloning into 'SogCLR'...
remote: Enumerating objects: 193, done.
remote: Counting objects: 100% (95/95), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 193 (delta 41), reused 51 (delta 11), pack-reused 98
Receiving objects: 100% (193/193), 928.45 KiB | 7.43 MiB/s, done.
Resolving deltas: 100% (61/61), done.
rm: cannot remove '/content/SogCLR': No such file or directory


### **Self-Supervised Pre-Training on CIFAR10/CIFAR100**


Below are two examples of self-supervised pre-training of a ResNet-50 model on CIFAR10 on a single GPU. The first time you run the scripts, the datasets will be automatically downloaded to `./data/`.
- By default, we use linear learning rate scaling, i.e., $\text{LearningRate}=1.0\times\text{BatchSize}/256$, the [LARS](https://arxiv.org/abs/1708.03888) optimizer, and a weight decay of 1e-4. The temperature parameter $\tau$ is fixed at 0.3. For GCL, `gamma` (γ in the paper) is an additional parameter for maintaining the moving-average estimator; its default value is 0.9.
- By default, `CIFAR10` is used for experiments. To pretrain on CIFAR100, set `--data_name=cifar100`. This codebase supports only `CIFAR10` and `CIFAR100`; to use other datasets, modify the dataloader.
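
To give a rough feel for what `gamma` controls, the sketch below keeps one running scalar per training sample and blends it with a fresh mini-batch estimate, following the moving-average form described in the GCL paper, $u_i \leftarrow (1-\gamma)\,u_i + \gamma\,\hat{g}_i$. The names `update_moving_average`, `u`, and `batch_estimates` are hypothetical and are not taken from the SogCLR codebase, and the per-sample estimates are made-up numbers rather than real contrastive terms.

```python
def update_moving_average(u, indices, batch_estimates, gamma=0.9):
    """Blend stored per-sample estimates with fresh mini-batch estimates.

    u:               list of running estimates, one per training sample
    indices:         dataset indices of the samples in the current batch
    batch_estimates: fresh estimates for those samples (same order)
    gamma:           weight placed on the fresh estimate (0 < gamma <= 1)
    """
    for i, g in zip(indices, batch_estimates):
        u[i] = (1.0 - gamma) * u[i] + gamma * g
    return u


# Toy example: 5 samples, all running estimates start at zero
# (the zero initialization is an assumption of this sketch).
u = [0.0] * 5
u = update_moving_average(u, indices=[1, 3], batch_estimates=[2.0, 4.0], gamma=0.9)
print(u)
```

Only the samples that appear in the current batch are updated; the others keep their stored value. With `gamma=1.0` the stored history is ignored and each estimate is simply the latest mini-batch value.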

**Training with ResNet-50 using DCL (SogCLR)**

We use a batch size of 64 and train for 400 epochs, pretraining a ResNet-50 with a 2-layer non-linear projection head, with a hidden size of 128, on top of the backbone encoder. You can also increase the number of data-loading workers to speed up training, e.g., `--workers=32`.
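
As a quick check of the linear scaling rule, the following sketch computes the effective learning rate from the `--lr` and `--batch-size` values. The helper name `scaled_lr` is hypothetical; only the formula (learning rate times batch size divided by 256) comes from this tutorial.

```python
def scaled_lr(base_lr, batch_size, reference_batch_size=256):
    """Linear learning-rate scaling: base_lr * batch_size / reference_batch_size."""
    return base_lr * batch_size / reference_batch_size


# Settings used in this tutorial: --lr=1.0, --batch-size=64
print(scaled_lr(1.0, 64))
```

With the tutorial settings this evaluates to 0.25, which matches the `initial learning rate: 0.25` line in the training log.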
 

In [None]:
!CUDA_VISIBLE_DEVICES=0 python train.py \
  --lr=1.0 --learning-rate-scaling=linear \
  --epochs=400 --batch-size=64 \
  --loss_type dcl \
  --gamma 0.9 \
  --workers 32 \
  --wd=1e-4 \
  --data_name cifar10 \
  --save_dir ./saved_models/ \
  --print-freq 1000

Use GPU: 0 for training
pretraining on cifar10
=> creating model 'resnet50'
cifar head: True
initial learning rate: 0.25
20221013_cifar10_resnet50_sogclr-128-2048_bz_64_E400_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100% 170498071/170498071 [00:02<00:00, 74817026.48it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data/
Epoch: [0][  0/781]	Time 17.423 (17.423)	Data  3.492 ( 3.492)	LR 0.0000e+00 (0.0000e+00)	Loss -1.7371e-01 (-1.7371e-01)
Epoch: [1][  0/781]	Time  5.904 ( 5.904)	Data  3.218 ( 3.218)	LR 2.5000e-02 (2.5000e-02)	Loss -3.9610e-01 (-3.9610e-01)
Epoch: [2][  0/781]	Time  5.640 ( 5.640)	Data  3.072 ( 3.072)	LR 5.0000e-02 (5.0000e-02)	Loss -4.7044e-01 (-4.7044e-01)
Epoch: [3][  0/781]	Time  6.613 ( 6.613)	Data  3.718 ( 3.718)	LR 7.5000e-02 (7.5000e-02)	Loss -5.5438e-01 (-5.5438e-01)
Epoch: [4][  0/781]	Time  5.931 ( 5.931)	Data  3.161 ( 3.161)	LR 1.0000e-01 (1.0000e-01)	

**Training with ResNet-50 using CL (SimCLR)**

In [None]:
!CUDA_VISIBLE_DEVICES=0 python train.py \
  --lr=1.0 --learning-rate-scaling=linear \
  --epochs=400 --batch-size=64 \
  --loss_type cl \
  --workers 2 \
  --wd=1e-4 \
  --data_name cifar10 \
  --save_dir ./saved_models/ \
  --print-freq 1000

Use GPU: 0 for training
pretraining on cifar10
=> creating model 'resnet50'
cifar head: True
initial learning rate: 0.25
20221013_cifar10_resnet50_simclr-128-2048_bz_64_E2_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars
Files already downloaded and verified
Epoch: [0][  0/781]	Time  4.989 ( 4.989)	Data  0.261 ( 0.261)	LR 0.0000e+00 (0.0000e+00)	Loss 9.0097e+00 (9.0097e+00)
Epoch: [1][  0/781]	Time  0.544 ( 0.544)	Data  0.286 ( 0.286)	LR 2.5000e-02 (2.5000e-02)	Loss 7.8423e+00 (7.8423e+00)


### **Linear Classification**

By default, we use SGD with momentum and no weight decay, with a batch size of 1024, for linear classification on frozen features/weights. This stage runs for 90 epochs.
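
The iteration counters in the logs can be sanity-checked with a little arithmetic. The sketch below assumes the standard 50,000-image CIFAR10 training split. The helper `steps_per_epoch` and its `drop_last` interpretation are assumptions inferred from the `781` and `49` counters in the logs, not read from the scripts.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last):
    """Number of mini-batches per epoch, with or without the final partial batch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


NUM_TRAIN = 50_000  # size of the CIFAR10 training split

# Pre-training: batch size 64. The log counter shows 781,
# which is consistent with dropping the final partial batch.
pretrain_steps = steps_per_epoch(NUM_TRAIN, 64, drop_last=True)

# Linear evaluation: batch size 1024. The log counter shows 49,
# which is consistent with keeping the final partial batch.
lincls_steps = steps_per_epoch(NUM_TRAIN, 1024, drop_last=False)

print(pretrain_steps, lincls_steps)  # steps per epoch for each stage
print(lincls_steps * 90)             # total linear-evaluation steps over 90 epochs
```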

In [None]:
!python lincls.py \
  --workers 2 \
  --pretrained ./saved_models/20221013_cifar10_resnet50_sogclr-128-2048_bz_64_E400_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars/checkpoint_0399.pth.tar \
  --data_name cifar10 \
  --save_dir ./saved_models/

=> creating model 'resnet50'
Dataset: cifar10
Cifar head: True
=> loading checkpoint './saved_models/20221013_cifar10_resnet50_sogclr-128-2048_bz_64_E400_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars/checkpoint_0399.pth.tar'
=> loaded pre-trained model './saved_models/20221013_cifar10_resnet50_sogclr-128-2048_bz_64_E400_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars/checkpoint_0399.pth.tar'
linear_eval_20221013_cifar10_resnet50_sogclr-128-2048_bz_64_E400_WR10_lr_0.250_linear_wd_0.0001_t_0.3_g_0.9_lars
Files already downloaded and verified
Files already downloaded and verified
Epoch: [0][ 0/49]	Time 15.497 (15.497)	Data  3.439 ( 3.439)	Loss 2.3114e+00 (2.3114e+00)	Acc@1   7.13 (  7.13)	Acc@5  47.66 ( 47.66)
Epoch: [0][10/49]	Time  0.806 ( 2.138)	Data  0.000 ( 0.313)	Loss 4.9326e-01 (1.0183e+00)	Acc@1  83.98 ( 73.77)	Acc@5  98.63 ( 93.63)
Epoch: [0][20/49]	Time  0.821 ( 1.508)	Data  0.000 ( 0.164)	Loss 4.5537e-01 (7.7112e-01)	Acc@1  86.91 ( 78.65)	Acc@5  98.54 ( 96.21)
Epoch: [0][30/4

### **Benchmarks**

The following results are linear evaluation accuracies on the CIFAR10 test set. All results use a batch size of 64 and 400 epochs of pretraining.

| Method | Batch size | Epochs | Linear eval. accuracy (%) |
|:------:|:----------:|:------:|:-------------------------:|
| SimCLR | 64 | 400 | 90.66 |
| SogCLR | 64 | 400 | 91.78 |


### **Reference**
If you find this tutorial helpful, please cite our paper:

<pre>
@inproceedings{yuan2022provable,
  title={Provable stochastic optimization for global contrastive learning: Small batch does not harm performance},
  author={Yuan, Zhuoning and Wu, Yuexin and Qiu, Zi-Hao and Du, Xianzhi and Zhang, Lijun and Zhou, Denny and Yang, Tianbao},
  booktitle={International Conference on Machine Learning},
  pages={25760--25782},
  year={2022},
  organization={PMLR}
}
</pre>
