Skip to content

A PyTorch implementation of MoCo based on CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning"

Notifications You must be signed in to change notification settings

leftthomas/MoCo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MoCo

A PyTorch implementation of MoCo based on CVPR 2020 paper Momentum Contrast for Unsupervised Visual Representation Learning.

Network Architecture image from the paper

Requirements

conda install pytorch=1.6.0 torchvision cudatoolkit=10.2 -c pytorch

Dataset

CIFAR10 dataset is used in this repo, the dataset will be downloaded into data directory by PyTorch automatically.

Usage

Train MoCo

python train.py --batch_size 1024 --epochs 50
optional arguments:
--feature_dim                 Feature dim for each image [default value is 128]
--m                           Negative sample number [default value is 4096]
--temperature                 Temperature used in softmax [default value is 0.5]
--momentum                    Momentum used for the update of memory bank [default value is 0.999]
--k                           Top k most similar images used to predict the label [default value is 200]
--batch_size                  Number of images in each mini-batch [default value is 256]
--epochs                      Number of sweeps over the dataset to train [default value is 500]

Linear Evaluation

python linear.py --batch_size 1024 --epochs 200 
optional arguments:
--model_path                  The pretrained model path [default value is 'results/128_4096_0.5_0.999_200_256_500_model.pth']
--batch_size                  Number of images in each mini-batch [default value is 256]
--epochs                      Number of sweeps over the dataset to train [default value is 100]

Results

There are some difference between this implementation and official implementation, the model (ResNet50) is trained on one NVIDIA GeForce GTX TITAN GPU:

  1. No Gaussian blur used;
  2. Adam optimizer with learning rate 1e-3 is used to replace SGD optimizer;
  3. No Linear learning rate scaling used.
Evaluation Protocol Feature Dim Momentum Batch Size Epoch Num τ K Top1 Acc % Top5 Acc % Download
KNN 128 0.999 256 500 0.5 200 80.9 99.1 model | tex8
Linear - - 256 100 - - 86.3 99.6 model | 6me4

About

A PyTorch implementation of MoCo based on CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning"

Topics

Resources

Stars

Watchers

Forks

Languages