Skip to content
main
Switch branches/tags
Code

Latest commit

Summary: Pull Request resolved: #64

Differential Revision: D23247550

Pulled By: ppwwyyxx

fbshipit-source-id: cfeefe6075b755d3fcce8c103239ae55ab2d2da2
78b69ca

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
Mar 23, 2020
Mar 23, 2020
Mar 23, 2020
Mar 23, 2020

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

This is a PyTorch implementation of the MoCo paper:

@Article{he2019moco,
  author  = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  title   = {Momentum Contrast for Unsupervised Visual Representation Learning},
  journal = {arXiv preprint arXiv:1911.05722},
  year    = {2019},
}

It also includes the implementation of the MoCo v2 paper:

@Article{chen2020mocov2,
  author  = {Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
  title   = {Improved Baselines with Momentum Contrastive Learning},
  journal = {arXiv preprint arXiv:2003.04297},
  year    = {2020},
}

Preparation

Install PyTorch and ImageNet dataset following the official PyTorch ImageNet training code.

This repo aims to be minimal modifications on that code. Check the modifications by:

diff main_moco.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
diff main_lincls.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)

Unsupervised Training

This implementation only supports multi-gpu, DistributedDataParallel training, which is faster and simpler; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

This script uses all the default hyper-parameters as described in the MoCo v1 paper. To run MoCo v2, set --mlp --moco-t 0.2 --aug-plus --cos.

Note: for 4-gpu training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128 with 4 gpus. We got similar results using this setting.

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path]/checkpoint_0199.pth.tar \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Linear classification results on ImageNet using this repo with 8 NVIDIA V100 GPUs :

pre-train
epochs
pre-train
time
MoCo v1
top-1 acc.
MoCo v2
top-1 acc.
ResNet-50 200 53 hours 60.8±0.2 67.5±0.1

Here we run 5 trials (of pre-training and linear classification) and report mean±std: the 5 results of MoCo v1 are {60.6, 60.6, 60.7, 60.9, 61.1}, and of MoCo v2 are {67.7, 67.6, 67.4, 67.6, 67.3}.

Models

Our pre-trained ResNet-50 models can be downloaded as following:

epochs mlp aug+ cos top-1 acc. model md5
MoCo v1 200 60.6 download b251726a
MoCo v2 200 67.7 download 59fd9945
MoCo v2 800 71.1 download a04e12f8

Transferring to Object Detection

See ./detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

See Also