MGD in Unsupervised Learning

This experiment extends the original paper. MGD works naturally with current unsupervised learning frameworks, e.g., Momentum Contrast (MoCo) and Simple Siamese Learning (SimSiam). In this repo, we initially investigate MoCo-v2 training with MGD; the remaining parts are work in progress.

Environments

  • PyTorch 1.8.1
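
PyTorch 1.8.1 pairs with torchvision 0.9.1; one way to install both (a sketch; pick the wheel that matches your CUDA setup):

pip install torch==1.8.1 torchvision==0.9.1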

Data Preparation

Prepare the ImageNet-1K dataset following the official PyTorch ImageNet training code.

Directory Structure
`-- path/to/${ImageNet-1K}/root/folder
    `-- train
    |   |-- n01440764
    |   |-- n01734418
    |   |-- ...
    |   |-- n15075141
    `-- val
    |   |-- n01440764
    |   |-- n01734418
    |   |-- ...
    |   |-- n15075141
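
For reference, the train and val folders laid out above are typically consumed with torchvision's ImageFolder; a minimal sketch (the actual training script applies the MoCo-v2 augmentation pipeline rather than this simplified transform):

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Minimal sketch: point this at the train/ folder from the layout above.
# The real training script uses the MoCo-v2 augmentations, not this transform.
train_dataset = datasets.ImageFolder(
    "path/to/ImageNet-1K/train",
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ]),
)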

Code Preparation

cp -r ../mgd/sampler.py mgd

Unsupervised Training with MGD

Please download the pre-trained ResNet-50 weights (md5: 59fd9945, epochs: 200) from the MoCo-v2 Models and load them via the --resume argument.
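
Before launching training, the downloaded checkpoint can be sanity-checked outside the script; a sketch assuming the key layout of the publicly released MoCo checkpoints (a state_dict whose query-encoder weights carry the module.encoder_q. prefix):

import torch

# Sketch: inspect the MoCo-v2 checkpoint that will be passed to --resume.
# Key layout assumed from the publicly released MoCo checkpoints.
ckpt = torch.load("moco_v2_200ep_pretrain.pth.tar", map_location="cpu")
state_dict = ckpt["state_dict"]
encoder_q_keys = [k for k in state_dict if k.startswith("module.encoder_q.")]
print(f"found {len(encoder_q_keys)} encoder_q tensors")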

To do unsupervised pre-training of a ResNet-18 model with MGD on ImageNet on an 8-GPU machine, run:

python main_moco_mgd.py \
  -a resnet18 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --mlp --moco-t 0.2 --aug-plus --cos \
  --resume moco_v2_200ep_pretrain.pth.tar \
  [your imagenet-folder with train and val folders]

| method | model | pre-train epochs | training logs |
| :---: | :---: | :---: | :---: |
| MGD | ResNet-50 distills ResNet-34 | 200 | Baidu Pan [ bkr5 ] |
| MGD | ResNet-50 distills ResNet-18 | 200 | Baidu Pan [ jbcv ] |

Note:

  • The MGD distiller is powered by AMP (absolute max pooling).
  • The teacher is ResNet-50 by default.
  • The hyper-parameters of MGD, such as the loss factors, are the same as in supervised training. We did not search hyper-parameters, but according to the training logs, we believe performance can be improved by tuning them, for example, by changing the factor from 1e4 to 1e2.
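
For intuition, the factor simply weights the MGD distillation term against the MoCo contrastive objective; a hypothetical one-line illustration (variable names are not the repo's identifiers):

# Hypothetical illustration only; names are not from this repo's code.
total_loss = moco_contrastive_loss + mgd_factor * mgd_distillation_loss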

Linear Classification

This follows the linear classification protocol of MoCo-v2 (a command sketch is given after the table). Linear classification results on ImageNet using this repo with 8 NVIDIA TITAN Xp GPUs:

| method | model | pre-train epochs | MoCo v2 top-1 acc. | MoCo v2 top-5 acc. |
| :---: | :---: | :---: | :---: | :---: |
| Teacher | ResNet-50 | 200 | 67.5 | - |
| Student | ResNet-34 | 200 | 57.2 | 81.5 |
| MGD | ResNet-34 | 200 | 58.5 | 82.7 |
| Student | ResNet-18 | 200 | 52.5 | 77.0 |
| MGD | ResNet-18 | 200 | 53.6 | 78.7 |
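
For reference, a typical MoCo-v2 linear evaluation launch looks like the following (a sketch based on the MoCo repository's main_lincls.py; swap -a for resnet18 or resnet34 and point --pretrained at the distilled checkpoint to reproduce the student rows):

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path] \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]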

Update Schedule

The schedule for updating the MGD matching matrix differs from that in the original paper. We scale it with a log function, i.e., we update the matching matrix at epochs [1, 2, 3, 6, 9, 15, 26, 43, 74, 126].
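
In code, this amounts to re-solving the matching at a fixed, log-spaced set of epochs; a minimal sketch using the list above (the function name is illustrative, not the repo's own):

# Epochs (from the list above) at which the matching matrix is re-solved.
MGD_UPDATE_EPOCHS = {1, 2, 3, 6, 9, 15, 26, 43, 74, 126}

def should_update_matching(epoch: int) -> bool:
    # True when the matching matrix should be updated at this epoch.
    return epoch in MGD_UPDATE_EPOCHS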