<a href="https://colab.research.google.com/github/open-mmlab/mmselfsup/blob/master/demo/mmselfsup_colab_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MMSelfSup Tutorial
In this tutorial, we will introduce the following content:

- How to install MMSelfSup
- How to train an algorithm in MMSelfSup
- How to train downstream tasks

If you have any other questions, feel free to open an issue.

## How to install MMSelfSup

Before using MMSelfSup, we need to prepare the environment with the following steps:

1. Install Python, CUDA, C/C++ compiler and git
2. Install PyTorch (CUDA version)
3. Install dependent codebases (mmcv, mmcls)
4. Clone mmselfsup source code from GitHub and install it

Because this tutorial runs on Google Colab, where the basic environment is already set up, we can skip the first two steps.

In [3]:
!pwd

/content


In [4]:
# Check nvcc version
!nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0


In [5]:
# Check GCC version
!gcc --version

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



In [6]:
# Check PyTorch installation
import torch, torchvision
print(torch.__version__)
print(torch.cuda.is_available())

1.10.0+cu111
True


## Install MMCV

MMCV is the foundational package of all OpenMMLab packages. Pre-built wheels are available for Linux, so we can download and install them directly.

Make sure the wheel you choose matches your PyTorch and CUDA versions.

In the steps above we checked the versions of PyTorch and CUDA: they are 1.10.0 and 11.1 respectively, so we need the corresponding wheel.

We can also install the full version of MMCV (mmcv-full). It includes all features and various CUDA ops out of the box, but may take longer to build.

We recommend installing it with MIM (https://github.com/open-mmlab/mim), which picks the wheel matching the installed PyTorch and CUDA versions, as the log below shows.

In [7]:
!pip install openmim



In [10]:
!mim install mmcv-full

installing mmcv-full from wheel.
Looking in links: https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
Collecting mmcv-full==1.4.6
  Downloading https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/mmcv_full-1.4.6-cp37-cp37m-manylinux1_x86_64.whl (46.0 MB)
Collecting addict
  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)
Collecting yapf
  Downloading yapf-0.32.0-py2.py3-none-any.whl (190 kB)
Installing collected packages: yapf, addict, mmcv-full
Successfully installed addict-2.4.0 mmcv-full-1.4.6 yapf-0.32.0
Successfully installed mmcv-full.


Alternatively, you can install the packages with pip, but then you need to check your PyTorch and CUDA versions manually. Example commands are provided below; modify them according to your PyTorch and CUDA versions.

In [None]:
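# Install mmcv or mmcv-full with pip (uncomment only one of them).
# mmcls is installed automatically as a dependency of MMSelfSup in the next step.
# !pip install mmcv -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
# !pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html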
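
If you go the pip route, the index URL has to match your PyTorch build. The sketch below derives it from a PyTorch build string such as `1.10.0+cu111`. The URL pattern is copied from the `mim install` log above and is only an assumption for other versions; the helper name `mmcv_index_url` is hypothetical.

```python
import re

# URL pattern copied from the `mim install` log above (an assumption for other versions).
MMCV_INDEX = "https://download.openmmlab.com/mmcv/dist/{cuda}/torch{torch}/index.html"


def mmcv_index_url(torch_version: str) -> str:
    """Build the MMCV wheel index URL from a build string such as '1.10.0+cu111'."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)\+(cu\d+|cpu)", torch_version)
    if match is None:
        # Without the '+cuXXX' / '+cpu' suffix the CUDA build cannot be inferred.
        raise ValueError(f"cannot infer the CUDA tag from {torch_version!r}")
    torch_ver, cuda_tag = match.groups()
    return MMCV_INDEX.format(cuda=cuda_tag, torch=torch_ver)


print(mmcv_index_url("1.10.0+cu111"))
# https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
```

In Colab you could then run `url = mmcv_index_url(torch.__version__)` followed by `!pip install mmcv-full -f {url}`, since IPython expands `{url}` in shell commands. Build strings without a `+cuXXX` or `+cpu` suffix are rejected rather than guessed.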

## Clone and install MMSelfSup

In [11]:
# Clone MMSelfSup repository
!git clone https://github.com/open-mmlab/mmselfsup.git
%cd mmselfsup/

# Install MMSelfSup from source in editable mode (-e), so local changes take effect without reinstalling
!pip install -e .

Cloning into 'mmselfsup'...
remote: Enumerating objects: 3308, done.
remote: Counting objects: 100% (930/930), done.
remote: Compressing objects: 100% (560/560), done.
remote: Total 3308 (delta 577), reused 475 (delta 369), pack-reused 2378
Receiving objects: 100% (3308/3308), 1.97 MiB | 12.29 MiB/s, done.
Resolving deltas: 100% (1950/1950), done.
/content/mmselfsup
Obtaining file:///content/mmselfsup
Collecting mmcls<=0.20.1,>=0.19.0
  Downloading mmcls-0.20.1-py2.py3-none-any.whl (490 kB)
Collecting timm
  Downloading timm-0.5.4-py3-none-any.whl (431 kB)
Installing collected packages: timm, mmcls, mmselfsup
  Running setup.py develop for mmselfsup
Successfully installed mmcls-0.20.1 mmselfsup-0.7.1 timm-0.5.4


In [12]:
# Check MMSelfSup installation
import mmselfsup
print(mmselfsup.__version__)

0.7.1


## Example to start a self-supervised task

Before you start training, you need to prepare your dataset. Please read [prepare_data.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/prepare_data.md) carefully.

**Note**: Because we implement the algorithms following their original settings, many of them are designed to run in distributed mode and are not officially supported for single-GPU training. See [here](https://github.com/open-mmlab/mmselfsup/blob/master/tools/train.py#L120) for details.
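
Config names follow OpenMMLab's naming convention, in which a part such as `8xb64-steplr-70e` records the reference setup: 8 GPUs with 64 images each, a step learning-rate schedule and 70 epochs. The hypothetical helper below parses that part to show how much smaller the total batch per iteration is on Colab's single GPU, assuming the dataset config keeps 64 images per GPU. The name only describes the reference setup; the values actually used are whatever the config file sets (the Colab config below, for example, overrides `max_epochs` to 2).

```python
import re
from dataclasses import dataclass

# Hypothetical helper: reads the '{gpus}xb{batch}-{schedule}-{epochs}e' part of a config name.
_SETUP = re.compile(r"(?P<gpus>\d+)xb(?P<batch>\d+)-(?P<schedule>[a-z]+)-(?P<epochs>\d+)e")


@dataclass
class TrainingSetup:
    gpus: int
    batch_per_gpu: int
    schedule: str
    epochs: int

    @property
    def total_batch(self) -> int:
        # Images processed per iteration across all GPUs
        return self.gpus * self.batch_per_gpu


def parse_config_name(name: str) -> TrainingSetup:
    match = _SETUP.search(name)
    if match is None:
        raise ValueError(f"no '<gpus>xb<batch>-<schedule>-<epochs>e' part in {name!r}")
    return TrainingSetup(int(match["gpus"]), int(match["batch"]),
                         match["schedule"], int(match["epochs"]))


setup = parse_config_name("relative-loc_resnet50_8xb64-steplr-70e_in1k")
print(setup.total_batch)    # 512: images per iteration on 8 GPUs
print(setup.batch_per_gpu)  # 64: images per iteration on a single GPU
```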


In [25]:
!pwd

/content/mmselfsup


Here we download a small example dataset for the demo.

In [26]:
!mkdir data
!wget https://download.openmmlab.com/mmselfsup/data/imagenet.zip
!unzip -q imagenet.zip -d ./data/

--2022-03-21 07:15:11--  https://download.openmmlab.com/mmselfsup/data/imagenet.zip
Resolving download.openmmlab.com (download.openmmlab.com)... 47.252.96.28
Connecting to download.openmmlab.com (download.openmmlab.com)|47.252.96.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 155496559 (148M) [application/zip]
Saving to: ‘imagenet.zip’


2022-03-21 07:15:30 (8.31 MB/s) - ‘imagenet.zip’ saved [155496559/155496559]



In [35]:
# Check data directory
!apt-get install tree
!tree -d ./data

Reading package lists... Done
Building dependency tree       
Reading state information... Done
tree is already the newest version (1.7.0-5).
0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.
./data
└── imagenet
    ├── meta
    └── train
        └── n01440764

4 directories
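
The demo dataset already provides `meta/train.txt`, which the configs below reference. For your own class-per-folder data, a file of the same kind can be generated with a short script. This is a minimal sketch, assuming one `relative/path label` entry per line (check prepare_data.md for the exact format your data source expects); `write_ann_file` is a hypothetical helper, not part of MMSelfSup.

```python
from pathlib import Path

# Extensions treated as images by this sketch (an assumption; extend as needed)
IMG_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_ann_file(img_root: str, ann_file: str) -> dict:
    """Write one '<class_dir>/<image> <label>' line per image of a class-per-folder dataset.

    Hypothetical helper; returns the class-name -> label mapping it used.
    """
    root = Path(img_root)
    # Labels are assigned in sorted order of the class folder names
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    lines = []
    for name in classes:
        for img in sorted((root / name).iterdir()):
            if img.suffix.lower() in IMG_EXTENSIONS:
                lines.append(f"{name}/{img.name} {class_to_idx[name]}")
    out = Path(ann_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(line + "\n" for line in lines))
    return class_to_idx
```

For example, `write_ann_file('data/imagenet/train', 'data/imagenet/meta/my_train.txt')` returns one entry per class folder; the tree above shows a single class folder (`n01440764`) in the demo data.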


### Create a new config file
To reuse the common parts of different config files, we support inheriting multiple base config files. For example, to train the `relative_loc` algorithm, the new config file can build the model's basic structure by inheriting `configs/selfsup/_base_/models/relative-loc.py` (written as `../_base_/models/relative-loc.py` relative to the new file).
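
Conceptually, the new file's own fields are merged on top of what it inherits: nested dicts are combined key by key, and any other value replaces the inherited one. The sketch below illustrates only those semantics with hypothetical values; it is not mmcv's implementation, which additionally supports options such as `_delete_=True` to replace a whole dict. To inspect the real merged result, `mmcv.Config.fromfile(path).pretty_text` prints the final config.

```python
import copy


def merge_config(base: dict, child: dict) -> dict:
    """Simplified sketch of how a child config overrides an inherited base config.

    Nested dicts are merged key by key; any other value in the child
    (numbers, strings, lists) replaces the base value outright.
    """
    merged = copy.deepcopy(base)  # never mutate the base config
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical values standing in for an inherited schedule file and the new config
base = {"optimizer": {"type": "SGD", "lr": 0.1, "momentum": 0.9},
        "runner": {"type": "EpochBasedRunner", "max_epochs": 200}}
child = {"optimizer": {"lr": 0.2}, "runner": {"max_epochs": 2}}
cfg = merge_config(base, child)
print(cfg["optimizer"])  # {'type': 'SGD', 'lr': 0.2, 'momentum': 0.9}
print(cfg["runner"])     # {'type': 'EpochBasedRunner', 'max_epochs': 2}
```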

In [20]:
%%writefile configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab.py
_base_ = [
    '../_base_/models/relative-loc.py',
    '../_base_/datasets/imagenet_relative-loc.py',
    '../_base_/schedules/sgd_steplr-200e_in1k.py',
    '../_base_/default_runtime.py',
]

log_config = dict(interval=10)

# optimizer
optimizer = dict(
    type='SGD',
    lr=0.2,
    weight_decay=1e-4,
    momentum=0.9,
    # parameters whose names start with 'neck.' or 'head.' use a larger weight decay
    paramwise_options={
        '\\Aneck.': dict(weight_decay=5e-4),
        '\\Ahead.': dict(weight_decay=5e-4)
    })

# learning policy
lr_config = dict(
    policy='step',
    step=[1])

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=2)
# max_keep_ckpts sets the maximum number of checkpoint files kept in work_dirs:
# with a value of 3, when CheckpointHook (in mmcv) saves the 4th checkpoint,
# it removes the oldest one so that only 3 checkpoints are kept
checkpoint_config = dict(interval=1, max_keep_ckpts=3)


Overwriting configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab.py
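
Two settings in this config are easy to misread: `paramwise_options` maps regular expressions to per-parameter overrides, and `lr_config` with `policy='step'` and `step=[1]` lowers the learning rate once after the first epoch. The plain-Python sketch below mimics both, assuming each regex is searched in the parameter name (as the `\A` start-of-string anchors suggest) and that the step policy uses mmcv's default decay factor `gamma=0.1`. The function names and the parameter names are illustrative only.

```python
import re

# Values copied from the config above
BASE_WEIGHT_DECAY = 1e-4
PARAMWISE_OPTIONS = {
    "\\Aneck.": {"weight_decay": 5e-4},
    "\\Ahead.": {"weight_decay": 5e-4},
}


def weight_decay_for(param_name: str) -> float:
    """Sketch: every regex key is searched in the parameter name; a match overrides the default."""
    decay = BASE_WEIGHT_DECAY
    for pattern, options in PARAMWISE_OPTIONS.items():
        if re.search(pattern, param_name):
            decay = options.get("weight_decay", decay)
    return decay


def step_lr(base_lr: float, epoch: int, steps, gamma: float = 0.1) -> float:
    """Step policy sketch: multiply by gamma once for every milestone already reached."""
    return base_lr * gamma ** sum(epoch >= s for s in steps)


for name in ["backbone.conv1.weight", "neck.fc.weight", "head.fc.bias"]:
    print(name, weight_decay_for(name))
for epoch in range(2):  # max_epochs=2, epochs counted from 0
    print(epoch, step_lr(0.2, epoch, steps=[1]))
```

With epochs counted from 0, the second and last epoch (epoch 1) runs at a learning rate of about 0.02.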


Then you can use the command below to train the implemented algorithm.

In [21]:
# Here is just a simple example
!python ./tools/train.py configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab.py

2022-03-21 08:39:57,427 - mmselfsup - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336c

## Example to start a downstream task


In [15]:
!pwd

/content/mmselfsup


### Prepare config file

Here we create a new config file for the demo dataset. MMSelfSup also provides many ready-to-use config files in the `configs/benchmarks` directory.

In [28]:
%%writefile configs/benchmarks/classification/imagenet/resnet50_8xb32-steplr-100e_in1k_colab.py
_base_ = [
    '../_base_/models/resnet50.py',
    '../_base_/datasets/imagenet.py',
    '../_base_/schedules/sgd_steplr-100e.py',
    '../_base_/default_runtime.py',
]

log_config = dict(interval=10)

# freeze all four ResNet stages so that only the classification head is trained (linear probing)
model = dict(backbone=dict(frozen_stages=4))

# dataset summary
# the demo dataset only contains a small part of the training set, so we also use it for validation
data = dict(
    val=dict(
        data_source=dict(
            data_prefix='data/imagenet/train',
            ann_file='data/imagenet/meta/train.txt',
        )))
evaluation = dict(interval=1, topk=(1, 5))

# optimizer (MoCo linear-evaluation setting)
optimizer = dict(type='SGD', lr=30., momentum=0.9, weight_decay=0.)

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=2)
# max_keep_ckpts sets the maximum number of checkpoint files kept in work_dirs:
# with a value of 3, when CheckpointHook (in mmcv) saves the 4th checkpoint,
# it removes the oldest one so that only 3 checkpoints are kept
checkpoint_config = dict(interval=1, max_keep_ckpts=3)

Overwriting configs/benchmarks/classification/imagenet/resnet50_8xb32-steplr-100e_in1k_colab.py
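
`evaluation = dict(interval=1, topk=(1, 5))` asks for an evaluation after every epoch, reporting top-1 and top-5 accuracy. For reference, here is a plain-Python sketch of that metric; it is independent of MMSelfSup's own implementation, and the scores are made up.

```python
def topk_accuracy(scores, labels, topk=(1, 5)):
    """Return, for each k, the percentage of samples whose label is among the k highest scores."""
    results = {}
    for k in topk:
        hits = 0
        for row, label in zip(scores, labels):
            # class indices sorted from highest to lowest score
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
            hits += label in ranked[:k]
        results[f"top-{k}"] = 100.0 * hits / len(labels)
    return results


# Made-up scores for 3 samples over 6 classes
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.12, 0.03],  # label 1 is ranked first
    [0.30, 0.10, 0.20, 0.25, 0.12, 0.03],  # label 3 is ranked second
    [0.40, 0.25, 0.15, 0.10, 0.07, 0.03],  # label 5 is ranked last
]
labels = [1, 3, 5]
print(topk_accuracy(scores, labels))
```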


### Extract backbone weights from the pre-trained model

In [26]:
!python tools/model_converters/extract_backbone_weights.py \
  work_dirs/selfsup/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab/epoch_2.pth \
  work_dirs/selfsup/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab/relative-loc_backbone-weights.pth
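
As its name suggests, the converter is meant to keep only the backbone's parameters, dropping the neck and head used during pre-training, so the weights can be loaded directly into a ResNet backbone. The sketch below shows that idea; it assumes the checkpoint stores its weights under a `state_dict` key with names like `backbone.conv1.weight` (the usual mmcv checkpoint layout) and is not the script's actual source. `extract_backbone` and `convert` are hypothetical names.

```python
from collections import OrderedDict

PREFIX = "backbone."


def extract_backbone(state_dict):
    """Keep only backbone parameters and strip the 'backbone.' prefix from their names."""
    return OrderedDict(
        (key[len(PREFIX):], value)
        for key, value in state_dict.items()
        if key.startswith(PREFIX)
    )


def convert(src: str, dst: str) -> None:
    """Load a full pre-training checkpoint and save a backbone-only one (needs PyTorch)."""
    import torch  # imported here so extract_backbone() can be used without PyTorch

    checkpoint = torch.load(src, map_location="cpu")
    torch.save({"state_dict": extract_backbone(checkpoint["state_dict"])}, dst)
```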

In [30]:
!python -u tools/train.py configs/benchmarks/classification/imagenet/resnet50_8xb32-steplr-100e_in1k_colab.py \
  --cfg-options model.backbone.init_cfg.type=Pretrained \
  model.backbone.init_cfg.checkpoint=work_dirs/selfsup/relative-loc_resnet50_8xb64-steplr-70e_in1k_colab/relative-loc_backbone-weights.pth

2022-03-21 08:56:50,167 - mmselfsup - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336c

**Note: As the demo dataset contains only one class, the model collapses, so the reported loss and accuracy should be ignored.**
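
Each `--cfg-options` argument in the training command above overrides one nested config key through a dotted path, so `model.backbone.init_cfg.type=Pretrained` ends up in `cfg['model']['backbone']['init_cfg']['type']`. Below is a rough sketch of that mapping with a hypothetical starting config. It parses values as Python literals as a stand-in; mmcv's own `DictAction` applies its own, slightly different parsing rules.

```python
import ast


def parse_value(text: str):
    """Interpret Python literals (numbers, lists, True/None, ...) and keep anything else as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_cfg_options(cfg: dict, options) -> dict:
    """Set cfg['a']['b']['c'] = value for every 'a.b.c=value' option, creating dicts as needed."""
    for option in options:
        dotted_key, _, raw_value = option.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return cfg


# Hypothetical starting point; only frozen_stages=4 comes from the config above
cfg = {"model": {"backbone": {"type": "ResNet", "depth": 50, "frozen_stages": 4}}}
apply_cfg_options(cfg, [
    "model.backbone.init_cfg.type=Pretrained",
    "model.backbone.init_cfg.checkpoint=work_dirs/selfsup/"
    "relative-loc_resnet50_8xb64-steplr-70e_in1k_colab/relative-loc_backbone-weights.pth",
])
print(cfg["model"]["backbone"]["init_cfg"])
```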

### Download a pre-trained model provided by MMSelfSup

In [14]:
# Download pre-trained model
!mkdir -p models
%cd models/
!wget https://download.openmmlab.com/mmselfsup/moco/mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth
%cd /content/mmselfsup/

--2022-03-21 08:22:09--  https://download.openmmlab.com/mmselfsup/moco/mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth
Resolving download.openmmlab.com (download.openmmlab.com)... 47.252.96.28
Connecting to download.openmmlab.com (download.openmmlab.com)|47.252.96.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94291945 (90M) [application/octet-stream]
Saving to: ‘mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth’


2022-03-21 08:22:23 (6.61 MB/s) - ‘mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth’ saved [94291945/94291945]

/content/mmselfsup
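
By OpenMMLab convention, the suffix `89e03af4` in the checkpoint name is the first 8 hex digits of the file's SHA-256 hash, which gives a cheap integrity check after downloading. The sketch below assumes that convention; `sha256_prefix_matches` is a hypothetical helper.

```python
import hashlib
import re
from pathlib import Path


def sha256_prefix_matches(path: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's SHA-256 with the 8-hex-digit prefix embedded in its name.

    Assumes the naming convention '<name>-<first 8 hex digits of SHA-256>.pth'.
    """
    match = re.search(r"-([0-9a-f]{8})\.pth$", Path(path).name)
    if match is None:
        raise ValueError(f"no hash suffix found in {path!r}")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large checkpoints are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))
```

For example, `sha256_prefix_matches('models/mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth')` should return `True` for an intact download.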


In [29]:
# start linear probing traing
!python -u tools/train.py configs/benchmarks/classification/imagenet/resnet50_8xb32-steplr-100e_in1k_colab.py \
  --cfg-options model.backbone.init_cfg.type=Pretrained \
  model.backbone.init_cfg.checkpoint=https://download.openmmlab.com/mmselfsup/moco/mocov2_resnet50_8xb32-coslr-200e_in1k_20220225-89e03af4.pth

2022-03-21 08:54:43,263 - mmselfsup - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336c

**Note: As the demo dataset contains only one class, the model collapses, so the reported loss and accuracy should be ignored.**