Bump version to v0.6.0 (#199)
* [Feature] Add MoCo v3 (#194)

* [Feature] add position embedding function

* [Feature] modify nonlinear neck for vit backbone

* [Feature] add mocov3 head

* [Feature] modify cls_head for vit backbone

* [Feature] add ViT backbone

* [Feature] add mocov3 algorithm

* [Docs] revise BYOL hook docstring

* [Feature] add mocov3 vit small config files

* [Feature] add mocov3 vit small linear eval config files

* [Fix] solve conflict

* [Fix] add mmcls

* [Fix] fix docstring format

* [Fix] fix isort

* [Fix] add mmcls to runtime requirements

* [Feature] remove duplicated codes

* [Feature] add mocov3 related unit test

* [Feature] revise position embedding function

* [Feature] add UT codes

* [Docs] add README.md

* [Docs] add model links and results to model zoo

* [Docs] fix model links

* [Docs] add metafile

* [Docs] modify install.md and add mmcls requirements

* [Docs] modify description

* [Fix] use specific arch name `mocov3-small` rather than general arch name `small`

* [Fix] add mmcls

* [Fix] fix arch name

* [Feature] change name to `MoCoV3`

* [Fix] fix unit test bug

* [Feature] change `BYOLHook` name to `MomentumUpdateHook`

* [Feature] change name to MoCoV3

* [Docs] modify description

Co-authored-by: fangyixiao18 <fangyx18@hotmail.com>
Co-authored-by: Yixiao Fang <36138628+fangyixiao18@users.noreply.github.com>

* [Docs] update model zoo results (#195)

* Bump version to v0.6.0 (#198)

* [Docs] update model zoo results

* Bump version to v0.6.0

Co-authored-by: fangyixiao18 <fangyx18@hotmail.com>
Co-authored-by: Yixiao Fang <36138628+fangyixiao18@users.noreply.github.com>
3 people committed Feb 2, 2022
1 parent 6a89f45 commit fc69e38
Showing 38 changed files with 979 additions and 118 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -66,7 +66,7 @@ This project is released under the [Apache 2.0 license](LICENSE).

## ChangeLog

-MMSelfSup **v0.5.0** was released with refactor in 16/12/2021.
+MMSelfSup **v0.6.0** was released in 02/02/2022.

Please refer to [changelog.md](docs/en/changelog.md) for details and release history.

@@ -91,6 +91,7 @@ Supported algorithms:
- [x] [SwAV (NeurIPS'2020)](https://arxiv.org/abs/2006.09882)
- [x] [DenseCL (CVPR'2021)](https://arxiv.org/abs/2011.09157)
- [x] [SimSiam (CVPR'2021)](https://arxiv.org/abs/2011.10566)
+- [x] [MoCo v3 (ICCV'2021)](https://arxiv.org/abs/2104.02057)

More algorithms are in our plan.

9 changes: 5 additions & 4 deletions README_zh-CN.md
@@ -28,7 +28,7 @@
[📘使用文档](https://mmselfsup.readthedocs.io/zh_CN/latest/) |
[🛠️安装教程](https://mmselfsup.readthedocs.io/zh_CN/latest/install.html) |
[👀模型库](https://github.com/open-mmlab/mmselfsup/blob/master/docs/zh_cn/model_zoo.md) |
-[🆕变更日志](https://mmselfsup.readthedocs.io/zh_CN/latest/changelog.html) |
+[🆕更新日志](https://mmselfsup.readthedocs.io/zh_CN/latest/changelog.html) |
[🤔报告问题](https://github.com/open-mmlab/mmselfsup/issues/new/choose)
</div>

@@ -62,11 +62,11 @@ MMSelfSup 是一个基于 PyTorch 实现的开源自监督表征学习工具箱

该项目采用 [Apache 2.0 开源许可证](LICENSE).

-## 修改日志
+## 更新日志

-MMSelfSup **v0.5.0** 在 16/12/2021 发版.
+最新的 **v0.6.0** 版本已经在 2022.02.02 发布。

-请参考 [变更日志](docs/zh_cn/changelog.md) 获取更多细节和历史版本信息。
+请参考 [更新日志](docs/zh_cn/changelog.md) 获取更多细节和历史版本信息。

MMSelfSup 和 OpenSelfSup 的不同点写在 [对比文档](docs/en/compatibility.md) 中。

@@ -90,6 +90,7 @@ MMSelfSup 和 OpenSelfSup 的不同点写在 [对比文档](docs/en/compatibilit
- [x] [SwAV (NeurIPS'2020)](https://arxiv.org/abs/2006.09882)
- [x] [DenseCL (CVPR'2021)](https://arxiv.org/abs/2011.09157)
- [x] [SimSiam (CVPR'2021)](https://arxiv.org/abs/2011.10566)
+- [x] [MoCo v3 (ICCV'2021)](https://arxiv.org/abs/2104.02057)

更多的算法实现已经在我们的计划中。

15 changes: 15 additions & 0 deletions configs/benchmarks/classification/_base_/models/vit-small-p16.py
@@ -0,0 +1,15 @@
# model settings
model = dict(
    type='Classification',
    backbone=dict(
        type='VisionTransformer',
        arch='mocov3-small',  # embed_dim = 384
        img_size=224,
        patch_size=16,
        stop_grad_conv1=True),
    head=dict(
        type='ClsHead',
        in_channels=384,
        num_classes=1000,
        vit_backbone=True,
    ))
@@ -0,0 +1,23 @@
_base_ = [
    '../_base_/models/vit-small-p16.py',
    '../_base_/datasets/imagenet.py',
    '../_base_/schedules/sgd_coslr-100e.py',
    '../_base_/default_runtime.py',
]
# MoCo v3 linear probing setting

model = dict(backbone=dict(frozen_stages=12, norm_eval=True))

# dataset summary
data = dict(imgs_per_gpu=128) # total 128*8=1024, 8 GPU linear cls

# optimizer
optimizer = dict(type='SGD', lr=12, momentum=0.9, weight_decay=0.)

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=90)

# the max_keep_ckpts controls the max number of ckpt file in your work_dirs
# if it is 3, when CheckpointHook (in mmcv) saves the 4th ckpt
# it will remove the oldest one to keep the number of total ckpts as 3
checkpoint_config = dict(interval=10, max_keep_ckpts=3)
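The retention rule described in the comment above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not mmcv's `CheckpointHook` itself: `saved_checkpoints` is a hypothetical helper, and the `epoch_N.pth` file naming is assumed for illustration.

```python
from collections import deque


def saved_checkpoints(max_epochs, interval, max_keep_ckpts):
    """Return the checkpoint names left on disk after training finishes."""
    kept = deque()
    for epoch in range(1, max_epochs + 1):
        if epoch % interval != 0:
            continue  # a checkpoint is written only every `interval` epochs
        kept.append(f'epoch_{epoch}.pth')
        if len(kept) > max_keep_ckpts:
            kept.popleft()  # drop the oldest checkpoint
    return list(kept)


# Values taken from the config above: 90 epochs, interval=10, keep 3.
print(saved_checkpoints(max_epochs=90, interval=10, max_keep_ckpts=3))
```

With these settings nine checkpoints are written over the run, and only the three most recent remain at the end.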
66 changes: 66 additions & 0 deletions configs/selfsup/_base_/datasets/imagenet_mocov3.py
@@ -0,0 +1,66 @@
# dataset settings
data_source = 'ImageNet'
dataset_type = 'MultiViewDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline1 = [
    dict(type='RandomResizedCrop', size=224, scale=(0.2, 1.)),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=1.),
    dict(type='Solarization', p=0.),
    dict(type='RandomHorizontalFlip'),
]
train_pipeline2 = [
    dict(type='RandomResizedCrop', size=224, scale=(0.2, 1.)),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.1),
    dict(type='Solarization', p=0.2),
    dict(type='RandomHorizontalFlip'),
]

# prefetch
prefetch = False
if not prefetch:
    train_pipeline1.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])
    train_pipeline2.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])

# dataset summary
data = dict(
    imgs_per_gpu=256,  # 256*16(gpu)=4096
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_source=dict(
            type=data_source,
            data_prefix='data/imagenet/train',
            ann_file='data/imagenet/meta/train.txt',
        ),
        num_views=[1, 1],
        pipelines=[train_pipeline1, train_pipeline2],
        prefetch=prefetch,
    ))
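The two pipelines above differ only in the probabilities of `GaussianBlur` and `Solarization`, and `num_views=[1, 1]` asks for one view from each. A rough sketch of that idea follows. It is not the `MultiViewDataset` implementation: `with_probability` and `make_views` are hypothetical helpers, and the transforms are stand-ins that only record their own name instead of touching an image.

```python
import random


def with_probability(name, p, rng):
    """Wrap a stand-in transform that records `name` with probability `p`."""
    def step(applied):
        return applied + (name,) if rng.random() < p else applied
    return step


def make_views(sample, pipelines, num_views):
    """Run `sample` through each pipeline `n` times, one view per run."""
    views = []
    for pipeline, n in zip(pipelines, num_views):
        for _ in range(n):
            out = sample
            for step in pipeline:
                out = step(out)
            views.append(out)
    return views


rng = random.Random(0)
# Only the two transforms whose probabilities differ between the pipelines.
pipeline1 = [with_probability('blur', 1.0, rng),
             with_probability('solarize', 0.0, rng)]
pipeline2 = [with_probability('blur', 0.1, rng),
             with_probability('solarize', 0.2, rng)]

view1, view2 = make_views((), [pipeline1, pipeline2], num_views=[1, 1])
print(view1)  # the first view is always blurred and never solarized
print(view2)  # the second view depends on the random draws
```

The asymmetry means the two views of one image are rarely augmented identically, which is the point of using two separate pipelines rather than one pipeline applied twice.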
36 changes: 36 additions & 0 deletions configs/selfsup/_base_/models/mocov3_vit-small-p16.py
@@ -0,0 +1,36 @@
# model settings
model = dict(
    type='MoCoV3',
    base_momentum=0.99,
    backbone=dict(
        type='VisionTransformer',
        arch='mocov3-small',  # embed_dim = 384
        img_size=224,
        patch_size=16,
        stop_grad_conv1=True),
    neck=dict(
        type='NonLinearNeck',
        in_channels=384,
        hid_channels=4096,
        out_channels=256,
        num_layers=3,
        with_bias=False,
        with_last_bn=True,
        with_last_bn_affine=False,
        with_last_bias=False,
        with_avg_pool=False,
        vit_backbone=True),
    head=dict(
        type='MoCoV3Head',
        predictor=dict(
            type='NonLinearNeck',
            in_channels=256,
            hid_channels=4096,
            out_channels=256,
            num_layers=2,
            with_bias=False,
            with_last_bn=True,
            with_last_bn_affine=False,
            with_last_bias=False,
            with_avg_pool=False),
        temperature=0.2))
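To show what `temperature=0.2` controls, here is a minimal pure-Python sketch of a contrastive (InfoNCE-style) loss with the `2 * temperature` scaling used in the pseudocode of the MoCo v3 paper. It is an illustration under assumptions, not the `MoCoV3Head` code added in this commit: `contrastive_loss` is a hypothetical function, and the predictor, the momentum encoder, the symmetrized two-way loss and cross-GPU gathering are all left out.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def contrastive_loss(queries, keys, temperature=0.2):
    """Cross-entropy over similarities; the i-th key is the positive."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature
                  for kj in k]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return 2 * temperature * total / len(q)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(contrastive_loss(identity, identity))  # small: positives are aligned
print(contrastive_loss(identity, identity[::-1]))  # larger: positives mismatch
```

A lower temperature sharpens the softmax, so mismatched pairs are penalized more heavily relative to matched ones.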
16 changes: 16 additions & 0 deletions configs/selfsup/_base_/schedules/adamw_coslr-300e_in1k.py
@@ -0,0 +1,16 @@
# optimizer
optimizer = dict(type='AdamW', lr=6e-4, weight_decay=0.1)
optimizer_config = dict()  # grad_clip, coalesce, bucket_size_mb

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    min_lr=0.,
    warmup='linear',
    warmup_iters=40,
    warmup_ratio=1e-4,  # cannot be 0
    warmup_by_epoch=True)

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=300)
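The schedule above combines cosine annealing with a 40-epoch linear warmup. The sketch below approximates the resulting learning rate at epoch granularity. It reflects one reading of how mmcv's learning-rate hooks combine the two (the warmup factor is applied on top of the annealed value) and should be treated as an assumption. `cosine_lr` and `lr_at` are hypothetical names, and the real hook anneals per iteration because `by_epoch=False`.

```python
import math


def cosine_lr(base_lr, min_lr, progress, total):
    """Cosine annealing from `base_lr` down to `min_lr`."""
    cos_term = 1 + math.cos(math.pi * progress / total)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


def lr_at(epoch, base_lr=6e-4, min_lr=0.0, max_epochs=300,
          warmup_epochs=40, warmup_ratio=1e-4):
    """Learning rate at a (possibly fractional) epoch."""
    regular = cosine_lr(base_lr, min_lr, epoch, max_epochs)
    if epoch >= warmup_epochs:
        return regular
    # Linear warmup: the factor grows from `warmup_ratio` to 1.
    k = (1 - epoch / warmup_epochs) * (1 - warmup_ratio)
    return regular * (1 - k)


for epoch in (0, 20, 40, 150, 300):
    print(epoch, lr_at(epoch))
```

Under these assumptions the rate starts at `base_lr * warmup_ratio`, rises during the first 40 epochs, and then decays to `min_lr` at epoch 300.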
42 changes: 42 additions & 0 deletions configs/selfsup/mocov3/README.md
@@ -0,0 +1,42 @@
# MoCo v3

> [An Empirical Study of Training Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.02057)
<!-- [ALGORITHM] -->

## Abstract

This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT). While the training recipes for standard convolutional networks have been highly mature and robust, the recipes for ViT are yet to be built, especially in the self-supervised scenarios where training becomes more challenging. In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT. We observe that instability is a major issue that degrades accuracy, and it can be hidden by apparently good results. We reveal that these results are indeed partial failure, and they can be improved when training is made more stable. We benchmark ViT results in MoCo v3 and several other self-supervised frameworks, with ablations in various aspects. We discuss the currently positive evidence as well as challenges and open questions. We hope that this work will provide useful data points and experience for future research.

<div align="center">
<img src="https://user-images.githubusercontent.com/36138628/151305362-e6e8ea35-b3b8-45f6-8819-634e67083218.png" width="500" />
</div>

## Results and Models

**Back to [model_zoo.md](../../../docs/en/model_zoo.md) to download models.**

On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise noted, all models were trained on the ImageNet-1k dataset.

### Classification

The classification benchmarks include four downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).

#### ImageNet Linear Evaluation

The **Linear Evaluation** result is obtained by training a linear head on top of the pre-trained backbone. Please refer to [vit-small-p16_8xb128-coslr-90e_in1k](../../benchmarks/classification/imagenet/vit-small-p16_8xb128-coslr-90e_in1k.py) for the config details.

| Self-Supervised Config | Linear Evaluation |
| ----------------------------------------------------------------------------------------------------------------- | ----------------- |
| [mocov3_vit-small-p16_32xb128-fp16-coslr-300e_in1k-224](mocov3_vit-small-p16_32xb128-fp16-coslr-300e_in1k-224.py) | 73.07 |

## Citation

```bibtex
@Article{chen2021mocov3,
author = {Xinlei Chen* and Saining Xie* and Kaiming He},
title = {An Empirical Study of Training Self-Supervised Vision Transformers},
journal = {arXiv preprint arXiv:2104.02057},
year = {2021},
}
```
28 changes: 28 additions & 0 deletions configs/selfsup/mocov3/metafile.yml
@@ -0,0 +1,28 @@
Collections:
  - Name: MoCoV3
    Metadata:
      Training Data: ImageNet-1k
      Training Techniques:
        - LARS
      Training Resources: 32x V100 GPUs
      Architecture:
        - ViT
        - MoCo
    Paper:
      URL: https://arxiv.org/abs/2104.02057
      Title: "An Empirical Study of Training Self-Supervised Vision Transformers"
    README: configs/selfsup/mocov3/README.md

Models:
  - Name: mocov3_vit-small-p16_32xb128-fp16-coslr-300e_in1k-224
    In Collection: MoCoV3
    Metadata:
      Epochs: 300
      Batch Size: 4096
    Results:
      - Task: Self-Supervised Image Classification
        Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 73.07
    Config: configs/selfsup/mocov3/mocov3_vit-small-p16_32xb128-fp16-coslr-300e_in1k-224.py
    Weights: https://download.openmmlab.com/mmselfsup/moco/mocov3_vit-small-p16_32xb128-fp16-coslr-300e_in1k-224_20220127-e9332db2.pth
@@ -0,0 +1,75 @@
_base_ = [
    '../_base_/models/mocov3_vit-small-p16.py',
    '../_base_/datasets/imagenet_mocov3.py',
    '../_base_/schedules/adamw_coslr-300e_in1k.py',
    '../_base_/default_runtime.py',
]

# dataset settings
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# the difference between ResNet50 and ViT pipeline is the `scale` in
# `RandomResizedCrop`, `scale=(0.08, 1.)` in ViT pipeline
train_pipeline1 = [
    dict(type='RandomResizedCrop', size=224, scale=(0.08, 1.)),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=1.),
    dict(type='Solarization', p=0.),
    dict(type='RandomHorizontalFlip'),
]
train_pipeline2 = [
    dict(type='RandomResizedCrop', size=224, scale=(0.08, 1.)),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.2,
                hue=0.1)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.1),
    dict(type='Solarization', p=0.2),
    dict(type='RandomHorizontalFlip'),
]

# prefetch
prefetch = False
if not prefetch:
    train_pipeline1.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])
    train_pipeline2.extend(
        [dict(type='ToTensor'),
         dict(type='Normalize', **img_norm_cfg)])

# dataset summary
data = dict(
    imgs_per_gpu=128, train=dict(pipelines=[train_pipeline1, train_pipeline2]))

# MoCo v3 uses the same momentum update method as BYOL
custom_hooks = [dict(type='MomentumUpdateHook')]

# optimizer
optimizer = dict(type='AdamW', lr=2.4e-3, weight_decay=0.1)

# fp16
fp16 = dict(loss_scale='dynamic')

# the max_keep_ckpts controls the max number of ckpt file in your work_dirs
# if it is 3, when CheckpointHook (in mmcv) saves the 4th ckpt
# it will remove the oldest one to keep the number of total ckpts as 3
checkpoint_config = dict(interval=10, max_keep_ckpts=3)
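The config above registers `MomentumUpdateHook` and notes that MoCo v3 reuses BYOL's momentum update. As a rough illustration of that kind of update, the sketch below pairs a cosine momentum schedule with an exponential moving average of the weights. It is not the hook's actual code: `momentum_at` and `ema_update` are hypothetical names, `end_momentum=1.0` is an assumption borrowed from BYOL, and plain lists of floats stand in for network parameters. Only `base_momentum=0.99` comes from the model config in this commit.

```python
import math


def momentum_at(cur_iter, max_iter, base_momentum=0.99, end_momentum=1.0):
    """Cosine schedule that moves the momentum from base to end."""
    cos_term = (math.cos(math.pi * cur_iter / max_iter) + 1) / 2
    return end_momentum - (end_momentum - base_momentum) * cos_term


def ema_update(target, online, momentum):
    """Blend the online weights into the momentum (target) weights."""
    return [momentum * t + (1 - momentum) * o for t, o in zip(target, online)]


target = [0.0, 0.0]
online = [1.0, 2.0]
m = momentum_at(cur_iter=0, max_iter=1000)
target = ema_update(target, online, m)
print(m, target)
```

As the momentum approaches the end value late in training, the target network changes more and more slowly relative to the online network.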
2 changes: 1 addition & 1 deletion configs/selfsup/npid/README.md
@@ -65,7 +65,7 @@ Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](../../benchmarks/mmdetect

| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
| --------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
-| [resnet50_8xb32-steplr-200e](npid_resnet50_8xb32-steplr-200e_in1k.py) | | | | | | |
+| [resnet50_8xb32-steplr-200e](npid_resnet50_8xb32-steplr-200e_in1k.py) | 38.5 | 57.7 | 42.0 | 34.6 | 54.8 | 37.1 |

### Segmentation
