# Tutorial 9: Classification with Transformers and MLPs


In this tutorial, you will learn:

- How to build a Transformer or MLP model
- How to implement a specific data augmentation
- How to use a Transformer or MLP model for classification


## Step 1: Install

The requirements for this tutorial are as follows:
```
Ubuntu
Nvidia RTX 2080

cuda=='11.0'
python=='3.7'
pytorch=='1.7.1'
torchvision=='0.8.2'
mmcv=='1.3.14'
mmcls=='0.15.0'
```


The following shows how to install MMClassification from scratch: we check out version v0.16.0, create a conda virtual environment, activate it, and install PyTorch, mmcv-full and mmcls.
```shell
# clone MMClassification and check out v0.16.0 on a new branch 'dev-tutorial'
git clone https://github.com/open-mmlab/mmclassification.git
cd mmclassification
git checkout -b dev-tutorial v0.16.0

# create an environment named 'comp3340' with python=3.8 and activate it
conda create -n comp3340 python=3.8
conda activate comp3340

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

pip install mmcv-full==1.3.8

# install mmcls in editable mode (run from the repository root)
pip install -e .
```

### Check Installation
```shell
python -c "import mmcls; print(mmcls.__version__)"

# the installation succeeded if this prints:
# 0.16.0
```

## Step 2: Data Preparation

We use the [17 Category Flower Dataset](https://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html) for this tutorial.


The dataset contains 17 flower categories with 80 images per class. The flowers are common species in the UK. The images have large variations in scale, pose and lighting, and some classes show large within-class variation as well as close similarity to other classes. The categories can be seen in the figure below. The dataset comes with 3 different random splits into training, validation and test sets, and a subset of the images has ground-truth segmentation labels.

![Image example](https://www.robots.ox.ac.uk/~vgg/data/flowers/17/categories.jpg)


### Download and Split Data

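Let `$DATA_ROOT` denote the root directory of the dataset, e.g., `/home/chenshoufa/share_data/comp3340`.
```bash
cd $DATA_ROOT
# download and extract (the images are extracted to jpg/)
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz
tar zxvf 17flowers.tgz

# rename the image folder
mv jpg flowers

# move it to data/flowers
mkdir data
mv flowers data/flowers

# split the images into train/ and val/ class folders
python split.py

# generate the meta (annotation) files
mkdir data/flowers/meta
python generate_meta.py

```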
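The `split.py` script used above is not shown in this tutorial. Below is a minimal sketch of what it needs to do, under stated assumptions: the extracted images are named `image_0001.jpg` ... `image_1360.jpg` and stored in class order (80 consecutive images per class), and 10 images per class go to validation, which is consistent with the 170 validation images reported in the training log below. The function `split_flowers` and its parameters are hypothetical.

```python
import os
import random
import shutil

NUM_CLASSES = 17
IMAGES_PER_CLASS = 80


def split_flowers(src_dir, dst_dir, val_per_class=10, seed=0):
    """Copy flat image_XXXX.jpg files into <dst>/train/class_i and <dst>/val/class_i.

    Assumes the images are sorted by class, 80 consecutive images per class.
    """
    # zero-padded names sort correctly; other files (e.g. files.txt) are skipped
    images = sorted(f for f in os.listdir(src_dir)
                    if f.startswith('image_') and f.endswith('.jpg'))
    if len(images) != NUM_CLASSES * IMAGES_PER_CLASS:
        raise ValueError(f'expected {NUM_CLASSES * IMAGES_PER_CLASS} images, '
                         f'found {len(images)} in {src_dir}')

    rng = random.Random(seed)  # fixed seed -> reproducible split
    for cls in range(NUM_CLASSES):
        chunk = images[cls * IMAGES_PER_CLASS:(cls + 1) * IMAGES_PER_CLASS]
        rng.shuffle(chunk)
        for subset, files in (('val', chunk[:val_per_class]),
                              ('train', chunk[val_per_class:])):
            out_dir = os.path.join(dst_dir, subset, f'class_{cls}')
            os.makedirs(out_dir, exist_ok=True)
            for name in files:
                shutil.copy(os.path.join(src_dir, name), out_dir)
```

Called with `src_dir` pointing at the folder that holds the extracted `image_*.jpg` files (for example `split_flowers('data/flowers', 'data/flowers')`), it would create the `train/` and `val/` folders shown below. The images are copied rather than moved, and the seeded shuffle makes the split reproducible.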

The resulting folder structure:
```
flowers
    |--train
        |--class_0
            |--image_xxxx.jpg
            |--image_xxxx.jpg
        |--class_1
            |--image_xxxx.jpg
    |--val
        |--class_0
            |--image_xxxx.jpg
            |--image_xxxx.jpg
        |--class_1
            |--image_xxxx.jpg
    |--meta
        |--train.txt
        |--val.txt
```
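The `generate_meta.py` script is not shown in this tutorial. The `ann_file` branch of the `Flowers` dataset class in the next section reads each line as `<path relative to data_prefix> <label>`, split on the last space. A hypothetical sketch (`write_meta` is an illustrative name) that writes `meta/train.txt` and `meta/val.txt` in that format from the `class_<i>` folders above:

```python
import os


def write_meta(root, subsets=('train', 'val')):
    """Write <root>/meta/<subset>.txt with one 'class_i/<image> i' line per image."""
    meta_dir = os.path.join(root, 'meta')
    os.makedirs(meta_dir, exist_ok=True)
    for subset in subsets:
        entries = []
        subset_dir = os.path.join(root, subset)
        for folder in os.listdir(subset_dir):
            # the label comes from the folder name, e.g. 'class_12' -> 12
            label = int(folder.split('_')[-1])
            for name in sorted(os.listdir(os.path.join(subset_dir, folder))):
                entries.append((label, f'{folder}/{name} {label}'))
        entries.sort()  # order by label, then by path
        with open(os.path.join(meta_dir, f'{subset}.txt'), 'w') as f:
            f.write('\n'.join(line for _, line in entries) + '\n')
```

Called as `write_meta('data/flowers')`, it would create the two files under `data/flowers/meta/` that the data config points to. The label is parsed from the folder name instead of being taken from the folders' sort order, because a lexicographic sort would place `class_10` before `class_2`.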


### Data Loader



In `mmclassification/mmcls/datasets`, we use `imagenet.py`, which defines the ImageNet dataset class, as a template for a new dataset file, `flowers.py`. Since our data directory follows the ImageNet layout above, we only need to copy `imagenet.py`, rename the class to `Flowers` and rewrite **CLASSES**:
```python
import numpy as np

from .base_dataset import BaseDataset
from .builder import DATASETS
from .imagenet import find_folders, get_samples


@DATASETS.register_module()
class Flowers(BaseDataset):
    """17 Category Flower Dataset, laid out like ImageNet (one folder per class)."""

    IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif')
    # CLASSES[i] is the name of label i
    CLASSES = [
        'daffodil', 'snowdrop', 'lilyValley', 'bluebell', 'crocus', 'iris',
        'tigerlily', 'tulip', 'fritillary', 'sunflower', 'daisy', 'colts foot',
        'dandelion', 'cowslip', 'buttercup', 'wind flower', 'pansy'
    ]

    def load_annotations(self):
        if self.ann_file is None:
            # no annotation file: infer labels from the sub-folder names
            folder_to_idx = find_folders(self.data_prefix)
            samples = get_samples(
                self.data_prefix,
                folder_to_idx,
                extensions=self.IMG_EXTENSIONS)
            if len(samples) == 0:
                raise RuntimeError('Found 0 files in subfolders of: '
                                   f'{self.data_prefix}. '
                                   'Supported extensions are: '
                                   f'{",".join(self.IMG_EXTENSIONS)}')

            self.folder_to_idx = folder_to_idx
        elif isinstance(self.ann_file, str):
            # annotation file: one "<relative path> <label>" pair per line
            with open(self.ann_file) as f:
                samples = [x.strip().rsplit(' ', 1) for x in f.readlines()]
        else:
            raise TypeError('ann_file must be a str or None')
        self.samples = samples

        data_infos = []
        for filename, gt_label in self.samples:
            info = {'img_prefix': self.data_prefix}
            info['img_info'] = {'filename': filename}
            info['gt_label'] = np.array(gt_label, dtype=np.int64)
            data_infos.append(info)
        return data_infos
```
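To see what `load_annotations` produces from an annotation file, here is a standalone sketch of its `ann_file` branch that runs without mmcls: each line is split on its last space into a filename and a label, and each sample becomes a dict with `img_prefix`, `img_info` and an `int64` `gt_label`. The helper name `parse_ann_lines` is hypothetical.

```python
import numpy as np


def parse_ann_lines(lines, data_prefix):
    """Mimic the ann_file branch of Flowers.load_annotations."""
    samples = [x.strip().rsplit(' ', 1) for x in lines]  # split on the LAST space
    data_infos = []
    for filename, gt_label in samples:
        data_infos.append({
            'img_prefix': data_prefix,
            'img_info': {'filename': filename},
            'gt_label': np.array(int(gt_label), dtype=np.int64),
        })
    return data_infos


infos = parse_ann_lines(['class_11/image_0881.jpg 11\n'], 'data/flowers/train')
print(infos[0]['img_info']['filename'], int(infos[0]['gt_label']))
# prints: class_11/image_0881.jpg 11
```

Splitting on the last space means a filename may itself contain spaces; only the final token is treated as the label.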

We also need to import the `Flowers` class in `mmcls/datasets/__init__.py` and add it to `__all__`:

```diff
 from .voc import VOC
+from .flowers import Flowers
 
 __all__ = [
     'BaseDataset', 'ImageNet', 'CIFAR10', 'CIFAR100', 'MNIST', 'FashionMNIST',
     'VOC', 'MultiLabelDataset', 'build_dataloader', 'build_dataset', 'Compose',
     'DistributedSampler', 'ConcatDataset', 'RepeatDataset',
-    'ClassBalancedDataset', 'DATASETS', 'PIPELINES'
+    'ClassBalancedDataset', 'DATASETS', 'PIPELINES', 'Flowers',
 ]

```


## Configuration


The following base config files are placed under `mmclassification/configs/_base_`.

### Set data config
a. In `_base_/datasets`, we add `flowers_bs32.py` to describe the basic data config.
    Note that it must specify the image folders (`data_prefix`) and the meta (annotation) file paths (`ann_file`), as follows:
```python
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        data_prefix='data/flowers/train',
        ann_file='data/flowers/meta/train.txt',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_prefix='data/flowers/val',
        ann_file='data/flowers/meta/val.txt',
        pipeline=test_pipeline),
    test=dict(
        # no separate test split is prepared, so testing uses the val set
        type=dataset_type,
        data_prefix='data/flowers/val',
        ann_file='data/flowers/meta/val.txt',
        pipeline=test_pipeline))
# evaluate accuracy on the val set after every epoch
evaluation = dict(interval=1, metric='accuracy')
```
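The excerpt above references `dataset_type`, `train_pipeline` and `test_pipeline` without defining them; they belong at the top of the same file. A sketch of those definitions, adapted from mmcls's `configs/_base_/datasets/imagenet_bs32.py` under the assumption that the ImageNet-style preprocessing (ImageNet mean/std normalization, 224×224 crops) is adequate for the flower images:

```python
# dataset settings (sketch adapted from mmcls's imagenet_bs32.py)
dataset_type = 'Flowers'  # the name registered by @DATASETS.register_module()
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(256, -1)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]
```

Both pipelines produce 224×224 inputs, which matches `img_size=224` in the ViT model config.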

### Building Vision Transformer
b. In `_base_/models`, we add `vit_base_flowers.py` to describe the basic model config:

```python
model = dict(
    type='ImageClassifier',
    backbone=dict(
        return_tuple=False,
        type='VisionTransformer',
        num_layers=12,
        embed_dim=768,
        num_heads=12,
        img_size=224,
        patch_size=16,
        in_channels=3,
        feedforward_channels=3072,
        drop_rate=0.1,
        attn_drop_rate=0.),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=17,  # 17 flower categories
        in_channels=768,
        hidden_dim=3072,
        loss=dict(type='LabelSmoothLoss', label_smooth_val=0.1),
        topk=(1, 5),
    ),
    train_cfg=dict(
        # mixup on every training batch; num_classes must match the head
        augments=dict(type='BatchMixup', alpha=0.2, num_classes=17,
                      prob=1.))
)
```
Note that we use the ViT-Base model and set `num_classes` to 17, in both the head and `BatchMixup`, because the dataset has 17 flower categories.
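To get a feel for the size of this backbone, the following sketch computes the token sequence length and parameter count implied by the config values (`img_size=224`, `patch_size=16`, `embed_dim=768`, `num_layers=12`, `feedforward_channels=3072`). It assumes a standard ViT layout (a bias in every linear layer, a fused q/k/v projection, a learnable class token and position embedding, and a final LayerNorm) and leaves out the classification head; mmcls's implementation may differ in such details, and `vit_sizes` is a hypothetical helper.

```python
def vit_sizes(img_size=224, patch_size=16, in_channels=3, embed_dim=768,
              num_layers=12, feedforward_channels=3072):
    """Patch count, sequence length and backbone parameters of a standard ViT."""
    num_patches = (img_size // patch_size) ** 2
    seq_len = num_patches + 1  # +1 for the class token

    def linear(n_in, n_out):  # weight matrix + bias
        return n_in * n_out + n_out

    # patch embedding: a conv whose kernel and stride equal the patch size
    patch_embed = linear(in_channels * patch_size ** 2, embed_dim)
    per_layer = (
        linear(embed_dim, 3 * embed_dim)              # fused q, k, v projection
        + linear(embed_dim, embed_dim)                # attention output projection
        + linear(embed_dim, feedforward_channels)     # FFN expansion
        + linear(feedforward_channels, embed_dim)     # FFN projection back
        + 2 * 2 * embed_dim                           # two LayerNorms (weight + bias)
    )
    # position embedding for every token, class token, final LayerNorm
    extras = seq_len * embed_dim + embed_dim + 2 * embed_dim
    params = patch_embed + num_layers * per_layer + extras
    return num_patches, seq_len, params


print(vit_sizes())  # (196, 197, 85798656)
```

The result, about 85.8M parameters over 197 tokens, is in line with the roughly 86M usually quoted for ViT-B/16. Changing `patch_size` mainly changes the sequence length: `patch_size=32` gives 49 patches plus the class token, i.e. 50 tokens.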

### Set training config
c. In `_base_/schedules`, we add `flowers_bs32.py` to configure the optimizer and the learning-rate schedule:
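```python
# no weight decay for normalization layers and biases
paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    # these keys match Swin Transformer parameter names; ViT has no
    # parameters with these names, so they have no effect here
    custom_keys={
        '.absolute_pos_embed': dict(decay_mult=0.0),
        '.relative_position_bias_table': dict(decay_mult=0.0)
    })

# linear scaling rule: lr = 5e-4 * total_batch_size / 512
# ImageNet recipe: 128 images per GPU x 8 GPUs -> lr = 5e-4 * 1024 / 512 = 0.001
# here the lr is scaled for a total batch size of 16; adjust it to
# samples_per_gpu x number of GPUs
optimizer = dict(
    type='AdamW',
    lr=5e-4 * 16 / 512,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy: linear warmup, then per-iteration cosine annealing
lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    min_lr_ratio=1e-2,
    warmup='linear',
    warmup_ratio=1e-3,
    # 1252 is the number of iterations per epoch on ImageNet (batch size 1024);
    # for this dataset, use 20 * <your iterations per epoch> instead
    warmup_iters=20 * 1252,
    warmup_by_epoch=False)

runner = dict(type='EpochBasedRunner', max_epochs=300)

```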
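Two values in this schedule depend on the dataset: the learning rate, scaled linearly with the total batch size, and the warmup length, given in iterations. A small sketch checks both, assuming 1,190 training images (1,360 images minus the 170 validation images reported in the log below) and a total batch size of 16, the batch size the `lr` above is scaled for, which also matches the 75 iterations per epoch in that log. The helper names are hypothetical.

```python
import math

IMAGENET_ITERS_PER_EPOCH = 1252  # factor used in warmup_iters=20 * 1252


def scaled_lr(total_batch_size, base_lr=5e-4, base_batch_size=512):
    """Linear scaling rule from the config comment: lr grows with batch size."""
    return base_lr * total_batch_size / base_batch_size


def iters_per_epoch(num_images, total_batch_size):
    """Iterations per epoch when the last, incomplete batch is kept."""
    return math.ceil(num_images / total_batch_size)


# Assumptions: 1,190 training images and a total batch size of 16.
flower_iters = iters_per_epoch(1190, 16)
warmup_iters = 20 * IMAGENET_ITERS_PER_EPOCH
total_iters = 300 * flower_iters

print(f'lr (ImageNet, 8 x 128): {scaled_lr(8 * 128):.4g}')
print(f'lr (flowers, 16):       {scaled_lr(16):.4g}')
print(f'iters/epoch: {flower_iters}, warmup: {warmup_iters}, total: {total_iters}')
# last line prints: iters/epoch: 75, warmup: 25040, total: 22500
```

Under these assumptions, the 25,040 warmup iterations are longer than the 22,500 iterations of the whole run, so the learning rate would never leave linear warmup. Setting `warmup_iters` to 20 times your own iterations per epoch, or `warmup_iters=20` together with `warmup_by_epoch=True`, keeps the intended 20-epoch warmup.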

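### AutoAugment

The training data can be augmented with AutoAugment, which is configured by a list of sub-policies. Each sub-policy is a list of image operations, each with its own parameters and application probability `prob` (only part of the list is shown):
```python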
policies = [
    [
        dict(type='Posterize', bits=4, prob=0.4),
        dict(type='Rotate', angle=30., prob=0.6)
    ],
    [
        dict(type='Solarize', thr=256 / 9 * 4, prob=0.6),
        dict(type='AutoContrast', prob=0.5)
    ],
    # ...
    [
        dict(type='ColorTransform', magnitude=0.4, prob=0.6),
        dict(type='Contrast', magnitude=0.8, prob=1.)
    ],
    [dict(type='Equalize', prob=0.8),
     dict(type='Equalize', prob=0.6)],
]
```
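To make the semantics of `policies` concrete, here is a minimal pure-Python sketch of what an AutoAugment transform does with it: for each image it picks one sub-policy uniformly at random, then applies each operation in that sub-policy with that operation's own probability `prob`. To our understanding this mirrors the behavior of mmcls's `AutoAugment` pipeline step, but the function `apply_auto_augment` and the `ops` mapping are hypothetical stand-ins for the real image operations.

```python
import random


def apply_auto_augment(img, policies, ops, rng=random):
    """Apply one randomly chosen AutoAugment sub-policy to `img`.

    `policies` is a list of sub-policies; each sub-policy is a list of dicts
    with a 'type', an optional 'prob' and operation-specific parameters.
    `ops` maps each 'type' to a callable op(img, **params) -> img.
    """
    sub_policy = rng.choice(policies)  # one sub-policy per image
    for op in sub_policy:
        params = {k: v for k, v in op.items() if k not in ('type', 'prob')}
        # each operation fires independently; 1.0 is this sketch's default prob
        if rng.random() < op.get('prob', 1.0):
            img = ops[op['type']](img, **params)
    return img


# Toy stand-ins: the "image" is a list that records which operations ran.
toy_ops = {
    name: (lambda img, name=name, **params: img + [name])
    for name in ('Posterize', 'Rotate', 'Solarize', 'AutoContrast', 'Equalize')
}
toy_policies = [
    [dict(type='Posterize', bits=4, prob=0.4),
     dict(type='Rotate', angle=30., prob=0.6)],
    [dict(type='Equalize', prob=0.8),
     dict(type='Equalize', prob=0.6)],
]
print(apply_auto_augment([], toy_policies, toy_ops, random.Random(0)))
```

In an mmcls config, the list is typically passed to the training pipeline as `dict(type='AutoAugment', policies=policies)`, placed after the random crop and flip and before `Normalize`.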

### Set saving config
d. In `_base_/default_runtime.py`, we configure checkpoint saving and logging:
```python
# save a checkpoint every epoch
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
```

Then we create a new config file, `mmclassification/configs/vision_transformer/vit_base_patch16_224_flowers.py`, which inherits the base config files above through `_base_`:
```python
_base_ = [
    '../_base_/models/vit_base_flowers.py',
    '../_base_/datasets/flowers_bs32.py',
    '../_base_/schedules/flowers_bs32.py',
    '../_base_/default_runtime.py'
]
```
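When this file is loaded, mmcv combines the four `_base_` files into one config, and keys set in the file itself override the inherited ones, with nested dicts merged key by key; mmcv also supports `_delete_=True` to replace an inherited dict instead of merging into it. The sketch below is a simplified model of that merge, not mmcv's actual code (`merge_config` is hypothetical). It shows why a child config can, for example, change only the batch size without restating the whole `data` dict.

```python
def merge_config(base, child):
    """Return `base` updated with `child`, merging nested dicts recursively.

    Simplified model of `_base_` inheritance: values from `child` win, nested
    dicts are merged key by key, other values (lists included) are replaced,
    and a dict containing `_delete_=True` replaces the inherited dict entirely.
    """
    merged = dict(base)  # shallow copy, so `base` itself is not modified
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            value = dict(value)  # copy before popping `_delete_`
            if value.pop('_delete_', False):
                merged[key] = value
            else:
                merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {'data': {'samples_per_gpu': 32, 'workers_per_gpu': 1}}
child = {'data': {'samples_per_gpu': 16}}  # e.g. set only the batch size
print(merge_config(base, child))
# {'data': {'samples_per_gpu': 16, 'workers_per_gpu': 1}}
```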

## Train

From the `mmclassification` root directory, run:
```shell
python tools/train.py configs/vision_transformer/vit_base_patch16_224_flowers.py
```



Example training log (epoch 2):

```
2021-10-11 17:16:01,923 - mmcls - INFO - Epoch [2][10/75]       lr: 6.799e-08, eta: 1:06:39, time: 0.355, data_time: 0.236, memory: 4064, loss: 2.8331, grad_norm: 7.4339
2021-10-11 17:16:03,150 - mmcls - INFO - Epoch [2][20/75]       lr: 7.422e-08, eta: 1:04:26, time: 0.123, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.2677
2021-10-11 17:16:04,365 - mmcls - INFO - Epoch [2][30/75]       lr: 8.045e-08, eta: 1:02:35, time: 0.121, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.4220
2021-10-11 17:16:05,570 - mmcls - INFO - Epoch [2][40/75]       lr: 8.668e-08, eta: 1:01:02, time: 0.121, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.1345 
2021-10-11 17:16:06,838 - mmcls - INFO - Epoch [2][50/75]       lr: 9.292e-08, eta: 0:59:54, time: 0.127, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.3180
2021-10-11 17:16:08,087 - mmcls - INFO - Epoch [2][60/75]       lr: 9.915e-08, eta: 0:58:53, time: 0.125, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.2354
2021-10-11 17:16:09,308 - mmcls - INFO - Epoch [2][70/75]       lr: 1.054e-07, eta: 0:57:56, time: 0.122, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.3601
2021-10-11 17:16:09,999 - mmcls - INFO - Saving checkpoint at 2 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 170/170, 181.1 task/s, elapsed: 1s, ETA:     0s
2021-10-11 17:16:12,832 - mmcls - INFO - Epoch(val) [2][11] accuracy_top-1: 20.5882, accuracy_top-5: 64.7059

```
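The training loss in this log sits at about 2.833. That is ln 17, the cross-entropy of a uniform prediction over the 17 classes, and it is the same for any target distribution that sums to 1 (one-hot, label-smoothed, or blended by `BatchMixup`). So the log suggests the model is still predicting near-uniform probabilities at this point, which is consistent with the very small warmup learning rates (about 7e-8 to 1e-7) it reports. A quick numerical check, with the targets below chosen for illustration:

```python
import math

K = 17  # number of flower classes


def cross_entropy(target, pred):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, pred))


uniform_pred = [1 / K] * K                     # a model that has learned nothing
one_hot = [1.0] + [0.0] * (K - 1)              # hard label for class 0
other = [0.0] * 5 + [1.0] + [0.0] * (K - 6)    # hard label for class 5
eps = 0.1                                      # label_smooth_val in the config
smoothed = [(1 - eps) * t + eps / K for t in one_hot]
mixed = [0.7 * a + 0.3 * b for a, b in zip(one_hot, other)]  # mixup-style blend

for target in (one_hot, smoothed, mixed):
    print(round(cross_entropy(target, uniform_pred), 4))  # 2.8332 each time
print(round(math.log(K), 4))                              # 2.8332
```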

## Evaluation

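To evaluate a trained checkpoint, run the following from the `mmclassification` root. By default, checkpoints are saved under `work_dirs/<config name>/` (e.g., `work_dirs/vit_base_patch16_224_flowers/latest.pth`).

```bash
python tools/test.py configs/vision_transformer/vit_base_patch16_224_flowers.py path/to/checkpoint.pth --metrics accuracy
```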
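The `accuracy_top-1` and `accuracy_top-5` values in the validation log are top-k accuracies in percent (the head config sets `topk=(1, 5)`): a sample counts as correct for top-k if its true label is among the k highest-scoring classes. A plain-Python sketch of the metric (the helper `topk_accuracy` is hypothetical; mmcls computes it on tensors):

```python
def topk_accuracy(scores, labels, topk=(1, 5)):
    """Top-k accuracy in percent.

    scores: one list of class scores per sample; labels: true class indices.
    """
    results = {}
    for k in topk:
        hits = 0
        for row, label in zip(scores, labels):
            # indices of the k highest-scoring classes
            top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
            hits += label in top
        results[f'accuracy_top-{k}'] = 100.0 * hits / len(labels)
    return results


scores = [[0.1, 0.7, 0.2],   # predicts class 1
          [0.5, 0.3, 0.2],   # predicts class 0, class 1 second
          [0.2, 0.3, 0.5]]   # predicts class 2, class 1 second
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, topk=(1, 2)))
```

For the epoch-2 log line, 20.5882% and 64.7059% of the 170 validation images correspond to 35 and 110 correctly classified images, respectively.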

## Group Project Description

* In this project, you need to implement at least one MLP-based model (e.g., MLP-Mixer [1], ResMLP, gMLP) in the MMClassification codebase [2].
* You need to train your model on the 17 Category Flower Dataset, which contains 17 flower categories with 80 images per class. After training, you need to test your model on the validation set.
* Bonus: implement more than one MLP-based model and discuss their similarities and differences.

## References

[1] Tolstikhin, Ilya, et al. "MLP-Mixer: An all-MLP Architecture for Vision." NeurIPS, 2021.

[2] MMClassification Contributors. "OpenMMLab's Image Classification Toolbox and Benchmark." https://github.com/open-mmlab/mmclassification, 2020.
