[Doc] Add Tutorial of KD,Pruning,NAS and Installation. (#255)
* [doc] add doc of installation, kd, nas, pruning

* [doc] add user guide part

* fix name of mmcv-full to mmcv

* [doc] fix mmcv based on zaida 's suggestion

* [doc] fix index error based on shiguang's comments
pprp committed Aug 30, 2022
1 parent ce22497 commit c650b3e
Showing 7 changed files with 795 additions and 0 deletions.
110 changes: 110 additions & 0 deletions docs/en/get_started/installation.md
@@ -1 +1,111 @@
# Installation

## Prepare Environment

Create a conda virtual environment and activate it.

```Python
conda create -n openmmlab python=3.7 -y
conda activate openmmlab
```

Install PyTorch and torchvision following the [official instructions](https://pytorch.org/).

Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
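
If you are unsure, a quick check like the following sketch prints the CUDA version your installed PyTorch was built with, which you can compare against the toolkit under `/usr/local/cuda`:

```Python
# Minimal sanity check: report the PyTorch build and its compile-time CUDA version.
import torch

print(torch.__version__)          # e.g. 1.10.0
print(torch.version.cuda)         # CUDA version PyTorch was compiled with, e.g. 10.2
print(torch.cuda.is_available())  # True if the driver/runtime setup is working
```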

`E.g.1` If you have CUDA 10.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.10, you need to install the prebuilt PyTorch with CUDA 10.2.

```Python
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
```

`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.5.1, you need to install the prebuilt PyTorch with CUDA 9.2.

```Python
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
```

- If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.

## Customize Installation

It is recommended to install MMRazor with [MIM](https://github.com/open-mmlab/mim), which automatically handles the dependencies of OpenMMLab projects, including mmcv and other python packages.

```Python
pip install openmim
mim install git+https://github.com/open-mmlab/mmrazor.git@1.0.0rc0
```

Or you can still install MMRazor manually:

1. Install mmcv.

```Python
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```

Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest `mmcv` with `CUDA 10.2` and `PyTorch 1.10.0`, use the following command:

```Python
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
```

See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible with different PyTorch and CUDA versions.

Optionally, you can compile mmcv from source.

```
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
# install mmcv-lite (operators are not compiled)
MMCV_WITH_OPS=0 pip install -e . -v
# install mmcv (originally called mmcv-full) with compiled operators
MMCV_WITH_OPS=1 pip install -e . -v
# the plain install also compiles operators
pip install -e . -v
```
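
Whichever route you take, a quick check like the following sketch confirms that a compatible mmcv is importable:

```Python
# Sanity check: mmcv should import cleanly and report a 2.x (>= 2.0.0rc1) version.
import mmcv

print(mmcv.__version__)
```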

2. Install MMEngine.

Install MMEngine from source.

```Python
git clone https://github.com/open-mmlab/mmengine.git
cd mmengine
pip install -v -e .
```

3. Install MMRazor.

If you would like to install MMRazor in `dev` mode, run the following:

```Python
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
git fetch origin
git checkout -b 1.0.0rc0 origin/1.0.0rc0
# The new version is released in branch 1.0.0rc0
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```

**Note:**

- When MMRazor is installed in `dev` mode, any local modifications made to the code will take effect without the need to reinstall it.

## A from-scratch Setup Script

```Python
conda create -n openmmlab python=3.7 -y
conda activate openmmlab

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
# install the latest mmcv
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
# install mmrazor
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
git fetch origin
git checkout -b 1.0.0rc0 origin/1.0.0rc0
pip install -v -e .
```
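
Finally, a quick import check (a minimal sketch) can confirm that the installation succeeded:

```Python
# Verify the installation: MMRazor should import and report its version.
import mmrazor

print(mmrazor.__version__)  # e.g. 1.0.0rc0
```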
106 changes: 106 additions & 0 deletions docs/en/user_guides/2_train_different_types_algorithms.md
@@ -1 +1,107 @@
# Train different types of algorithms

**Before running our algorithms, you may need to prepare the datasets according to the instructions in the corresponding document.**

**Note**:

- With the help of MMEngine, MMRazor provides unified entry interfaces for various tasks, so in theory our algorithms can adapt to all OpenMMLab upstream repos.

- We dynamically pass arguments via `--cfg-options` (e.g., `mutable_cfg` in NAS algorithms or `channel_cfg` in pruning algorithms) to **avoid the need for a separate config for each subnet or checkpoint**. If you want to specify different subnets for retraining or testing, you just need to change this argument (a conceptual sketch follows this list).
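
As a conceptual sketch (the key names below mirror the examples later in this document; the override itself is performed by the config system rather than by hand):

```Python
# Conceptual sketch of what a dotted --cfg-options override amounts to.
cfg = dict(model=dict(_channel_cfg_paths=None))

# `--cfg-options model._channel_cfg_paths=path/to/channel_cfg.yaml` roughly means:
cfg['model']['_channel_cfg_paths'] = 'path/to/channel_cfg.yaml'
print(cfg)
```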

### NAS

Here we take SPOS (Single Path One Shot) as an example. There are three steps to run neural architecture search (NAS): **supernet pre-training**, **search for a subnet on the trained supernet** and **subnet retraining**.

#### Supernet Pre-training

```Python
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

The usage of optional arguments is the same as in the corresponding upstream tasks, such as mmclassification, mmdetection and mmsegmentation.

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py \
--work-dir $WORK_DIR
```

#### Search for Subnet on The Trained Supernet

```Python
python tools/train.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
```

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py \
$STEP1_CKPT \
--work-dir $WORK_DIR
```

#### Subnet Retraining

```Python
python tools/train.py ${CONFIG_FILE} \
--cfg-options algorithm.fix_subnet=${MUTABLE_CFG_PATH} [optional arguments]
```

- `MUTABLE_CFG_PATH`: Path of `fix_subnet`. `fix_subnet` represents **config for mutable of the subnet searched out**, used to specify different subnets for retraining. An example for `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml), and the usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/README.md#subnet-retraining-on-imagenet).

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py \
--work-dir $WORK_DIR \
--cfg-options algorithm.fix_subnet=$YAML_FILE_BY_STEP2
```

We note that instead of using `--cfg-options`, you can also directly modify `configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py` like this:

```Python
fix_subnet = 'configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml'
model = dict(fix_subnet=fix_subnet)
```

### Pruning

Pruning has three steps, including **supernet pre-training**, **search for subnet on the trained supernet** and **subnet retraining**. The commands of the first two steps are similar to NAS, except that we need to use `CONFIG_FILE` of Pruning here. The commands of the **subnet retraining** are as follows.

#### Subnet Retraining

```Python
python tools/train.py ${CONFIG_FILE} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
```

Different from NAS, the argument that needs to be specified here is `_channel_cfg_paths`.

- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` represents **the config for channels of the subnet searched out**, used to specify different subnets for retraining.

For example, the `_channel_cfg_paths` is already set by default in the config used below.

```Python
python ./tools/train.py \
configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py \
--work-dir your_work_dir
```

### Distillation

There is only one step to start knowledge distillation.

```Python
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

For example,

```Python
python ./tools/train.py \
configs/distill/mmcls/kd/kd_logits_r34_r18_8xb32_in1k.py \
--work-dir your_work_dir
```
92 changes: 92 additions & 0 deletions docs/en/user_guides/3_train_with_different_devices.md
@@ -1 +1,93 @@
# Train with different devices

**Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus` (see the sketch below). We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
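
As a worked example of the linear scaling rule (the numbers are purely illustrative):

```Python
# Linear scaling rule: configs assume 8 GPUs by default.
old_ngpus, old_lr = 8, 0.5   # illustrative values from a config written for 8 GPUs
new_ngpus = 4                # the number of GPUs you actually train with
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                # 0.25
```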

### Training with CPU

```Python
export CUDA_VISIBLE_DEVICES=-1
python tools/train.py ${CONFIG_FILE}
```

**Note**: We do not recommend using the CPU for training because it is too slow and some algorithms use `SyncBN`, which relies on distributed training. We support this feature to let users debug on machines without GPUs.

### Train with single/multiple GPUs

```Python
sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --work-dir ${YOUR_WORK_DIR} [optional arguments]
```

**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:

```Python
ln -s ${YOUR_WORK_DIRS} ${MMRAZOR}/work_dirs
```

Alternatively, if you run MMRazor on a cluster managed with [slurm](https://slurm.schedmd.com/):

```Python
GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} sh tools/xxx/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${YOUR_WORK_DIR} [optional arguments]
```

### Train with multiple machines

If you launch training on multiple machines connected with simple ethernet, you can run the following commands:

On the first machine:

```Python
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```

On the second machine:

```Python
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```

Training is usually slow if you do not have high-speed networking such as InfiniBand.

If you launch with slurm, the command is the same as on a single machine as described above, but you need to refer to [slurm_train.sh](https://github.com/open-mmlab/mmselfsup/blob/master/tools/slurm_train.sh) to set appropriate parameters and environment variables.

### Launch multiple jobs on a single machine

If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.

If you use `dist_train.sh` to launch training jobs:

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_1
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_2
```

If you launch training jobs with slurm, you have two options to set different communication ports:

Option 1:

In `config1.py`:

```Python
dist_params = dict(backend='nccl', port=29500)
```

In `config2.py`:

```Python
dist_params = dict(backend='nccl', port=29501)
```

Then you can launch two jobs with config1.py and config2.py.

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2
```

Option 2:

You can set different communication ports without modifying the configuration files, but you have to pass `--cfg-options` to overwrite the default port in the configuration file.

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1 --cfg-options dist_params.port=29500
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2 --cfg-options dist_params.port=29501
```
78 changes: 78 additions & 0 deletions docs/en/user_guides/4_test_a_model.md
@@ -1 +1,79 @@
# Test a model

### NAS

To test a NAS method, you can use the following command.

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options algorithm.fix_subnet=${FIX_SUBNET_PATH} [optional arguments]
```

- `FIX_SUBNET_PATH`: Path of `fix_subnet`. `fix_subnet` represents **config for mutable of the subnet searched out**, used to specify different subnets for testing. An example for `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml).

The usage of optional arguments is the same as in the corresponding upstream tasks, such as mmclassification, mmdetection and mmsegmentation.

For example,

```Python
python tools/test.py \
configs/nas/mmcls/spos/spos_subnet_shufflenetv2_8xb128_in1k.py \
your_subnet_checkpoint_path \
--cfg-options algorithm.fix_subnet=configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml
```
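
Before running the test, you can also sanity-check the `fix_subnet` file itself; a minimal sketch (assuming the YAML loads into a plain mapping of mutable names to chosen candidates):

```Python
# Sketch: load and inspect a searched-out subnet (fix_subnet) before testing.
from mmengine.fileio import load

subnet = load('configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml')
print(len(subnet))       # number of searchable units recorded in the subnet
print(list(subnet)[:3])  # peek at the first few mutable names
```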

### Pruning

#### Split Checkpoint (Optional)

If you train a slimmable model during retraining, the checkpoints of different subnets are actually fused into only one checkpoint. You can split this checkpoint into multiple independent checkpoints by using the following command:

```Python
python tools/model_converters/split_checkpoint.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --channel-cfgs ${CHANNEL_CFG_PATH} [optional arguments]
```

- `CHANNEL_CFG_PATH`: A list of paths of `channel_cfg`. For example, when you retrain a slimmable model, your command will be like `--cfg-options algorithm.channel_cfg=cfg1,cfg2,cfg3`. And your command here should be `--channel-cfgs cfg1 cfg2 cfg3`. The order of them should be the same.

For example,

```Python
python tools/model_converters/split_checkpoint.py \
configs/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py \
your_retraining_checkpoint_path \
--channel-cfgs configs/pruning/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_320M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml
```

#### Test

To test a pruning method, you can use the following command:

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
```

- `task`: one of `mmcls`, `mmdet` and `mmseg`

- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` represents **config for channel of the subnet searched out**, used to specify different subnets for testing. An example for `channel_cfg` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml), and the usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/README.md#test-a-subnet).

For example,

```Python
python ./tools/test.py \
configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py \
your_split_checkpoint_path --metrics accuracy
```

### Distillation

To test a distillation method, you can use the following command:

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
```

For example,

```Python
python ./tools/test.py \
configs/distill/mmseg/cwd/cwd_logits_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k.py \
your_checkpoint_path --show
```