[Doc] Add Tutorial of KD,Pruning,NAS and Installation. (#255)
* [doc] add doc of installation, kd, nas, pruning

* [doc] add user guide part

* fix name of mmcv-full to mmcv

* [doc] fix mmcv based on zaida 's suggestion

* [doc] fix index error based on shiguang's comments
pprp committed Aug 30, 2022
1 parent ce22497 commit c650b3e
Showing 7 changed files with 795 additions and 0 deletions.
110 changes: 110 additions & 0 deletions docs/en/get_started/installation.md
@@ -1 +1,111 @@
# Installation

## Prepare Environment

Create a conda virtual environment and activate it.

```Python
conda create -n openmmlab python=3.7 -y
conda activate openmmlab
```

Install PyTorch and torchvision following the [official instructions](https://pytorch.org/).

Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
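
If you are unsure, a quick check like the following sketch prints the CUDA version your installed PyTorch was built with, which you can compare against the toolkit under `/usr/local/cuda`:

```Python
# Minimal sanity check: report the PyTorch build and its compile-time CUDA version.
import torch

print(torch.__version__)          # e.g. 1.10.0
print(torch.version.cuda)         # CUDA version PyTorch was compiled with, e.g. 10.2
print(torch.cuda.is_available())  # True if the driver/runtime setup is working
```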

`E.g.1` If you have CUDA 10.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.10, you need to install the prebuilt PyTorch with CUDA 10.2.

```Python
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
```

`E.g.2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.5.1, you need to install the prebuilt PyTorch with CUDA 9.2.

```Python
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
```

- If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.

## Customize Installation

It is recommended to install MMRazor with [MIM](https://github.com/open-mmlab/mim), which automatically handles the dependencies of OpenMMLab projects, including mmcv and other python packages.

```Python
pip install openmim
mim install git+https://github.com/open-mmlab/mmrazor.git@1.0.0rc0
```

Or you can still install MMRazor manually:

1. Install mmcv.

```Python
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```

Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest `mmcv` with `CUDA 10.2` and `PyTorch 1.10.0`, use the following command:

```Python
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
```

See [here](https://github.com/open-mmlab/mmcv#installation) for different versions of MMCV compatible with different PyTorch and CUDA versions.

Optionally, you can compile mmcv from source.

```
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
# install mmcv-lite (operators are not compiled)
MMCV_WITH_OPS=0 pip install -e . -v
# install mmcv (originally called mmcv-full) with compiled operators
MMCV_WITH_OPS=1 pip install -e . -v
# the plain install also compiles operators
pip install -e . -v
```
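
Whichever route you take, a quick check like the following sketch confirms that a compatible mmcv is importable:

```Python
# Sanity check: mmcv should import cleanly and report a 2.x (>= 2.0.0rc1) version.
import mmcv

print(mmcv.__version__)
```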

2. Install MMEngine.

Install MMEngine from source.

```Python
git clone https://github.com/open-mmlab/mmengine.git
cd mmengine
pip install -v -e .
```

3. Install MMRazor.

If you would like to install MMRazor in `dev` mode, run the following:

```Python
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
git fetch origin
git checkout -b 1.0.0rc0 origin/1.0.0rc0
# The new version is released in branch 1.0.0rc0
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```

**Note:**

- When MMRazor is installed in `dev` mode, any local modifications made to the code will take effect without the need to reinstall it.

## A from-scratch Setup Script

```Python
conda create -n openmmlab python=3.7 -y
conda activate openmmlab

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
# install the latest mmcv
pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
# install mmrazor
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
git fetch origin
git checkout -b 1.0.0rc0 origin/1.0.0rc0
pip install -v -e .
```
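
Finally, a quick import check (a minimal sketch) can confirm that the installation succeeded:

```Python
# Verify the installation: MMRazor should import and report its version.
import mmrazor

print(mmrazor.__version__)  # e.g. 1.0.0rc0
```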
106 changes: 106 additions & 0 deletions docs/en/user_guides/2_train_different_types_algorithms.md
@@ -1 +1,107 @@
# Train different types of algorithms

**Before running our algorithms, you may need to prepare the datasets according to the instructions in the corresponding document.**

**Note**:

- With the help of MMEngine, MMRazor provides unified entry interfaces for various tasks, so in theory our algorithms can adapt to all OpenMMLab upstream repos.

- We dynamically pass arguments via `--cfg-options` (e.g., `mutable_cfg` in NAS algorithms or `channel_cfg` in pruning algorithms) to **avoid the need for a separate config for each subnet or checkpoint**. If you want to specify different subnets for retraining or testing, you just need to change this argument (a conceptual sketch follows this list).
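
As a conceptual sketch (the key names below mirror the examples later in this document; the override itself is performed by the config system rather than by hand):

```Python
# Conceptual sketch of what a dotted --cfg-options override amounts to.
cfg = dict(model=dict(_channel_cfg_paths=None))

# `--cfg-options model._channel_cfg_paths=path/to/channel_cfg.yaml` roughly means:
cfg['model']['_channel_cfg_paths'] = 'path/to/channel_cfg.yaml'
print(cfg)
```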

### NAS

Here we take SPOS (Single Path One Shot) as an example. There are three steps to run neural architecture search (NAS): **supernet pre-training**, **search for a subnet on the trained supernet** and **subnet retraining**.

#### Supernet Pre-training

```Python
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

The usage of optional arguments is the same as in the corresponding upstream tasks, such as mmclassification, mmdetection and mmsegmentation.

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py \
--work-dir $WORK_DIR
```

#### Search for Subnet on The Trained Supernet

```Python
python tools/train.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
```

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py \
$STEP1_CKPT \
--work-dir $WORK_DIR
```

#### Subnet Retraining

```Python
python tools/train.py ${CONFIG_FILE} \
--cfg-options algorithm.fix_subnet=${MUTABLE_CFG_PATH} [optional arguments]
```

- `MUTABLE_CFG_PATH`: Path of `fix_subnet`. `fix_subnet` represents **config for mutable of the subnet searched out**, used to specify different subnets for retraining. An example for `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml), and the usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/README.md#subnet-retraining-on-imagenet).

For example,

```Python
python ./tools/train.py \
configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py \
--work-dir $WORK_DIR \
--cfg-options algorithm.fix_subnet=$YAML_FILE_BY_STEP2
```

We note that instead of using `--cfg-options`, you can also directly modify `configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py` like this:

```Python
fix_subnet = 'configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml'
model = dict(fix_subnet=fix_subnet)
```

### Pruning

Pruning has three steps, including **supernet pre-training**, **search for subnet on the trained supernet** and **subnet retraining**. The commands of the first two steps are similar to NAS, except that we need to use `CONFIG_FILE` of Pruning here. The commands of the **subnet retraining** are as follows.

#### Subnet Retraining

```Python
python tools/train.py ${CONFIG_FILE} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
```

Different from NAS, the argument that needs to be specified here is `_channel_cfg_paths`.

- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` represents **the config for channels of the subnet searched out**, used to specify different subnets for retraining.

For example, the `_channel_cfg_paths` is already set by default in the config used below.

```Python
python ./tools/train.py \
configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py \
--work-dir your_work_dir
```

### Distillation

There is only one step to start knowledge distillation.

```Python
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

For example,

```Python
python ./tools/train.py \
configs/distill/mmcls/kd/kd_logits_r34_r18_8xb32_in1k.py \
--work-dir your_work_dir
```
92 changes: 92 additions & 0 deletions docs/en/user_guides/3_train_with_different_devices.md
@@ -1 +1,93 @@
# Train with different devices

**Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus` (see the sketch below). We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
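
As a worked example of the linear scaling rule (the numbers are purely illustrative):

```Python
# Linear scaling rule: configs assume 8 GPUs by default.
old_ngpus, old_lr = 8, 0.5   # illustrative values from a config written for 8 GPUs
new_ngpus = 4                # the number of GPUs you actually train with
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                # 0.25
```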

### Training with CPU

```Python
export CUDA_VISIBLE_DEVICES=-1
python tools/train.py ${CONFIG_FILE}
```

**Note**: We do not recommend using the CPU for training because it is too slow and some algorithms use `SyncBN`, which relies on distributed training. We support this feature to let users debug on machines without GPUs.

### Train with single/multiple GPUs

```Python
sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --work-dir ${YOUR_WORK_DIR} [optional arguments]
```

**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:

```Python
ln -s ${YOUR_WORK_DIRS} ${MMRAZOR}/work_dirs
```

Alternatively, if you run MMRazor on a cluster managed with [slurm](https://slurm.schedmd.com/):

```Python
GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} sh tools/xxx/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${YOUR_WORK_DIR} [optional arguments]
```

### Train with multiple machines

If you launch training on multiple machines connected with simple ethernet, you can run the following commands:

On the first machine:

```Python
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```

On the second machine:

```Python
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
```

Training is usually slow if you do not have high-speed networking such as InfiniBand.

If you launch with slurm, the command is the same as on a single machine as described above, but you need to refer to [slurm_train.sh](https://github.com/open-mmlab/mmselfsup/blob/master/tools/slurm_train.sh) to set appropriate parameters and environment variables.

### Launch multiple jobs on a single machine

If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.

If you use `dist_train.sh` to launch training jobs:

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_1
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_2
```

If you launch training jobs with slurm, you have two options to set different communication ports:

Option 1:

In `config1.py`:

```Python
dist_params = dict(backend='nccl', port=29500)
```

In `config2.py`:

```Python
dist_params = dict(backend='nccl', port=29501)
```

Then you can launch two jobs with config1.py and config2.py.

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2
```

Option 2:

You can set different communication ports without modifying the configuration files, but you have to pass `--cfg-options` to overwrite the default port in the configuration file.

```Python
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1 --cfg-options dist_params.port=29500
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2 --cfg-options dist_params.port=29501
```
78 changes: 78 additions & 0 deletions docs/en/user_guides/4_test_a_model.md
@@ -1 +1,79 @@
# Test a model

### NAS

To test a NAS method, you can use the following command.

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options algorithm.fix_subnet=${FIX_SUBNET_PATH} [optional arguments]
```

- `FIX_SUBNET_PATH`: Path of `fix_subnet`. `fix_subnet` represents **config for mutable of the subnet searched out**, used to specify different subnets for testing. An example for `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml).

The usage of optional arguments is the same as in the corresponding upstream tasks, such as mmclassification, mmdetection and mmsegmentation.

For example,

```Python
python tools/test.py \
configs/nas/mmcls/spos/spos_subnet_shufflenetv2_8xb128_in1k.py \
your_subnet_checkpoint_path \
--cfg-options algorithm.fix_subnet=configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml
```
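
Before running the test, you can also sanity-check the `fix_subnet` file itself; a minimal sketch (assuming the YAML loads into a plain mapping of mutable names to chosen candidates):

```Python
# Sketch: load and inspect a searched-out subnet (fix_subnet) before testing.
from mmengine.fileio import load

subnet = load('configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml')
print(len(subnet))       # number of searchable units recorded in the subnet
print(list(subnet)[:3])  # peek at the first few mutable names
```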

### Pruning

#### Split Checkpoint (Optional)

If you train a slimmable model during retraining, the checkpoints of different subnets are actually fused into only one checkpoint. You can split this checkpoint into multiple independent checkpoints by using the following command:

```Python
python tools/model_converters/split_checkpoint.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --channel-cfgs ${CHANNEL_CFG_PATH} [optional arguments]
```

- `CHANNEL_CFG_PATH`: A list of paths of `channel_cfg`. For example, when you retrain a slimmable model, your command will be like `--cfg-options algorithm.channel_cfg=cfg1,cfg2,cfg3`. And your command here should be `--channel-cfgs cfg1 cfg2 cfg3`. The order of them should be the same.

For example,

```Python
python tools/model_converters/split_checkpoint.py \
configs/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py \
your_retraining_checkpoint_path \
--channel-cfgs configs/pruning/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_320M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml
```

#### Test

To test a pruning method, you can use the following command:

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
```

- `task`: one of `mmcls`, `mmdet` and `mmseg`

- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` represents **config for channel of the subnet searched out**, used to specify different subnets for testing. An example for `channel_cfg` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml), and the usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/README.md#test-a-subnet).

For example,

```Python
python ./tools/test.py \
configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py \
your_split_checkpoint_path --metrics accuracy
```

### Distillation

To test a distillation method, you can use the following command:

```Python
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
```

For example,

```Python
python ./tools/test.py \
configs/distill/mmseg/cwd/cwd_logits_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k.py \
your_checkpoint_path --show
```