From f89cd0e5d83bc69cbb630b848ebdfb1bfba986de Mon Sep 17 00:00:00 2001
From: PJDong <1115957667@qq.com>
Date: Tue, 30 Aug 2022 11:16:31 +0800
Subject: [PATCH] [doc] add user guide part

---
 .../2_train_different_types_algorithms.md | 106 ++++++++++++++++++
 .../3_train_with_different_devices.md     |  92 +++++++++++++++
 docs/en/user_guides/4_test_a_model.md     |  78 +++++++++++++
 3 files changed, 276 insertions(+)

diff --git a/docs/en/user_guides/2_train_different_types_algorithms.md b/docs/en/user_guides/2_train_different_types_algorithms.md
index 61385b4b0..01f295581 100644
--- a/docs/en/user_guides/2_train_different_types_algorithms.md
+++ b/docs/en/user_guides/2_train_different_types_algorithms.md
@@ -1 +1,107 @@
 # Train different types algorithms
+
+**Before running our algorithms, you may need to prepare the datasets according to the instructions in the corresponding document.**
+
+**Note**:
+
+- With the help of MMEngine, MMRazor has unified the entry interfaces for various tasks, so in theory our algorithms can be adapted to all OpenMMLab upstream repos.
+
+- We dynamically pass arguments via `--cfg-options` (e.g., `mutable_cfg` in NAS algorithms or `channel_cfg` in pruning algorithms) to **avoid the need for a separate config for each subnet or checkpoint**. If you want to specify different subnets for retraining or testing, you just need to change this argument.
+
+### NAS
+
+Here we take SPOS (Single Path One Shot) as an example. There are three steps in neural architecture search (NAS): **supernet pre-training**, **searching for a subnet on the trained supernet** and **subnet retraining**.
+
+#### Supernet Pre-training
+
+```Python
+python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+
+The usage of optional arguments is the same as in the corresponding tasks, such as MMClassification, MMDetection and MMSegmentation.
+
+For example,
+
+```Python
+python ./tools/train.py \
+    configs/nas/mmcls/spos/spos_shufflenet_supernet_8xb128_in1k.py \
+    --work-dir $WORK_DIR
+```
+
+#### Search for a Subnet on the Trained Supernet
+
+```Python
+python tools/train.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
+```
+
+For example,
+
+```Python
+python ./tools/train.py \
+    configs/nas/mmcls/spos/spos_shufflenet_search_8xb128_in1k.py \
+    $STEP1_CKPT \
+    --work-dir $WORK_DIR
+```
+
+#### Subnet Retraining
+
+```Python
+python tools/train.py ${CONFIG_FILE} \
+    --cfg-options algorithm.fix_subnet=${MUTABLE_CFG_PATH} [optional arguments]
+```
+
+- `MUTABLE_CFG_PATH`: Path of `fix_subnet`. `fix_subnet` is **the config of the mutables in the searched-out subnet**, used to specify different subnets for retraining. An example of `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml), and its usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/README.md#subnet-retraining-on-imagenet).
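+
+If you are curious about what such a file contains, the minimal sketch below (our illustration, not an MMRazor tool; it assumes the exported `fix_subnet` is a YAML mapping from mutable names to their chosen candidates, and that PyYAML is installed) prints the choice recorded for each searchable unit:
+
+```Python
+# Hypothetical inspection snippet; the YAML path is a placeholder.
+import yaml
+
+with open('SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml') as f:
+    fix_subnet = yaml.safe_load(f)
+
+# Each key names a mutable; each value records the candidate chosen for it.
+for mutable_name, choice in fix_subnet.items():
+    print(f'{mutable_name}: {choice}')
+```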
+
+For example, the retraining command for a searched subnet is:
+
+```Python
+python ./tools/train.py \
+    configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py \
+    --work-dir $WORK_DIR \
+    --cfg-options algorithm.fix_subnet=$YAML_FILE_BY_STEP2
+```
+
+Note that instead of using `--cfg-options`, you can also directly modify `configs/nas/mmcls/spos/spos_shufflenet_subnet_8xb128_in1k.py` like this:
+
+```Python
+fix_subnet = 'configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml'
+model = dict(fix_subnet=fix_subnet)
+```
+
+### Pruning
+
+Pruning also has three steps: **supernet pre-training**, **searching for a subnet on the trained supernet** and **subnet retraining**. The commands for the first two steps are the same as in NAS, except that you need to use the pruning `CONFIG_FILE`. The command for **subnet retraining** is as follows.
+
+#### Subnet Retraining
+
+```Python
+python tools/train.py ${CONFIG_FILE} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
+```
+
+Different from NAS, the argument to specify here is `model._channel_cfg_paths`.
+
+- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` is **the channel config of the searched-out subnet**, used to specify different subnets for retraining.
+
+For example, the default `_channel_cfg_paths` is already set in the config below.
+
+```Python
+python ./tools/train.py \
+    configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k_flops-530M.py \
+    --work-dir your_work_dir
+```
+
+### Distillation
+
+There is only one step to start knowledge distillation.
+
+```Python
+python tools/train.py ${CONFIG_FILE} [optional arguments]
+```
+
+For example,
+
+```Python
+python ./tools/train.py \
+    configs/distill/mmcls/kd/kd_logits_r34_r18_8xb32_in1k.py \
+    --work-dir your_work_dir
+```

diff --git a/docs/en/user_guides/3_train_with_different_devices.md b/docs/en/user_guides/3_train_with_different_devices.md
index 60d52fe56..aa24654e8 100644
--- a/docs/en/user_guides/3_train_with_different_devices.md
+++ b/docs/en/user_guides/3_train_with_different_devices.md
@@ -1 +1,93 @@
 # Train with different devices
+
+**Note**: The default learning rate in the config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes proportionally and you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with a single GPU, since some methods do not support non-distributed training.
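+
+As a quick sanity check of the scaling rule above, here is the arithmetic with hypothetical numbers (a config tuned for 8 GPUs with `lr=0.5`, run on 4 GPUs):
+
+```Python
+# Linear scaling rule; the values below are made-up examples, not defaults.
+old_lr, old_ngpus = 0.5, 8  # learning rate and GPU count the config was written for
+new_ngpus = 4               # number of GPUs you actually train with
+
+new_lr = old_lr * new_ngpus / old_ngpus
+print(new_lr)  # 0.25
+```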
+
+### Training with CPU
+
+```Python
+export CUDA_VISIBLE_DEVICES=-1
+python tools/train.py ${CONFIG_FILE}
+```
+
+**Note**: We do not recommend training on CPU because it is too slow, and some algorithms use `SyncBN`, which requires distributed training. We support this feature only so that users can debug on machines without a GPU.
+
+### Train with single/multiple GPUs
+
+```Python
+sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --work-dir ${YOUR_WORK_DIR} [optional arguments]
+```
+
+**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended, since the evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
+
+```Python
+ln -s ${YOUR_WORK_DIRS} ${MMRAZOR}/work_dirs
+```
+
+Alternatively, if you run MMRazor on a cluster managed with [slurm](https://slurm.schedmd.com/):
+
+```Python
+GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} sh tools/xxx/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${YOUR_WORK_DIR} [optional arguments]
+```
+
+### Train with multiple machines
+
+If you launch with multiple machines connected with plain ethernet, you can run the following commands:
+
+On the first machine:
+
+```Python
+NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
+```
+
+On the second machine:
+
+```Python
+NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
+```
+
+Training this way is usually slow if you do not have high-speed networking like InfiniBand.
+
+If you launch with slurm, the command is the same as for a single machine described above, but you need to refer to [slurm_train.sh](https://github.com/open-mmlab/mmselfsup/blob/master/tools/slurm_train.sh) to set appropriate parameters and environment variables.
+
+### Launch multiple jobs on a single machine
+
+If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify a different port (29500 by default) for each job to avoid communication conflicts.
+
+If you use `dist_train.sh` to launch training jobs:
+
+```Python
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_1
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/xxx/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_2
+```
+
+If you launch training jobs with slurm, you have two options to set different communication ports.
+
+Option 1:
+
+In `config1.py`:
+
+```Python
+dist_params = dict(backend='nccl', port=29500)
+```
+
+In `config2.py`:
+
+```Python
+dist_params = dict(backend='nccl', port=29501)
+```
+
+Then you can launch two jobs with `config1.py` and `config2.py`:
+
+```Python
+CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1
+CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2
+```
+
+Option 2:
+
+You can set different communication ports without modifying the configuration files by passing `--cfg-options` to override the default port set in the configuration file:
+
+```Python
+CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1 --cfg-options dist_params.port=29500
+CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2 --cfg-options dist_params.port=29501
+```
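+
+If you would rather not track port assignments by hand, a small helper (our sketch, not part of MMRazor) can ask the OS for a currently unused port before you launch a job:
+
+```Python
+# Hypothetical helper: bind to port 0 so the OS picks a free port, then report it.
+import socket
+
+def find_free_port() -> int:
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(('', 0))  # port 0 tells the OS to choose any unused port
+        return s.getsockname()[1]
+
+print(find_free_port())
+```
+
+Note that a port found this way is only a hint: another process could still claim it before your job starts.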
diff --git a/docs/en/user_guides/4_test_a_model.md b/docs/en/user_guides/4_test_a_model.md
index 7a61fd7fd..152b5dc41 100644
--- a/docs/en/user_guides/4_test_a_model.md
+++ b/docs/en/user_guides/4_test_a_model.md
@@ -1 +1,79 @@
 # Test a model
+
+### NAS
+
+To test a NAS method, you can use the following command:
+
+```Python
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options algorithm.fix_subnet=${FIX_SUBNET_PATH} [optional arguments]
+```
+
+- `FIX_SUBNET_PATH`: Path of `fix_subnet`. `fix_subnet` is **the config of the mutables in the searched-out subnet**, used to specify different subnets for testing. An example of `fix_subnet` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/nas/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml).
+
+The usage of optional arguments is the same as in the corresponding tasks, such as MMClassification, MMDetection and MMSegmentation.
+
+For example,
+
+```Python
+python tools/test.py \
+    configs/nas/mmcls/spos/spos_subnet_shufflenetv2_8xb128_in1k.py \
+    your_subnet_checkpoint_path \
+    --cfg-options algorithm.fix_subnet=configs/nas/mmcls/spos/SPOS_SHUFFLENETV2_330M_IN1k_PAPER.yaml
+```
+
+### Pruning
+
+#### Split Checkpoint (Optional)
+
+If you trained a slimmable model, the checkpoints of the different subnets are fused into a single checkpoint. You can split it into multiple independent checkpoints with the following command:
+
+```Python
+python tools/model_converters/split_checkpoint.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --channel-cfgs ${CHANNEL_CFG_PATH} [optional arguments]
+```
+
+- `CHANNEL_CFG_PATH`: A list of paths of `channel_cfg`. For example, if you retrained a slimmable model with `--cfg-options algorithm.channel_cfg=cfg1,cfg2,cfg3`, your command here should be `--channel-cfgs cfg1 cfg2 cfg3`. The order must be the same.
+
+For example,
+
+```Python
+python tools/model_converters/split_checkpoint.py \
+    configs/pruning/autoslim/autoslim_mbv2_subnet_8xb256_in1k.py \
+    your_retraining_checkpoint_path \
+    --channel-cfgs configs/pruning/autoslim/AUTOSLIM_MBV2_530M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_320M_OFFICIAL.yaml configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml
+```
+
+#### Test
+
+To test a pruning method, you can use the following command:
+
+```Python
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} --cfg-options model._channel_cfg_paths=${CHANNEL_CFG_PATH} [optional arguments]
+```
+
+- `task`: one of `mmcls`, `mmdet` and `mmseg`
+
+- `CHANNEL_CFG_PATH`: Path of `channel_cfg`. `channel_cfg` is **the channel config of the searched-out subnet**, used to specify different subnets for testing. An example of `channel_cfg` can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/AUTOSLIM_MBV2_220M_OFFICIAL.yaml), and its usage can be found [here](https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/README.md#test-a-subnet).
+
+For example,
+
+```Python
+python ./tools/test.py \
+    configs/pruning/mmcls/autoslim/autoslim_mbv2_1.5x_subnet_8xb256_in1k-530M.py \
+    your_split_checkpoint_path --metrics accuracy
+```
+
+### Distillation
+
+To test a distillation method, you can use the following command:
+
+```Python
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_PATH} [optional arguments]
+```
+
+For example,
+
+```Python
+python ./tools/test.py \
+    configs/distill/mmseg/cwd/cwd_logits_pspnet_r101_d8_pspnet_r18_d8_512x1024_cityscapes_80k.py \
+    your_checkpoint_path --show
+```
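+
+If a test command fails with unexpected or missing keys, it can help to peek inside the checkpoint first. The sketch below is plain PyTorch rather than an MMRazor tool, and it assumes the checkpoint follows the usual OpenMMLab layout with the weights nested under a `state_dict` key:
+
+```Python
+# Generic checkpoint inspection; the file name is a placeholder.
+import torch
+
+ckpt = torch.load('your_checkpoint.pth', map_location='cpu')
+state_dict = ckpt.get('state_dict', ckpt)  # fall back to the raw dict if not nested
+
+# Print the first few parameter names and shapes to verify the layout.
+for name, tensor in list(state_dict.items())[:10]:
+    print(name, tuple(tensor.shape))
+```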