[Docs] Introduce how to customize distributed training settings #1279

Merged · 2 commits · Jul 31, 2023
54 changes: 52 additions & 2 deletions docs/en/common_usage/distributed_training.md
@@ -2,7 +2,9 @@

MMEngine supports training models on CPU, a single GPU, multiple GPUs on a single machine, and multiple machines. When multiple GPUs are available in the environment, we can use the following commands to enable multi-GPU training on a single machine or across multiple machines and shorten the training time of the model.

## Multiple GPUs on a Single Machine
## Launch Training

### Multiple GPUs on a Single Machine

Assuming the current machine has 8 GPUs, you can enable multi-GPU training with the following command:

@@ -16,7 +18,7 @@

If you need to specify the GPU index, you can set the `CUDA_VISIBLE_DEVICES` environment variable, for example:

```bash
CUDA_VISIBLE_DEVICES=0,3 python -m torch.distributed.launch --nproc_per_node=2 examples/distributed_training.py --launcher pytorch
```

## Multiple Machines
### Multiple Machines

Assuming there are 2 machines connected via Ethernet, you can simply run the following commands.
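The collapsed block presumably contains the per-machine launch commands; the sketch below uses `torch.distributed.launch`'s standard multi-node flags, where the master address and port are placeholder assumptions:

```shell
# On the first machine (rank 0); replace the address with machine 0's IP.
python -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --master_addr="192.168.1.1" \
    --master_port=29500 \
    --nproc_per_node=8 \
    examples/distributed_training.py --launcher pytorch

# On the second machine, run the same command with --node_rank=1.
```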

@@ -56,3 +58,51 @@ srun -p mm_dev \
--kill-on-bad-exit=1 \
python examples/distributed_training.py --launcher="slurm"
```

## Customize Distributed Training

When users switch from single-GPU training to multi-GPU training, no changes need to be made: [Runner](mmengine.runner.Runner.wrap_model) wraps the model with [MMDistributedDataParallel](mmengine.model.MMDistributedDataParallel) by default, which enables multi-GPU training.
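For instance, the same training script is launched unchanged in both modes; the script path below follows the examples used earlier on this page:

```shell
# Single GPU: run the script directly; the model is not wrapped.
python examples/distributed_training.py

# Multiple GPUs on one machine: same script, started via the launcher;
# Runner wraps the model with MMDistributedDataParallel automatically.
python -m torch.distributed.launch --nproc_per_node=8 \
    examples/distributed_training.py --launcher pytorch
```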

If you want to pass more parameters to MMDistributedDataParallel or use your own `CustomDistributedDataParallel`, you can set `model_wrapper_cfg`.

### Pass More Parameters to MMDistributedDataParallel

For example, setting `find_unused_parameters` to `True` (useful when some model parameters receive no gradient in a forward pass, at the cost of extra overhead):

```python
cfg = dict(
    model_wrapper_cfg=dict(
        type='MMDistributedDataParallel', find_unused_parameters=True)
)
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
    cfg=cfg,
)
runner.train()
```

### Use a Customized CustomDistributedDataParallel

```python
from torch.nn.parallel import DistributedDataParallel

from mmengine.registry import MODEL_WRAPPERS


@MODEL_WRAPPERS.register_module()
class CustomDistributedDataParallel(DistributedDataParallel):
    pass


cfg = dict(model_wrapper_cfg=dict(type='CustomDistributedDataParallel'))
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
    cfg=cfg,
)
runner.train()
```
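To make the registry mechanism concrete, here is a dependency-free sketch of what registering a wrapper and building it from `model_wrapper_cfg` amounts to; this is a simplified illustration, not MMEngine's actual `Registry` implementation:

```python
# Minimal stand-in for MMEngine's registry pattern: the `type` key in the
# config selects a registered class; remaining keys are passed as kwargs.
MODEL_WRAPPERS = {}


def register_module(cls):
    MODEL_WRAPPERS[cls.__name__] = cls
    return cls


@register_module
class CustomDistributedDataParallel:
    def __init__(self, module, **kwargs):
        self.module = module
        self.kwargs = kwargs


def build_wrapper(model, model_wrapper_cfg):
    cfg = dict(model_wrapper_cfg)          # copy so pop() does not mutate input
    wrapper_cls = MODEL_WRAPPERS[cfg.pop('type')]
    return wrapper_cls(model, **cfg)


wrapped = build_wrapper('model', dict(type='CustomDistributedDataParallel',
                                      find_unused_parameters=True))
print(type(wrapped).__name__)   # CustomDistributedDataParallel
print(wrapped.kwargs)           # {'find_unused_parameters': True}
```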
53 changes: 51 additions & 2 deletions docs/zh_cn/common_usage/distributed_training.md
@@ -2,7 +2,9 @@

MMEngine supports training on CPU, a single GPU, multiple GPUs on a single machine, and multiple machines. When multiple GPUs are available in the environment, we can use the following commands to enable single-machine or multi-machine multi-GPU training and shorten the training time of the model.

## Multiple GPUs on a Single Machine
## Launch Training

### Multiple GPUs on a Single Machine

Assuming the current machine has 8 GPUs, you can enable multi-GPU training with the following command:

@@ -16,7 +18,7 @@

```bash
python -m torch.distributed.launch --nproc_per_node=8 examples/distributed_training.py --launcher pytorch
```

If you need to specify the GPU index, you can set the `CUDA_VISIBLE_DEVICES` environment variable, for example:

```bash
CUDA_VISIBLE_DEVICES=0,3 python -m torch.distributed.launch --nproc_per_node=2 examples/distributed_training.py --launcher pytorch
```

## Multiple Machines
### Multiple Machines

Assume there are 2 machines, each with 8 GPUs.

@@ -56,3 +58,50 @@ srun -p mm_dev \
--kill-on-bad-exit=1 \
python examples/distributed_training.py --launcher="slurm"
```

## Customize Distributed Training

When users switch from single-GPU training to multi-GPU training, no changes need to be made: [Runner](mmengine.runner.Runner.wrap_model) wraps the model with [MMDistributedDataParallel](mmengine.model.MMDistributedDataParallel) by default, which enables multi-GPU training.
If you want to pass more parameters to MMDistributedDataParallel or use your own `CustomDistributedDataParallel`, you can set `model_wrapper_cfg`.

### Pass More Parameters to MMDistributedDataParallel

For example, setting `find_unused_parameters` to `True` (useful when some model parameters receive no gradient in a forward pass, at the cost of extra overhead):

```python
cfg = dict(
    model_wrapper_cfg=dict(
        type='MMDistributedDataParallel', find_unused_parameters=True)
)
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
    cfg=cfg,
)
runner.train()
```

### Use a Customized CustomDistributedDataParallel

```python
from torch.nn.parallel import DistributedDataParallel

from mmengine.registry import MODEL_WRAPPERS


@MODEL_WRAPPERS.register_module()
class CustomDistributedDataParallel(DistributedDataParallel):
    pass


cfg = dict(model_wrapper_cfg=dict(type='CustomDistributedDataParallel'))
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
    cfg=cfg,
)
runner.train()
```