[Feature] Add colossalai strategy #1299

Merged · 14 commits · Aug 18, 2023
1 change: 0 additions & 1 deletion docs/en/api/optim.rst
@@ -25,7 +25,6 @@ Optimizer
   OptimWrapperDict
   DefaultOptimWrapperConstructor
   ZeroRedundancyOptimizer
   DeepSpeedOptimWrapper

.. autosummary::
   :toctree: generated
23 changes: 23 additions & 0 deletions docs/en/api/strategy.rst
@@ -16,3 +16,26 @@ mmengine._strategy
   DDPStrategy
   DeepSpeedStrategy
   FSDPStrategy
   ColossalAIStrategy


.. currentmodule:: mmengine._strategy.deepspeed

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: classtemplate.rst

   MMDeepSpeedEngineWrapper
   DeepSpeedOptimWrapper


.. currentmodule:: mmengine._strategy.colossalai

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: classtemplate.rst

   CollosalAIModelWrapper
   ColossalAIOpitmWrapper
76 changes: 75 additions & 1 deletion docs/en/common_usage/large_model_training.md
@@ -25,7 +25,7 @@ pip install deepspeed
After installing DeepSpeed, you need to configure the `strategy` and `optim_wrapper` parameters of FlexibleRunner as follows:

- strategy: Set `type='DeepSpeedStrategy'` and configure other parameters. See [DeepSpeedStrategy](mmengine._strategy.DeepSpeedStrategy) for more details.
- optim_wrapper: Set `type='DeepSpeedOptimWrapper'` and configure other parameters. See [DeepSpeedOptimWrapper](mmengine.optim.DeepSpeedOptimWrapper) for more details.
- optim_wrapper: Set `type='DeepSpeedOptimWrapper'` and configure other parameters. See [DeepSpeedOptimWrapper](mmengine._strategy.deepspeed.DeepSpeedOptimWrapper) for more details.

Here is an example configuration related to DeepSpeed:

@@ -182,3 +182,77 @@ torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.p
```

</details>

## ColossalAI

[ColossalAI](https://colossalai.org/) is a comprehensive large-scale model training system built on efficient parallelization techniques. Starting from v0.8.5, MMEngine supports training models with the ZeRO series of optimization strategies provided by ColossalAI.

Install a ColossalAI version newer than v0.3.1. This requirement exists because v0.3.1 contains a [bug](https://github.com/hpcaitech/ColossalAI/issues/4393) that can block the program, which has been fixed in later releases. If the latest available release of ColossalAI is still v0.3.1, it is recommended to install ColossalAI from the source code on the main branch.
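
As a quick sanity check, the snippet below verifies the installed version before training. This is a minimal sketch, not part of the official installation steps; it only assumes that `colossalai` exposes a plain `__version__` string and reuses MMEngine's own `digit_version` helper.

```python
# Minimal sketch: fail fast if the installed ColossalAI still carries the
# blocking bug that was fixed after v0.3.1.
import colossalai
from mmengine.utils import digit_version

if digit_version(colossalai.__version__) <= digit_version('0.3.1'):
    raise RuntimeError(
        f'ColossalAI {colossalai.__version__} may hit the blocking bug in '
        'v0.3.1; please install a newer release or the main branch.')
```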

```{note}
Note that if you encounter a compilation error such as `nvcc fatal: Unsupported gpu architecture 'compute_90'` and your PyTorch version is higher than 2.0, you need to git clone the source code, apply the modifications in this [PR](https://github.com/hpcaitech/ColossalAI/pull/4357), and then proceed with the installation.
```

```bash
pip install git+https://github.com/hpcaitech/ColossalAI
```

If the latest version of ColossalAI is higher than v0.3.1, you can directly install it using pip:

```bash
pip install colossalai
```

Once ColossalAI is installed, configure the `strategy` and `optim_wrapper` parameters for FlexibleRunner:

- `strategy`: Set `type='ColossalAIStrategy'` and configure other parameters. See [ColossalAIStrategy](mmengine._strategy.ColossalAIStrategy) for more details.
- `optim_wrapper`: Omit the `type` field (it defaults to `ColossalAIOpitmWrapper`) or set it explicitly; `HybridAdam` is the recommended optimizer type. Other configurable options are listed in [ColossalAIOptimWrapper](mmengine._strategy.ColossalAIOptimWrapper). A variant that sets the wrapper type explicitly is sketched after the configuration example below.

Here is an example configuration related to ColossalAI:

```python
from mmengine.runner._flexible_runner import FlexibleRunner

strategy = dict(type='ColossalAIStrategy')
optim_wrapper = dict(optimizer=dict(type='HybridAdam', lr=1e-3))

# Initialize FlexibleRunner
runner = FlexibleRunner(
    model=MMResNet50(),
    work_dir='./work_dirs',
    strategy=strategy,
    train_dataloader=train_dataloader,
    optim_wrapper=optim_wrapper,
    param_scheduler=dict(type='LinearLR'),
    train_cfg=dict(by_epoch=True, max_epochs=10, val_interval=1),
    val_dataloader=val_dataloader,
    val_cfg=dict(),
    val_evaluator=dict(type=Accuracy))

# Start training
runner.train()
```
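
If you prefer to spell out the wrapper type rather than rely on the default, only the `optim_wrapper` field changes. The sketch below assumes the wrapper is registered under its class name:

```python
# Equivalent optim_wrapper config with the wrapper type written out explicitly
# (a sketch; omitting `type` already resolves to `ColossalAIOpitmWrapper`).
optim_wrapper = dict(
    type='ColossalAIOpitmWrapper',
    optimizer=dict(type='HybridAdam', lr=1e-3))
```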

To initiate distributed training using two GPUs:

```bash
torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.py --use-colossalai
```

<details>
<summary>Training Logs</summary>

```
08/18 11:56:34 - mmengine - INFO - Epoch(train) [1][ 10/196] lr: 3.3333e-04 eta: 0:10:31 time: 0.3238 data_time: 0.0344 memory: 597 loss: 3.8766
08/18 11:56:35 - mmengine - INFO - Epoch(train) [1][ 20/196] lr: 3.3333e-04 eta: 0:06:56 time: 0.1057 data_time: 0.0338 memory: 597 loss: 2.3797
08/18 11:56:36 - mmengine - INFO - Epoch(train) [1][ 30/196] lr: 3.3333e-04 eta: 0:05:45 time: 0.1068 data_time: 0.0342 memory: 597 loss: 2.3219
08/18 11:56:37 - mmengine - INFO - Epoch(train) [1][ 40/196] lr: 3.3333e-04 eta: 0:05:08 time: 0.1059 data_time: 0.0337 memory: 597 loss: 2.2641
08/18 11:56:38 - mmengine - INFO - Epoch(train) [1][ 50/196] lr: 3.3333e-04 eta: 0:04:45 time: 0.1062 data_time: 0.0338 memory: 597 loss: 2.2250
08/18 11:56:40 - mmengine - INFO - Epoch(train) [1][ 60/196] lr: 3.3333e-04 eta: 0:04:31 time: 0.1097 data_time: 0.0339 memory: 597 loss: 2.1672
08/18 11:56:41 - mmengine - INFO - Epoch(train) [1][ 70/196] lr: 3.3333e-04 eta: 0:04:21 time: 0.1096 data_time: 0.0340 memory: 597 loss: 2.1688
08/18 11:56:42 - mmengine - INFO - Epoch(train) [1][ 80/196] lr: 3.3333e-04 eta: 0:04:13 time: 0.1098 data_time: 0.0338 memory: 597 loss: 2.1781
08/18 11:56:43 - mmengine - INFO - Epoch(train) [1][ 90/196] lr: 3.3333e-04 eta: 0:04:06 time: 0.1097 data_time: 0.0338 memory: 597 loss: 2.0938
08/18 11:56:44 - mmengine - INFO - Epoch(train) [1][100/196] lr: 3.3333e-04 eta: 0:04:01 time: 0.1097 data_time: 0.0339 memory: 597 loss: 2.1078
08/18 11:56:45 - mmengine - INFO - Epoch(train) [1][110/196] lr: 3.3333e-04 eta: 0:04:01 time: 0.1395 data_time: 0.0340 memory: 597 loss: 2.0141
08/18 11:56:46 - mmengine - INFO - Epoch(train) [1][120/196] lr: 3.3333
```
1 change: 0 additions & 1 deletion docs/zh_cn/api/optim.rst
@@ -25,7 +25,6 @@ Optimizer
   OptimWrapperDict
   DefaultOptimWrapperConstructor
   ZeroRedundancyOptimizer
   DeepSpeedOptimWrapper

.. autosummary::
   :toctree: generated
23 changes: 23 additions & 0 deletions docs/zh_cn/api/strategy.rst
@@ -16,3 +16,26 @@ mmengine._strategy
   DDPStrategy
   DeepSpeedStrategy
   FSDPStrategy
   ColossalAIStrategy


.. currentmodule:: mmengine._strategy.deepspeed

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: classtemplate.rst

   MMDeepSpeedEngineWrapper
   DeepSpeedOptimWrapper


.. currentmodule:: mmengine._strategy.colossalai

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: classtemplate.rst

   CollosalAIModelWrapper
   ColossalAIOpitmWrapper
84 changes: 83 additions & 1 deletion docs/zh_cn/common_usage/large_model_training.md
@@ -24,7 +24,7 @@ pip install deepspeed
After installing DeepSpeed, configure the `strategy` and `optim_wrapper` parameters of FlexibleRunner:

- strategy: Set `type='DeepSpeedStrategy'` and configure the parameters. See [DeepSpeedStrategy](mmengine._strategy.DeepSpeedStrategy) for a detailed description of the parameters.
- optim_wrapper: Set `type='DeepSpeedOptimWrapper'` and configure the parameters. See [DeepSpeedOptimWrapper](mmengine.optim.DeepSpeedOptimWrapper) for a detailed description of the parameters.
- optim_wrapper: Set `type='DeepSpeedOptimWrapper'` and configure the parameters. See [DeepSpeedOptimWrapper](mmengine._strategy.deepspeed.DeepSpeedOptimWrapper) for a detailed description of the parameters.

Here is an example configuration related to DeepSpeed:

@@ -181,3 +181,85 @@ torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.p
```

</details>

## ColossalAI

[ColossalAI](https://colossalai.org/) is a comprehensive large-scale model training system built on efficient parallelization techniques. Starting from v0.8.5, MMEngine supports training models with the ZeRO series of optimization strategies provided by ColossalAI.

Install a ColossalAI version newer than v0.3.1. This requirement exists because v0.3.1 contains a [bug](https://github.com/hpcaitech/ColossalAI/issues/4393) that can block the program, which has been fixed in later releases. If the latest available release of ColossalAI is still v0.3.1, it is recommended to install ColossalAI from the source code on the main branch.
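
As a quick sanity check (a minimal sketch, not an official installation step), you can verify the installed version before launching training. It only assumes that `colossalai` exposes a plain `__version__` string and reuses MMEngine's own `digit_version` helper:

```python
# Minimal sketch: fail fast if the installed ColossalAI still carries the
# blocking bug that was fixed after v0.3.1.
import colossalai
from mmengine.utils import digit_version

if digit_version(colossalai.__version__) <= digit_version('0.3.1'):
    raise RuntimeError(
        f'ColossalAI {colossalai.__version__} may hit the blocking bug in '
        'v0.3.1; please install a newer release or the main branch.')
```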

```{note}
Note that if your PyTorch version is higher than 2.0 and you encounter a compilation error such as `nvcc fatal : Unsupported gpu architecture 'compute_90'`, you need to git clone the source code, apply the modifications in this [PR](https://github.com/hpcaitech/ColossalAI/pull/4357), and then proceed with the installation.
```

```bash
pip install git+https://github.com/hpcaitech/ColossalAI
```

If the latest version of ColossalAI is higher than v0.3.1, you can install it directly with pip:

```bash
pip install colossalai
```

After installing ColossalAI, configure the `strategy` and `optim_wrapper` parameters of FlexibleRunner:

- strategy: Set `type='ColossalAIStrategy'` and configure the parameters. See [ColossalAIStrategy](mmengine._strategy.ColossalAIStrategy) for a detailed description of the parameters.
- optim_wrapper: Omit the `type` field (it defaults to `ColossalAIOpitmWrapper`) or set it explicitly; `HybridAdam` is the recommended optimizer type. Other configurable options are listed in [ColossalAIOptimWrapper](mmengine._strategy.ColossalAIOptimWrapper). A variant that sets the wrapper type explicitly is sketched after the configuration example below.

Here is an example configuration related to ColossalAI:

```python
from mmengine.runner._flexible_runner import FlexibleRunner

strategy = dict(type='ColossalAIStrategy')
optim_wrapper = dict(optimizer=dict(type='HybridAdam', lr=1e-3))

# Initialize FlexibleRunner
runner = FlexibleRunner(
    model=MMResNet50(),
    work_dir='./work_dirs',
    strategy=strategy,
    train_dataloader=train_dataloader,
    optim_wrapper=optim_wrapper,
    param_scheduler=dict(type='LinearLR'),
    train_cfg=dict(by_epoch=True, max_epochs=10, val_interval=1),
    val_dataloader=val_dataloader,
    val_cfg=dict(),
    val_evaluator=dict(type=Accuracy))

# Start training
runner.train()
```
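
If you prefer to spell out the wrapper type rather than rely on the default, only the `optim_wrapper` field changes. The sketch below assumes the wrapper is registered under its class name:

```python
# Equivalent optim_wrapper config with the wrapper type written out explicitly
# (a sketch; omitting `type` already resolves to `ColossalAIOpitmWrapper`).
optim_wrapper = dict(
    type='ColossalAIOpitmWrapper',
    optimizer=dict(type='HybridAdam', lr=1e-3))
```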

Launch distributed training with two GPUs:

```bash
torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.py --use-colossalai
```

<details>
<summary>Training Logs</summary>

```
08/18 11:56:34 - mmengine - INFO - Epoch(train) [1][ 10/196] lr: 3.3333e-04 eta: 0:10:31 time: 0.3238 data_time: 0.0344 memory: 597 loss: 3.8766
08/18 11:56:35 - mmengine - INFO - Epoch(train) [1][ 20/196] lr: 3.3333e-04 eta: 0:06:56 time: 0.1057 data_time: 0.0338 memory: 597 loss: 2.3797
08/18 11:56:36 - mmengine - INFO - Epoch(train) [1][ 30/196] lr: 3.3333e-04 eta: 0:05:45 time: 0.1068 data_time: 0.0342 memory: 597 loss: 2.3219
08/18 11:56:37 - mmengine - INFO - Epoch(train) [1][ 40/196] lr: 3.3333e-04 eta: 0:05:08 time: 0.1059 data_time: 0.0337 memory: 597 loss: 2.2641
08/18 11:56:38 - mmengine - INFO - Epoch(train) [1][ 50/196] lr: 3.3333e-04 eta: 0:04:45 time: 0.1062 data_time: 0.0338 memory: 597 loss: 2.2250
08/18 11:56:40 - mmengine - INFO - Epoch(train) [1][ 60/196] lr: 3.3333e-04 eta: 0:04:31 time: 0.1097 data_time: 0.0339 memory: 597 loss: 2.1672
08/18 11:56:41 - mmengine - INFO - Epoch(train) [1][ 70/196] lr: 3.3333e-04 eta: 0:04:21 time: 0.1096 data_time: 0.0340 memory: 597 loss: 2.1688
08/18 11:56:42 - mmengine - INFO - Epoch(train) [1][ 80/196] lr: 3.3333e-04 eta: 0:04:13 time: 0.1098 data_time: 0.0338 memory: 597 loss: 2.1781
08/18 11:56:43 - mmengine - INFO - Epoch(train) [1][ 90/196] lr: 3.3333e-04 eta: 0:04:06 time: 0.1097 data_time: 0.0338 memory: 597 loss: 2.0938
08/18 11:56:44 - mmengine - INFO - Epoch(train) [1][100/196] lr: 3.3333e-04 eta: 0:04:01 time: 0.1097 data_time: 0.0339 memory: 597 loss: 2.1078
08/18 11:56:45 - mmengine - INFO - Epoch(train) [1][110/196] lr: 3.3333e-04 eta: 0:04:01 time: 0.1395 data_time: 0.0340 memory: 597 loss: 2.0141
08/18 11:56:46 - mmengine - INFO - Epoch(train) [1][120/196] lr: 3.3333e-04 eta: 0:03:56 time: 0.1090 data_time: 0.0338 memory: 597 loss: 2.0273
08/18 11:56:48 - mmengine - INFO - Epoch(train) [1][130/196] lr: 3.3333e-04 eta: 0:03:52 time: 0.1096 data_time: 0.0339 memory: 597 loss: 2.0086
08/18 11:56:49 - mmengine - INFO - Epoch(train) [1][140/196] lr: 3.3333e-04 eta: 0:03:49 time: 0.1096 data_time: 0.0339 memory: 597 loss: 1.9180
08/18 11:56:50 - mmengine - INFO - Epoch(train) [1][150/196] lr: 3.3333e-04 eta: 0:03:46 time: 0.1092 data_time: 0.0339 memory: 597 loss: 1.9578
08/18 11:56:51 - mmengine - INFO - Epoch(train) [1][160/196] lr: 3.3333e-04 eta: 0:03:43 time: 0.1097 data_time: 0.0339 memory: 597 loss: 1.9375
08/18 11:56:52 - mmengine - INFO - Epoch(train) [1][170/196] lr: 3.3333e-04 eta: 0:03:40 time: 0.1092 data_time: 0.0339 memory: 597 loss: 1.9312
08/18 11:56:53 - mmengine - INFO - Epoch(train) [1][180/196] lr: 3.3333e-04 eta: 0:03:37 time: 0.1070 data_time: 0.0339 memory: 597 loss: 1.9078
```

</details>
21 changes: 21 additions & 0 deletions examples/distributed_training_with_flexible_runner.py
@@ -1,6 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
import argparse

import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
@@ -44,6 +45,7 @@ def parse_args():
    parser.add_argument('--local_rank', '--local-rank', type=int, default=0)
    parser.add_argument('--use-fsdp', action='store_true')
    parser.add_argument('--use-deepspeed', action='store_true')
    parser.add_argument('--use-colossalai', action='store_true')
    args = parser.parse_args()
    return args

@@ -116,6 +118,25 @@ def main():
            model_wrapper=dict(auto_wrap_policy=size_based_auto_wrap_policy))
        optim_wrapper = dict(
            type='AmpOptimWrapper', optimizer=dict(type='AdamW', lr=1e-3))
    elif args.use_colossalai:
        from colossalai.tensor.op_wrapper import colo_op_impl

        # ColossalAI overrides some torch ops with custom implementations to
        # make them compatible with `ColoTensor`. However, a backward error is
        # more likely to happen if the model contains inplace operations.
        # For example, a `conv` + `bn` + `relu` block is fine with an inplace
        # `relu`, since the PyTorch builtin op `batch_norm` can handle it.
        # However, if `relu` is an inplace op while `batch_norm` is a custom
        # op, an error will be raised because PyTorch assumes the custom op
        # cannot handle the backward-graph modification caused by the inplace
        # op.
        # In this example, the inplace op `add_` in ResNet could raise such an
        # error, since PyTorch assumes the custom op before it cannot handle
        # the backward-graph modification.
        colo_op_impl(torch.Tensor.add_)(torch.add)
        strategy = dict(type='ColossalAIStrategy')
        optim_wrapper = dict(optimizer=dict(type='HybridAdam', lr=1e-3))
    else:
        strategy = None
        optim_wrapper = dict(
4 changes: 3 additions & 1 deletion mmengine/_strategy/__init__.py
@@ -2,12 +2,14 @@
from mmengine.utils import digit_version
from mmengine.utils.dl_utils import TORCH_VERSION
from .base import BaseStrategy
from .colossalai import ColossalAIStrategy
from .deepspeed import DeepSpeedStrategy
from .distributed import DDPStrategy
from .single_device import SingleDeviceStrategy

__all__ = [
    'BaseStrategy', 'DDPStrategy', 'SingleDeviceStrategy', 'DeepSpeedStrategy'
    'BaseStrategy', 'DDPStrategy', 'SingleDeviceStrategy', 'DeepSpeedStrategy',
    'ColossalAIStrategy'
]

if digit_version(TORCH_VERSION) >= digit_version('2.0.0'):
Expand Down