
[FEATURE] Does this project support gradient checkpointing? #117

Closed
feifeibear opened this issue Jan 5, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@feifeibear
Contributor

Activation checkpointing (a.k.a. gradient checkpointing in PyTorch, https://pytorch.org/docs/stable/checkpoint.html) is an effective technique (from my perspective, maybe the most effective one) for scaling up models. It saves most of the activation memory footprint at the cost of recomputation. However, I did not see the technique applied in Colossal-AI.
I believe it is a model-specific optimization and should not be part of the core functionality of Colossal-AI, but it would be worth adding to the example or benchmark scripts.

See the Hugging Face GPT-2 implementation for more details:

https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt2/modeling_gpt2.py#L865
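
For reference, a minimal sketch of how the plain PyTorch API linked above is typically used (the Block module, dimensions, and tensor shapes are only illustrative):

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    # A toy feed-forward block; its intermediate activations are
    # recomputed during the backward pass when checkpointed.
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(x)

block = Block(128)
x = torch.randn(4, 16, 128, requires_grad=True)
# checkpoint() discards the block's intermediate activations in the
# forward pass and recomputes them during backward, trading compute for memory.
y = checkpoint(block, x)
y.sum().backward()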

@feifeibear feifeibear added the enhancement New feature or request label Jan 5, 2022
@kurisusnowdeng
Member

kurisusnowdeng commented Jan 5, 2022

Hi. Thanks for your suggestion! We now support this feature to some degree. You can define a checkpointable module as follows:

from colossalai.nn.layer import CheckpointModule

class Net(CheckpointModule):
    def __init__(self, checkpoint=False, *args, **kwargs):
        # checkpoint=True enables activation checkpointing for this module
        super().__init__(checkpoint=checkpoint)
        ...
    def _forward(self, *args, **kwargs):
        # Define the forward pass here; it will be recomputed during
        # the backward pass when checkpoint=True
        ...

Then initialize the module as net = Net(checkpoint=True) to enable activation checkpointing.
We already take advantage of this approach to reduce GPU memory usage in our GPT benchmark.
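
For illustration, a hypothetical concrete module following the pattern above (the MLP name, layer sizes, and input shape are made up; only CheckpointModule, the checkpoint argument, and _forward come from the snippet above):

import torch
from torch import nn
from colossalai.nn.layer import CheckpointModule

class MLP(CheckpointModule):
    def __init__(self, dim, hidden_dim, checkpoint=False):
        super().__init__(checkpoint=checkpoint)
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)

    def _forward(self, x):
        # Forward logic goes in _forward; the base class decides whether
        # to run it directly or under activation checkpointing.
        return self.fc2(torch.relu(self.fc1(x)))

net = MLP(128, 512, checkpoint=True)  # activations recomputed in backward
out = net(torch.randn(4, 128))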
