
[FEATURE] Does this project support gradient checkpointing? #117

Closed
feifeibear opened this issue Jan 5, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@feifeibear
Contributor

Activation checkpointing (a.k.a. gradient checkpointing in PyTorch, https://pytorch.org/docs/stable/checkpoint.html) is an effective technique (from my perspective, maybe the most effective one) for scaling up models. It saves most of the activation memory footprint at the cost of recomputation. However, I did not see the technique applied in Colossal-AI.
I believe it is a model-specific optimization and should not be part of the core functionality of Colossal-AI, but it would be worth adding to the example or benchmark scripts.

See the Hugging Face GPT-2 implementation for more details:

https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt2/modeling_gpt2.py#L865
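
For reference, a minimal sketch of how the plain PyTorch API linked above is typically used (the Block module, dimensions, and tensor shapes are only illustrative):

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    # A toy feed-forward block; its intermediate activations are
    # recomputed during the backward pass when checkpointed.
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(x)

block = Block(128)
x = torch.randn(4, 16, 128, requires_grad=True)
# checkpoint() discards the block's intermediate activations in the
# forward pass and recomputes them during backward, trading compute for memory.
y = checkpoint(block, x)
y.sum().backward()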

@feifeibear feifeibear added the enhancement New feature or request label Jan 5, 2022
@kurisusnowdeng
Member

kurisusnowdeng commented Jan 5, 2022

Hi. Thanks for your suggestion! We now support this feature to some degree. You can define a checkpointable module as follows:

from colossalai.nn.layer import CheckpointModule

class Net(CheckpointModule):
    def __init__(self, checkpoint=False, *args, **kwargs):
        # checkpoint=True enables activation checkpointing for this module
        super().__init__(checkpoint=checkpoint)
        ...
    def _forward(self, *args, **kwargs):
        # Define the forward pass here; it will be recomputed during
        # the backward pass when checkpoint=True
        ...

Then initialize the module as net = Net(checkpoint=True) to enable activation checkpointing.
We already take advantage of this approach to reduce GPU memory usage in our GPT benchmark.
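
For illustration, a hypothetical concrete module following the pattern above (the MLP name, layer sizes, and input shape are made up; only CheckpointModule, the checkpoint argument, and _forward come from the snippet above):

import torch
from torch import nn
from colossalai.nn.layer import CheckpointModule

class MLP(CheckpointModule):
    def __init__(self, dim, hidden_dim, checkpoint=False):
        super().__init__(checkpoint=checkpoint)
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)

    def _forward(self, x):
        # Forward logic goes in _forward; the base class decides whether
        # to run it directly or under activation checkpointing.
        return self.fc2(torch.relu(self.fc1(x)))

net = MLP(128, 512, checkpoint=True)  # activations recomputed in backward
out = net(torch.randn(4, 128))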
