Activation checkpointing (a.k.a. gradient checkpointing in PyTorch, https://pytorch.org/docs/stable/checkpoint.html) is an effective technique (from my perspective, perhaps the most effective one) for scaling up models: it saves activation memory at the cost of recomputation during the backward pass. However, I did not see this technique applied anywhere in Colossal-AI?
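For reference, a minimal sketch of the technique in plain PyTorch using torch.utils.checkpoint.checkpoint_sequential (the model and sizes here are made up for illustration):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stack of layers. With 4 segments, only the activations at segment
# boundaries are kept; everything inside a segment is recomputed in backward.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])
x = torch.randn(16, 1024, requires_grad=True)
out = checkpoint_sequential(model, 4, x)
out.sum().backward()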
I believe it is a model-specific optimization and should not be part of the core functionality of Colossal-AI, but you could add it to the example or benchmark scripts.
See the Hugging Face GPT-2 implementation for more details: https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt2/modeling_gpt2.py#L865
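The pattern at that link checkpoints each transformer block in the forward loop. A simplified, self-contained sketch of the same idea (this toy model is illustrative, not the Hugging Face code):

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyTransformer(nn.Module):
    """Toy stand-in for GPT-2's stack of blocks (self.h in the HF code)."""
    def __init__(self, dim=256, depth=4, gradient_checkpointing=True):
        super().__init__()
        self.h = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )
        self.gradient_checkpointing = gradient_checkpointing

    def forward(self, hidden_states):
        for block in self.h:
            if self.gradient_checkpointing and self.training:
                # Each block's activations are recomputed in backward
                # instead of being kept alive for the whole forward pass.
                hidden_states = checkpoint(block, hidden_states)
            else:
                hidden_states = block(hidden_states)
        return hidden_states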
Hi. Thanks for your suggestion! We already support this feature to some degree. You can define a checkpointable module as follows:
from colossalai.nn.layer import CheckpointModule

class Net(CheckpointModule):
    def __init__(self, checkpoint=False, *args, **kwargs):
        super().__init__(checkpoint=checkpoint)
        # Define submodules here
        ...

    def _forward(self, *args, **kwargs):
        # Define the forward pass here; when checkpoint=True it runs
        # under activation checkpointing instead of the plain forward
        ...
Then initialize the module as net = Net(checkpoint=True) to enable activation checkpointing.
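For instance, a minimal concrete subclass (the layer names and sizes are illustrative, not part of the Colossal-AI API; this assumes CheckpointModule wraps _forward with checkpointing when enabled):

import torch
import torch.nn as nn
from colossalai.nn.layer import CheckpointModule

class MLPBlock(CheckpointModule):
    def __init__(self, dim=1024, checkpoint=False):
        super().__init__(checkpoint=checkpoint)
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.fc2 = nn.Linear(4 * dim, dim)

    def _forward(self, x):
        # With checkpoint=True, the intermediate 4*dim activation is
        # recomputed in backward instead of being kept in memory.
        return self.fc2(torch.relu(self.fc1(x)))

net = MLPBlock(checkpoint=True)  # enable activation checkpointing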
We already take advantage of this approach to reduce GPU memory usage in our GPT benchmark.