When training large models, we often want to activation checkpoint something smaller than the FSDP wrap module. For example, we might want to activation checkpoint only the attention module inside a transformer block.
Unfortunately, when calling get_state_dict from the new distributed checkpoint interface (torch.distributed.checkpoint.state_dict), the _CHECKPOINT_PREFIX added by checkpoint_wrapper is not properly stripped when only submodules are activation checkpointed.
We therefore have to monkeypatch torch so that this prefix is always stripped.
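Below is a minimal sketch of what such a patch could look like, assuming the public get_state_dict entry point is wrapped and its returned model keys rewritten. The wrapper name is hypothetical and the actual patch point in this repo (or the right hook for a given PyTorch version) may be a lower-level helper; the same rewrite could also be applied to get_model_state_dict if that is called directly.

```python
# Illustrative monkeypatch (hypothetical wrapper): post-process the state dict
# returned by get_state_dict so the checkpoint-wrapper prefix is always removed,
# even when only a submodule of the FSDP wrap module is activation checkpointed.
import torch.distributed.checkpoint.state_dict as dcp_state_dict
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    _CHECKPOINT_PREFIX,  # "_checkpoint_wrapped_module."
)

_original_get_state_dict = dcp_state_dict.get_state_dict


def _get_state_dict_stripped(model, optimizers, *args, **kwargs):
    model_sd, optim_sd = _original_get_state_dict(model, optimizers, *args, **kwargs)
    # The prefix can appear mid-key (e.g.
    # "blocks.0._checkpoint_wrapped_module.attn.in_proj_weight") when only a
    # submodule is checkpointed, so use str.replace rather than stripping a
    # leading prefix.
    model_sd = {k.replace(_CHECKPOINT_PREFIX, ""): v for k, v in model_sd.items()}
    return model_sd, optim_sd


dcp_state_dict.get_state_dict = _get_state_dict_stripped
```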