When training large models, we often want to activation checkpoint something smaller than the FSDP wrap module. For example, we might want to activation checkpoint only the attention module inside a transformer block.
Unfortunately, when calling get_state_dict from the new distributed checkpoint interface (torch.distributed.checkpoint.state_dict), the _CHECKPOINT_PREFIX added by checkpoint_wrapper is not properly stripped when only submodules are activation checkpointed.
We therefore have to monkeypatch torch so that this prefix is always stripped.
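Below is a minimal sketch of what such a patch could look like, assuming the public get_state_dict entry point is wrapped and its returned model keys rewritten. The wrapper name is hypothetical and the actual patch point in this repo (or the right hook for a given PyTorch version) may be a lower-level helper; the same rewrite could also be applied to get_model_state_dict if that is called directly.

```python
# Illustrative monkeypatch (hypothetical wrapper): post-process the state dict
# returned by get_state_dict so the checkpoint-wrapper prefix is always removed,
# even when only a submodule of the FSDP wrap module is activation checkpointed.
import torch.distributed.checkpoint.state_dict as dcp_state_dict
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    _CHECKPOINT_PREFIX,  # "_checkpoint_wrapped_module."
)

_original_get_state_dict = dcp_state_dict.get_state_dict


def _get_state_dict_stripped(model, optimizers, *args, **kwargs):
    model_sd, optim_sd = _original_get_state_dict(model, optimizers, *args, **kwargs)
    # The prefix can appear mid-key (e.g.
    # "blocks.0._checkpoint_wrapped_module.attn.in_proj_weight") when only a
    # submodule is checkpointed, so use str.replace rather than stripping a
    # leading prefix.
    model_sd = {k.replace(_CHECKPOINT_PREFIX, ""): v for k, v in model_sd.items()}
    return model_sd, optim_sd


dcp_state_dict.get_state_dict = _get_state_dict_stripped
```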