Saved tensor hooks checkpoint implementation cannot robustly clear storage #82482

@rohan-varma


🐛 Describe the bug

In the saved tensor hooks based checkpointing approach (backed by `storage: Dict[int, Optional[torch.Tensor]] = {}` in torch/utils/checkpoint.py), when autograd needs to unpack an activation, it potentially re-runs the forward to recompute all activations, and then returns the activation for the index it is unpacking.
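
For reference, here is a minimal sketch of that mechanism (illustrative only, not the actual `torch.utils.checkpoint` code; `checkpoint_sketch` is a made-up helper, and the real implementation additionally handles RNG state, kwargs, etc.):

```python
import torch
from typing import Dict, Optional

def checkpoint_sketch(function, *args):
    # Backing store for recomputed activations, keyed by pack order.
    storage: Dict[int, Optional[torch.Tensor]] = {}
    counter = 0

    def pack(x):
        # Drop the activation and remember only an index into `storage`.
        nonlocal counter
        counter += 1
        return counter - 1

    def unpack(idx):
        if len(storage) == 0:
            # Re-run the forward to recompute all activations, capturing every
            # tensor autograd saves into `storage` (in the same order as the
            # original forward).
            inner_counter = 0

            def inner_pack(inner):
                nonlocal inner_counter
                storage[inner_counter] = inner
                inner_counter += 1
                return None

            def inner_unpack(packed):
                raise RuntimeError("unexpected unpack during recomputation")

            with torch.enable_grad(), torch.autograd.graph.saved_tensors_hooks(
                inner_pack, inner_unpack
            ):
                function(*args)

        # pop() so no reference is held once backward is done; this is the call
        # discussed below.
        return storage.pop(idx)

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        return function(*args)
```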

However, unpack currently returns the activation via `storage.pop()` to ensure we don't hold a reference to the tensor after the backward is over. This raises the issue that if the same tensor is unpacked twice, without a pack in between, we'll run into an error. A (silly) example repro is here:

`def test_checkpointing_without_reentrant_custom_function_raises(self):`
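
For illustration, here is a rough approximation of that kind of repro (not the exact test; the precise error type and message depend on the PyTorch version). A custom `autograd.Function` whose backward reads `ctx.saved_tensors` twice triggers two unpacks of the same saved tensors without a recomputation in between:

```python
import torch
from torch.utils.checkpoint import checkpoint

class DoubleAccess(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return x * y

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors   # first unpack: pops the stored entries
        x, y = ctx.saved_tensors   # second unpack, no pack/recompute in between
        return grad_out * y, grad_out * x

def fn(inp):
    a = inp.sin()                  # sin() also saves a tensor, keeping storage non-empty
    return DoubleAccess.apply(a, a)

inp = torch.randn(4, requires_grad=True)
out = checkpoint(fn, inp, use_reentrant=False)
out.sum().backward()               # errors on the second ctx.saved_tensors access
```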

Another concern is that a tensor could be packed by autograd but never unpacked, leading to `storage` leaking, although we are unsure whether this can occur in practice. A hypothetical scenario is sketched below.
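
One hypothetical scenario (purely illustrative, and not confirmed to matter in practice): if backward only traverses part of the graph produced inside the checkpointed region, the recomputation repopulates `storage` with every saved tensor, but the entries belonging to the untraversed branch are never popped:

```python
import torch
from torch.utils.checkpoint import checkpoint

def fn(x):
    a = x.sin()   # sin() saves its input for backward
    b = x.exp()   # exp() saves its output for backward
    return a, b

x = torch.randn(8, requires_grad=True)
a, b = checkpoint(fn, x, use_reentrant=False)

# Backward only through `a`: the first unpack triggers a recomputation that
# fills `storage` with the saved tensors of *both* branches, but the tensor
# saved by exp() is never unpacked, so under the storage-dict approach it
# stays referenced for as long as `b`'s graph is alive.
torch.autograd.grad(a.sum(), x)
```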

Versions

main

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7


Labels

module: activation checkpointing (Related to activation checkpointing)
module: autograd (Related to torch.autograd, and the autograd engine in general)
needs design (We want to add this feature but we need to figure out how first)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
