Labels: module: activation checkpointing · module: autograd · needs design · triaged
🐛 Describe the bug
In the saved-tensor-hooks-based checkpointing approach, saved tensors are managed through a storage dict (pytorch/torch/utils/checkpoint.py, line 349 in 386b398):

    storage: Dict[int, Optional[torch.Tensor]] = {}

To ensure we don't hold references to a tensor after the backward is over, we currently do a storage.pop() when the tensor is unpacked. This raises the issue that if the same tensor is unpacked twice, without a pack in between, we'll run into an error. A (silly) example repro is here (line 4598 in 386b398):

    def test_checkpointing_without_reentrant_custom_function_raises(self):
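For intuition, here is a minimal, self-contained sketch of the pack/unpack pattern described above, written against the public torch.autograd.graph.saved_tensors_hooks API. This is not PyTorch's actual checkpoint implementation; the storage dict, the pack/unpack helpers, and the double-backward trigger are all illustrative:

```python
import torch
from typing import Dict, Optional

# Illustrative storage dict mirroring the one in checkpoint.py: integer
# handles map to saved tensors, and unpack pops the entry so no reference
# outlives the backward pass.
storage: Dict[int, Optional[torch.Tensor]] = {}
counter = 0

def pack(x: torch.Tensor) -> int:
    global counter
    handle = counter
    counter += 1
    storage[handle] = x
    return handle

def unpack(handle: int) -> torch.Tensor:
    # pop() drops the reference after the first unpack; a second unpack of
    # the same handle, with no pack in between, raises KeyError.
    return storage.pop(handle)

a = torch.randn(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    b = a.sin().exp()  # sin/exp save tensors for backward, triggering pack

b.sum().backward(retain_graph=True)  # first backward: unpack succeeds
try:
    b.sum().backward()  # second backward: unpack runs again on the same handle
except KeyError:
    print("double unpack without a pack in between raised KeyError")
```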
Another concern is that a tensor could be packed by autograd but never unpacked, leading to the storage leaking, although we are unsure whether this can occur in practice.
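A hedged sketch of that second concern, again with illustrative pack/unpack hooks rather than the real checkpoint code: if the graph is freed without backward ever running, unpack never fires and the dict entry survives:

```python
import itertools
import torch

# Illustrative storage and handle counter (not the real checkpoint internals).
storage = {}
handles = itertools.count()

def pack(x):
    handle = next(handles)
    storage[handle] = x
    return handle

def unpack(handle):
    return storage.pop(handle)

a = torch.randn(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    b = a.sin()  # sin saves a tensor for backward -> pack runs

del b  # graph is freed without running backward, so unpack never runs
print(len(storage))  # 1 -> the packed tensor is still referenced (the "leak")
```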
Versions
main
cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7