New utils.checkpoint rollout. #65537

@albanD

Thanks to the work done on SavedVariable hooks, we can now have a new implementation of checkpoint that does not require re-entrant autograd, which makes it more composable with systems that don't support re-entrant autograd (most notably DDP).

We have a full-fledged prototype with tests in #62964 that implements the core logic and ensures it works as expected.

The API we want to introduce here is a use_reentrant: bool argument to checkpoint and checkpoint_sequential that controls which version of checkpoint is used.
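As a rough illustration of the proposed flag, the snippet below calls checkpoint once per implementation and compares the resulting gradients. It assumes a PyTorch build in which use_reentrant has already landed on torch.utils.checkpoint.checkpoint; the block function and tensor shapes are made up for the example.

```python
import torch
from torch.utils.checkpoint import checkpoint


def block(x):
    # Hypothetical workload: any differentiable function works here.
    # Its intermediate activations are recomputed during backward.
    return torch.relu(x * 2.0).sum()


x = torch.randn(4, requires_grad=True)

# Current behavior: checkpoint built on re-entrant autograd.
loss_reentrant = checkpoint(block, x, use_reentrant=True)
loss_reentrant.backward()
grad_reentrant = x.grad.clone()

# New behavior: checkpoint built on SavedVariable hooks.
x.grad = None
loss_new = checkpoint(block, x, use_reentrant=False)
loss_new.backward()
grad_new = x.grad.clone()
```

Both calls should produce the same loss and gradients; the flag only selects how the recomputation is wired into autograd.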

Due to backward-compatibility (BC) concerns, we won't be able to just change the default to the new version, so we should proceed as follows (one PyTorch release per bullet point):

  • Add the new boolean flag that defaults to True (current behavior) and deprecate not passing a value for it
  • Remove the default value for the flag
  • Add a default value of False for the flag
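The three stages above can be sketched in plain Python. This is only an illustration of the signature changes, not the actual PyTorch code: the checkpoint_stage1/2/3 names, the _UNSET sentinel, the _dispatch helper, and the warning text are all hypothetical.

```python
import warnings

# Sentinel that distinguishes "flag not passed" from an explicit True/False.
_UNSET = object()


def _dispatch(function, args, use_reentrant):
    # Stand-in for the real implementations: reports which one would run
    # and simply calls the function.
    impl = "reentrant" if use_reentrant else "non_reentrant"
    return impl, function(*args)


def checkpoint_stage1(function, *args, use_reentrant=_UNSET):
    """Stage 1: behaves as if the default were True, but warns when the flag is omitted."""
    if use_reentrant is _UNSET:
        warnings.warn(
            "Please pass use_reentrant explicitly; omitting it is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        use_reentrant = True
    return _dispatch(function, args, use_reentrant)


def checkpoint_stage2(function, *args, use_reentrant):
    """Stage 2: no default; omitting the flag raises TypeError."""
    return _dispatch(function, args, use_reentrant)


def checkpoint_stage3(function, *args, use_reentrant=False):
    """Stage 3: the new implementation becomes the default."""
    return _dispatch(function, args, use_reentrant)
```

Callers who pass use_reentrant explicitly from stage 1 onward see no behavior change at any stage, which is the point of spreading the change over several releases.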

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @albanD @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7

Labels

actionable, better-engineering, high priority, module: autograd, triaged
