Summary:
Pull Request resolved: #626

# Context
We want to add another checkpointer using [DCP](https://pytorch.org/docs/stable/distributed.checkpoint.html). However, we don't want to duplicate the logic that already exists in `TorchSnapshotSaver` for checkpoint frequency, keeping the k latest checkpoints, etc.

# This Diff
* Adds an abstract `BaseCheckpointer` class that implements common logic, such as syncing dirpaths across all ranks and implementing all hooks where a checkpoint may occur.
* Any subclass must implement the `_checkpoint_impl` and `restore` functions. The `restore_from_latest` method calls the user-defined `restore`.
* Copies all applicable tests from `TorchSnapshotSaver` into `BaseCheckpointer`'s tests (the relevant `TorchSnapshotSaver` tests will be removed in the next diff).

Reviewed By: galrotem

Differential Revision: D51328340

fbshipit-source-id: b7bc65c294fabf5d3671735dc1afcf54c1c59a1b
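The split described above (shared scheduling and retention logic in an abstract base, storage format in subclasses) is a template-method pattern. Below is a minimal, single-process sketch of that pattern. It is not the actual torchtnt API: `BaseCheckpointerSketch`, `JsonCheckpointer`, `on_train_step_end`, `save_every_n_steps`, `keep_last_n_checkpoints`, the `step_<n>` directory naming, and all signatures are hypothetical simplifications, and the cross-rank dirpath syncing is omitted entirely.

```python
import abc
import json
import os
import re
import shutil
import tempfile
from typing import List, Optional


class BaseCheckpointerSketch(abc.ABC):
    """Hypothetical stand-in for BaseCheckpointer (not the real torchtnt class)."""

    def __init__(
        self,
        dirpath: str,
        save_every_n_steps: int,
        keep_last_n_checkpoints: Optional[int] = None,
    ) -> None:
        self._dirpath = dirpath
        self._save_every_n_steps = save_every_n_steps
        self._keep_last_n = keep_last_n_checkpoints
        self._ckpt_paths: List[str] = []  # oldest first

    def on_train_step_end(self, step: int, state: dict) -> None:
        # Shared frequency logic: subclasses never reimplement this.
        if step % self._save_every_n_steps != 0:
            return
        path = os.path.join(self._dirpath, f"step_{step}")
        self._checkpoint_impl(state, path)  # subclass decides the storage format
        self._ckpt_paths.append(path)
        # Shared retention logic: delete the oldest checkpoints beyond the limit.
        if self._keep_last_n is not None:
            while len(self._ckpt_paths) > self._keep_last_n:
                shutil.rmtree(self._ckpt_paths.pop(0), ignore_errors=True)

    @classmethod
    def restore_from_latest(cls, dirpath: str) -> Optional[dict]:
        # Find the checkpoint with the highest step, then delegate to restore().
        steps = [
            int(m.group(1))
            for name in os.listdir(dirpath)
            if (m := re.fullmatch(r"step_(\d+)", name))
        ]
        if not steps:
            return None
        return cls.restore(os.path.join(dirpath, f"step_{max(steps)}"))

    @abc.abstractmethod
    def _checkpoint_impl(self, state: dict, path: str) -> None:
        """Write `state` to `path`; must be implemented by subclasses."""

    @staticmethod
    @abc.abstractmethod
    def restore(path: str) -> dict:
        """Load state from `path`; must be implemented by subclasses."""


class JsonCheckpointer(BaseCheckpointerSketch):
    """Toy subclass: only the storage format lives here."""

    def _checkpoint_impl(self, state: dict, path: str) -> None:
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "state.json"), "w") as f:
            json.dump(state, f)

    @staticmethod
    def restore(path: str) -> dict:
        with open(os.path.join(path, "state.json")) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = JsonCheckpointer(tmp, save_every_n_steps=2, keep_last_n_checkpoints=2)
        for step in range(1, 9):
            ckpt.on_train_step_end(step, {"step": step})
        print(sorted(os.listdir(tmp)))  # ['step_6', 'step_8']
        print(JsonCheckpointer.restore_from_latest(tmp))  # {'step': 8}
```

In this sketch, a DCP-backed checkpointer would only need to supply its own `_checkpoint_impl` and `restore`; the save cadence, the k-latest pruning, and the "find the newest checkpoint" step are inherited unchanged.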