[booster] Implement CheckpointIO for Native PyTorch #3053

Closed
FrankLeeeee opened this issue Mar 8, 2023 · 0 comments · Fixed by #3205
Labels: API (related to API changes), enhancement (New feature or request)


Overview

CheckpointIO handles the Booster.save and Booster.load logic so that models can be saved, resumed, and loaded. Note that CheckpointIO is usually paired with a Plugin, since a Plugin may require a specific saving/loading strategy. However, we should provide general implementations for a normal PyTorch model and a DTensor-based model. As DTensor is still under development, we should focus on the native PyTorch implementation first.

Want to track the development progress? Take a look at:

proposal: #3046
project kanban: https://github.com/orgs/hpcaitech/projects/19

Goal

The CheckpointIO should allow the user to save/load the native PyTorch model/optimizer/lr scheduler.
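To make the goal concrete, here is a minimal sketch of what saving and loading could look like for native PyTorch objects. It is not the implementation merged for this issue: the class name `NativeCheckpointIO`, its `save`/`load` methods, and the helpers `build` and `roundtrip_demo` are hypothetical names chosen for illustration. The sketch relies only on the fact that `nn.Module`, `torch.optim.Optimizer`, and the built-in LR schedulers all expose `state_dict()` and `load_state_dict()`.

```python
import os
import tempfile

import torch
import torch.nn as nn


class NativeCheckpointIO:
    """Hypothetical sketch, not the actual Booster API."""

    def save(self, obj, path):
        # Works for anything exposing state_dict(): model, optimizer, scheduler.
        torch.save(obj.state_dict(), path)

    def load(self, obj, path):
        # Read the saved state onto CPU, then copy it into the live object.
        state = torch.load(path, map_location="cpu")
        obj.load_state_dict(state)
        return obj


def build():
    # A toy model/optimizer/scheduler triple used only for the demo.
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    return model, optimizer, scheduler


def roundtrip_demo():
    ckpt = NativeCheckpointIO()
    model, optimizer, scheduler = build()

    # One training step so the optimizer and scheduler hold non-trivial state.
    loss = model(torch.randn(3, 4)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()

    names = ("model", "optimizer", "lr_scheduler")
    with tempfile.TemporaryDirectory() as tmp:
        for name, obj in zip(names, (model, optimizer, scheduler)):
            ckpt.save(obj, os.path.join(tmp, f"{name}.pt"))

        # Restore into freshly constructed objects, as on resume.
        restored = build()
        for name, obj in zip(names, restored):
            ckpt.load(obj, os.path.join(tmp, f"{name}.pt"))

    return (model, optimizer, scheduler), restored
```

Saving each component to its own file keeps the three load paths independent; bundling them into a single dictionary and file would work equally well and is a design choice this sketch leaves open.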

FrankLeeeee self-assigned this Mar 8, 2023
FrankLeeeee added the enhancement (New feature or request) and API (related to API changes) labels Mar 8, 2023
Projects
Status: Done