[booster] Implement CheckpointIO for Native PyTorch #3053

FrankLeeeee · 2023-03-08T07:32:02Z

Overview

CheckpointIO implements the logic behind Booster.save and Booster.load so that models can be saved, loaded, and resumed. CheckpointIO is usually paired with a Plugin, because a Plugin may require a specific saving/loading strategy. However, we should also provide general implementations for a normal PyTorch model and for a DTensor-based model. As DTensor is still under development, we should focus on the native PyTorch implementation first.

Wanna track the development progress? Take a look at

proposal: #3046
project kanban: https://github.com/orgs/hpcaitech/projects/19

Goal

The CheckpointIO should allow the user to save/load the native PyTorch model/optimizer/lr scheduler.
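To make the goal concrete, here is a rough sketch of what saving and loading could look like for native PyTorch objects. It is not the implementation that was merged: the class name `NativeTorchCheckpointIO`, its `save`/`load` methods, and the `roundtrip_demo` helper are all hypothetical names chosen for illustration. The sketch only relies on the fact that `nn.Module`, optimizers, and LR schedulers all expose `state_dict()` and `load_state_dict()`.

```python
import os
import tempfile

import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR


class NativeTorchCheckpointIO:
    """Hypothetical sketch, not the actual ColossalAI class."""

    def save(self, obj, path: str) -> None:
        # Modules, optimizers and LR schedulers all provide state_dict().
        torch.save(obj.state_dict(), path)

    def load(self, obj, path: str) -> None:
        # Load onto CPU first; load_state_dict copies values into obj.
        state = torch.load(path, map_location="cpu")
        obj.load_state_dict(state)


def roundtrip_demo():
    """Save a model/optimizer/scheduler, then restore them into fresh objects."""
    model = nn.Linear(4, 2)
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = StepLR(optimizer, step_size=1, gamma=0.5)

    # One training step so the optimizer and scheduler hold non-trivial state.
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    scheduler.step()

    ckpt_io = NativeTorchCheckpointIO()
    new_model = nn.Linear(4, 2)
    new_optimizer = SGD(new_model.parameters(), lr=0.1, momentum=0.9)
    new_scheduler = StepLR(new_optimizer, step_size=1, gamma=0.5)

    with tempfile.TemporaryDirectory() as tmp:
        targets = [
            ("model.pt", model, new_model),
            ("optimizer.pt", optimizer, new_optimizer),
            ("lr_scheduler.pt", scheduler, new_scheduler),
        ]
        for name, src, _ in targets:
            ckpt_io.save(src, os.path.join(tmp, name))
        # Restore in the same order: model, then optimizer, then scheduler.
        for name, _, dst in targets:
            ckpt_io.load(dst, os.path.join(tmp, name))

    return model, new_model, optimizer, new_optimizer, scheduler, new_scheduler
```

Keeping one file per object is only one possible layout; a single dictionary holding all three state dicts would work just as well for the native PyTorch case.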

FrankLeeeee self-assigned this Mar 8, 2023

FrankLeeeee added the enhancement (New feature or request) and API (related to API changes) labels Mar 8, 2023

FrankLeeeee mentioned this issue Mar 22, 2023

[api] implemented the checkpoint io module #3205

Merged

10 tasks

ver217 closed this as completed in #3205 Mar 23, 2023
