Output a training checkpoint file with both the optimizer's and the model's parameters. #618

Open
junjihashimoto opened this issue Sep 1, 2021 · 5 comments


@junjihashimoto
Member

No description provided.

@tscholak
Member

tscholak commented Sep 1, 2021

They are typically saved in two different files.

@tscholak
Member

tscholak commented Sep 1, 2021

Add the RNG state to the list.

@junjihashimoto
Member Author

PyTorch saves the RNG state for the current device and for the devices of all CUDA tensors.
https://pytorch.org/docs/stable/checkpoint.html

In this recipe, all the states are put in one file.
https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
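
To make the single-file idea concrete, here is a minimal, framework-agnostic sketch using only the Python standard library. It is not the recipe's code: `save_checkpoint`, `load_checkpoint`, and the dictionary keys are hypothetical names, the model and optimizer states are plain Python values, and `pickle` plus the `random` module stand in for the framework's own serialization and RNG.

```python
import pickle
import random


def save_checkpoint(path, model_params, optimizer_state, epoch):
    """Write model, optimizer, and RNG state to a single file (hypothetical helper)."""
    checkpoint = {
        "epoch": epoch,
        "model": model_params,
        "optimizer": optimizer_state,
        # Captured so that a resumed run continues the same random stream.
        "rng": random.getstate(),
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)


def load_checkpoint(path):
    """Read a checkpoint back and restore the global RNG state (hypothetical helper)."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    random.setstate(checkpoint["rng"])
    return checkpoint
```

Keeping everything in one dictionary means a resumed run cannot accidentally pair a model file with an optimizer or RNG state from a different step.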

torchvision does not seem to write the model and optimizer states to two separate files. Instead, it writes a single combined checkpoint under two names, model_xx.pth and checkpoint.pth, where checkpoint.pth is always the latest one.
https://github.com/pytorch/vision/blob/main/references/detection/train.py#L209-L221
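
The naming scheme described here (one file per epoch plus a "latest" file) could be sketched roughly as follows. This is an illustration under assumptions, not torchvision's code: `save_epoch_checkpoint` is a hypothetical helper, and it uses `pickle` with a `.pkl` extension instead of `.pth` because the sketch does not depend on PyTorch.

```python
import os
import pickle


def save_epoch_checkpoint(output_dir, checkpoint, epoch):
    """Save one combined checkpoint under a per-epoch name and a 'latest' name."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    # The same dictionary is written twice: model_<epoch>.pkl keeps history,
    # while checkpoint.pkl is overwritten and always holds the latest state.
    for name in (f"model_{epoch}.pkl", "checkpoint.pkl"):
        path = os.path.join(output_dir, name)
        with open(path, "wb") as f:
            pickle.dump(checkpoint, f)
        paths.append(path)
    return paths
```

Resuming then only needs to look for the fixed name, while the per-epoch files remain available for comparing or rolling back.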

@junjihashimoto
Member Author

I just want a checkpoint function, so I think either one file or several files is fine.

@junjihashimoto
Member Author

@tscholak
Thank you for pointing it out.


2 participants