cfg.checkpoint_folder is not defined when using StateDictType.FULL_STATE_DICT #48

Closed
drcege opened this issue Jul 26, 2023 · 1 comment
drcege commented Jul 26, 2023

When fine-tuning with StateDictType.FULL_STATE_DICT, the program crashes when saving a checkpoint.

The error originates at this line:
https://github.com/facebookresearch/llama-recipes/blob/74bde65a62667a38ee0411676cf058c53f85771c/model_checkpointing/checkpoint_handler.py#L145

I know this can be easily solved by assigning cfg.checkpoint_folder some value, but I'm curious why another config was added instead of using cfg.dist_checkpoint_root_folder and cfg.dist_checkpoint_folder.

Besides, using two dist_ configs is also strange. Isn't one such config enough?
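
For illustration, here is a minimal sketch of the failure mode and the workaround described above. The config class, the helper function, and the default values are hypothetical placeholders, not the repository's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class FsdpConfigSketch:
    # Hypothetical stand-in for the training config: only the two dist_ fields exist.
    dist_checkpoint_root_folder: str = "model_checkpoints"
    dist_checkpoint_folder: str = "fine-tuned"


def full_state_save_dir(cfg) -> str:
    # Mimics the reported problem: reads a field the config never defines.
    return cfg.checkpoint_folder


cfg = FsdpConfigSketch()
try:
    full_state_save_dir(cfg)
    crashed = False
except AttributeError:
    # The missing attribute surfaces only when the save path is built.
    crashed = True

# Workaround from the comment above: assign the missing attribute some value.
cfg.checkpoint_folder = "model_checkpoints"
save_dir = full_state_save_dir(cfg)
```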

HamidShojanazeri (Contributor) commented Jul 26, 2023

Thanks @drcege for trying it out. It was mostly a leftover from the previous config format, and it has been fixed in PR #51.

The current folder design is intended to let multiple runs be saved in one central place. In the future, we may consider consolidating this into a single folder.
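
As a rough sketch of the layout described here, a shared root folder can hold one subfolder per run. The function name and the folder values below are assumptions for illustration only; the actual path construction in the repository may differ.

```python
from pathlib import Path


def run_checkpoint_dir(root_folder: str, run_folder: str) -> Path:
    # Hypothetical helper: one central root, one subfolder per run.
    return Path(root_folder) / run_folder


root = "model_checkpoints"  # plays the role of a root-folder setting
# Two runs with different run-folder names share the same root.
dirs = [run_checkpoint_dir(root, name) for name in ("run-a", "run-b")]
```

With this split, changing only the run folder between experiments keeps earlier checkpoints intact under the same root.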
