fail to load checkpoints after zero3 initialize #3574
Comments
@Stick-To I'm also facing this issue. Could you please share how you resolved it?
@dittops, please re-open and share repro steps including a stack trace. Thanks!
I'm also facing this issue. Any help would be great!
I'm facing this issue too. Can anyone help?
I actually found a solution to my problem in the Saving and Loading section of this article: https://huggingface.co/docs/accelerate/usage_guides/deepspeed#saving-and-loading
I have not solved it.
One possible cause is a conflict between the multiple initializations performed by the HF DeepSpeed integration and an explicit call to `deepspeed.zero.Init()`. I solved it by following the approach here.
I ran into the same problem and solved it by reinitializing. You can deepcopy the original model before you wrap it with
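A minimal sketch of the deepcopy workaround, assuming plain PyTorch plus DeepSpeed's `zero_to_fp32` helper. `build_model` and `restore_from_zero_checkpoint` are hypothetical names, and this is not necessarily what the commenter did. Under ZeRO-3, `deepspeed.initialize()` (or `deepspeed.zero.Init()`) partitions parameters in place, leaving each rank's module with empty placeholder tensors. A copy taken beforehand still holds full tensors, so a consolidated fp32 state dict can later be loaded into it with an ordinary `load_state_dict`.

```python
import copy

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for the real model constructor.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))


def restore_from_zero_checkpoint(pristine: nn.Module, checkpoint_dir: str) -> nn.Module:
    # Hypothetical helper: consolidate the ZeRO-3 shards saved in checkpoint_dir
    # into a regular fp32 state dict and load it into the un-partitioned copy.
    # DeepSpeed is only imported (and required) when this function is called.
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
    pristine.load_state_dict(state_dict)
    return pristine


model = build_model()
# Take the copy BEFORE deepspeed.initialize() / deepspeed.zero.Init() touch the
# model; afterwards its parameters are partitioned and no longer hold full weights.
pristine = copy.deepcopy(model)

# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)  # assumed setup, not run here
```

Keeping the copy outside DeepSpeed's control sidesteps the double initialization mentioned earlier in the thread, at the cost of holding one extra full copy of the weights in CPU memory.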