Deepspeed Zero3 LoRA merge error #297
Comments
same error here, any update?
same question
Same problem
Let me clarify a few points about ZeRO-3 model initialisation. When the model is built under DeepSpeed's `zero.Init` context manager, that context manager partitions the model parameters across ranks and stores each local shard in the parameter's `ds_tensor` attribute, leaving the visible `.data` tensor empty (size 0). Things get more complicated when we add LoRA parameters to this model. For example, if we wrap the partitioned model with `get_peft_model`, the LoRA weights are created outside the partitioning context, so at merge time a full-sized LoRA delta is added onto a zero-sized base weight, which is exactly the shape mismatch in the reported RuntimeError.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Hello, I ran into the same situation as you did. Have you found a solution for combining LoRA and DeepSpeed ZeRO-3 to finetune a model?
Hi Anyms: This is quite an old issue, so I am not sure whether there is already a newer and more elegant way to tackle it. One year ago, I tried first calling `model_wrapped._zero3_consolidated_16bit_state_dict()` and then `get_peft_model_state_dict` on the resulting consolidated state dict. This was based on DeepSpeed 0.10 and the PEFT "13e53fc" head. Even though this approach successfully saved a correct `adapter.bin`, it was time-consuming to aggregate all the model slices, so I would not suggest following my solution.
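For clarity, here is a sketch of that workaround's flow. The two stub functions stand in for the real DeepSpeed engine method and the real PEFT helper (their bodies here are fake, runnable stand-ins; the key names are hypothetical):

```python
def _zero3_consolidated_16bit_state_dict_stub():
    # Real call: model_wrapped._zero3_consolidated_16bit_state_dict()
    # DeepSpeed gathers every shard onto rank 0, which is why this is slow.
    return {
        "base_model.model.q_proj.weight": [0.0] * 4,
        "base_model.model.q_proj.lora_A.weight": [0.1] * 2,
        "base_model.model.q_proj.lora_B.weight": [0.2] * 2,
    }

def get_peft_model_state_dict_stub(state_dict):
    # Real call: peft.get_peft_model_state_dict(model, state_dict=...)
    # which keeps only the adapter tensors.
    return {k: v for k, v in state_dict.items() if "lora_" in k}

full_sd = _zero3_consolidated_16bit_state_dict_stub()
adapter_sd = get_peft_model_state_dict_stub(full_sd)
print(sorted(adapter_sd))
# With the real libraries, torch.save(adapter_sd, "adapter.bin") would then
# persist the adapter weights.
```

The point is the ordering: consolidate the ZeRO-3 shards into a full state dict first, then filter it down to the LoRA keys; filtering before consolidation would hand PEFT zero-sized tensors.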
Hi,
I found that when I use DeepSpeed ZeRO-3, the LoRA merge does not work. (The ZeRO-2 case works properly.) Could you help me check this?
RuntimeError: The size of tensor a (0) must match the size of tensor b (2048) at non-singleton dimension 1
Full error:
The config: