[Bug fix] Add rope_theta for llama config #4480
Conversation
@microsoft-github-policy-service agree company="JetBrains"
@mrwyattii Please take a look at this.
@mrwyattii, please run CI. Do I need to do anything else?
Thanks @cupertank, LGTM
Thanks, LGTM!
@cupertank it looks like several of the inference-related unit tests are failing. I can help debug next week.
@mrwyattii I think I found a bug; please run CI.
@mrwyattii I hope it's the last fix; please run CI.
@mrwyattii I see everything is good now, so maybe we can merge it?
Added to the merge queue. Thank you @cupertank!
Do you plan to have a patch release for this?
* Add rope_theta for llama config
* Add rope_theta to bias_add_transform_0213
* Fix CI problems
* Add rope_theta to linear layer

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
This PR updates `diffusers_attention` to properly pass the `rope_theta` arg to the `linear_func` calls. This was added in GH-4480 and needed to be updated for the diffusers attention module as well. Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
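A rough sketch of the pattern this follow-up applies: read `rope_theta` from the model config once and forward it explicitly on each kernel call, rather than letting the kernel fall back to a hard-coded default. The class and argument names below (other than `rope_theta` itself) are hypothetical illustrations, not DeepSpeed's actual `diffusers_attention` code or `linear_func` signature.

```python
# Hypothetical sketch only -- names and signatures are illustrative,
# not DeepSpeed's real attention/kernel interfaces.
class AttentionWrapper:
    def __init__(self, config, linear_func):
        self.linear_func = linear_func  # kernel callable (assumed interface)
        # Default to the common Llama base frequency if the config omits it.
        self.rope_theta = getattr(config, "rope_theta", 10000.0)

    def forward(self, hidden_states, weights, bias):
        # Pass rope_theta through so the kernel uses the model's actual
        # rotary base frequency instead of a compiled-in constant.
        return self.linear_func(hidden_states, weights, bias,
                                rope_theta=self.rope_theta)
```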
Fixed a bug with CodeLlama; the bug is described in #4442. DeepSpeed now uses the rope_theta value from the transformers config instead of assuming the default.
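For context, a minimal sketch of what `rope_theta` controls, assuming the standard rotary-embedding (RoPE) formulation used by Llama-family models in `transformers` (the helper function here is illustrative, not one of DeepSpeed's kernels). Llama 2 uses the default base of 10000.0, while CodeLlama ships with 1e6, which is why hard-coding the default produces wrong positional frequencies for CodeLlama.

```python
import torch

def rope_inv_freq(head_dim: int, rope_theta: float = 10000.0) -> torch.Tensor:
    # Standard RoPE frequency schedule: rope_theta^(-2i/d) for i = 0 .. d/2 - 1.
    # A larger rope_theta stretches the rotation periods, so using 1e4 where
    # the model was trained with 1e6 corrupts the position encoding.
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Illustrative usage: take the value from the Hugging Face config rather than
# assuming the default (attribute name per transformers' LlamaConfig).
# from transformers import AutoConfig
# hf_config = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
# head_dim = hf_config.hidden_size // hf_config.num_attention_heads
# inv_freq = rope_inv_freq(head_dim, getattr(hf_config, "rope_theta", 10000.0))
```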