Grouped-Query Attention #384
Out of impatience I asked Claude 2 about the differences between Implementation A (LLaMA 1) and Implementation B (LLaMA 2):
Then I explained the concept of GQA and asked it for the parts enabling GQA:
Is this a proper assessment by Claude?
Hello Meta GenAI team (cc @ruanslv),
With regard to the 70B model, I'm currently looking into the implementation of the GQA architecture -- specifically, after noticing the 8192 x 1024 projection shapes, I tried to identify the conditional GQA code paths in your reference implementation but couldn't pin them down.
Given that there are some conditions that look suspiciously GQA-related, could you please elaborate on the parts of the implementation that enable this architecture, specifically for the 34B / 70B models?
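For reference, here is my understanding of how GQA would work, as a minimal NumPy sketch (not the reference implementation). The assumption is that K/V are projected to fewer heads than Q, then each K/V head is repeated so that a group of query heads shares it; if the 70B model used, say, 8 KV heads with head_dim 128, that would also explain a KV projection mapping 8192 -> 1024. The function names (`repeat_kv`, `grouped_query_attention`) and all shapes here are illustrative, not taken from your code:

```python
import numpy as np

def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    """Repeat each KV head n_rep times along the head axis.

    x: (seq, n_kv_heads, head_dim) -> (seq, n_kv_heads * n_rep, head_dim)
    so that each group of n_rep query heads shares one KV head.
    """
    return np.repeat(x, n_rep, axis=1)

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    """Sketch of GQA: q has n_heads, k/v have n_kv_heads <= n_heads.

    q: (seq, n_heads, head_dim); k, v: (seq, n_kv_heads, head_dim)
    Returns: (seq, n_heads, head_dim)
    """
    n_rep = n_heads // n_kv_heads
    k = repeat_kv(k, n_rep)  # expand KV heads to match query heads
    v = repeat_kv(v, n_rep)
    head_dim = q.shape[-1]
    # scores: (n_heads, q_seq, k_seq), scaled dot-product per head
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)
```

With `n_kv_heads == n_heads` this degenerates to standard multi-head attention, and with `n_kv_heads == 1` to multi-query attention, which is why I expected the GQA path to be conditional on a `n_kv_heads`-style hyperparameter.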
Thanks