Add GPT-NeoX (StableLM) #204
Conversation
@seanmor5 yeah, the dense units for the GPTNeoX QKV implementation are ordered as […]. The remaining difference is the attention block, GPTNeoX has […]
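For context, in the upstream GPT-NeoX checkpoints the query, key, and value projections are fused into a single dense layer whose output is laid out per head, as [q_h1, k_h1, v_h1, q_h2, k_h2, v_h2, ...], rather than as three concatenated blocks [q..., k..., v...]. A minimal Nx sketch of splitting such a fused output; `split_fused_qkv` and its arguments are hypothetical names, not Bumblebee's API:

```elixir
# Hypothetical helper, not Bumblebee's actual implementation.
# qkv: {batch, seq, num_heads * 3 * head_size}, interleaved per head.
def split_fused_qkv(qkv, num_heads, head_size) do
  {batch, seq, _hidden} = Nx.shape(qkv)

  # Group the hidden axis per head: {batch, seq, num_heads, 3 * head_size}.
  per_head = Nx.reshape(qkv, {batch, seq, num_heads, 3 * head_size})

  # Within each head group the order is [query, key, value].
  query = Nx.slice_along_axis(per_head, 0, head_size, axis: 3)
  key = Nx.slice_along_axis(per_head, head_size, head_size, axis: 3)
  value = Nx.slice_along_axis(per_head, 2 * head_size, head_size, axis: 3)

  {query, key, value}
end
```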
Beautiful! I believe open assistant is based on this model too? So theoretically this is StableLM and OpenAssistant? |
block_impl(
  block_type,
  hidden_state,
  self_attention_norm,
  self_attention,
  cross_attention_maybe,
  cross_attention_norm,
  cross_attention,
  output_norm,
  ffn
)
The variations started to accumulate, so I added a separate function to orchestrate the order and flow of layers. That's the best thing I came up with so far; maybe we will arrive at a better abstraction at some point, but it's internal, so it's fine as long as it works :D
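To illustrate the kind of variation being dispatched on, here is a simplified sketch, not the actual Bumblebee code; the clause heads and `Axon.add/2` wiring are illustrative, and the real function also threads cross-attention. It contrasts a sequential pre-norm block with a GPT-NeoX-style parallel block, where attention and the FFN read the same input:

```elixir
# Sequential pre-norm block: norm -> attention -> residual, then norm -> ffn -> residual.
defp block_impl(:norm_first, hidden_state, self_attention_norm, self_attention, output_norm, ffn) do
  shortcut = hidden_state
  hidden_state = hidden_state |> self_attention_norm.() |> self_attention.()
  hidden_state = Axon.add(shortcut, hidden_state)

  shortcut = hidden_state
  hidden_state = hidden_state |> output_norm.() |> ffn.()
  Axon.add(shortcut, hidden_state)
end

# GPT-NeoX-style parallel block: the attention and FFN branches are computed
# from the same input and both summed with the residual.
defp block_impl(:parallel, hidden_state, self_attention_norm, self_attention, output_norm, ffn) do
  attention_out = hidden_state |> self_attention_norm.() |> self_attention.()
  ffn_out = hidden_state |> output_norm.() |> ffn.()

  hidden_state
  |> Axon.add(attention_out)
  |> Axon.add(ffn_out)
end
```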
Should support StableLM. It's rebased off of Llama for rotary embeddings.
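Since the rotary embeddings are the piece being pulled in from the Llama work, here is a minimal, self-contained Nx sketch of the rotate-half formulation of rotary position embeddings (RoPE); the module and function names are illustrative, not Bumblebee's API:

```elixir
defmodule RotarySketch do
  # x: tensor whose last axis is head_size; angles: a {..., head_size / 2}
  # tensor of position * inverse-frequency values.
  def apply_rotary(x, angles) do
    cos = angles |> Nx.cos() |> duplicate_halves()
    sin = angles |> Nx.sin() |> duplicate_halves()

    # Classic rotate-half formulation: x * cos + rotate_half(x) * sin.
    Nx.add(Nx.multiply(x, cos), Nx.multiply(rotate_half(x), sin))
  end

  # Repeat the angle table so it lines up with both halves of the features.
  defp duplicate_halves(t), do: Nx.concatenate([t, t], axis: -1)

  # Move the second half of the last axis to the front, negated.
  defp rotate_half(x) do
    half = x |> Nx.axis_size(-1) |> div(2)
    x1 = Nx.slice_along_axis(x, 0, half, axis: -1)
    x2 = Nx.slice_along_axis(x, half, half, axis: -1)
    Nx.concatenate([Nx.negate(x2), x1], axis: -1)
  end
end
```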
@jonatanklosko I wonder if you might be able to take a look at the sliced QKV calculation? When debugging, the values I was getting for query are correct, but for key they are way off.