Add GPT-NeoX (StableLM) #204

Merged
jonatanklosko merged 7 commits into main from sm-neox on May 4, 2023

Conversation

seanmor5
Contributor

Should support StableLM. It's rebased off of Llama for rotary embeddings.

@jonatanklosko I wonder if you might be able to take a look at the sliced QKV calculation? When debugging, the values I was getting for query are correct, but for key they are way off.

@jonatanklosko
Member

@seanmor5 yeah, the dense units in the GPT-NeoX QKV implementation are ordered as (num_attention_heads, 3, head_size), while for BLIP, for example, it's (3, num_attention_heads, head_size), so we need to split the kernel differently. I pushed a fix.
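
For illustration, here is a minimal Nx sketch of why the two layouts require slicing the fused kernel along different axes. This is not the actual Bumblebee code; the module, function names, and the {hidden_size, num_heads * 3 * head_size} kernel shape are assumptions for the example.

defmodule QkvSplitSketch do
  # GPT-NeoX layout: output units of the fused kernel are ordered as
  # (num_heads, 3, head_size), so the Q/K/V axis sits in the middle.
  def split_neox(kernel, num_heads, head_size) do
    hidden_size = Nx.axis_size(kernel, 0)
    reshaped = Nx.reshape(kernel, {hidden_size, num_heads, 3, head_size})

    for i <- 0..2 do
      reshaped
      |> Nx.slice_along_axis(i, 1, axis: 2)
      |> Nx.reshape({hidden_size, num_heads * head_size})
    end
  end

  # BLIP-style layout: output units ordered as (3, num_heads, head_size),
  # so the Q/K/V axis comes first and we slice along axis 1 instead.
  def split_qkv_first(kernel, num_heads, head_size) do
    hidden_size = Nx.axis_size(kernel, 0)
    reshaped = Nx.reshape(kernel, {hidden_size, 3, num_heads, head_size})

    for i <- 0..2 do
      reshaped
      |> Nx.slice_along_axis(i, 1, axis: 1)
      |> Nx.reshape({hidden_size, num_heads * head_size})
    end
  end
end

# Usage sketch: [q_kernel, k_kernel, v_kernel] = QkvSplitSketch.split_neox(fused_kernel, 32, 128)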

The remaining difference is the attention block: GPT-NeoX has a use_parallel_residual option, which uses a different formulation where attention and MLP are computed in parallel, as in GPT-J. I made a quick tweak locally and after that all tests pass. I will refactor it into an option later :)
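
Schematically, the two formulations differ as in the sketch below. This is a simplified illustration, not the actual block code; attention, ffn, and the norm arguments are placeholder one-arity functions.

defmodule ResidualSketch do
  # Standard (sequential) block: the MLP branch sees the output of the
  # attention residual connection.
  def sequential(x, attention, ffn, attention_norm, output_norm) do
    hidden = Nx.add(x, attention.(attention_norm.(x)))
    Nx.add(hidden, ffn.(output_norm.(hidden)))
  end

  # use_parallel_residual (GPT-J style): attention and MLP both read from
  # the original input and their outputs are summed with the residual.
  def parallel(x, attention, ffn, attention_norm, output_norm) do
    attention_out = attention.(attention_norm.(x))
    ffn_out = ffn.(output_norm.(x))
    x |> Nx.add(attention_out) |> Nx.add(ffn_out)
  end
end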

@seanmor5
Contributor Author

Beautiful! I believe Open Assistant is based on this model too? So theoretically this is StableLM and Open Assistant?

@jonatanklosko
Member

@seanmor5 the Pythia models are GPT-NeoX, yeah. The new ones used in Hugging Face Chat are LLaMA, I think, and they are stored in a very fancy XOR-ed way.

Comment on lines +464 to +474
block_impl(
block_type,
hidden_state,
self_attention_norm,
self_attention,
cross_attention_maybe,
cross_attention_norm,
cross_attention,
output_norm,
ffn
)
jonatanklosko
Member

The variations started to accumulate, so I added a separate function to orchestrate the order and flow of layers. That's the best thing I came up with so far; maybe we will arrive at a better abstraction at some point, but it's internal, so it's fine as long as it works :D
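
For intuition, a hypothetical and much simplified sketch of what such an orchestration function could look like: each layer branch is passed in as a plain function, and the function pattern matches on the block type to decide how the branches are ordered and wired into residual connections. This is not the actual Bumblebee implementation, and the argument list is trimmed relative to the snippet above.

defmodule BlockSketch do
  # Standard block ordering: norm -> self-attention -> residual, an optional
  # cross-attention branch, then norm -> ffn -> residual.
  def block_impl(:standard, hidden_state, self_attention_norm, self_attention,
                 cross_attention, cross_attention_norm, output_norm, ffn) do
    hidden_state
    |> residual(fn x -> self_attention.(self_attention_norm.(x)) end)
    |> maybe_residual(cross_attention, cross_attention_norm)
    |> residual(fn x -> ffn.(output_norm.(x)) end)
  end

  # Add a branch's output back onto the hidden state.
  defp residual(hidden_state, branch), do: Nx.add(hidden_state, branch.(hidden_state))

  # Decoder-only models pass no cross-attention, so this branch is a no-op.
  defp maybe_residual(hidden_state, nil, _norm), do: hidden_state

  defp maybe_residual(hidden_state, cross_attention, cross_attention_norm) do
    residual(hidden_state, fn x -> cross_attention.(cross_attention_norm.(x)) end)
  end
end

A variant like the parallel-residual block discussed above would then just be another block_impl clause, rather than conditionals scattered through each model definition.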

jonatanklosko merged commit 1a16603 into main on May 4, 2023
1 check passed
jonatanklosko deleted the sm-neox branch on May 4, 2023 at 20:19