Add GPT-NeoX (StableLM) #204
Conversation
@seanmor5 yeah, the dense units for the GPTNeoX QKV implementation are ordered as […]. The remaining difference is the attention block, GPTNeoX has […]
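For context, in the upstream GPT-NeoX checkpoints the query, key, and value projections are fused into a single dense layer whose output is laid out per head, as [q_h1, k_h1, v_h1, q_h2, k_h2, v_h2, ...], rather than as three concatenated blocks [q..., k..., v...]. A minimal Nx sketch of splitting such a fused output; `split_fused_qkv` and its arguments are hypothetical names, not Bumblebee's API:

```elixir
# Hypothetical helper, not Bumblebee's actual implementation.
# qkv: {batch, seq, num_heads * 3 * head_size}, interleaved per head.
def split_fused_qkv(qkv, num_heads, head_size) do
  {batch, seq, _hidden} = Nx.shape(qkv)

  # Group the hidden axis per head: {batch, seq, num_heads, 3 * head_size}.
  per_head = Nx.reshape(qkv, {batch, seq, num_heads, 3 * head_size})

  # Within each head group the order is [query, key, value].
  query = Nx.slice_along_axis(per_head, 0, head_size, axis: 3)
  key = Nx.slice_along_axis(per_head, head_size, head_size, axis: 3)
  value = Nx.slice_along_axis(per_head, 2 * head_size, head_size, axis: 3)

  {query, key, value}
end
```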
Beautiful! I believe open assistant is based on this model too? So theoretically this is StableLM and OpenAssistant? |
block_impl(
  block_type,
  hidden_state,
  self_attention_norm,
  self_attention,
  cross_attention_maybe,
  cross_attention_norm,
  cross_attention,
  output_norm,
  ffn
)
The variations started to accumulate, so I added a separate function to orchestrate the order and flow of layers. That's the best thing I came up with so far; maybe we will arrive at a better abstraction at some point, but it's internal, so it's fine as long as it works :D
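To illustrate the kind of variation being dispatched on, here is a simplified sketch, not the actual Bumblebee code; the clause heads and `Axon.add/2` wiring are illustrative, and the real function also threads cross-attention. It contrasts a sequential pre-norm block with a GPT-NeoX-style parallel block, where attention and the FFN read the same input:

```elixir
# Sequential pre-norm block: norm -> attention -> residual, then norm -> ffn -> residual.
defp block_impl(:norm_first, hidden_state, self_attention_norm, self_attention, output_norm, ffn) do
  shortcut = hidden_state
  hidden_state = hidden_state |> self_attention_norm.() |> self_attention.()
  hidden_state = Axon.add(shortcut, hidden_state)

  shortcut = hidden_state
  hidden_state = hidden_state |> output_norm.() |> ffn.()
  Axon.add(shortcut, hidden_state)
end

# GPT-NeoX-style parallel block: the attention and FFN branches are computed
# from the same input and both summed with the residual.
defp block_impl(:parallel, hidden_state, self_attention_norm, self_attention, output_norm, ffn) do
  attention_out = hidden_state |> self_attention_norm.() |> self_attention.()
  ffn_out = hidden_state |> output_norm.() |> ffn.()

  hidden_state
  |> Axon.add(attention_out)
  |> Axon.add(ffn_out)
end
```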
Should support StableLM. It's rebased off of Llama for rotary embeddings.
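Since the rotary embeddings are the piece being pulled in from the Llama work, here is a minimal, self-contained Nx sketch of the rotate-half formulation of rotary position embeddings (RoPE); the module and function names are illustrative, not Bumblebee's API:

```elixir
defmodule RotarySketch do
  # x: tensor whose last axis is head_size; angles: a {..., head_size / 2}
  # tensor of position * inverse-frequency values.
  def apply_rotary(x, angles) do
    cos = angles |> Nx.cos() |> duplicate_halves()
    sin = angles |> Nx.sin() |> duplicate_halves()

    # Classic rotate-half formulation: x * cos + rotate_half(x) * sin.
    Nx.add(Nx.multiply(x, cos), Nx.multiply(rotate_half(x), sin))
  end

  # Repeat the angle table so it lines up with both halves of the features.
  defp duplicate_halves(t), do: Nx.concatenate([t, t], axis: -1)

  # Move the second half of the last axis to the front, negated.
  defp rotate_half(x) do
    half = x |> Nx.axis_size(-1) |> div(2)
    x1 = Nx.slice_along_axis(x, 0, half, axis: -1)
    x2 = Nx.slice_along_axis(x, half, half, axis: -1)
    Nx.concatenate([Nx.negate(x2), x1], axis: -1)
  end
end
```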
@jonatanklosko I wonder if you might be able to take a look at the sliced QKV calculation? When debugging, the values I was getting for query are correct, but for key they are way off.