Add T5 model #159
Conversation
T5 also does not scale queries, so I added that layer in. It's strange, though: we match PT and Flax until softmax is applied to the attention weights, and then we start to diverge, but I don't think there's anything wrong with Axon's softmax implementation.
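For context, here is a minimal sketch (my own, not the PR's code) of where query scaling enters dot-product attention: the standard variant divides the query by sqrt(head_dim) before the dot product, and T5 simply skips that factor.

```elixir
defmodule AttentionSketch do
  import Nx.Defn

  # query, key: {batch, seq_len, head_dim}
  defn scaled_weights(query, key) do
    weights(query / Nx.sqrt(Nx.axis_size(query, -1)), key)
  end

  # T5 variant: the query is used as-is, without the 1 / sqrt(head_dim) factor
  defn unscaled_weights(query, key) do
    weights(query, key)
  end

  # numerically stable softmax over the key dimension of query . key^T
  defnp weights(query, key) do
    scores = Nx.dot(query, [2], [0], key, [2], [0])
    scores = scores - Nx.reduce_max(scores, axes: [-1], keep_axes: true)
    Nx.exp(scores) / Nx.sum(Nx.exp(scores), axes: [-1], keep_axes: true)
  end
end
```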
Awesome, a couple comments!
lib/bumblebee/text/t5.ex (outdated)

name = opts[:name]

hidden_state
|> Layers.rms_norm(name: join(name, "layer_norm"), epsilon: spec.layer_norm_epsilon)
@seanmor5 this is the output norm actually, so we don't need :output_norm, we just need to remove this here. That's the funny part of transformer implementations: they group the layers differently, but at the end of the day they are mostly the same :D

Also, I think we can add support for ffn: [use_bias: false], defaulting to true, and then we don't need the custom ffn altogether!
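As a rough sketch of what such an option could expand to (the helper and option handling below are hypothetical, not Bumblebee's actual API), the bias toggle would just be forwarded to Axon.dense/3:

```elixir
# Hypothetical helper, not Bumblebee's API: a plain two-layer FFN where the
# :use_bias option (defaulting to true) is forwarded to Axon.dense/3.
defp ffn(hidden_state, intermediate_size, output_size, opts) do
  name = opts[:name]
  use_bias = Keyword.get(opts, :use_bias, true)

  hidden_state
  |> Axon.dense(intermediate_size, use_bias: use_bias, name: join(name, "intermediate"))
  |> Axon.activation(opts[:activation] || :relu, name: join(name, "activation"))
  |> Axon.dense(output_size, use_bias: use_bias, name: join(name, "output"))
end
```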
@jonatanklosko I missed a step in the implementation which uses a custom ffn based on the activation function passed, which I am adding now, so we may still need the custom ffn in that case. Will see how generic I can make it!
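For reference, a rough sketch of what such a gated feed-forward typically looks like in T5-style models (e.g. the gated-gelu variant in T5 v1.1); the helper and layer names are illustrative, not the PR's code:

```elixir
# Illustrative sketch of a gated FFN: one projection is passed through the
# activation and gates the other, then the product is projected back down.
defp gated_ffn(hidden_state, intermediate_size, output_size, activation, opts) do
  name = opts[:name]

  gate =
    hidden_state
    |> Axon.dense(intermediate_size, use_bias: false, name: join(name, "gate"))
    |> Axon.activation(activation, name: join(name, "gate_activation"))

  up = Axon.dense(hidden_state, intermediate_size, use_bias: false, name: join(name, "intermediate"))

  gate
  |> Axon.multiply(up)
  |> Axon.dense(output_size, use_bias: false, name: join(name, "output"))
end
```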
Ah, so this is the output layer, but in this case they use the state before the normalization layer for the shortcut connection. I removed the :output_norm option and changed the shortcut to use the parent, which I think is simpler and better shows the model difference/similarity :)
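A minimal sketch of that pre-norm residual pattern (not the exact PR code): the shortcut branches off the parent state before the RMS norm and is added back after the sub-layer.

```elixir
# `sub_layer/1` stands in for the attention or feed-forward part of the block.
shortcut = hidden_state

hidden_state
|> Layers.rms_norm(name: join(name, "layer_norm"), epsilon: spec.layer_norm_epsilon)
|> sub_layer()
|> Axon.add(shortcut)
```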
@seanmor5 I made some changes, feel free to merge once everything looks good to you. The CI is failing, probably OOM or similar, since I verified that everything passes locally.
Co-authored-by: Jonatan Kłosko <jonatanklosko@gmail.com>
Google's Flan-T5 model is a really good open-source alternative to GPT, and it's based on this T5 implementation. This is still failing; I think it's something to do with the relative attention bias, but I will go back and look at it in a little bit.
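For context, a small sketch of where the relative attention bias enters (not the failing code itself): it is a learned per-head bias over relative query/key positions that is added to the raw attention scores before softmax; in T5 only the first block computes it and later blocks reuse the same tensor.

```elixir
# Sketch only: adding a {1, num_heads, query_len, key_len} relative position
# bias to the attention scores before the softmax.
defmodule RelativeBiasSketch do
  import Nx.Defn

  # scores: {batch, num_heads, query_len, key_len}
  defn biased_weights(scores, position_bias) do
    scores = scores + position_bias
    scores = scores - Nx.reduce_max(scores, axes: [-1], keep_axes: true)
    Nx.exp(scores) / Nx.sum(Nx.exp(scores), axes: [-1], keep_axes: true)
  end
end
```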
I had to change a few things:

- I added a customizable layer norm because T5 uses an RMS norm rather than a layer norm. I will eventually upstream RMS norm into Axon (see the sketch after this list).
- I added a relative_attention_bias option because some T5 blocks rely on relative attention bias.
- I added an output_norm option to control whether or not to apply the final output norm for each block. T5 does not, and instead applies a single norm globally after the blocks.
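A minimal sketch of RMS norm itself, assuming the usual formulation T5 uses (this is not the Axon or Bumblebee code): unlike layer norm it does not subtract the mean and has no bias term, only a learned scale.

```elixir
defmodule RMSNormSketch do
  import Nx.Defn

  # x: {..., hidden_size}, scale: {hidden_size}
  defn rms_norm(x, scale, opts \\ []) do
    opts = keyword!(opts, epsilon: 1.0e-6)

    # mean of squares over the last (hidden) dimension, no mean subtraction
    mean_square = Nx.mean(x * x, axes: [-1], keep_axes: true)

    x * Nx.rsqrt(mean_square + opts[:epsilon]) * scale
  end
end
```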