
[Flax] Add FlaxBlenderbot #13633

Merged: 28 commits merged into huggingface:master on Nov 30, 2021
Conversation

stancld (Contributor) commented Sep 17, 2021

What does this PR do?

This PR adds a Flax implementation of Blenderbot.
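
As a quick illustration of the intended usage (a hedged sketch: the checkpoint name and the generation call assume the standard Bart-like Flax API in transformers, not code taken from this PR):

from transformers import BlenderbotTokenizer, FlaxBlenderbotForConditionalGeneration

# facebook/blenderbot-400M-distill is an existing PyTorch checkpoint;
# from_pt=True lets from_pretrained convert the weights to Flax on the fly.
model = FlaxBlenderbotForConditionalGeneration.from_pretrained(
    "facebook/blenderbot-400M-distill", from_pt=True
)
tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

inputs = tokenizer(["My friends are cool but they eat too many carbs."], return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_length=64)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))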

TODOs:

  • fix PT-Flax model equivalence

Who can review?

@patrickvonplaten @patil-suraj

@stancld stancld changed the title [Flax] Add FlaxBlenderbot [WIP] [Flax] Add FlaxBlenderbot Sep 17, 2021
@stancld stancld changed the title [WIP] [Flax] Add FlaxBlenderbot [Flax] Add FlaxBlenderbot Sep 17, 2021
stancld (Contributor, Author) commented Oct 15, 2021

@patrickvonplaten I would like to kindly ping you for a review. :) I've been struggling to achieve PT-Flax equivalence; however, I cannot find the difference/bug in this new Flax implementation.

Thanks a lot! :)

patrickvonplaten (Contributor) commented

Hey @stancld,

Thanks a lot for the PR! The difference between PT and Flax in your PR is actually very small (< 0.1), so it might very well be that the implementation is correct!

I'll try to take a deeper look at the end of next week. Could you try one last thing: add print statements such as

print("PT", hidden_states.sum())  # in PyTorch

and

print("FX", hidden_states.sum())  # in Flax

before the word embeddings, after the word embeddings, after each encoder transformer layer, before the decoder word embeddings, after the decoder attention layers, ... to see where the activations start to diverge. If they diverge gradually, it might very well be that the model is correct and there is just a small numerical difference. If they diverge all of a sudden at some point, then there is probably a subtle bug.
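
For illustration, one way to get those sums on the PyTorch side without editing the modeling file is a forward hook per layer (a sketch; it assumes the facebook/blenderbot-400M-distill checkpoint and the encoder.layers / decoder.layers layout of Bart-like models in transformers):

import torch
from transformers import BlenderbotModel

model = BlenderbotModel.from_pretrained("facebook/blenderbot-400M-distill")

def make_hook(name):
    def hook(module, inputs, output):
        # encoder/decoder layers return a tuple; the first element is hidden_states
        print(f"PT {name}:", output[0].sum())
    return hook

for i, layer in enumerate(model.encoder.layers):
    layer.register_forward_hook(make_hook(f"encoder layer {i}"))
for i, layer in enumerate(model.decoder.layers):
    layer.register_forward_hook(make_hook(f"decoder layer {i}"))
# the hooks fire on the next model(...) forward pass

Flax modules have no hook mechanism, so on the Flax side the print("FX", hidden_states.sum()) calls have to go directly into the relevant __call__ methods.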

stancld (Contributor, Author) commented Oct 25, 2021

@patrickvonplaten Thank you for the tip! I'll have a look :)

stancld (Contributor, Author) commented Oct 28, 2021

Hello @patrickvonplaten, I ran a few tests; one output is below. There is some divergence, but I'm not sure whether it's too severe. I'm going to check the Flax code once again today :)

===PyTorch===
---Encoder---
PT first hidden-states:  tensor(-1.2589)
PT encoder after self-attn:  tensor(0.5862)
PT encoder: tensor(-0.7895)
PT encoder after self-attn:  tensor(0.0465)
PT encoder last hidden states before norm:  tensor(-0.2601)
PT encoder last hidden states after norm:  tensor(0.)
---Decoder---
PT decoder after self-attn:  tensor(1.1000)
PT decoder after cross-attn:  tensor(0.1547)
PT decoder: tensor(-0.0142)
PT decoder after self-attn:  tensor(0.9638)
PT decoder after cross-attn:  tensor(1.7759)
PT decoder: tensor(2.7198)
PT decoder last hidden states before norm:  tensor(2.7198)
PT decoder last hidden states after norm:  tensor(-5.7220e-06)
PT output:  tensor(-5.7220e-06)

===Flax===
---Encoder---
FX first hidden-states:  -1.2589027
FX encoder after self-attn:  0.59013414
FX encoder: -0.7862803
FX encoder after self-attn:  0.04762589
FX encoder last hidden states before norm:  -0.25001374
FX encoder last hidden states after norm:  4.053116e-06
---Decoder---
FX decoder after self-attn:  1.1029385
FX decoder after cross-attn:  0.15325405
FX decoder: -0.013041288
FX decoder after self-attn:  0.96520036
FX decoder after cross-attn:  1.7912248
FX decoder last hidden states before norm:  2.735363
FX decoder last hidden states after norm:  -1.1697412e-06
FX output:  -1.1697412e-06

@@ -405,7 +405,7 @@ def setup(self) -> None:
             num_heads=self.config.encoder_attention_heads,
             dropout=self.config.attention_dropout,
         )
-        self.self_attn_layer_norm = nn.LayerNorm(dtype=self.dtype)
+        self.self_attn_layer_norm = nn.LayerNorm(dtype=self.dtype, epsilon=1e-05)
Review comment on the diff (Contributor):

@patil-suraj - the default in PyTorch is 1e-05, so I adapted it for all Bart-like models. Given that the PT and Flax tests were passing for Bart before, I think this "bug correction" is fine in terms of backwards compatibility.
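
For context, the two frameworks simply disagree on the default: torch.nn.LayerNorm uses eps=1e-05 while flax.linen.LayerNorm uses epsilon=1e-06, which is enough to nudge every post-norm activation. A minimal standalone sketch of the mismatch (not PR code):

import jax
import jax.numpy as jnp
import numpy as np
import torch
import flax.linen as nn

x = np.random.randn(4, 8).astype("float32")

pt_ln = torch.nn.LayerNorm(8)   # default eps=1e-05
fx_ln = nn.LayerNorm()          # default epsilon=1e-06
params = fx_ln.init(jax.random.PRNGKey(0), jnp.asarray(x))

# both layers initialize scale=1 and bias=0, so only epsilon differs
pt_out = pt_ln(torch.from_numpy(x)).detach().numpy()
fx_out = np.asarray(fx_ln.apply(params, jnp.asarray(x)))
print(np.abs(pt_out - fx_out).max())  # small but nonzero drift

fx_ln_fixed = nn.LayerNorm(epsilon=1e-05)  # matches the PyTorch default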

patrickvonplaten (Contributor) left a comment

Good for me now!

stancld (Contributor, Author) commented Nov 3, 2021

@patrickvonplaten Thank you very much for spotting the problem! :]

patrickvonplaten (Contributor) commented

Tests on master seem to be broken currently :-/

But I think the PR is good to go. @patil-suraj, could you maybe take a look once you're back (and maybe rebase onto master with @stancld to fix the CircleCI runner)?

patrickvonplaten (Contributor) commented

Awesome - I'll let you merge, @patil-suraj, once you're back :-)

Comment on the diff:

    self.embed_dim,
    use_bias=self.bias,
    dtype=self.dtype,
    kernel_init=jax.nn.initializers.normal(self.config.init_std, self.dtype),
patil-suraj (Contributor) commented on the diff:

We should not pass the dtype to the kernel_init anymore; it's meant to specify the dtype of the computation, not of the parameters. This was a bug in all Flax models, which is fixed by #13098.

@stancld Could you please rebase the branch onto master again and fix this according to what is explained in #13098?
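
Per #13098, the fix amounts to dropping the dtype argument from the initializer, so parameters are always created in float32 while self.dtype keeps controlling only the computation dtype; the change would look like:

- kernel_init=jax.nn.initializers.normal(self.config.init_std, self.dtype),
+ kernel_init=jax.nn.initializers.normal(self.config.init_std),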

stancld (Contributor, Author) replied:

@patil-suraj Thank you for providing me with the context. Should be fixed now :]

patil-suraj (Contributor) left a comment

Thank you @stancld for adding this, LGTM!

Will push a couple of Flax checkpoints and then merge :)

Comment on lines +466 to +472
if is_flax_available():
    import jax

    jax_device = jax.default_backend()
else:
    jax_device = None

Review comment (Contributor):
nice!
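
One plausible consumption of this flag is in an equivalence test, where the backend decides the tolerance (a hypothetical test body, for illustration only; TPUs compute matmuls in bfloat16 by default, so looser tolerances are typical there):

import unittest
import numpy as np

try:
    import jax
    jax_device = jax.default_backend()  # "cpu", "gpu", or "tpu"
except ImportError:
    jax_device = None

class EquivalenceTest(unittest.TestCase):  # hypothetical name
    def test_outputs_close(self):
        tol = 1e-3 if jax_device == "tpu" else 1e-5
        pt_out = np.ones(4, dtype=np.float32)  # stand-in for PyTorch outputs
        fx_out = np.ones(4, dtype=np.float32)  # stand-in for Flax outputs
        self.assertTrue(np.allclose(pt_out, fx_out, atol=tol))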

@patil-suraj patil-suraj merged commit faacd74 into huggingface:master Nov 30, 2021