[Seq2Seq] Fix a couple of bugs and clean examples #7474
Conversation
I'm okay with those breaking changes as they are easy to fix: the user just has to reorder their inputs (and they should really use keyword arguments as it's more python idiomatic anyway) and supporting ONNX and torch.jit is definitely something we want. Thanks for doing the TF part with the annoying re-numbering!
Just remember to reorder the doc accordingly or I'll be grumpy ;-)
Yes this is good for me.
It's an important improvement that we already did once for encoder-only models.
Yes, I agree that the breaking changes here are needed as well. Having the same API for all models is the #1 goal of the library, so it makes sense to uniformize the APIs of all seq2seq models now before we add more.
LGTM!
You got a little zero-shot BLEU boost it seems! This branch: 34.433 (on en-de)
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
Do we need the ordering to enable something for those models?
LGTM!
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
I think this should be enforced in most models, I like this test! For example with RoBERTa, it seems that all architectures follow the `input_ids` then `attention_mask` ordering, except `RobertaForMultipleChoice`. I see this more as a bug than something intentional.
Some models like GPT-2, however, should opt out, as they use the past key values as their second argument, which makes sense and is a defining trait of their API.
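For illustration, here is a hedged sketch of the kind of signature-order check being discussed (the helper name and the exact expected argument list are assumptions, not the test added in this PR):

```python
# Sketch of a forward-signature ordering check; names are illustrative only.
import inspect


def assert_forward_ordering(model_class, is_encoder_decoder):
    signature = inspect.signature(model_class.forward)
    arg_names = [name for name in signature.parameters if name != "self"]
    if is_encoder_decoder:
        # Required seq2seq inputs should come first, in a fixed order.
        expected = ["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "encoder_outputs"]
        assert arg_names[: len(expected)] == expected
    else:
        # Encoder-only models (or opt-outs like GPT-2) only need to start with input_ids.
        assert arg_names[0] == "input_ids"
```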
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
I would advocate for a case-by-case study of these models to see which should have the ordering `input_ids` > `attention_mask`, and which should stay the way they are.
@@ -187,16 +187,16 @@ def __init__(self, config):
        # dividide.
        self.pooling_mult = None

-    def init_attention_inputs(self, input_embeds, attention_mask=None, token_type_ids=None):
+    def init_attention_inputs(self, inputs_embeds, attention_mask=None, token_type_ids=None):
@sgugger - I renamed the variable here because it's always called `inputs_embeds` in the lib and Funnel was the only model that had `input_embeds` at some point. It's just a nit because it's not exposed to the user, but I think it's better for consistency.
@@ -152,7 +175,12 @@ def test_saved_model_with_hidden_states_output(self):
    tf.saved_model.save(model, tmpdirname)
    model = tf.keras.models.load_model(tmpdirname)
    outputs = model(inputs_dict)
    output = outputs[list(outputs.keys())[-1]] if isinstance(outputs, dict) else outputs[-1]

    if self.is_encoder_decoder:
        output = outputs["encoder_attentions"] if isinstance(outputs, dict) else outputs[-1]
@sgugger @LysandreJik @jplu - this slow test was failing for all models - fixed it here.
Is the docstring still wrong? T5ForConditionalGeneration, under …
I can't find this docstring, can you link to it?
In T5, it's under method …
Oh yeah, you're right, that was probably a bad copy & paste! Do you feel like opening a PR to fix it / delete it? That would be amazing :-)
* clean T5
* fix t5 tests
* fix index typo
* fix tf common test
* fix examples
* change positional ordering for Bart and FSTM
* add signature test
* clean docs and add tests
* add docs to encoder decoder
* clean docs
* correct two doc strings
* remove sig test for TF Elektra & Funnel
* fix tf t5 slow tests
* fix input_ids to inputs in tf
* Update src/transformers/modeling_bart.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/modeling_bart.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* implement lysandre results
* make style
* fix encoder decoder typo
* fix tf slow tests
* fix slow tests
* renaming
* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
IMPORTANT - BREAKING CHANGES
This PR changes the behavior of T5 and TFT5 due to 3 bug fixes and 1 small change in the forward API to support ONNX and torchscript.
It also slightly changes the behavior of Bart and EncoderDecoderModel.
Description
1st Bug: Due to a sloppy review on my part, these lines got merged into the PyTorch T5 model: https://github.com/huggingface/transformers/pull/5518/files#r496058908, which set `decoder_input_ids = input_ids` if `decoder_input_ids` were not provided. This is misleading and also just wrong: `decoder_input_ids` should never be set to `input_ids` in T5. This is not done during training nor during inference, so these lines don't make much sense. Because T5 is mostly used either with `.generate()` or in training with `model(input_ids=input_ids, labels=labels)`, in which cases the change has no effect, we only received one issue about it recently: #7358. The change was made to get T5 working with ONNX, but it is just wrong IMO.

2nd Bug: T5 was implemented with a small bug in the relative distance bias calculation for the cross-attention layer. It was spotted here: #7323. The correction leads to slightly different results when doing beam search. @sshleifer - if it's easy for you, could you maybe run a quick eval on WMT to see if BLEU improves with this PR?
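For reference, a minimal usage sketch of the two supported T5 patterns (the `t5-small` checkpoint and the example strings are purely illustrative):

```python
# Minimal sketch: decoder_input_ids are either derived from the labels
# (training) or built step by step by generate() (inference) - they are
# never a copy of input_ids.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tokenizer("translate English to German: Hello.", return_tensors="pt")
labels = tokenizer("Hallo.", return_tensors="pt").input_ids

# Training-style call: the model shifts the labels right internally to build
# the decoder inputs; the first output element is the loss.
outputs = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
loss = outputs[0]

# Inference: generate() constructs decoder_input_ids token by token.
generated = model.generate(enc.input_ids, attention_mask=enc.attention_mask)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```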
3rd Bug: T5 currently cuts the `input_ids` to the last token when `past` is used. This is convenient for the user, but has the potential to lead to bugs, as mentioned here: #4368 (comment). It's not really in the spirit of the library to do magic under the hood that makes certain use cases easier for the user but prevents other edge cases, as shown in the issue above.

Feature request: support torchscript and ONNX. This PR makes it possible to use T5 with torchscript and ONNX.
Now the difficult part: This PR has breaking changes!
For one, all three bug fixes lead to breaking changes. Then, in order to solve the torchscript/ONNX problem we have with T5 (and actually with all Seq2Seq models), I had to change the positional ordering of T5's forward pass slightly. This should have minimal impact, because I doubt anybody has used T5 with positional arguments as follows: `tf_model(input_ids, None, None, decoder_input_ids)`.
We had a couple of issues, e.g. #5647, about supporting torchscript and ONNX for Bart/T5. If we ever want to support ONNX and torchscript in the future, I think we need to do this positional reordering. As shown by @mfuntowicz, ONNX can lead to great speed improvements, and we also know now that torchscript can give ~30% speed improvement on dynamic input sizes. => I would be really happy if we could accept this slight breaking change here.

I thought about this quite a bit and I think it's very important that we agree on ONE positional argument ordering for the forward pass of Seq2Seq models. At the moment the ordering of Bart, EncoderDecoder, T5, ... is not coherent and is done in a way that does not support ONNX and torchscript. At the moment no seq2seq model really supports torchscript (Bart passes the test, but one cannot provide `decoder_input_ids` when using torchscript, which effectively makes torchscript useless for inference). The ordering should be as follows IMO:

1. `input_ids`
2. `attention_mask`
3. `decoder_input_ids`
4. `decoder_attention_mask`
5. `encoder_outputs`
6. ...

meaning that all required inputs should come first to comply with ONNX and torchscript, and optional ones should come after. I changed the ordering of all seq2seq models to comply with this format, even though this means some positional-ordering breaking changes for `T5`, `Bart` and `EncoderDecoder`.
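To make the motivation concrete, here is a hedged tracing sketch under the new ordering (the checkpoint name, passing `torchscript=True`, and disabling the cache are illustrative assumptions, not code from this PR):

```python
# Sketch: once the required tensors (input_ids, attention_mask, decoder_input_ids)
# are the first positional arguments, a full inference call can be traced.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small", torchscript=True)
model.config.use_cache = False  # keep the traced outputs to plain tensors/tuples
model.eval()

enc = tokenizer("translate English to German: Hello.", return_tensors="pt")
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# Positional call with the new ordering: input_ids, attention_mask, decoder_input_ids.
traced = torch.jit.trace(model, (enc.input_ids, enc.attention_mask, decoder_input_ids))
traced(enc.input_ids, enc.attention_mask, decoder_input_ids)
```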
UPDATE:
- Added tests that the encoder-decoder forward signature stays the same
- Applied changes to all Seq2Seq models
- Cleaned docs
- Fixed TF slow tests