
[Seq2Seq] Fix a couple of bugs and clean examples #7474

Merged
merged 25 commits on Oct 1, 2020

Conversation

@patrickvonplaten (Contributor) commented Sep 30, 2020

What does this PR do?

IMPORTANT - BREAKING CHANGES

This PR changes the behavior of T5 and TFT5 due to 3 bugs and 1 small change in the forward API to support onnx and torchscript.
It also slightly changes the behavior of Bart and EncoderDecoderModel.

Description

1st Bug: Due to a sloppy review on my part, these lines got merged into the PyTorch T5 model: https://github.com/huggingface/transformers/pull/5518/files#r496058908, which set decoder_input_ids = input_ids if decoder_input_ids were not provided. This is misleading and also just wrong: decoder_input_ids should never be set to input_ids in T5. This is not done during training nor during inference, so these lines don't make much sense. Because T5 is mostly used either with .generate() or in training with model(input_ids=input_ids, labels=labels), in which case the change has no effect, we only received one issue about it recently: #7358. The change was made to get T5 working with onnx, but it is just wrong IMO.
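
For reference, a minimal sketch of the two supported usage patterns mentioned above (t5-small is just an example checkpoint); decoder_input_ids never needs to default to input_ids:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Training: passing labels lets T5 build decoder_input_ids internally by
# shifting the labels to the right.
input_ids = tokenizer("translate English to German: I love you.", return_tensors="pt").input_ids
labels = tokenizer("Ich liebe dich.", return_tensors="pt").input_ids
outputs = model(input_ids=input_ids, labels=labels)  # loss is part of the outputs

# Inference: generate() starts the decoder from the decoder start token.
generated_ids = model.generate(input_ids)
```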

2nd Bug: T5 was implemented with a small bug in the relative distance bias calculation for the cross-attention layer. It was spotted here: #7323. The correction leads to slightly different results when doing beam search. @sshleifer - if it's easy for you, could you maybe run a quick eval on WMT to see if BLEU improves on this PR?

3rd Bug: T5 currently cuts input_ids to the last token when past is used. This is a convenience for the user, but it has the potential to lead to bugs, as mentioned here: #4368 (comment). It's not really in the spirit of the library to do magic under the hood that makes certain use cases easier for the user but breaks other edge cases, as shown in the issue above.
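
To illustrate, here is a minimal sketch of incremental decoding after this change, assuming the t5-small checkpoint and the argument/output names of more recent library versions (past_key_values, outputs.logits); the exact cache argument name at the time of this PR may differ. The point is that the caller now slices the decoder inputs to the last token explicitly instead of relying on the model to do it under the hood:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer("translate English to German: Hello", return_tensors="pt").input_ids
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# First decoding step: no cache yet.
outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, use_cache=True)
next_token = outputs.logits[:, -1:].argmax(-1)

# Later steps: pass only the newly generated token together with the cache.
outputs = model(
    input_ids=input_ids,                     # re-encodes; one could pass encoder_outputs instead
    decoder_input_ids=next_token,            # the caller does the slicing explicitly
    past_key_values=outputs.past_key_values,
    use_cache=True,
)
```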

Feature request: support torchscript and onnx. This PR allows T5 to be used with torchscript and onnx.

Now the difficult part: This PR has breaking changes!

For one, all three bug fixes lead to breaking changes. Then, in order to solve the torchscript/onnx problem we are having with T5 (and actually with all Seq2Seq models), I had to change the positional ordering of T5's forward pass slightly, which should have minimal breaking impact because I doubt anybody has used T5 with positional arguments as follows:

`tf_model(input_ids, None, None, decoder_input_ids)`.

We had a couple of issues, e.g. #5647, about supporting torchscript and onnx for Bart/T5. If we ever want to support onnx and torchscript in the future, I think we need to do this positional reordering. As shown by @mfuntowicz, onnx can lead to great speed improvements, and we also know now that torchscript can give ~30% speed improvement on dynamic input sizes. => I would be really happy if we could accept this slight breaking change here.

I thought about this quite a bit, and I think it's very important that we agree on ONE positional argument ordering for the forward pass of Seq2Seq models. At the moment the ordering of Bart, EncoderDecoder, T5, ... is not coherent and is done in a way that does not support onnx and torchscript. Right now no seq2seq model really supports torchscript (Bart does in the test, but one cannot provide decoder_input_ids when using torchscript, which effectively makes torchscript useless for inference). The ordering should be as follows IMO:

input_ids
attention_mask
decoder_input_ids
decoder_attention_mask
encoder_outputs
...,

meaning that all required inputs should come first to comply with onnx and torchscript, and optional ones should come after.

I changed the ordering of all seq2seq models to comply with this format even though we have some positional ordering breaking changes for T5, Bart and EncoderDecoder.
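
A minimal sketch of what this ordering enables (checkpoint name and input shapes are placeholders): tracing needs positional tensor inputs, so the required tensors have to come first in the signature:

```python
import torch
from transformers import T5ForConditionalGeneration

# torchscript=True is the existing config flag that unties shared weights so
# the model can be traced.
model = T5ForConditionalGeneration.from_pretrained("t5-small", torchscript=True)
model.eval()

# Dummy inputs; shapes and token ids are placeholders.
input_ids = torch.ones(1, 8, dtype=torch.long)
attention_mask = torch.ones(1, 8, dtype=torch.long)
decoder_input_ids = torch.ones(1, 4, dtype=torch.long)

# Trace with positional example inputs following the proposed ordering:
# (input_ids, attention_mask, decoder_input_ids)
traced = torch.jit.trace(model, (input_ids, attention_mask, decoder_input_ids))
outputs = traced(input_ids, attention_mask, decoder_input_ids)
```

The same positional tuple is what torch.onnx.export takes as example inputs, which is why the required arguments have to lead the signature.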

UPDATE:

- Added tests that the encoder-decoder forward signature stays the same
- Applied changes to all Seq2Seq models
- Cleaned docs
- Fixed TF slow tests
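
A rough sketch of the kind of signature check added here; the helper name and the relative-order variant are mine (the actual common test asserts the first five argument names with assertListEqual), and checking only relative order keeps the snippet valid even if later versions insert extra arguments such as head_mask:

```python
import inspect

from transformers import BartForConditionalGeneration, T5ForConditionalGeneration

EXPECTED_ORDER = [
    "input_ids",
    "attention_mask",
    "decoder_input_ids",
    "decoder_attention_mask",
    "encoder_outputs",
]

def check_seq2seq_forward_order(model_class):
    # Read the forward signature and drop `self`.
    arg_names = [n for n in inspect.signature(model_class.forward).parameters if n != "self"]
    # The required seq2seq arguments must appear in the expected relative order.
    positions = [arg_names.index(name) for name in EXPECTED_ORDER]
    assert positions == sorted(positions), f"{model_class.__name__}: {arg_names[:6]}"

for cls in (T5ForConditionalGeneration, BartForConditionalGeneration):
    check_seq2seq_forward_order(cls)
```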

@patrickvonplaten patrickvonplaten changed the title [T5] Fix bugs and clean examples [T5] Fix a couple of bugs and clean examples Sep 30, 2020
@sgugger (Collaborator) left a comment:

I'm okay with those breaking changes as they are easy to fix: the user just has to reorder their inputs (and they should really use keyword arguments anyway, as it's more idiomatic Python), and supporting ONNX and torch.jit is definitely something we want. Thanks for doing the TF part with the annoying re-numbering!

Just remember to reorder the doc accordingly or I'll be grumpy ;-)

@thomwolf (Member) left a comment:

Yes this is good for me.

It's an important improvement that we already did once for encoder-only models.

@LysandreJik (Member) left a comment:

Yes, I agree that the breaking changes here are needed as well. Having the same API for all models is the #1 goal of the library, so it makes sense to uniformize the APIs of all seq2seq models now before we add more.

guillaume-be added a commit to guillaume-be/rust-bert that referenced this pull request Sep 30, 2020
guillaume-be added a commit to guillaume-be/rust-bert that referenced this pull request Sep 30, 2020
@patrickvonplaten patrickvonplaten changed the title [T5] Fix a couple of bugs and clean examples [Seq2Seq] Fix a couple of bugs and clean examples Sep 30, 2020
@sshleifer (Contributor) left a comment:

LGTM!

@sshleifer (Contributor) commented:

You got a little zero-shot BLEU boost it seems!

This Branch: 34.433 (on en-de)
Master: 34.4052

]
self.assertListEqual(arg_names[:5], expected_arg_names)
else:
expected_arg_names = ["input_ids"]
A collaborator commented:

Do we need the ordering to enable something for those models?

patrickvonplaten and others added 4 commits October 1, 2020 09:59
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@LysandreJik (Member) left a comment:

LGTM!

]
self.assertListEqual(arg_names[:5], expected_arg_names)
else:
expected_arg_names = ["input_ids"]
A member commented:

I think this should be enforced in most models, I like this test! For example, with RoBERTa it seems that all architectures follow input_ids with attention_mask, except RobertaForMultipleChoice. I see this more as a bug than something intentional.

Some models like GPT-2, however, should opt out, as they take the past key values as their second argument, which makes sense and is a defining trait of their API.

]
self.assertListEqual(arg_names[:5], expected_arg_names)
else:
expected_arg_names = ["input_ids"]
A member commented:

I would advocate for a case-by-case study of these models to see which should adopt the input_ids, attention_mask ordering, and which should stay the way they are.

@@ -187,16 +187,16 @@ def __init__(self, config):
# dividide.
self.pooling_mult = None

def init_attention_inputs(self, input_embeds, attention_mask=None, token_type_ids=None):
def init_attention_inputs(self, inputs_embeds, attention_mask=None, token_type_ids=None):
@patrickvonplaten (Contributor, Author) commented:

@sgugger - I renamed the variable here because it's always called inputs_embeds in the lib, and Funnel was the only model that had input_embeds at some point. It's just a nit because it's not exposed to the user, but I think it's better for consistency.

@@ -152,7 +175,12 @@ def test_saved_model_with_hidden_states_output(self):
tf.saved_model.save(model, tmpdirname)
model = tf.keras.models.load_model(tmpdirname)
outputs = model(inputs_dict)
output = outputs[list(outputs.keys())[-1]] if isinstance(outputs, dict) else outputs[-1]

if self.is_encoder_decoder:
@patrickvonplaten (Contributor, Author) commented:

@sgugger @LysandreJik @jplu - this slow test was failing for all models - fixed it here.

output = outputs[list(outputs.keys())[-1]] if isinstance(outputs, dict) else outputs[-1]

if self.is_encoder_decoder:
output = outputs["encoder_attentions"] if isinstance(outputs, dict) else outputs[-1]
@patrickvonplaten (Contributor, Author) commented:

@sgugger @LysandreJik @jplu - this slow test was failing for all models - fixed it here.

@patrickvonplaten patrickvonplaten merged commit 62f5ae6 into huggingface:master Oct 1, 2020
@jsrozner (Contributor) commented Nov 10, 2020:

@patrickvonplaten

Is the docstring still wrong? T5ForConditionalGeneration, under decoder_input_ids says "if both decoder_input_ids and decoder_inputs_embeds are both unset, decoder_input_ids takes the value of input_ids"

@patrickvonplaten (Contributor, Author) replied:
@patrickvonplaten

Is the docstring still wrong? T5ForConditionalGeneration, under decoder_input_ids says "if both decoder_input_ids and decoder_inputs_embeds are both unset, decoder_input_ids takes the value of input_ids"

I can't find this docstring, can you link to it?

@jsrozner (Contributor) commented Nov 11, 2020:

@patrickvonplaten
Is the docstring still wrong? T5ForConditionalGeneration, under decoder_input_ids says "if both decoder_input_ids and decoder_inputs_embeds are both unset, decoder_input_ids takes the value of input_ids"

I can't find this docstring, can you link to it?

In T5, it's under the forward method, argument decoder_input_ids. Here's the code link.

@patrickvonplaten (Contributor, Author) replied:
Oh yeah, you're right, that was probably a bad copy & paste! Do you feel like opening a PR to fix it / delete it? That would be amazing :-)

fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSMT

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Elektra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020