[Seq2Seq] Fix a couple of bugs and clean examples #7474
Conversation
I'm okay with those breaking changes as they are easy to fix: the user just has to reorder their inputs (and they should really use keyword arguments as it's more python idiomatic anyway) and supporting ONNX and torch.jit is definitely something we want. Thanks for doing the TF part with the annoying re-numbering!
Just remember to reorder the doc accordingly or I'll be grumpy ;-)
Yes this is good for me.
It's an important improvement that we already did once for encoder-only models.
Yes, I agree that the breaking changes here are needed as well. Having the same API for all models is the #1 goal of the library, so it makes sense to uniformize the APIs of all seq2seq models now before we add more.
LGTM!
You got a little zero-shot BLEU boost it seems! This branch: 34.433 (on en-de)
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
Do we need the ordering to enable something for those models?
LGTM!
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
I think this should be enforced in most models, I like this test! For example with RoBERTa, it seems that all architectures follow the `input_ids` then `attention_mask` ordering, except `RobertaForMultipleChoice`. I see this more as a bug than something intentional.
Some models like GPT-2, however, should opt out, as they use the past key values as their second argument, which makes sense and is a defining trait of their API.
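For illustration, here is a hedged sketch of the kind of signature-order check being discussed (the helper name and the exact expected argument list are assumptions, not the test added in this PR):

```python
# Sketch of a forward-signature ordering check; names are illustrative only.
import inspect


def assert_forward_ordering(model_class, is_encoder_decoder):
    signature = inspect.signature(model_class.forward)
    arg_names = [name for name in signature.parameters if name != "self"]
    if is_encoder_decoder:
        # Required seq2seq inputs should come first, in a fixed order.
        expected = ["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "encoder_outputs"]
        assert arg_names[: len(expected)] == expected
    else:
        # Encoder-only models (or opt-outs like GPT-2) only need to start with input_ids.
        assert arg_names[0] == "input_ids"
```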
    ]
    self.assertListEqual(arg_names[:5], expected_arg_names)
else:
    expected_arg_names = ["input_ids"]
I would advocate for a case-by-case study of these models to see which should have the ordering `input_ids` > `attention_mask`, and which should stay the way they are.
@@ -187,16 +187,16 @@ def __init__(self, config):
        # dividide.
        self.pooling_mult = None

-    def init_attention_inputs(self, input_embeds, attention_mask=None, token_type_ids=None):
+    def init_attention_inputs(self, inputs_embeds, attention_mask=None, token_type_ids=None):
@sgugger - I renamed the variable here because it's always called `inputs_embeds` in the lib and Funnel was the only model that had `input_embeds` at some point. It's just a nit because it's not exposed to the user, but I think it's better for consistency.
@@ -152,7 +175,12 @@ def test_saved_model_with_hidden_states_output(self):
    tf.saved_model.save(model, tmpdirname)
    model = tf.keras.models.load_model(tmpdirname)
    outputs = model(inputs_dict)
    output = outputs[list(outputs.keys())[-1]] if isinstance(outputs, dict) else outputs[-1]

    if self.is_encoder_decoder:
        output = outputs["encoder_attentions"] if isinstance(outputs, dict) else outputs[-1]
@sgugger @LysandreJik @jplu - this slow test was failing for all models - fixed it here.
Is the docstring still wrong? T5ForConditionalGeneration, under …
I can't find this docstring, can you link to it?
In T5, it's under method …
Oh yeah, you're right, that was probably a bad copy & paste! Do you feel like opening a PR to fix it / delete it? That would be amazing :-)
* clean T5
* fix t5 tests
* fix index typo
* fix tf common test
* fix examples
* change positional ordering for Bart and FSTM
* add signature test
* clean docs and add tests
* add docs to encoder decoder
* clean docs
* correct two doc strings
* remove sig test for TF Elektra & Funnel
* fix tf t5 slow tests
* fix input_ids to inputs in tf
* Update src/transformers/modeling_bart.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/modeling_bart.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* implement lysandre results
* make style
* fix encoder decoder typo
* fix tf slow tests
* fix slow tests
* renaming
* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
IMPORTANT - BREAKING CHANGES
This PR changes the behavior of T5 and TFT5 due to 3 bug fixes and 1 small change in the forward API to support ONNX and torchscript.
It also slightly changes the behavior of Bart and EncoderDecoderModel.
Description
1st Bug: Due to a sloppy review on my part, these lines got merged into the PyTorch T5 model: https://github.com/huggingface/transformers/pull/5518/files#r496058908, which set `decoder_input_ids = input_ids` if `decoder_input_ids` were not provided. This is misleading and also just wrong: `decoder_input_ids` should never be set to `input_ids` in T5. This is not done during training nor during inference, so these lines don't make much sense. Because T5 is mostly used either with `.generate()` or in training with `model(input_ids=input_ids, labels=labels)`, in which cases the change has no effect, we only received one issue about it recently: #7358. The change was made to get T5 working with ONNX, but it is just wrong IMO.

2nd Bug: T5 was implemented with a small bug in the relative distance bias calculation for the cross-attention layer. It was spotted here: #7323. The correction leads to slightly different results when doing beam search. @sshleifer - if it's easy for you, could you maybe run a quick eval on WMT to see if BLEU improves with this PR?
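For reference, a minimal usage sketch of the two supported T5 patterns (the `t5-small` checkpoint and the example strings are purely illustrative):

```python
# Minimal sketch: decoder_input_ids are either derived from the labels
# (training) or built step by step by generate() (inference) - they are
# never a copy of input_ids.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tokenizer("translate English to German: Hello.", return_tensors="pt")
labels = tokenizer("Hallo.", return_tensors="pt").input_ids

# Training-style call: the model shifts the labels right internally to build
# the decoder inputs; the first output element is the loss.
outputs = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
loss = outputs[0]

# Inference: generate() constructs decoder_input_ids token by token.
generated = model.generate(enc.input_ids, attention_mask=enc.attention_mask)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```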
3rd Bug: T5 currently cuts the `input_ids` to the last token when `past` is used. This is convenient for the user, but has the potential to lead to bugs, as mentioned here: #4368 (comment). It's not really in the spirit of the library to do magic under the hood that makes certain use cases easier for the user but prevents other edge cases, as shown in the issue above.

Feature request: support torchscript and ONNX. This PR makes it possible to use T5 with torchscript and ONNX.
Now the difficult part: This PR has breaking changes!
For one, all three bug fixes lead to breaking changes. Then, in order to solve the torchscript/ONNX problem we have with T5 (and actually with all Seq2Seq models), I had to change the positional ordering of T5's forward pass slightly. This should have minimal impact, because I doubt anybody has used T5 with positional arguments as follows: `tf_model(input_ids, None, None, decoder_input_ids)`.
We had a couple of issues, e.g. #5647, about supporting torchscript and ONNX for Bart/T5. If we ever want to support ONNX and torchscript in the future, I think we need to do this positional reordering. As shown by @mfuntowicz, ONNX can lead to great speed improvements, and we also know now that torchscript can give ~30% speed improvement on dynamic input sizes. => I would be really happy if we could accept this slight breaking change here.

I thought about this quite a bit and I think it's very important that we agree on ONE positional argument ordering for the forward pass of Seq2Seq models. At the moment the ordering of Bart, EncoderDecoder, T5, ... is not coherent and is done in a way that does not support ONNX and torchscript. At the moment no seq2seq model really supports torchscript (Bart passes the test, but one cannot provide `decoder_input_ids` when using torchscript, which effectively makes torchscript useless for inference). The ordering should be as follows IMO:

1. `input_ids`
2. `attention_mask`
3. `decoder_input_ids`
4. `decoder_attention_mask`
5. `encoder_outputs`
6. ...

meaning that all required inputs should come first to comply with ONNX and torchscript, and optional ones should come after. I changed the ordering of all seq2seq models to comply with this format, even though this means some positional-ordering breaking changes for `T5`, `Bart` and `EncoderDecoder`.
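To make the motivation concrete, here is a hedged tracing sketch under the new ordering (the checkpoint name, passing `torchscript=True`, and disabling the cache are illustrative assumptions, not code from this PR):

```python
# Sketch: once the required tensors (input_ids, attention_mask, decoder_input_ids)
# are the first positional arguments, a full inference call can be traced.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small", torchscript=True)
model.config.use_cache = False  # keep the traced outputs to plain tensors/tuples
model.eval()

enc = tokenizer("translate English to German: Hello.", return_tensors="pt")
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# Positional call with the new ordering: input_ids, attention_mask, decoder_input_ids.
traced = torch.jit.trace(model, (enc.input_ids, enc.attention_mask, decoder_input_ids))
traced(enc.input_ids, enc.attention_mask, decoder_input_ids)
```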
UPDATE:
- Added tests that the encoder-decoder forward signature stays the same
- Applied changes to all Seq2Seq models
- Cleaned docs
- Fixed TF slow tests