[TF Led] Fix wrong decoder attention mask behavior #9601
Conversation
if input_shape[-1] > 1:
    combined_attention_mask = _make_causal_mask(input_shape, past_key_values_length=past_key_values_length)
else:
    combined_attention_mask = _expand_mask(
        tf.ones((input_shape[0], input_shape[1] + past_key_values_length)), tgt_len=input_shape[-1]
    )

if inputs["attention_mask"] is None and inputs["input_ids"] is not None and input_shape[-1] > 1:
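For context, here is a minimal sketch of what a _make_causal_mask-style helper computes (illustrative only, not the exact library code): each decoder position may attend to itself and to earlier positions, and masked-out positions receive a large negative value that is added to the attention logits before the softmax.

import tensorflow as tf

LARGE_NEGATIVE = -1e9  # added to attention logits to disable a position

def make_causal_mask(tgt_len, past_key_values_length=0):
    # Allowed: key position j <= query position i (lower triangle).
    idx = tf.range(tgt_len)
    allowed = idx[None, :] <= idx[:, None]
    mask = tf.where(allowed, 0.0, tf.fill((tgt_len, tgt_len), LARGE_NEGATIVE))
    if past_key_values_length > 0:
        # Cached (past) key positions are always visible to every query.
        mask = tf.concat([tf.zeros((tgt_len, past_key_values_length)), mask], axis=-1)
    return mask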
I wrongly copied this from the old Bart templates. Those lines previously created a correct attention mask automatically whenever pad tokens were present. However, we do not support this behavior in the PyTorch version of LED and should not support it here either. If some inputs need padding, the tokenizer should take care of masking them.
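As a hedged illustration of the intended workflow (the checkpoint name and exact calls are assumptions, not part of this PR): let the tokenizer build the attention mask when padding, rather than relying on the model to infer it from pad tokens.

from transformers import LEDTokenizer, TFLEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")  # assumed checkpoint
model = TFLEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

# padding=True makes the tokenizer return the correct attention_mask;
# the model no longer tries to derive one from pad tokens itself.
batch = tokenizer(["a short input", "a somewhat longer input text"], padding=True, return_tensors="tf")
summary_ids = model.generate(batch["input_ids"], attention_mask=batch["attention_mask"])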
Thanks for fixing!
LGTM! Will do a patch release.
* fix tf led
* remove loop file
What does this PR do?
This PR fixes TF LED. I wrongly added some lines to TFLed that automatically change the attention mask. However, this is incorrect behavior and is not present in the PyTorch version of the model. Sadly, I discovered this only now, after yesterday's release. @LysandreJik do you think we can ship this fix in a patch release to avoid breaking backward compatibility (though it's a bug IMO anyway)?
This consequently also fixes the flaky led_pt_tf_equivalence test. I ran the test 40 times and it does not fail anymore.
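For reference, one way to repeat such a flakiness check (the test file path and test name filter below are assumptions, not taken from this PR):

import subprocess

# Run the PT/TF equivalence test repeatedly; check=True raises on the first failure.
for _ in range(40):
    subprocess.run(
        ["pytest", "tests/test_modeling_tf_led.py", "-k", "pt_tf_model_equivalence"],
        check=True,
    )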
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.