[TF Led] Fix wrong decoder attention mask behavior #9601

patrickvonplaten · 2021-01-14T17:32:25Z

What does this PR do?

This PR fixes TF LED. I wrongly added some lines to TFLed that automatically change the attention mask. This is incorrect behavior and is not present in the PT version of the model. Sadly, I only discovered this now, after yesterday's release. @LysandreJik, do you think we can ship this fix in a patch release, even though it changes behavior? (IMO it's a bug anyway.)

As a consequence, this also fixes the flaky led_pt_tf_equivalence test. I ran the test 40 times and it no longer fails.
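To make the described behavior concrete, here is a minimal pure-Python sketch of the difference between inferring a mask from pad tokens and leaving a missing mask alone. It is not the library code: `implicit_mask`, `explicit_mask`, and the value `PAD_TOKEN_ID = 1` are hypothetical names and values chosen for illustration. The link to the flaky test (input ids that happen to equal the pad id) is an assumption and is not stated in the PR.

```python
from typing import List, Optional

Mask = List[List[int]]

PAD_TOKEN_ID = 1  # assumed value, for illustration only


def implicit_mask(input_ids: List[List[int]],
                  attention_mask: Optional[Mask] = None,
                  pad_token_id: int = PAD_TOKEN_ID) -> Mask:
    """Sketch of the removed-style behavior: if no mask is passed,
    silently mask every position whose id equals the pad token id."""
    if attention_mask is not None:
        return attention_mask
    return [[0 if tok == pad_token_id else 1 for tok in row] for row in input_ids]


def explicit_mask(input_ids: List[List[int]],
                  attention_mask: Optional[Mask] = None) -> Mask:
    """Sketch of the intended behavior: never infer padding from the ids;
    a missing mask is treated as 'no padding'."""
    if attention_mask is not None:
        return attention_mask
    return [[1] * len(row) for row in input_ids]


# The second token equals the pad id but is a real token, not padding.
ids = [[5, 1, 7, 9]]
print(implicit_mask(ids))  # [[1, 0, 1, 1]]
print(explicit_mask(ids))  # [[1, 1, 1, 1]]
```

If one framework's model infers the mask and the other does not, the two can disagree on inputs like `ids` above, which is one plausible way an equivalence test could fail intermittently.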

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

patrickvonplaten · 2021-01-14T17:34:35Z

src/transformers/models/led/modeling_tf_led.py

        if input_shape[-1] > 1:
            combined_attention_mask = _make_causal_mask(input_shape, past_key_values_length=past_key_values_length)
        else:
            combined_attention_mask = _expand_mask(
                tf.ones((input_shape[0], input_shape[1] + past_key_values_length)), tgt_len=input_shape[-1]
            )

-        if inputs["attention_mask"] is None and inputs["input_ids"] is not None and input_shape[-1] > 1:


I wrongly copied this from the old Bart templates. Those lines previously created an attention mask automatically whenever the inputs contained pad tokens. However, we do not support this behavior in the PyTorch version of LED, and we should not support it here either. If some inputs need to be padded, the tokenizer should take care of masking them.
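As a rough illustration of masking being decided at padding time, the sketch below pads a batch and builds the matching attention mask in one step. `pad_batch` is a hypothetical helper written for this example, not a library function, and the pad id of 1 is an assumed value. In practice, Hugging Face tokenizers return an `attention_mask` alongside `input_ids` when called with `padding=True`.

```python
from typing import Dict, List


def pad_batch(sequences: List[List[int]], pad_token_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the longest one and record which
    positions are real tokens (1) and which are padding (0)."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# The first sequence contains a real token whose id equals the pad id.
batch = pad_batch([[5, 1, 7], [8, 2]], pad_token_id=1)
print(batch["input_ids"])       # [[5, 1, 7], [8, 2, 1]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

Because the mask is built from the sequence lengths rather than from the token ids, the real token in the first sequence stays attended and only the appended padding is masked.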

sgugger

Thanks for fixing!

LysandreJik

LGTM! Will do a patch release.

* fix tf led * remove loop file

fix tf led

576cc2c

patrickvonplaten commented Jan 14, 2021

remove loop file

993ec8a

patrickvonplaten requested review from LysandreJik and sgugger January 14, 2021 21:52

sgugger approved these changes Jan 15, 2021

LysandreJik approved these changes Jan 15, 2021

LysandreJik merged commit 90ca8d3 into huggingface:master Jan 15, 2021

LysandreJik pushed a commit that referenced this pull request Jan 21, 2021

[TF Led] Fix wrong decoder attention mask behavior (#9601)

21d4595

* fix tf led * remove loop file
