Attention mask is important in the case of batching... #16222
Conversation
The documentation is not available anymore as the PR was closed or merged.
src/transformers/pipelines/base.py
In random models, special_tokens_mask would be extended in the batch with 0 instead of 1, so we could still end up predicting the PAD token in the pipeline.
I think having pad always be marked in special_tokens_mask is fine.
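For context, a minimal sketch of padding a batch by hand (not the actual pipeline code; `pad_batch` is a made-up helper) showing why the fill values matter for each tensor:

```python
import torch

# Hypothetical helper, for illustration only: pad a list of encodings to a common length.
def pad_batch(encodings, pad_token_id):
    max_len = max(e["input_ids"].shape[0] for e in encodings)

    def pad(tensor, value):
        return torch.nn.functional.pad(tensor, (0, max_len - tensor.shape[0]), value=value)

    return {
        # Pad positions get the tokenizer's pad_token_id.
        "input_ids": torch.stack([pad(e["input_ids"], pad_token_id) for e in encodings]),
        # Filled with 0 so the model does not attend to pad positions.
        "attention_mask": torch.stack([pad(e["attention_mask"], 0) for e in encodings]),
        # Filled with 1 (not 0) so pad positions count as special tokens and are never predicted on.
        "special_tokens_mask": torch.stack([pad(e["special_tokens_mask"], 1) for e in encodings]),
    }
```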
return_attention_mask=True is also incorrect because FNet doesn't expect an attention mask.
Maybe FNet will continue to exhibit the flaw where pad tokens modify the output; I don't know enough about it, though.
You will thus get the attention mask since you don't remove it afterward, but I'm guessing that's the whole point?
Actually, it seems the FNet tokenizer doesn't return an attention mask if we don't ask for it (which is fair, since the model doesn't seem to accept one).
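A quick way to check this (just a sketch; it assumes the google/fnet-base checkpoint, which is not part of this PR):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/fnet-base")
enc = tok(["short", "a noticeably longer sentence"], padding=True)
# Expect only input_ids and token_type_ids here; attention_mask only shows up
# if return_attention_mask=True is passed explicitly.
print(enc.keys())
```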
sgugger left a comment
LGTM, thanks for fixing!
Force-pushed from a6bf0fc to e0bc450.
* Attention mask is important in the case of batching...
* Improve the fix.
* Making the sentences different enough that they exhibit different predictions.
What does this PR do?
Fixes #16221
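For readers landing here from the issue, a hedged sketch of the symptom (the checkpoint is only an example and is not taken from this PR): batching a short sentence with a longer one should not change its logits, but without an attention mask the pad tokens leak into the computation.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

alone = tok("I love this!", return_tensors="pt")
batch = tok(
    ["I love this!", "This is a much longer sentence that forces the first one to be padded."],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    ref = model(**alone).logits[0]
    with_mask = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]).logits[0]
    without_mask = model(input_ids=batch["input_ids"]).logits[0]

print(torch.allclose(ref, with_mask, atol=1e-4))     # True: pad positions are masked out
print(torch.allclose(ref, without_mask, atol=1e-4))  # typically False: pads changed the output
```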
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.