
Conversation

Contributor

@Narsil Narsil commented Mar 17, 2022

What does this PR do?

Fixes #16221


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


HuggingFaceDocBuilderDev commented Mar 17, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor Author

With random models, the special_tokens_mask would be extended in the batch with 0 instead of 1, so the pipeline could still predict the PAD token.

I think always treating pad as part of the special_tokens_mask is fine.
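
To make the padding point concrete, here is a minimal sketch (not part of this PR; `bert-base-cased` is just an assumed checkpoint) showing what `special_tokens_mask` looks like after batching. If the pad positions are marked with 1, the pipeline can drop them the same way it drops `[CLS]`/`[SEP]`; if they were 0, `[PAD]` could end up being "predicted":

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Two sentences of different lengths force padding in the batch.
enc = tokenizer(
    ["Hello Sarah Jessica Parker", "Hi"],
    padding=True,
    return_special_tokens_mask=True,
)

for ids, mask in zip(enc["input_ids"], enc["special_tokens_mask"]):
    tokens = tokenizer.convert_ids_to_tokens(ids)
    # Pad positions marked with 1 get filtered out alongside [CLS]/[SEP];
    # pad positions marked with 0 would survive post-processing.
    print(list(zip(tokens, mask)))
```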

Contributor Author

return_attention_mask=True is also incorrect because FNet doesn't expect an attention mask.

Contributor Author

FNet may still exhibit the flaw that pad tokens modify the output; I don't know enough about it to say, though.
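
As a sketch of how one could check that (not from this PR; `google/fnet-base` is an assumed checkpoint name, and the token-classification head loaded from it is randomly initialized), compare logits for the same sentence run alone and run padded inside a batch:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "google/fnet-base"  # assumed checkpoint, only the padding check matters
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)
model.eval()

sentence = "Hello Sarah Jessica Parker who lives in New York"

with torch.no_grad():
    single = model(**tokenizer(sentence, return_tensors="pt")).logits
    # Batch the same sentence with a much shorter one so it gets padded.
    batched = model(
        **tokenizer([sentence, "Hi"], padding=True, return_tensors="pt")
    ).logits

seq_len = single.shape[1]
# If the model mixes pad positions into the sequence (no attention mask to
# exclude them), these logits differ; for masked-attention models they match.
print(torch.allclose(single[0], batched[0, :seq_len], atol=1e-4))
```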

Collaborator

You will thus get the attention mask since you don't remove it afterward, but I'm guessing that's the whole point?

Contributor Author

Actually, it seems the FNet tokenizer doesn't return an attention mask unless we ask for it (which is fair, since the model doesn't seem to accept one).
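
A quick way to verify which inputs a tokenizer produces by default (a sketch, not part of this PR; `google/fnet-base` is an assumed checkpoint name) is to look at its `model_input_names` and at the keys of a plain encoding:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/fnet-base")

# Which inputs this tokenizer is configured to build for its model.
print(tokenizer.model_input_names)

# Keys returned without explicitly requesting an attention mask.
print(tokenizer("Hello Sarah Jessica Parker").keys())

# Keys when we do ask for one explicitly.
print(tokenizer("Hello Sarah Jessica Parker", return_attention_mask=True).keys())
```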

@Narsil Narsil requested a review from sgugger March 17, 2022 11:54
Collaborator

@sgugger sgugger left a comment

LGTM, thanks for fixing!


@Narsil Narsil force-pushed the fix_attention_mask_in_token_classif_pipeline branch from a6bf0fc to e0bc450 on March 18, 2022 08:01
@Narsil Narsil merged commit ecb4662 into huggingface:master Mar 18, 2022
@Narsil Narsil deleted the fix_attention_mask_in_token_classif_pipeline branch March 18, 2022 09:02
FrancescoSaverioZuppichini pushed a commit that referenced this pull request Mar 24, 2022
* Attention mask is important in the case of batching...

* Improve the fix.

* Making the sentences different enough that they exhibit different predictions.

Development

Successfully merging this pull request may close these issues.

token classification pipeline does not use attention mask, affects predictions
