
GPT2 model does not have attention mask #808

Closed
Saner3 opened this issue Jul 17, 2019 · 5 comments

@Saner3

Saner3 commented Jul 17, 2019

Hello, in the docstring of the GPT2 model, it says there is an optional input called attention_mask to avoid computing attention on padding. But I cannot actually find the implementation, and there is no such argument either.

@thomwolf
Member

Indeed, I will remove this docstring; there is no attention_mask on GPT-2.

@Saner3
Author

Saner3 commented Jul 17, 2019

> Indeed, I will remove this docstring; there is no attention_mask on GPT-2.

But what should I do if I do want to avoid computing attention on the padding in the input sequences?

@mehdimashayekhi

@Saner3 @thomwolf I have the same question. Don't we need that for padding?

@thomwolf
Member

GPT-2 is a model with absolute position embeddings (like BERT), so you should always pad on the right to get the best performance with this model (I will add this information to the docstring).

Since it's a causal model (it only attends to the left context), this also means that the model will never attend to the padding tokens (which are on the right) from any real token anyway.

So in conclusion, there is no need to take special care to avoid attention on padding.

Just don't use the outputs at the padded positions for anything, as they don't contain any reliable information (which is obvious, I hope).
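For anyone landing here later, here is a minimal sketch (not from this thread, and using the current `transformers` API) of what that advice looks like in practice: right-pad each sequence (the choice of the eos token as the pad id is arbitrary here), run the model without any attention mask, and only read the hidden states at real, non-padded positions.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

sentences = ["Hello, my dog is cute", "Hi"]
encoded = [tokenizer.encode(s) for s in sentences]
lengths = [len(ids) for ids in encoded]
max_len = max(lengths)

# Right-pad with the eos token id (an arbitrary choice); because GPT-2 is
# causal, real tokens never attend to positions to their right.
pad_id = tokenizer.eos_token_id
input_ids = torch.tensor(
    [ids + [pad_id] * (max_len - len(ids)) for ids in encoded]
)

with torch.no_grad():
    hidden_states = model(input_ids)[0]  # (batch, max_len, hidden_size)

# Only use outputs at real positions, e.g. the last real token of each sequence;
# outputs at padded positions are ignored.
last_real = torch.stack(
    [hidden_states[i, n - 1] for i, n in enumerate(lengths)]
)
```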

@mehdimashayekhi

@thomwolf thanks much, and great job!
