GPT2 model does not have attention mask #808
Comments
Indeed, I will remove this docstring; there is no attention_mask on GPT-2.
But what should I do if I want to avoid computing attention on the padding in the input sequences?
GPT-2 is a model with absolute position embeddings (like BERT), so you should always pad on the right to get the best performance from this model (I will add this information to the docstring). Since it is a causal model (it only attends to the left context), this also means that the model will never attend to the padding tokens (which are on the right) from any real token anyway. So, in conclusion, there is no need to take special care to avoid attention on the padding. Just don't use the outputs at the padded positions for anything, since they don't contain any reliable information (which I hope is obvious).
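A minimal sketch of that right-padding recipe, assuming the library's GPT2Tokenizer and GPT2Model classes (imported from transformers here; the import path differed in the pytorch-transformers era). The padding id and variable names are illustrative, and the value used for padding does not matter because the padded positions are simply discarded:

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

texts = ["a short sentence", "a somewhat longer example sentence for padding"]
encoded = [tokenizer.encode(t) for t in texts]
lengths = [len(ids) for ids in encoded]
max_len = max(lengths)

# Pad on the RIGHT: GPT-2 uses absolute position embeddings, and because it is
# causal (left-to-right), real tokens never attend to the right-side padding.
pad_id = 0  # arbitrary filler; outputs at these positions are discarded below
input_ids = torch.tensor(
    [ids + [pad_id] * (max_len - len(ids)) for ids in encoded]
)

with torch.no_grad():
    hidden_states = model(input_ids)[0]  # (batch, max_len, hidden_size)

# Only keep outputs at real-token positions; outputs at padded positions
# carry no reliable information and should simply be ignored.
last_token_states = torch.stack(
    [hidden_states[i, lengths[i] - 1] for i in range(len(texts))]
)
print(last_token_states.shape)  # (batch, hidden_size)
```

Because the padding sits to the right of every real token, the causal mask already keeps it from influencing the real positions, which is why no explicit attention_mask is needed here.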
@thomwolf Thanks so much, and great job!
Hello, in the docstring of the GPT2 model, it says there is an optional input called attention_mask to avoid computing attention on padding. But I cannot find the implementation, and there is no such argument either.