Why the RoBERTa's max_position_embeddings size is 512+2=514? #1363

Closed
kugwzk opened this issue Sep 28, 2019 · 9 comments

kugwzk commented Sep 28, 2019

❓ Questions & Help

While reading the RoBERTa code, I have a question about padding_idx = 1: I don't understand it very well, and the comment in the code is still confusing to me.

julien-c (Member) commented:

What's your precise question?

kugwzk commented Sep 28, 2019

> What's your precise question?

The meaning of self.padding_idx in modeling_roberta.py.

BramVanroy (Collaborator) commented:

It's the position of the padding vector. It's not unique to RoBERTa but far more general, especially for embeddings. Take a look at the PyTorch documentation.
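
For anyone landing here later, a minimal sketch of what padding_idx does for a plain torch.nn.Embedding (generic PyTorch, not the transformers code itself): the row at padding_idx is initialized to zeros and its gradient stays zero, so padding tokens contribute nothing.

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=1)
print(emb.weight[1])                 # all zeros: the row reserved for padding
ids = torch.tensor([[0, 5, 1, 1]])   # the trailing 1s are padding tokens
out = emb(ids)
print(out[0, 2])                     # zeros: padded positions contribute nothing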


kugwzk commented Sep 28, 2019

> It's the position of the padding vector. It's not unique to RoBERTa but far more general, especially for embeddings. Take a look at the PyTorch documentation.

I know that, but I'm confused about why it is 1 while <s> is 0. Is the padding position ignored? And why is the max_position_embeddings size 512+2=514?

BramVanroy (Collaborator) commented:

Because that's their index in the vocab. The max_position_embeddings size is indeed 514; I'm not sure why. The tokenizer seems to handle text correctly with a max of 512. Perhaps one of the developers can help with that. I would advise you to change the title of your topic.

self.max_len_single_sentence = self.max_len - 2 # take into account special tokens
self.max_len_sentences_pair = self.max_len - 4 # take into account special tokens
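
A hedged illustration of where those 2 and 4 come from, assuming the transformers library and the public "roberta-base" checkpoint: a single sentence gets <s> and </s> (2 special tokens), while a sentence pair is encoded as <s> A </s></s> B </s> (4 special tokens).

from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("roberta-base")
print(tok.num_special_tokens_to_add(pair=False))  # 2  -> <s> ... </s>
print(tok.num_special_tokens_to_add(pair=True))   # 4  -> <s> A </s></s> B </s>
ids = tok.encode("Hello world")                   # special tokens added by default
print(tok.convert_ids_to_tokens(ids))             # e.g. ['<s>', 'Hello', 'Ġworld', '</s>']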

kugwzk changed the title from "What is the mean of padding_idx in modeling_roberta?" to "Why the RoBERTa's max_position_embeddings size is 512+2=514?" Sep 28, 2019
julien-c (Member) commented:

@LysandreJik can chime in if I'm wrong, but as far as I know max_position_embeddings is just the name of the variable we use to encode the size of the embedding matrix. max_len is correctly set to 512.
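
A quick way to see the distinction being described here, assuming a recent transformers version (where the tokenizer attribute is now called model_max_length rather than max_len):

from transformers import RobertaConfig, RobertaTokenizer

config = RobertaConfig.from_pretrained("roberta-base")
tok = RobertaTokenizer.from_pretrained("roberta-base")
print(config.max_position_embeddings)  # 514: rows in the position-embedding matrix
print(tok.model_max_length)            # 512: longest input the tokenizer accepts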

stale bot commented Nov 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 27, 2019
stale bot closed this as completed Dec 4, 2019
morganmcg1 (Contributor) commented:

Answer here in case anyone from the future is curious: facebookresearch/fairseq#1187
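
In short, the linked issue explains that fairseq (and hence RoBERTa) computes position ids with an offset of padding_idx + 1, so a 512-token input needs position indices up to 513 and therefore a 514-row embedding table. A small sketch of that offset logic (my own illustration, mirroring what modeling_roberta.py does):

import torch

def make_position_ids(input_ids, padding_idx=1):
    mask = input_ids.ne(padding_idx).long()       # 0 at padding positions
    positions = torch.cumsum(mask, dim=1) * mask  # 1, 2, 3, ... for real tokens
    return positions + padding_idx                # shift everything past padding_idx

input_ids = torch.tensor([[0, 42, 7, 2, 1, 1]])   # <s> ... </s> <pad> <pad>
print(make_position_ids(input_ids))               # tensor([[2, 3, 4, 5, 1, 1]])
# With 512 real tokens the largest position id is 512 + 1 = 513,
# so the position-embedding table needs 512 + 2 = 514 rows.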

dsantiago commented Mar 17, 2021

> Answer here in case anyone from the future is curious: pytorch/fairseq#1187

@morganmcg1 Thanks for this. I was getting all kinds of CUDA errors because I had set max_position_embeddings=512; now that I've set it to 514, it's running fine.
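
For anyone hitting the same CUDA asserts, a hedged example of the fix described here, using a from-scratch RobertaConfig (recent transformers versions expose the position table at model.embeddings.position_embeddings): with the padding_idx + 1 offset, position ids can index past a 512-row table, which shows up as device-side assertion errors on GPU.

from transformers import RobertaConfig, RobertaModel

config = RobertaConfig(
    vocab_size=50265,
    max_position_embeddings=514,  # 512 usable positions + padding_idx + 1
    pad_token_id=1,
)
model = RobertaModel(config)
print(model.embeddings.position_embeddings)  # e.g. Embedding(514, 768, padding_idx=1)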
