Why the RoBERTa's max_position_embeddings size is 512+2=514? #1363

Closed
kugwzk opened this issue Sep 28, 2019 · 9 comments

kugwzk commented Sep 28, 2019

❓ Questions & Help

While reading the RoBERTa code, I have a question about padding_idx = 1: I don't understand it very well, and the comment in the code is still confusing to me.

julien-c (Member) commented:

What's your precise question?

kugwzk commented Sep 28, 2019

> What's your precise question?

The meaning of self.padding_idx in modeling_roberta.py.

BramVanroy (Collaborator) commented:

It's the position of the padding vector. It's not unique to RoBERTa but far more general, especially for embeddings. Take a look at the PyTorch documentation.
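
For anyone landing here later, a minimal sketch of what padding_idx does for a plain torch.nn.Embedding (generic PyTorch, not the transformers code itself): the row at padding_idx is initialized to zeros and its gradient stays zero, so padding tokens contribute nothing.

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=1)
print(emb.weight[1])                 # all zeros: the row reserved for padding
ids = torch.tensor([[0, 5, 1, 1]])   # the trailing 1s are padding tokens
out = emb(ids)
print(out[0, 2])                     # zeros: padded positions contribute nothing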


kugwzk commented Sep 28, 2019

> It's the position of the padding vector. It's not unique to RoBERTa but far more general, especially for embeddings. Take a look at the PyTorch documentation.

I know that, but I'm confused about why it is 1 while <s> is 0. Is the padding position ignored? And why is the max_position_embeddings size 512+2=514?

BramVanroy (Collaborator) commented:

Because that's their index in the vocab. The max_position_embeddings size is indeed 514; I'm not sure why. The tokenizer seems to handle text correctly with a max of 512. Perhaps one of the developers can help with that. I would advise you to change the title of your topic.

self.max_len_single_sentence = self.max_len - 2 # take into account special tokens
self.max_len_sentences_pair = self.max_len - 4 # take into account special tokens
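
A hedged illustration of where those 2 and 4 come from, assuming the transformers library and the public "roberta-base" checkpoint: a single sentence gets <s> and </s> (2 special tokens), while a sentence pair is encoded as <s> A </s></s> B </s> (4 special tokens).

from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("roberta-base")
print(tok.num_special_tokens_to_add(pair=False))  # 2  -> <s> ... </s>
print(tok.num_special_tokens_to_add(pair=True))   # 4  -> <s> A </s></s> B </s>
ids = tok.encode("Hello world")                   # special tokens added by default
print(tok.convert_ids_to_tokens(ids))             # e.g. ['<s>', 'Hello', 'Ġworld', '</s>']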

kugwzk changed the title from "What is the mean of padding_idx in modeling_roberta?" to "Why the RoBERTa's max_position_embeddings size is 512+2=514?" Sep 28, 2019
julien-c (Member) commented:

@LysandreJik can chime in if I'm wrong, but as far as I know max_position_embeddings is just the name of the variable we use to encode the size of the embedding matrix. max_len is correctly set to 512.
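
A quick way to see the distinction being described here, assuming a recent transformers version (where the tokenizer attribute is now called model_max_length rather than max_len):

from transformers import RobertaConfig, RobertaTokenizer

config = RobertaConfig.from_pretrained("roberta-base")
tok = RobertaTokenizer.from_pretrained("roberta-base")
print(config.max_position_embeddings)  # 514: rows in the position-embedding matrix
print(tok.model_max_length)            # 512: longest input the tokenizer accepts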

stale bot commented Nov 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 27, 2019
stale bot closed this as completed Dec 4, 2019
morganmcg1 (Contributor) commented:

Answer here in case anyone from the future is curious: facebookresearch/fairseq#1187
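
In short, the linked issue explains that fairseq (and hence RoBERTa) computes position ids with an offset of padding_idx + 1, so a 512-token input needs position indices up to 513 and therefore a 514-row embedding table. A small sketch of that offset logic (my own illustration, mirroring what modeling_roberta.py does):

import torch

def make_position_ids(input_ids, padding_idx=1):
    mask = input_ids.ne(padding_idx).long()       # 0 at padding positions
    positions = torch.cumsum(mask, dim=1) * mask  # 1, 2, 3, ... for real tokens
    return positions + padding_idx                # shift everything past padding_idx

input_ids = torch.tensor([[0, 42, 7, 2, 1, 1]])   # <s> ... </s> <pad> <pad>
print(make_position_ids(input_ids))               # tensor([[2, 3, 4, 5, 1, 1]])
# With 512 real tokens the largest position id is 512 + 1 = 513,
# so the position-embedding table needs 512 + 2 = 514 rows.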

dsantiago commented Mar 17, 2021

> Answer here in case anyone from the future is curious: pytorch/fairseq#1187

@morganmcg1 Thanks for this. I was getting all kinds of CUDA errors because I had set max_position_embeddings=512; now that I've set it to 514, it's running fine.
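
For anyone hitting the same CUDA asserts, a hedged example of the fix described here, using a from-scratch RobertaConfig (recent transformers versions expose the position table at model.embeddings.position_embeddings): with the padding_idx + 1 offset, position ids can index past a 512-row table, which shows up as device-side assertion errors on GPU.

from transformers import RobertaConfig, RobertaModel

config = RobertaConfig(
    vocab_size=50265,
    max_position_embeddings=514,  # 512 usable positions + padding_idx + 1
    pad_token_id=1,
)
model = RobertaModel(config)
print(model.embeddings.position_embeddings)  # e.g. Embedding(514, 768, padding_idx=1)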
