Do CLS and SEP need to have labels separate from your training labels?
Yes, these should be labelled as -100.
Does PAD need a separate label?
No separate label is needed; PAD should also be labelled as -100, the same as CLS and SEP.
Does the attention mask basically ignore the effect of those things with attention mask 0, or is it still important?
Yes, it does. You should set the attention mask to zero for PAD, but not for CLS and SEP, because those tokens can carry important information about the training items. The idea is (see the sketch after this list):
for PAD: don't attend and don't compute loss
for CLS/SEP: attend but don't compute loss
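A minimal sketch of that idea, assuming a Hugging Face fast tokenizer (the model name and the toy words/labels below are placeholders, not from this issue): positions that map to no word (CLS, SEP, PAD) get label -100, while the attention mask is zero only for PAD.

```python
import torch
from transformers import AutoTokenizer

# Any fast tokenizer works the same way; "bert-base-cased" is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

words = ["John", "lives", "in", "Berlin"]
word_labels = [1, 0, 0, 3]  # e.g. B-PER, O, O, B-LOC under some label scheme

encoding = tokenizer(
    words,
    is_split_into_words=True,
    padding="max_length",
    max_length=12,
    return_tensors="pt",
)

labels = []
for word_id in encoding.word_ids(batch_index=0):
    if word_id is None:
        # CLS, SEP and PAD map to no word -> ignore in the loss
        labels.append(-100)
    else:
        labels.append(word_labels[word_id])

labels = torch.tensor([labels])

print(encoding["attention_mask"])  # 0 only at PAD positions
print(labels)                      # -100 at CLS, SEP and PAD positions
```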
There is a thread about this on the Hugging Face forums.
Using -100 for the CLS and SEP tokens is required because PyTorch's cross-entropy loss ignores that index by default (ignore_index=-100). You can apply the same label to subword tokens as well if you want them ignored.
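A small sketch of why -100 works, using plain PyTorch with made-up logits and labels: positions labelled -100 contribute nothing to the loss, and the mean is taken only over the remaining positions.

```python
import torch
import torch.nn as nn

# Toy logits for 5 token positions and 4 labels (shapes are illustrative only).
logits = torch.randn(5, 4)
labels = torch.tensor([-100, 1, 0, 3, -100])  # CLS/SEP/PAD positions marked -100

# CrossEntropyLoss ignores index -100 by default (ignore_index=-100),
# so the first and last positions contribute nothing to the loss.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels)

# Equivalent to averaging the loss over only the non-ignored positions.
mask = labels != -100
manual = nn.functional.cross_entropy(logits[mask], labels[mask])
assert torch.allclose(loss, manual)
```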