Do CLS and SEP need to have labels separate from your training labels?
Yes, these should be labelled as -100.
Does PAD need a separate label?
No separate label is needed; PAD should also be labelled as -100, the same as CLS and SEP.
Does the attention mask basically ignore the effect of those things with attention mask 0, or is it still important?
Yes, it does. You should set the attention mask to zero for PAD, but not for CLS and SEP, because those tokens can carry important information about the training items. The idea is (see the sketch after this list):
for PAD: don't attend and don't compute loss
for CLS/SEP: attend but don't compute loss
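A minimal sketch of that idea, assuming a Hugging Face fast tokenizer (the model name and the toy words/labels below are placeholders, not from this issue): positions that map to no word (CLS, SEP, PAD) get label -100, while the attention mask is zero only for PAD.

```python
import torch
from transformers import AutoTokenizer

# Any fast tokenizer works the same way; "bert-base-cased" is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

words = ["John", "lives", "in", "Berlin"]
word_labels = [1, 0, 0, 3]  # e.g. B-PER, O, O, B-LOC under some label scheme

encoding = tokenizer(
    words,
    is_split_into_words=True,
    padding="max_length",
    max_length=12,
    return_tensors="pt",
)

labels = []
for word_id in encoding.word_ids(batch_index=0):
    if word_id is None:
        # CLS, SEP and PAD map to no word -> ignore in the loss
        labels.append(-100)
    else:
        labels.append(word_labels[word_id])

labels = torch.tensor([labels])

print(encoding["attention_mask"])  # 0 only at PAD positions
print(labels)                      # -100 at CLS, SEP and PAD positions
```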
There is a thread about this on the Hugging Face forums.
Using -100 for the CLS and SEP tokens is required because PyTorch's cross-entropy loss ignores that index by default (ignore_index=-100). You can apply the same label to subword tokens as well if you want them ignored.
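A small sketch of why -100 works, using plain PyTorch with made-up logits and labels: positions labelled -100 contribute nothing to the loss, and the mean is taken only over the remaining positions.

```python
import torch
import torch.nn as nn

# Toy logits for 5 token positions and 4 labels (shapes are illustrative only).
logits = torch.randn(5, 4)
labels = torch.tensor([-100, 1, 0, 3, -100])  # CLS/SEP/PAD positions marked -100

# CrossEntropyLoss ignores index -100 by default (ignore_index=-100),
# so the first and last positions contribute nothing to the loss.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels)

# Equivalent to averaging the loss over only the non-ignored positions.
mask = labels != -100
manual = nn.functional.cross_entropy(logits[mask], labels[mask])
assert torch.allclose(loss, manual)
```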