Replace pad_token with -100 for LM loss calculation #4718
445fda1 to 6af7942
Codecov Report
@@            Coverage Diff             @@
##           master    #4718      +/-   ##
==========================================
- Coverage   77.10%   76.45%   -0.65%
==========================================
  Files         128      128
  Lines       21723    21725       +2
==========================================
- Hits        16749    16610     -139
- Misses       4974     5115     +141
|
LysandreJik left a comment
This is correct! Thanks @setu4993!
|
Closed the PR by mistake... Re-opening it.
|
Just a quick question for @mfuntowicz, is |
|
I saw |
|
Bumping this since I haven't seen any activity in a few days.
|
Yes
6af7942 to 6cceabd
|
LGTM, but let's let @LysandreJik have a last check and merge this.
|
Thanks @julien-c!
|
Hey @LysandreJik, can you please review when you have a chance? Thanks!
|
Thanks @setu4993!
The docs for both GPT and GPT2 specify that only labels that are not -100 are used to calculate the loss. So the padding for the labels should be -100, not tokenizer.pad_token_id.
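
To illustrate the point above, here is a minimal pure-Python sketch of why padded label positions should be set to -100. It is not the code changed in this PR: `mask_padding`, `mean_nll`, `confident_row`, `PAD_ID` and `VOCAB_SIZE` are hypothetical names and values chosen for illustration, and the toy loss only mimics the ignore-index behaviour (PyTorch's `CrossEntropyLoss` uses `ignore_index=-100` by default). The one-position label shift that causal LMs apply is left out for brevity.

```python
import math

IGNORE_INDEX = -100  # label value the loss skips (default ignore_index in PyTorch's CrossEntropyLoss)
PAD_ID = 0           # hypothetical pad token id, for illustration only
VOCAB_SIZE = 10      # hypothetical vocabulary size


def mask_padding(input_ids, pad_token_id, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing every pad position with ignore_index."""
    return [ignore_index if tok == pad_token_id else tok for tok in input_ids]


def mean_nll(log_probs, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignore_index."""
    kept = [-row[label] for row, label in zip(log_probs, labels) if label != ignore_index]
    return sum(kept) / len(kept) if kept else 0.0


def confident_row(target):
    """Toy log-probabilities: 0.91 on the target token, 0.01 on each of the other nine."""
    return [math.log(0.91) if tok == target else math.log(0.01) for tok in range(VOCAB_SIZE)]


# At pad positions the toy model is uninformed, so it predicts a uniform distribution.
uniform_row = [math.log(1.0 / VOCAB_SIZE)] * VOCAB_SIZE

input_ids = [5, 7, 9, PAD_ID, PAD_ID]
log_probs = [confident_row(5), confident_row(7), confident_row(9), uniform_row, uniform_row]

masked_labels = mask_padding(input_ids, PAD_ID)    # pad positions become -100
masked_loss = mean_nll(log_probs, masked_labels)   # averages over the three real tokens only
unmasked_loss = mean_nll(log_probs, input_ids)     # pad positions are scored as if they were targets
```

With labels left at the pad id, the two padded positions are averaged into the loss and inflate it; with -100 they are skipped. One caveat of this sketch: if the tokenizer reuses a real token (such as end-of-text) as its pad token, masking by id would also hide genuine occurrences of that token, so masking by attention mask may be the safer choice in that case.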