Skip to content

Replace pad_token with -100 for LM loss calculation#4718

Merged
LysandreJik merged 1 commit into
huggingface:masterfrom
setu4993:bug/data-collator-lm
Jun 24, 2020
Merged

Replace pad_token with -100 for LM loss calculation#4718
LysandreJik merged 1 commit into
huggingface:masterfrom
setu4993:bug/data-collator-lm

Conversation

@setu4993
Copy link
Copy Markdown
Contributor

@setu4993 setu4993 commented Jun 2, 2020

The docs for both GPT and GPT2 specify that labels that are not -100 will be used for the calculation of the loss. So, the padding for the labels should be -100, not tokenizer.pad_token_id.

@setu4993 setu4993 force-pushed the bug/data-collator-lm branch from 445fda1 to 6af7942 Compare June 2, 2020 08:49
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 2, 2020

Codecov Report

Merging #4718 into master will decrease coverage by 0.64%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4718      +/-   ##
==========================================
- Coverage   77.10%   76.45%   -0.65%     
==========================================
  Files         128      128              
  Lines       21723    21725       +2     
==========================================
- Hits        16749    16610     -139     
- Misses       4974     5115     +141     
Impacted Files Coverage Δ
src/transformers/data/data_collator.py 89.55% <100.00%> (+0.32%) ⬆️
src/transformers/modeling_tf_pytorch_utils.py 8.72% <0.00%> (-81.21%) ⬇️
src/transformers/modeling_ctrl.py 96.56% <0.00%> (-2.58%) ⬇️
src/transformers/modeling_openai.py 79.51% <0.00%> (-1.39%) ⬇️
src/transformers/trainer.py 38.08% <0.00%> (-1.17%) ⬇️
src/transformers/modeling_tf_utils.py 86.49% <0.00%> (-0.78%) ⬇️
src/transformers/benchmark/benchmark_utils.py 72.80% <0.00%> (-0.30%) ⬇️
src/transformers/file_utils.py 73.79% <0.00%> (ø)
src/transformers/modeling_utils.py 90.61% <0.00%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e80d6c6...6cceabd. Read the comment docs.

@setu4993 setu4993 changed the title Replace labels with -100 to skip loss calc Replace pad_token with -100 for LM loss calculation Jun 2, 2020
Copy link
Copy Markdown
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct! Thanks @setu4993!

@LysandreJik LysandreJik requested a review from julien-c June 3, 2020 18:05
@setu4993 setu4993 closed this Jun 6, 2020
@setu4993 setu4993 deleted the bug/data-collator-lm branch June 6, 2020 08:01
@setu4993 setu4993 restored the bug/data-collator-lm branch June 6, 2020 08:01
@setu4993 setu4993 reopened this Jun 6, 2020
@setu4993
Copy link
Copy Markdown
Contributor Author

setu4993 commented Jun 6, 2020

Closed the PR by mistake... Re-opening it.

@julien-c
Copy link
Copy Markdown
Member

julien-c commented Jun 6, 2020

Just a quick question for @mfuntowicz, is copy.deepcopy a performant way to clone a tensor? (given that this is called at each training step)

@setu4993
Copy link
Copy Markdown
Contributor Author

setu4993 commented Jun 6, 2020

I saw deepcopy being used elsewhere in the code, so added it as is. Just looking at the new tensor documentation, and they recommend using .clone().detach(). Happy to change it to that.

@setu4993
Copy link
Copy Markdown
Contributor Author

Bumping this since I haven't seen any activity in a few days.

@julien-c
Copy link
Copy Markdown
Member

Yes .clone().detach() sounds good.

@julien-c julien-c requested a review from mfuntowicz June 11, 2020 07:07
@setu4993 setu4993 force-pushed the bug/data-collator-lm branch from 6af7942 to 6cceabd Compare June 11, 2020 07:18
@julien-c
Copy link
Copy Markdown
Member

LGTM but let's let @LysandreJik have a last check and merge this

@setu4993
Copy link
Copy Markdown
Contributor Author

Thanks @julien-c!

@setu4993 setu4993 requested a review from LysandreJik June 11, 2020 16:52
@setu4993
Copy link
Copy Markdown
Contributor Author

Hey @LysandreJik, can you please review when you have a chance? Thanks!

@LysandreJik
Copy link
Copy Markdown
Member

Thanks @setu4993!

@LysandreJik LysandreJik merged commit 0a3d0e0 into huggingface:master Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants