
disable_ngram_loss fix for prophetnet #8554

Merged
merged 7 commits into huggingface:master on Nov 19, 2020

Conversation

Zhylkaaa
Contributor

What does this PR do?

This PR fixes disable_ngram_loss behaviour for ProphetNetForConditionalGeneration and is related to #8553

Fixes #8553

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I guess @patrickvonplaten has been using this model (I saw models on the Hub); sorry if I'm wrong, but there is no one listed to tag for ProphetNet.

@patrickvonplaten
Contributor

Hey @Zhylkaaa,

Thanks a lot for your PR! @qiweizhen - could you maybe take a look and give your opinion on this PR? I don't have much experience with training ProphetNet

@qiweizhen
Contributor

qiweizhen commented Nov 16, 2020

Hi @patrickvonplaten, thanks for informing me. It seems this is still related to the padding tokens (default -100 or padding_idx), which should not contribute to the loss.
In the old version of the code, expend_targets is filled with self.padding_idx, but padding_idx is not passed to the loss function as the ignore index, so the loss is computed incorrectly.

Here @Zhylkaaa makes them consistent. I suggest that 1) the external data-preprocessing padding function, 2) expend_targets here, and 3) the loss function all be kept consistent.

If Hugging Face Transformers uses -100 by default for padding in all the NLG models, then this code can be merged. If it instead uses self.padding_idx by default for all the NLG models, then this code should not be merged; padding_idx should be fed into the loss function instead.
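
For illustration, a minimal sketch of the mismatch with toy tensors (not the actual ProphetNet code): if the expanded targets are filled with padding_idx but the loss function is not told to ignore that index, the padded positions are counted in the loss; filling the ignored positions with -100, PyTorch's default ignore_index, makes them drop out.

```python
import torch
import torch.nn.functional as F

# Toy setup: vocabulary of 10 tokens, 4 target positions, the last one is padding.
pad_idx = 0
lprobs = F.log_softmax(torch.randn(4, 10), dim=-1)

# Old behaviour (sketch): ignored positions filled with padding_idx,
# but no ignore_index is passed -> the pad position is counted in the loss.
targets_pad = torch.tensor([5, 2, 7, pad_idx])
wrong_loss = F.nll_loss(lprobs, targets_pad, reduction="sum")

# Fixed behaviour (sketch): ignored positions filled with -100, which is
# nll_loss's default ignore_index, so they are excluded from the loss.
targets_ignored = torch.tensor([5, 2, 7, -100])
right_loss = F.nll_loss(lprobs, targets_ignored, reduction="sum")

print(wrong_loss, right_loss)  # the first value includes the padding position
```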

@Zhylkaaa
Contributor Author

Thanks @qiweizhen for reviewing,
To my knowledge, Hugging Face Transformers uses -100 to indicate tokens ignored during loss calculation. I also wanted to ask whether the reduction strategy matters: Transformers uses reduction='mean' as far as I know, but here it is set to 'sum'.
I also noticed that I haven't changed this behaviour in the other ProphetNet model; I will check whether that is necessary and commit the changes, and I will also write some tests for this behaviour in the near future.

@qiweizhen
Contributor

> Thanks @qiweizhen for reviewing.
> To my knowledge, Hugging Face Transformers uses -100 to indicate tokens ignored during loss calculation. I also wanted to ask whether the reduction strategy matters: Transformers uses reduction='mean' as far as I know, but here it is set to 'sum'.
> I also noticed that I haven't changed this behaviour in the other ProphetNet model; I will check whether that is necessary and commit the changes, and I will also write some tests for this behaviour in the near future.

Thank you for pointing out this "mean" or "sum" problem! This line of code was converted from the Fairseq version of ProphetNet, which uses a sum loss here to be consistent with the [Fairseq Transformer](https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/criterions/label_smoothed_cross_entropy.py#L26-L27). The reason is that in the Fairseq training pipeline the "mean" operation is done in the trainer: the model returns the sum loss together with sample_size, and Fairseq then computes sum loss / sample_size (the mean).
So I agree that here we should use "mean" as you suggested. Thank you @Zhylkaaa!
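
As a small sketch of the equivalence being discussed (toy tensors, not the library code): returning a summed loss plus a sample size and dividing later, as Fairseq does, gives the same value as letting the loss function average over the non-ignored tokens directly.

```python
import torch
import torch.nn.functional as F

lprobs = F.log_softmax(torch.randn(6, 10), dim=-1)
targets = torch.tensor([1, 4, 2, -100, 3, -100])  # -100 marks ignored positions

# Fairseq-style: return the summed loss and the sample size,
# then divide in the trainer to obtain the mean.
sum_loss = F.nll_loss(lprobs, targets, reduction="sum")
sample_size = targets.ne(-100).sum()
fairseq_mean = sum_loss / sample_size

# Transformers-style: let the loss function average over non-ignored tokens.
mean_loss = F.nll_loss(lprobs, targets, reduction="mean")

assert torch.allclose(fairseq_mean, mean_loss)
```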

@Zhylkaaa
Contributor Author

Hi @qiweizhen, I want to verify: should I average the label smoothing loss instead of summing it, to be consistent with the change of reduction strategy, and should I also change non_pad_mask to a mask that excludes -100? (You can see these changes in the last commit, but I just want to be sure 😁) A sketch of what I mean is below.
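
A rough sketch of the idea with hypothetical names and shapes (`label_smoothed_loss`, a made-up `eps`), not the exact ProphetNet `_compute_loss`: mask the -100 positions out of the smoothing term and average both the NLL and the smoothing term before mixing them.

```python
import torch
import torch.nn.functional as F

def label_smoothed_loss(logits, targets, eps=0.1, ignore_index=-100):
    """Sketch: label-smoothed NLL loss averaged over non-ignored tokens."""
    lprobs = F.log_softmax(logits, dim=-1)
    # Mean NLL over positions whose target is not ignore_index.
    nll = F.nll_loss(lprobs, targets, reduction="mean", ignore_index=ignore_index)
    # Smoothing term: uniform penalty over the vocabulary, averaged over
    # the positions that are not masked with -100.
    non_masked = targets.ne(ignore_index)
    smooth = -lprobs.sum(dim=-1)[non_masked].mean()
    eps_i = eps / lprobs.size(-1)
    return (1.0 - eps) * nll + eps_i * smooth

# Toy usage: 5 positions, vocabulary of 10, one masked position.
loss = label_smoothed_loss(torch.randn(5, 10), torch.tensor([1, 3, 2, -100, 4]))
```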

Also @patrickvonplaten, I messed up a rebase and had to do a hard reset. Is that ok, or should I close this PR and open a new one that doesn't change the commit history when I finish?

Zhylkaaa added a commit to Zhylkaaa/simpletransformers that referenced this pull request Nov 17, 2020
@julien-c added the "model card" label Nov 19, 2020
@patrickvonplaten removed the "model card" label Nov 19, 2020
@patrickvonplaten
Contributor

Great PR @Zhylkaaa! I did a small refactor and fixed the test. Thanks for your help @qiweizhen

@patrickvonplaten merged commit ca0109b into huggingface:master Nov 19, 2020
moussaKam pushed a commit to moussaKam/transformers that referenced this pull request Nov 20, 2020
* `disable_ngram_loss` fix for prophetnet

* add changes documentation

* fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments

* mean label smoothing loss

* small refactor

* fix test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Successfully merging this pull request may close these issues.

disable_ngram_loss doesn't work correctly in ProphetNetForConditionalGeneration