disable_ngram_loss fix for prophetnet #8554
Conversation
Hey @Zhylkaaa, thanks a lot for your PR! @qiweizhen - could you maybe take a look and give your opinion on this PR? I don't have much experience with training ProphetNet.
Hi @patrickvonplaten, thanks for informing me. It seems this is still related to the padding tokens (default -100 or padding_idx), which should not contribute to the loss. Here @Zhylkaaa made them consistent. I suggest that 1) the outside data-preprocessing padding function, 2) expend_targets here, and 3) the loss function all be consistent. If Huggingface Transformers defaults to -100 for padding in all the NLG models, then this code can be merged. If it defaults to self.padding_idx instead, then this code should not be merged; feed padding_idx into the loss function instead.
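For illustration, a minimal sketch of the consistency described above, assuming -100 is used as the padding value both in preprocessing and in the loss (the shapes, values, and vocabulary size here are made up):

```python
import torch
import torch.nn.functional as F

# Labels padded with -100, the same value the loss is told to ignore.
labels = torch.tensor([
    [12, 7, 3, -100, -100],
    [5, 9, 2, 8, -100],
])
logits = torch.randn(2, 5, 32)  # (batch, seq_len, vocab_size)

# ignore_index must match the padding value used during preprocessing,
# otherwise padded positions leak into the loss.
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),
    labels.view(-1),
    ignore_index=-100,
)
```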
Thanks @qiweizhen for reviewing,
Thank you for pointing out this "mean" vs "sum" problem! This line of code was converted from the Fairseq version of ProphetNet, which uses the loss sum here to be consistent with the [Fairseq Transformer](https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/criterions/label_smoothed_cross_entropy.py#L26-L27). The reason is that the Fairseq training pipeline does the "mean" operation in its trainer: the criterion returns the sum loss and the sample_size, and Fairseq computes sum loss / sample_size (the mean).
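A minimal sketch of the sum-then-divide pattern described above (the tensors are illustrative, not taken from the PR):

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([
    [12, 7, 3, -100, -100],
    [5, 9, 2, 8, -100],
])
logits = torch.randn(2, 5, 32)  # (batch, seq_len, vocab_size)
lprobs = F.log_softmax(logits, dim=-1)

# Fairseq-style: the criterion returns the summed NLL over non-padded tokens...
loss_sum = F.nll_loss(
    lprobs.view(-1, lprobs.size(-1)),
    labels.view(-1),
    ignore_index=-100,
    reduction="sum",
)
# ...and the trainer divides by sample_size (the token count) afterwards,
# which is equivalent to a mean over the non-padded tokens.
sample_size = (labels != -100).sum()
loss_mean = loss_sum / sample_size
```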
Hi @qiweizhen, I want to verify: should I mean the label smoothing loss instead of summing it, to be consistent with the change of reduction strategy, and also should I change […]? Also @patrickvonplaten, I messed up the rebase, so I had to do a hard reset. Is that ok, or should I close this PR and open one that doesn't change commit history when I finish?
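For reference, a sketch of what "meaning" a label-smoothed loss could look like; `label_smoothed_mean_loss` is a hypothetical helper for illustration, not the function in this PR:

```python
import torch

def label_smoothed_mean_loss(lprobs, target, eps=0.1, ignore_index=-100):
    # lprobs: (num_tokens, vocab_size) log-probabilities
    # target: (num_tokens,) with ignore_index marking padded positions
    non_pad = target != ignore_index
    lprobs, target = lprobs[non_pad], target[non_pad]
    nll = -lprobs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    smooth = -lprobs.mean(dim=-1)  # smoothing mass spread uniformly over the vocabulary
    # mean (not sum) over tokens, matching the changed reduction strategy
    return ((1.0 - eps) * nll + eps * smooth).mean()
```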
Great PR @Zhylkaaa! I did a small refactor and fixed the test. Thanks for your help @qiweizhen.
* `disable_ngram_loss` fix for prophetnet
* add changes documentation
* fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments
* mean label smoothing loss
* small refactor
* fix test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
What does this PR do?
This PR fixes `disable_ngram_loss` behaviour for ProphetNetForConditionalGeneration and is related to #8553. Fixes #8553.
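As a rough sketch of the idea behind the fix (not the merged implementation, and with made-up shapes): targets are expanded across the n-gram prediction streams, streams beyond the first are left at -100 when `disable_ngram_loss` is set, and the loss is mean-reduced with `ignore_index=-100`:

```python
import torch
import torch.nn.functional as F

def compute_loss_sketch(logits, labels, disable_ngram_loss=False, ngram=2):
    # logits: (ngram, batch, seq_len, vocab_size), one prediction stream per n-gram
    # labels: (batch, seq_len), padded positions already set to -100
    expend_targets = labels.new_full((ngram,) + labels.shape, -100)
    for i in range(ngram):
        if i > 0 and disable_ngram_loss:
            break  # higher-order streams stay at -100 and contribute no loss
        expend_targets[i] = labels

    lprobs = F.log_softmax(logits.view(-1, logits.size(-1)), dim=-1)
    # mean reduction over every token that is not -100
    return F.nll_loss(lprobs, expend_targets.view(-1), ignore_index=-100, reduction="mean")
```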
Who can review?
I guess @patrickvonplaten was using this model (I saw models on the hub). Sorry if I am wrong, but there is no one else to tag for ProphetNet.