Fix padding_idx logical error in Adaptive Input #1629
Conversation
Hi Huffon! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Thanks, nice catch!
@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:

# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

If we keep passing the **padding index of the vocabulary** as `padding_idx` to the adaptive embedding layers, some words will never be trained.

e.g. If `cut_off` is (20000, 60000) and the vocab is larger than 60000, we can't learn the [**20,000 + padding_idx**]th or the [**60,000 + padding_idx**]th word, because those words' ids become **padding_idx** after the subtraction logic and therefore always map to zero tensors.

So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx` **for the first time, at the head embedding**.

Pull Request resolved: facebookresearch/fairseq#1629
Differential Revision: D19557340
Pulled By: myleott
fbshipit-source-id: e0c3b38862374d422a46dc62c248b2ecfbf08fd2
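For context, the mechanism the summary relies on is PyTorch's `nn.Embedding` handling of `padding_idx`: the row at that index is initialized to zeros and is excluded from gradient updates, so any real token whose cluster-shifted id collides with it can never be learned. A minimal sketch of the collision, assuming an illustrative padding index of 1 and a tail cluster starting at id 20000 (these numbers are examples, not fairseq defaults):

```python
import torch
import torch.nn as nn

padding_idx = 1           # the vocabulary's <pad> id (illustrative)
cluster_start = 20000     # start of the second adaptive-input cluster (illustrative)

# Tail-cluster embedding mistakenly built with the vocab-level padding_idx.
tail = nn.Embedding(40000, 64, padding_idx=padding_idx)

# Token id 20001 belongs to this cluster, so it is looked up as 20001 - 20000 = 1,
# which collides with padding_idx: its vector is the all-zero padding row.
shifted = torch.tensor([20001 - cluster_start])
print(tail(shifted).abs().sum())                  # zero: the padding row

# The padding row also never receives gradient, so this word can never be trained.
tail(shifted).sum().backward()
print(tail.weight.grad[padding_idx].abs().sum())  # tensor(0.)
```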
Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

What does this PR do?

If we keep passing the padding index of the vocabulary as `padding_idx` to the adaptive embedding layers, some words will never be trained.

e.g. If `cut_off` is (20000, 60000) and the vocab is larger than 60000, we can't learn the [20,000 + padding_idx]th or the [60,000 + padding_idx]th word, because those words' ids become `padding_idx` after the subtraction logic and therefore always map to zero tensors.

So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx` for the first time, at the head embedding.
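Below is a minimal sketch of the change described above, assuming an adaptive input module that builds one embedding per cluster in a loop. The class name `AdaptiveInputSketch`, its constructor arguments, and the per-cluster projection are illustrative simplifications, not fairseq's actual `AdaptiveInput` implementation:

```python
import torch
import torch.nn as nn


class AdaptiveInputSketch(nn.Module):
    """Illustrative adaptive input embedding (not the actual fairseq module)."""

    def __init__(self, vocab_size, padding_idx, initial_dim, factor, output_dim, cutoff):
        super().__init__()
        self.cutoff = list(cutoff) + [vocab_size]
        self.output_dim = output_dim
        self.padding_idx = padding_idx
        self.embeddings = nn.ModuleList()
        prev = 0
        for i, size in enumerate(self.cutoff):
            dim = int(initial_dim // (factor ** i))
            # Only the head cluster actually contains the <pad> token, so only it
            # should reserve a padding row.  Passing the vocab-level padding_idx to
            # the tail clusters would silently freeze the words whose shifted id
            # collides with it (e.g. cutoff[0] + padding_idx), as described above.
            self.embeddings.append(nn.Sequential(
                nn.Embedding(size - prev, dim, padding_idx=self.padding_idx),
                nn.Linear(dim, output_dim, bias=False),
            ))
            self.padding_idx = None  # the fix: clear it after the head embedding
            prev = size

    def forward(self, input):
        result = torch.zeros(*input.shape, self.output_dim, device=input.device)
        prev = 0
        for size, emb in zip(self.cutoff, self.embeddings):
            mask = (input >= prev) & (input < size)
            if mask.any():
                # Shift ids into the cluster's local index range before the lookup.
                result[mask] = emb(input[mask] - prev)
            prev = size
        return result


# Usage: vocab of 70,000 with cutoffs (20000, 60000); the [20,000 + padding_idx]th
# word now gets a trainable vector instead of the frozen zero padding row.
emb = AdaptiveInputSketch(70000, padding_idx=1, initial_dim=512, factor=4,
                          output_dim=512, cutoff=(20000, 60000))
tokens = torch.tensor([[1, 19999, 20001, 65000]])
print(emb(tokens).shape)  # torch.Size([1, 4, 512])
```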