Conversation

@Huffon (Contributor) commented Jan 17, 2020

Before submitting

  • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • [x] Did you read the contributor guideline?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

I think if we keep passing the padding index of the vocabulary as `padding_idx` to the adaptive embedding layers, some words will never be trained.

For example, if `cut_off` is (20000, 60000) and the vocab is larger than 60000,
we can't learn the [20000 + padding_idx]th word or the [60000 + padding_idx]th word,
because the subtraction logic maps those words' ids to `padding_idx`, so they always come out as zero tensors.

So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx`
for the first time, at the head embedding representation.
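
To make the failure mode concrete, here is a minimal sketch (a simplified, hypothetical mock-up, not the actual fairseq `AdaptiveInput` code) of how per-cluster embeddings and the subtraction logic interact with `padding_idx`, and why only the head cluster should receive it:

```python
import torch
import torch.nn as nn

# Hypothetical simplification of an adaptive input embedding: each vocab
# cluster gets its own nn.Embedding, and token ids in later clusters are
# shifted down by the cluster's lower cutoff before lookup.
vocab_size, dim, padding_idx = 70000, 8, 1
cutoffs = [20000, 60000, vocab_size]

embeddings = []
prev = 0
for i, cutoff in enumerate(cutoffs):
    size = cutoff - prev
    # Bug: passing the vocab-level padding_idx to every cluster means that in
    # cluster i > 0 the word with original id (prev + padding_idx) is remapped
    # to padding_idx and therefore gets a frozen all-zero embedding.
    # Fix (as described above): only the head cluster receives padding_idx;
    # the tail clusters are built with padding_idx=None.
    embeddings.append(
        nn.Embedding(size, dim, padding_idx=padding_idx if i == 0 else None)
    )
    prev = cutoff


def embed(tokens: torch.LongTensor) -> torch.Tensor:
    out = torch.zeros(tokens.shape + (dim,))
    prev = 0
    for cutoff, emb in zip(cutoffs, embeddings):
        mask = (tokens >= prev) & (tokens < cutoff)
        out[mask] = emb(tokens[mask] - prev)  # the "subtraction logic"
        prev = cutoff
    return out


# With the fix, the word at id 20000 + padding_idx gets a trainable,
# generally non-zero vector instead of the frozen padding vector.
print(embed(torch.tensor([20000 + padding_idx])).abs().sum())
```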

@facebook-github-bot (Contributor) commented

Hi Huffon! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot (Contributor) commented

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!


@myleott commented Jan 24, 2020

Thanks, nice catch!

@facebook-github-bot (Contributor) left a comment

@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@myleott merged this pull request in 4f71c63.

@Huffon Huffon deleted the adapinp-fix branch January 26, 2020 08:26
moussaKam pushed a commit to moussaKam/language-adaptive-pretraining that referenced this pull request Sep 29, 2020
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?

I think if we keep passing the **padding index of the vocabulary** as `padding_idx` to the adaptive embedding layers, some words will never be trained.

For example, if `cut_off` is (20000, 60000) and the vocab is larger than 60000,
we can't learn the [**20000 + padding_idx**]th word or the [**60000 + padding_idx**]th word,
because the subtraction logic maps those words' ids to **padding_idx**, so they always come out as zero tensors.

So, I changed `self.padding_idx` to `None` after assigning the vocab's `padding_idx`
**for the first time, at the head embedding representation**.
Pull Request resolved: facebookresearch#1629

Differential Revision: D19557340

Pulled By: myleott

fbshipit-source-id: e0c3b38862374d422a46dc62c248b2ecfbf08fd2
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2021
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes # (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding!

Pull Request resolved: fairinternal/fairseq-py#1629

Reviewed By: myleott

Differential Revision: D26484942

Pulled By: sshleifer

fbshipit-source-id: 9dcbab5c404c14d8f35628d823102ad9ce59dffd
harkash pushed a commit to harkash/fairseq that referenced this pull request Feb 23, 2021
sshleifer added a commit that referenced this pull request Apr 7, 2021
Harleen8118 pushed a commit to Harleen8118/IBERT that referenced this pull request Jun 26, 2025
caltia pushed a commit to caltia/fairseq that referenced this pull request Jul 8, 2025