Support dataset upsampling / relative ratio in PytorchTranslateTask #657

Closed

@cndn cndn commented Apr 23, 2019

Summary:
Library side change split from D14924942

Added two arguments to load_dataset in PytorchTranslateTask:

  1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.

  2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio is the frequency at which the given dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.
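
To make the two options concrete, here is a minimal sketch of how each could translate into per-dataset sampling probabilities for a single direction. It is not the code from this PR: the helper name `sampling_probs`, the `sizes` argument, and the dataset names are hypothetical, and the reading of dataset_relative_ratio (the named dataset is sampled `ratio` times as often as all other datasets combined) is an assumption about the description above.

```python
from typing import Dict, Optional, Tuple


def sampling_probs(
    sizes: Dict[str, int],
    dataset_upsampling: Optional[Dict[str, float]] = None,
    dataset_relative_ratio: Optional[Tuple[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: sampling probability per dataset for one direction.

    `sizes` maps dataset name -> number of examples. `dataset_upsampling` is
    the inner {dataset: upsampling_ratio} dict for that direction.
    """
    if dataset_upsampling is not None and dataset_relative_ratio is not None:
        raise ValueError(
            "Specify at most one of dataset_upsampling and dataset_relative_ratio"
        )

    if dataset_relative_ratio is not None:
        name, ratio = dataset_relative_ratio
        if name not in sizes:
            raise ValueError(f"Unknown dataset: {name}")
        rest_total = sum(n for key, n in sizes.items() if key != name)
        if rest_total == 0:
            # Nothing else to sample from, so the named dataset gets everything.
            return {key: (1.0 if key == name else 0.0) for key in sizes}
        # Assumed reading: P(name) / P(all other datasets combined) == ratio.
        # The remaining mass is split among the others in proportion to size.
        rest_mass = 1.0 / (1.0 + ratio)
        probs = {
            key: rest_mass * n / rest_total
            for key, n in sizes.items()
            if key != name
        }
        probs[name] = ratio / (1.0 + ratio)
        return probs

    # Upsampling: weight each dataset by size * ratio (ratio defaults to 1).
    upsampling = dataset_upsampling or {}
    weights = {key: n * upsampling.get(key, 1.0) for key, n in sizes.items()}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Example with made-up sizes: 100 bitext pairs, 300 synthetic pairs.
sizes = {"bitext": 100, "synthetic": 300}
upsampled = sampling_probs(sizes, dataset_upsampling={"bitext": 3.0})
relative = sampling_probs(sizes, dataset_relative_ratio=("bitext", 3.0))
```

With these made-up sizes, upsampling the bitext by 3 gives both datasets equal weight, while a relative ratio of 3 samples the bitext three times as often as the synthetic data regardless of size.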

Differential Revision: D15041293

cndn added a commit to cndn/translate that referenced this pull request Apr 24, 2019
…ytorch#657)

Summary:
Pull Request resolved: facebookresearch/fairseq#657

Differential Revision: D15041293

fbshipit-source-id: 9a17b3c22b68441586b716b45eb53f80a8a6d6b8
cndn added a commit to cndn/translate that referenced this pull request Apr 24, 2019
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Differential Revision: D15041293

fbshipit-source-id: 3a15d8ca10f03cbf147bbcf56d385b358ca498ce
cndn added a commit to cndn/translate that referenced this pull request May 1, 2019
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: cb5bdcbac503e9f2ceaf058fec8c59e69a3ab4a2
cndn added a commit to cndn/translate that referenced this pull request May 1, 2019
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 9b91de4c8f6a3e99ae2118c6f486660e8a86c09c
…acebookresearch#494)

Summary:
Pull Request resolved: pytorch/translate#494

Pull Request resolved: facebookresearch#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: b5eac5fadaf4fcaf32113fa8e3fe1a0bb3eae22d
facebook-github-bot pushed a commit to pytorch/translate that referenced this pull request May 1, 2019
…494)

Summary:
Pull Request resolved: #494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
@facebook-github-bot

This pull request has been merged in ff74ca9.

yzpang pushed a commit to yzpang/gold-off-policy-text-gen-iclr21 that referenced this pull request Feb 19, 2021
…#494)

Summary:
Pull Request resolved: pytorch/translate#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
yfyeung pushed a commit to yfyeung/fairseq that referenced this pull request Dec 6, 2023