This repository has been archived by the owner on Aug 1, 2023. It is now read-only.

Support dataset upsampling / relative ratio in PytorchTranslateTask (#657) #494

Closed
wants to merge 1 commit into from


cndn commented Apr 24, 2019

Summary:
Pull Request resolved: facebookresearch/fairseq#657

Library side change split from D14924942

Added 2 arguments to load_dataset in PytorchTranslateTask:

  1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.

  2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how frequently the given dataset is sampled relative to the rest of the corpora in the map.

At most one of them can be specified.
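
To make the two options concrete, here is a minimal sketch of how they could translate into per-dataset sampling probabilities for a single direction. It is not the code from this PR: the helper name `sampling_weights`, the `dataset_sizes` argument, and the exact reading of `dataset_relative_ratio` (the named dataset is drawn `ratio` times as often as all other datasets combined) are assumptions made for illustration. It takes the inner `{dataset: upsampling_ratio}` dictionary of one direction rather than the full nested dictionary.

```python
from typing import Dict, Optional, Tuple


def sampling_weights(
    dataset_sizes: Dict[str, int],
    dataset_upsampling: Optional[Dict[str, float]] = None,
    dataset_relative_ratio: Optional[Tuple[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: return a sampling probability for each dataset."""
    # The description states that at most one of the two options can be given.
    if dataset_upsampling is not None and dataset_relative_ratio is not None:
        raise ValueError(
            "Specify at most one of dataset_upsampling and dataset_relative_ratio"
        )

    if dataset_relative_ratio is not None:
        target, ratio = dataset_relative_ratio
        if target not in dataset_sizes:
            raise KeyError(f"Unknown dataset: {target}")
        rest = {k: v for k, v in dataset_sizes.items() if k != target}
        rest_total = sum(rest.values())
        if rest_total == 0:
            # Nothing else to sample from, so the target gets everything.
            return {target: 1.0}
        # Assumed reading: the other datasets share a total weight of 1,
        # split by size, and the target gets a weight of `ratio`.
        weights = {k: v / rest_total for k, v in rest.items()}
        weights[target] = ratio
    else:
        # Datasets without an explicit entry keep a ratio of 1 (no upsampling).
        upsampling = dataset_upsampling or {}
        weights = {
            k: size * upsampling.get(k, 1.0) for k, size in dataset_sizes.items()
        }

    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


sizes = {"bitext": 1000, "synthetic": 3000}

# Upsampling the bitext 3x makes it count as 3000 examples, on par with
# the synthetic data: both datasets end up with probability 0.5.
print(sampling_weights(sizes, dataset_upsampling={"bitext": 3.0}))

# Under the assumed reading, a relative ratio of 3 draws the bitext three
# times as often as everything else: bitext 0.75, synthetic 0.25.
print(sampling_weights(sizes, dataset_relative_ratio=("bitext", 3.0)))
```

The difference between the two options in this sketch is that upsampling scales a dataset's weight relative to its own size, while the relative ratio fixes a dataset's share independently of its size.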

Differential Revision: D15041293

cndn added a commit to cndn/translate that referenced this pull request Apr 24, 2019
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Differential Revision: D15041293

fbshipit-source-id: 3a15d8ca10f03cbf147bbcf56d385b358ca498ce
cndn added a commit to cndn/translate that referenced this pull request May 1, 2019
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: cb5bdcbac503e9f2ceaf058fec8c59e69a3ab4a2
…ytorch#494)

Summary:
Pull Request resolved: pytorch#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 9b91de4c8f6a3e99ae2118c6f486660e8a86c09c
cndn added a commit to cndn/fairseq that referenced this pull request May 1, 2019
…acebookresearch#494)

Summary:
Pull Request resolved: pytorch/translate#494

Pull Request resolved: facebookresearch#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: b5eac5fadaf4fcaf32113fa8e3fe1a0bb3eae22d
facebook-github-bot pushed a commit that referenced this pull request May 1, 2019
…494)

Summary:
Pull Request resolved: #494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
yzpang pushed a commit to yzpang/gold-off-policy-text-gen-iclr21 that referenced this pull request Feb 19, 2021
…#494)

Summary:
Pull Request resolved: pytorch/translate#494

Pull Request resolved: facebookresearch/fairseq#657

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad