Support dataset upsampling / relative ratio in PytorchTranslateTask #657

cndn · 2019-04-23T23:16:07Z

Summary:
Library side change split from D14924942

Added 2 arguments to load_dataset in PytorchTranslateTask:

1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how often the given dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.
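
To make the shapes of the two arguments concrete, here is a minimal sketch, not the code in this PR. The helper name `sampling_weights`, the dataset names, and the direction key `"en-de"` are all hypothetical. It also assumes one possible reading of the ratios: every dataset starts with weight 1.0, the named dataset's weight is set to the given ratio, and the weights are then normalized into sampling probabilities. The actual sampling logic in PytorchTranslateTask may differ.

```python
from typing import Dict, Iterable, Optional, Tuple


def sampling_weights(
    datasets: Iterable[str],
    direction: str,
    dataset_upsampling: Optional[Dict[str, Dict[str, float]]] = None,
    dataset_relative_ratio: Optional[Tuple[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: map either argument to normalized per-dataset weights."""
    # The PR description states that at most one of the two may be specified.
    if dataset_upsampling is not None and dataset_relative_ratio is not None:
        raise ValueError(
            "Specify at most one of dataset_upsampling and dataset_relative_ratio."
        )

    # Assumption: datasets without an explicit ratio keep a weight of 1.0.
    weights = {name: 1.0 for name in datasets}
    if not weights:
        raise ValueError("At least one dataset is required.")

    if dataset_upsampling is not None:
        # Nested dictionary: {direction: {dataset: upsampling_ratio}}
        for name, ratio in dataset_upsampling.get(direction, {}).items():
            if name not in weights:
                raise KeyError(f"Unknown dataset: {name}")
            weights[name] = float(ratio)
    elif dataset_relative_ratio is not None:
        # Tuple: (dataset, ratio)
        name, ratio = dataset_relative_ratio
        if name not in weights:
            raise KeyError(f"Unknown dataset: {name}")
        weights[name] = float(ratio)

    # Normalize so the weights can be read as sampling probabilities.
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    corpora = ["bitext", "synthetic"]
    print(sampling_weights(corpora, "en-de", dataset_upsampling={"en-de": {"bitext": 3.0}}))
    print(sampling_weights(corpora, "en-de", dataset_relative_ratio=("synthetic", 1.0)))
```

Under these assumptions, an upsampling ratio of 3.0 for the bitext against a single synthetic corpus means the bitext is drawn three times as often as the synthetic data. Passing both arguments raises an error, mirroring the "at most one" constraint above.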

Differential Revision: D15041293

facebook-github-bot · 2019-05-01T19:01:27Z

This pull request has been merged in ff74ca9.

…#494)

Summary:
Pull Request resolved: pytorch/translate#494
Pull Request resolved: facebookresearch/fairseq#657

Library side change split from D14924942

Added 2 arguments to load_dataset in PytorchTranslateTask:

1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how often the given dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.

Reviewed By: liezl200
Differential Revision: D15041293
fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad

facebook-github-bot added the CLA Signed label Apr 23, 2019

cndn mentioned this pull request Apr 24, 2019

Support dataset upsampling / relative ratio in PytorchTranslateTask (#657) pytorch/translate#494

Closed

cndn force-pushed the export-D15041293 branch from 3ec948e to 0759e8a Compare April 24, 2019 01:32

cndn force-pushed the export-D15041293 branch from 0759e8a to a1779ba Compare May 1, 2019 04:35

facebook-github-bot closed this in ff74ca9 May 1, 2019

facebook-github-bot added the Merged label May 1, 2019

yfyeung pushed a commit to yfyeung/fairseq that referenced this pull request Dec 6, 2023

Fix LG log file name (facebookresearch#657)

64aed2c
