Support dataset upsampling / relative ratio in PytorchTranslateTask (#657) #494

cndn · 2019-04-24T01:32:23Z

Summary:
Pull Request resolved: facebookresearch/fairseq#657

Library side change split from D14924942

Added 2 arguments for load_dataset in PytorchTranslateTask

1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how often the given dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.
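
As an illustration of the two argument shapes and the "at most one" rule, here is a minimal sketch. It is not the code from this PR: the helper name `resolve_sample_ratios`, the `sizes` and `direction` parameters, and the reading of the relative ratio as "the named dataset is sampled `ratio` times as often as all other datasets combined" are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def resolve_sample_ratios(
    sizes: Dict[str, int],
    direction: str,
    dataset_upsampling: Optional[Dict[str, Dict[str, float]]] = None,
    dataset_relative_ratio: Optional[Tuple[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: turn either argument into per-dataset ratios.

    sizes maps dataset name -> number of examples for one direction.
    A returned ratio of 1.0 means the dataset is sampled as-is.
    """
    # The PR description states that at most one argument may be given.
    if dataset_upsampling is not None and dataset_relative_ratio is not None:
        raise ValueError(
            "Specify at most one of dataset_upsampling "
            "and dataset_relative_ratio."
        )

    ratios = {name: 1.0 for name in sizes}

    if dataset_upsampling is not None:
        # Nested form: {direction: {dataset: upsampling_ratio}}.
        for name, ratio in dataset_upsampling.get(direction, {}).items():
            if name not in sizes:
                raise KeyError(f"Unknown dataset: {name}")
            ratios[name] = float(ratio)
    elif dataset_relative_ratio is not None:
        # Tuple form: (dataset, ratio).
        # Assumed semantics: after upsampling, the named dataset contributes
        # `ratio` times as many examples as the rest of the corpora combined.
        name, ratio = dataset_relative_ratio
        if name not in sizes:
            raise KeyError(f"Unknown dataset: {name}")
        rest = sum(n for other, n in sizes.items() if other != name)
        ratios[name] = ratio * rest / sizes[name]

    return ratios
```

For example, with `sizes = {"bitext": 100, "synthetic": 400}`, passing `dataset_upsampling={"en-fr": {"bitext": 2.0}}` for direction `"en-fr"` and passing `dataset_relative_ratio=("bitext", 0.5)` both resolve to a ratio of 2.0 for the bitext under the assumptions above.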

Differential Revision: D15041293

…#494)

Summary:
Pull Request resolved: pytorch/translate#494
Pull Request resolved: facebookresearch/fairseq#657

Library side change split from D14924942

Added 2 arguments for load_dataset in PytorchTranslateTask

1. dataset_upsampling. A nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents how often the given dataset is sampled relative to the rest of the corpora map.

At most one of them can be specified.

Reviewed By: liezl200
Differential Revision: D15041293
fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad

cndn force-pushed the export-D15041293 branch from 1127a00 to 2eccf6a Compare April 24, 2019 18:09

cndn force-pushed the export-D15041293 branch from 2eccf6a to 08c11c7 Compare May 1, 2019 03:50

cndn force-pushed the export-D15041293 branch from 08c11c7 to e65ba59 Compare May 1, 2019 04:27

facebook-github-bot closed this in facebookresearch/fairseq@ff74ca9 May 1, 2019

facebook-github-bot added the Merged label May 1, 2019
