Unclear what differences if any exist in handling trainpref/validpref/testpref #2565

Closed
munael opened this issue Sep 3, 2020 · 1 comment

munael commented Sep 3, 2020

📚 Documentation

See: https://fairseq.readthedocs.io/en/latest/command_line_tools.html#Preprocessing

--trainpref train file prefix
--validpref comma separated, valid file prefixes
--testpref comma separated, test file prefixes

Besides that, does fairseq-preprocess do any extra processing of the data (that is, besides binarization) for any of the splits? Is any processing different between the splits?

Contributor

myleott commented Sep 4, 2020

Good point, I'll add some docs. --trainpref produces the dictionary (dict.txt). --validpref and --testpref use that dictionary, but any missing words are replaced with <unk>. There is no difference between --validpref and --testpref other than the output filenames.
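
To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is not fairseq's implementation: `build_vocab` and `apply_vocab` are hypothetical helper names, tokenization is assumed to be simple whitespace splitting, and binarization, special symbols, and frequency thresholds are left out. It only illustrates that the vocabulary comes from the training split and that the valid and test splits are mapped through it in the same way.

```python
UNK = "<unk>"  # placeholder for tokens that are missing from the training vocabulary


def build_vocab(train_lines):
    """Collect every whitespace-separated token seen in the training split."""
    return {tok for line in train_lines for tok in line.split()}


def apply_vocab(lines, vocab):
    """Tokenize each line and replace out-of-vocabulary tokens with UNK."""
    return [[tok if tok in vocab else UNK for tok in line.split()] for line in lines]


# Toy data standing in for the files behind --trainpref, --validpref, --testpref.
train = ["the cat sat", "the dog ran"]
valid = ["the cat ran fast"]
test = ["a dog sat"]

vocab = build_vocab(train)               # only the training split defines the vocabulary
valid_tokens = apply_vocab(valid, vocab)
test_tokens = apply_vocab(test, vocab)   # same treatment as valid

print(valid_tokens)  # [['the', 'cat', 'ran', '<unk>']]
print(test_tokens)   # [['<unk>', 'dog', 'sat']]
```

In this sketch "fast" and "a" never occur in the training data, so they become `<unk>`; the valid and test splits go through exactly the same function.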

myleott added a commit that referenced this issue Oct 23, 2020
myleott mentioned this issue Oct 23, 2020
myleott added a commit that referenced this issue Oct 26, 2020
jinyiyang-jhu pushed a commit to jinyiyang-jhu/fairseq-jyang that referenced this issue Feb 26, 2021
Summary:
- Rename type -> key in fairseq/tasks/sentence_prediction.py (fixes facebookresearch/fairseq#2746)
- Update preprocessing docs (fixes facebookresearch/fairseq#2565)
- Turn off logging in test_fp16_optimizer.TestGradientScaling
- Documentation updates
- Remove some unused code
- Fix noisychannel example (fixes facebookresearch/fairseq#2213)

Pull Request resolved: facebookresearch/fairseq#2786

Reviewed By: shruti-bh

Differential Revision: D24515146

Pulled By: myleott

fbshipit-source-id: 86b0f5516c57610fdca801c60e58158ef052fc3a