Besides that, does fairseq-preprocess do any extra processing of the data (that is, besides binarization) for any of the splits? Is any processing different between the splits?
Good point, I'll add some docs. `--trainpref` is the split used to build the dictionary (`dict.txt`). `--validpref` and `--testpref` are binarized with that same dictionary, and any words that are missing from it are replaced with `<unk>`. There is no difference between `--validpref` and `--testpref` other than the output filenames.
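To make the behavior described above concrete, here is a minimal sketch in plain Python. It is not fairseq's implementation: `build_vocab` and `apply_vocab` are hypothetical helpers, the toy sentences are made up, and real binarization maps tokens to integer IDs rather than leaving them as strings. It only illustrates that the vocabulary comes from the training split and that out-of-vocabulary words in the other splits become `<unk>`.

```python
UNK = "<unk>"


def build_vocab(train_lines):
    """Collect every whitespace-separated token seen in the training split."""
    return {tok for line in train_lines for tok in line.split()}


def apply_vocab(lines, vocab):
    """Replace any token that is not in the vocabulary with <unk>."""
    return [
        " ".join(tok if tok in vocab else UNK for tok in line.split())
        for line in lines
    ]


# Toy data (illustrative only).
train = ["the cat sat", "the dog ran"]
valid = ["the bird sat"]
test = ["a dog ran"]

# The vocabulary is built from the training split only.
vocab = build_vocab(train)

# Valid and test go through exactly the same mapping.
valid_out = apply_vocab(valid, vocab)
test_out = apply_vocab(test, vocab)

print(valid_out)  # ['the <unk> sat']
print(test_out)   # ['<unk> dog ran']
```

In this sketch the valid and test splits are handled by the same function, which mirrors the statement that they differ only in their output filenames.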
📚 Documentation
See: https://fairseq.readthedocs.io/en/latest/command_line_tools.html#Preprocessing
--trainpref
--validpref
--testpref