Thank you very much for making the repository public!
I have a question about the train and valid files in the mono and para directories used for pre-training and fine-tuning on the NMT task.
As stated in the README, I understand that dict.en.txt and dict.zh.txt should be exactly the same in both the mono and para directories, and that the para directory should contain the bilingual data used to fine-tune the model. My confusion is about the mono directory and the number of examples its train and valid files should contain for each language.
The number of sentences, and the sentences themselves, can differ between the two languages in the mono directory, right? I mean, it should not matter if one uses, say, 100 sentences for en and 200 sentences for zh, since they are just monolingual data.
So the only requirement is that the mono and para directories share the same dictionary files, right?
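To make the setup I am describing concrete, here is a minimal sketch of the check I have in mind. It assumes the raw text files hold one sentence per line; the helper names `dictionaries_match` and `count_sentences` are hypothetical and not part of the repository, and only dict.en.txt and dict.zh.txt are taken from the README.

```python
from pathlib import Path


def dictionaries_match(mono_dir, para_dir, langs=("en", "zh")):
    """Return True if dict.<lang>.txt is byte-identical in both directories."""
    return all(
        (Path(mono_dir) / f"dict.{lang}.txt").read_bytes()
        == (Path(para_dir) / f"dict.{lang}.txt").read_bytes()
        for lang in langs
    )


def count_sentences(path):
    """Count lines in a raw text file (assumes one sentence per line)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

With this, I would expect `dictionaries_match("mono", "para")` to be true, while `count_sentences` could return different values for the en and zh files in mono.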