Question about the back translations. #13
Comments
We use back-translation to create paraphrases for unlabeled data and perform consistency training. You could use other ways to generate paraphrases.
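The back-translation step mentioned above can be sketched as a round trip through a pivot language. This is a minimal illustration, not the repository's actual pipeline: the `translate` stub below is hypothetical, and in practice it would be replaced by a real machine-translation model or service (e.g. an English–French NMT model).

```python
def translate(text, src, tgt):
    # Hypothetical stand-in for a real MT system; it only knows two pairs
    # so the sketch stays self-contained and runnable.
    lookup = {
        ("en", "fr", "the movie was great"): "le film était génial",
        ("fr", "en", "le film était génial"): "the movie was excellent",
    }
    return lookup[(src, tgt, text)]

def back_translate(sentence, pivot="fr"):
    """Translate to a pivot language and back to obtain a paraphrase."""
    pivoted = translate(sentence, "en", pivot)
    return translate(pivoted, pivot, "en")

# The round trip yields a differently worded sentence with similar meaning,
# which can then serve as the augmented view for consistency training.
paraphrase = back_translate("the movie was great")
```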
So I have to create paraphrases, right? Also, when I look at the code, I see that only the first 100,000 examples in the dataset have been back-translated. Don't I need to back-translate the whole dataset?
It depends on the size of the unlabeled data you are going to use. In this work, we used 100,000 unlabeled examples, so we only did back-translation on those, not on the whole dataset.
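In other words, you only need to paraphrase the subset of unlabeled examples you will actually train on. A minimal sketch, assuming a toy corpus (the size constant and the corpus itself are illustrative, not from the repository):

```python
# Paraphrase only as many unlabeled examples as the training run will use.
NUM_UNLABELED = 100_000  # the count used in this work; adjust to your setup

# Toy stand-in for a larger unlabeled corpus.
unlabeled = [f"sentence {i}" for i in range(250_000)]

# Back-translation is applied to this slice only, not the full corpus.
to_augment = unlabeled[:NUM_UNLABELED]
```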
Sorry, I'm still a little confused.
You could use up to 100,000.
10,000 |
Anyway, the number of examples you need to paraphrase depends only on the number of unlabeled examples you are going to use.
Are they in one-to-one correspondence?
One unlabeled example can be associated with multiple paraphrases. Please refer to the paper/code for details.
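The one-to-many relationship described above can be represented as a simple mapping from each unlabeled example to its paraphrases. This is an illustrative sketch only; the example sentences and the idea of using different pivot languages per paraphrase are assumptions, not taken from the repository:

```python
# One unlabeled example mapped to several paraphrases, e.g. produced by
# back-translation through different pivot languages (illustrative data).
augmented = {
    "the movie was great": [
        "the film was excellent",   # e.g. via a French round trip
        "the movie was wonderful",  # e.g. via a German round trip
    ],
}

# Flatten into (original, paraphrase) pairs; each pair can feed the
# consistency-training loss between the two views.
pairs = [(src, para) for src, paras in augmented.items() for para in paras]
```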
Can I not do data augmentation on unlabelled data?