[P1] MNLI has two validation sets, how do you report the score? #62
Comments
@BaohaoLiao Hey, thanks for your question! For the MNLI dataset, we use the `validation_matched` split for both validation and testing. (I will make this clear in the next revision. The RED paper was not clear on this either, so I figured it out by emailing the authors! I may also describe what the RED paper's appendix says in the ReFT paper, to make the validation setup and evaluation metric self-contained, i.e. whether to use accuracy, correlation, etc.) To reproduce, here is an example script:

```shell
python train.py -task glue \
-train_dataset mnli \
-model FacebookAI/roberta-base \
-seed 42 -l all -r 1 -p f1 -e 40 -lr 6e-4 \
-type LoreftIntervention \
-gradient_accumulation_steps 1 \
-batch_size 32 \
-eval_batch_size 32 \
-test_split validation_matched \
-max_length 256 \
--metric_for_best_model accuracy \
--dropout 0.05 \
--weight_decay 0.0000 \
--warmup_ratio 0.00 \
--logging_steps 20 \
--allow_cls_grad
```

Vary `-seed` across runs. Please let me know if you have other questions, and feel free to close the ticket if you feel your question is addressed. Thanks for your interest!
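Since MNLI is a 3-way classification task scored with plain accuracy (per `--metric_for_best_model accuracy` in the script above), here is a minimal sketch of that metric over `validation_matched` predictions; the prediction and label values are illustrative only, not real model output:

```python
# Minimal accuracy sketch for MNLI's 3-way labels
# (0 = entailment, 1 = neutral, 2 = contradiction).
def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    assert len(preds) == len(labels) and labels
    return sum(int(p == l) for p, l in zip(preds, labels)) / len(labels)

# Illustrative values: 4 of 5 predictions match.
preds  = [0, 1, 2, 2, 0]
labels = [0, 1, 1, 2, 0]
print(accuracy(preds, labels))  # 0.8
```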
Also attaching the GLUE benchmark description that will be added to the Appendix to provide more details. Please also see Appendix A.1 of the RED paper for the original implementation (I basically paraphrased their setup description, so credit goes to them).
Thank you very much for your timely help.
Hi,
I have a question about the GLUE MNLI task. As you know, MNLI has matched and mismatched validation sets. How do you partition the validation set and report the score?
It would be great if you could provide a reproduction script for the MNLI task.
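For reference, a sketch of selecting the matched dev split, assuming the HuggingFace `datasets` layout of GLUE/MNLI (split names are hardcoded from the public dataset card to avoid a download; `pick_eval_split` is a hypothetical helper, not part of the repo):

```python
# GLUE MNLI splits as exposed by HuggingFace datasets
# (i.e. load_dataset("glue", "mnli")); hardcoded to stay offline.
MNLI_SPLITS = ["train", "validation_matched", "validation_mismatched",
               "test_matched", "test_mismatched"]

def pick_eval_split(splits, matched=True):
    """Illustrative helper: choose which dev split to evaluate on."""
    name = "validation_matched" if matched else "validation_mismatched"
    assert name in splits, f"{name} not found in {splits}"
    return name

print(pick_eval_split(MNLI_SPLITS))  # validation_matched
```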