
[P1] MNLI has two validation set, how do you report the score #62

Closed
BaohaoLiao opened this issue Apr 22, 2024 · 3 comments
Comments

BaohaoLiao commented Apr 22, 2024

Hi,

I have a question about the GLUE task MNLI. As you know, MNLI has matched and mismatched validation sets. How do you partition the validation set and report the score?

It would be great if you could share the reproduction script for the MNLI task.

@frankaging frankaging self-assigned this Apr 22, 2024
@frankaging frankaging added the question Further information is requested label Apr 22, 2024
@frankaging frankaging changed the title MNLI has two validation set, how do you report the score [P1] MNLI has two validation set, how do you report the score Apr 22, 2024
frankaging (Collaborator) commented Apr 22, 2024

@BaohaoLiao Hey, thanks for your question! For the MNLI dataset, we use the validation_matched split for both validation and testing. (I will make this clear in the next revision. The RED paper was not clear about this either, so I figured it out by emailing the authors! I may also summarize what the RED paper's appendix says in the ReFT paper itself, so that it is self-contained about the validation setup and the evaluation metric, i.e. whether accuracy, correlation, etc. is used.)
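
For concreteness, here is a minimal sketch of loading that split and its metric with the Hugging Face datasets/evaluate packages (an illustration only, not our training code):

# Illustration only (not the repo's loader): MNLI evaluation uses the
# validation_matched split, and the GLUE metric for MNLI is accuracy.
from datasets import load_dataset
import evaluate

mnli = load_dataset("glue", "mnli")
eval_split = mnli["validation_matched"]  # used for both model selection and testing
metric = evaluate.load("glue", "mnli")   # computes accuracy for MNLI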

To reproduce, here is an example script for RoBERTa-base. For RoBERTa-large, you can copy the hyperparameters from our appendix:

python train.py -task glue \
-train_dataset mnli \
-model FacebookAI/roberta-base \
-seed 42 -l all -r 1 -p f1 -e 40 -lr 6e-4 \
-type LoreftIntervention \
-gradient_accumulation_steps 1 \
-batch_size 32 \
-eval_batch_size 32 \
-test_split validation_matched \
-max_length 256 \
--metric_for_best_model accuracy \
--dropout 0.05 \
--weight_decay 0.0000 \
--warmup_ratio 0.00 \
--logging_steps 20 \
--allow_cls_grad

Use the seeds {42, 43, 44, 45, 46}. For the validation set partition, please refer to our code for details. In short, we randomly partition a subset from the validation set (based on the seed) for selecting the best model, and report the final accuracy on the held-out remainder.
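
To make the partition concrete, here is a rough sketch of that protocol (an illustration only, not our actual code; the 50/50 split ratio below is an assumption, please check the repo for the exact partition):

# Rough sketch of the seeded validation partition (illustration only; the
# 50/50 ratio is an assumption, not necessarily what the repo uses).
from datasets import load_dataset

seed = 42  # one of {42, 43, 44, 45, 46}
val = load_dataset("glue", "mnli")["validation_matched"]
parts = val.train_test_split(test_size=0.5, seed=seed)
dev_split, holdout_split = parts["train"], parts["test"]
# Select the best checkpoint on dev_split, then report accuracy once on holdout_split.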

Please let me know if you have other questions, and feel free to close the ticket once your question is addressed.

Thanks for your interest!

frankaging (Collaborator) commented Apr 22, 2024

Also attaching the GLUE benchmark description that will be added to the Appendix to provide more details. Please also see Appendix A.1 of the RED paper for the original implementation (I basically paraphrased their setup description, so credit goes to them).

[Screenshot: GLUE benchmark description to be added to the Appendix]

BaohaoLiao (Author) commented

Thank you very much for your timely help.
