Model Evaluation Using the SNLI Development Set
Here is an example of evaluating a model (fine-tuned on either MNLI or SNLI) on the SNLI development set (Bowman et al., 2015).
The following commands run the evaluation in Google Colab:
!git clone
!mv /content/transformers-snli/* /content/
!rm -r transformers-snli
!pip install -r requirements.txt
# Download SNLI
!mkdir -p /content/data/glue
!unzip /content/ -d /content/data/glue/
!rm -r /content/data/glue/__MACOSX && rm
!mv /content/data/glue/snli_1.0 /content/data/glue/SNLI
!python \
--task_name snli \
--do_eval \
--data_dir data/glue/SNLI \
--model_name_or_path mnli-6 \
--max_seq_length 128 \
--output_dir mnli-6
- Note: the instructions above apply to models fine-tuned on MNLI. A model fine-tuned on SNLI can also be evaluated this way after a few minor changes, described at the end of this document; however, I suggest using this repository for that case.
This creates the snli_predictions.txt file in ./mnli-6, which can then be evaluated using:
!python ./mnli-6/snli_predictions.txt
The evaluation results for the mnli-6 model are as follows:
entailment: 0.9035746470411535
neutral: 0.7669242658423493
contradiction: 0.8398413666870043
Overall SNLI Dev Evaluation Accuracy: 0.8374314163787848
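For readers who want to sanity-check numbers like these, the following is a minimal sketch of how per-label and overall accuracy could be computed from paired gold and predicted labels. It is not the repository's evaluation script: the function name `per_label_accuracy`, the in-memory list inputs, and the reading of the per-label figures as "fraction of gold examples of that label predicted correctly" are all assumptions made for illustration, and the format of snli_predictions.txt is not assumed here.

```python
from collections import Counter

# The three SNLI classes reported in the results above.
LABELS = ("entailment", "neutral", "contradiction")


def per_label_accuracy(gold, predicted):
    """Hypothetical helper: return ({label: accuracy}, overall_accuracy).

    Per-label accuracy is taken to mean the fraction of gold examples
    with that label whose prediction matches (an assumption).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    total = Counter()    # number of gold examples per label
    correct = Counter()  # number of correct predictions per gold label
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1

    # Skip labels that never occur in gold to avoid division by zero.
    per_label = {
        label: correct[label] / total[label]
        for label in LABELS
        if total[label]
    }
    overall = sum(correct.values()) / len(gold) if gold else 0.0
    return per_label, overall


# Toy data for illustration only; these are not SNLI results.
gold = ["entailment", "entailment", "neutral", "contradiction"]
predicted = ["entailment", "neutral", "neutral", "contradiction"]

per_label, overall = per_label_accuracy(gold, predicted)
for label, acc in per_label.items():
    print(f"{label}: {acc}")
print(f"Overall accuracy: {overall}")
```

Under this definition the overall accuracy is a weighted average of the per-label values, weighted by how often each label occurs in the gold data, so it does not have to equal their simple mean.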
- To evaluate a model fine-tuned on the SNLI training set using the instructions above, refer to (lines 131, 196, and 271) and apply the changes mentioned there.