owenzx/InstabilityAnalysis

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

This is the official repository for the following paper:

  • The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions, Xiang Zhou, Yixin Nie, Hao Tan and Mohit Bansal, EMNLP 2020 (arxiv)

Dependencies

This code requires Python 3. All dependencies are listed in "requirements.txt" and can be installed with:

pip install -r requirements.txt

Instructions

The current code supports the calculation of decomposed variance metrics from standard evaluation numbers.
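As an illustration of what a decomposed variance can look like, the sketch below splits the variance of accuracy across seeds into a per-example part and a cross-example covariance part. It relies only on the standard identity that the variance of a mean of N variables equals the sum of all entries of their covariance matrix divided by N². The function name `decompose_accuracy_variance` and the input format are assumptions made for this example, and the sketch is not necessarily what `variance_report.py` computes.

```python
from statistics import pvariance


def decompose_accuracy_variance(correct):
    """Split accuracy variance across seeds into two parts.

    correct[s][i] is 1 if the run with seed s predicted example i
    correctly and 0 otherwise (hypothetical input format).
    Returns (total, independent, covariance), where
    total == independent + covariance.
    """
    n_seeds = len(correct)
    n_examples = len(correct[0])

    # Accuracy of each run, and its population variance across seeds.
    accuracies = [sum(run) / n_examples for run in correct]
    total = pvariance(accuracies)

    # Mean correctness of each example across seeds.
    means = [sum(run[i] for run in correct) / n_seeds for i in range(n_examples)]

    # Diagonal of the covariance matrix: per-example variances, scaled by 1/N^2.
    independent = sum(
        sum((run[i] - means[i]) ** 2 for run in correct) / n_seeds
        for i in range(n_examples)
    ) / n_examples ** 2

    # Off-diagonal part: what remains is the summed pairwise covariance term.
    covariance = total - independent
    return total, independent, covariance
```

A negative covariance term means that examples tend to flip in opposite directions across seeds, which partly cancels out in the overall accuracy.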

  1. Download the NLI datasets and put them under the nli_data folder in the root directory.

  2. Organize your model's evaluation results under the models directory, following the layout of the berts folder (an example folder showing the results of BERT-base). The folder name represents the model type.

    • MODEL_TYPE/seed_x stores the evaluation results for seed x
    • Inside MODEL_TYPE/seed_x/, each folder holds the evaluation results on one dataset and contains three files:
      • eval_results.txt : Final accuracy of the model
      • logits_results.txt : List of logits output by the model on every example in the dataset
      • pred_results.txt : List of labels predicted by the model on every example in the dataset
  3. Run the evaluation script with:

    python variance_report.py MODEL_PATH
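Before running the report, it can help to confirm that the results folder matches the layout described in step 2. The sketch below is a minimal check under that description. The helper `find_missing_files` is hypothetical and not part of this repository, and it assumes every subfolder of a seed_x folder is a dataset folder.

```python
from pathlib import Path

# The three files that step 2 says each dataset folder should contain.
EXPECTED_FILES = ("eval_results.txt", "logits_results.txt", "pred_results.txt")


def find_missing_files(model_path):
    """Return the paths of expected result files that are absent.

    model_path is a MODEL_TYPE folder containing seed_x subfolders,
    each of which holds one folder per dataset.
    """
    missing = []
    root = Path(model_path)
    for seed_dir in sorted(root.glob("seed_*")):
        if not seed_dir.is_dir():
            continue
        # Assumption: every directory inside seed_x is a dataset folder.
        for dataset_dir in sorted(p for p in seed_dir.iterdir() if p.is_dir()):
            for name in EXPECTED_FILES:
                if not (dataset_dir / name).is_file():
                    missing.append(str(dataset_dir / name))
    return missing
```

For example, `find_missing_files("models/berts")` would return an empty list if the example folder is complete, assuming it lives at that path.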
    

Other scripts (training/evaluation/analysis) and model checkpoints that are used in the paper will come soon.
