When Can Models Learn From Explanations?

This is the codebase for the paper: "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data"

Here's the directory structure:

data/ --> data folder (files too large to upload here but are publicly available)
models/ --> contains special model classes for use with retrieval model
training_reports/ --> folder to be populated with individual training run reports
result_sheets/ --> folder to be populated with .csv's of results from experiments 
figures/ --> contains plots generated by plots.Rmd
main.py --> main script for all individual experiments in the paper
make_SNLI_data.py --> convert e-SNLI .txt files to .csv's
plots.Rmd --> R markdown file that makes plots using .csv's in result_sheets
report.py --> experiment logging class, reports appear in training_reports
retriever.py --> class for retrieval model
run_tasks.py --> script for running several experiments for each RQ in the paper
utils.py --> data loading and miscellaneous utilities
write_synthetic_data.py --> script for writing synthetic datasets

The code is written in Python 3.6; plots.Rmd is an R Markdown file that generates the plots for each experiment. The package requirements are:

torch==1.4
transformers==3.3.1
faiss-cpu==1.6.3
pandas==1.0.5
numpy==1.18.5
scipy==1.4.1
sklearn==0.23.1
argparse==1.1
json==2.0.9
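Note that argparse and json ship with the Python standard library, so only the remaining packages need to be installed. A pip command along the following lines should cover them (the sklearn package is distributed on PyPI as scikit-learn):

pip install torch==1.4.0 transformers==3.3.1 faiss-cpu==1.6.3 pandas==1.0.5 numpy==1.18.5 scipy==1.4.1 scikit-learn==0.23.1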

Experimental results in the paper can be replicated by running run_tasks.py with the appropriate --experiment command. Below, we give commands organized by the corresponding research question (RQ) in the paper. For the synthetic data experiments, the only setup required is to first set save_dir and cache_dir in main.py. Instructions for downloading and formatting the data for experiments with existing datasets are given further below.

The run_tasks.py script accepts a few additional arguments when desired: --seeds sets the number of seeds to run for each experiment (defaults to 1), --gpu selects the GPU, and --train_batch_size and --grad_accumulation_factor control the effective train batch size and memory usage. An example invocation combining these options is shown below.
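For instance, a hypothetical run of the first RQ1 experiment with three seeds on GPU 0 and a reduced per-step batch size (the flag values here are purely illustrative, not the settings used in the paper):

python run_tasks.py --experiment memorization_by_num_tasks --seeds 3 --gpu 0 --train_batch_size 10 --grad_accumulation_factor 2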

RQ1

python run_tasks.py --experiment memorization_by_num_tasks

RQ2

python run_tasks.py --experiment memorization_by_n

python run_tasks.py --experiment missing_by_learning

RQ3

python run_tasks.py --experiment evidential_by_learning

python run_tasks.py --experiment recomposable_by_learning

RQ4

python run_tasks.py --experiment evidential_opt_by_method_n

RQ5

python run_tasks.py --experiment memorization_by_r_smoothness

RQ6

python run_tasks.py --experiment missing_by_feature_correlation

python run_tasks.py --experiment missing_opt_by_translate_model_n

RQ7

python run_tasks.py --experiment evidential_by_retriever

python run_tasks.py --experiment evidential_by_init

RQ8

The data used in the paper can be obtained publicly: TACRED and e-SNLI (SemEval included). The data should be placed into folders in data/ titled semeval, tacred, and eSNLI. Running make_SNLI_data.py will format the e-SNLI .txt files into .csv's as expected by the data utilities in utils.py.
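For reference, a sketch of the expected layout under data/ (exact file names inside each folder depend on the dataset releases and are omitted here):

data/semeval/ --> SemEval data
data/tacred/ --> TACRED data
data/eSNLI/ --> e-SNLI .txt files (converted to .csv's by make_SNLI_data.py)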

Experiments with existing datasets:

SemEval

python run_tasks.py --experiment semeval_baseline

python run_tasks.py --experiment semeval_textcat

python run_tasks.py --experiment semeval_textcat_by_context

TACRED

python run_tasks.py --experiment tacred_baseline

python run_tasks.py --experiment tacred_textcat

python run_tasks.py --experiment tacred_textcat_by_context

e-SNLI

python run_tasks.py --experiment esnli_baseline

python run_tasks.py --experiment esnli_textcat

python run_tasks.py --experiment esnli_textcat_by_context

Additional tuning experiments:

python run_tasks.py --experiment missing_by_k

python run_tasks.py --experiment missing_by_rb

python run_tasks.py --experiment evidential_opt_by_method_c

python run_tasks.py --experiment evidential_by_k

Seed tests:

python run_tasks.py --experiment memorization_by_seed_test --seeds 10

python run_tasks.py --experiment missing_by_seed_test --seeds 5

python run_tasks.py --experiment evidential_by_seed_test --seeds 5
