This repository is the official implementation of "The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models" (under submission at NeurIPS 2021). It includes:
- Training code for FairFace and X-ray classifiers
- Code for running the spotlight (inference passes and spotlight optimizer)
- Analysis notebooks used to visualize the results in the paper
To install requirements for training and running spotlights:
pip install -r requirements.txt
For analysis notebooks, we used Singularity to run the scipy-notebook Jupyter Docker stack.
Our experiments use the following datasets.
Set the environment variable DATA_DIR appropriately:
- $DATA_DIR/fairface: FairFace, using the padding=0.25 version of the dataset
- $DATA_DIR/imagenet: ImageNet
- $DATA_DIR/amazon: Amazon Polarity
- $DATA_DIR/squad: SQuAD
- $DATA_DIR/movielens: MovieLens 100k, from Graham, Hartford et al.'s implementation of DeepSet
- $DATA_DIR/xray: X-ray
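As a quick sanity check before training or inference, the layout above can be verified with a short script. This is a minimal sketch and not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that the subdirectories exist, not that the dataset files inside them are complete.

```python
import os
from pathlib import Path

# Subdirectory names taken from the dataset list above.
EXPECTED_SUBDIRS = ["fairface", "imagenet", "amazon", "squad", "movielens", "xray"]


def missing_dataset_dirs(data_dir):
    """Return the expected dataset subdirectories that are absent under data_dir.

    Hypothetical helper: it checks for directory existence only and
    makes no assumptions about the files each dataset should contain.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def check_data_dir():
    """Read DATA_DIR from the environment and report missing subdirectories."""
    data_dir = os.environ.get("DATA_DIR")
    if data_dir is None:
        raise RuntimeError("DATA_DIR is not set")
    missing = missing_dataset_dirs(data_dir)
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All dataset directories found under", data_dir)
    return missing
```

Calling `check_data_dir()` after exporting DATA_DIR prints which of the six directories still need to be downloaded or linked into place.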
For two of the domains in the paper, we train classifiers using standard architectures and training methods.
These scripts assume that DATA_DIR and MODEL_DIR have been set appropriately:
FairFace:
python train_fairface.py --checkpoint_dir $MODEL_DIR/fairface
X-ray:
python train_xray.py
We include inference scripts for each model, saving final-layer embeddings along with model outputs and losses:
- inference_fairface.py (FairFace)
- inference_imagenet.py (ImageNet)
- inference_amazon.py (Amazon Polarity)
- inference_squad.py (SQuAD)
- inference_movielens.py (MovieLens)
- inference_xray.py (X-ray)
The spotlight is implemented as a command-line utility in spotlight/run_spotlight.py.
The specific commands that we ran in our experiments are listed in:
- spotlights_fairface.sh (FairFace)
- spotlights_imagenet.sh (ImageNet)
- spotlights_amazon.sh (Amazon Polarity)
- spotlights_squad.sh (SQuAD)
- spotlights_movielens.sh (MovieLens)
- spotlights_xray.sh (X-ray)
The results shown in our paper are produced by analyzing examples in each dataset that are given high weights by the spotlights.
We include our spotlight weights in spotlight_outputs/, and Jupyter notebooks to visualize these results in analysis.ipynb and analysis_nlp.ipynb (for image/recommender systems and NLP models, respectively).
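The core step of the analysis, picking out the examples that a spotlight weights most heavily, can be sketched in a few lines. This is an illustrative sketch only, not the code used in the notebooks: the function name `top_weighted_indices` is hypothetical, and it assumes the weights for one spotlight have already been loaded into a flat sequence with one weight per dataset example. The storage format of the files in spotlight_outputs/ is not described here, so loading is left out.

```python
def top_weighted_indices(weights, k=10):
    """Return the indices of the k largest weights, highest weight first.

    Assumes `weights` is a flat sequence with one non-negative weight per
    example (an assumption about how the spotlight output is arranged).
    Ties keep their original order because Python's sort is stable.
    """
    k = min(k, len(weights))
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Toy usage with made-up weights for four examples.
example_weights = [0.1, 0.9, 0.5, 0.7]
top_two = top_weighted_indices(example_weights, k=2)
```

The returned indices can then be used to look up the corresponding images, reviews, or questions in the dataset for visual inspection.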