This repository provides code, datasets, and pretrained models for our paper "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift", to be presented at Neural Information Processing Systems (NeurIPS) 2019.

Paper URL:


We might hope that when faced with unexpected inputs, well-designed software systems would fire off warnings. Machine learning (ML) systems, however, which depend strongly on properties of their inputs (e.g. the i.i.d. assumption), tend to fail silently. This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift and identifying exemplars that most typify the shift. We focus on several datasets and various perturbations to both covariates and label distributions with varying magnitudes and fractions of data affected. Interestingly, we show that across the dataset shifts that we explore, a two-sample-testing-based approach, using pre-trained classifiers for dimensionality reduction, performs best. Moreover, we demonstrate that domain-discriminating approaches tend to be helpful for characterizing shifts qualitatively and determining if they are harmful.
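The best-performing approach described above (reduce dimensionality, run a two-sample test per reduced dimension, and aggregate with a Bonferroni correction, as in the paper) can be sketched in a few lines. This is a minimal illustration with our own function names, not the repository's code:

```python
import bisect

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs, evaluated at every observed value."""
    xs, ys = sorted(xs), sorted(ys)
    def ecdf(sample, v):
        # Fraction of the sorted sample that is <= v.
        return bisect.bisect_right(sample, v) / len(sample)
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in set(xs + ys))

def shift_detected(p_values, alpha=0.05):
    """Bonferroni aggregation over K per-dimension tests: declare a shift
    if any p-value falls below alpha / K."""
    return min(p_values) < alpha / len(p_values)
```

Disjoint samples give the maximal statistic, e.g. `ks_statistic([0.0, 0.0], [1.0, 1.0])` is `1.0`; with `alpha = 0.05` over two dimensions, only p-values below `0.025` trigger a detection.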

Running experiments

Run all experiments using:


Run single experiments using:


Example: python mnist adversarial_shift univ


We require the following dependencies:


We provide shift detection using the datasets, dimensionality reduction (DR) techniques, tests, and shift types as reported in the paper. Interested users can adapt the config block to their own needs to change:

  • the DR methods used,
  • how many samples to obtain from the test set,
  • how many random runs should be performed,
  • the significance level of the test,
  • and which shifts should be simulated.
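As a rough sketch, such a config block might look like the following; the key names and option values here are illustrative placeholders, not the repository's actual identifiers:

```python
# Hypothetical config sketch; key names and values are placeholders,
# not the repository's actual identifiers.
config = {
    "dr_techniques": ["PCA", "SRP", "UAE", "BBSDs"],  # which DR methods to run
    "test_set_sample_sizes": [10, 100, 1000],         # samples drawn from the test set
    "random_runs": 5,                                 # how many random repetitions
    "sig_level": 0.05,                                # significance level of the test
    "shifts": ["gaussian_noise_shift", "adversarial_shift"],  # shifts to simulate
}
```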

Custom shifts can be defined in
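A custom covariate shift typically perturbs a chosen fraction of samples by a chosen magnitude, mirroring the paper's setup of varying both knobs. A minimal sketch, with a function name and signature of our own choosing:

```python
import random

def gaussian_noise_shift(samples, delta=1.0, fraction=0.5, seed=0):
    """Hypothetical custom shift: add zero-mean Gaussian noise of scale
    `delta` to a `fraction` of the samples, leaving the rest untouched."""
    rng = random.Random(seed)
    n_affected = int(len(samples) * fraction)
    affected = set(rng.sample(range(len(samples)), n_affected))
    return [
        [x + rng.gauss(0.0, delta) for x in s] if i in affected else list(s)
        for i, s in enumerate(samples)
    ]
```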


While some datasets are already part of the Keras distribution (such as MNIST, CIFAR-10, and Fashion-MNIST), other datasets we tested against are not directly available, so we provide them in the datasets directory for your convenience.

Pre-trained models

This repository also provides pre-trained autoencoder and BBSD models for the datasets that we tested our detectors against. If you supply a dataset for which no pre-trained model is available, a BBSD model will be trained for you on the fly. Convolutional autoencoder models, however, need to be defined by you, as we cannot ensure that every dataset reduces to the desired latent dimension, and a convolutional architecture limits how the dimensionality can be reduced.
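For reference, BBSD (black-box shift detection, soft version) uses a pretrained classifier's softmax outputs as the low-dimensional representation on which the two-sample tests are run. A minimal sketch, with a stub classifier standing in for a real pretrained model:

```python
import math

def predict_proba(x):
    """Stub softmax classifier over 3 classes; a placeholder for a real
    pretrained model's probability outputs."""
    logits = [x, -x, 0.0]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def bbsd_reduce(batch):
    """Map each raw input to the classifier's softmax outputs; the resulting
    K-dimensional vectors are what the two-sample tests compare."""
    return [predict_proba(x) for x in batch]
```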

