Certified Defenses for Data Poisoning Attacks

This code replicates the experiments from the following paper:

Jacob Steinhardt, Pang Wei Koh, and Percy Liang

Certified Defenses for Data Poisoning Attacks

NIPS 2017.

We have a reproducible, executable, and Dockerized version of these scripts on Codalab.

The datasets for the experiments can also be found at the Codalab link.

Dependencies:

  • NumPy/SciPy/scikit-learn/pandas
  • TensorFlow (tested on v1.1.0)
  • Keras (tested on v2.0.4)
  • spaCy (tested on v1.8.2)
  • h5py (tested on v2.7.0)
  • cvxpy (tested on v0.4.9)
  • MATLAB/Gurobi
  • Matplotlib/Seaborn (for visualizations)

A Dockerfile with these dependencies (except MATLAB) can be found here: https://hub.docker.com/r/pangwei/tf1.1_cvxpy/


Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
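
To illustrate the defense family described above (outlier removal followed by empirical risk minimization), here is a minimal sketch of a centroid-based "sphere"-style defense using scikit-learn. The percentile threshold and the logistic-regression loss are illustrative choices for this sketch, not the exact configuration used in the experiments in this repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def outlier_removal_then_erm(X, y, radius_percentile=95):
    """Sketch of an outlier-removal + ERM defense.

    Points far from their class centroid are discarded, then a classifier
    is fit on the remaining data. The threshold and model are illustrative,
    not the settings used in the paper's experiments.
    """
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        # Keep points whose distance to the class centroid falls within
        # the chosen percentile of distances for that class.
        keep[idx] = dists <= np.percentile(dists, radius_percentile)

    # Empirical risk minimization on the retained (hopefully clean) points.
    model = LogisticRegression()
    model.fit(X[keep], y[keep])
    return model, keep
```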

If you have questions, please contact Jacob Steinhardt (jsteinhardt@cs.stanford.edu) or Pang Wei Koh (pangwei@cs.stanford.edu).