

Certified Defenses for Data Poisoning Attacks

This code replicates the experiments from the following paper:

Jacob Steinhardt, Pang Wei Koh, and Percy Liang. Certified Defenses for Data Poisoning Attacks. NIPS 2017.

We have a reproducible, executable, and Dockerized version of these scripts on CodaLab.

The datasets for the experiments can also be found at the CodaLab link.

Dependencies:

  • NumPy/SciPy/scikit-learn/pandas
  • TensorFlow (tested on v1.1.0)
  • Keras (tested on v2.0.4)
  • spaCy (tested on v1.8.2)
  • h5py (tested on v2.7.0)
  • cvxpy (tested on v0.4.9)
  • MATLAB/Gurobi
  • Matplotlib/Seaborn (for visualizations)

A Dockerfile with these dependencies (except MATLAB) can be found here: https://hub.docker.com/r/pangwei/tf1.1_cvxpy/
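
If you set up the environment yourself rather than using the Dockerfile, a quick import smoke test like the one below can confirm that the Python dependencies are in place. This is a minimal sketch, not a script shipped with this repository, and it does not check that the installed versions match the pins listed above.

```python
# Sanity check (illustrative, not part of the repository): confirm that the
# Python dependencies listed above can be imported before running the scripts.
import numpy, scipy, sklearn, pandas
import tensorflow, keras, spacy, h5py, cvxpy
import matplotlib, seaborn

print("All Python dependencies imported successfully.")
```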


Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
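
For intuition, here is a minimal sketch of the defense family described above: remove training points that fall far from their class centroid, then run empirical risk minimization on what remains. This is not the code in this repository; the 70th-percentile radius and the use of scikit-learn's LinearSVC are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def sphere_defense_then_erm(X, y, percentile=70):
    """Illustrative sketch: outlier removal (per-class sphere) followed by ERM."""
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.percentile(dists, percentile)  # illustrative outlier threshold
        keep[idx[dists <= radius]] = True          # keep points inside the sphere
    model = LinearSVC()                            # empirical risk minimization (hinge loss)
    model.fit(X[keep], y[keep])
    return model, keep
```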

If you have questions, please contact Jacob Steinhardt (jsteinhardt@cs.stanford.edu) or Pang Wei Koh (pangwei@cs.stanford.edu).
