05/04 Differential Privacy
irenetrampoline committed May 5, 2017
1 parent ab9bb28 commit 5732e52
Showing 3 changed files with 39 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -34,6 +34,10 @@ My 2017 resolution is to read an academic paper every day. Here I keep myself ac

**Mar 27, 2017:** [The Dependence of Machine Learning on Electronic Medical Record Quality.](writeups/HoLedAcz_17.md). L. Ho, D. Ledbetter, M. Aczon. 2017. [[pdf]](https://arxiv.org/pdf/1703.08251.pdf)

**May 04, 2017:** [Differential Privacy and Machine Learning: a Survey and Review.](writeups/JiLipElk14.md). Z. Ji, Z. C. Lipton, C. Elkan. 2014. [[pdf]](https://arxiv.org/pdf/1412.7584.pdf)

**MORE TO COME!**

## To be read

[]
Binary file added img/JiLipElk14/fig1.png
35 changes: 35 additions & 0 deletions writeups/JiLipElk14.md
@@ -0,0 +1,35 @@
# Differential Privacy and Machine Learning: a Survey and Review

Zhanglong Ji, Zachary C. Lipton, Charles Elkan. [Differential Privacy and Machine Learning: a Survey and Review.](https://arxiv.org/pdf/1412.7584.pdf) Dec. 2014.

## tl;dr
- DP is a popular definition of privacy: a mechanism's output must be robust to a change in any single sample
- Laplace or Gaussian noise can be added to make a query DP-safe
- ML algorithms can be modified to satisfy DP, potentially for free

## Definitions
**Differential privacy** requires that a mechanism outputting information about a dataset be robust to a change in any one sample.

![def](../img/JiLipElk14/fig1.png)
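In case the figure does not load: I am assuming it shows the standard (epsilon, delta) statement, which in LaTeX reads as below. Here M is the mechanism, D and D' are any two datasets differing in one sample, and S is any set of possible outputs (the symbol names are my choice).

```latex
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S] + \delta
```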

By this definition, **delta** bounds the probability that the guarantee fails (if delta > 0, the mechanism may leak information with small probability), whereas **epsilon** sets the level of privacy protection: the smaller epsilon is, the stronger the protection. Epsilon is also called the privacy budget and may be split up among the different steps of a mechanism.
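Splitting the budget can be sketched as simple bookkeeping, assuming basic sequential composition (the epsilons of steps run on the same data add up). `PrivacyBudget` is a hypothetical helper of mine, not something from the paper.

```python
class PrivacyBudget:
    """Tracks epsilon spent under basic sequential composition (epsilons add up)."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Reserve `epsilon` for one step; refuse if the total would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:  # small tolerance for float error
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return epsilon

    @property
    def remaining(self):
        return self.total - self.spent


budget = PrivacyBudget(1.0)
eps_step1 = budget.spend(0.5)   # e.g. a first noisy query
eps_step2 = budget.spend(0.25)  # e.g. a second noisy query
print(budget.remaining)
```

Each step would then calibrate its own noise to the epsilon it was given, so the whole pipeline stays within the total.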

The **sensitivity** of a query f is defined as the maximum distance between f(D) and f(D'), where D and D' differ in one sample. The norm used for the distance can be L1 or L2.
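Written out, with p = 1 or 2 picking the norm (notation is mine):

```latex
S(f) = \max_{D, D' \text{ differing in one sample}} \lVert f(D) - f(D') \rVert_p
```

Two quick examples, assuming "differ" means one sample is replaced: a counting query has S = 1, and the mean of n values bounded in [a, b] has S = (b - a) / n.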

The **Laplace mechanism** adds Laplace-distributed noise with density proportional to exp(-eps * |x| / S), i.e. scale S / eps, where S is the L1 sensitivity. This satisfies epsilon-differential privacy (i.e. delta = 0). A similar mechanism can be built with Gaussian noise, but it only preserves differential privacy for delta > 0.
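A minimal sketch of both mechanisms for a scalar query, using only the standard library. I am assuming the usual calibrations: Laplace scale S / eps, and the classic Gaussian choice sigma = S * sqrt(2 ln(1.25 / delta)) / eps, which is only stated for 0 < eps < 1. Function names are my own.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Add Laplace(0, sensitivity / epsilon) noise to a scalar query answer."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Classic calibration, assumed here: valid for 0 < epsilon < 1 and 0 < delta < 1."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(true_value, l2_sensitivity, epsilon, delta, rng=random):
    """Add N(0, sigma^2) noise; gives (epsilon, delta)-DP only with delta > 0."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return true_value + rng.gauss(0.0, sigma)


rng = random.Random(0)
true_count = 42  # a counting query has sensitivity 1
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng))
print(gaussian_mechanism(true_count, 1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Note how much larger the Gaussian noise is for the same epsilon on a single scalar; its appeal is that it is calibrated to L2 rather than L1 sensitivity.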

The paper also discusses **local sensitivity**: given a dataset D, take the maximum distance between f(D) and f(D') over all neighbors D' of D. Calibrating noise to local sensitivity alone is a problem: D can have a small local sensitivity while a neighbor D' has a much larger one, so the scale of the noise itself lets attackers infer whether the dataset is D or D'. **Smooth sensitivity** addresses this by smoothing the scale of noise across neighboring datasets.
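To see the gap between neighbors concretely, here is a brute-force sketch for the median. The assumptions are mine: neighbors are formed by replacing one entry, and values come from a small finite domain so every neighbor can be enumerated.

```python
from statistics import median


def local_sensitivity_median(data, domain):
    """Brute-force local sensitivity of the median at `data`.

    Neighbors are all datasets obtained by replacing exactly one entry
    with a value from `domain` (assumed small and finite).
    """
    base = median(data)
    worst = 0
    for i in range(len(data)):
        for v in domain:
            neighbor = list(data)
            neighbor[i] = v
            worst = max(worst, abs(median(neighbor) - base))
    return worst


domain = range(0, 11)
d = [0, 0, 0, 0, 0, 10, 10]
d_prime = [0, 0, 0, 0, 10, 10, 10]  # differs from d in a single entry

print(local_sensitivity_median(d, domain))        # 0
print(local_sensitivity_median(d_prime, domain))  # 10
```

Both datasets have median 0, yet noise scaled to local sensitivity would be zero for `d` and large for `d_prime`, which is exactly the leak described above.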

Lastly, the **sample and aggregate** framework samples subsets D_i of the dataset D and calculates f on each. For each f(D_i), we then find its nearest neighbors among roughly half of the other outputs, and smooth the result to ensure differential privacy.
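A deliberately simplified sketch of the idea. Note the swap: instead of the nearest-neighbor aggregator with smooth sensitivity described above, this uses a clipped average plus Laplace noise, which is easier to reason about. The function name, the bounds `lo` and `hi`, and the block count `k` are all my assumptions.

```python
import random


def sample_and_aggregate_mean(data, f, k, lo, hi, epsilon, rng=random):
    """Split data into k disjoint blocks, run f on each, release a noisy clipped average."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    if hi <= lo or epsilon <= 0:
        raise ValueError("need hi > lo and epsilon > 0")

    shuffled = list(data)
    rng.shuffle(shuffled)  # the partition does not depend on the data values
    blocks = [shuffled[i::k] for i in range(k)]  # k disjoint, non-empty blocks

    # Clip every block output to [lo, hi] so one block can shift the average
    # by at most (hi - lo) / k.
    outputs = [min(max(f(block), lo), hi) for block in blocks]
    average = sum(outputs) / k

    # One changed sample touches one block, so the sensitivity is (hi - lo) / k.
    scale = (hi - lo) / (k * epsilon)
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return average + noise


rng = random.Random(1)
data = [float(i % 10) for i in range(1000)]
block_mean = lambda block: sum(block) / len(block)
print(sample_and_aggregate_mean(data, block_mean, k=20, lo=0.0, hi=9.0, epsilon=1.0, rng=rng))
```

The nice property is that f is treated as a black box: no sensitivity analysis of f itself is needed, only bounds on its output.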


## Machine Learning methods
- Supervised: Naive Bayes, linear regression, linear SVM, logistic regression, kernel SVM, decision tree, online convex programming
- Unsupervised: K-means
- Dimensionality reduction, PCA

## Four main ideas to reduce noise while still achieving DP
- add noise once instead of every round
- lower global sensitivity
- public information can help
- iterative noise addition
