# Differential Privacy and Machine Learning: a Survey and Review

Zhanglong Ji, Zachary C. Lipton, Charles Elkan. [Differential Privacy and Machine Learning: a Survey and Review.](https://arxiv.org/pdf/1412.7584.pdf) Dec. 2014.

## tl;dr
- DP is a popular definition of privacy, requiring a mechanism to be robust to the change of one sample
- Laplace or Gaussian noise can be added to make a query DP-safe
- ML algorithms can be modified to satisfy DP, potentially for free

## Definitions
**Differential privacy** requires that a mechanism outputting information about a dataset be robust against any change of one sample.

![def](../img/JiLipElk14/fig1.png)

By this definition, **delta** refers to the failure probability (if delta > 0, the mechanism is allowed to leak information with that probability) whereas **epsilon** refers to the level of privacy protection. Epsilon is also called the privacy budget and may be split up among the different steps of a mechanism.

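As a toy illustration of splitting the budget, assume two counting queries answered under basic sequential composition, so each query spends half of the total epsilon. The data and names here are illustrative, and the Laplace noise anticipates the mechanism described below:

```python
import numpy as np

# Sketch: basic sequential composition. Two queries run with epsilon/2
# each together cost epsilon in total. Counting queries have L1
# sensitivity 1, so each gets Laplace(1 / (epsilon/2)) noise.
rng = np.random.default_rng(0)
total_epsilon = 1.0
eps_each = total_epsilon / 2  # the budget split across the two steps

true_counts = {"smokers": 120, "nonsmokers": 380}  # illustrative data
noisy = {k: v + rng.laplace(0.0, 1.0 / eps_each)
         for k, v in true_counts.items()}
```

Answering more queries from the same budget shrinks the per-query epsilon and therefore increases the per-query noise.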
The **sensitivity** of a query f is defined as the maximum distance between f(D) and f(D') over all datasets D and D' that differ in one sample. The distance can be measured with the L1 or L2 norm.

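For example, the L1 sensitivity of a mean query over data known to lie in a bounded interval works out in one line (a sketch; the helper name is ours):

```python
# Sketch: L1 sensitivity of the mean of n values bounded in [lo, hi].
# Changing one sample moves the sum by at most (hi - lo), so the mean
# moves by at most (hi - lo) / n.
def mean_sensitivity(n, lo=0.0, hi=1.0):
    return (hi - lo) / n
```

Note the sensitivity shrinks with the dataset size n, which is why large datasets need less noise for the same privacy level.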
The **Laplace mechanism** adds noise drawn from a Laplace distribution with density proportional to exp(-eps * |x| / S), where S is the L1 sensitivity. This satisfies pure epsilon-differential privacy (delta = 0). A similar mechanism can be built with Gaussian noise, but it only preserves differential privacy for delta > 0.

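A minimal sketch of the Laplace mechanism, assuming NumPy and a known L1 sensitivity:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise of scale sensitivity/epsilon.

    Satisfies pure epsilon-differential privacy (delta = 0).
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has L1 sensitivity 1 (one person changes
# the count by at most 1), so we add Laplace(1/epsilon) noise.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
```

Larger epsilon means a smaller noise scale and weaker privacy; smaller epsilon means more noise and stronger privacy.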
The paper also discusses **local sensitivity**: given a dataset D, find the neighboring D' that maximizes the distance between f(D) and f(D'). Calibrating noise to local sensitivity is problematic: since D can have a small local sensitivity while a neighbor D' has a larger one, an attacker can infer whether the dataset is D or D' from the magnitude of the noise. **Smooth sensitivity** fixes this by smoothing the scale of the noise across neighboring datasets.

Lastly, the **sample and aggregate** framework splits the dataset D into subsets and computes f on each. The subset results f(D_i) are then aggregated, e.g., by picking a point close to roughly half of them, with noise calibrated via smooth sensitivity to ensure differential privacy.

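A heavily simplified sketch of the idea, using a noisy median of block means with a crude sensitivity bound rather than the paper's smooth-sensitivity aggregator:

```python
import numpy as np

def sample_and_aggregate_mean(data, k, epsilon, lo=0.0, hi=1.0, rng=None):
    """Toy sample-and-aggregate sketch (illustrative only): split the
    data into k disjoint blocks, compute each block's mean, and release
    a noisy median of the block means.

    Changing one sample affects only one block, and block means lie in
    [lo, hi], so (hi - lo) is a crude upper bound on how far the median
    of the block means can move; we add Laplace((hi - lo)/epsilon)
    noise accordingly. The real framework uses a much tighter
    smooth-sensitivity bound.
    """
    rng = rng or np.random.default_rng()
    blocks = np.array_split(np.asarray(data, dtype=float), k)
    block_means = [float(np.mean(b)) for b in blocks]
    return float(np.median(block_means)) + rng.laplace(0.0, (hi - lo) / epsilon)
```

The payoff is that f only needs to behave well on most subsets; the aggregation step is what carries the privacy guarantee.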
## Machine Learning methods
- Supervised: naive Bayes, linear regression, linear SVM, logistic regression, kernel SVM, decision trees, online convex programming
- Unsupervised: k-means
- Dimensionality reduction: PCA

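As one concrete instance, a private k-means update can perturb the per-cluster counts and sums. This is a sketch in the spirit of noisy-sufficient-statistics k-means, not the survey's exact algorithm, and it assumes data bounded in [0, 1]^d:

```python
import numpy as np

def dp_kmeans_step(data, centers, epsilon, rng=None):
    """One noisy k-means update (a sketch, not the survey's exact
    algorithm). Points are assigned to their nearest center; each new
    center is a noisy sum of assigned points divided by a noisy count.

    Assumes data lies in [0, 1]^d, so one point changes a cluster count
    by at most 1 and a cluster sum by at most d in L1 norm.
    """
    rng = rng or np.random.default_rng()
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    k, d = centers.shape
    # Assign each point to its nearest center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    new_centers = np.empty_like(centers)
    for j in range(k):
        pts = data[labels == j]
        noisy_count = len(pts) + rng.laplace(0.0, 1.0 / epsilon)
        noisy_sum = pts.sum(axis=0) + rng.laplace(0.0, d / epsilon, size=d)
        # Guard against tiny or negative noisy counts.
        new_centers[j] = noisy_sum / max(noisy_count, 1.0)
    return new_centers
```

Each iteration spends part of the privacy budget, which is exactly why the noise-reduction ideas below (fewer noisy rounds, lower sensitivity) matter in practice.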
## Four main ideas to reduce noise while still achieving DP
- add noise once instead of every round
- lower the global sensitivity
- exploit public information
- add noise iteratively