# DeepFool
paper: [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/pdf/1511.04599.pdf)

code: [tree-regularization-public](https://github.com/dtak/tree-regularization-public)

## 1. Binary Classification
The authors introduce the method with a simple affine classifier $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$, whose predicted label is $\text{sign}(f(\mathbf{x}))$.
<img src="images/binary_classification.png"/>

The minimal change to $\mathbf{x_0}$ moves it to its orthogonal projection onto the line $f(\mathbf{x}) = 0$ (the decision boundary). The distance from $\mathbf{x_0}$ to this line is:
$$
d = \frac{\lvert f(\mathbf{x_0})\rvert}{\lVert \mathbf{w} \rVert_2},
$$
where $\lVert \mathbf{w} \rVert_2 = \sqrt{w_1^2 + w_2^2}$.

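The perturbation is a vector, not just a distance: it has length $d$ and points from $\mathbf{x_0}$ toward the line, along $-\text{sign}(f(\mathbf{x_0}))\,\mathbf{w} / \lVert \mathbf{w} \rVert_2$. Written out in its $x$ and $y$ components, it is: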
$$
r_*(\mathbf{x_0}) = \begin{bmatrix}
-\frac{f(\mathbf{x_0})}{\lVert \mathbf{w} \rVert_2^2} w_1 \\
-\frac{f(\mathbf{x_0})}{\lVert \mathbf{w} \rVert_2^2} w_2
\end{bmatrix} = -\frac{f(\mathbf{x_0})}{\lVert \mathbf{w} \rVert_2^2} \mathbf{w}
$$
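
The closed form above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up two-dimensional example; the helper name `affine_min_perturbation` and the values of `w`, `b`, and `x0` are illustrative and do not come from the paper.

```python
import numpy as np

def affine_min_perturbation(w, b, x0):
    """Smallest L2 perturbation that moves x0 onto the line w.x + b = 0."""
    f_x0 = np.dot(w, x0) + b           # signed score f(x0)
    return -f_x0 / np.dot(w, w) * w    # r = -f(x0) / ||w||^2 * w

# Hypothetical example values, chosen so the arithmetic is easy to follow.
w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([3.0, 4.0])

r = affine_min_perturbation(w, b, x0)
distance = abs(np.dot(w, x0) + b) / np.linalg.norm(w)  # d = |f(x0)| / ||w||

print("perturbation r:", r)
print("length of r:", np.linalg.norm(r), "distance d:", distance)
print("f(x0 + r):", np.dot(w, x0 + r) + b)
```

Here $f(\mathbf{x_0}) = 20$ and $\lVert \mathbf{w} \rVert_2 = 5$, so $d = 4$ and $r_* = (-2.4, -3.2)$. The length of $r_*$ equals $d$, and $\mathbf{x_0} + r_*$ lies exactly on the line.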

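For a general (non-affine) binary classifier, $f$ is linearized around the current point, which replaces $\mathbf{w}$ with the gradient $\bigtriangledown f(\mathbf{x_0})$. This gives **the minimal perturbation to change the classifier's decision** under that linear approximation:
$$
r_*(\mathbf{x_0}) =  -\frac{f(\mathbf{x_0})}{\lVert \bigtriangledown{f(\mathbf{x_0})} \rVert_2^2} \bigtriangledown{f(\mathbf{x_0})}
$$
Because the linearization is only accurate locally, the step is repeated from the updated point, and the individual steps are summed, until the predicted label changes.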


However, the above algorithm can only converge to a point on the zero level set $f(\mathbf{x}) = 0$, that is, on the decision boundary itself. To reach the other side of the classification boundary, **the final perturbation vector is multiplied by a constant $1 + \eta$, with $\eta \ll 1$**. In the paper, the authors use $\eta = 0.02$.
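
The iteration and the overshoot can be sketched together as follows. This is an illustrative sketch and not the authors' implementation: the function name `deepfool_binary`, the toy classifier $f(\mathbf{x}) = x_1^2 + x_2^2 - 1$, and the starting point are assumptions. The sketch also assumes that the overshoot is applied when the label is checked at each iteration, so that the loop terminates once the scaled perturbation crosses the boundary.

```python
import numpy as np

def deepfool_binary(f, grad_f, x0, eta=0.02, max_iter=50):
    """Accumulate linearized steps until the sign of f flips (illustrative sketch)."""
    x0 = np.asarray(x0, dtype=float)
    original_sign = np.sign(f(x0))
    r_total = np.zeros_like(x0)
    x = x0.copy()
    for _ in range(max_iter):
        if np.sign(f(x)) != original_sign:
            break                                   # label changed, stop
        grad = grad_f(x)
        # Step for the classifier linearized at x: -f(x) / ||grad||^2 * grad
        r_total += -f(x) / np.dot(grad, grad) * grad
        # Assumption: evaluate the label at the overshot point.
        x = x0 + (1 + eta) * r_total
    return (1 + eta) * r_total

# Hypothetical toy classifier: positive outside the unit circle, negative inside.
f = lambda x: np.dot(x, x) - 1.0
grad_f = lambda x: 2.0 * x

x0 = np.array([2.0, 0.0])
r = deepfool_binary(f, grad_f, x0)
print("perturbation:", r)
print("f before:", f(x0), "f after:", f(x0 + r))
```

For this toy classifier the closest boundary point to $(2, 0)$ is $(1, 0)$, so the returned perturbation should have length slightly above $1$ because of the $1 + \eta$ factor.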


## 2. Multiclass Classification
For each class $k = 1, 2, \cdots, c$, define $\mathscr{F}_k(\mathbf{x}) = f_{\hat{k}(\mathbf{x_0})}(\mathbf{x}) - f_k(\mathbf{x})$, where $\hat{k}(\mathbf{x_0})$ is the index of the class that $\mathbf{x_0}$ is classified into, $f_k(\mathbf{x})$ is the $k$th output of the final layer before softmax, and $c$ is the number of classes. The set $\mathscr{F}_k(\mathbf{x}) = 0$ is the decision boundary between class $\hat{k}(\mathbf{x_0})$ and class $k$; for an affine classifier it is a hyperplane.

The fact that $\mathbf{x_0}$ is classified as $\hat{k}(\mathbf{x_0})$ means that $\forall k \in \{1, 2, \cdots, c\}, \mathscr{F}_k(\mathbf{x_0}) \geq 0$. This is illustrated below:
<img src="images/multiclass_classification.png" width="350"/>

The goal of DeepFool is to perturb the image so that the model misclassifies it; which wrong label it receives does not matter. Therefore, we only need to move $\mathbf{x_0}$ outside of the green region shown in the figure above.

**The minimal perturbation to change the classifier's decision for multiclass classification is**:
$$
r_*(\mathbf{x_0}) =  -\frac{\mathscr{F}_k(\mathbf{x_0})}{\lVert \bigtriangledown{\mathscr{F}_k(\mathbf{x_0})} \rVert_2^2} \bigtriangledown{\mathscr{F}_k(\mathbf{x_0})}
$$

In practice, the authors take the $10$ classes with the highest scores for the input image and iterate over them, selecting the class $k \neq \hat{k}(\mathbf{x_0})$ that requires the smallest perturbation, that is, the one whose hyperplane has the smallest distance $\lvert \mathscr{F}_k(\mathbf{x_0}) \rvert / \lVert \bigtriangledown{\mathscr{F}_k(\mathbf{x_0})} \rVert_2$ from $\mathbf{x_0}$. The gradient $\bigtriangledown{\mathscr{F}_k(\mathbf{x_0})}$ is computed as $\bigtriangledown f_{\hat{k}(\mathbf{x_0})}(\mathbf{x_0}) - \bigtriangledown f_k(\mathbf{x_0})$.
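
The class selection can be sketched for an affine multiclass classifier $f(\mathbf{x}) = W\mathbf{x} + \mathbf{b}$, where a single step is exact and $\bigtriangledown f_k$ is simply row $k$ of $W$. This is a hedged illustration, not the authors' code: the function name `deepfool_multiclass_affine`, the parameter `num_candidates`, and the example weights are assumptions. A nonlinear network would need the gradients recomputed and the step repeated, as in the binary case.

```python
import numpy as np

def deepfool_multiclass_affine(W, b, x0, num_candidates=10, eta=0.02):
    """One DeepFool step for an affine classifier (illustrative sketch)."""
    x0 = np.asarray(x0, dtype=float)
    scores = W @ x0 + b
    order = np.argsort(scores)[::-1]      # classes sorted by decreasing score
    k_hat = order[0]                      # class x0 is currently assigned to
    best_r, best_dist = None, np.inf
    # Only the highest-scoring candidates are considered, as described above.
    for k in order[1:num_candidates]:
        F_k = scores[k_hat] - scores[k]   # F_k(x0) >= 0
        grad_F_k = W[k_hat] - W[k]        # gradient of F_k for an affine model
        dist = abs(F_k) / np.linalg.norm(grad_F_k)
        if dist < best_dist:              # keep the closest hyperplane
            best_dist = dist
            best_r = -F_k / np.dot(grad_F_k, grad_F_k) * grad_F_k
    return k_hat, (1 + eta) * best_r      # overshoot to cross the boundary

# Hypothetical three-class example in two dimensions.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x0 = np.array([2.0, 1.0])

k_hat, r = deepfool_multiclass_affine(W, b, x0)
print("original class:", k_hat)
print("perturbation:", r)
print("new class:", np.argmax(W @ (x0 + r) + b))
```

In this example the scores at $\mathbf{x_0}$ are $(2, 1, -3)$, so the boundary with class index 1 is the closest (distance $1/\sqrt{2}$ versus $\sqrt{5}$ for class index 2), and the perturbed point is assigned to class index 1.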

<img src="images/multiclass_classification_alg.png" width="350"/>