# Introduction to SHAP values

This notebook aims to clarify the steps described in Lundberg and Lee (2017), with examples and applications.

In [1]:
import numpy as np

def nonlinear_func(x):
    """This function defines a simple nonlinear function of 3 variables.
    The three variables correspond to the columns of the input matrix x."""
    return 0.7 + 1.3 * x[:, 0] - 2.1 * x[:, 1] ** 2 + 1.5 * x[:, 2] ** 3

print(nonlinear_func(np.array([[0.8, -0.1, 0.2]])))

[1.731]


We generate an input dataset of 10,000 triplets of values, drawn from a zero-mean multivariate normal distribution with unit variances and high pairwise correlations (between 0.8 and 0.9) among the predictors.

In [2]:
np.random.seed(3141592)
means = [0, 0, 0]
covmat = [[1, 0.8, 0.9],
          [0.8, 1, 0.85],
          [0.9, 0.85, 1]]
x = np.random.multivariate_normal(mean=means, cov=covmat, size=10000)

In [3]:
np.corrcoef(x.T)

array([[1.        , 0.79996288, 0.89841072],
       [0.79996288, 1.        , 0.84726482],
       [0.89841072, 0.84726482, 1.        ]])

## Additive Attribution Models

> The best explanation of a simple model is the model itself.

In the case of a *linear model* with **uncorrelated** predictors, the interpretation of the model is straightforward, since each coefficient $\beta_i$ measures the change in the expected output for a unit change in $x_i$:

$\mathbb{E}[f \mid x_1, x_2, \ldots, x_p] = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$

If the predictors are correlated, the interpretation is less obvious. In the extreme case of two almost identical predictors, there is a whole subspace of coefficient values that produce (almost) the same output, and interpretability at the single-predictor level becomes impossible.
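A minimal numerical illustration of this degeneracy, on hypothetical data: when $x_2$ is an almost exact copy of $x_1$, any coefficient pair with the same sum $\beta_1 + \beta_2$ produces almost the same predictions. The data and coefficient values below are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=1_000)
# Hypothetical near-duplicate predictor: x2 = x1 plus tiny noise
x2 = x1 + rng.normal(scale=1e-3, size=1_000)

# Three different coefficient pairs, all with beta_1 + beta_2 = 2
pairs = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
preds = [b1 * x1 + b2 * x2 for b1, b2 in pairs]

print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])
for (b1, b2), p in zip(pairs, preds):
    # Largest absolute deviation from the predictions of the first pair
    print(b1, b2, np.max(np.abs(p - preds[0])))
```

The data alone can barely distinguish between these coefficient pairs, which is why the individual coefficients lose their meaning.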

The fact that linear models with uncorrelated predictors are the blueprint of an interpretable model suggests two things:

1. The interpretability of a model $f(x)$ can be quantified via another model $g(z)$.
2. $g(z)$ is a linear model. **TODO** can we prove that the coefficients of $g(z)$ are independent?

### Local Explanation Methods

Let $f: x \in \mathcal{X} \subseteq \mathbb{R}^N \to \mathcal{Y}$ be the original model. It can be a regression model, a classifier, or something else.

Let $g: z \in \{0, 1\}^M \to \mathcal{Y}'$ be the explanation model.

Note that:

1. We assume that the explanation model $g$ takes binary vectors as inputs.
2. These inputs are called **simplified inputs**, for reasons that will become clear shortly.
3. The output space of the explanation model is not yet well defined, and we denote it by $\mathcal{Y}'$.

We assume that $g(z)$ is a **local method**. This means that the output $f(x^*)$ is interpreted based solely on the input $x^*$, and not on any other $x \in \mathcal{X} \subseteq \mathbb{R}^N$.

This is of **crucial importance**: it means that every pair $\{x, f(x)\}$ has *its own explanation model*. The *functional form* of the explanation model is the same for all $\{x, f(x)\}$ pairs, but the coefficients of the model vary from pair to pair.

So, for any given $x \in \mathcal{X}$ there is a corresponding $z \in \{0, 1\}^M$, and vice versa. Note that going from $x$ to $z$ we incur a loss of information. We introduce a map $h_x: z \mapsto x$, but this is not an analytical function; rather, it is a rule-based lookup table that associates a $z$ with an $x$ and vice versa. The map $h_x$ is *specific to the input* $x$.
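A minimal sketch of one possible $h_x$ for tabular data. It assumes a convention similar to the one used by Kernel SHAP, which fills in "missing" features from background data: $z_i = 1$ keeps the $i$-th feature of $x^*$, and $z_i = 0$ replaces it with a background value. The background used here (the zero vector, i.e. the mean of the distribution in cell [2]) and the helper name `make_h_x` are hypothetical choices for illustration. Building $h_x$ as a closure over $x^*$ reflects the fact that the map is specific to the input being explained.

```python
import numpy as np

def nonlinear_func(x):
    # Same test function as in cell [1]
    return 0.7 + 1.3 * x[:, 0] - 2.1 * x[:, 1] ** 2 + 1.5 * x[:, 2] ** 3

def make_h_x(x_star, background):
    """Hypothetical h_x: z_i = 1 keeps x_star[i], z_i = 0 uses background[i]."""
    x_star = np.asarray(x_star, dtype=float)
    background = np.asarray(background, dtype=float)

    def h_x(z):
        # Accept a single binary vector or a 2-D array with one vector per row
        z = np.atleast_2d(z)
        return np.where(z == 1, x_star, background)

    return h_x

x_star = np.array([0.8, -0.1, 0.2])
h = make_h_x(x_star, background=np.zeros(3))  # assumed background: zero vector

print(nonlinear_func(h([1, 1, 1])))  # all features "present": recovers f(x*)
print(nonlinear_func(h([0, 0, 0])))  # no features "present": f(background)
print(nonlinear_func(h([1, 0, 0])))  # only the first feature taken from x*
```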

**TO THINK ABOUT**

> Local methods try to ensure $g(z') \approx f(h_x(z'))$ whenever $z' \approx x'$.

## Additive Feature Attribution Methods

Based on the above discussion, it will not come as a surprise that we restrict our attention to local explanation models with a linear functional form. More precisely:

$$
\begin{equation}
g(z) = \phi_0 + \sum_{i=1}^M \phi_i z_i
\end{equation}
$$

where $z_i \in \{0, 1\}$ and $\phi_i \in \mathbb{R}$.

The equation above assigns an effect $\phi_i$ to each *simplified* feature $z_i$. **ARE THESE EFFECTS INDEPENDENT?**
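The additive form is simple enough to evaluate directly. The sketch below uses a hypothetical helper `g` and made-up coefficients (not estimated from any model). It also checks a property that holds by construction of this functional form: toggling $z_i$ changes $g$ by exactly $\phi_i$, whatever the other entries of $z$ are.

```python
import itertools
import numpy as np

def g(z, phi0, phi):
    """Additive explanation model g(z) = phi_0 + sum_i phi_i * z_i.
    z can be a single binary vector or a 2-D array with one vector per row."""
    z = np.atleast_2d(np.asarray(z, dtype=float))
    return phi0 + z @ np.asarray(phi, dtype=float)

# Illustrative (made-up) coefficients for M = 3 simplified features
phi0, phi = 0.5, [1.0, -0.5, 0.25]

# Enumerate all 2^M binary vectors z
Z = np.array(list(itertools.product([0, 1], repeat=3)))
values = g(Z, phi0, phi)

# Toggle the first simplified feature (phi[0], i.e. phi_1 in the 1-based notation above)
off = Z[Z[:, 0] == 0]
on = off.copy()
on[:, 0] = 1
print(g(on, phi0, phi) - g(off, phi0, phi))  # same change for every setting of the other z_j
```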


### LIME

LIME is presented as an example of a local linear explanation model. Simplified inputs are in the form described above, although their mapping to the original input depends on the data modality (text, images, etc.). LIME finds the coefficients $\phi_i$ by solving the following optimization problem:

$$
g^* = \arg \min_{g \in \mathcal{G}} L(f, g, \pi_{x'}) + \Omega(g)
$$

Here $L$ is a squared loss, $\pi_{x'}$ is a weighting kernel, $\Omega(g)$ is a regularization term that penalizes the complexity of $g$, and $\mathcal{G}$ is the class of candidate (linear) explanation models. We said that each $\{x, f(x)\}$ pair has its own set of coefficients; here, however, the squared loss is minimized "over a set of samples in the simplified input space weighted by the kernel $\pi_{x'}$".

My understanding is the following: we fix $x$ and have a value $f(x)$. $z$ is the simplified input associated with $x$ (written $x'$ in the paper, hence the kernel $\pi_{x'}$), and $h_x(z) = x$. If $z'$ is another simplified input "close" to $z$, the weighting kernel gives it a non-negligible weight. Conversely, if $z$ and $z'$ are "far apart", the kernel sets the contribution of $z'$ close to 0. Given $K$ sampled simplified inputs $z^{(1)}, \ldots, z^{(K)}$, the loss can be written as

$$
L(f, g, \pi_{x'}) = \sum_{i=1}^K \left[f(h_x(z^{(i)})) - g(z^{(i)})\right]^2 \pi_{x'}(z^{(i)})
$$

where $z^{(i)}$ is the $i$-th sampled simplified input vector, and $h_x(z^{(i)})$ maps it back to the original input space so that $f$ can be evaluated on it.

**VERIFY THIS PART**
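The weighted-loss procedure can be sketched in a few lines of NumPy. This is not the `lime` package; it is a minimal sketch under several assumptions: simplified inputs are sampled uniformly at random, $h_x$ replaces switched-off features with a background value (here zero), the kernel $\pi_{x'}$ is an exponential kernel on the fraction of switched-off features with a hypothetical width of 0.75, and $\Omega(g)$ is a small ridge penalty on the $\phi_i$. The function names `h_x` and `lime_sketch` are hypothetical.

```python
import numpy as np

def nonlinear_func(x):
    # Same test function as in cell [1]
    return 0.7 + 1.3 * x[:, 0] - 2.1 * x[:, 1] ** 2 + 1.5 * x[:, 2] ** 3

def h_x(Z, x_star, background):
    # Assumed mapping: z_i = 1 keeps x_star[i], z_i = 0 uses background[i]
    return np.where(Z == 1, x_star, background)

def lime_sketch(f, x_star, background, n_samples=2000, width=0.75, alpha=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    M = x_star.shape[0]
    # Sample K = n_samples random binary simplified inputs z^(i)
    Z = rng.integers(0, 2, size=(n_samples, M))
    Z[0] = 1  # always include the simplified input of x itself (all features on)
    # Evaluate the original model on h_x(z^(i))
    y = f(h_x(Z, x_star, background))
    # Assumed kernel pi_x'(z): exponential in the fraction of switched-off features
    dist = 1.0 - Z.mean(axis=1)
    w = np.exp(-(dist ** 2) / width ** 2)
    # Weighted ridge regression: minimise sum_i w_i (y_i - g(z^(i)))^2 + alpha * ||phi||^2
    A = np.hstack([np.ones((n_samples, 1)), Z])
    penalty = alpha * np.eye(M + 1)
    penalty[0, 0] = 0.0  # do not penalise the intercept phi_0
    AtW = A.T * w  # same as A.T @ diag(w), without building a K x K matrix
    coef = np.linalg.solve(AtW @ A + penalty, AtW @ y)
    return coef[0], coef[1:]

x_star = np.array([0.8, -0.1, 0.2])
phi0, phi = lime_sketch(nonlinear_func, x_star, background=np.zeros(3))
print(phi0, phi)
```

Because the test function has no interaction terms and this $h_x$ switches features independently, $f(h_x(z))$ is exactly linear in $z$, so the fit recovers the per-feature changes $1.3 \cdot 0.8$, $-2.1 \cdot 0.1^2$ and $1.5 \cdot 0.2^3$ up to a negligible ridge shrinkage. For a model with interactions, the fitted coefficients would instead depend on the kernel and on the sampled $z^{(i)}$.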

### Classic Shapley Value Estimation

This approach was developed to deal with linear models in the presence of multicollinearity. Consider therefore the linear model

$f(x) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$

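where $p$ is the total number of predictors ($p+1$ parameters, including the intercept). Let $F = \{1, \ldots, p\}$ be the set of all $p$ feature indices and, for a given feature $i$, let $S \subseteq F \setminus \{i\}$ be any subset of the remaining features. There are $2^{p-1}$ such subsets, including the empty set $\emptyset$. For each $S$ we compare a model fitted on the features in $S$ with a model fitted on $S \cup \{i\}$, i.e., the same subset plus the feature we removed. We have:

$\delta f_i^S = f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)$

In a linear model, $\delta f_i^S$ is driven by $\beta_i^S$, the coefficient of feature $i$ in the model fitted on $S \cup \{i\}$; the superscript $S$ clarifies that the value of $\beta_i$ will be different for every set $S$ we consider (we are fitting a different model every time). If the predictors are uncorrelated and centered, adding feature $i$ leaves the other coefficients unchanged, so that $\delta f_i^S = \beta_i x_i$ for every $S$. With correlated predictors, the coefficients of the features in $S$ shift as well, and $\delta f_i^S$ is no longer simply $\beta_i^S x_i$.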
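The enumeration over subsets can be sketched directly with NumPy least squares. This is a minimal sketch, not a library routine: `fit_ols`, `predict` and `subset_differences` are hypothetical helpers, and the target `y` is a hypothetical linear function of correlated predictors (same covariance as cell [2]) with coefficients $(1, 2, -1)$ chosen for illustration. Features are indexed from 0 in the code, so `i=0` corresponds to $x_1$. For each of the $2^{p-1}$ subsets $S$, it reports the coefficient of feature $i$ in the model fitted on $S \cup \{i\}$ and the difference $f_{S \cup \{i\}}(x^*) - f_S(x^*)$ at one input $x^*$.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    # Least-squares fit with an intercept column; returns [beta_0, beta_1, ...]
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, x_row):
    # Linear prediction for a single input row
    return coef[0] + x_row @ coef[1:]

def subset_differences(X, y, x_star, i):
    """For every subset S of the features other than i, fit the models on S and on
    S plus i; return {S: (beta_i in the S-plus-i model, f_{S+i}(x*) - f_S(x*))}."""
    others = [j for j in range(X.shape[1]) if j != i]
    out = {}
    for k in range(len(others) + 1):
        for S in itertools.combinations(others, k):
            S_i = sorted(S + (i,))
            coef_Si = fit_ols(X[:, S_i], y)
            coef_S = fit_ols(X[:, list(S)], y)  # S = () gives an intercept-only model
            beta_i = coef_Si[1 + S_i.index(i)]
            delta = predict(coef_Si, x_star[S_i]) - predict(coef_S, x_star[list(S)])
            out[S] = (beta_i, delta)
    return out

rng = np.random.default_rng(0)
covmat = [[1, 0.8, 0.9], [0.8, 1, 0.85], [0.9, 0.85, 1]]
X = rng.multivariate_normal(mean=np.zeros(3), cov=covmat, size=10_000)
# Hypothetical linear target with coefficients (1, 2, -1) plus small noise
y = 0.5 + 1.0 * X[:, 0] + 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=10_000)

x_star = np.array([0.8, -0.1, 0.2])
diffs = subset_differences(X, y, x_star, i=0)
for S, (beta_i, delta) in diffs.items():
    print(S, round(beta_i, 3), round(delta, 3))
```

With these correlated predictors, the fitted coefficient of $x_1$ is close to its true value 1 only when all other features are in the model; fitted alone, it absorbs the effect of its correlated neighbours and lands near $1 + 2 \cdot 0.8 - 0.9 = 1.7$. This is the dependence on $S$ that the superscript in $\beta_i^S$ keeps track of.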