# Shapley
---

Shapley Vector - a method of "fair" gain distribution between team members, introduced in the mid-20th century in the study of cooperative games (game theory). The principle was later adopted in Machine Learning for determining individual feature contributions

### Problem setting
There is a set of players $\{1, 2, \dots, N\}$ that will form a team. The team collectively does some work and receives revenue at the end. The task is to construct a "good" distribution policy for the revenue. There are different definitions of goodness in game theory, but in general those who contribute more should get more

In primitive games players act independently. For example, in individual sports competitions the players who run faster or jump further than the others receive all the gain.

In cooperative games players act in coordination: player A can cover for player B, player C can be assigned to the work he does best, and so on. The goal is to maximize the team's overall gain.

$S$ - a coalition (team) of players<br>
$v(S)$ - the revenue that team $S$ receives in a game

A player's contribution (aka marginal contribution) is quite intuitive: it is "revenue with the player" minus "revenue without the player"

<img src="img/formula1.png" width=150>
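
A minimal sketch of this definition, assuming the game is stored as a dictionary from coalitions to revenue. The 3-player game `v` and the helper `marginal_contribution` are hypothetical and only illustrate that the same player brings a different amount to different teams.

```python
# Hypothetical characteristic function v(S): revenue of every coalition
# of players 1, 2, 3 (illustrative numbers, not taken from the text)
v = {
    frozenset(): 0,
    frozenset({1}): 10,
    frozenset({2}): 20,
    frozenset({3}): 30,
    frozenset({1, 2}): 40,
    frozenset({1, 3}): 50,
    frozenset({2, 3}): 60,
    frozenset({1, 2, 3}): 90,
}


def marginal_contribution(v, player, coalition):
    """Revenue with the player minus revenue without the player."""
    without = frozenset(coalition) - {player}
    return v[without | {player}] - v[without]


# Player 1 joins an empty team, then player 2, then players 2 and 3
for team in (set(), {2}, {2, 3}):
    print(sorted(team), marginal_contribution(v, 1, team))
```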


### Method idea
Since a player's contribution depends on the team and we want a "fair" evaluation of it, it is better to average the contribution over ALL possible ways of forming the team. There are $N!$ possible orderings of the players, so

<img src="img/formula2.png" width=250>

To compute it, it is convenient to list all teams that include player $X$ through player permutations. That is, for each permutation $R$ we find the position of player $X$ within it and form a coalition from all preceding players (those to the left of $X$)

The table illustrates this kind of iteration for a 3-player game. First, we list all permutations. Then we compute the marginal contribution of each player in each permutation<br>The second column contains all contributions for player "1"
<img src="img/formula3.png" width=350>
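
The permutation table can be reproduced with a short sketch. The game `v` below is a made-up 3-player example and `shapley_by_permutations` is a hypothetical helper name, so treat it as an illustration of the averaging idea rather than a reference implementation.

```python
from itertools import permutations

# Hypothetical 3-player game (illustrative numbers)
v = {
    frozenset(): 0,
    frozenset({1}): 10,
    frozenset({2}): 20,
    frozenset({3}): 30,
    frozenset({1, 2}): 40,
    frozenset({1, 3}): 50,
    frozenset({2, 3}): 60,
    frozenset({1, 2, 3}): 90,
}


def shapley_by_permutations(v, players):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        team = frozenset()          # players to the left of the current one
        for player in ordering:
            totals[player] += v[team | {player}] - v[team]
            team = team | {player}
    return {p: totals[p] / len(orderings) for p in players}


phi = shapley_by_permutations(v, [1, 2, 3])
print(phi)  # {1: 20.0, 2: 30.0, 3: 40.0}
```

Enumerating all orderings is only feasible for a handful of players, since their number grows as the factorial of the team size.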

Shapley Vector = the set of all Shapley Values<br>
$\varphi = \{\varphi_1, \varphi_2 ... \varphi_N \}$


### Formula variants

In most scenarios the gain does not depend on the order of players, so we can simplify this formula by grouping teams that differ only in the order of their members

The number of permutations inside the coalition of players $S$ equals $|S|!$<br>
The number of permutations of the remaining players (those outside $S$, not counting the player being evaluated) equals $(N-|S|-1)!$<br>
Thus we can rewrite:

<img src="img/formula4.png" width=300>



And if we factor $1/N$ out of the sum, we can also rewrite it using combination notation

<img src="img/unnamed (1).png" width=310>
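
A sketch of the subset form, assuming the weight of a coalition $S$ is $\frac{|S|!\,(N-|S|-1)!}{N!} = \frac{1}{N \binom{N-1}{|S|}}$. The game `v` and the helper `shapley_by_subsets` are hypothetical. `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical 3-player game (illustrative numbers)
v = {
    frozenset(): 0,
    frozenset({1}): 10,
    frozenset({2}): 20,
    frozenset({3}): 30,
    frozenset({1, 2}): 40,
    frozenset({1, 3}): 50,
    frozenset({2, 3}): 60,
    frozenset({1, 2, 3}): 90,
}


def shapley_by_subsets(v, players):
    """Weighted sum of marginal contributions over coalitions without the player."""
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = Fraction(0)
        for size in range(n):                              # |S| = 0 .. n-1
            weight = Fraction(1, n * comb(n - 1, size))    # |S|! (n-|S|-1)! / n!
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v[s | {player}] - v[s])
        phi[player] = total
    return phi


phi = shapley_by_subsets(v, [1, 2, 3])
print({p: float(x) for p, x in phi.items()})  # {1: 20.0, 2: 30.0, 3: 40.0}
```

This visits $2^{N-1}$ coalitions per player instead of $N!$ orderings, and gives the same values as averaging over permutations.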


# Properties


### 1) Efficiency

The Shapley distribution is efficient<br>In cooperative game theory, a distribution is efficient when nothing is left undistributed, i.e. the shares of all players sum to $v(N)$

__Proof__<br>
Represent all possible coalitions as a table of $N!$ permutations. Each element of a row corresponds to that player's contribution to the coalition on its left. We need to average the contributions of each player over the rows and then add these averages up. Swap the order of summation: within a row the contributions always sum to $v(N)$, because we are summing deltas and the intermediate terms cancel, so the order does not matter. The average row sum is then exactly $v(N)$, and this is the sum of the players' contributions.
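
Written out, the argument is a telescoping sum. The notation $P_i^R$ (the set of players preceding player $i$ in permutation $R$) is introduced here for illustration, and $v(\varnothing) = 0$ is assumed:

$$
\sum_{i=1}^{N} \varphi_i
= \frac{1}{N!} \sum_{R} \sum_{i=1}^{N} \left[ v(P_i^R \cup \{i\}) - v(P_i^R) \right]
= \frac{1}{N!} \sum_{R} \left[ v(N) - v(\varnothing) \right]
= v(N)
$$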

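### 2) Superadditivity
If the payoff function $v$ is superadditive (joining a coalition is beneficial), then a player's Shapley value is at least as large as what the player would get by playing alone: every marginal contribution $v(S \cup \{i\}) - v(S)$ is at least $v(\{i\})$, and so is their average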




In short, the Shapley value is a way to fairly evaluate the contribution of each player to the team result

In simple game models players can be evaluated individually (the game result is a sum of player results)<br>
In most real-world scenarios players act differently depending on the team's cooperation strategy

- $\{1, 2, \dots, N\}$ - players
- $S$ - coalition (set of players)
- characteristic function $v(S)$ - performance of coalition $S$; it does not depend on the order of players

Important assumptions about player contributions:

- they are not additive: a contribution depends on what has already been done
- they depend on other players: with strong team members a new player brings less

Marginal contribution of player $i$ to coalition $S$ = additional revenue brought by player $i$ when joining team $S$


## Shapley as Synergy

Informally, synergy = when the effect of a coalition is bigger than the sum of the individual effects<br> (this holds, for example, when the value function is convex over coalitions)

Formally, synergy = "uplift" of the payout after joining the players' efforts together<br>It is calculated using set algebra. Example for 2 players:<br>

$W_{X+Y} = V_{X+Y} - V_X - V_Y$<br><br>

If $X$ and $Y$ are subteams rather than single players, the same can be computed for multilevel inclusions. Here is an example for 2 levels:<br>

$W_{X+Y} = V_{X+Y} - (V_{X} - V_{X_1} - V_{X_2}) - (V_{Y} - V_{Y_1} - V_{Y_2})$

<img src="img/synergy2.jpg" width=300>

Or for arbitrary level
<img src="img/formula5.png" width=200>

Total synergy $w(S)$ = the sum of the uplifts of all inclusions in the team of players $S$

Synergy takes place when there is some sort of coordination between players in a team - it can boost or reduce the overall performance<br>This means synergy can be negative


Shapley Value of player $X$ = sum over ALL teams that include player $X$ of the team's synergy divided by the size of that team
<img src="img/synergy1.png" width=200>
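
A sketch of this view, assuming the synergy of a team is defined recursively as its value minus the synergies of all its proper sub-teams (in game theory this quantity is known as the Harsanyi dividend). The game `v` and the helpers `synergies` and `shapley_from_synergies` are hypothetical.

```python
from itertools import combinations

# Hypothetical 3-player game (illustrative numbers)
v = {
    frozenset(): 0,
    frozenset({1}): 10,
    frozenset({2}): 20,
    frozenset({3}): 30,
    frozenset({1, 2}): 40,
    frozenset({1, 3}): 50,
    frozenset({2, 3}): 60,
    frozenset({1, 2, 3}): 90,
}


def synergies(v, players):
    """w(S) = v(S) minus the synergies of all proper sub-coalitions of S."""
    w = {}
    for size in range(1, len(players) + 1):     # smaller coalitions first
        for team in combinations(players, size):
            s = frozenset(team)
            inner = sum(w[t] for t in w if t < s)   # t < s: proper subset
            w[s] = v[s] - inner
    return w


def shapley_from_synergies(w, players):
    """Each team's synergy is split equally between its members."""
    return {p: sum(val / len(s) for s, val in w.items() if p in s)
            for p in players}


w = synergies(v, [1, 2, 3])
phi = shapley_from_synergies(w, [1, 2, 3])
print(w[frozenset({1, 2})], w[frozenset({1, 2, 3})])  # 10 0
print(phi)  # {1: 20.0, 2: 30.0, 3: 40.0}
```

In this toy game every pair has a synergy of 10 and the full team adds nothing on top of the pairs, so each player receives its own value plus half of each pair synergy it takes part in.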



__Synergy Example__<br>
The team consists of a manager and a set of workers<br> $\{o, w_1, \dots, w_N\}$ 

Without the manager the workers are useless<br>
$v(w_1) = \dots = v(w_N) = 0$

Without workers the manager is useless<br>
$v(o) = 0$

But together they produce a result:<br>
$v(o, w_1, \dots, w_N) = (N-1) p$

The synergy of a {manager + worker} cooperation is positive and equals<br>
$w(o, w_1) = -v(o) - v(w_1) + v(o, w_1) = v(o, w_1)$

Shapley Value = sum of synergies per member in all coalitions
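
A numeric sketch of the manager example for a team of 4 players (the manager and 3 workers). The text fixes only the single-player values and the full-team result, so the rule used for intermediate coalitions (every worker yields `P` as long as the manager is present) and the value `P = 1.0` are assumptions made for illustration.

```python
from itertools import permutations

P = 1.0                               # assumed output per worker
PLAYERS = ["o", "w1", "w2", "w3"]     # manager "o" plus three workers


def v(team):
    """Assumed payoff: each worker yields P, but only if the manager is present."""
    workers = sum(1 for member in team if member != "o")
    return P * workers if "o" in team else 0.0


# Synergy of the {manager + worker} pair
pair_synergy = -v({"o"}) - v({"w1"}) + v({"o", "w1"})

# Shapley values: average marginal contribution over all orderings
orderings = list(permutations(PLAYERS))
totals = {player: 0.0 for player in PLAYERS}
for ordering in orderings:
    team = set()
    for player in ordering:
        totals[player] += v(team | {player}) - v(team)
        team.add(player)
phi = {player: total / len(orderings) for player, total in totals.items()}

print(pair_synergy)  # 1.0
print(phi)           # {'o': 1.5, 'w1': 0.5, 'w2': 0.5, 'w3': 0.5}
```

Under this assumption the only non-zero synergies are those of the {manager + worker} pairs, so each worker gets half of its pair synergy and the manager collects the other halves.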

How Shapley is computed:

1. compare results with and without the player
2. average over all subsets of player combinations





# Attribution

In Marketing, the task of conversion attribution = to evaluate what drove the customer's decision to convert (to follow the advertisement or to buy a product), and by how much. Usually it is the influence of different communication channels that is being compared

- Players = communication channels
- Coalition = mix of the channels used for communication with the customer
- Gain = number of conversions or CTR

Task = to fairly distribute conversions between the channels. A good evaluation makes it possible to draw correct conclusions about channel efficiency
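
A sketch of channel attribution with three channels. The channel names and the conversion counts per channel mix are invented for illustration, and `attribute` is a hypothetical helper. In practice the counts would come from observed customer journeys.

```python
from itertools import permutations

# Hypothetical conversions observed for each mix of channels
conversions = {
    frozenset(): 0,
    frozenset({"search"}): 100,
    frozenset({"email"}): 40,
    frozenset({"social"}): 20,
    frozenset({"search", "email"}): 180,
    frozenset({"search", "social"}): 130,
    frozenset({"email", "social"}): 70,
    frozenset({"search", "email", "social"}): 220,
}


def attribute(conversions, channels):
    """Shapley attribution: average marginal conversions over channel orderings."""
    totals = {c: 0 for c in channels}
    orderings = list(permutations(channels))
    for ordering in orderings:
        mix = frozenset()
        for channel in ordering:
            totals[channel] += conversions[mix | {channel}] - conversions[mix]
            mix = mix | {channel}
    return {c: totals[c] / len(orderings) for c in channels}


credit = attribute(conversions, ["search", "email", "social"])
print(credit)  # {'search': 125.0, 'email': 65.0, 'social': 30.0}
```

The credited conversions add up to the 220 conversions of the full channel mix, so nothing is left unattributed.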


# SHAP
---

SHAP = Python library for prediction interpretation

SHAP stands for SHapley Additive exPlanations

It repeats the Shapley Vector mechanics, but here:
- game = prediction at a point $x$ by a pretrained model $f$
- coalition = set of features 
- gain = deviation of the prediction from $E[f]$

Just as the Shapley method decomposes the revenue into a sum of player contributions, SHAP decomposes the predicted value for each observation into a sum of feature contributions

<img src="img/formula.png" width=300>

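Exact computation requires evaluating every coalition of features, and their number grows exponentially, so we need an approximate solution: sample coalitions, encode each one as a binary vector of present/absent features, and fit a weighted linear regression on these vectors. The resulting coefficients are approximate Shapley values. This approach is called __KernelSHAP__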
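
A toy sketch of the regression idea, not the `shap` library's implementation. The model `f`, the point `x` and the all-zero `baseline` used for "absent" features are assumptions, all coalitions are enumerated instead of sampled (there are only 3 features), and the infinite kernel weight of the empty and full coalitions is replaced by a large constant.

```python
from itertools import product
from math import comb

import numpy as np


def f(x):
    """Hypothetical model: one additive term and one interaction term."""
    return 2 * x[0] + 3 * x[1] * x[2]


x = np.array([1.0, 2.0, 3.0])     # point to explain
baseline = np.zeros(3)            # assumed values for "absent" features
M = len(x)

masks, targets, weights = [], [], []
for z in product([0, 1], repeat=M):       # every coalition of features
    z = np.array(z)
    size = int(z.sum())
    if size in (0, M):
        weight = 1e6                      # stand-in for the infinite kernel weight
    else:
        weight = (M - 1) / (comb(M, size) * size * (M - size))   # Shapley kernel
    masks.append(z)
    targets.append(f(np.where(z == 1, x, baseline)))
    weights.append(weight)

# Weighted least squares: f(coalition) ~ phi_0 + sum_j z_j * phi_j
A = np.column_stack([np.ones(len(masks)), np.array(masks)])
sw = np.sqrt(np.array(weights))
coef, *_ = np.linalg.lstsq(A * sw[:, None], np.array(targets) * sw, rcond=None)

phi_0, phi = coef[0], coef[1:]
print(np.round(phi_0, 3), np.round(phi, 3))   # approx. 0 and [2, 9, 9]
```

The additive term is credited entirely to feature 0, while the interaction term `3 * x[1] * x[2] = 18` is split equally between features 1 and 2.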

## Visualization

__Force plot__ = we start from $E[f]$ at the bottom and gradually add features one by one, showing how each of them affects the prediction<br> When all features are added we arrive at the actual prediction 
<img src="img/shap1.png" width=500>

__Beeswarm__ = for each feature (vertical axis) plot its contributions for all data points in a dataset (horizontal axis). Color denotes feature value
<img src="img/shap2.png" width=500>

The mean absolute contribution of a feature over the whole dataset = the importance of the feature
<img src="img/shap3.png" width=400>


The X axis is the prediction scale, with the model output in the middle; blue features lower the prediction, red features raise it
<img src="img/shap4.png" width=750>





# Usage Example
---

In [None]:
import shap
import numpy as np

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# Train a model (fixed seed so the explanation is reproducible)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)


# Create a TreeExplainer for the RandomForest model
explainer = shap.TreeExplainer(model)

# Choose one instance to explain (kept 2-D: one row, four features)
instance = X_test[0].reshape(1, -1)

# Get SHAP values for that instance: one contribution per feature and class.
# Depending on the shap version this is a list with one (1, 4) array per class
# or a single array of shape (1, 4, 3)
shap_values = explainer.shap_values(instance)
print(np.shape(shap_values))

