# SHAP

SHAP (SHapley Additive exPlanations) is a method for explaining the output of machine learning models.  
It shows how an input affects the model's output by measuring the impact of each input feature on that output.  
When reading SHAP values, you will see, for each input feature, how much it positively or negatively pushed the output toward the answer we got, compared to an average base value for the dataset.

You can read more here: https://trustyai-explainability.github.io/trustyai-site/main/local-explainers.html
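
To make the idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny, hypothetical model (`toy_model`) relative to a single hypothetical baseline input. It is not how TrustyAI computes its explanations internally; it only illustrates the additive property, where the base value plus the sum of the SHAP values equals the prediction.

```python
from itertools import combinations
from math import factorial

import numpy as np


def toy_model(x):
    # Hypothetical model: three linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] - 3.0 * x[2] + 0.5 * x[0] * x[1]


def exact_shapley(model, point, baseline):
    """Exact Shapley values of `point` relative to a single `baseline` input."""
    n = len(point)
    values = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the coalition take the point's values,
                # all other features stay at the baseline.
                without_i = baseline.copy()
                for j in subset:
                    without_i[j] = point[j]
                with_i = without_i.copy()
                with_i[i] = point[i]
                values[i] += weight * (model(with_i) - model(without_i))
    return values


point = np.array([1.0, 2.0, 0.5])      # hypothetical input to explain
baseline = np.array([0.0, 1.0, 1.0])   # hypothetical "average" input

shap_values = exact_shapley(toy_model, point, baseline)
base_value = toy_model(baseline)
prediction = toy_model(point)

print("SHAP values:", shap_values)
print("base + sum of SHAP values:", base_value + shap_values.sum())
print("prediction:", prediction)
```

Exact enumeration like this grows exponentially with the number of features, which is why practical SHAP implementations rely on approximations.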

In [None]:
!pip -q install "onnx" "onnxruntime"

In [None]:
import pandas as pd
import onnxruntime as rt
import numpy as np
import pickle

Let's start by loading some artifacts.  
We will need:
- The ONNX model
- Our pre- and post-processing artifacts
    - scaler.pkl
    - label_encoder.pkl
- Some data
    - The training inputs, which will be used to get an average input for our dataset
    - The test data, which will be used to get a point we want to analyse

In [None]:
onnx_session = rt.InferenceSession("../2-dev_datascience/models/jukebox/1/model.onnx", providers=rt.get_available_providers())
onnx_input_name = onnx_session.get_inputs()[0].name
onnx_output_name = onnx_session.get_outputs()[0].name


with open('../2-dev_datascience/models/jukebox/1/artifacts/scaler.pkl', 'rb') as handle:
    scaler = pickle.load(handle)

with open('../2-dev_datascience/models/jukebox/1/artifacts/label_encoder.pkl', 'rb') as handle:
    label_encoder = pickle.load(handle)

with open('../2-dev_datascience/models/jukebox/1/artifacts/y_test.pkl', 'rb') as handle:
    y_test = pickle.load(handle)

X_train = pd.read_parquet("../2-dev_datascience/models/jukebox/1/artifacts/X_train.parquet")
X_test = pd.read_parquet("../2-dev_datascience/models/jukebox/1/artifacts/X_test.parquet")

We arbitrarily choose the first datapoint in our test data as the one we want to explain.  
In practice, you might choose the datapoint with the worst prediction, or one that gave an unexpected answer.  
We also look at what our datapoint looks like when normalized (after going through pre-processing). This is what it will look like going into the model.

In [None]:
point_to_explain = X_test.iloc[0:1]
point_to_explain

In [None]:
def normalize_dataframe(df):
    normalized_data = scaler.transform(df)
    return pd.DataFrame(normalized_data, columns=df.columns)

In [None]:
normalize_dataframe(point_to_explain)
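
The notebook does not say which kind of scaler was pickled. As a rough sketch, assuming it standardizes each feature (as scikit-learn's `StandardScaler` does), the transformation would look like the following. The training values and the helper `standardize` are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical training data: 3 rows, 2 features.
train = np.array([
    [100.0, 0.2],
    [120.0, 0.5],
    [140.0, 0.8],
])

# Per-feature statistics learned from the training data.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)


def standardize(rows):
    # Centre each feature on the training mean and scale to unit variance.
    return (rows - mean) / std


row = np.array([[120.0, 0.8]])
scaled = standardize(row)
print(scaled)
```

A value of 0 after scaling means the feature sits exactly at the training average, which makes the normalized datapoint easy to compare against the background data later on.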

We grab all the country codes from the post-processing artifact `label_encoder`.  
We will use these to know which output corresponds to which country.

In [None]:
output_names = label_encoder.classes_
output_names
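
As a small illustration of how these class names line up with the model output, the sketch below maps the highest-scoring output column back to a country code. The class list and scores are hypothetical stand-ins, not values from the real label encoder or model.

```python
import numpy as np

# Hypothetical stand-ins for label_encoder.classes_ and one row of model scores.
example_classes = np.array(["CH", "DE", "SE"])
example_scores = np.array([[0.7, 0.2, 0.1]])

# The position of the highest score is the index of the predicted country.
top_index = int(np.argmax(example_scores, axis=1)[0])
top_country = example_classes[top_index]
print(top_country)
```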

The TrustyAI SHAP explainer requires our model to take a pandas DataFrame as input and return a numpy array or pandas DataFrame, so we wrap our model in a `pred()` function that converts the input and output accordingly.

In [None]:
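def pred(x):
    # ONNX Runtime expects a float32 numpy array, so convert the DataFrame first
    scores = onnx_session.run([onnx_output_name], {onnx_input_name: x.to_numpy().astype(np.float32)})[0]
    # Label each output column with its country code
    return pd.DataFrame(scores, columns=output_names)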

In [None]:
from trustyai.model import Model
trustyai_model = Model(pred, dataframe_input=True, output_names=output_names)

Let's use our TrustyAI Model to predict the output for the point we want to explain with SHAP.

In [None]:
trustyai_model(normalize_dataframe(point_to_explain))

With everything set up, we can create a SHAP explainer and let it analyze our datapoint!  
Note that we pass 100 datapoints from our training dataset to the SHAPExplainer as background data. These are used to calculate the average base values of our dataset, so we can see how much our datapoint of interest contributes to the prediction compared to what a "standard" value would.

In [None]:
from trustyai.explainers import SHAPExplainer
explainer = SHAPExplainer(background=normalize_dataframe(X_train[:100]))
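
The role of the background data can be sketched with a hypothetical linear model, where SHAP values have a simple closed form (assuming features are treated independently). The names `toy_model`, `weights`, and the random background rows are illustrative assumptions; the real explainer works on the ONNX model and does not rely on this shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 100 normalized training rows with 3 features.
background = rng.normal(size=(100, 3))

# Hypothetical linear model; the notebook itself uses the ONNX model instead.
weights = np.array([2.0, 1.0, -3.0])
bias = 0.5


def toy_model(X):
    return X @ weights + bias


point = np.array([1.0, -0.5, 0.25])

# Per-feature average of the background data.
mean_background = background.mean(axis=0)

# Base value: the average model output over the background rows.
base_value = toy_model(background).mean()

# For a linear model, each feature's SHAP value is its weight times
# the distance of the point from the background mean of that feature.
shap_values = weights * (point - mean_background)

print("base value:", base_value)
print("base + sum of SHAP values:", base_value + shap_values.sum())
print("model output for the point:", toy_model(point))
```

A larger or more representative background set gives a more stable base value, at the cost of more model evaluations.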

In [None]:
explanations = explainer.explain(inputs=normalize_dataframe(point_to_explain),
                                 outputs=pd.DataFrame([dict(zip(output_names, y_test[0]))]), #This is just the ground truth of the point_to_explain
                                 model=trustyai_model)

With our SHAP Explainer ready we can start looking at the results.

Let's choose a specific output country and see how the input affected it.  
CH is the country we are supposed to get as the popular country for this input, so it's especially interesting to see the input's effect on that output.  
That being said, feel free to try a few other countries and see what happens.  

In [None]:
COUNTRY_OF_INTEREST = "CH"

First, we will get a table of values.  
Here we can see the **Mean Background Value** - this is the average base value we talked about earlier.  
We can also see our **Value**, which is the normalized datapoint that we sent into the explainer. Red values are lower than the average value and green values are higher.  
Finally, we have the **SHAP Value**. This indicates how much each input feature affected the output. Red indicates a negative contribution to the prediction and green a positive contribution. The larger the absolute value, the larger the contribution.

In [None]:
explanations.as_html()[COUNTRY_OF_INTEREST]

We can also visualize it as a candlestick plot to see how the different input features build up to the output value.

In [None]:
from trustyai.visualizations.shap import SHAPViz
SHAPViz()._matplotlib_plot(explanations=explanations, output_name=COUNTRY_OF_INTEREST)