### Shapash Model Overview
https://shapash.readthedocs.io/en/latest/

# Shapash

Shapash is a Python library that aims to make machine learning interpretable and understandable to everyone. It provides several types of visualization that display explicit labels everyone can understand. Data scientists can more easily understand their models and share their results, and end users can understand the decision proposed by a model through a summary of the most influential criteria. The project was developed by MAIF data scientists.

##### With this tutorial you:
Understand how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.

This tutorial describes the steps from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in more depth.

Contents:

- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save the Shapash SmartPredictor object in a pickle file
- Make a prediction

In [1]:
import seaborn as sns
import pandas as pd

In [2]:
df=pd.read_csv('breast-cancer_csv.csv')

In [3]:
df.head()

| | age | menopause | tumor-size | inv-nodes | node-caps | deg-malig | breast | breast-quad | irradiat | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40-49 | premeno | 15-19 | 0-2 | yes | 3 | right | left_up | no | recurrence-events |
| 1 | 50-59 | ge40 | 15-19 | 0-2 | no | 1 | right | central | no | no-recurrence-events |
| 2 | 50-59 | ge40 | 35-39 | 0-2 | no | 2 | left | left_low | no | recurrence-events |
| 3 | 40-49 | premeno | 35-39 | 0-2 | yes | 3 | right | left_low | yes | no-recurrence-events |
| 4 | 40-49 | premeno | 30-34 | 3-5 | yes | 2 | left | right_up | no | recurrence-events |


In [4]:
df.dropna(inplace=True)
df.isna().sum()

age            0
menopause      0
tumor-size     0
inv-nodes      0
node-caps      0
deg-malig      0
breast         0
breast-quad    0
irradiat       0
Class          0
dtype: int64

In [5]:
# Encode each categorical column as integers, assigning the result back to the column
# (in-place replace on a column selection may not update df in recent pandas versions)
df['Class'] = df['Class'].replace({'recurrence-events':1,'no-recurrence-events':0})
df['irradiat'] = df['irradiat'].replace({'no':0,'yes':1})
df['node-caps'] = df['node-caps'].replace({'no':0,'yes':1})
df['breast'] = df['breast'].replace({'right':0,'left':1})
df['menopause'] = df['menopause'].replace({'premeno':0,'ge40':1,'lt40':2})
df['breast-quad'] = df['breast-quad'].replace({'left_low':0,'left_up':1,'central':2,'right_low':3,'right_up':4})
df['age'] = df['age'].replace({'20-29':0,'30-39':1,'40-49':1,'50-59':2,'60-69':2,'70-79':3})
df['inv-nodes'] = df['inv-nodes'].replace({'0-2':0,'3-5':1,'6-8':2,'9-11':3,'12-14':4,'15-17':5,'24-26':6})
df['tumor-size'] = df['tumor-size'].replace({'0-4':0,'5-9':0,'10-14':1,'15-19':1,'20-24':2,'25-29':2,'30-34':3,'35-39':3,'40-44':4,'45-49':4,'50-54':5})
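
One way to guard against typos in mappings like these is to check that every category found in a column has a code before encoding it. The sketch below uses a small hand-made frame rather than the real file, and `encode_column` is a hypothetical helper (it is not part of pandas or of this notebook). It relies on `Series.map` and assigns the result back to the column.

```python
import pandas as pd

def encode_column(frame, column, mapping):
    """Return a copy of `frame` with `column` encoded through `mapping`.

    Hypothetical helper for illustration: raises if a category has no code.
    """
    missing = set(frame[column].dropna().unique()) - set(mapping)
    if missing:
        raise ValueError(f"unmapped categories in {column!r}: {sorted(missing)}")
    out = frame.copy()
    out[column] = out[column].map(mapping)
    return out

# Toy rows in the style of the dataset (made-up values, not the real file).
toy = pd.DataFrame({
    "irradiat": ["no", "yes", "no"],
    "breast": ["right", "left", "left"],
})
toy = encode_column(toy, "irradiat", {"no": 0, "yes": 1})
toy = encode_column(toy, "breast", {"right": 0, "left": 1})
print(toy)
```

A category missing from the mapping then fails loudly instead of silently staying a string in an otherwise numeric column.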

In [6]:
df.head()

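| | age | menopause | tumor-size | inv-nodes | node-caps | deg-malig | breast | breast-quad | irradiat | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 1 | 3 | 0 | 1 | 0 | 1 |
| 1 | 2 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 0 | 0 |
| 2 | 2 | 1 | 3 | 0 | 0 | 2 | 1 | 0 | 0 | 1 |
| 3 | 1 | 0 | 3 | 0 | 1 | 3 | 0 | 0 | 1 | 0 |
| 4 | 1 | 0 | 3 | 1 | 1 | 2 | 1 | 4 | 0 | 1 |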


In [7]:
### Split the dataset into independent features (X) and the dependent target (y)
y=df['Class']
X=df[df.columns.difference(['Class'])]

In [8]:
X.head()

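| | age | breast | breast-quad | deg-malig | inv-nodes | irradiat | menopause | node-caps | tumor-size |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 3 | 0 | 0 | 0 | 1 | 1 |
| 1 | 2 | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 1 |
| 2 | 2 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 3 |
| 3 | 1 | 0 | 0 | 3 | 0 | 1 | 0 | 1 | 3 |
| 4 | 1 | 1 | 4 | 2 | 1 | 0 | 0 | 1 | 3 |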
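
`Index.difference` returns its result sorted, which is why the columns of `X` appear in alphabetical order rather than in the order of the original file. A small illustration on a toy frame (column names borrowed from the dataset, values made up), with `DataFrame.drop` shown as an alternative that keeps the original order:

```python
import pandas as pd

toy = pd.DataFrame({"menopause": [0], "age": [1], "Class": [1], "breast": [0]})

# Index.difference sorts the remaining labels.
sorted_cols = toy[toy.columns.difference(["Class"])]
print(list(sorted_cols.columns))

# drop keeps the original column order.
original_order = toy.drop(columns=["Class"])
print(list(original_order.columns))
```

Either order works as long as the same column order is used at training time and at prediction time.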


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 277 entries, 0 to 285
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   age          277 non-null    int64
 1   menopause    277 non-null    int64
 2   tumor-size   277 non-null    int64
 3   inv-nodes    277 non-null    int64
 4   node-caps    277 non-null    int64
 5   deg-malig    277 non-null    int64
 6   breast       277 non-null    int64
 7   breast-quad  277 non-null    int64
 8   irradiat     277 non-null    int64
 9   Class        277 non-null    int64
dtypes: int64(10)
memory usage: 23.8 KB


In [10]:
X

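| | age | breast | breast-quad | deg-malig | inv-nodes | irradiat | menopause | node-caps | tumor-size |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 3 | 0 | 0 | 0 | 1 | 1 |
| 1 | 2 | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 1 |
| 2 | 2 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 3 |
| 3 | 1 | 0 | 0 | 3 | 0 | 1 | 0 | 1 | 3 |
| 4 | 1 | 1 | 4 | 2 | 1 | 0 | 0 | 1 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 281 | 2 | 1 | 0 | 2 | 2 | 0 | 1 | 1 | 3 |
| 282 | 2 | 1 | 0 | 2 | 1 | 1 | 0 | 1 | 2 |
| 283 | 1 | 0 | 4 | 2 | 2 | 0 | 0 | 1 | 3 |
| 284 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 |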


In [11]:
### Train Test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,train_size=0.75,random_state=1)

In [12]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=200).fit(X_train,y_train)
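
Because `Class` is encoded as 0 and 1, the regressor's output can be read as a score between 0 and 1 rather than as a class label. The sketch below illustrates this on synthetic data (not the breast cancer file); the 0.5 threshold is an arbitrary assumption for turning scores into labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_toy = rng.integers(0, 4, size=(200, 3))            # small ordinal features
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 3).astype(int)  # synthetic 0/1 target

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)
scores = model.predict(X_toy)

# Each tree averages 0/1 targets, so the forest's scores stay within [0, 1].
print(scores.min(), scores.max())

# Assumed cut-off for converting scores into class labels.
labels = (scores >= 0.5).astype(int)
```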

#### Let's Understand Our Model With Shapash
In this section, we use the SmartExplainer object from Shapash.

- It allows users to understand how the model works with the specified data.
- This object must be used only for the data mining step. Shapash provides another object for deployment.


### Installing Shapash

In [13]:
!pip install shapash



In [14]:
from shapash.explainer.smart_explainer import SmartExplainer

In [15]:
xpl = SmartExplainer()

In [16]:
xpl.compile(
    x=X_test,
    model=regressor,
)

Backend: Shap TreeExplainer


In [17]:
xpl

<shapash.explainer.smart_explainer.SmartExplainer at 0x2322b134970>

#### Let's Understand the results of your trained model
We can then easily get a first summary of the explanation of the model results.

- We choose to keep the 3 most contributive features for each prediction (set later with modify_mask).
- A wording can be used to make feature names more understandable in an operational context.
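
The wording itself is not shown in this notebook. A hypothetical mapping from column names to readable labels could look like the sketch below; the label texts are assumptions made for illustration. In Shapash, a dictionary of this shape is what the `features_dict` argument of `SmartExplainer` expects, though that call is not exercised here.

```python
# Column names as they appear in the dataset.
columns = ["age", "menopause", "tumor-size", "inv-nodes", "node-caps",
           "deg-malig", "breast", "breast-quad", "irradiat"]

# Assumed, human-readable labels (illustrative wording only).
features_dict = {
    "age": "Age group",
    "menopause": "Menopausal status",
    "tumor-size": "Tumor size group",
    "inv-nodes": "Involved lymph nodes",
    "node-caps": "Node capsule invasion",
    "deg-malig": "Degree of malignancy",
    "breast": "Breast side",
    "breast-quad": "Breast quadrant",
    "irradiat": "Radiation therapy",
}

# Every model feature should have a label.
unlabeled = [c for c in columns if c not in features_dict]
print(unlabeled)
```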

In [18]:
app = xpl.run_app(title_story='Breast Cancer Dataset')





Dash is running on http://0.0.0.0:8050/

INFO:root:Your Shapash application run on http://PIPA-BLR-01:8050/
INFO:root:Use the method .kill() to down your app.

 * Serving Flask app "shapash.webapp.smart_app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off

INFO:werkzeug: * Running on http://0.0.0.0:8050/ (Press CTRL+C to quit)

In [19]:
predictor = xpl.to_smartpredictor()

In [20]:
predictor.save('./predictor.pkl')

In [21]:
from shapash.utils.load_smartpredictor import load_smartpredictor
predictor_load = load_smartpredictor('./predictor.pkl')

#### Make a prediction with your SmartPredictor
To make new predictions and summarize the local explainability of your model on new datasets, use the add_input method of the SmartPredictor.

- The add_input method is the first step to add a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model (see the documentation of this method).
- In API mode, this method can handle dictionary data received from a GET or a POST request.
- When adding data, the x input of add_input doesn't have to be encoded: add_input applies the preprocessing.
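
In API mode, a request body typically arrives as a plain dictionary. Before handing it to `add_input`, it can be worth checking that it carries all the features the model was trained on. The sketch below is pure Python; `check_payload` and the sample payload are hypothetical and not part of Shapash.

```python
# Feature names in the order used for X in this notebook.
EXPECTED_FEATURES = ["age", "breast", "breast-quad", "deg-malig", "inv-nodes",
                     "irradiat", "menopause", "node-caps", "tumor-size"]

def check_payload(payload):
    """Return the payload restricted to, and ordered by, the expected features.

    Hypothetical helper: raises ValueError if a feature is missing.
    """
    missing = [f for f in EXPECTED_FEATURES if f not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return {f: payload[f] for f in EXPECTED_FEATURES}

# Made-up request body; "request_id" stands in for an extra key to ignore.
payload = {
    "request_id": "abc", "tumor-size": 1, "age": 1, "breast": 0,
    "breast-quad": 1, "deg-malig": 3, "inv-nodes": 0, "irradiat": 0,
    "menopause": 0, "node-caps": 1,
}
clean = check_payload(payload)
print(clean)
```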

In [22]:
predictor_load.add_input(x=X, ypred=y)

In [23]:
detailed_contributions = predictor_load.detail_contributions()


In [24]:
detailed_contributions.head()

| | Class | age | breast | breast-quad | deg-malig | inv-nodes | irradiat | menopause | node-caps | tumor-size |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.034791 | 0.030547 | -0.031297 | 0.178791 | -0.061735 | 0.015443 | 0.041715 | 0.235538 | 0.019251 |
| 1 | 0 | -0.034982 | 0.017184 | -0.003064 | -0.071434 | -0.035753 | -0.02723 | -0.013812 | -0.019649 | -0.073216 |
| 2 | 1 | -0.011498 | -0.01519 | 0.03941 | 0.03847 | -0.031347 | -0.008798 | 0.003745 | -0.001812 | 0.142742 |
| 3 | 0 | -0.010328 | -0.031068 | -0.052992 | 0.043134 | -0.116609 | 0.009838 | 0.003155 | 0.025423 | 0.002491 |
| 4 | 1 | 0.011813 | 0.017798 | 0.115992 | -0.114426 | 0.082429 | -0.062575 | 0.012816 | 0.082495 | 0.031702 |
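
With a SHAP-based backend such as the tree explainer used here, the contributions of a row are expected to add up to the gap between that row's prediction and the model's average output. The sketch below copies the first two rows of the table above and sums them. The predictions and the base value are not shown in this notebook, so only the sums are computed.

```python
import pandas as pd

# First two rows of the detailed contributions shown above.
contrib = pd.DataFrame(
    {
        "age": [0.034791, -0.034982],
        "breast": [0.030547, 0.017184],
        "breast-quad": [-0.031297, -0.003064],
        "deg-malig": [0.178791, -0.071434],
        "inv-nodes": [-0.061735, -0.035753],
        "irradiat": [0.015443, -0.02723],
        "menopause": [0.041715, -0.013812],
        "node-caps": [0.235538, -0.019649],
        "tumor-size": [0.019251, -0.073216],
    }
)

# Net push of all features, per row, relative to the average output.
row_totals = contrib.sum(axis=1)
print(row_totals)
```

A positive total means the features together push the row's score above the average; a negative total pushes it below.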


#### Summarize explainability of the predictions
- You can use the summarize method to summarize the local explainability.
- This summary can be configured with the modify_mask method so that the explainability meets your operational needs.

In [25]:
predictor_load.modify_mask(max_contrib=3)

In [26]:
explanation = predictor_load.summarize()

In [27]:
explanation.head()

| | Class | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | node-caps | 1 | 0.235538 | deg-malig | 3 | 0.178791 | inv-nodes | 0 | -0.0617353 |
| 1 | 0 | tumor-size | 1 | -0.0732165 | deg-malig | 1 | -0.0714341 | inv-nodes | 0 | -0.0357527 |
| 2 | 1 | tumor-size | 3 | 0.142742 | breast-quad | 0 | 0.0394096 | deg-malig | 2 | 0.0384705 |
| 3 | 0 | inv-nodes | 0 | -0.116609 | breast-quad | 0 | -0.0529921 | deg-malig | 3 | 0.0431342 |
| 4 | 1 | breast-quad | 4 | 0.115992 | deg-malig | 2 | -0.114426 | node-caps | 1 | 0.0824953 |
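
The summary above is consistent with ranking each row's contributions by absolute value and keeping the top `max_contrib` entries. The sketch below reproduces that ranking for the first row with plain pandas. It illustrates the idea only and is not Shapash's internal implementation.

```python
import pandas as pd

# Contributions of the first row, copied from the detailed table earlier.
row0 = pd.Series({
    "age": 0.034791, "breast": 0.030547, "breast-quad": -0.031297,
    "deg-malig": 0.178791, "inv-nodes": -0.061735, "irradiat": 0.015443,
    "menopause": 0.041715, "node-caps": 0.235538, "tumor-size": 0.019251,
})

# Rank by magnitude, then read back the signed values in that order.
order = row0.abs().nlargest(3).index
top3 = row0[order]
print(top3)
```

Ranking by magnitude is what lets a negative contribution such as `inv-nodes` appear among the top features.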
