# Explainable AI - Shapash

This notebook follows the basic Shapash tutorial to build a web app that explains models trained on the California house price dataset.

**Shapash tutorial:**
- https://shapash.readthedocs.io/en/latest/tutorials/tutorial01-Shapash-Overview-Launch-WebApp.html

**SHAP tutorial (California house prices):**
- https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html

#### Important: for this package to work, the datasets must be a pd.DataFrame or a pd.Series
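A minimal sketch of meeting that requirement when the data starts out as NumPy arrays: wrap the features in a DataFrame with named columns and the target in a Series that reuses the same index, so rows stay aligned. The two columns and three rows are taken from the sample output further down and are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative raw arrays: 3 rows, 2 features (not the full dataset)
X_raw = np.array([[4.1518, 22.0],
                  [5.7796, 32.0],
                  [4.3487, 29.0]])
y_raw = np.array([1.369, 2.413, 2.007])

# Named columns give the explainer a label for each feature
X_df = pd.DataFrame(X_raw, columns=["MedInc", "HouseAge"])
# Reusing the DataFrame index keeps features and target row-aligned
y_s = pd.Series(y_raw, index=X_df.index, name="target")

print(type(X_df).__name__, type(y_s).__name__)
```

The notebook does the same thing with `pd.Series(data_y, index = data_X.index)` right after loading the data.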

## RUN

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
import sklearn

import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.express as px

# models
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression # lr
from sklearn.linear_model import Ridge # ridge
from sklearn.linear_model import Lasso # lasso
from sklearn.tree import DecisionTreeRegressor # tree
from sklearn.ensemble import GradientBoostingRegressor # gb
from sklearn.ensemble import RandomForestRegressor # rf
from xgboost import XGBRegressor # xgb
from sklearn.neural_network import MLPRegressor # mlp

# shap
import shap

# shapash
from shapash import SmartExplainer

### 0. Global params

### 1. Load data

In [2]:
# a classic housing price dataset
data_X, data_y = shap.datasets.california(n_points=1000)
data_y = pd.Series(data_y, index = data_X.index)

In [3]:
data_X.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 14740 | 4.1518 | 22.0 | 5.663073 | 1.075472 | 1551.0 | 4.180593 | 32.58 | -117.05 |
| 10101 | 5.7796 | 32.0 | 6.107226 | 0.927739 | 1296.0 | 3.020979 | 33.92 | -117.97 |
| 20566 | 4.3487 | 29.0 | 5.930712 | 1.026217 | 1554.0 | 2.910112 | 38.65 | -121.84 |
| 2670 | 2.4511 | 37.0 | 4.992958 | 1.316901 | 390.0 | 2.746479 | 33.2 | -115.6 |
| 15709 | 5.0049 | 25.0 | 4.319261 | 1.039578 | 649.0 | 1.712401 | 37.79 | -122.43 |


In [4]:
data_y.head()

14740    1.369
10101    2.413
20566    2.007
2670     0.725
15709    4.600
dtype: float64

In [5]:
# dictionary mapping each feature name to a short description
dit_features_definition = {
    "MedInc": "median income in block group",
    "HouseAge": "median house age in block group",
    "AveRooms": "average number of rooms per household",
    "AveBedrms": "average number of bedrooms per household",
    "Population": "block group population",
    "AveOccup": "average number of household members",
    "Latitude": "block group latitude",
    "Longitude": "block group longitude"
}
dit_features_definition

{'MedInc': 'median income in block group',
 'HouseAge': 'median house age in block group',
 'AveRooms': 'average number of rooms per household',
 'AveBedrms': 'average number of bedrooms per household',
 'Population': 'block group population',
 'AveOccup': 'average number of household members',
 'Latitude': 'block group latitude',
 'Longitude': 'block group longitude'}
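If this dictionary is later passed as `features_dict`, its keys have to match the column names exactly, and a typo would silently leave a feature unlabelled. A small check, sketched with a shortened dictionary and a hypothetical frame that has one unlabelled column (in the notebook the frame would be `X_test` and the dictionary `dit_features_definition`):

```python
import pandas as pd

# Shortened stand-in for dit_features_definition
features_dict = {
    "MedInc": "median income in block group",
    "HouseAge": "median house age in block group",
}

# Hypothetical feature frame with one column that has no label
X = pd.DataFrame(columns=["MedInc", "HouseAge", "AveRooms"])

# Columns with no description, and descriptions with no matching column
missing = [c for c in X.columns if c not in features_dict]
unused = [k for k in features_dict if k not in X.columns]

print("columns without a label:", missing)
print("labels without a column:", unused)
```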

In [6]:
# split train test
X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, train_size = 0.75, random_state = 42)

In [7]:
X_train.shape

(750, 8)

In [8]:
y_train.shape

(750,)
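With 1000 points and `train_size = 0.75`, the split yields 750 training rows and 250 test rows; scikit-learn rounds a fractional `train_size` down and gives the remaining rows to the test set. A minimal sketch of what such a random split does, using a NumPy permutation over a stand-in index; it does not reproduce the exact rows that `train_test_split` picks with `random_state = 42`.

```python
import numpy as np
import pandas as pd

n_points, train_size = 1000, 0.75
# Stand-in for data_X.index (the real index holds the sampled row labels)
idx = pd.RangeIndex(n_points)

rng = np.random.default_rng(42)
perm = rng.permutation(n_points)                 # shuffled row positions
n_train = int(np.floor(train_size * n_points))   # 0.75 * 1000 -> 750
train_idx = idx[perm[:n_train]]                  # first 750 shuffled rows
test_idx = idx[perm[n_train:]]                   # remaining 250 rows

print(len(train_idx), len(test_idx))
```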

### 2. Train models

In [9]:
# train lr
lr = LinearRegression()
lr.fit(X_train, y_train)

In [10]:
# explore the linear regression coefficients
print("Model coefficients:\n")
for i in range(X_train.shape[1]):
    print(X_train.columns[i], "=", lr.coef_[i].round(5))

Model coefficients:

MedInc = 0.42384
HouseAge = 0.00711
AveRooms = -0.15719
AveBedrms = 0.86798
Population = 3e-05
AveOccup = -0.24511
Latitude = -0.4703
Longitude = -0.46562
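These raw coefficients are not a measure of feature importance: `Population = 3e-05` looks negligible next to `AveBedrms = 0.86798`, but Population takes values in the hundreds or thousands (1551.0 in the first row above) while AveBedrms stays close to 1. Shapley values express each feature's contribution in the units of the prediction instead. For a linear model with independent features, the SHAP tutorial linked above uses the closed form `phi_i = coef_i * (x_i - mean(x_i))`. A sketch with made-up coefficients and synthetic background data (not the fitted `lr`), checking that the contributions add up from the average prediction to the model output:

```python
import numpy as np

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(200, 3))        # synthetic background data (assumption)
coef = np.array([0.42, -0.16, 0.87])    # made-up coefficients (assumption)
intercept = 1.5

def predict(X):
    """Linear model: intercept + X @ coef, one prediction per row."""
    return intercept + X @ coef

x = np.array([1.0, 2.0, -0.5])          # the row to explain

# Closed-form SHAP values for a linear model with independent features
phi = coef * (x - X_bg.mean(axis=0))

# Expected model output over the background data
base_value = predict(X_bg).mean()

# Efficiency: base value plus all contributions equals the prediction for x
print(phi)
print(base_value + phi.sum(), predict(x[None, :])[0])
```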


In [11]:
# train rf simple
param_n_trees = 3
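rf_simple = RandomForestRegressor(n_estimators = param_n_trees,
                                  random_state = 42,
                                  min_samples_split = 0.2, # float: fraction of training rows
                                  min_samples_leaf = 0.1)  # float: fraction of training rows
                                  # max_depth = 2          # alternative way to keep trees small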

rf_simple.fit(X_train, y_train)
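Because `min_samples_split` and `min_samples_leaf` are floats, scikit-learn reads them as fractions of the training rows and rounds them up to counts, and a node is only split if it can produce two leaves of the minimum size. A back-of-the-envelope sketch for the 750 training rows; it ignores bootstrap resampling, so the leaf count is only a loose upper bound per tree.

```python
import math

n_samples = 750           # rows in X_train
min_samples_split = 0.2
min_samples_leaf = 0.1

# Fractions are converted to sample counts by rounding up
leaf = math.ceil(min_samples_leaf * n_samples)    # 75
split = math.ceil(min_samples_split * n_samples)  # 150
# A split must leave at least `leaf` samples on each side
split = max(split, 2 * leaf)

# Every leaf holds at least `leaf` samples, bounding the number of leaves
max_leaves = n_samples // leaf

print(leaf, split, max_leaves)
```

So each of the 3 trees in `rf_simple` ends up with at most about 10 leaves, which keeps the "simple" forest easy to inspect.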

### 3. Shapash

#### 3.1 Declare and Compile SmartExplainer

In [12]:
from shapash import SmartExplainer

In [13]:
xpl = SmartExplainer(
    model = lr,
    #features_dict = dit_features_definition # optional: maps feature names to display labels
)

In [14]:
xpl.compile(x = X_test,
            y_target = y_test # optional: enables the True Values vs Predicted Values plot
           )

INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x000001F61A1742B0>
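The log shows that shap picked an exact explainer. Conceptually, exact Shapley values average each feature's marginal contribution over every coalition of the other features, which is 2^8 = 256 subsets for the 8 features here. The brute-force sketch below illustrates that definition with a single baseline point standing in for the background data; it is not shap's `ExactExplainer` implementation, which works with a background dataset and an optimized evaluation order. The helper name `exact_shapley` is hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(f, x, baseline):
    """Hypothetical helper: exact Shapley values of f at x.

    Features outside a coalition are set to their baseline value.
    """
    m = len(x)
    phi = np.zeros(m)

    def value(subset):
        z = baseline.copy()
        idx = list(subset)
        z[idx] = x[idx]          # coalition members take their values from x
        return f(z)

    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m!
            w = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                phi[i] += w * (value(s + (i,)) - value(s))
    return phi

# Usage: for a linear model the result matches coef * (x - baseline)
coef = np.array([0.42, -0.16, 0.87])
f = lambda z: 1.5 + z @ coef
x = np.array([1.0, 2.0, -0.5])
baseline = np.zeros(3)
print(exact_shapley(f, x, baseline))
```

The loop costs m * 2^(m-1) pairs of model calls, which is why exact explanations are only practical for a small number of features like the 8 in this dataset.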


#### 3.2 Start web app

In [15]:
app = xpl.run_app(title_story = 'California House Prices', port=8020)
#app = xpl.run_app(title_story = 'California House Prices', port=8030)

INFO:root:Your Shapash application run on http://101STLJORTEGA:8020/
INFO:root:Use the method .kill() to down your app.


#### 3.3 End web app

Note: `app.kill()` did not stop the app here; restarting the Jupyter kernel does.

In [16]:
# app.kill()