# Explainable AI - Shapash

This notebook replicates the basic Shapash tutorial.

Tutorial:
https://shapash.readthedocs.io/en/latest/tutorials/tutorial01-Shapash-Overview-Launch-WebApp.html

In [18]:
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor

# models
from sklearn.linear_model import LinearRegression
from lightgbm import LGBMRegressor

# shapash
from shapash import SmartExplainer

### 1. Load data

In [19]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

In [25]:
house_df.head()

Id,MSSubClass,MSZoning,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,2-Story 1946 & Newer,Residential Low Density,8450,Paved,Regular,Near Flat/Level,"All public Utilities (E,G,W,& S)",Inside lot,Gentle slope,College Creek,...,0,0,0,0,0,2,2008,Warranty Deed - Conventional,Normal Sale,208500
2,1-Story 1946 & Newer All Styles,Residential Low Density,9600,Paved,Regular,Near Flat/Level,"All public Utilities (E,G,W,& S)",Frontage on 2 sides of property,Gentle slope,Veenker,...,0,0,0,0,0,5,2007,Warranty Deed - Conventional,Normal Sale,181500
3,2-Story 1946 & Newer,Residential Low Density,11250,Paved,Slightly irregular,Near Flat/Level,"All public Utilities (E,G,W,& S)",Inside lot,Gentle slope,College Creek,...,0,0,0,0,0,9,2008,Warranty Deed - Conventional,Normal Sale,223500
4,2-Story 1945 & Older,Residential Low Density,9550,Paved,Slightly irregular,Near Flat/Level,"All public Utilities (E,G,W,& S)",Corner lot,Gentle slope,Crawford,...,272,0,0,0,0,2,2006,Warranty Deed - Conventional,Abnormal Sale,140000
5,2-Story 1946 & Newer,Residential Low Density,14260,Paved,Slightly irregular,Near Flat/Level,"All public Utilities (E,G,W,& S)",Frontage on 2 sides of property,Gentle slope,Northridge,...,0,0,0,0,0,12,2008,Warranty Deed - Conventional,Normal Sale,250000


In [27]:
house_dict

{'1stFlrSF': 'First Floor square feet',
 '2ndFlrSF': 'Second floor square feet',
 '3SsnPorch': 'Three season porch area in square feet',
 'BedroomAbvGr': 'Bedrooms above grade',
 'BldgType': 'Type of dwelling',
 'BsmtCond': 'General condition of the basement',
 'BsmtExposure': 'Refers to walkout or garden level walls',
 'BsmtFinSF1': 'Type 1 finished square feet',
 'BsmtFinSF2': 'Type 2 finished square feet',
 'BsmtFinType1': 'Rating of basement finished area',
 'BsmtFinType2': 'Rating of basement finished area (if present)',
 'BsmtFullBath': 'Basement full bathrooms',
 'BsmtHalfBath': 'Basement half bathrooms',
 'BsmtQual': 'Height of the basement',
 'BsmtUnfSF': 'Unfinished square feet of basement area',
 'CentralAir': 'Central air conditioning',
 'Condition1': 'Proximity to various conditions',
 'Condition2': 'Proximity to other various conditions',
 'Electrical': 'Electrical system',
 'EnclosedPorch': 'Enclosed porch area in square feet',
 'ExterCond': "Exterior materials' conditio

In [22]:
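# Target: the sale price, kept as a one-column DataFrame
y_df = house_df['SalePrice'].to_frame()
# Features: every other column (Index.difference returns them sorted by name)
X_df = house_df[house_df.columns.difference(['SalePrice'])]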

In [24]:
house_df.shape

(1460, 73)

### 2. Encoding Categorical Features

In [28]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df = encoder.transform(X_df)
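
To make the encoding step concrete, here is a minimal pure-Python sketch of what ordinal encoding does: each distinct category is replaced by an integer code, and the mapping can be reversed later. It is an illustration only, not the `category_encoders` implementation. The helper names (`fit_ordinal`, `transform`, `inverse_transform`) and the `-1` code for unseen categories are assumptions made for this example.

```python
# Illustrative sketch of ordinal encoding; not the category_encoders implementation.

def fit_ordinal(values):
    """Map each distinct category to an integer code, starting at 1,
    in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform(values, mapping, unknown=-1):
    """Replace categories with their codes; unseen categories get `unknown`
    (a hypothetical choice for this sketch)."""
    return [mapping.get(value, unknown) for value in values]


def inverse_transform(codes, mapping):
    """Recover the original categories from their integer codes."""
    reverse = {code: value for value, code in mapping.items()}
    return [reverse.get(code) for code in codes]


# LotShape values of the five rows displayed by house_df.head()
lot_shape = ["Regular", "Regular", "Slightly irregular",
             "Slightly irregular", "Slightly irregular"]

mapping = fit_ordinal(lot_shape)
codes = transform(lot_shape, mapping)

print(mapping)                            # {'Regular': 1, 'Slightly irregular': 2}
print(codes)                              # [1, 1, 2, 2, 2]
print(inverse_transform(codes, mapping))  # the original labels again
```

The reverse mapping is what lets an explainer show readable category labels instead of integer codes, which is the reason the fitted encoder is passed to `SmartExplainer` later in the notebook.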

In [29]:
X_df.head()

Id,1stFlrSF,2ndFlrSF,3SsnPorch,BedroomAbvGr,BldgType,BsmtCond,BsmtExposure,BsmtFinSF1,BsmtFinSF2,BsmtFinType1,...,SaleType,ScreenPorch,Street,TotRmsAbvGrd,TotalBsmtSF,Utilities,WoodDeckSF,YearBuilt,YearRemodAdd,YrSold
1,856,854,0,3,1,1,1,706,0,1,...,1,0,1,8,856,1,0,2003,2003,2008
2,1262,0,0,3,1,1,2,978,0,2,...,1,0,1,6,1262,1,298,1976,1976,2007
3,920,866,0,3,1,1,3,486,0,1,...,1,0,1,6,920,1,0,2001,2002,2008
4,961,756,0,3,1,2,1,216,0,2,...,1,0,1,7,756,1,0,1915,1970,2006
5,1145,1053,0,4,1,1,4,655,0,1,...,1,0,1,9,1145,1,192,2000,2000,2008


### 3. Train / Test Split

In [30]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
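
As a quick sanity check on the split sizes, the arithmetic can be worked out by hand. The sketch below assumes the 1460 rows reported by `house_df.shape` and rounds the training share down, which is consistent with the 1095 training points LightGBM reports in its log. The shuffle uses Python's `random` module purely for illustration and does not reproduce the row assignment made by scikit-learn with `random_state=1`.

```python
import math
import random

n_rows = 1460        # from house_df.shape
train_size = 0.75    # same fraction as in train_test_split above

n_train = math.floor(n_rows * train_size)
n_test = n_rows - n_train
print(n_train, n_test)  # 1095 365

# Illustrative shuffle-based split of row positions (not scikit-learn's shuffle)
rng = random.Random(1)
positions = list(range(n_rows))
rng.shuffle(positions)
train_idx = positions[:n_train]
test_idx = positions[n_train:]
```

Every row lands in exactly one of the two sets, so the explanations compiled later on `Xtest` concern houses the model never saw during training.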

### 4. Model Fitting

In [31]:
regressor = LGBMRegressor(n_estimators=100).fit(Xtrain,ytrain)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000361 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2986
[LightGBM] [Info] Number of data points in the train set: 1095, number of used features: 66
[LightGBM] [Info] Start training from score 182319.757078


# Understanding my model with shapash

### 5. Declare and Compile SmartExplainer

In [32]:
from shapash import SmartExplainer

In [33]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,   # Optional: lets the compile step use the encoder's inverse_transform method
    features_dict=house_dict # Optional: maps feature names to readable labels
)

In [34]:
xpl.compile(x=Xtest,
            y_target=ytest # Optional: enables the true values vs. predicted values display
           )

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x0000021F247D60E0>
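
The INFO line shows that Shapash relies on SHAP's `TreeExplainer` for this model. SHAP values are additive: for each row, the prediction equals a base value plus the sum of the per-feature contributions. In notation introduced here for illustration (it does not appear in the tutorial), with $\phi_0$ the explainer's base value, $\phi_j(x)$ the contribution of feature $j$ for row $x$, and $M$ the number of features:

$$
\hat{y}(x) = \phi_0 + \sum_{j=1}^{M} \phi_j(x)
$$

Read this way, each contribution is a signed amount, in the units of `SalePrice`, that pushes the prediction for a house above or below the base value.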


### 6. Start WebApp

In [35]:
app = xpl.run_app(title_story='House Prices', port=8020)

INFO:root:Your Shapash application run on http://101STLJORTEGA:8020/
INFO:root:Use the method .kill() to down your app.
Exception in thread Thread-9 (<lambda>):
Traceback (most recent call last):
  File "D:\Anaconda\envs\data-science-python-3-10-explanaible-ai\lib\site-packages\urllib3\connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "D:\Anaconda\envs\data-science-python-3-10-explanaible-ai\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    raise err
  File "D:\Anaconda\envs\data-science-python-3-10-explanaible-ai\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [WinError 10049] The requested address is not valid in its context

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Anaconda\envs\data-science-python-3-10-explanaible-ai\lib\site-packages\urllib3\connectionpool.py", line 791, in urlo

### 7. Stop the WebApp after using it

Note: `app.kill()` did not work in this run, but restarting the kernel in the Jupyter notebook stops the app.

In [38]:
# app.kill()
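
Since `app.kill()` did not work here, one way to confirm that the app is really gone after restarting the kernel is to check whether anything still accepts connections on the port. The sketch below uses only the standard library. The helper name `port_is_open` and the host `127.0.0.1` are assumptions for this example; port 8020 is the one passed to `run_app` above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or address not reachable
        return False


# True while the web app is still listening locally, False once it is stopped
print(port_is_open("127.0.0.1", 8020))
```

If this still prints `True` after a kernel restart, another process is holding the port, and relaunching the app on a different `port` value may be simpler than tracking that process down.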

### 8. Export local explanations to a DataFrame

In [39]:
summary_df = xpl.to_pandas(
    max_contrib=3,   # Maximum number of features to show per row
    threshold=5000,  # Hide contributions whose absolute value is below this
)

In [40]:
summary_df.head()

Unnamed: 0,pred,feature_1,value_1,contribution_1,feature_2,value_2,contribution_2,feature_3,value_3,contribution_3
259,210110.151269,Ground living area square feet,1792,13901.94265,Overall material and finish of the house,7,13388.78542,Total square feet of basement area,963.0,-5785.099112
268,177235.867901,Ground living area square feet,2192,27473.618384,Overall material and finish of the house,5,-26452.634884,Overall condition of the house,8.0,6804.442167
289,112646.087826,Overall material and finish of the house,5,-25938.289451,Ground living area square feet,900,-16286.662175,Total square feet of basement area,882.0,-5586.17177
650,74998.294879,Overall material and finish of the house,4,-34883.385439,Ground living area square feet,630,-21321.74224,Total square feet of basement area,630.0,-12563.167906
1234,139796.103166,Overall material and finish of the house,5,-26786.433337,Ground living area square feet,1188,-11315.446699,,,
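
The last row (index 1234) lists only two features even though `max_contrib=3`, which is consistent with the `threshold=5000` filter hiding its remaining contributions. The sketch below mimics that selection in plain Python: drop contributions whose absolute value is under the threshold, sort by magnitude, keep the top few. It is an assumption about the behaviour, not Shapash's actual code. The function name `summarize` is hypothetical, and the third contribution value is invented for illustration.

```python
def summarize(contributions, max_contrib=3, threshold=5000):
    """Keep the largest contributions by absolute value.

    contributions: dict mapping a feature label to its signed contribution.
    Returns a list of (label, contribution) pairs, largest magnitude first.
    """
    kept = [(name, value) for name, value in contributions.items()
            if abs(value) >= threshold]
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept[:max_contrib]


# First two values come from row 1234 above; the third is a made-up
# contribution below the threshold, used only to show the filtering.
row_1234 = {
    "Ground living area square feet": -11315.446699,
    "Total square feet of basement area": -3200.0,  # hypothetical value
    "Overall material and finish of the house": -26786.433337,
}

for name, value in summarize(row_1234):
    print(f"{name}: {value:.2f}")
```

Raising `threshold` shortens each row of the summary to the few features that matter most, while lowering it (or raising `max_contrib`) exposes smaller effects.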
