# Imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# import * is safe: a restrictive __all__ has been defined in the modules
from explainerdashboard.explainers import *
from explainerdashboard.dashboards import *
from explainerdashboard.datasets import *

In [None]:
feature_descriptions = {
    "Sex": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "No_of_relatives_on_board": "number of siblings, spouses, parents plus children on board",
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

# ClassifierExplainer examples

## Load classifier data:
- predicting the probability that a person on the Titanic survived

In [None]:
X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()

In [None]:
X_train.head()

We'll use the passenger names later as `idxs` for the Explainer, so that they are displayed on the contributions tab of the dashboard and can be passed as an index to various explainer methods:

In [None]:
train_names[:5]
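The idea of addressing a row either by position or by an identifier can be sketched in plain Python. This is only an illustration: `get_row_position` is a hypothetical helper, not part of `explainerdashboard`, and the short list of names stands in for `train_names`.

```python
# Hypothetical helper, not part of explainerdashboard: resolve a row either
# by integer position or by its identifier (here, a passenger name).
def get_row_position(index, idxs):
    """Return the integer row position for a position or an identifier."""
    if isinstance(index, int):
        if not 0 <= index < len(idxs):
            raise IndexError(f"row {index} is out of range")
        return index
    # list.index raises ValueError when the identifier is unknown
    return idxs.index(index)


# Illustrative stand-in for the list of passenger names
names = [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Allen, Mr. William Henry",
]

print(get_row_position(0, names))                         # by position
print(get_row_position("Heikkinen, Miss. Laina", names))  # by name
```

Both calls end up as an integer row position, which is why a name and a row number can be used interchangeably once identifiers are known.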

## One line example:
- click on the link (http://localhost:8050) to go to the dashboard
- Interrupt the kernel to stop the dashboard

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
ExplainerDashboard(ClassifierExplainer(RandomForestClassifier().fit(X_train, y_train), X_test, y_test)).run()

## Multi line example
- create an explainer object out of the model and the X and y that you wish to display.
- the explainer object calculates shap values, permutation importances, partial dependence plots, etc., and provides all kinds of plots that will be used by the ExplainerDashboard object

In [None]:
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test)

Now create an ExplainerDashboard instance out of the explainer instance:
- depending on which tabs are included, all necessary calculations (shap values, importances, etc) get done up front

In [None]:
db = ExplainerDashboard(explainer)

And run the dashboard on the default port (=8050):

In [None]:
db.run()

### Run on specific port

Or on another port, e.g. port 8051:

In [None]:
db.run(8051)

### Switch on/off specific tabs (+add title)

By default all tabs that the model supports are displayed, with these exceptions:
- the shap_interaction tab is disabled when the model doesn't support shap interaction values
    - e.g. linear models, or when calculating shap values in probability space for gradient boosting models
- the decision_trees tab is disabled unless the explainer is a RandomForestClassifierExplainer or RandomForestRegressionExplainer

**Depending on your model and data, calculating shap interaction values may be slow, so in that case switch off the interactions tab manually!**

You can also manually switch tabs on or off with booleans, as shown below:

In [None]:
ExplainerDashboard(explainer, title="Titanic Explainer",
                        model_summary=True,  
                        contributions=True,
                        shap_dependence=False,
                        shap_interaction=False,
                        decision_trees=False).run()

### cats, idxs, descriptions, labels

You can make the dashboard a bit more user friendly by passing in some additional information about the variables in the model:

- `cats`: If you have onehotencoded some variables, you get a lot of shap values for binary features that are either 0 or 1, which are hard to interpret as a whole. 
    - However, given that shap values are additive, we can sum them up and give a single shap value for the onehotencoded variables! 
    - Furthermore, we can use different types of default plots for categorical variables than continuous ones. 
    - If you pass a list of the variables that have been encoded as `varname_category`, `explainerdashboard` groups the encoded columns per variable and shows appropriate plots
    - In our sample dataset this would be:
        - `Sex`: `Sex_female`, `Sex_male` 
        - `Deck`: `Deck_A`, `Deck_B`, etc
        - `Embarked`: `Embarked_Southampton`, `Embarked_Cherbourg`, etc
- `idxs`: You may have specific identifiers (names, customer id's, etc) for each row in your dataset.
    - If you pass these to the Explainer object, you can index using either the numerical index, e.g. `explainer.contrib_df(0)` for the first row, or the identifier, e.g. `explainer.contrib_df("Braund, Mr. Owen Harris")`
    - The proper name or id will also be displayed and searchable on the contributions tab
- `descriptions`: a dictionary of descriptions for each variable.
    - In order to be explanatory, you often have to explain the meaning of the features themselves (especially if the naming is not obvious)
    - Passing the dict along to `descriptions` will show hover-over tooltips for the various variables in the dashboard
- `labels`: The outcome variables for a classification problem are assumed to be encoded 0, 1 (, 2, 3, ...)
    - This is not very human readable, so you can pass a list of human readable labels
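The grouping behind `cats` relies only on the additivity of shap values, which can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `group_onehot_shap` is a hypothetical helper and the shap values are made up.

```python
# Hypothetical helper, not part of explainerdashboard: sum the shap values of
# one-hot encoded columns named `<cat>_<category>` into one value per `<cat>`.
def group_onehot_shap(shap_row, cats):
    grouped = {}
    for col, value in shap_row.items():
        # Find the categorical variable this column belongs to, if any
        cat = next((c for c in cats if col.startswith(c + "_")), None)
        key = cat if cat is not None else col
        grouped[key] = grouped.get(key, 0.0) + value
    return grouped


# Made-up shap values for a single passenger
shap_row = {
    "Sex_female": 0.25,
    "Sex_male": 0.125,
    "Deck_A": 0.0,
    "Deck_B": -0.5,
    "Fare": 0.0625,
}

grouped = group_onehot_shap(shap_row, cats=["Sex", "Deck"])
print(grouped)  # {'Sex': 0.375, 'Deck': -0.5, 'Fare': 0.0625}

# Grouping does not change the total contribution
assert sum(grouped.values()) == sum(shap_row.values())
```

Because the total is unchanged, the grouped values still add up to the same prediction, while a reader sees one contribution per original variable instead of one per encoded column.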

In [None]:
explainer = ClassifierExplainer(model, X_test, y_test,
                               cats=['Sex', 'Deck', 'Embarked'], # makes it easy to group onehotencoded vars
                               idxs=test_names, # index rows by passenger name
                               descriptions=feature_descriptions, # show feature descriptions in plots
                               labels=['Not survived', 'Survived']) # show nice labels
ExplainerDashboard(explainer).run()

You can also pass a title and explicitly switch off certain tabs:

In [None]:
db = ExplainerDashboard(explainer, title="Titanic Explainer",
                        model_summary=True,
                        contributions=False,
                        shap_dependence=True,
                        shap_interaction=False, # interaction values can be slow to calculate
                        decision_trees=False)
db.run(8052)

### X_background, model_output and shap
- `X_background`: 
    - Some models, such as LogisticRegression (as well as certain gradient boosting algorithms in probability space), need a background dataset to calculate shap values. This can be passed as `X_background`.
    - If you don't pass an `X_background`, the Explainer uses X instead but issues a warning.
- `model_output`: 
    - By default `model_output` for classifiers is set to "probability", as this is more intuitive to explain to stakeholders who are not data scientists
    - However, certain models (e.g. XGBClassifier, LGBMClassifier, CatBoostClassifier) need a background dataset `X_background` to calculate shap values in probability space, and are then not able to calculate shap interaction values.
    - Therefore you can also pass `model_output='logodds'`, in which case shap values get calculated as logodds
- `shap`:
    - By default `shap='guess'`, which means that the Explainer will try to guess based on the model what kind of shap explainer it needs: e.g. `shap.TreeExplainer(...)`, `shap.LinearExplainer(...)`, etc.
    - In case the guess fails or you'd like to override it, you can set it manually:
        - e.g. `shap='tree'`, `shap='linear'`, `shap='kernel'`, `shap='deep'` 
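The relation between the two `model_output` settings can be illustrated with a small calculation. It is a sketch with made-up numbers (the base value and the contributions below are assumptions, not output of any model): in log-odds space the contributions add up to the prediction, and a sigmoid turns that sum into a probability.

```python
import math


def sigmoid(logodds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-logodds))


# Made-up numbers for one passenger, all expressed in log-odds
base_logodds = -0.5
contributions = {"Sex": 1.5, "PassengerClass": -0.75, "Age": 0.25}

# In log-odds space, base value plus contributions gives the prediction
prediction_logodds = base_logodds + sum(contributions.values())
prediction_proba = sigmoid(prediction_logodds)

print(prediction_logodds)          # 0.5
print(round(prediction_proba, 3))  # 0.622
```

The sigmoid is nonlinear, so the individual log-odds contributions cannot simply be converted one by one into probability contributions. That is one way to see why probability-space shap values need a different calculation.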

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                    shap='linear', 
                                    X_background=X_train, 
                                    model_output='logodds')
ExplainerDashboard(explainer).run()

### XGBClassifier, LGBMClassifier, CatBoostClassifier
- The default for ClassifierExplainer is `model_output='probability'`, but for most gradient boosting classifier algorithms (e.g. xgboost, lightgbm, catboost) this means:
    - You have to pass an `X_background` to calculate the shap values against (defaults to using `X`)
    - You can't calculate shap interaction values
- The alternative is to pass `model_output='logodds'`:
    - Shap values can then be calculated based on the trees alone (so no background data is needed), and interaction values can be calculated as well.
    - A plus: it is faster

In [None]:
#from lightgbm.sklearn import LGBMClassifier
#model = LGBMClassifier()

#from catboost import CatBoostClassifier
#model = CatBoostClassifier(iterations=100, learning_rate=0.1)

from xgboost import XGBClassifier
model = XGBClassifier()

model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                    X_background=X_train,
                                    cats=['Sex', 'Deck', 'Embarked'],
                                    idxs=test_names, #names of passengers 
                                    labels=['Not survived', 'Survived'])

In [None]:
ExplainerDashboard(explainer).run()

In [None]:
explainer = ClassifierExplainer(model, X_test, y_test, 
                                    model_output='logodds', # <---------
                                    cats=['Sex', 'Deck', 'Embarked'],
                                    idxs=test_names, #names of passengers 
                                    labels=['Not survived', 'Survived'])
ExplainerDashboard(explainer).run()

# RegressionExplainer

## Load regression data:
- predicting the fare that a Titanic passenger paid for their ticket

In [None]:
X_train, y_train, X_test, y_test = titanic_fare()
train_names, test_names = titanic_names()

In [None]:
X_train.head()

In [None]:
y_train.head()

## Adding units of target
- In this case we are predicting the price of the fare, so we can add the units as `"$"`
    - this will then be displayed along the axis of certain plots

In [None]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = RegressionExplainer(model, X_test, y_test, 
                                cats=['Sex', 'Deck', 'Embarked'], 
                                idxs=test_names, 
                                units="$")

In [None]:
ExplainerDashboard(explainer).run()

## LGBMRegressor, LinearRegression, CatBoostRegressor, XGBRegressor

In [None]:
from lightgbm.sklearn import LGBMRegressor
model = LGBMRegressor()

# from sklearn.linear_model import LinearRegression
# model = LinearRegression()

# from catboost import CatBoostRegressor
# model = CatBoostRegressor(iterations=100, learning_rate=0.1, verbose=0)

# from xgboost import XGBRegressor
# model = XGBRegressor()

model.fit(X_train, y_train)
explainer = RegressionExplainer(model, X_test, y_test, 
                                cats=['Sex', 'Deck', 'Embarked'], 
                                idxs=test_names, 
                                units="$")

In [None]:
ExplainerDashboard(explainer).run()

# Multiclass Classifiers

In [None]:
X_train, y_train, X_test, y_test = titanic_class()
train_names, test_names = titanic_names()

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test, 
                                    cats=['Sex', 'Deck', 'Embarked'],
                                    idxs=test_names, 
                                    labels=['First Class', 'Second Class', 'Third Class'],
                                    pos_label='First Class')

In [None]:
ExplainerDashboard(explainer).run()

# RandomForestClassifierExplainer, RandomForestRegressionExplainer

These explainers can additionally visualize the individual decision trees of a random forest.

## RandomForestClassifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()

model = RandomForestClassifier(n_estimators=50, max_depth=5)

model.fit(X_train, y_train)

explainer = RandomForestClassifierExplainer(model, X_test, y_test, 
                                    cats=['Sex', 'Deck', 'Embarked'],
                                    idxs=test_names, 
                                    labels=['Not survived', 'Survived'])
ExplainerDashboard(explainer).run()

## RandomForestRegressor

In [None]:
from sklearn.ensemble import RandomForestRegressor
X_train, y_train, X_test, y_test = titanic_fare()
train_names, test_names = titanic_names()

model = RandomForestRegressor(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = RandomForestRegressionExplainer(model, X_test, y_test, 
                                cats=['Sex', 'Deck', 'Embarked'], 
                                idxs=test_names, 
                                units="$")

ExplainerDashboard(explainer).run()