updated README
oegedijk committed Jun 17, 2020
1 parent 351abf5 commit 2e02365
Showing 1 changed file with 10 additions and 8 deletions.
README.md: 18 changes (10 additions, 8 deletions)
@@ -4,7 +4,7 @@ by: Oege Dijk
This package makes it convenient to quickly explain the workings of a (scikit-learn compatible)
fitted machine learning model, either with interactive plots in e.g. Jupyter Notebook or
by deploying an interactive dashboard (based on Flask/Dash) that allows you to quickly explore
-the impact of different features on model predictions. Example deployed at: [titanicexplainer.herokuapp.com](http://titanicexplainer.herokuapp.com), detailed documentation at [explainerdashboard.readthedocs.io](explainerdashboard.readthedocs.io).
+the impact of different features on model predictions. Example deployed at: [titanicexplainer.herokuapp.com](http://titanicexplainer.herokuapp.com), detailed documentation at [explainerdashboard.readthedocs.io](https://explainerdashboard.readthedocs.io), an example notebook on how to launch the dashboard for different models [here](https://github.com/oegedijk/explainerdashboard/blob/master/dashboard_examples.ipynb), and an example notebook on how to interact with the explainer object [here](https://github.com/oegedijk/explainerdashboard/blob/master/explainer_examples.ipynb).

In a lot of organizations, especially governmental ones, but with the GDPR increasingly in the private sector as well, it is becoming more and more important to be able to explain the inner workings of your machine learning algorithms. Customers have, to some extent, a right to an explanation of why they were selected, and more and more internal and external regulators require it. With recent innovations in explainable AI (e.g. SHAP values) the old black box trope is no longer valid, but it can still take quite a bit of data wrangling and plot manipulation to get the explanations out of a model. This library aims to make this easy.

@@ -24,7 +24,7 @@ The library includes:
- Plus for classifiers: precision plots, confusion matrix, ROC AUC plot, PR AUC plot, etc
- For regression models: goodness-of-fit plots, residual plots, etc.

-The library is designed to be modular so that it should be easy to design your own interactive dashboards with plotly dash, with most of the work of calculating and formatting data, and rendering plots and tables handled by *explainerdashboard*, so that you can focus on the layout, logic of the interactions, and project specific textual explanations of the dashboard. (i.e. design it so that it will be interpretable for business users in your organization, not just data scientists)
+The library is designed to be modular, so it should be easy to design your own interactive dashboards with plotly dash. Most of the work of calculating and formatting data and rendering plots and tables is handled by `explainerdashboard`, so that you can focus on the layout, the logic of the interactions, and project-specific textual explanations of the dashboard (i.e. design it so that it will be interpretable for business users in your organization, not just data scientists).

Alternatively, there is a built-in standard dashboard with pre-built tabs that you can select individually. Fitting a model, building the explainer object, building the dashboard, and then running it is as simple as:

@@ -39,7 +39,7 @@ from explainerdashboard.datasets import *
X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
-# fit the mode:
+# fit the model:
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)
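
The diff hides the rest of this snippet. As a rough sketch of how it presumably continues: the `RandomForestClassifierExplainer` arguments are copied from the example further down, while the import paths and the `run()` call are assumptions about this version of the package rather than the exact code from the commit:

```python
# hedged sketch of the elided part of the snippet; import paths and run()
# arguments are assumptions about this version, not the commit's exact code
from explainerdashboard.explainers import RandomForestClassifierExplainer
from explainerdashboard.dashboards import ExplainerDashboard

# build the explainer from the fitted model and the test set
explainer = RandomForestClassifierExplainer(model, X_test, y_test,
                                            cats=['Sex', 'Deck', 'Embarked'],
                                            idxs=test_names,
                                            labels=['Not survived', 'Survived'])

# wrap the explainer in the standard dashboard and launch it
db = ExplainerDashboard(explainer)
db.run(port=8050)
```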
@@ -69,7 +69,10 @@ Documentation can be found at [explainerdashboard.readthedocs.io](https://explainerdashboard.readthedocs.io).

### Constructing an ExplainerBunch

-The package works by first constructing an ExplainerBunch object. You can then use this ExplainerBunch to manually call different plots, or to start the dashboard. You construct the ExplainerBunch instance from your fitted `model`, a feature matrix `X`, and optionally the corresponding target values `y`.
+The package works by first constructing an `Explainer` object. You can then use
+this `Explainer` to manually call different plots, or pass it on to an `ExplainerDashboard`
+object. You construct the `Explainer` instance from your fitted `model`, a feature matrix `X`,
+and optionally the corresponding target values `y`.

In addition you can pass:
- `metric`: permutation importances get calculated against a particular metric (for regression defaults to `r2_score` and for classification to `roc_auc_score`)
@@ -78,6 +81,7 @@ In addition you can pass:
- `model_output`: for classification models either 'logodds' or 'probability', defaults to 'probability'
- `cats`: a list of onehot-encoded variables (e.g. if encoded as 'Gender_Female', 'Gender_Male' you would pass `cats=['Gender']`). This allows you to group the onehot-encoded columns together in various plots with the argument `cats=True`.
- `idxs`: a list of identifiers for each row in your dataset. This makes it easier to look up predictions for specific ids.
+- `descriptions`: a dictionary of descriptions of the meaning of individual variables.
- `labels`: for classifier models a list of labels for the classes of your model.
- `na_fill`: value used to fill in missing values (defaults to -999)

Expand All @@ -90,8 +94,8 @@ train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)
-explainer = RandomForestClassifierExplainer(model, X_test, y_test, roc_auc_score,
-                                            shap='tree', X_background=None, model_output='probability',
+explainer = RandomForestClassifierExplainer(model, X_test, y_test,
+                                            X_background=None, model_output='probability',
                                             cats=['Sex', 'Deck', 'Embarked'],
                                             idxs=test_names, # names of passengers
                                             labels=['Not survived', 'Survived'])
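
Once constructed, the explainer can be used to call plots directly, as the paragraph above describes. A small sketch; the plot method names used here (`plot_importances`, `plot_confusion_matrix`) are assumptions about this version's API and may differ between releases:

```python
# assumption: the explainer exposes plotly figures via methods such as
# plot_importances() and plot_confusion_matrix(); exact names may vary by version
fig_imp = explainer.plot_importances()      # global feature importances
fig_cm = explainer.plot_confusion_matrix()  # classifier-specific performance plot
fig_imp.show()
fig_cm.show()
```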
@@ -131,8 +135,6 @@ If you wish to use e.g. gunicorn to deploy the dashboard you should add `server = db.app.server` to your code.
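
A minimal sketch of the usual Dash-behind-gunicorn pattern this refers to; the module name `dashboard.py` is hypothetical, and exposing the underlying Flask server as `db.app.server` is an assumption about how the dashboard object is structured:

```python
# dashboard.py -- hypothetical module name for a gunicorn deployment
# `explainer` is built (or loaded from disk) as in the examples above;
# reaching the Flask server via db.app.server is an assumption
db = ExplainerDashboard(explainer)
server = db.app.server  # gunicorn needs a module-level WSGI callable

# start it with e.g.:  gunicorn dashboard:server
```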

It may take some time to calculate all the properties of the ExplainerBunch (especially shap interaction values). However, all properties are calculated lazily: they are only computed when you call a plot or table that depends on them. To save startup time you can save the ExplainerBunch to disk with e.g. joblib and then load the ExplainerBunch with pre-calculated properties whenever you wish to start the dashboard.
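
A minimal sketch of that save/load pattern with joblib; the file name is just illustrative:

```python
import joblib

# after the properties you need have been computed (e.g. by calling the
# relevant plots once), persist the explainer to disk
joblib.dump(explainer, "explainer.joblib")

# later: load the explainer with its pre-calculated properties and start
# the dashboard without recomputing anything
explainer = joblib.load("explainer.joblib")
ExplainerDashboard(explainer).run()
```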

-See [dashboard_examples.ipynb](dashboard_examples.ipynb)


## Deployed example:
