
Commit

… into master
oegedijk committed Dec 17, 2023
2 parents 6d13398 + 263b370 commit 850c1d6
Showing 2 changed files with 9 additions and 7 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/explainerdashboard.yml
@@ -41,4 +41,4 @@ jobs:
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
pytest
pytest -k "not selenium"
14 changes: 8 additions & 6 deletions README.md
@@ -12,7 +12,7 @@ that explains the workings of a (scikit-learn compatible) machine
learning model. The dashboard provides interactive plots on model performance,
feature importances, feature contributions to individual predictions,
"what if" analysis,
partial dependence plots, SHAP (interaction) values, visualisation of individual
partial dependence plots, SHAP (interaction) values, visualization of individual
decision trees, etc.

You can also interactively explore components of the dashboard in a
@@ -218,23 +218,25 @@ There are a few tricks to make this less painful:
values can be very slow to calculate, and often are not needed for analysis.
For permutation importances you can set the `n_jobs` parameter to speed up
the calculation in parallel.
2. Storing the explainer. The calculated properties are only calculated once
2. Calculate approximate shap values. You can forward `approximate=True` to shap by
passing `shap_kwargs=dict(approximate=True)` to the explainer initialization.
4. Storing the explainer. The calculated properties are only calculated once
for each instance; however, each time you instantiate a new explainer
instance they have to be recalculated. You can store them with
`explainer.dump("explainer.joblib")` and load with e.g.
`ClassifierExplainer.from_file("explainer.joblib")`. All calculated properties
are stored along with the explainer.
3. Using a smaller (test) dataset, or using smaller decision trees.
5. Using a smaller (test) dataset, or using smaller decision trees.
TreeShap computational complexity is `O(TLD^2)`, where `T` is the
number of trees, `L` is the maximum number of leaves in any tree and
`D` the maximum depth of any tree. So reducing the number of leaves or the average
depth of the decision trees can really speed up SHAP calculations.
4. Pre-computing shap values. Perhaps you already have calculated the shap values
6. Pre-computing shap values. Perhaps you already have calculated the shap values
elsewhere, or you can calculate them on a giant cluster somewhere, or
your model supports [GPU generated shap values](https://github.com/rapidsai/gputreeshap).
You can simply add these pre-calculated shap values to the explainer
with the `explainer.set_shap_values()` and `explainer.set_shap_interaction_values()` methods.
5. Plotting only a random sample of points. When you have a lot of observations,
7. Plotting only a random sample of points. When you have a lot of observations,
simply rendering the plots may get slow as well. You can pass the `plot_sample`
parameter to render a random sample of observations (a different sample each time)
for the various scatter plots in the dashboard. E.g.:
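a minimal sketch of how several of the tips above might be combined, assuming a fitted
scikit-learn classifier `model` with hold-out data `X_test` and `y_test`; the
`precomputed_shap_values` names are purely illustrative, and the exact placement of
`n_jobs` and `plot_sample` may differ between explainerdashboard versions:

```python
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# Tips 1 and 2: approximate shap values and parallel permutation importances
explainer = ClassifierExplainer(
    model, X_test, y_test,
    shap_kwargs=dict(approximate=True),  # forwarded to shap for approximate shap values
    n_jobs=4,                            # calculate permutation importances in parallel
)

# Tip 4: store the explainer with all its calculated properties and reload it later
explainer.dump("explainer.joblib")
explainer = ClassifierExplainer.from_file("explainer.joblib")

# Tip 6: or plug in shap values that were pre-computed elsewhere (e.g. on a GPU cluster)
# explainer.set_shap_values(precomputed_shap_values)
# explainer.set_shap_interaction_values(precomputed_shap_interaction_values)

# Tip 7: only render a random sample of 1000 observations in the scatter plots
ExplainerDashboard(explainer, plot_sample=1000).run()
```

Reloading from `explainer.joblib` skips the expensive recalculation, which is
especially useful when restarting the dashboard.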
@@ -536,7 +538,7 @@ In order to reduce the memory footprint there are a number of things you can do:
2. Setting a lower precision. By default shap values are stored as `'float64'`,
but you can store them as `'float32'` instead and save half the space:
```ClassifierExplainer(model, X_test, y_test, precision='float32')```. You
can also set a lower precision on your `X_test` dataset yourself ofcourse.
can also set a lower precision on your `X_test` dataset yourself of course.
3. For a multi-class classifier, by default `ClassifierExplainer` calculates
shap values for all classes. If you're only interested in a single class
you can drop the other shap values: `explainer.keep_shap_pos_label_only(pos_label)`
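A short sketch of these two memory-saving options, again assuming `model`, `X_test` and
`y_test` as above; the `pos_label` value `1` is just an example of a class label:

```python
from explainerdashboard import ClassifierExplainer

# Store shap values as float32 instead of float64, halving their memory footprint
explainer = ClassifierExplainer(model, X_test, y_test, precision='float32')

# For a multi-class model, drop the shap values of every class except the one of interest
explainer.keep_shap_pos_label_only(1)  # assumes the positive class is labelled 1
```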
