
Commit

Update README.md
oegedijk committed Dec 17, 2023
1 parent 1a45500 commit 263b370
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -218,23 +218,25 @@ There are a few tricks to make this less painful:
values can be very slow to calculate, and often are not needed for analysis.
For permutation importances you can set the `n_jobs` parameter to speed up
the calculation in parallel.
- 2. Storing the explainer. The calculated properties are only calculated once
+ 2. Calculate approximate shap values. You can pass approximate=True as a shap parameter by
+ passing `shap_kwargs=dict(approximate=True)` to the explainer initialization.
+ 4. Storing the explainer. The calculated properties are only calculated once
for each instance, however each time when you instantiate a new explainer
instance they will have to be recalculated. You can store them with
`explainer.dump("explainer.joblib")` and load with e.g.
`ClassifierExplainer.from_file("explainer.joblib")`. All calculated properties
are stored along with the explainer.
- 3. Using a smaller (test) dataset, or using smaller decision trees.
+ 5. Using a smaller (test) dataset, or using smaller decision trees.
TreeShap computational complexity is `O(TLD^2)`, where `T` is the
number of trees, `L` is the maximum number of leaves in any tree and
`D` the maximal depth of any tree. So reducing the number of leaves or average
depth in the decision tree can really speed up SHAP calculations.
- 4. Pre-computing shap values. Perhaps you already have calculated the shap values
+ 6. Pre-computing shap values. Perhaps you already have calculated the shap values
somewhere, or you can calculate them off on a giant cluster somewhere, or
your model supports [GPU generated shap values](https://github.com/rapidsai/gputreeshap).
You can simply add these pre-calculated shap values to the explainer
with `explainer.set_shap_values()` and `explainer.set_shap_interaction_values()` methods.
- 5. Plotting only a random sample of points. When you have a lot of observations,
+ 7. Plotting only a random sample of points. When you have a lot of observations,
simply rendering the plots may get slow as well. You can pass the `plot_sample`
parameter to render a (different each time) random sample of observations
for the various scatter plots in the dashboard. E.g.:
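As a rough illustration of the `O(TLD^2)` bound quoted in the README text above, the sketch below compares the relative TreeShap cost of two tree ensembles. The function name `treeshap_cost` and the model sizes are hypothetical and chosen only for illustration. The bound is asymptotic and ignores constant factors, so the ratio is an estimate, not a measured speed-up.

```python
def treeshap_cost(n_trees: int, max_leaves: int, max_depth: int) -> int:
    """Relative TreeShap cost per explained row, from the O(T * L * D^2) bound.

    This is a back-of-the-envelope estimate (hypothetical helper, not part of
    any library); it ignores constant factors and implementation details.
    """
    return n_trees * max_leaves * max_depth ** 2


# Hypothetical model sizes, chosen only for illustration.
deep = treeshap_cost(n_trees=500, max_leaves=256, max_depth=8)
shallow = treeshap_cost(n_trees=500, max_leaves=16, max_depth=4)

print(f"deep model cost:    {deep:,}")
print(f"shallow model cost: {shallow:,}")
print(f"estimated speed-up: {deep / shallow:.0f}x")
```

Under these assumed sizes, halving the depth alone accounts for a factor of 4, and cutting the leaf count from 256 to 16 accounts for the remaining factor of 16.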
