Update README.md

oegedijk · Dec 17, 2023 · 263b370
1 parent 1a45500
Showing 1 changed file with 6 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -218,23 +218,25 @@ There are a few tricks to make this less painful:
     values can be very slow to calculate, and often are not needed for analysis.
     For permutation importances you can set the `n_jobs` parameter to speed up
     the calculation in parallel.
-2. Storing the explainer. The calculated properties are only calculated once
+2. Calculate approximate shap values. To forward `approximate=True` to shap, pass
+   `shap_kwargs=dict(approximate=True)` when initializing the explainer.
+4. Storing the explainer. The calculated properties are only calculated once
     for each instance, however each time when you instantiate a new explainer
     instance they will have to be recalculated. You can store them with
     `explainer.dump("explainer.joblib")` and load with e.g. 
     `ClassifierExplainer.from_file("explainer.joblib")`. All calculated properties
     are stored along with the explainer.
-3. Using a smaller (test) dataset, or using smaller decision trees. 
+5. Using a smaller (test) dataset, or using smaller decision trees. 
     TreeShap computational complexity is `O(TLD^2)`, where `T` is the 
     number of trees, `L` is the maximum number of leaves in any tree and 
     `D` the maximal depth of any tree. So reducing the number of leaves or average
     depth in the decision tree can really speed up SHAP calculations.
-4. Pre-computing shap values. Perhaps you already have calculated the shap values
+6. Pre-computing shap values. Perhaps you already have calculated the shap values
     somewhere, or you can calculate them off on a giant cluster somewhere, or
     your model supports [GPU generated shap values](https://github.com/rapidsai/gputreeshap). 
     You can simply add these pre-calculated shap values to the explainer 
     with `explainer.set_shap_values()` and `explainer.set_shap_interaction_values()` methods.
-5. Plotting only a random sample of points. When you have a lots of observations,
+7. Plotting only a random sample of points. When you have a lot of observations,
     simply rendering the plots may get slow as well. You can pass the `plot_sample`
     parameter to render a (different each time) random sample of observations
     for the various scatter plots in the dashboard. E.g.:
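
As a rough illustration of the `O(TLD^2)` point in the list above, the following minimal sketch compares the relative TreeShap cost of two model sizes. The function name `treeshap_cost` and the tree counts, leaf counts and depths are hypothetical, picked only for this example; the result is a scaling estimate that ignores constant factors, not a measured runtime.

```python
def treeshap_cost(n_trees: int, max_leaves: int, max_depth: int) -> int:
    """Relative TreeShap cost per explained row, following O(T * L * D^2)."""
    return n_trees * max_leaves * max_depth ** 2


# Hypothetical model sizes, chosen only for illustration.
baseline = treeshap_cost(n_trees=500, max_leaves=256, max_depth=8)
shallower = treeshap_cost(n_trees=500, max_leaves=16, max_depth=4)

# Ratio of the two estimates; constants hidden by the big-O cancel out.
speedup = baseline / shallower
print(f"estimated speedup: {speedup:.0f}x")  # estimated speedup: 64x
```

Because depth enters quadratically, halving `D` alone already cuts the estimate by a factor of four.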