Merge pull request #99 from oegedijk/dev

v0.3.3

oegedijk committed Mar 11, 2021
2 parents 4a4aa57 + d0f7a91 commit f5bd4a5
Showing 13 changed files with 797 additions and 356 deletions.
42 changes: 41 additions & 1 deletion RELEASE_NOTES.md
@@ -1,4 +1,44 @@
# Release Notes
## Version 0.3.3:

Highlights:
* Added support for cross-validated metrics
* Better support for pipelines by falling back to the kernel explainer
* Made explainers threadsafe by adding locks
* Option to remove outliers from shap dependence plots

### Breaking Changes
- parameter `permutation_cv` has been deprecated and replaced by parameter `cv`, which
  now also calculates cross-validated metrics in addition to cross-validated
  permutation importances (see the sketch below)
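
A minimal sketch of the new parameter (illustrative only; `model`, `X_train`
and `y_train` are placeholders):

```python
from explainerdashboard import ClassifierExplainer

# before (deprecated): ClassifierExplainer(model, X_train, y_train, permutation_cv=5)
# now: cv=5 cross-validates both metrics and permutation importances
explainer = ClassifierExplainer(model, X_train, y_train, cv=5)
```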

### New Features
- metrics now get calculated with cross-validation over `X` when you pass the
  `cv` parameter to the explainer; this is useful when for some reason you
  want to pass the training set to the explainer
- added winsorization to shap dependence and shap interaction plots
- if `shap='guess'` fails (unable to guess the right type of shap explainer),
  default to the model-agnostic `shap='kernel'`
- better support for sklearn `Pipelines`: if unable to extract the transformer+model,
  default to `shap.KernelExplainer` to explain the entire pipeline
- you can now remove outliers from shap dependence/interaction plots with
  `remove_outliers=True`: filters out all outliers beyond 1.5*IQR (see the
  usage sketch below)
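
A quick usage sketch of these features (illustrative; `pipeline`, `X_test`,
`y_test` and the `"Age"` column are placeholders, and the dependence plot
method name is an assumption, so check your installed version's API):

```python
from explainerdashboard import ClassifierExplainer

# a sklearn Pipeline that cannot be decomposed into transformer + model
# falls back to the model-agnostic shap.KernelExplainer:
explainer = ClassifierExplainer(pipeline, X_test, y_test, shap='guess')

# drop points beyond 1.5*IQR from the dependence plot:
fig = explainer.plot_shap_dependence("Age", remove_outliers=True)
```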

### Bug Fixes
- Sets proper `threading.Lock`s before making calls to the shap explainer to prevent race
  conditions when dashboards request shap values from multiple threads
  (shap is unfortunately not threadsafe); the pattern is sketched below
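
A minimal sketch of the locking pattern (not the actual library code; names
are illustrative):

```python
import threading

shap_lock = threading.Lock()  # one lock shared by all dashboard threads

def get_shap_values(shap_explainer, X):
    # shap explainers are not threadsafe, so serialize access to them:
    with shap_lock:
        return shap_explainer.shap_values(X)
```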

### Improvements
- single-row `KernelExplainer` shap calculations now run without a tqdm progress bar
- added cutoff TPR and FPR to the ROC AUC plot
- added cutoff precision and recall to the PR AUC plot
- added a loading spinner to the shap contributions table



## Version 0.3.2.2:
@@ -12,7 +12,7 @@
### Bug Fixes
- bug fix to make `shap.KernelExplainer` (used with explainer parameter `shap='kernel'`)
work with `RegressionExplainer`
- bug fix when no explicit `labels` are passed with index selector
- components only update if `explainer.index_exists()`: no more `IndexNotFoundErrors`
- fixed title bug for the regression index selector labeled 'Custom'
- `get_y()` now returns `.item()` when necessary
6 changes: 4 additions & 2 deletions TODO.md
@@ -7,7 +7,6 @@
## Plots:
- add SHAP decision plots:
https://towardsdatascience.com/introducing-shap-decision-plots-52ed3b4a1cba
- make plot background transparent?
- Only use ScatterGl above a certain cutoff
- separate standard shap plots for shap_interaction plots
@@ -24,6 +23,7 @@
### Regression plots:

## Explainers:
- Turn print statements into logging
- pass n_jobs to pdp_isolate
- add ExtraTrees and GradientBoostingClassifier to tree visualizers
- add plain language explanations
@@ -37,6 +37,7 @@


## Dashboard:
- Turn print statements into logging
- make poweredby right align
- more flexible instantiate_component:
- no explainer needed (if explainer component detected, pass otherwise ignore)
@@ -59,7 +60,7 @@


### Components
- add feature descriptions component
- add predictions list to whatif composite:
- https://github.com/oegedijk/explainerdashboard/issues/85
- add circular callbacks to cutoff - cutoff percentile
@@ -86,6 +87,7 @@
- Add this method? : https://arxiv.org/abs/2006.04750?

## Tests:
- add cv metrics tests
- add tests for InterpretML EBM (shap 0.37)
- write tests for explainerhub CLI add user
- test model_output='probability' and 'raw' or 'logodds' separately
20 changes: 11 additions & 9 deletions docs/source/explainers.rst
@@ -233,13 +233,15 @@ An example of using setting ``X_background`` and ``model_output`` with a
ExplainerDashboard(explainer).run()


cv
--

Normally metrics and permutation importances get calculated over a single fold
(assuming the data ``X`` is the test set). However if you pass the training set
to the explainer, you may wish to calculate the permutation importances and
metrics with cross-validation. In that case pass the number of folds to ``cv``.
Note that custom metrics do not work with cross validation for now.
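
For example (an illustrative sketch, assuming you passed the training set
to the explainer)::

    explainer = ClassifierExplainer(model, X_train, y_train, cv=5)
    explainer.metrics()  # metrics now cross-validated over X_train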


na_fill
-------
@@ -505,7 +507,7 @@ get_importances_df
.. automethod:: explainerdashboard.explainers.BaseExplainer.get_importances_df

get_contrib_df
^^^^^^^^^^^^^^

.. automethod:: explainerdashboard.explainers.BaseExplainer.get_contrib_df

@@ -614,12 +616,12 @@ with the following additional methods::


get_decisionpath_df
^^^^^^^^^^^^^^^^^^^

.. automethod:: explainerdashboard.explainers.RandomForestExplainer.get_decisionpath_df

get_decisionpath_summary_df
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automethod:: explainerdashboard.explainers.RandomForestExplainer.get_decisionpath_summary_df

11 changes: 7 additions & 4 deletions explainerdashboard/dashboard_components/classifier_components.py
@@ -357,9 +357,10 @@ def update_output_div(index, pos_label):
preds_df = self.explainer.prediction_result_df(index, round=self.round, logodds=True)
preds_df.probability = np.round(100*preds_df.probability.values, self.round).astype(str)
preds_df.probability = preds_df.probability + ' %'
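# the logodds column may be absent for some model outputs, so guard first: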
if 'logodds' in preds_df.columns:
    preds_df.logodds = np.round(preds_df.logodds.values, self.round).astype(str)

if self.explainer.model_output != 'logodds':
preds_df = preds_df[['label', 'probability']]

preds_table = dbc.Table.from_dataframe(preds_df,
@@ -379,7 +380,8 @@ def update_output_div(pos_label, *inputs):
preds_df = self.explainer.prediction_result_df(X_row=X_row, round=self.round, logodds=True)
preds_df.probability = np.round(100*preds_df.probability.values, self.round).astype(str)
preds_df.probability = preds_df.probability + ' %'
if 'logodds' in preds_df.columns:
    preds_df.logodds = np.round(preds_df.logodds.values, self.round).astype(str)

if self.explainer.model_output!='logodds':
preds_df = preds_df[['label', 'probability']]
@@ -527,7 +529,8 @@ def layout(self):
marks={0.01: '0.01', 0.25: '0.25', 0.50: '0.50',
0.75: '0.75', 0.99: '0.99'},
included=False,
tooltip = {'always_visible' : False},
updatemode='drag'),
], id='precision-cutoff-div-'+self.name),
dbc.Tooltip(f"Scores above this cutoff will be labeled positive",
target='precision-cutoff-div-'+self.name,
