Merge pull request #63 from oegedijk/dev

Dev - refactored onehot_cols and categorical_cols
oegedijk · Jan 12, 2021 · 0bc863c
2 parents 080597a + 98dbbb9

Showing 13 changed files with 1,321 additions and 500 deletions.
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -1,5 +1,35 @@
 # Release Notes
 
+
+## 0.2.20
+### Breaking Changes
+-  `WhatIfComponent` deprecated. Use `WhatIfComposite` or connect components 
+    yourself to a `FeatureInputComponent`
+- renaming properties:
+    `explainer.cats` -> `explainer.onehot_cols`
+    `explainer.cats_dict` -> `explainer.onehot_dict`
+
+### New Features
+- Adds support for models with categorical features that were not onehot encoded 
+    (e.g. CatBoost)
+- Adds filter on number of categories to display in violin plots and pdp plot, 
+    and how to sort the categories (alphabetical, by frequency or by mean abs shap)
+
+### Bug Fixes
+- Fixes bug where str tab indicators returned e.g. the old `ImportancesTab` instead of `ImportancesComposite`
+-
+
+### Improvements
+- No longer depends on PDPbox: built own partial dependence 
+    functions with categorical feature support
+- Autodetects `xgboost.core.Booster` or `lightgbm.Booster` and raises a ValueError 
+    advising to use the sklearn-compatible wrappers instead.
+
+### Other Changes
+- Introduces list of categorical columns: `explainer.categorical_cols`
+- Introduces dictionary of the categories of each categorical column: `explainer.categorical_dict`
+- Introduces list of all categorical features: `explainer.cat_cols`
+
 ## 0.2.19
 ### Breaking Changes
 - ExplainerHub: parameter `user_json` is now called `users_file` (and default to a `users.yaml` file)

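Editor's sketch, not part of the commit: the 0.2.20 notes above rename `explainer.cats` to `explainer.onehot_cols` and introduce `explainer.categorical_cols`. Code that has to run against both older and newer explainers could look the attribute up under either name, roughly as below. The helper names `onehot_cols_compat` and `is_categorical` are hypothetical and do not exist in the library; the objects are `SimpleNamespace` stand-ins rather than real explainers, and the column names are made up.

```python
from types import SimpleNamespace


def onehot_cols_compat(explainer):
    """Return the onehot-encoded column groups under the new or the old name."""
    # `onehot_cols` is the name from 0.2.20 on; `cats` is the earlier name.
    if hasattr(explainer, "onehot_cols"):
        return list(explainer.onehot_cols)
    return list(getattr(explainer, "cats", []))


def is_categorical(explainer, col):
    """True if `col` is a onehot-encoded group or a non-encoded categorical column."""
    # Older explainers have no `categorical_cols`, so fall back to an empty list.
    categorical = getattr(explainer, "categorical_cols", [])
    return col in onehot_cols_compat(explainer) or col in categorical


# Stand-in objects for illustration only (assumed attribute shapes).
new_style = SimpleNamespace(onehot_cols=["Sex"], categorical_cols=["Deck"])
old_style = SimpleNamespace(cats=["Sex"])

print(is_categorical(new_style, "Deck"))
print(is_categorical(old_style, "Sex"))
print(is_categorical(old_style, "Fare"))
```

The membership test mirrors the check this commit adds to `update_residuals_graph` (`col in self.explainer.onehot_cols or col in self.explainer.categorical_cols`); only the fallback to the old `cats` name is an addition of this sketch.
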
diff --git a/TODO.md b/TODO.md
@@ -3,6 +3,7 @@
 
 ## Bugs:
 - dash contributions reload bug: Exception: Additivity check failed in TreeExplainer!
+- shap dependence: when no point cloud, do not highlight!
 
 ## Layout:
 - Find a proper frontender to help :)
@@ -20,20 +21,26 @@
     - https://community.plotly.com/t/announcing-plotly-py-4-12-horizontal-and-vertical-lines-and-rectangles/46783
 - add some of these:
     https://towardsdatascience.com/introducing-shap-decision-plots-52ed3b4a1cba
-
+- shap dependence plot, sort categorical features by:
+    - alphabet
+    - number of obs
+    - mean abs shap
 
 ### Classifier plots:
 - move predicted and actual to outer layer of ConfusionMatrixComponent
     - move predicted below graph?
 - pdp: add multiclass option
     - no icelines just mean and index with different thickness
+    - new method?
 
 ### Regression plots:
 
 
+
 ## Explainers:
+- minimize pd.DataFrame and np.array size:
+    - astype(float16), pd.category, etc
 - pass n_jobs to pdp_isolate
-- autodetect xgboost booster or catboost.core and suggest XGBClassifier, etc
 - make X_cats with categorical encoding .astype("category")
 - add ExtraTrees and GradientBoostingClassifier to tree visualizers
 - add plain language explanations
@@ -45,6 +52,7 @@
 - rename RandomForestExplainer and XGBExplainer methods into something more logical
     - Breaking change!
 
+
 ## notebooks:
 
 
@@ -68,8 +76,8 @@
 
 ### Components
 - autodetect when uuid name get rendered and issue warning
-- Add side-by-side option to cutoff selector component
 
+- Add side-by-side option to cutoff selector component
 - add filter to index selector using pattern matching callbacks:
     - https://dash.plotly.com/pattern-matching-callbacks
 - add querystring method to ExplainerComponents
@@ -94,14 +102,14 @@
 - Add this method? : https://arxiv.org/abs/2006.04750?
 
 ## Tests:
-- add wizard test
 - add tests for InterpretML EBM (shap 0.37)
 - write tests for explainerhub CLI add user
 - test model_output='probability' and 'raw' or 'logodds' separately
 - write tests for explainer_methods
 - write tests for explainer_plots
 
 ## Docs:
+- add cats_topx cats_sort to docs
 - add hide_wizard and wizard to docs
 - add hide_poweredby to docs
 - add Docker deploy example (from issue)

diff --git a/explainerdashboard/dashboard_components/overview_components.py b/explainerdashboard/dashboard_components/overview_components.py
diff --git a/explainerdashboard/dashboard_components/regression_components.py b/explainerdashboard/dashboard_components/regression_components.py
@@ -915,9 +915,9 @@ def layout(self):
                                         "When you have some real outliers it can help to remove them"
                                         " from the plot so it is easier to see the overall pattern.", 
                                     target='reg-vs-col-winsor-label-'+self.name),
-                            dbc.Input(id='reg-vs-col-winsor-'+self.name, 
-                                    value=self.winsor,
-                                type="number", min=0, max=49, step=1),
+                                dbc.Input(id='reg-vs-col-winsor-'+self.name, 
+                                        value=self.winsor,
+                                    type="number", min=0, max=49, step=1),
                         ], md=4), hide=self.hide_winsor),  
                     make_hideable(
                         dbc.Col([
@@ -951,7 +951,10 @@ def register_callbacks(self, app):
              Input('reg-vs-col-winsor-'+self.name, 'value')],
         )
         def update_residuals_graph(col, display, points, winsor):
-            style = {} if col in self.explainer.cats else dict(display="none")
+            if col in self.explainer.onehot_cols or col in self.explainer.categorical_cols:
+                style = {}
+            else:
+                style = dict(display="none")
             if display == 'observed':
                 return self.explainer.plot_y_vs_feature(
                         col, points=bool(points), winsor=winsor, dropna=True), style