Releases: oegedijk/explainerdashboard

v0.3.4.1: fixes detailed shap plots bug when cats=None

05 May 08:35

v0.3.4: fixes dtreeviz 1.3 breaking change bug

13 Apr 18:15
479746e

Release Notes

Version 0.3.4:

Bug Fixes

  • Fixes incompatibility bug with dtreeviz >= 1.3
  • Fixes ExplainerHub dbc.Jumbotron style bug

Improvements

  • Raises a ValueError when passing shap='deep', as it is not yet correctly supported

v0.3.3.1: minor bugfix with outliers and nan

22 Mar 19:11
d46b8fc

Fixes a bug with removing outliers when NaNs are present.

v0.3.3: better pipeline support and thread safety

11 Mar 19:15
f5bd4a5

Version 0.3.3:

Highlights:

  • Adding support for cross-validated metrics
  • Better support for pipelines by using the kernel explainer
  • Making the explainer thread-safe by adding locks
  • Removing outliers from shap dependence plots

Breaking Changes

  • the parameter permutation_cv has been deprecated and replaced by the parameter cv,
    which now also calculates cross-validated metrics in addition to cross-validated
    permutation importances.

New Features

  • metrics now get calculated with cross-validation over X when you pass the
    cv parameter to the explainer. This is useful when, for some reason, you
    want to pass the training set to the explainer (see the sketch after this list).
  • adds winsorization to shap dependence and shap interaction plots
  • If shap='guess' fails (unable to guess the right type of shap explainer),
    then default to the model agnostic shap='kernel'.
  • Better support for sklearn Pipelines: if not able to extract transformer+model,
    then default to shap.KernelExplainer to explain the entire pipeline
  • you can now remove outliers from shap dependence/interaction plots with
    remove_outliers=True: filters all outliers beyond 1.5*IQR
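
A minimal sketch of the new cv, shap='kernel' and remove_outliers options, using a small sklearn pipeline on the breast cancer dataset; the dataset, pipeline and column name are illustrative assumptions and not part of the release itself.

    from sklearn.datasets import load_breast_cancer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from explainerdashboard import ClassifierExplainer

    # small illustrative dataset and pipeline (kernel shap is slow, so keep it small)
    X, y = load_breast_cancer(as_frame=True, return_X_y=True)
    X, y = X.iloc[:100], y.iloc[:100]

    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipeline.fit(X, y)

    # cv=5: metrics and permutation importances get cross-validated over X
    # shap='kernel': model-agnostic fallback that can explain the whole pipeline
    explainer = ClassifierExplainer(pipeline, X, y, cv=5, shap='kernel')

    explainer.metrics()                                             # cross-validated metrics
    explainer.plot_dependence("mean radius", remove_outliers=True)  # filter outliers beyond 1.5*IQR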

Bug Fixes

  • Sets proper threading.Locks before making calls to the shap explainer, to prevent
    race conditions when dashboards request shap values from multiple threads
    (shap is unfortunately not thread-safe).

Improvements

  • single-row KernelExplainer shap calculations now run without a tqdm progress bar
  • added cutoff tpr and fpr to the roc auc plot
  • added cutoff precision and recall to pr auc plot
  • put a loading spinner on shap contrib table

v0.3.2.2: more bugfixes

03 Mar 19:28
dfc4b5a

Version 0.3.2.2:

index_dropdown=False now works for indexes not listed by set_index_list_func(),
as long as they can be found by set_index_exists_func().

New Features

  • adds set_index_exists_func() to register a function that checks whether an index
    exists beyond those listed by set_index_list_func() (see the sketch after this list)
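
A hedged sketch of set_index_exists_func(), re-using the explainer from the v0.3.3 sketch above; the exact callable signatures (a zero-argument list function and a single-argument existence check) and the index values are assumptions for illustration.

    from explainerdashboard import ExplainerDashboard

    # only a few indexes get listed as suggestions, but any index known to the
    # external check can still be typed in when index_dropdown=False
    listed_indexes = ["0", "1", "2"]                     # illustrative index values
    known_indexes = set(listed_indexes) | {"42", "43"}   # e.g. everything in a database

    explainer.set_index_list_func(lambda: listed_indexes)
    explainer.set_index_exists_func(lambda index: str(index) in known_indexes)

    ExplainerDashboard(explainer, index_dropdown=False).run()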

Bug Fixes

  • bug fix to make shap.KernelExplainer (used with explainer parameter shap='kernel')
    work with RegressionExplainer
  • bug fix when no explicit labels are passed with index selector
  • components only update if explainer.index_exists(index): no more IndexNotFoundErrors.
  • fixed bug where the regression index selector title was labeled 'Custom'
  • get_y() now returns .item() when necessary
  • removed ticks from confusion matrix plot when no labels param passed
    (this bug got reintroduced in recent plotly release)

Improvements

  • new helper function get_shap_row(index) to calculate or look up a single
    row of shap values.

v0.3.2.1: add index_dropdown=False to regression dashboard

26 Feb 19:06
05dfa18

Bugfix: the new index_dropdown=False feature was not working correctly for regression dashboards.

v0.3.2: custom metrics

25 Feb 19:50
3884b8e

Version 0.3.2:

Highlights:

  • Control what metrics to show or use your own custom metrics using show_metrics
  • Set the naming for onehot features with all 0s with cats_notencoded
  • Speed up plots by displaying only a random sample of markers in scatter plots with plot_sample.
  • Make index selection a free text field with index_dropdown=False

New Features

  • new parameter show_metrics for both explainer.metrics(), ClassifierModelSummaryComponent
    and RegressionModelSummaryComponent:
    • pass a list of metrics and only display those metrics in that order
    • you can also pass custom scoring functions as long as they
      are of the form metric_func(y_true, y_pred): show_metrics=[metric_func]
      • For ClassifierExplainer, what gets passed to the custom metric function
        depends on whether the function takes the additional parameters cutoff
        and pos_label. If it does not, then y_true=self.y_binary(pos_label)
        and y_pred=np.where(self.pred_probas(pos_label)>cutoff, 1, 0) are passed.
        Otherwise the raw self.y and self.pred_probas are passed for the custom
        metric function to work with.
      • custom functions are also stored to dashboard.yaml and imported upon
        loading ExplainerDashboard.from_config()
  • new parameter cats_notencoded: a dict to indicate how to name the value
    of a onehotencoded feature when all onehot columns equal 0. Defaults
    to 'NOT_ENCODED', but can be adjusted with this parameter. E.g.
    cats_notencoded=dict(Deck="Deck not known").
  • new parameter plot_sample to only plot a random sample in the various
    scatter plots. When you have a large dataset, this may significantly
    speed up various plots without sacrificing much in expressiveness:
    ExplainerDashboard(explainer, plot_sample=1000).run()
  • new parameter index_dropdown=False will replace the index dropdowns with a
    free text field. This can be useful when you have a lot of potential indexes
    and the user is expected to know the index string.
    Input is checked for validity with explainer.index_exists(index), and the field
    indicates when the input index does not exist. Indexes that do not exist will not
    be forwarded to other components, unless you also set index_check=False
    (see the combined sketch after this list).
  • adds mean absolute percentage error to the regression metrics. If it is too
    large a warning will be printed. Can be excluded with the new show_metrics
    parameter.
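
As an illustration of the new parameters above, here is a minimal sketch using the bundled titanic dataset; the custom metric, the built-in metric name 'accuracy', and the model hyperparameters are illustrative assumptions rather than part of the release.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from explainerdashboard import ClassifierExplainer, ExplainerDashboard
    from explainerdashboard.datasets import titanic_survive

    X_train, y_train, X_test, y_test = titanic_survive()
    model = RandomForestClassifier(n_estimators=50, max_depth=5).fit(X_train, y_train)

    def weighted_f1(y_true, y_pred):
        # custom metric of the form metric_func(y_true, y_pred)
        return f1_score(y_true, y_pred, average='weighted')

    explainer = ClassifierExplainer(
        model, X_test, y_test,
        cats=['Sex', 'Deck', 'Embarked'],
        cats_notencoded=dict(Deck="Deck not known"),  # label for rows where all Deck_* columns are 0
    )
    explainer.metrics(show_metrics=['accuracy', weighted_f1])  # only these metrics, in this order

    ExplainerDashboard(
        explainer,
        plot_sample=1000,       # scatter plots only draw a random sample of 1000 markers
        index_dropdown=False,   # free text index field, validated with explainer.index_exists(index)
    ).run()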

Bug Fixes

  • get_classification_df added to ClassificationComponent dependencies.

Improvements

  • accepting a single-column pd.DataFrame for y and automatically converting
    it to a pd.Series
  • if WhatIf FeatureInputComponent detects the presence of missing onehot features
    (i.e. rows where all columns of the onehotencoded feature equal 0), then
    adds 'NOT_ENCODED' or the matching value from cats_notencoded to the
    dropdown options.
  • Names for ExplainerComponents for which no name is given are now generated
    with a deterministic process instead of a random uuid. This should help with
    scaling custom dashboards across cluster deployments. Also drops the
    shortuuid dependency.
  • ExplainerDashboard now prints out local ip address when starting dashboard.
  • get_index_list() is only called once upon starting dashboard.

v0.3.1: responsive classifier components

31 Jan 13:46

Version 0.3.1:

This version is mostly about pre-calculating and optimizing the classifier statistics
components. Those components should now be much more responsive with large datasets.

New Features

  • new methods roc_auc_curve(pos_label) and pr_auc_curve(pos_label)
  • new method get_classification_df(...) to get dataframe with number of labels
    above and below a given cutoff.
    • this now gets used by plot_classification(..)
  • new method confusion_matrix(cutoff, binary, pos_label)
  • added parameter sort_features to FeatureInputComponent (see the sketch after this list):
    • defaults to 'shap': order features by mean absolute shap
    • if set to 'alphabet' features are sorted alphabetically
  • added parameter fill_row_first to FeatureInputComponent:
    • defaults to True: fill first row first, then next row, etc
    • if False: fill first column first, then second column, etc
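
A short sketch of the new FeatureInputComponent parameters and two of the new explainer methods, re-using the explainer from the v0.3.2 sketch above; the cutoff and pos_label keyword arguments are assumptions based on the signatures listed above.

    from explainerdashboard import ExplainerDashboard
    from explainerdashboard.custom import FeatureInputComponent

    # two of the newly added (and now pre-calculated) lookups:
    explainer.get_classification_df(cutoff=0.5)  # counts above/below the cutoff
    explainer.roc_auc_curve(pos_label=1)

    # sort_features='alphabet': order the inputs alphabetically instead of by mean |shap|
    # fill_row_first=False: fill the input grid column by column
    feature_input = FeatureInputComponent(explainer,
                                          sort_features='alphabet',
                                          fill_row_first=False)
    ExplainerDashboard(explainer, feature_input).run()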

Bug Fixes

  • categorical mappings now updateable with pandas<=1.2 and python==3.6
  • title now overridable for RegressionRandomIndexComponent
  • added assert check on summary_type for ShapSummaryComponent

Improvements

  • pre-calculating lift_curve_df only once and then storing it for each pos_label
    • plus: storing only 100 evenly spaced rows of lift_curve_df
    • dashboard should be more responsive for large datasets
  • pre-calculating roc_auc_curve and pr_auc_curve
    • dashboard should be more responsive for large datasets
  • pre-calculating confusion matrices
    • dashboard should be more responsive for large datasets
  • pre-calculating classification_dfs
    • dashboard should be more responsive for large datasets
  • confusion matrix: added axis title, moved predicted labels to bottom of graph
  • precision plot component: when only the cutoff is adjusted, only the cutoff line
    is updated, without recalculating the plot.

v0.3.0.1: dependency fixes

27 Jan 15:02
304918b

Version 0.3.0.1:

Some of the new features of version 0.3 only work with pandas>=1.2, which is not available for python 3.6.

Breaking Changes

  • the new dependency requirement pandas>=1.2 also implies python>=3.7

Bug Fixes

  • updates pandas version to be compatible with categorical feature operations
  • updates dtreeviz version to make xgboost and pyspark dependencies optional

v0.3.0: reducing memory footprint

27 Jan 13:49
f58767a

Version 0.3.0:

This is a major release and comes with lots of breaking changes to the lower-level
ClassifierExplainer and RegressionExplainer API. The higher-level ExplainerComponent
and ExplainerDashboard API has not been changed, however, except for the deprecation
of the cats and hide_cats parameters.

Explainers generated with explainerdashboard <= 0.2.20.1 will not work
with this version! So if you have stored explainers to disk, you either have to
rebuild them with this new version, or downgrade back to explainerdashboard==0.2.20.1!
(hope you pinned your dependencies in production! ;)

The main motivation for these breaking changes was to improve the memory usage of the
dashboards, especially in production. This led to the deprecation of the
dual cats grouped/not grouped functionality of the dashboard. Once I had committed
to that breaking change, I decided to clean up the entire API and do all the
needed breaking changes at once.

Breaking Changes

  • onehot encoded features (passed with the cats parameter) are now merged by default. This means that the cats=True
    parameter has been removed from all explainer methods, and the group cats
    toggle has been removed from all ExplainerComponents. This saves both
    on code complexity and memory usage. If you wish to see the individual
    contributions of onehot encoded columns, simply don't pass them to the
    cats parameter upon construction.

  • Deprecated explainer attributes:

    • BaseExplainer:
      • shap_values_cats
      • shap_interaction_values_cats
      • permutation_importances_cats
      • get_dfs()
      • formatted_contrib_df()
      • to_sql()
      • check_cats()
      • equivalent_col
    • ClassifierExplainer:
      • get_prop_for_label
  • Naming changes to attributes (a short migration sketch follows this list):

    • BaseExplainer:
      • importances_df() -> get_importances_df()
      • feature_permutations_df() -> get_feature_permutations_df()
      • get_int_idx(index) -> get_idx(index)
      • contrib_df() -> get_contrib_df() *
      • contrib_summary_df() -> get_summary_contrib_df() *
      • interaction_df() -> get_interactions_df() *
      • shap_values -> get_shap_values_df
      • plot_shap_contributions() -> plot_contributions()
      • plot_shap_summary() -> plot_importances_detailed()
      • plot_shap_dependence() -> plot_dependence()
      • plot_shap_interaction() -> plot_interaction()
      • plot_shap_interaction_summary() -> plot_interactions_detailed()
      • plot_interactions() -> plot_interactions_importance()
      • n_features() -> n_features
      • shap_top_interaction() -> top_shap_interactions
      • shap_interaction_values_by_col() -> shap_interactions_values_for_col()
    • ClassifierExplainer:
      • self.pred_probas -> self.pred_probas()
      • precision_df() -> get_precision_df() *
      • lift_curve_df() -> get_liftcurve_df() *
    • RandomForestExplainer/XGBExplainer:
      • decision_trees -> shadow_trees
      • decisiontree_df() -> get_decisionpath_df()
      • decisiontree_summary_df() -> get_decisionpath_summary_df()
      • decision_path_file() -> decisiontree_file()
      • decision_path() -> decisiontree()
      • decision_path_encoded() -> decisiontree_encoded()
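
For illustration, a few of the renames above applied to typical calls, re-using the explainer from the sketches above; the integer index and the "Age" column are placeholders.

    # old (<= 0.2.20.1)                        new (>= 0.3.0)
    # explainer.contrib_df(0)              ->  explainer.get_contrib_df(0)
    # explainer.plot_shap_contributions(0) ->  explainer.plot_contributions(0)
    # explainer.plot_shap_dependence("Age") -> explainer.plot_dependence("Age")
    contrib_df = explainer.get_contrib_df(0)
    fig = explainer.plot_contributions(0)
    fig = explainer.plot_dependence("Age")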

New Features

  • new Explainer parameter precision: defaults to 'float64'. Can be set to
    'float32' to save on memory usage: ClassifierExplainer(model, X, y, precision='float32')
    (see the sketch after this list)
  • new memory_usage() method to show which internal attributes take the most memory.
  • for multiclass classifiers: keep_shap_pos_label_only(pos_label) method:
    • drops shap values and shap interactions for all labels except pos_label
    • this should significantly reduce memory usage for multi class classification
      models.
    • not needed for binary classifiers.
  • added get_index_list(), get_X_row(index), and get_y(index) methods.
    • these can be overridden with .set_index_list_func(), .set_X_row_func()
      and .set_y_func().
    • by overriding these functions you can for example sample observations
      from a database or other external storage instead of from X_test, y_test.
  • added Popout buttons to all the major graphs that open a large modal
    showing just the graph. This makes it easier to focus on a particular
    graph without distraction from the rest of the dashboard and all its toggles.
  • added max_cat_colors parameter to plot_importances_detailed, plot_dependence
    and plot_interactions_detailed
    • prevents plots from getting slow for categorical features with many categories.
    • defaults to 5
    • can be set as **kwarg to ExplainerDashboard
  • adds category limits and sorting to RegressionVsCol component
  • adds property X_merged that gives a dataframe with the onehot columns merged.
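
A hedged sketch of the memory-related options and the new override hooks, re-using model, X_test and y_test from the v0.3.2 sketch above; fetch_row_from_db and fetch_label_from_db are hypothetical helpers, and the single-index signatures of the registered callables are assumptions.

    from explainerdashboard import ClassifierExplainer, ExplainerDashboard

    explainer = ClassifierExplainer(model, X_test, y_test,
                                    precision='float32')  # store internals as float32

    explainer.memory_usage()  # overview of which internal attributes take the most memory
    # for a multiclass model you could additionally call:
    # explainer.keep_shap_pos_label_only(pos_label)

    # serve rows/labels from an external store instead of X_test/y_test
    # (fetch_row_from_db / fetch_label_from_db are hypothetical helpers):
    explainer.set_X_row_func(lambda index: fetch_row_from_db(index))
    explainer.set_y_func(lambda index: fetch_label_from_db(index))

    # max_cat_colors can be passed as a **kwarg to ExplainerDashboard:
    ExplainerDashboard(explainer, max_cat_colors=5).run()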

Bug Fixes

  • shap dependence: when no point cloud, do not highlight!
  • Fixed bug with calculating the contributions plot/table for the what-if component
    when InputFeatures had not fully loaded, which resulted in a shap error.

Improvements

  • saving X.copy(), instead of using a reference to X
    • this would result in more memory usage in development
      though, so you can del X_test to save memory.
  • ClassifierExplainer only stores shap (interaction) values for the positive
    class: shap values for the negative class are generated on the fly
    by multiplying with -1.
  • encoding onehot columns as np.int8 saving memory usage
  • encoding categorical features as pd.category saving memory usage
  • added base TreeExplainer class that RandomForestExplainer and XGBExplainer both derive from
    • will make it easier to extend tree explainers to other models in the future
      • e.g. catboost and lightgbm
  • got rid of the callable properties (that were there to assure backward compatibility)
    and replaced them with regular methods.