
ShapleyVIC: Shapley Variable Importance Cloud for Interpretable Machine Learning

ShapleyVIC is now implemented as a combination of a Python library and an R package. The previous version of the R package is archived in the Historical version subdirectory.

ShapleyVIC Introduction

Variable importance assessment is central to interpreting machine learning models. Current practice in interpretable machine learning focuses on explaining the final model that optimizes predictive performance. However, this does not fully address practical needs, where researchers are willing to consider models that are “good enough” but easier to understand or implement. The Shapley variable importance cloud (ShapleyVIC) fills this gap by extending the assessment from a single model to a set of “good models” for comprehensive and robust conclusions. Built on a common theoretical basis (i.e., Shapley values for variable importance), ShapleyVIC seamlessly complements the widely adopted SHAP assessment of a single final model to avoid biased inference.
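Both assessments build on the standard Shapley value, which attributes to each variable $j$ its average marginal contribution to a model performance measure $v$ over all subsets $S$ of the remaining variables (with $D$ denoting the full variable set):

$$
\phi_j(v) = \sum_{S \subseteq D \setminus \{j\}} \frac{|S|! \, (|D| - |S| - 1)!}{|D|!} \left[ v(S \cup \{j\}) - v(S) \right]
$$

SHAP evaluates this importance for the single final model; ShapleyVIC evaluates it across the whole set of nearly optimal models.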

Usage

ShapleyVIC version 1.2.0 now supports binary, ordinal and continuous outcomes.

Please visit our bookdown page for a full tutorial on ShapleyVIC usage.

ShapleyVIC analysis of variable importance consists of 3 general steps (illustrated by the sketch further below):

  1. Training an optimal prediction model (e.g., a logistic regression model).
  2. Generating a reasonable number (e.g., 350) of nearly optimal models of the same model class (e.g., logistic regression).
  3. Evaluating Shapley-based variable importance for each nearly optimal model and pooling the information for inference.

We provide functions to visualize ShapleyVIC values to facilitate interpretation, and to generate an ensemble variable ranking for use with the AutoScore framework to develop interpretable clinical risk scores.
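The sketch below illustrates the three steps conceptually using scikit-learn. It is a minimal illustration, not the ShapleyVIC implementation: the criterion for sampling nearly optimal models is simplified, and a plain permutation importance stands in for the Shapley-based measure.

```python
# Conceptual sketch of the 3-step workflow (illustration only, not the
# ShapleyVIC implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rng = np.random.default_rng(0)

def model_loss(w, X_mat):
    """Logistic log loss for coefficient vector w = (intercept, coefficients)."""
    p = 1.0 / (1.0 + np.exp(-(w[0] + X_mat @ w[1:])))
    return log_loss(y, p)

# Step 1: train the optimal model.
opt = LogisticRegression().fit(X, y)
w_opt = np.r_[opt.intercept_, opt.coef_.ravel()]
loss_opt = model_loss(w_opt, X)

# Step 2: sample nearly optimal models; here (for illustration) coefficient
# perturbations whose loss stays within 5% of the optimal loss are kept.
cloud = []
while len(cloud) < 100:
    w = w_opt + rng.normal(scale=0.1, size=w_opt.shape)
    if model_loss(w, X) <= 1.05 * loss_opt:
        cloud.append(w)

# Step 3: evaluate a variable importance measure for each sampled model and
# pool across models (permutation importance stands in for Shapley values).
def importance(w):
    base = model_loss(w, X)
    shifts = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        shifts.append(model_loss(w, X_perm) - base)
    return shifts

imp = np.array([importance(w) for w in cloud])  # rows: models; columns: variables
print("pooled importance (mean):", imp.mean(axis=0).round(3))
print("across-model spread (sd):", imp.std(axis=0).round(3))
```

In the actual workflow, model training, model sampling, and Shapley-based importance evaluation are done by the Python library, and the pooled inference and visualization by the R package (see Installation below).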

Installation

The ShapleyVIC framework is now implemented using a Python library, which trains the optimal model, generates nearly optimal models, and evaluates Shapley-based variable importance from these models, and an R package, which pools information across models to generate summary statistics and visualizations for inference.

Python library

  • Required: Python version 3.6 or higher.
    • Recommended: the latest stable release of Python 3.9 or 3.10.
  • Required: latest version of git.

Execute the following command in Terminal/Command Prompt to install the Python library from GitHub:

  • Linux/macOS:
pip install git+"https://github.com/nliulab/ShapleyVIC#egg=ShapleyVIC&subdirectory=python"
  • Windows:
python.exe -m pip install git+"https://github.com/nliulab/ShapleyVIC#egg=ShapleyVIC&subdirectory=python"

ShapleyVIC uses a modified version of the SAGE library (version 0.0.4b1). The modification avoids occasional stack overflow problems on Windows and does not affect variable importance evaluation.
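For context, this is how Shapley-based importance is typically computed with SAGE. The snippet follows the API documented for current releases of the sage-importance package and may differ from the modified 0.0.4b1 version used by ShapleyVIC:

```python
# Shapley-based importance with SAGE (API per current sage-importance
# releases; the modified 0.0.4b1 version used by ShapleyVIC may differ).
import sage  # pip install sage-importance
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

imputer = sage.MarginalImputer(model, X[:128])           # background samples
estimator = sage.PermutationEstimator(imputer, "cross entropy")
sage_values = estimator(X, y)                            # Shapley-based importance
print(sage_values.values)                                # one value per variable
```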

R package

  • Required: R version 3.5.0 or higher.
    • Recommended: use the latest version of R with RStudio.

Execute the following command in R/RStudio to install the R package from GitHub:

if (!require("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("nliulab/ShapleyVIC/r")

Citation

Core paper

Ning Y, Ong MEH, Chakraborty B, Goldstein BA, Ting DSW, Vaughan R, Liu N. Shapley variable importance cloud for interpretable machine learning. Patterns. 2022;3(4):100452.

Method extension

Contact
