
ShapleyVIC: Shapley Variable Importance Cloud for Interpretable Machine Learning

ShapleyVIC is now implemented as a combination of a Python library and an R package. The previous version of the R package is archived in the Historical version subdirectory.

ShapleyVIC Introduction

Variable importance assessment is central to interpreting machine learning models. Current practice in interpretable machine learning focuses on explaining the final model that optimizes predictive performance. However, this does not fully address practical needs, where researchers are willing to consider models that are “good enough” but easier to understand or implement. The Shapley variable importance cloud (ShapleyVIC) fills this gap by extending the assessment from a single model to a set of “good models” for comprehensive and robust conclusions. Built on a common theoretical basis (i.e., Shapley values for variable importance), ShapleyVIC seamlessly complements the widely adopted SHAP assessment of a single final model to avoid biased inference.
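Both assessments build on the standard Shapley value, which attributes to each variable $j$ its average marginal contribution to a model performance measure $v$ over all subsets $S$ of the remaining variables (with $D$ denoting the full variable set):

$$
\phi_j(v) = \sum_{S \subseteq D \setminus \{j\}} \frac{|S|! \, (|D| - |S| - 1)!}{|D|!} \left[ v(S \cup \{j\}) - v(S) \right]
$$

SHAP evaluates this importance for the single final model; ShapleyVIC evaluates it across the whole set of nearly optimal models.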

Usage

ShapleyVIC version 1.2.0 now supports binary, ordinal and continuous outcomes.

Please visit our bookdown page for a full tutorial on ShapleyVIC usage.

ShapleyVIC analysis of variable importance consists of 3 general steps (illustrated by the sketch further below):

  1. Training an optimal prediction model (e.g., a logistic regression model).
  2. Generating a reasonable number (e.g., 350) of nearly optimal models of the same model class (e.g., logistic regression).
  3. Evaluating Shapley-based variable importance for each nearly optimal model and pooling the information for inference.

We provide functions to visualize ShapleyVIC values to facilitate interpretation, and to generate an ensemble variable ranking for use with the AutoScore framework to develop interpretable clinical risk scores.
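The sketch below illustrates the three steps conceptually using scikit-learn. It is a minimal illustration, not the ShapleyVIC implementation: the criterion for sampling nearly optimal models is simplified, and a plain permutation importance stands in for the Shapley-based measure.

```python
# Conceptual sketch of the 3-step workflow (illustration only, not the
# ShapleyVIC implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rng = np.random.default_rng(0)

def model_loss(w, X_mat):
    """Logistic log loss for coefficient vector w = (intercept, coefficients)."""
    p = 1.0 / (1.0 + np.exp(-(w[0] + X_mat @ w[1:])))
    return log_loss(y, p)

# Step 1: train the optimal model.
opt = LogisticRegression().fit(X, y)
w_opt = np.r_[opt.intercept_, opt.coef_.ravel()]
loss_opt = model_loss(w_opt, X)

# Step 2: sample nearly optimal models; here (for illustration) coefficient
# perturbations whose loss stays within 5% of the optimal loss are kept.
cloud = []
while len(cloud) < 100:
    w = w_opt + rng.normal(scale=0.1, size=w_opt.shape)
    if model_loss(w, X) <= 1.05 * loss_opt:
        cloud.append(w)

# Step 3: evaluate a variable importance measure for each sampled model and
# pool across models (permutation importance stands in for Shapley values).
def importance(w):
    base = model_loss(w, X)
    shifts = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        shifts.append(model_loss(w, X_perm) - base)
    return shifts

imp = np.array([importance(w) for w in cloud])  # rows: models; columns: variables
print("pooled importance (mean):", imp.mean(axis=0).round(3))
print("across-model spread (sd):", imp.std(axis=0).round(3))
```

In the actual workflow, model training, model sampling, and Shapley-based importance evaluation are done by the Python library, and the pooled inference and visualization by the R package (see Installation below).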

Installation

The ShapleyVIC framework is now implemented using a Python library, which trains the optimal model, generates nearly optimal models, and evaluates Shapley-based variable importance from these models, and an R package, which pools information across models to generate summary statistics and visualizations for inference.

Python library

  • Required: Python version 3.6 or higher.
    • Recommended: the latest stable release of Python 3.9 or 3.10.
  • Required: latest version of git.

Execute the following command in Terminal/Command Prompt to install the Python library from GitHub:

  • Linux/macOS:
pip install git+"https://github.com/nliulab/ShapleyVIC#egg=ShapleyVIC&subdirectory=python"
  • Windows:
python.exe -m pip install git+"https://github.com/nliulab/ShapleyVIC#egg=ShapleyVIC&subdirectory=python"

ShapleyVIC uses a modified version of the SAGE library (version 0.0.4b1). The modification avoids occasional stack overflow problems on Windows and does not affect variable importance evaluation.
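For context, this is how Shapley-based importance is typically computed with SAGE. The snippet follows the API documented for current releases of the sage-importance package and may differ from the modified 0.0.4b1 version used by ShapleyVIC:

```python
# Shapley-based importance with SAGE (API per current sage-importance
# releases; the modified 0.0.4b1 version used by ShapleyVIC may differ).
import sage  # pip install sage-importance
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

imputer = sage.MarginalImputer(model, X[:128])           # background samples
estimator = sage.PermutationEstimator(imputer, "cross entropy")
sage_values = estimator(X, y)                            # Shapley-based importance
print(sage_values.values)                                # one value per variable
```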

R package

  • Required: R version 3.5.0 or higher.
    • Recommended: use the latest version of R with RStudio.

Execute the following command in R/RStudio to install the R package from GitHub:

if (!require("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("nliulab/ShapleyVIC/r")

Citation

Core paper

Ning Y, Ong MEH, Chakraborty B, Goldstein BA, Ting DSW, Vaughan R, Liu N. Shapley variable importance cloud for interpretable machine learning. Patterns. 2022;3(4):100452.

Method extension

Contact
