subSAGE

subSAGE is a Shapley-value-based framework for inferring feature importance in high-dimensional data. It builds on SAGE (Shapley Additive Global importancE), adapted to the high-dimensional setting. We also demonstrate how to perform paired bootstrapping to estimate confidence intervals. In particular, we investigate subSAGE applied to tree ensemble models. We emphasize the importance of computing subSAGE on independent test data that were not used to train the model.
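As background, SAGE scores a feature by its Shapley value with respect to a set function v(S); in SAGE, v(S) is the reduction in expected loss obtained when the model is allowed to use the features in S. The base-R sketch below is not part of the subSAGE code: it computes exact Shapley values by enumerating every subset, and the function name `shapley_exact` and the toy value function in the usage note are hypothetical illustrations. Enumeration costs on the order of 2^d evaluations of v, so it is only feasible for a handful of features, which is why practical estimators for many features avoid it.

```r
# Exact Shapley values of d "players" (here: features) for a set function v.
# v(S) takes an integer vector S (a subset of 1..d, possibly empty) and returns a number.
# Hypothetical helper for illustration only; not part of the subSAGE repository.
shapley_exact <- function(v, d) {
  phi <- numeric(d)
  for (j in seq_len(d)) {
    others <- setdiff(seq_len(d), j)
    m <- length(others)
    for (k in 0:m) {
      # Shapley weight |S|! (d - |S| - 1)! / d! shared by every subset S of size k
      w <- factorial(k) * factorial(d - k - 1) / factorial(d)
      # combn(m, k) enumerates size-k subsets of 1..m; map them onto `others`
      # (indexing avoids combn() treating a single integer as seq_len()).
      idx_sets <- if (k == 0) list(integer(0)) else combn(m, k, simplify = FALSE)
      for (idx in idx_sets) {
        S <- others[idx]
        # Weighted marginal contribution of feature j when added to S
        phi[j] <- phi[j] + w * (v(c(S, j)) - v(S))
      }
    }
  }
  phi
}
```

For example, with the made-up value function `v <- function(S) sum(c(1, 2, 3)[S]) + 2 * (1 %in% S && 2 %in% S)`, `shapley_exact(v, 3)` returns (up to floating-point rounding) `c(2, 3, 3)`: each feature keeps its additive part, and the interaction bonus of 2 is split equally between features 1 and 2. The values sum to v({1, 2, 3}) - v({}), the efficiency property that makes Shapley values a natural way to divide total importance among features.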

Preprint

Preprint is available here.

Usage

Given a trained xgboost model, independent test data, and a feature of interest, the subSAGE estimate can be computed in R as:

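# model: trained xgboost model; data: independent test data; feature: feature to score
library(xgboost)     # provides xgb.model.dt.tree()
library(data.table)  # provides as.data.table()
source("~/subSAGE/subSAGE.R")  # adjust the path to your copy of the repository
t = xgb.model.dt.tree(model = model)  # tree structure as a table (not passed to the calls below)
trees = as.data.table(xgboost.trees(xgb_model = model, data = data, recalculate = FALSE))
estimate = subSage_cpp(data, trees, feature, loss = "RMSE")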
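To attach a confidence interval to such an estimate, one reading of paired bootstrapping is: resample the test rows with replacement and recompute the whole estimator on that same resample, so that every loss inside the estimator is evaluated on identical rows. The sketch below implements this idea with a plain percentile interval; it is an assumption, not confirmed to match the procedure in the preprint, and the helper name `bootstrap_ci` and its arguments are hypothetical. It assumes `data` is a matrix or data.frame whose rows are test observations.

```r
# Percentile bootstrap confidence interval for a statistic of the test data.
# Hypothetical helper; the resampling scheme is an assumption, not the authors' code.
# estimator: function taking a data set (rows = test observations), returning one number.
bootstrap_ci <- function(data, estimator, B = 1000, level = 0.95) {
  n <- nrow(data)
  reps <- vapply(seq_len(B), function(b) {
    # One resample of test rows, shared by everything computed inside estimator
    rows <- sample.int(n, n, replace = TRUE)
    estimator(data[rows, , drop = FALSE])
  }, numeric(1))
  alpha <- (1 - level) / 2
  list(
    estimate   = estimator(data),                                   # point estimate on full test set
    ci         = unname(quantile(reps, c(alpha, 1 - alpha))),       # percentile interval
    replicates = reps                                                # bootstrap distribution
  )
}
```

Applied to the call above, this would look like `bootstrap_ci(data, function(d) subSage_cpp(d, trees, feature, loss = "RMSE"), B = 200)`. Because `trees` is built from `data` above, whether it must be rebuilt for every resample depends on what `xgboost.trees` stores; that has not been checked here.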

palVJ/subSAGE

Folders and files

Latest commit

History

Repository files navigation

subSAGE

Preprint

Usage

About

Resources

Stars

Watchers

Forks

Languages