Inferring feature importance in high-dimensional data

palVJ/subSAGE

subSAGE

subSAGE is a Shapley-value-based framework for inferring feature importance in high-dimensional data. It builds on SAGE (Shapley Additive Global importancE), adapted to high-dimensional settings. We also demonstrate how to use paired bootstrapping to estimate confidence intervals. In particular, we investigate subSAGE applied to tree ensemble models. We emphasize the importance of computing subSAGE on independent test data that were not used to train the model.
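As a rough illustration of the paired-bootstrap idea mentioned above, the following base-R sketch resamples test rows once per replicate and applies the same resampled rows to two per-observation loss vectors before taking the difference in mean loss. The function `paired_bootstrap_ci` and its arguments are hypothetical names introduced for this example. It shows a generic percentile interval and is not necessarily the procedure used in the preprint or in `subSAGE.R`.

```r
# Generic paired bootstrap for a difference in mean loss (base R only).
# Hypothetical helper for illustration; not part of subSAGE.R.
paired_bootstrap_ci <- function(loss_a, loss_b, B = 2000, level = 0.95) {
  stopifnot(length(loss_a) == length(loss_b))
  n <- length(loss_a)
  diffs <- replicate(B, {
    # One draw of test-row indices, shared by both loss vectors (the "pairing")
    idx <- sample.int(n, n, replace = TRUE)
    mean(loss_a[idx]) - mean(loss_b[idx])
  })
  alpha <- 1 - level
  list(
    estimate = mean(loss_a) - mean(loss_b),
    ci = quantile(diffs, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  )
}

# Toy usage with made-up losses: the second vector is the first shifted by 0.5,
# so every bootstrap replicate of the paired difference is (numerically) 0.5.
set.seed(1)
base_loss <- rchisq(200, df = 1)
res <- paired_bootstrap_ci(base_loss + 0.5, base_loss, B = 500)
```

Because both losses are evaluated on the same resampled rows, noise shared between them cancels in the difference, which is why the toy interval collapses to a single value.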

Preprint

Preprint is available here.

Usage

Given a fitted xgboost model, test data, and a feature of interest, the subSAGE estimate can be computed in R as:

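library(xgboost)     # provides xgb.model.dt.tree
library(data.table)  # provides as.data.table
source("~/subSAGE/subSAGE.R")  # adjust the path to your local copy of the repository
t = xgb.model.dt.tree(model = model)  # tabular dump of the fitted trees
trees = as.data.table(xgboost.trees(xgb_model = model, data = data, recalculate = FALSE))
# subSAGE estimate for `feature` on the test data `data`, using RMSE loss
estimate = subSage_cpp(data, trees, feature, loss = "RMSE")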
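The snippet above assumes that `data` is a test set kept separate from the data used to fit `model`. A minimal base-R sketch of such a split is shown below. The toy matrix, the 70/30 ratio, and all variable names are assumptions made for illustration, and the exact input format expected by `subSage_cpp` (for example, whether the response column is included) is not specified here.

```r
set.seed(42)

# Hypothetical toy data: 100 rows, 5 features, continuous response
X <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("x", 1:5)))
y <- X[, "x1"] + rnorm(100)

# Hold out 30% of the rows; these rows are never used for training
test_idx <- sample.int(nrow(X), size = floor(0.3 * nrow(X)))

X_train <- X[-test_idx, , drop = FALSE]
y_train <- y[-test_idx]
X_test  <- X[test_idx, , drop = FALSE]
y_test  <- y[test_idx]
```

The model would then be fitted on `X_train` and `y_train` only, and the held-out rows used when computing the estimate.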
