Debiasing SHAP scores in random forests

This repository provides code to reproduce the results in the paper "Debiasing SHAP scores in random forests" by Markus Loecher (2023).

The folder structure is as follows:

- The code that runs the simulations and the functions that create the figures are all in the src folder.
- The resulting PDFs are in the figures folder.
- Most simulations take a long time to run, so the data outputs from those runs are saved in the data folder.

Figures from the paper:
- We do not include source code for Figures 1 and 6, since these are legacy figures from the work leading to this paper and were created prior to this research.
- Dedicated Rmd files create Figures 2 and 3. The figures in the Appendix are created by makeFigs_Appendix.R.
- Figures 4 and 5 were created with Python code and require installing the Python module TreeModelsFromScratch; directions for this can be found on the original GitHub page. Deviating from the general pattern, even the plots for these figures are created in the data/titanic/ folder.

Table 1: We imported the AUC scores for SHAP, SHAP_oob, MDA, and MDI from a previous paper and only recreated the entry for $\widehat{\text{SHAP}}^{shrunk}_{in}$. The relevant files are AUC_run_simulations.R and src/AUC_simulations_functions.R. (Again, these simulations take a long time to run, so we have saved the data for your convenience.)

Note that this code base is more complex than strictly necessary, because it evolved over several years through contributions from various students at both the Bachelor and Master level. Roughly speaking, there are four separate parts, all of which led to the final insights in the paper:

- The initial research ideas were tested in Python (sklearn), where we successfully separated out-of-bag from inbag data for trees/forests.
- Needing more control over the features and attributes of trees, we then developed our own random forest library (still in Python), which we used to create Figures 4 and 5.
- To replicate our results in different settings, we eventually switched from the original shap module to treeshap in R. Most of the initial functions are in the files StroblData_ASTA2022.R and helperFuns.R. However, those functions work only for the train/test methodology.
- The actual separation of out-of-bag from inbag data for trees/forests in R was achieved in a separate effort and can be found in the files treewise_shap_simulation.R and sim_utils.R.
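
For readers unfamiliar with the inbag/out-of-bag distinction mentioned above, the following is a minimal sketch of how each tree's bootstrap sample splits the training rows into the two groups. It is not the code used in this repository and does not call sklearn or treeshap; the function names `bootstrap_split` and `forest_splits` are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (inbag, oob) lists of row indices."""
    # Sample n_samples row indices with replacement: these rows train the tree.
    inbag = [rng.randrange(n_samples) for _ in range(n_samples)]
    inbag_set = set(inbag)
    # Rows that were never drawn are out-of-bag for this tree.
    oob = [i for i in range(n_samples) if i not in inbag_set]
    return inbag, oob


def forest_splits(n_samples, n_trees, seed=0):
    """Return one (inbag, oob) pair per tree, using a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_split(n_samples, rng) for _ in range(n_trees)]


if __name__ == "__main__":
    splits = forest_splits(n_samples=1000, n_trees=5, seed=42)
    for t, (inbag, oob) in enumerate(splits):
        print(f"tree {t}: {len(set(inbag))} distinct inbag rows, {len(oob)} OOB rows")
```

On average, roughly 37% of the rows end up out-of-bag for any given tree, since the probability that a row is never drawn, (1 - 1/n)^n, approaches 1/e for large n. Per-tree attributions can then be evaluated separately on the inbag rows and on the out-of-bag rows.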

