The official repository for "The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance". Code will be posted here upon publication.

jdonnelly36/Rashomon_Importance_Distribution

RID Documentation

This codebase contains the code necessary to compute the Rashomon Importance Distribution as described in "The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-Based Variable Importance".

Environment Setup

In order to configure your environment, you will need to:

  1. Install all required Python packages via pip install -r requirements.txt

Computing RID

A simple demonstration of the RID interface can be seen in example.ipynb. The primary interface for RID is the RashomonImportanceDistribution class, which computes RID in its constructor and provides functions to examine RID. The following parameters are available in the RashomonImportanceDistribution constructor:

input_df -- A pandas DataFrame containing a binarized version of the dataset we seek to explain

binning_map -- A dictionary of the form {0: [0, 1, 2], 1: [3, 4], 2: [5, 6, 7, 8]} describing which variables in the original, unbinned version of the dataset map to which columns of the binarized version input_df. The example given states that the 0-th variable in the original data is represented by bins 0, 1, and 2, and so on.

db -- The maximum depth allowed for the decision trees in our rashomon sets. Note that large values can cause computational problems

lam -- The regularization weight to use when computing rashomon sets

eps -- The threshold to use when computing rashomon sets (i.e., models within eps of optimal are included)

dataset_name -- The name of the dataset being analyzed. Used to determine where to cache various files

n_resamples -- The number of bootstrap samples to compute. Default: 100

cache_dir_root -- The root file path at which all cached files should be stored. Default: './cached_files'

rashomon_output_dir -- The name of the subfolder of cache_dir_root in which rashomon sets will be stored. Default: 'rashomon_outputs'

verbose -- Whether to produce extra logging. Default: False

vi_metric -- The VI metric to use for this RID; should be one of ['sub_mr', 'div_mr']. Default: 'sub_mr'

max_par_for_gosdt -- The maximum number of instances of GOSDT to run in parallel; reduce this number if memory issues occur. Default: 5
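
The sketch below shows one way the input_df and binning_map arguments described above could be prepared together, and how the full set of constructor arguments might be collected. It is an illustration rather than the repository's own preprocessing: the toy data, the thresholds, the binarize helper, the placement of the label in the last column, and the values chosen for db, lam, and eps are all assumptions. The constructor call itself is left commented out because the import path is not stated here; example.ipynb shows the actual usage.

```python
import pandas as pd

# Hypothetical toy dataset: two continuous variables and a binary label.
raw = pd.DataFrame({
    "age": [23, 35, 47, 52, 61, 29],
    "income": [30, 55, 80, 42, 95, 38],
    "label": [0, 1, 1, 0, 1, 0],
})

# Hypothetical cut points used to binarize each original variable.
thresholds = {"age": [30, 50], "income": [40, 60, 90]}


def binarize(raw_df, cuts_by_var, label_col):
    """Build a binarized DataFrame plus the matching binning_map.

    binning_map[i] lists the binarized column indices that encode the
    i-th original variable, in the form described for the constructor.
    """
    columns = {}
    binning_map = {}
    next_col = 0
    for var_idx, (name, cuts) in enumerate(cuts_by_var.items()):
        binning_map[var_idx] = []
        for cut in cuts:
            # One 0/1 column per threshold of this variable.
            columns[f"{name}<={cut}"] = (raw_df[name] <= cut).astype(int)
            binning_map[var_idx].append(next_col)
            next_col += 1
    binarized = pd.DataFrame(columns)
    # Assumption: the label is kept as the final column.
    binarized[label_col] = raw_df[label_col].to_numpy()
    return binarized, binning_map


input_df, binning_map = binarize(raw, thresholds, "label")
print(binning_map)

# Keyword arguments named after the documented constructor parameters.
# The values of db, lam, and eps are placeholders, not recommendations.
rid_kwargs = dict(
    input_df=input_df,
    binning_map=binning_map,
    db=4,
    lam=0.03,
    eps=0.1,
    dataset_name="toy_example",
    n_resamples=10,
    cache_dir_root="./cached_files",
    rashomon_output_dir="rashomon_outputs",
    verbose=False,
    vi_metric="sub_mr",
    max_par_for_gosdt=5,
)

# rid = RashomonImportanceDistribution(**rid_kwargs)  # see example.ipynb for the import
```

Building the map in the same loop that creates the binarized columns keeps the column indices and the map in sync, which is the property binning_map relies on.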
