This repository contains the implementation used in our CP23 paper. The implementation aims at generating decision sets that are both interpretable and accurate, by compiling a gradient boosted tree model on demand, where each generated rule is equivalent to an abductive explanation for the prediction made by the gradient boosted tree. The experiments compare the proposed implementation with other state-of-the-art decision set learning algorithms in terms of accuracy, scalability, model size and explanation size.


Before using the implementation, we need to extract the datasets stored in datasets.tar.xz. To extract the datasets, please ensure tar is installed and run:

$ tar -xvf datasets.tar.xz

If interested in the logs, please run:

$ tar -xvf logs.tar.xz
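
On systems where tar is not available, the same extraction can be sketched with Python's standard tarfile module. This is only an illustrative alternative and not part of the repository; the helper name extract_archive is hypothetical.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.xz archive into dest_dir and return its member names.

    Hypothetical helper; it mirrors `tar -xvf <archive>` using only the
    standard library.
    """
    with tarfile.open(archive_path, mode="r:xz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

For example, extract_archive("datasets.tar.xz") unpacks the datasets into the current directory.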

Required Packages

The implementation is written as a set of Python scripts. The experiments used Python 3.8.5. The required packages can be installed with:

$ pip install -r requirements.txt

In addition to the packages above, Gurobi with a full license is required. To install Gurobi, please follow its installation instructions. Please also follow the IDS installation instructions to install IDS.

Usage

The implementation provides a number of parameters, which can be set from the command line. To see the list of parameters, run:

$ cd src/ && python -h

Preparing a dataset

Cpl can handle datasets in the CSV format. Before compiling a gradient boosted tree (BT) model into a decision set (DS), we need to prepare the dataset used to train the BT model.

  1. Assume a target dataset is stored in somepath/dataset.csv
  2. Create an extra file named somepath/dataset.csv.catcol containing the indices of the categorical columns of the target dataset. For example, if columns 0, 3, and 6 are categorical features, the file should contain those three indices.
  3. With the two files above, we can run:
$ python -p --pfiles dataset.csv,somename somepath/

to create a new dataset file somepath/somename_data.csv with the categorical features properly addressed. For example:

$ python -p --pfiles iris_train1.csv,iris_train1 ../datasets/train/iris/
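
The file naming convention in the steps above can be summarised in a few lines of Python. This is a minimal sketch that only derives the file names described in this section; the helper name prepared_paths is hypothetical, and the contents of the files are not modelled.

```python
import posixpath


def prepared_paths(directory, csv_name, out_name):
    """Return (catcol_path, prepared_csv_path) for a dataset.

    Hypothetical helper: it only reproduces the naming convention
    described above (dataset.csv.catcol and somename_data.csv).
    """
    catcol = posixpath.join(directory, csv_name + ".catcol")
    prepared = posixpath.join(directory, out_name + "_data.csv")
    return catcol, prepared


catcol, prepared = prepared_paths(
    "../datasets/train/iris/", "iris_train1.csv", "iris_train1"
)
print(catcol)    # ../datasets/train/iris/iris_train1.csv.catcol
print(prepared)  # ../datasets/train/iris/iris_train1_data.csv
```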

Training a gradient boosted tree model

A gradient boosted tree model is required before generating a decision set. Run the following command to train a BT model:

$ python -c -t -n 50 -d 3 --testsplit 0 ../datasets/train/iris/iris_train1_data.csv 

Here, a boosted tree model consisting of 50 trees per class is trained, where the maximum depth of each tree is 3. ../datasets/train/iris/iris_train1_data.csv is the dataset used for training. The value of --testsplit ranges from 0.0 to 1.0 and gives the fraction of the dataset held out for testing. In this command line, 100% of the given dataset is used for training and 0% for testing. By default, the generated model is saved in ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl
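
Since the model path is needed in every compilation command below, it can help to derive it from the training parameters. The following sketch assumes that the naming pattern of the single example above generalises to other datasets and parameter values; the helper name default_model_path is hypothetical.

```python
import posixpath


def default_model_path(dataset_csv, n_estimators, max_depth, test_split,
                       out_dir="./temp"):
    """Guess the default location of the saved model.

    Assumption: the pattern is inferred from the one example path shown
    above and is not confirmed for other inputs.
    """
    base = posixpath.splitext(posixpath.basename(dataset_csv))[0]
    fname = "{0}_nbestim_{1}_maxdepth_{2}_testsplit_{3}.mod.pkl".format(
        base, n_estimators, max_depth, float(test_split)
    )
    return "{0}/{1}/{2}".format(out_dir, base, fname)


print(default_model_path("../datasets/train/iris/iris_train1_data.csv", 50, 3, 0))
```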

Compiling a boosted tree into a decision set

To generate a decision set via local compilation, where the computed decision set covers all instances in the training dataset, run:

$ python -f -I -R lin -e mx -s g3 -v --clocal --fsort --fqupdate ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl

-f outputs the compiled decision set in a particular format. -I -R lin activates the compilation process with the standard linear search for rule extraction. -e mx -s g3 selects the MaxSAT encoding and the g3 SAT solver. -v increases the verbosity level. --clocal --fsort --fqupdate enable local compilation together with feature sorting based on feature frequencies.
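
To illustrate what it means for a decision set to cover all training instances, here is a minimal sketch. The rule representation (a dictionary of feature/value literals plus a label) and the toy data are assumptions made for illustration only; they do not reflect the format produced by the implementation.

```python
def rule_covers(rule, instance):
    """A rule covers an instance when every (feature, value) literal matches."""
    return all(instance.get(f) == v for f, v in rule["literals"].items())


def uncovered(decision_set, instances):
    """Return the instances that no rule in the decision set covers."""
    return [
        x for x in instances
        if not any(rule_covers(rule, x) for rule in decision_set)
    ]


# Hypothetical toy rules and instances.
rules = [
    {"literals": {"petal": "short"}, "label": "setosa"},
    {"literals": {"petal": "long", "sepal": "wide"}, "label": "virginica"},
]
data = [
    {"petal": "short", "sepal": "wide"},
    {"petal": "long", "sepal": "wide"},
    {"petal": "long", "sepal": "narrow"},
]
print(uncovered(rules, data))  # only the third instance is left uncovered
```

Under this reading, local compilation would stop once the list of uncovered training instances is empty.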

Lexicographic optimization on each rule, i.e. minimizing misclassifications first and then the number of literals used, can be activated by adding the options --reduce-lit after --reduce-lit-appr maxsat (here, after is the value passed to --reduce-lit):

$ python -f -I -R lin -e mx -s g3 -v --clocal --fsort --fqupdate --reduce-lit after --reduce-lit-appr maxsat ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl
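
The lexicographic ordering itself can be sketched as a tuple comparison. The candidate rules and their counts below are made up for illustration, and the snippet does not reproduce the MaxSAT-based procedure used by the implementation.

```python
def lexicographic_best(candidates):
    """Pick the candidate with the fewest misclassifications,
    breaking ties by the smallest number of literals."""
    return min(candidates, key=lambda c: (c["misclassified"], c["literals"]))


# Hypothetical candidate versions of one rule.
candidates = [
    {"name": "r1", "misclassified": 1, "literals": 2},
    {"name": "r2", "misclassified": 0, "literals": 5},
    {"name": "r3", "misclassified": 0, "literals": 4},
]
print(lexicographic_best(candidates)["name"])  # r3
```

Note that r1 is never chosen here even though it is the shortest rule, because misclassifications are always minimized first.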

To enable the tradeoff between misclassifications and the number of literals used in each rule, add --lam 0.005 --approx 1:

$ python -f -I -R lin -e mx -s g3 -v --clocal --fsort --fqupdate --reduce-lit after --reduce-lit-appr maxsat --lam 0.005 --approx 1 ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl
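
One common way to express such a tradeoff is a weighted sum of the two quantities. The exact objective used by the implementation is not stated here, so the cost function below (misclassifications plus lam times the number of literals) is purely an assumption, and the candidate rules are made up.

```python
def tradeoff_best(candidates, lam):
    """Pick the candidate minimizing misclassified + lam * literals.

    Assumption: this weighted-sum cost is illustrative only and may
    differ from the objective behind --lam.
    """
    return min(candidates, key=lambda c: c["misclassified"] + lam * c["literals"])


# Hypothetical candidate versions of one rule.
candidates = [
    {"name": "accurate", "misclassified": 0, "literals": 5},
    {"name": "short", "misclassified": 1, "literals": 2},
]
print(tradeoff_best(candidates, lam=0.005)["name"])  # accurate
print(tradeoff_best(candidates, lam=0.5)["name"])    # short
```

In this sketch, a small weight behaves much like the lexicographic ordering, while a larger weight lets a shorter rule win despite an extra misclassification.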

To activate rule reduction, add --reduce-rule --weighted:

$ python -f -I -R lin -e mx -s g3 -v --clocal --fsort --fqupdate --reduce-rule --weighted ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl

To activate both lexicographic optimization and rule reduction, add both --reduce-lit after --reduce-lit-appr maxsat and --reduce-rule --weighted:

$ python -f -I -R lin -e mx -s g3 -v --clocal --fsort --fqupdate --reduce-lit after --reduce-lit-appr maxsat --reduce-rule --weighted ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl

The implementation also supports exhaustive compilation:

$ python -f -I -R lin -e mx -s g3 -v ./temp/iris_train1_data/iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl
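
The command variants in this section differ only in a few groups of options, which the following sketch assembles programmatically. The helper compile_args is hypothetical and returns only the arguments that follow the script name; it reproduces the combinations shown above, and whether other combinations are accepted by the implementation is not confirmed.

```python
def compile_args(model_path, local=True, reduce_lit=False,
                 reduce_rule=False, lam=None):
    """Build the option list for one of the compilation variants above."""
    args = ["-f", "-I", "-R", "lin", "-e", "mx", "-s", "g3", "-v"]
    if local:
        # Local compilation with frequency-based feature sorting.
        args += ["--clocal", "--fsort", "--fqupdate"]
    if reduce_lit:
        # Lexicographic optimization on each rule.
        args += ["--reduce-lit", "after", "--reduce-lit-appr", "maxsat"]
        if lam is not None:
            # Tradeoff between misclassifications and literals.
            args += ["--lam", str(lam), "--approx", "1"]
    if reduce_rule:
        args += ["--reduce-rule", "--weighted"]
    args.append(model_path)
    return args


model = ("./temp/iris_train1_data/"
         "iris_train1_data_nbestim_50_maxdepth_3_testsplit_0.0.mod.pkl")
print(" ".join(compile_args(model, reduce_lit=True, reduce_rule=True)))
```

The resulting list can be appended to the interpreter and script name and passed to subprocess.run.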

Reproducing Experimental Results

Due to the randomization used in the training phase, the experimental results reported in the paper are unlikely to be reproduced exactly. Similar experimental results can be obtained with the following script:

$ ./src/experiment/

Since the total number of datasets is 295 and 13 decision set competitors are considered, running the full set of experiments takes a considerable amount of time.

