# Reducing Distinct Branching Conditions in Decision Forests

We are following the paper 'An Algorithm for Reducing the Number of Distinct Branching Conditions in a Decision Forest' by Nakamura and Sakurada.

## Datasets

We use only the 'adult' and 'wine-quality' datasets. The decision forests for both are stored as JSON files with the following names:

In [2]:
ls forests/*/text/*.json

forests/adult/text/DT_10.json  forests/wine-quality/text/DT_10.json
forests/adult/text/DT_15.json  forests/wine-quality/text/DT_15.json
forests/adult/text/DT_1.json   forests/wine-quality/text/DT_1.json
forests/adult/text/DT_20.json  forests/wine-quality/text/DT_20.json
forests/adult/text/DT_5.json   forests/wine-quality/text/DT_5.json
forests/adult/text/ET_10.json  forests/wine-quality/text/ET_10.json
forests/adult/text/ET_15.json  forests/wine-quality/text/ET_15.json
forests/adult/text/ET_1.json   forests/wine-quality/text/ET_1.json
forests/adult/text/ET_20.json  forests/wine-quality/text/ET_20.json
forests/adult/text/ET_5.json   forests/wine-quality/text/ET_5.json
forests/adult/text/RF_10.json  forests/wine-quality/text/RF_10.json
forests/adult/text/RF_15.json  forests/wine-quality/text/RF_15.json
forests/adult/text/RF_1.json   forests/wine-quality/text/RF_1.json
forests/adult/text/RF_20.json  forests/wine-quality/text/RF_20.json
forests/adult/text/RF_5.json   forests/
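
Each path in the listing combines a dataset directory, a model prefix (`DT`, `ET` or `RF`) and a number. A minimal Python sketch for grouping such paths is given here. The helper names `parse_forest_path` and `group_forests` are hypothetical, and the sketch makes no assumption about what the prefix or the number stands for.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Matches names such as "DT_10.json"; pruned variants do not match.
_NAME = re.compile(r"^([A-Z]+)_(\d+)\.json$")

def parse_forest_path(path):
    """Return (dataset, prefix, number) for an unpruned forest file, else None."""
    p = PurePosixPath(path)
    m = _NAME.match(p.name)
    if m is None or len(p.parts) < 3:
        return None
    dataset = p.parts[-3]  # layout: forests/<dataset>/text/<file>
    return dataset, m.group(1), int(m.group(2))

def group_forests(paths):
    """Map (dataset, prefix) to the sorted list of numbers found in the listing."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_forest_path(path)
        if parsed is not None:
            dataset, prefix, number = parsed
            groups[(dataset, prefix)].append(number)
    return {key: sorted(nums) for key, nums in groups.items()}

example = [
    "forests/adult/text/DT_10.json",
    "forests/adult/text/DT_1.json",
    "forests/wine-quality/text/RF_5.json",
    "forests/adult/text/DT_10_pruned_with_sigma_0_0.json",
]
print(group_forests(example))
```

Keying on the file name alone keeps the sketch independent of the JSON contents, which are not shown in this notebook.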

We now prune these decision forests for each $\sigma \in \{0.0, 0.1, 0.2, 0.3\}$.

The run took about 5 to 6 minutes on my laptop.

In [12]:
%%bash
# Each (sigma, dataset) pair runs as a background job: 4 x 2 = 8 parallel processes.
for sigma in 0.0 0.1 0.2 0.3; do
    for dataset in adult wine-quality; do (
        for f in forests/${dataset}/text/*.json; do
            echo "${f} -> $(basename "${f}" .json)_pruned_with_sigma_${sigma}.json"
            ./Pruning/pruning.py "${f}" "forests/${dataset}/FeatureVectors.dat" "${sigma}"
        done ) &  # '&' sends this subshell to the background
    done
done
wait  # block until all background jobs have finished

forests/adult/text/DT_10.json -> DT_10_pruned_with_sigma_0.2.json
forests/adult/text/DT_10.json -> DT_10_pruned_with_sigma_0.3.json
forests/wine-quality/text/DT_10.json -> DT_10_pruned_with_sigma_0.1.json
forests/adult/text/DT_10.json -> DT_10_pruned_with_sigma_0.0.json
forests/adult/text/DT_10.json -> DT_10_pruned_with_sigma_0.1.json
forests/wine-quality/text/DT_10.json -> DT_10_pruned_with_sigma_0.0.json
forests/wine-quality/text/DT_10.json -> DT_10_pruned_with_sigma_0.3.json
forests/wine-quality/text/DT_10.json -> DT_10_pruned_with_sigma_0.2.json
forests/adult/text/DT_10_pruned_with_sigma_0_0.json -> DT_10_pruned_with_sigma_0_0_pruned_with_sigma_0.2.json
forests/adult/text/DT_10_pruned_with_sigma_0_0.json -> DT_10_pruned_with_sigma_0_0_pruned_with_sigma_0.3.json
forests/adult/text/DT_10_pruned_with_sigma_0_0.json -> DT_10_pruned_with_sigma_0_0_pruned_with_sigma_0.0.json
forests/wine-quality/text/DT_10_pruned_with_sigma_0_0.json -> DT_10_pruned_with_sigma_0_0_pruned_with_sigma_0.1.js
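
The output above shows two things worth noting. The glob `*.json` also matches forests pruned in an earlier run, so those are pruned a second time, and the `echo` prints `sigma_0.2` while the input names on disk use the form `sigma_0_0`. The following Python sketch builds the same command lines while skipping already-pruned inputs. `pruned_name` and `build_jobs` are hypothetical helpers, and the `.` to `_` naming rule is inferred from the file names in the output, not from `pruning.py` itself.

```python
from itertools import product
from pathlib import PurePosixPath

SIGMAS = ["0.0", "0.1", "0.2", "0.3"]
MARKER = "_pruned_with_sigma_"

def pruned_name(path, sigma):
    """Predicted output name; the '.' -> '_' rule is an assumption."""
    stem = PurePosixPath(path).stem
    return f"{stem}{MARKER}{sigma.replace('.', '_')}.json"

def build_jobs(files_by_dataset, sigmas=SIGMAS):
    """Return (command, predicted_output) pairs for unpruned forests only."""
    jobs = []
    for sigma, (dataset, files) in product(sigmas, files_by_dataset.items()):
        for f in sorted(files):
            if MARKER in PurePosixPath(f).name:
                continue  # skip forests that are already pruned
            cmd = ["./Pruning/pruning.py", f,
                   f"forests/{dataset}/FeatureVectors.dat", sigma]
            jobs.append((cmd, pruned_name(f, sigma)))
    return jobs

files = {
    "adult": ["forests/adult/text/DT_10.json",
              "forests/adult/text/DT_10_pruned_with_sigma_0_0.json"],
    "wine-quality": ["forests/wine-quality/text/DT_10.json"],
}
jobs = build_jobs(files)
for cmd, out in jobs[:2]:
    print(" ".join(cmd), "->", out)
```

Each command could then be handed to `subprocess.run`, for instance from a `concurrent.futures.ThreadPoolExecutor`, to get parallelism similar to the shell cell.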

In [6]:
ls forests/*/text/*.json

forests/adult/text/DT_10.json
forests/adult/text/DT_10_pruned_with_sigma_0_0.json
forests/adult/text/DT_10_pruned_with_sigma_0_1.json
forests/adult/text/DT_15.json
forests/adult/text/DT_15_pruned_with_sigma_0_0.json
forests/adult/text/DT_15_pruned_with_sigma_0_1.json
forests/adult/text/DT_1.json
forests/adult/text/DT_1_pruned_with_sigma_0_0.json
forests/adult/text/DT_1_pruned_with_sigma_0_1.json
forests/adult/text/DT_20.json
forests/adult/text/DT_20_pruned_with_sigma_0_0.json
forests/adult/text/DT_20_pruned_with_sigma_0_1.json
forests/adult/text/DT_5.json
forests/adult/text/DT_5_pruned_with_sigma_0_0.json
forests/adult/text/DT_5_pruned_with_sigma_0_1.json
forests/adult/text/ET_10.json
forests/adult/text/ET_10_pruned_with_sigma_0_0.json
forests/adult/text/ET_10_pruned_with_sigma_0_1.json
forests/adult/text/ET_15.json
forests/adult/text/ET_15_pruned_with_sigma_0_0.json
forests/adult/text/ET_15_pruned_with_sigma_0_1.json
forests/adult/text/ET_1.json
forests/adult/tex