# Frequent Subtree Counting in Random Forests

The goal of this project is to compress the generated source code of decision tree classifiers for embedded devices.
As a first step, we investigate whether the trees of several trained random forests have frequent subtrees in common.
Such a subtree could be implemented once as a function that is called at each place where it occurs in the decision trees.
This can decrease the code size of the generated embedded C source files and executables.

## Datasets
Several datasets are available.
For now, however, I experiment only with 'adult' and 'wine-quality'.

## Find Frequent Subtrees

To find frequent subtrees in a meaningful way, the graph mining executable needs two things:
- [x] handle rooted trees in a meaningful way (this is done here)
- [ ] output at least the root vertex of an embedding (if one exists for a given transaction tree) instead of just 'there is a mapping'

### Find Frequent Rooted Trees

Let's see how many frequent rooted trees we can find in the random forests.
That is: each decision tree is a transaction that the mining tool interprets as being rooted at its first vertex.

In [1]:
# create output directories (-p also creates missing parents and tolerates existing directories)
for dataset in adult wine-quality; do
    for variant in WithLeafEdges NoLeafEdges; do
        mkdir -p "forests/rootedFrequentTrees/${dataset}/${variant}/"
    done
done

In [3]:
./lwgr -h

This is a frequent rooted subtree mining tool.
Implemented by Pascal Welke starting in 2018.

This program computes and outputs frequent *rooted* subtrees and feature
representations of the mined graphs. The database is expected to contain
tree transactions that are interpreted as being rooted at the first
vertex.

usage: ./lwg [options] [FILE]

If no FILE argument is given or FILE is - the program reads from stdin.
It always prints to stdout (unless specified by parameters) and 
stderr (statistics).


Options:
-h:           print this possibly helpful information.

-t THRESHOLD: Minimum absolute support threshold in the graph database

-p SIZE:      Maximum size (number of vertices) of patterns returned

-o FILE:      output the frequent subtrees in this file

-f FILE:      output the feature information in this file

-i VALUE:     Some embedding operators require a parameter that might be
              a float between 0.0 and 1.0 or an integer >=1, depending 
              on the ope

In [5]:
for dataset in adult wine-quality; do
    for variant in WithLeafEdges NoLeafEdges; do
        for f in forests/${dataset}/${variant}/*.graph; do
            for threshold in $(seq 25 -1 2); do
                echo "processing threshold ${threshold} for ${f}"
                # common prefix of the three output files of this run
                out="forests/rootedFrequentTrees/${dataset}/${variant}/$(basename "${f}" .graph)_t${threshold}"
                ./lwgr -e rootedTrees -m bfs -t "${threshold}" -p 10 \
                  -o "${out}.patterns" \
                  < "${f}" \
                  > "${out}.features" \
                  2> "${out}.logs"
            done
        done
    done
done

processing threshold 25 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 24 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 23 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 22 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 21 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 20 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 19 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 18 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 17 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 16 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 15 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 14 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 13 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 12 for forests/adult/WithLeafEdges/DT_10.graph
processing threshold 11 for forests/adult/WithLe

processing threshold 23 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 22 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 21 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 20 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 19 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 18 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 17 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 16 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 15 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 14 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 13 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 12 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 11 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 10 for forests/adult/WithLeafEdges/ET_10.graph
processing threshold 9 for forests/adult/WithLea

processing threshold 21 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 20 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 19 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 18 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 17 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 16 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 15 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 14 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 13 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 12 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 11 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 10 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 9 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 8 for forests/adult/WithLeafEdges/RF_10.graph
processing threshold 7 for forests/adult/WithLeafE

processing threshold 19 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 18 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 17 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 16 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 15 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 14 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 13 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 12 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 11 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 10 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 9 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 8 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 7 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 6 for forests/adult/NoLeafEdges/DT_10.graph
processing threshold 5 for forests/adult/NoLeafEdges/DT_10.graph
processing thre

processing threshold 13 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 12 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 11 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 10 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 9 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 8 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 7 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 6 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 5 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 4 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 3 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 2 for forests/adult/NoLeafEdges/ET_10.graph
processing threshold 25 for forests/adult/NoLeafEdges/ET_15.graph
processing threshold 24 for forests/adult/NoLeafEdges/ET_15.graph
processing threshold 23 for forests/adult/NoLeafEdges/ET_15.graph
processing thresho

processing threshold 7 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 6 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 5 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 4 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 3 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 2 for forests/adult/NoLeafEdges/RF_10.graph
processing threshold 25 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 24 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 23 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 22 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 21 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 20 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 19 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 18 for forests/adult/NoLeafEdges/RF_15.graph
processing threshold 17 for forests/adult/NoLeafEdges/RF_15.graph
processing thres

processing threshold 4 for forests/wine-quality/WithLeafEdges/DT_10.graph
processing threshold 3 for forests/wine-quality/WithLeafEdges/DT_10.graph
processing threshold 2 for forests/wine-quality/WithLeafEdges/DT_10.graph
processing threshold 25 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 24 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 23 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 22 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 21 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 20 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 19 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 18 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 17 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 16 for forests/wine-quality/WithLeafEdges/DT_15.graph
processing threshold 15 for 

processing threshold 13 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 12 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 11 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 10 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 9 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 8 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 7 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 6 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 5 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 4 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 3 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 2 for forests/wine-quality/WithLeafEdges/ET_10.graph
processing threshold 25 for forests/wine-quality/WithLeafEdges/ET_15.graph
processing threshold 24 for fores

processing threshold 22 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 21 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 20 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 19 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 18 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 17 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 16 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 15 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 14 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 13 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 12 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 11 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 10 for forests/wine-quality/WithLeafEdges/RF_10.graph
processing threshold 9 fo

processing threshold 7 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 6 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 5 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 4 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 3 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 2 for forests/wine-quality/WithLeafEdges/RF_5.graph
processing threshold 25 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 24 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 23 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 22 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 21 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 20 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 19 for forests/wine-quality/NoLeafEdges/DT_10.graph
processing threshold 18 for forests/wine-quality/No

processing threshold 13 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 12 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 11 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 10 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 9 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 8 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 7 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 6 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 5 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 4 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 3 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 2 for forests/wine-quality/NoLeafEdges/DT_5.graph
processing threshold 25 for forests/wine-quality/NoLeafEdges/ET_10.graph
processing threshold 24 for forests/wine-quality/NoLeafEdges/ET_10.grap

processing threshold 19 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 18 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 17 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 16 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 15 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 14 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 13 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 12 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 11 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 10 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 9 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 8 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 7 for forests/wine-quality/NoLeafEdges/ET_5.graph
processing threshold 6 for forests/wine-quality/NoLeafEdges/ET_5.gr

processing threshold 25 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 24 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 23 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 22 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 21 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 20 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 19 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 18 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 17 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 16 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 15 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 14 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 13 for forests/wine-quality/NoLeafEdges/RF_5.graph
processing threshold 12 for forests/wine-quality/NoLeafEdges/RF_
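
Before plotting, it can help to get a quick overview of how much each run wrote. The following is a minimal sketch, not part of the original notebook: `summarize_patterns` is a hypothetical helper that prints the line count of every `.patterns` file below a directory. The format of these files is not described here, so the line count is only a rough proxy for the number of patterns found.

```shell
# Print "<line count><TAB><file>" for every .patterns file below a directory.
# Assumption: the line count is only a rough proxy for the number of patterns,
# since the exact output format of the mining tool is not documented here.
summarize_patterns() {
    local dir="$1"
    local f
    find "$dir" -type f -name '*.patterns' | sort | while IFS= read -r f; do
        # tr strips the padding that some wc implementations add
        printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
    done
}

# Example call (directory taken from the loop above):
#   summarize_patterns forests/rootedFrequentTrees/adult/WithLeafEdges
```

Sorting the file names keeps the output stable between calls, which makes it easier to compare two mining runs with `diff`.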

### Next Steps

The results of this mining process are plotted in the Python 3 notebook 'Plotting Results for Undirected Graphs.ipynb'.
Note that the mining run that should have produced 'forests/undirectedFrequentTrees/adult/WithLeafEdges/ER_20_t2.*' did not finish properly; it was probably killed due to excessive memory usage while processing patterns of size 6.
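
An unfinished run like this one is easy to miss, because a killed process can still leave partial output files behind. One way to make such runs visible is to record the exit status of every invocation. The sketch below assumes a hypothetical helper `run_and_record` and a hypothetical status file; neither is part of the original notebook. In bash, a process terminated by SIGKILL is reported with exit status 137 (128 + 9), so killed runs can be identified afterwards.

```shell
# Run a command and append "<exit status><TAB><label>" to a status file.
# Hypothetical helper: the name and the status file layout are assumptions.
run_and_record() {
    local status_file="$1"
    local label="$2"
    local status
    shift 2
    "$@"            # run the actual command with its arguments
    status=$?
    printf '%s\t%s\n' "$status" "$label" >> "$status_file"
    return "$status"
}

# Example call inside a mining loop (redirections apply to the wrapped command):
#   run_and_record status.tsv "${out}" ./lwgr -e rootedTrees -m bfs -t "${threshold}" -p 10 \
#     -o "${out}.patterns" < "${f}" > "${out}.features" 2> "${out}.logs"
#
# List all runs that did not exit with status 0:
#   awk -F'\t' '$1 != 0' status.tsv
```

Because the status line is appended to its own file, the `.features` and `.logs` outputs of the mining tool stay unchanged.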