# Minimum Rule Cover - MIRCO

The Minimum Rule Cover (MIRCO) algorithm aims to extract a small set of rules that can be used to interpret a model trained with the Random Forest (RF) algorithm. In this note, we demonstrate how to use MIRCO and assess its success in mimicking the underlying random forest model.

We start by importing the required packages. Except for MIRCO (provided here), all of these packages are bundled with the standard installation of [Anaconda Distribution](https://www.anaconda.com/products/individual) (Python 3.7). Our implementation also imports the `gurobipy` Python package, which can be installed separately with the Anaconda package manager. Along with the Python package, you also need to install [Gurobi Optimizer](https://www.gurobi.com/academia/academic-program-and-licenses/), which is free for research and educational work.
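
Before running the cells below, it can help to confirm that the packages are visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of MIRCO.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Package names taken from the imports used in this note.
for pkg in ("numpy", "sklearn", "gurobipy"):
    status = "found" if has_module(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```

Note that finding `gurobipy` only shows that the Python package is installed; it does not verify that a Gurobi license is set up.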

In [1]:
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split 
from MIRCO import MIRCO

Here is a list of problems that we can try.

In [2]:
import RuleCoverDatasets as RCDS
problems = [RCDS.banknote, RCDS.ILPD, RCDS.ionosphere,
            RCDS.transfusion, RCDS.liver, RCDS.tictactoe,
            RCDS.wdbc, RCDS.mammography, RCDS.diabetes, 
            RCDS.oilspill, RCDS.phoneme, RCDS.seeds, RCDS.wine,
            RCDS.glass, RCDS.ecoli]

As an example, let's work with the `ionosphere` problem and load its dataset.

In [3]:
df = np.array(RCDS.ionosphere('datasets/'))
X = df[:, 0:-1]  # all columns except the last are features
y = df[:, -1]    # last column holds the class label

First, we use the *entire* dataset to train a random forest classifier. Then we obtain predictions from both the random forest model and the set of rules extracted by MIRCO.

In [4]:
randomstate = 13 # Random seed is fixed
crit = "gini" # Impurity criterion (for now only other option is "entropy")

# Random Forest
RF = RandomForestClassifier(random_state=randomstate, criterion=crit)
RF_fit = RF.fit(X, y)
RF_pred = RF_fit.predict(X)

# MIRCO
MRC = MIRCO(RF_fit)
MRC_fit = MRC.fit(X, y)
MRC_pred = MRC_fit.predict(X)

Since we evaluate on the same data that we used for training, the accuracies are naturally quite optimistic.

In [5]:
print('## ACCURACIES ##')
print('Random Forest: ', accuracy_score(y, RF_pred))
print('MIRCO: ', accuracy_score(y, MRC_pred))

## ACCURACIES ##
Random Forest:  1.0
MIRCO:  0.9515669515669516
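
The accuracies above compare each model with the true labels. Since the goal is to mimic the forest, one may also want to measure how often the two sets of predictions agree with each other. The sketch below is an illustration only: the helper `agreement` and the toy label lists are hypothetical, and in the notebook one would pass `RF_pred` and `MRC_pred` instead.

```python
def agreement(pred_a, pred_b):
    """Fraction of positions where two prediction sequences match."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction sequences must have the same length")
    if len(pred_a) == 0:
        raise ValueError("prediction sequences must be non-empty")
    matches = sum(1 for a, b in zip(pred_a, pred_b) if a == b)
    return matches / len(pred_a)


# Toy labels only (not results from the ionosphere data).
forest_pred = [1, 1, 0, 0, 1, 0, 1, 1]
rules_pred = [1, 1, 0, 1, 1, 0, 1, 0]
print(agreement(forest_pred, rules_pred))  # 6 of 8 positions match -> 0.75
```

The same quantity can be obtained with `accuracy_score(RF_pred, MRC_pred)`, since accuracy is simply the fraction of matching labels.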


We can also check the numbers of rules generated by the random forest algorithm and MIRCO.

In [6]:
print('\n## NUMBERS OF RULES ##')
print('Random Forest: ', MRC_fit.initNumOfRules)
print('MIRCO: ', MRC_fit.numOfRules)


## NUMBERS OF RULES ##
Random Forest:  2433
MIRCO:  11
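
To give a feel for where the large rule count of a forest comes from, the sketch below counts the leaves of every tree in a small forest, treating each root-to-leaf path as one candidate rule. This is an assumption about how such a count can be formed, not a description of how `initNumOfRules` is computed inside MIRCO. The synthetic data and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to keep the example self-contained.
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X_toy, y_toy)

# One candidate rule per leaf: the conjunction of splits on its root-to-leaf path.
leaves_per_tree = [tree.tree_.n_leaves for tree in forest.estimators_]
total_rules = sum(leaves_per_tree)
print('Leaves per tree: ', leaves_per_tree)
print('Total candidate rules: ', total_rules)
```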


These figures show that MIRCO comes close to the accuracy of the random forest (0.95 versus 1.0) with far fewer rules (11 versus 2433). Because we use the entire dataset, MIRCO covers *all* the samples.

In [7]:
print('Number of missed test samples by MIRCO: ', MRC_fit.numOfMissed)

Number of missed test samples by MIRCO:  0


For interpretation, we can also print the set of rules used by MIRCO.

In [8]:
print('\n\nRules obtained by MIRCO')
MRC_fit.exportRules()



Rules obtained by MIRCO
RULE 0:
==> x[0] > 0.50
==> x[30] > 0.19
==> x[22] > -0.81
==> x[7] > -0.99
==> x[23] <= 0.45
==> x[17] > -0.81
==> x[2] > 0.13
==> x[31] <= 0.64
==> x[32] > -0.12
==> x[30] > 0.49
==> Class numbers: [1.00, 123.00]
RULE 1:
==> x[13] <= -0.56
==> Class numbers: [32.00, 1.00]
RULE 2:
==> x[2] > 0.19
==> x[26] <= 1.00
==> x[22] <= 0.09
==> x[4] > 0.08
==> x[7] > -0.58
==> x[4] > 0.42
==> Class numbers: [1.00, 50.00]
RULE 3:
==> x[28] > 0.13
==> x[33] > 0.95
==> x[7] > 0.61
==> Class numbers: [10.00, 1.00]
RULE 4:
==> x[6] > 0.04
==> x[23] > -0.99
==> x[7] > -0.99
==> x[7] <= 0.99
==> x[4] > 0.03
==> x[27] > -0.96
==> x[3] > -0.74
==> x[7] > -0.63
==> x[5] <= 0.80
==> x[11] > -0.12
==> Class numbers: [8.00, 183.00]
RULE 5:
==> x[4] <= 0.04
==> Class numbers: [67.00, 0.00]
RULE 6:
==> x[19] > -0.94
==> x[4] > 0.23
==> x[2] > 0.11
==> x[3] > -0.80
==> x[21] <= 0.91
==> x[32] <= 0.99
==> x[23] <= 0.16
==> Class numbers: [5.00, 154.00]
RULE 7:
==> x[4] > 0.04
==> x[7]

Though MIRCO itself is not a classifier, we can still use the rules that it extracts from a random forest model to make predictions. Suppose we apply a standard train-test split to the dataset and compare the results of MIRCO against a decision tree classifier and a random forest classifier (the one used for rule extraction).

In [9]:
randomstate = 67 # Random seed is fixed
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, random_state=randomstate)

# Decision Tree
DT = DecisionTreeClassifier(random_state=randomstate, criterion=crit)
DT_fit = DT.fit(X_train, y_train)
DT_pred = DT_fit.predict(X_test)

# Random Forest
RF = RandomForestClassifier(random_state=randomstate, criterion=crit)
RF_fit = RF.fit(X_train, y_train)
RF_pred = RF_fit.predict(X_test)

# MIRCO
MRC = MIRCO(RF_fit)
MRC_fit = MRC.fit(X_train, y_train)
MRC_pred = MRC_fit.predict(X_test)

print('## ACCURACIES ##')
print('Decision Tree: ', accuracy_score(y_test, DT_pred))
print('Random Forest: ', accuracy_score(y_test, RF_pred))
print('MIRCO: ', accuracy_score(y_test, MRC_pred))
print('\n## NUMBERS OF RULES ##')
print('Decision Tree: ', DT_fit.tree_.n_leaves)
print('Random Forest: ', MRC_fit.initNumOfRules)
print('MIRCO: ', MRC_fit.numOfRules)

## ACCURACIES ##
Decision Tree:  0.8867924528301887
Random Forest:  0.9433962264150944
MIRCO:  0.9150943396226415

## NUMBERS OF RULES ##
Decision Tree:  23
Random Forest:  2062
MIRCO:  10


MIRCO outperforms the decision tree classifier with fewer than half as many rules (10 versus 23). However, because we have applied a train-test split, some of the test samples may not be covered by MIRCO.

In [10]:
print('Number of missed test samples by MIRCO: ', MRC_fit.numOfMissed)

Number of missed test samples by MIRCO:  6


Note that the accuracy values reported above do include the missed points, because we still classify such samples by applying the rules that have the largest fraction of satisfied clauses.
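
The fallback for missed samples can be illustrated with a small sketch. It assumes a hypothetical rule representation (a list of `(feature, operator, threshold)` clauses plus per-class counts, mirroring the printed rule format) and picks the single rule with the largest fraction of satisfied clauses. It is a simplification for illustration; how MIRCO breaks ties or combines several matching rules is not specified here.

```python
import operator

# Comparison operators that appear in the printed rules.
OPS = {"<=": operator.le, ">": operator.gt}


def fraction_satisfied(rule, x):
    """Fraction of the rule's clauses that hold for sample x."""
    clauses = rule["clauses"]
    hits = sum(1 for feat, op, thr in clauses if OPS[op](x[feat], thr))
    return hits / len(clauses)


def predict_one(rules, x):
    """Return (label, covered) using the rule with the largest satisfied fraction.

    A sample is 'covered' when some rule has all of its clauses satisfied;
    otherwise it is 'missed' and the best partial match decides the label.
    """
    fractions = [fraction_satisfied(rule, x) for rule in rules]
    best = max(range(len(rules)), key=lambda i: fractions[i])
    counts = rules[best]["class_counts"]
    label = max(range(len(counts)), key=lambda c: counts[c])  # majority class
    return label, fractions[best] == 1.0


# Hypothetical toy rules (not taken from the ionosphere output).
rules = [
    {"clauses": [(0, "<=", 0.5)], "class_counts": [30, 2]},
    {"clauses": [(0, ">", 0.5), (1, ">", 0.0), (2, "<=", 1.0)],
     "class_counts": [1, 40]},
]

x_covered = [0.2, 0.3, 0.4]   # satisfies every clause of the first rule
x_missed = [0.9, -0.5, 0.4]   # satisfies 2 of 3 clauses of the second rule only

print(predict_one(rules, x_covered))  # (0, True)
print(predict_one(rules, x_missed))   # (1, False)
```

In this toy example the second sample is not fully covered by any rule, yet it still receives a label from the rule it matches most closely, which is why missed samples still count toward the reported accuracy.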