# Rule Cover Boosting - RCBoost

In this note, we compare the solutions obtained with the Rule Cover Boosting (RCBoost) algorithm against those of several well-known ensemble methods.

We start by importing the required packages. Except for RCBoost (provided here), all packages are bundled with the standard installation of the [Anaconda Distribution](https://www.anaconda.com/products/individual) (Python 3.7). To solve its linear programming problems, RCBoost also uses the `gurobipy` Python package, which can be installed separately with the Anaconda package manager. Along with the Python package, you also need to install the [Gurobi Optimizer](https://www.gurobi.com/academia/academic-program-and-licenses/), which is free for research and educational work.

In [1]:
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from RCBoost import RCBoost
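
If one of the imports above fails, a quick check along the following lines can show which packages are missing. This is a minimal sketch using only the standard library; the helper name `missing_packages` is ours, and the module names are the ones imported in the cells of this note.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


# Module names taken from the import cells of this note.
required = ["numpy", "sklearn", "gurobipy", "RCBoost", "RuleCoverDatasets"]
print("Missing:", missing_packages(required) or "none")
```

Note that `RCBoost` and `RuleCoverDatasets` are only found if their files are on the import path, for example in the working directory.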

Next we list the set of problems.

In [2]:
import RuleCoverDatasets as RCDS
problems = [RCDS.banknote, RCDS.ILPD, RCDS.ionosphere,
            RCDS.transfusion, RCDS.liver, RCDS.tictactoe,
            RCDS.wdbc, RCDS.mammography, RCDS.diabetes, 
            RCDS.oilspill, RCDS.phoneme, RCDS.seeds, RCDS.wine,
            RCDS.glass, RCDS.ecoli]

To give an example, let's solve the `ionosphere` problem with different methods. We first load the dataset and apply a standard train-test split.

In [3]:
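df = np.array(RCDS.ionosphere('datasets/')) # Load the data from the datasets/ folder
X = df[:, 0:-1] # Features: all columns except the last
y = df[:, -1] # Labels: the last column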

randomstate = 29 # Random seed is fixed

# Train (70%) - Test (30%) split
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, random_state=randomstate)
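
The same loading pattern can be wrapped in a small helper so that every entry of `problems` is handled uniformly. The sketch below assumes that each loader accepts the data directory and returns a table whose last column holds the labels, as `ionosphere` does above; the helper names and the toy loader are hypothetical and serve only to make the example runnable without the dataset files.

```python
import numpy as np


def load_problem(loader, data_dir="datasets/"):
    """Split the table returned by a loader into features X and labels y."""
    data = np.array(loader(data_dir))
    return data[:, :-1], data[:, -1]


def describe_problems(loaders, data_dir="datasets/"):
    """Return (name, n_samples, n_features, n_classes) for each loader."""
    summary = []
    for loader in loaders:
        X, y = load_problem(loader, data_dir)
        summary.append((loader.__name__, X.shape[0], X.shape[1], len(np.unique(y))))
    return summary


def toy_problem(data_dir):
    """Hypothetical stand-in for an RCDS loader: two features, labels in the last column."""
    return [[0.1, 1.2, 0], [0.4, 0.9, 1], [0.7, 0.3, 1]]


print(describe_problems([toy_problem]))
```

With the datasets in place, `describe_problems(problems)` would give a one-line overview of each problem before any model is trained.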

The contender methods are Random Forest (RF), ADABoost (ADA), Gradient Boosting (GDB) and RCBoost (RCB). We set the following hyperparameters for the methods. Note that the base estimator for ADA is a decision tree.

In [4]:
maxdepth = 10 # Maximum tree depth - Used by ADA (base tree), GDB and RCB
crit = "gini" # Split criterion - Used by RF and ADA; the alternative is "entropy"
nestimators = 50 # Number of estimators - Used by ADA and GDB
maxrmpcalls = 50 # Maximum number of RMP calls - Used only by RCB

We are now ready to train all contender methods and obtain their predictions.

In [5]:
# Random Forest
RF = RandomForestClassifier(random_state=randomstate, criterion=crit)
RF_fit = RF.fit(X_train, y_train)
RF_pred = RF_fit.predict(X_test)

# ADABoost
# Note: scikit-learn 1.2 and later name this argument `estimator`
ADA = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=maxdepth, criterion=crit),
    random_state=randomstate, n_estimators=nestimators)
ADA_fit = ADA.fit(X_train, y_train)
ADA_pred = ADA_fit.predict(X_test)

# Gradient Boosting
GDB = GradientBoostingClassifier(max_depth=maxdepth, n_estimators=nestimators,
                                random_state=randomstate)
GDB_fit = GDB.fit(X_train, y_train)
GDB_pred = GDB_fit.predict(X_test)

# RCBoost
RCB = RCBoost(max_depth=maxdepth, maxNumOfRMPCalls=maxrmpcalls,
              random_state=randomstate)                                                       
RCB_fit = RCB.fit(X_train, y_train)
RCB_pred = RCB_fit.predict(X_test)

Using license file /opt/gurobi/gurobi.lic
Academic license - for non-commercial use only


Since the initial set of rules in RCBoost comes from a decision tree, we also report the results with this base tree to see the improvement obtained with RCBoost.

In [6]:
# Starting Decision Tree of RCBoost
initDT_fit = RCB_fit.initialEstimator.fit(X_train, y_train)
initDT_pred = initDT_fit.predict(X_test)

We next check the accuracies of the different methods on the test set.

In [7]:
print('## ACCURACIES ##')
print('Random Forest: ', accuracy_score(y_test, RF_pred))
print('ADABoost: ', accuracy_score(y_test, ADA_pred))
print('Gradient Boosting: ', accuracy_score(y_test, GDB_pred))
print('RCBoost: ', accuracy_score(y_test, RCB_pred))
print('Initial Decision Tree: ', accuracy_score(y_test, initDT_pred))

## ACCURACIES ##
Random Forest:  0.9339622641509434
ADABoost:  0.9339622641509434
Gradient Boosting:  0.8584905660377359
RCBoost:  0.9433962264150944
Initial Decision Tree:  0.8490566037735849
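
These accuracies are easier to compare when read as counts of correctly classified test instances. The sketch below converts them, assuming the dataset has 351 instances (the size of the UCI ionosphere data) and that a fractional `test_size` is rounded up, which is how we understand `train_test_split` to behave; the helper name `split_sizes` is ours.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test sizes for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Assumption: 351 instances, the size of the UCI ionosphere dataset.
n_train, n_test = split_sizes(351, 0.3)

# Accuracies copied from the output above.
accuracies = {
    "Random Forest": 0.9339622641509434,
    "ADABoost": 0.9339622641509434,
    "Gradient Boosting": 0.8584905660377359,
    "RCBoost": 0.9433962264150944,
    "Initial Decision Tree": 0.8490566037735849,
}

print(f"Train size: {n_train}, test size: {n_test}")
for name, acc in accuracies.items():
    print(f"{name}: {round(acc * n_test)} of {n_test} correct")
```

Under these assumptions, the gap between two methods can be expressed as a number of test instances rather than a difference in the third decimal.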


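With this particular random seed, RCBoost obtains the highest accuracy among the methods, although its margin over Random Forest and ADABoost is small (0.943 against 0.934). We can also check how many restricted master problem (RMP) calls RCBoost makes to obtain this result.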

In [8]:
print('\nNumber of RMP calls by RCBoost: ', RCB_fit.nofRMPcalls)


Number of RMP calls by RCBoost:  11
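
Because the comparison above uses a single random seed, it is worth repeating the experiment over several seeds before drawing conclusions. The sketch below does this for two scikit-learn baselines on the breast cancer data shipped with scikit-learn, chosen here only because it needs no local files. The helper names `accuracies_over_seeds` and `make_models` are ours. An `RCBoost` instance could presumably be added to the dictionary in the same way, since it exposes `fit` and `predict` in the cells above, but that is not exercised here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def accuracies_over_seeds(make_models, X, y, seeds, test_size=0.3):
    """Return {model name: [test accuracy for each seed]}."""
    scores = {}
    for seed in seeds:
        # The seed controls both the split and the models built for this round.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        for name, model in make_models(seed).items():
            y_pred = model.fit(X_train, y_train).predict(X_test)
            scores.setdefault(name, []).append(accuracy_score(y_test, y_pred))
    return scores


def make_models(seed):
    """Build fresh, unfitted models for one seed (scikit-learn baselines only)."""
    return {
        "Random Forest": RandomForestClassifier(
            n_estimators=50, criterion="gini", random_state=seed),
        "Decision Tree": DecisionTreeClassifier(
            max_depth=10, criterion="gini", random_state=seed),
    }


X, y = load_breast_cancer(return_X_y=True)
scores = accuracies_over_seeds(make_models, X, y, seeds=range(5))
for name, accs in scores.items():
    print(f"{name}: mean={np.mean(accs):.3f}, std={np.std(accs):.3f}")
```

Reporting the mean and standard deviation over seeds gives a more reliable picture than a single split, at the cost of training every model once per seed.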
