<h1>gcForest Algorithm</h1>

<p>The gcForest algorithm was proposed by Zhou and Feng (2017); refer to that paper for technical details. I provide here a Python implementation of the algorithm.<br>
I chose to adopt the scikit-learn syntax for ease of use, and I show below how it can be used.</p>

In [1]:
from GCForest import gcForest
from sklearn.datasets import load_iris, load_digits
from sklearn.metrics import accuracy_score

<h2>Iris example</h2>

<p>The iris data set is not an ideal example, as the gcForest algorithm is better suited to time series and images, where information can be found at different scales within one sample.<br>
Nonetheless it is still an easy way to test the method.</p>

In [2]:
# loading the data
iris = load_iris()
X = iris.data
y = iris.target

<p>First, instantiate and train the model.
One particularity here is the 'shape_1X' keyword, which specifies the shape of a single sample.
I added it because pictures fed to the algorithm might not be square.<br>
It is not very relevant for the iris data set, but it still has to be defined.</p>

In [3]:
gcf = gcForest(cascade_layer=5, window=[3])
gcf.fit(X, y, shape_1X=[4,1])

Slicing Sequence...
Training MGS Random Forests...
Adding/Training Layer, n_layer=1
Layer validation accuracy = 0.9666666666666667
Adding/Training Layer, n_layer=2
Layer validation accuracy = 0.9666666666666667
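
<p>The "Slicing Sequence..." message corresponds to the multi-grain scanning step of Zhou and Feng 2017, in which a sliding window scans the raw features of each sample. Below is a minimal sketch of what that slicing could look like for one iris sample with a window of 3, assuming a stride of 1. The helper <code>slice_sequence</code> is hypothetical and is not part of GCForest; the actual implementation may differ.</p>

```python
def slice_sequence(sample, window, stride=1):
    """Return every contiguous sub-sequence of length `window`.

    Hypothetical helper for illustration only, not the GCForest internals.
    """
    n_slices = (len(sample) - window) // stride + 1
    return [sample[i * stride:i * stride + window] for i in range(n_slices)]


# First sample of the iris data set: 4 features, as in shape_1X=[4,1].
sample = [5.1, 3.5, 1.4, 0.2]

slices = slice_sequence(sample, window=3)
print(slices)  # [[5.1, 3.5, 1.4], [3.5, 1.4, 0.2]]
```

<p>With only 4 features, a window of 3 yields just 2 slices per sample, which is one reason the iris data set gains little from this step.</p>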


<p>Now checking the predictions for the same input, i.e. the data the model was trained on:</p>

In [4]:
pred_X = gcf.predict(X)
print(pred_X)

Slicing Sequence...
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [5]:
# evaluating accuracy
accuracy = accuracy_score(y_true=y, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

gcForest accuracy : 0.9866666666666667
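
<p>The accuracy reported above is simply the fraction of samples whose predicted class matches the true class. As a sanity check, the sketch below recomputes it in plain Python from the printed predictions, which differ from the targets for two samples out of 150. The function <code>accuracy</code> is a hypothetical re-implementation for illustration, not the scikit-learn code, and the two positions are read off the printed array.</p>

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Iris targets are ordered: 50 samples each of classes 0, 1 and 2.
y_true = [0] * 50 + [1] * 50 + [2] * 50

# Rebuild the printed predictions: identical to y_true except for two samples
# (positions read off the printed array, so treat them as illustrative).
y_pred = list(y_true)
y_pred[83] = 2   # a class-1 sample predicted as class 2
y_pred[106] = 1  # a class-2 sample predicted as class 1

iris_accuracy = accuracy(y_true, y_pred)
print(iris_accuracy)  # 148 correct out of 150
```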


<h2>Digits example</h2>
<p>A much better example is the digits data set, which contains images of handwritten digits.
The scikit-learn data set can be viewed as a mini-MNIST for training purposes.</p>

In [6]:
# loading the data
digits = load_digits()
X = digits.data
y = digits.target
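
<p>Each row of <code>X</code> is a flattened 8x8 image, i.e. 64 values, which is why the shape of a single sample has to be given when fitting. The sketch below shows how one flat row could be reshaped and scanned with square windows, assuming row-major ordering and a stride of 1. The helper <code>slice_image</code> is hypothetical and only illustrates the idea; it is not the GCForest code.</p>

```python
def slice_image(flat_sample, shape, window, stride=1):
    """Reshape a flat sample to `shape` and return all flattened square patches.

    Hypothetical helper for illustration only, not the GCForest internals.
    """
    rows, cols = shape
    # Row-major reshape of the flat vector into a list of rows.
    image = [flat_sample[r * cols:(r + 1) * cols] for r in range(rows)]
    patches = []
    for r in range(0, rows - window + 1, stride):
        for c in range(0, cols - window + 1, stride):
            patch = [image[r + i][c + j]
                     for i in range(window) for j in range(window)]
            patches.append(patch)
    return patches


# Dummy 64-value sample standing in for one row of digits.data.
flat_sample = list(range(64))

for window in (3, 6):
    patches = slice_image(flat_sample, shape=(8, 8), window=window)
    # Prints: window size, number of patches, values per patch.
    print(window, len(patches), len(patches[0]))
```

<p>Under these assumptions a window of 3 gives 36 patches of 9 values, and a window of 6 gives 9 patches of 36 values, so the two window sizes look at the image at two different scales.</p>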

<p> ... training gcForest ... (can take some time...) </p>

In [7]:
gcf = gcForest(cascade_layer=5, window=[3,6])
gcf.fit(X, y, shape_1X=[8,8])

Slicing Images...
Training MGS Random Forests...
Slicing Images...
Training MGS Random Forests...
Adding/Training Layer, n_layer=1
Layer validation accuracy = 0.9277777777777778
Adding/Training Layer, n_layer=2
Layer validation accuracy = 0.9555555555555556
Adding/Training Layer, n_layer=3
Layer validation accuracy = 0.9527777777777777
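
<p>Although <code>cascade_layer=5</code> was requested, the log shows only 3 layers being trained here, and only 2 for the iris example. This is consistent with the stopping rule described in Zhou and Feng 2017, where the cascade stops growing once a new layer brings no gain on the validation set. The sketch below replays the logged accuracies under the assumption that training stops as soon as a layer fails to improve on the best accuracy so far. The function <code>layers_trained</code> and the <code>tol</code> parameter are hypothetical, and the exact criterion used inside GCForest may differ.</p>

```python
def layers_trained(validation_accuracies, max_layers, tol=0.0):
    """Count the layers added before validation accuracy stops improving.

    Hypothetical replay of an early-stopping rule: a layer is always trained
    and evaluated, and growth stops once it does not beat the best accuracy
    so far by more than `tol`.
    """
    best = None
    n_layers = 0
    for acc in validation_accuracies[:max_layers]:
        n_layers += 1
        if best is not None and acc <= best + tol:
            break
        best = acc
    return n_layers


# Validation accuracies copied from the training logs in this notebook.
iris_log = [0.9666666666666667, 0.9666666666666667]
digits_log = [0.9277777777777778, 0.9555555555555556, 0.9527777777777777]

print(layers_trained(iris_log, max_layers=5))    # 2
print(layers_trained(digits_log, max_layers=5))  # 3
```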


<p> ... and predicting classes ... </p>

In [8]:
pred_X = gcf.predict(X)
print(pred_X)

Slicing Images...
Slicing Images...
[0 1 8 ..., 8 9 8]


In [9]:
# evaluating accuracy
accuracy = accuracy_score(y_true=y, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

gcForest accuracy : 0.9894268224819143
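
<p>The accuracy above is computed on the same samples that were passed to <code>fit</code>, so it is likely to be optimistic. A common alternative is to hold out part of the data for testing. The sketch below shows that pattern with scikit-learn's <code>train_test_split</code>. To keep the snippet runnable without GCForest installed, a <code>RandomForestClassifier</code> is used as a stand-in estimator; this is an assumption for illustration, and the gcForest model could be fitted on <code>X_train</code> and evaluated on <code>X_test</code> in the same way.</p>

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = load_digits()

# Keep 20% of the samples aside; stratify so every digit is represented.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, stratify=digits.target, random_state=0,
)

# Stand-in estimator (assumption): replace with gcForest when it is available.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on samples the model has never seen during training.
holdout_accuracy = accuracy_score(y_true=y_test, y_pred=model.predict(X_test))
print('hold-out accuracy : {}'.format(holdout_accuracy))
```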
