<h1>gcForest Algorithm</h1>

<p>The gcForest algorithm was proposed by Zhou and Feng (2017) ( https://arxiv.org/abs/1702.08835 ; refer to this paper for technical details). I provide here a Python 3 implementation of the algorithm.<br>
I chose to adopt the scikit-learn syntax for ease of use, and I present below how it can be used.</p>

In [1]:
from GCForest import gcForest
from sklearn.datasets import load_iris, load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

<h2>Iris example</h2>

<p>The iris data set is not a very good example, as the gcForest algorithm is better suited to time series and images, where information can be found at different scales within one sample.<br>
Nonetheless, it is still an easy way to test the method.</p>

In [6]:
# loading the data
iris = load_iris()
X = iris.data
y = iris.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33)

<p>First, instantiate and train the algorithm.
A particularity here is the 'shape_1X' keyword, which specifies the shape of a single sample.
I added it because pictures fed to the machinery might not be square.<br>
It is not very relevant for the iris data set, but it still has to be defined.</p>
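
<p>To make the idea of scanning a single sample more concrete, here is a minimal sketch of how a window of size 2 could slide over one sequence of 4 features. This is only an illustration of the sliding-window idea: the helper name <code>slide_windows_1d</code>, the stride of 1 and the sample values are my assumptions, not the code used inside the GCForest package.</p>

```python
# Illustrative sketch only: NOT the GCForest package's internal code.
# `slide_windows_1d` is a hypothetical helper; a stride of 1 is assumed.
def slide_windows_1d(sample, window, stride=1):
    """Return every contiguous sub-sequence of length `window` in `sample`."""
    last_start = len(sample) - window
    return [sample[i:i + window] for i in range(0, last_start + 1, stride)]


# One iris-like sample with 4 features (values made up for illustration).
sample = [5.1, 3.5, 1.4, 0.2]
windows = slide_windows_1d(sample, window=2)
print(windows)  # [[5.1, 3.5], [3.5, 1.4], [1.4, 0.2]]
```

<p>With 4 features and a window of 2, the sample yields 3 overlapping slices, which is why the shape of one sample has to be known before slicing.</p>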

In [7]:
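# a single sliding window of size 2
gcf = gcForest(window=[2], tolerance=0.0)
# shape_1X is the shape of one sample (here a sequence of 4 features)
gcf.fit(X_tr, y_tr, shape_1X=[4,1])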

Slicing Sequence...
Training MGS Random Forests...
Adding/Training Layer, n_layer=1
Layer validation accuracy = 1.0
Adding/Training Layer, n_layer=2
Layer validation accuracy = 1.0


<p>Now checking the predictions on the held-out test set:</p>

In [8]:
pred_X = gcf.predict(X_te)
print(pred_X)

Slicing Sequence...
[2 1 2 1 2 0 2 1 2 2 1 1 2 2 0 0 0 0 0 1 2 2 0 2 0 0 1 2 2 1 0 0 2 0 1 1 2
 2 2 0 0 0 1 2 0 0 0 1 2 1]


In [9]:
# evaluating accuracy
accuracy = accuracy_score(y_true=y_te, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

gcForest accuracy : 0.92


<h2>Digits example</h2>
<p>A much better example is the digits data set, which contains images of handwritten digits.
The scikit-learn data set can be viewed as a mini-MNIST for training purposes.</p>
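
<p>Each digits sample is stored as a flat vector of 64 pixel values, so the 8x8 image shape has to be supplied separately. The sketch below shows, under my own assumptions (row-major pixel order, stride of 1, a hypothetical helper <code>slide_windows_2d</code>), how square windows of size 4 and 6 could be cut out of one such image. It is meant to illustrate the scanning idea, not to reproduce the package's implementation.</p>

```python
# Illustrative sketch only: NOT the GCForest package's internal code.
# `slide_windows_2d` is a hypothetical helper; stride 1 and row-major
# pixel order are assumptions.
def slide_windows_2d(flat, shape, window, stride=1):
    """Cut every `window` x `window` patch out of a flattened 2D image."""
    rows, cols = shape
    # Rebuild the 2D image from the flat feature vector.
    image = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    patches = []
    for top in range(0, rows - window + 1, stride):
        for left in range(0, cols - window + 1, stride):
            # Flatten each patch so it can serve as one feature vector.
            patch = [image[top + dr][left + dc]
                     for dr in range(window) for dc in range(window)]
            patches.append(patch)
    return patches


# A dummy 8x8 "image" stored flat, like one row of the digits data.
flat = list(range(64))
for window in (4, 6):
    patches = slide_windows_2d(flat, shape=(8, 8), window=window)
    print(window, len(patches), len(patches[0]))
# prints:
# 4 25 16
# 6 9 36
```

<p>Under these assumptions, a single 8x8 image gives 25 patches of 16 pixels for the 4x4 window and 9 patches of 36 pixels for the 6x6 window, so each window size looks at the image at a different scale.</p>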

In [10]:
# loading the data
digits = load_digits()
X = digits.data
y = digits.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33)

<p> ... training gcForest ... (can take some time...) </p>

In [11]:
# two sliding windows, of size 4 and 6
gcf = gcForest(window=[4,6], tolerance=0.0, min_samples=7)
# each sample is an 8x8 image
gcf.fit(X_tr, y_tr, shape_1X=[8,8])

Slicing Images...
Training MGS Random Forests...
Slicing Images...
Training MGS Random Forests...
Adding/Training Layer, n_layer=1
Layer validation accuracy = 1.0
Adding/Training Layer, n_layer=2
Layer validation accuracy = 1.0


<p> ... and predicting classes ... </p>

In [12]:
pred_X = gcf.predict(X_te)
print(pred_X)

Slicing Images...
Slicing Images...
[0 4 9 5 6 6 9 6 7 1 7 5 4 5 2 8 3 1 4 2 4 9 1 7 0 5 2 2 6 6 5 8 4 2 9 7 8
 4 2 3 1 9 8 6 8 6 8 5 5 1 1 6 6 1 6 7 3 2 1 8 5 9 5 2 1 3 7 4 8 5 7 6 7 1
 9 1 0 9 9 0 3 1 1 9 4 5 1 9 6 5 3 6 3 3 2 8 3 1 9 2 7 0 4 3 1 7 5 7 4 0 5
 8 4 9 1 8 8 7 3 4 5 0 5 4 2 5 8 0 8 9 7 6 2 9 7 1 0 4 9 5 5 7 9 2 5 2 8 4
 1 9 9 1 4 9 1 0 6 0 2 7 6 5 3 1 4 7 5 3 4 8 2 2 1 8 1 0 1 1 8 2 3 3 0 1 4
 6 3 2 5 9 4 3 8 1 7 5 3 3 8 5 2 1 4 3 0 5 3 2 9 5 1 7 7 4 3 4 0 2 3 4 1 1
 2 7 9 2 0 7 7 7 6 8 8 5 2 7 2 6 1 0 3 6 2 3 0 0 7 5 3 5 2 4 1 0 4 9 1 6 3
 6 6 1 9 6 6 6 6 3 7 3 5 7 5 7 6 0 4 0 2 4 1 6 2 0 8 6 6 4 9 2 6 7 1 8 9 3
 6 8 5 4 7 9 0 6 5 0 1 7 9 5 4 2 0 8 7 1 1 6 3 3 1 8 0 7 5 0 9 4 0 3 9 8 8
 8 8 8 4 5 8 4 1 3 8 9 9 0 0 8 3 1 4 3 4 9 5 2 6 6 2 8 0 6 3 3 9 7 5 4 6 4
 6 7 1 4 9 5 9 6 1 8 4 8 6 6 8 2 3 8 2 2 9 1 9 6 4 2 3 7 5 6 8 1 8 0 7 1 8
 4 2 9 5 1 1 0 1 9 0 7 9 5 2 8 4 6 2 3 0 1 6 3 0 9 7 3 1 0 0 1 5 4 5 5 0 9
 8 9 9 2 1 3 6 7 8 7 2 9 4 3 0 8 5 9 9 5 4 0 5 3 9 2 1 8 4 7 3 3

In [15]:
# evaluating accuracy
accuracy = accuracy_score(y_true=y_te, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

gcForest accuracy : 0.98989898989899
