<h1> Multilayer Neural Network from scratch using C++</h1>

Here I implemented a multilayer neural network in c++ using the wrapper pybind11 to create a python library (the library is imported using <tt>import nnet</tt>).

After importing the library I test the network on the iris dataset against an equivalent architecture built with the keras interface.

In [1]:
# Import libraries
import numpy as np
import pandas as pd
import random
# Add library path to system libraries
import sys
sys.path.append('../CPP/MNNET/')
import nnet
import matplotlib.pyplot as plt
%matplotlib inline 

In [2]:
# Read dataset in dataframe
df = pd.read_csv('../DATASETS/iris_dataset.csv',header=None).dropna()
print(df.head())

     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa


In [3]:
# Design matrix and target vector
X = df.loc[:,:3].values
y = df.loc[:,4].values
# Scale design matrix (X) entries from 0 to 1
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(copy=True, feature_range=(0,1))
X = scaler.fit_transform(X)
# Encode target vector (y) with dummy variables (one-hot encoding)
from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
y = encoder.fit_transform(y)

In [4]:
# Function to shuffle data
def _shuffle(X,y):
    idx = np.arange(len(y))
    np.random.shuffle(idx)    
    return(X[idx,:], y[idx,:])
# Function to split data set into training and cross-validation (or test) data sets
# p_train is the percentage of the data set used as training (default is 80%)
def _split(X,y,p_train=0.8):
    idx = int(len(y) * p_train)
    Xtrain = X[:idx,:]
    ytrain = y[:idx,:]
    Xcv = X[idx:,:]
    ycv = y[idx:,:]
    return Xtrain, ytrain, Xcv, ycv

# Shuffle data randomly
np.random.seed(12365)
X,y = _shuffle(X,y)
# Split in training and cross validation sets
Xtrain, ytrain, Xcv, ycv = _split(X,y)

# Get accuracy on a given dataset
def _test(nnet, X, y, verbose=False):
    ypred = []
    for x in X:
        temp = np.argmax(nnet.predict(x,[]))
        ypred.append(temp)
    y = [np.argmax(i) for i in y]
    success = np.nansum([1 for i in range(len(y)) if y[i] == ypred[i]])
    if verbose:
        print("Accuracy on dataset: %5.2f %%" % (100*np.divide(success,len(ypred))))
    return 100*np.divide(success,len(ypred)), ypred


<h3> Custom implementation in C++</h3>

For this particular dataset we will use three hidden layers with sigmoid activation function.

In [5]:
from sklearn.metrics import confusion_matrix
# Set parameters for the network
learning_rate = 0.1
momentum = 0.9

# Initialize network
network = nnet.network(learning_rate, momentum)
# Build network architecture
network.add_layer(nnet.layer(Xtrain.shape[1],"sigmoid")) #input layer
network.add_layer(nnet.layer(100,"sigmoid"))
network.add_layer(nnet.layer(100,"sigmoid"))
network.add_layer(nnet.layer(100,"sigmoid"))
network.add_layer(nnet.layer(ytrain.shape[1],"sigmoid")) #output layer
# Train network
n_epochs = 100
network.train(X,y,n_epochs)
acc_train, ypred = _test(network, Xtrain, ytrain, verbose=True)
cm = confusion_matrix([np.argmax(l) for l in ytrain],ypred)
print(cm)

# Test on cross validation set
acc_cv, ypred = _test(network, Xcv, ycv, verbose=True)
cm = confusion_matrix([np.argmax(l) for l in ycv],ypred)
print(cm)



Accuracy on dataset: 97.50 %
[[46  0  0]
 [ 0 35  2]
 [ 0  1 36]]
Accuracy on dataset: 96.67 %
[[ 4  0  0]
 [ 0 12  1]
 [ 0  0 13]]


<h3> Keras implementation</h3>

Here we implement the same architecture with keras (three hidden layers with 100 units each and sigmoid activation function). We also use the same momentum and learning rate and stochastic gradient descent optimizer with a mean squared error loss function).

In [6]:
# Get accuracy on a given dataset
def _test_keras(nnet, X, y, verbose=False):
    ypred = [np.argmax(i) for i in nnet.predict(X, batch_size=1)]
    y = [np.argmax(i) for i in y]
    success = np.nansum([1 for i in range(len(y)) if y[i] == ypred[i]])
    if verbose:
        print("Accuracy on dataset: %5.2f %%" % (100*np.divide(success,len(ypred))))
    return 100*np.divide(success,len(ypred)),ypred

# Keras implementation
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

nnet = Sequential()
# Hidden layers
nnet.add(Dense(100, input_dim=Xtrain.shape[1], activation='sigmoid'))
nnet.add(Dense(100, activation='sigmoid'))
nnet.add(Dense(100, activation='sigmoid'))
# Output layer
nnet.add(Dense(3, activation='sigmoid'))
# Set optimizer parameters
sgd = SGD(decay=0.0, lr=0.1, nesterov=False, momentum=0.9)
# Compile and set the loss function
nnet.compile(loss="mean_squared_error", optimizer=sgd, metrics=['accuracy'])

# Train online for 10 epochs
nnet.fit(Xtrain,ytrain,epochs=100,shuffle=True, verbose=0, batch_size=1)
acc_train, ypred = _test_keras(nnet, Xtrain, ytrain, verbose=True)
cm = confusion_matrix([np.argmax(l) for l in ytrain],ypred)
print(cm)
# Test on cross validation set
acc_cv, ypred = _test_keras(nnet, Xcv, ycv, verbose=True)
cm = confusion_matrix([np.argmax(l) for l in ycv],ypred)
print(cm)

Using TensorFlow backend.


Accuracy on dataset: 95.83 %
[[46  0  0]
 [ 0 32  5]
 [ 0  0 37]]
Accuracy on dataset: 93.33 %
[[ 4  0  0]
 [ 0 11  2]
 [ 0  0 13]]


Results using the two implementations are very similar. Discrepancies are due to different initializations and random shuffling of the data. For a more robust comparison one could use K-fold cross-validation or bootstrapping.