# Feed Forward Neural Network
This Notebook contains the definition of a class, which implements a feed forward neural network. This network can be used to solve a classification problem (e.g. the classification of text documents). When instantiating an object of the class, the features must be specified. Afterwards, the labels must be set. Internally, the class will provide a one-hot encoding of the labels. Further, the class contains a method which implements the splitting of the data in a training set and a testing set. Also, the size of batches used to perform (stochastic) gradient descent, can be specified, as well as the number of hidden nodes used by the network. The output of the network is a probability distribution over the set of labels. The output is generated by applying softmax on the output of the hidden layer. The non-linearity used by the hidden layer is given by the ReLu function i.e.
$$f(x)=\max\{0,x\}$$
The cost function is defined in terms of the cross-entropy of the output and the true training sample:
$$ Cost(y,\hat{y})= -\sum_iy_i\ln(\hat{y_1})$$

## loading of the relevant packages

In [12]:
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

## Definition of the Class

In [13]:

class NeuralNetwork(object):
    def __init__(self,features):
        '''Constructor of the class
            Args:
                features(matrix) - feature matrix where the columns describe the features and the rows the samples
        '''
        self.features=features
        
    def set_labels(self,labels):
        '''Method which sets the labels, i.e. the targets we want to predict based on the features
            Args:
                labels(vector) - vector where the i-th element is the label which corresponds to the i-th row of the features
                matrix
        '''
        self.labels=labels
        # the code below implements one-hot encoding of the labels
        lb=LabelBinarizer()
        lb.fit([label[0] for label in labels])
        self.labels_one_hot=lb.transform(labels)

    def get_train_test_split(self,test_size=0.2):
        '''Method which splits the features and labels in a training set and a testing set
            Args:
                test_size(number) - defines the size of the test set
        '''
        X_train,X_test,y_train,y_test=\
        train_test_split(self.features,self.labels_one_hot,test_size=test_size)
        return (X_train,X_test,y_train,y_test)
        
    def get_batches(self,batch_size):
        '''Helper method to devide the features and labels in mini-batches
            Args:
                batch_size(number) - size of the batches e.g. 256 (should fit in memory of the machine)
        '''
        assert len(self.features) == len(self.labels)
        output_batches = []
    
        sample_size = len(self.features)
        for start_i in range(0, sample_size, batch_size):
            end_i = start_i + batch_size
            batch = [self.features[start_i:end_i], self.labels_one_hot[start_i:end_i]]
            output_batches.append(batch)
        
        return output_batches
    
    def build_neural_net(self,n_hidden_nodes):
        '''Method which builds the Neuronal Network in TensorFlow
            Args:
                n_hidden_nodes(number) - number of hidden notes of the network
            
        '''
        self.n_hidden_nodes=n_hidden_nodes
        self.n_features=len(self.features[0])
        self.n_labels=len(self.labels_one_hot[0])
        
        self.x=tf.placeholder(dtype=tf.float32,shape=[None,self.n_features])
        self.y=tf.placeholder(dtype=tf.float32,shape=[None,self.n_labels])
        
        self.w1=tf.Variable(tf.truncated_normal([self.n_features,self.n_hidden_nodes]))
        self.w2=tf.Variable(tf.truncated_normal([self.n_hidden_nodes,self.n_labels]))
        self.b1=tf.Variable(tf.zeros([self.n_hidden_nodes]))
        self.b2=tf.Variable(tf.zeros([self.n_labels]))
        
        h1=tf.matmul(self.x,self.w1)+self.b1
        a1=tf.nn.relu(h1)
        
        h2=tf.matmul(a1,self.w2)+self.b2
        a2=tf.nn.relu(h2)
        
        self.output=tf.nn.softmax(a2)
        self.prediction = tf.argmax(self.output,1)
        self.correct_prediction = tf.equal(self.prediction, tf.argmax(self.y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))
        
        self.cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.output,labels=self.y))
        
    def train_model(self,learning_rate,epochs,batch_size,test_size=0.2):
        '''Method which trains the model
            Args:
                learning_rate(number) - learning rate used in gradient descent
                epochs(number) - number of iterations (epochs) used in gradient descent
                batch_size(number) - size of the mini batches used in training
                test_size(number) - size of the test set
        '''
        self.lr=learning_rate
        self.optimizer=tf.train.GradientDescentOptimizer(learning_rate=self.lr).minimize(self.cost)
        X_train,X_test,y_train,y_test=self.get_train_test_split(test_size)
        batches=self.get_batches(batch_size)

        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            sess.run(init)
            
            for epoch in range(epochs):
                for X_batch,y_batch in batches:
                    sess.run(self.optimizer,feed_dict={self.x:X_batch,self.y:y_batch})
                cost_train=sess.run(self.cost,feed_dict={self.x:X_train,self.y:y_train})
                accuracy_test=sess.run(self.accuracy,feed_dict={self.x:X_test,self.y:y_test})
                accuracy_train=sess.run(self.accuracy,feed_dict={self.x:X_train,self.y:y_train})
                print("In epoch {} is the cost equals {}".format(epoch,cost_train))
                print("In epoch {} is the accuracy on the training set equals {}".format(epoch,accuracy_train))
                print("In epoch {} is the accuracy on the test set equals {}".format(epoch,accuracy_test)) 

## Illustration
The cells below show by using toy data how the class behaves

In [14]:
features=[[1,2,3],
          [6,6,6],
          [8,8,8],
          [4.6,6,7.9],
          [4,4,4],
          [3,4,2]]
labels=[[3],
        [7],
        [9],
        [3],
        [1],
        [2]]

In [15]:
network=NeuralNetwork(features)

In [19]:
network.set_labels(labels)

In [22]:
network.build_neural_net(3)

In [25]:
network.train_model(learning_rate=0.9,epochs=7,batch_size=2)

In epoch 0 is the cost equals 1.605173110961914
In epoch 0 is the accuracy on the training set equals 0.5
In epoch 0 is the accuracy on the test set equals 0.0
In epoch 1 is the cost equals 1.6000416278839111
In epoch 1 is the accuracy on the training set equals 0.5
In epoch 1 is the accuracy on the test set equals 0.0
In epoch 2 is the cost equals 1.5952329635620117
In epoch 2 is the accuracy on the training set equals 0.5
In epoch 2 is the accuracy on the test set equals 0.0
In epoch 3 is the cost equals 1.5902115106582642
In epoch 3 is the accuracy on the training set equals 0.5
In epoch 3 is the accuracy on the test set equals 0.0
In epoch 4 is the cost equals 1.5849881172180176
In epoch 4 is the accuracy on the training set equals 0.5
In epoch 4 is the accuracy on the test set equals 0.0
In epoch 5 is the cost equals 1.5795880556106567
In epoch 5 is the accuracy on the training set equals 0.5
In epoch 5 is the accuracy on the test set equals 0.0
In epoch 6 is the cost equals 1.574

## Next Steps
The model can easily extended to include more hidden layers (i.e. we can make the model deeper). Further, the cost function can be adjusted to reflect class imbalance and dropout can be included to avoid overfitting.