# DeepLearning 01b. Feedforward Neural Network (FNN) Extensions

* **Implementation 2a**: FNN from scratch
    * *Source*: Michael Nielson's book on neural nets 
    (http://neuralnetworksanddeeplearning.com/chap2.html)
    (http://neuralnetworksanddeeplearning.com/chap3.html).
    * *Data*: MNIST
    * *Contribution*: 
        * Hopefully more readable code annotation.
        * Extension to batch-vectorized implementation.
        * Extension with
            * Regularization
            * Weight Initialization
            * Early Stopping
            * Learning Rate Decay
            * Tanh, Relu Activation

* **Implementation 2b**: FNN with Tensorflow
    * *Source*: Tensorflow tutorial, which is a more complex implementation: (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist.py) 
    * *Contribution*: Simple demo how the same thing can be done exponentially easier with a library.
    

## I. Implementation 2a

* Cost: Crossent
* Library: Numpy only
* Added: 
    * Regularization
    * Weight Initialization
    * Early Stopping
    * Learning Rate Decay
    * Tanh, Relu Activation

In [1]:
import numpy as np
import random, os, sys
from __future__ import division

In [2]:
def sigmoid(z):
    """
    Elementwise sigmoid conversion.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns sigmoid converted vector of the same shape.
    """
    return 1.0 / (1.0+np.exp(-z)) 

def sigmoid_prime(z): 
    """
    Computes the derivative of sigmoid function.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns the derivative vector for sigmoid(z).
    """
    return sigmoid(z) * (1.0-sigmoid(z))

def tanh(z): 
    """
    Elementwise tanh conversion.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns tanh converted vector of the same shape.
    """
    return (np.exp(z)-np.exp(-z)) / (np.exp(z)+np.exp(-z))

def tanh_prime(z):
    """
    Computes the derivative of tanh function.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns the derivative vector for tanh(z) (i.e. cosh(z)).
    """
    return 1.0 - np.power(tanh(z),2)

def relu(z): 
    """
    Elementwise relu conversion.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns relu converted vector of the same shape.
    """    
    return np.maximum(0.0, z)

def relu_prime(z):
    """
    Computes the derivative of relu function.
    
    Arguments:
    z: Vector computed from (weight * x + bias).
    
    Returns the derivative vector for relu(z) (i.e. constant gradient/slope = 1).
    """
    return 1.0 * (z > 0)

def sgn(x):
    """
    Return the sign of the input.
    
    Arguments:
    x: Arbitrary number.
    
    Returns the sign of x, with the exception of 0 when the input is equal to 0.
    """
    if x==0: return 0
    return -1 if x<0 else 1

def activation_function(activation_type):
    """
    Selection type of activation neuron.
    
    Arguments:
    activation_type: String specifying activation type.
    
    Returns: a tuple where the first item is the activation function, 
             the second the derivative of the activation function.
    """
    functions = {'sigmoid': (sigmoid, sigmoid_prime),
                 'tanh': (tanh, tanh_prime),
                 'relu': (relu, relu_prime)}
    assert activation_type in functions
    return functions[activation_type]

In [25]:
class NNNumpy:
    """
    Batch-vectorized implementation of Michael Nielson's FNN network.
    """
    
    def __init__(self, sizes, activation_type='sigmoid'): 
        """
        Initialize parameters for FNN.
        
        Arguments:
        sizes:           List of sizes of layers of FNN.
        activation_type: String specifying activation type.
        """
        self.numLayers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y,1) for y in sizes[1:]] 
        self.weights = [np.random.randn(y,x) / np.sqrt(x)       
                        for x,y in zip(sizes[:-1],sizes[1:])]
        
        self.activation_func, self.activation_prime = activation_function(activation_type)
     
    def forward_propagation(self, a):
        """
        Feedforward step, feed input through FNN to obtain output.
        
        Arguments:
        a: Input vector.
        
        Returns output vector.
        """
        for b, w in zip(self.biases, self.weights):
            a = self.activation_func(np.dot(w, a) + b)
        return a
    
    def SGD(self, train_data, epochs, batchSize, lr,                # basic configs
            lmd=0.1, regularize=False,                              # regularization
            validation=None, valid_tol=0.0, early_stop_steps=10,    # early stopping
            lr_decay=0.5, lr_stop=7,                                # learning decay
            momentum=0.0):                                          # momentum coefficient
        """
        Stochastic Gradient Descent with configurable batch size.
        
        Arguments:
        train_data:       List of training data points.
        epochs:           Number of epochs to be run.
        batchSize:        Size of minibatch.
        lr:               Learning rate.
        lmd:              Regularization parameter.
        regularize:       Type of regularization.
        validation:       List of validation data points.
        valid_tol:        Float specifying tolerance of improvement in accuracy on validation data.
        early_stop_steps: Upper bound for the number of non-improvement steps.
        lr_decay:         Float specifying the rate of decrease in the learning rate after 
                          a non-improvement step.
        lr_stop:          Upper bound for the number of steps of learning rate decay before stop.
        momentum:         Momentum coefficient
        """
        self.n = len(train_data) # for regularization.
        self.lr = lr
        self.mu = momentum
        self.v = 0.0
        
        if validation: nValid = len(validation)
        cur_valid_acc = 0.0
        no_improve_steps = 0
        lr_decay_steps = 0
        stop_at = 0
        
        n = len(train_data)
        for j in xrange(epochs):
            random.shuffle(train_data)
            batches = [ train_data[k:k+batchSize] for k in xrange(0, n, batchSize) ]
            for batch in batches:
                self.update_batch(batch, lmd, regularize)
            if validation:
                valid_acc = self.evaluate(validation)
                print "Epoch {0} validation accuracy: {1} / {2}".format(j, valid_acc, nValid)
                if valid_acc - cur_valid_acc < valid_tol:
                    cur_valid_acc = valid_acc
                    no_improve_steps += 1
                    self.lr = self.lr * lr_decay
                    lr_decay_steps += 1
                else: no_improve_steps = 0
                if no_improve_steps==early_stop_steps or lr_decay_steps==lr_stop: 
                    stop_at = j
                    break
            else: 
                print "Epoch {0} complete".format(j)
        print "Early Stopping at epoch", stop_at
    
    def update_batch(self, batch, lmd, regularize):
        """
        Update weights and biases for the layers of FNN with a minibatch.
        
        Arguments:
        batch:      List of training data points in minibatch.
        lmd:        Regularization parameter.
        regularize: Type of regularization.
        """
        assert not regularize or regularize in {'l1', 'l2'}
        
        bGrads = [ np.zeros(b.shape) for b in self.biases ]
        wGrads = [ np.zeros(w.shape) for w in self.weights ]

        x_batch = np.hstack([x for x,y in batch])
        y_batch = np.hstack([y for x,y in batch])

        bGradDeltas, wGradDeltas = self.back_propagation(x_batch, y_batch)

        bGradDeltas = [bGradDelta.sum(axis=1).reshape(b.shape) 
                       for bGradDelta,b in zip(bGradDeltas,self.biases)] 
        
        bGrads = [bGrad+bGradDelta for bGrad,bGradDelta in zip(bGrads,bGradDeltas)]
        wGrads = [wGrad+wGradDelta for wGrad,wGradDelta in zip(wGrads,wGradDeltas)]

        if regularize:
            self.weights = [ self.regularize(w,lmd,regularize)+self.m_gradient(wGrad,len(batch)) 
                             for w,wGrad in zip(self.weights,wGrads) ]            
        else:
            self.weights = [ w+self.m_gradient(wGrad,len(batch)) for w,wGrad in zip(self.weights,wGrads) ]
        self.biases = [ b-(self.lr/len(batch))*bGrad for b,bGrad in zip(self.biases,bGrads) ]
    
    def back_propagation(self, x_batch, y_batch): 
        """
        Backpropagation step, propagates errors from output end.
        
        Arguments:
        x_batch: List of training data inputs.
        y_batch: List of training data true classes (binarized).
        
        Returns gradients of weights and biases.
        """
        bGrads = [ np.zeros(b.shape) for b in self.biases ]
        wGrads = [ np.zeros(w.shape) for w in self.weights ]

        activation = x_batch                                                     
        activations = [x_batch]                          
        zs = []                                             
        for b,w in zip(self.biases, self.weights):                               
            z = np.dot(w, activation) + b                                        
            zs.append(z)                                                         
            activation = self.activation_func(z)                                              
            activations.append(activation)                                       

        delta = self.cost_derivative(activations[-1], y_batch) 
                                                                                 
        bGrads[-1] = delta                                                                                                                           
        wGrads[-1] = np.dot(delta, activations[-2].transpose())                   

        for l in xrange(2, self.numLayers):                                      
            z = zs[-l]                                                           
            sp = self.activation_prime(z)    
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp                                                                            
            bGrads[-l] = delta                                             
            wGrads[-l] = np.dot(delta, activations[-l-1].transpose())            
            
        return (bGrads, wGrads)                                                  
    
    def evaluate(self, validation):
        """
        Evaluate function for test data as validation.
        
        Arguments:
        validation: List of validation data points.
        
        Returns accuracy on the validation.
        """
        validResults = [ (np.argmax(self.forward_propagation(x)), y) for (x, y) in validation ]
        return sum(int(x==y) for (x,y) in validResults)
    
    def cost_derivative(self, outputActivations, y):
        """
        Computes error on the output end.
        
        Arguments:
        outputActivations: Output of FNN.
        y:                 True classes (binarized).
        
        Returns error matrix.
        """
        return (outputActivations-y)

    def regularize(self, w, lmd, mode='l2'):
        """
        Regularization.
        
        Arguments:
        w:    Weight.
        lmd:  Regularization parameter.
        mode: Regularization type.
        
        Returns regularized weight.
        """
        assert mode in {'l1', 'l2'}
        return (w - ((self.lr*lmd)/self.n)*sgn(w)) if mode=='l1' else w*(1 - (self.lr*lmd)/self.n)

    def m_gradient(self, grad, batchSize):
        """
        Momentum update.
        
        Arguments:
        grad: Single gradient.
        batchSize: Current batch size.
        
        Returns momentum-included gradient update.
        """
        grad_sum = ((self.lr/batchSize)*grad).sum()
        grad_update = self.mu*self.v - (self.lr/batchSize)*grad
        self.v = self.mu*self.v - grad_sum
        
        return grad_update


In [4]:
# LOAD DATA

path = "/Users/jacobsw/Desktop/CODER/IMPLEMENTATION_CAMP/BASIC_TOPICS/NN/DATA/neural-networks-and-deep-learning-master/src"
os.chdir(path)
sys.path.append(path)
import mnist_loader
train_data, dev_data, test_data = mnist_loader.load_data_wrapper()

In [26]:
%%time
fnn = NNNumpy([784,30,10],activation_type='sigmoid')
fnn.SGD(train_data,30,10,0.5,lmd=5.0,regularize='l2',validation=dev_data) 

Epoch 0 validation accuracy: 9436 / 10000
Epoch 1 validation accuracy: 9457 / 10000
Epoch 2 validation accuracy: 9535 / 10000
Epoch 3 validation accuracy: 9580 / 10000
Epoch 4 validation accuracy: 9587 / 10000
Epoch 5 validation accuracy: 9566 / 10000
Epoch 6 validation accuracy: 9556 / 10000
Epoch 7 validation accuracy: 9588 / 10000
Epoch 8 validation accuracy: 9604 / 10000
Epoch 9 validation accuracy: 9610 / 10000
Epoch 10 validation accuracy: 9573 / 10000
Epoch 11 validation accuracy: 9569 / 10000
Epoch 12 validation accuracy: 9605 / 10000
Epoch 13 validation accuracy: 9585 / 10000
Epoch 14 validation accuracy: 9580 / 10000
Epoch 15 validation accuracy: 9622 / 10000
Epoch 16 validation accuracy: 9596 / 10000
Epoch 17 validation accuracy: 9612 / 10000
Epoch 18 validation accuracy: 9582 / 10000
Epoch 19 validation accuracy: 9562 / 10000
Epoch 20 validation accuracy: 9597 / 10000
Epoch 21 validation accuracy: 9603 / 10000
Epoch 22 validation accuracy: 9615 / 10000
Epoch 23 validation a

In [10]:
%%time
fnn = NNNumpy([784,30,10],activation_type='tanh')
fnn.SGD(train_data,30,10,0.05,lmd=5.0,regularize='l2',validation=dev_data) 

Epoch 0 validation accuracy: 9180 / 10000
Epoch 1 validation accuracy: 9265 / 10000
Epoch 2 validation accuracy: 9393 / 10000
Epoch 3 validation accuracy: 9412 / 10000
Epoch 4 validation accuracy: 9437 / 10000
Epoch 5 validation accuracy: 9433 / 10000
Epoch 6 validation accuracy: 9474 / 10000
Epoch 7 validation accuracy: 9465 / 10000
Epoch 8 validation accuracy: 9480 / 10000
Epoch 9 validation accuracy: 9489 / 10000
Epoch 10 validation accuracy: 9495 / 10000
Epoch 11 validation accuracy: 9490 / 10000
Epoch 12 validation accuracy: 9481 / 10000
Epoch 13 validation accuracy: 9485 / 10000
Epoch 14 validation accuracy: 9468 / 10000
Epoch 15 validation accuracy: 9510 / 10000
Epoch 16 validation accuracy: 9494 / 10000
Epoch 17 validation accuracy: 9486 / 10000
Epoch 18 validation accuracy: 9517 / 10000
Epoch 19 validation accuracy: 9517 / 10000
Epoch 20 validation accuracy: 9518 / 10000
Epoch 21 validation accuracy: 9497 / 10000
Epoch 22 validation accuracy: 9502 / 10000
Epoch 23 validation a

In [11]:
%%time
fnn = NNNumpy([784,30,10],activation_type='relu')
fnn.SGD(train_data,30,10,0.1,lmd=5.0,regularize='l2',validation=dev_data)

Epoch 0 validation accuracy: 9299 / 10000
Epoch 1 validation accuracy: 9482 / 10000
Epoch 2 validation accuracy: 9512 / 10000
Epoch 3 validation accuracy: 9525 / 10000
Epoch 4 validation accuracy: 9577 / 10000
Epoch 5 validation accuracy: 9602 / 10000
Epoch 6 validation accuracy: 9612 / 10000
Epoch 7 validation accuracy: 9634 / 10000
Epoch 8 validation accuracy: 9646 / 10000
Epoch 9 validation accuracy: 9648 / 10000
Epoch 10 validation accuracy: 9646 / 10000
Epoch 11 validation accuracy: 9642 / 10000
Epoch 12 validation accuracy: 9662 / 10000
Epoch 13 validation accuracy: 9634 / 10000
Epoch 14 validation accuracy: 9647 / 10000
Epoch 15 validation accuracy: 9653 / 10000
Epoch 16 validation accuracy: 9663 / 10000
Epoch 17 validation accuracy: 9635 / 10000
Epoch 18 validation accuracy: 9669 / 10000
Epoch 19 validation accuracy: 9676 / 10000
Epoch 20 validation accuracy: 9683 / 10000
Epoch 21 validation accuracy: 9673 / 10000
Epoch 22 validation accuracy: 9666 / 10000
Epoch 23 validation a

## II. Implementation 2b

* Cost: Crossent
* Library: Tensorflow
* Added: Regularization, Weight Initialization

In [None]:
# TODO