##Two-Layers Neural Network: Scratch/TensorFlow/Pytorch
    This notebook shows the basic idea for two-layers neural network and how to develop it from scratch as well as use Tensorflow and Pytorch.

    The architecture of the neural network is connected by two forward pass, one rectified linear unit (ReLU) and one softmax function. Forward pass combine the input feature and the hidden laery by weights and bias. ReLU is general nonlinear part within modern neural network and softmax can generate multi-class probability for the prediction.

In [0]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import torch
import numpy as np

## Scratch Version
    In scratch version,the input data is an (nxp) image and the first layer contain 512 neurons whcih require px512 weights and 512 bias. After the linear conbination, ReLU is implemented to generate nonlinear activation. The second layer contain 10 neurons inorder to generate 10 output categories for MNIST prediction. To find out which category has the highest score, a softmax function is utilized.
    
    After the forward pass, the cross-entrophy is calculated and the corresponding gradient with respect to weights and bias are computed. Through the iteration, weights and bias will be update until the accuracy achieve 85 percent for MNIST prediction.

In [0]:
###############################################
## Function 1: Write 2-layer NN from scratch ##
###############################################

def my_NN_scratch(mnist):

    X_test = mnist.test.images
    Y_test = mnist.test.labels
    ntest = X_test.shape[0]
    num_hidden = 512
    num_iterations = 5000
    learning_rate =3.5e-3

    #######################
    ## FILL IN CODE HERE ##
    #######################
    np.random.seed(2)
    d = num_hidden
    p = X_test.shape[1]
    n = 128
    W1 = np.random.randn(p,d)*1e-2
    b1 = np.zeros((1,d))
    W2 = np.random.randn(d,Y_test.shape[1])*1e-2
    b2 = np.zeros((1,Y_test.shape[1]))
    alpha = np.zeros((p+1,d)) #(p+1)xd
    beta = np.zeros((d+1,Y_test.shape[1])) #(d+1)x10
    
    

    for it in range(num_iterations):
        
        #Forward Proporgation
        batch_xs, batch_ys = mnist.train.next_batch(n)

        #FC1
        h1 = np.dot(batch_xs,W1)+b1 #(n,p)(p,d)+(1,d)=(n,d)
        #ReLU
        S1 = np.maximum(0,h1) #(n,d)
        #assert(S1.shape == h1.shape) #nxd
        
        #FC2
        h2 =  np.dot(S1,W2)+b2 #(n,d)(d,10)+(1,10)=(n,10)
        #Softmax
        Pr = np.zeros((n,Y_test.shape[1])) #nx10
        for i in range(n):
          Max = np.max(h2[i,:])
          Pr[i,:]=np.exp(h2[i,:]-Max)/np.sum(np.exp(h2[i,:]-Max)) #avoid overflow: subtract maximum
        
        #Back_Softmax
        dh2 = (1/n)*(batch_ys-Pr)#nx10
        
        #BackFC
        dW2 = np.dot(dh2.T,S1).T #(dL/dh2)(dh2/dW2) ((n,10).T(n,d)).T=(d,10)
        db2 = np.dot(np.ones((1,n)),dh2) #(dL/dh2)(dh2/db2) (1,n)(n,10)=(1,10)
        
        #BackReLU
        dS1 = np.dot(dh2,W2.T) #(dL/dh2)(dh2/dS1) ((n,10)(d,10).T)=(n,d)
        dh1 = np.array(dS1, copy=True)
        dh1[dS1<0]=0 #nxd
        
        #BackFC
        dW1 = np.dot(dh1.T,batch_xs).T#(dL/dh1)(dh1/dW1) ((n,d).T(n,p)).T
        db1 = np.dot(dh1.T,np.ones((n,1))).T#(dL/dh1)(dh1/db1) ((n,d).T(n,1)).T

        
        #Updated
        W2 = W2 + learning_rate * dW2
        b2 = b2 + learning_rate * db2
        W1 = W1 + learning_rate * dW1
        b1 = b1 + learning_rate * db1
        
        alpha[1:, :] = W1
        alpha[0, :] = b1
        beta[1:, :] = W2
        beta[0, :] = b2
        
        #######################
        ## FILL IN CODE HERE ##
        #######################

    return alpha[1:, :], alpha[0, :], beta[1:, :], beta[0, :]

## TensorFlow Version
    The following part utilize TensorFlow to build up the two-layers neural network. The builtin function can efficiently develop the forward pass and the backward pass. Due to the optimized methodology in TensorFLow, more hidden layers and neurons can be implemented so that the accuracy can achieve 96%.

In [0]:
def my_NN_tensorflow(mnist):

    num_hidden = 1024
    x = tf.placeholder(tf.float32, [None, 784])

    W1 = tf.Variable(tf.random_normal([784,num_hidden])) # Define it
    b1 = tf.Variable(tf.random_normal([num_hidden])) # Define it
    W2 = tf.Variable(tf.random_normal([num_hidden,10])) # Define it
    b2 = tf.Variable(tf.random_normal([10])) # Define it
    z = tf.nn.relu(tf.matmul(x, W1) + b1)
    y = tf.matmul(z, W2) + b2

    y_ =  tf.placeholder(tf.float32, [None, 10])
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)) # Define it formula:-sum([Y*log(P)]cross_entrophy) /n
    train_step = tf.train.GradientDescentOptimizer(0.075).minimize(cross_entropy)
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()
    for epoch in range(6000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        res = sess.run(train_step,feed_dict = {x: batch_xs,y_: batch_ys})  # Define it
    
    #Test
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
    sess.close()

    return W1_e, b1_e, W2_e, b2_e

## Pytorch Version
    Pythoch also has efficient builtin function that is helpful to develop forward and backward pass. The other powerful function that is utilized here is the optimizer such as stoachistic graident descent method, which increase the accuracy to 96%.

In [0]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim



def my_NN_pytorch(mnist_m):

    class Net(torch.nn.Module):
      def __init__(self):
        super(Net, self).__init__()
        self.fc1 =  nn.Linear(784,100)# Define it
        self.fc2 =  nn.Linear(100,10)# Define it
        
      def forward(self, x):
        x = F.relu(self.fc1(x))# Define it
        x = self.fc2(x)# Define it
        return x

    net = Net()
    #net.zero_grad() ? should not be this; should be optimizer
    Loss =  nn.CrossEntropyLoss() # Define it cross-entrophy (combine softmax and null-loss)
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9) # Define it

    for epoch in range(4000):  # loop over the dataset multiple times

        batch_xs, batch_ys = mnist_m.train.next_batch(100)
        #######################
        ## FILL IN CODE HERE ##
        #######################
        batch_xs = torch.as_tensor(batch_xs)
        batch_ys = torch.as_tensor(batch_ys)
        batch_ys = Variable(batch_ys).long()
        optimizer.zero_grad()
        outputs = net(batch_xs)
        loss = Loss(outputs, batch_ys) # input: Torch.floatTensor; level:Torch.longTensor
        loss.backward()
        optimizer.step()

    params = list(net.parameters())
    return params[0].detach().numpy().T, params[1].detach().numpy(), \
        params[2].detach().numpy().T, params[3].detach().numpy()

## Evaluate the training

In [0]:
def evaluate(W1, b1, W2, b2, data):

    inputs = data.test.images
    outputs = np.dot(np.maximum(np.dot(inputs, W1) + b1, 0), W2) + b2
    predicted = np.argmax(outputs, axis=1)
    accuracy = np.sum(predicted == data.test.labels)*100 / outputs.shape[0]
    print('Accuracy of the network on test images: %.f %%' % accuracy)
    return accuracy


In [19]:
def main_test():

    mnist = input_data.read_data_sets('input_data', one_hot=True)
    mnist_m = input_data.read_data_sets('input_data', one_hot=False)
    W1, b1, W2, b2 = my_NN_scratch(mnist)
    evaluate(W1, b1, W2, b2, mnist_m)
    W1, b1, W2, b2 = my_NN_tensorflow(mnist)
    evaluate(W1, b1, W2, b2, mnist_m)
    W1, b1, W2, b2 = my_NN_pytorch(mnist_m)
    evaluate(W1, b1, W2, b2, mnist_m)

main_test()


Extracting input_data/train-images-idx3-ubyte.gz
Extracting input_data/train-labels-idx1-ubyte.gz
Extracting input_data/t10k-images-idx3-ubyte.gz
Extracting input_data/t10k-labels-idx1-ubyte.gz
Extracting input_data/train-images-idx3-ubyte.gz
Extracting input_data/train-labels-idx1-ubyte.gz
Extracting input_data/t10k-images-idx3-ubyte.gz
Extracting input_data/t10k-labels-idx1-ubyte.gz
Accuracy of the network on test images: 85 %
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

0.956
Accuracy of the network on test images: 96 %
Accuracy of the network on test images: 96 %
