<a href="https://colab.research.google.com/github/kritshan/INDE-577/blob/main/Supervised%20Learning/Neural%20Networks/feed_forward_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feed Forward Neural Network

In this notebook, we implement a feed forward neural network. A feed forward network is a neural network built from densely connected layers, in which information flows in one direction only: from the input layer, through the hidden layers, to the output layer.

Here are the three kinds of layers:
* Input Layer: This layer contains the input nodes where the data is fed into the network. Each node represents a feature or attribute of the input data.

* Hidden Layers: These are intermediary layers between the input and output layers. Each hidden layer consists of multiple nodes (neurons) which apply transformations to the input data through weighted connections.

* Output Layer: This layer produces the final output of the network based on the transformations applied to the input data. The number of nodes in the output layer depends on the nature of the problem, such as classification (where each node may represent a class) or regression (where there may be a single node for continuous output).
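
The three kinds of layers above can be sketched as a single forward pass in NumPy. This is a minimal illustration rather than the network trained later in this notebook: the layer sizes, the `forward` helper, and the random inputs are all assumptions chosen to keep the example small.

```python
import numpy as np

def relu(z):
    # element-wise max(0, z), the nonlinearity applied in each hidden layer
    return np.maximum(z, 0.0)

def forward(x, params):
    """One forward pass: input -> hidden layers (ReLU) -> output logits."""
    *hidden, (w_out, b_out) = params
    a = x
    for w, b in hidden:
        a = relu(a @ w + b)
    # the output layer is left linear; a softmax would turn it into probabilities
    return a @ w_out + b_out

rng = np.random.default_rng(0)

# hypothetical sizes: 8 input features, hidden layers of 5 and 4 units, 3 classes
sizes = [8, 5, 4, 3]
params = [(rng.normal(0, 0.01, (m, n)), np.zeros((1, n)))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(2, 8))   # a batch of 2 examples
logits = forward(x, params)
print(logits.shape)           # (2, 3)
```

Each row of `logits` holds one score per class. Nothing computed in a later layer is ever fed back into an earlier one, which is what makes the network feed forward.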

In a feed forward network, there is no cycle of information. Information only goes from the input layer, through the hidden layers, and finally to the output layer. This is unlike recurrent neural networks, where connections allow information to loop back through the layers. When a recurrent network is unrolled over a long sequence, those loops behave like a very deep stack of layers, which can cause the vanishing gradient problem. A feed forward network with only a few layers is far less exposed to this problem.

We will attempt to classify text by its source. We have three input documents, and for each line of text (a sequence of characters) we try to determine which document it came from.

First, we need to define our data processing steps, the number of classes, batch sizes for gradient descent training, the number of layers, and the number of neurons in each layer.
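
As a rough sketch of the data processing, the snippet below one-hot encodes a short string, left-pads it to a fixed length, and flattens it into a single input vector. The helper names `one_hot_line` and `left_pad` and the sample string are hypothetical. Unlike the notebook's loader, this sketch drops characters outside the 256-symbol range instead of leaving zero rows for them.

```python
import numpy as np

def one_hot_line(line, vocab=256):
    """Encode a string as a [num_chars, vocab] matrix with one 1 per row."""
    codes = np.array([ord(ch) for ch in line if ord(ch) < vocab], dtype=int)
    mat = np.zeros((len(codes), vocab))
    mat[np.arange(len(codes)), codes] = 1
    return mat

def left_pad(mat, max_len):
    """Prepend all-zero rows so the matrix has exactly max_len rows."""
    padding = np.zeros((max_len - mat.shape[0], mat.shape[1]))
    return np.concatenate((padding, mat), axis=0)

encoded = one_hot_line("Hi!")
padded = left_pad(encoded, max_len=5)

# transpose to [vocab, max_len] and flatten, so the network sees one
# vector of length 256 * max_len per line of text
flat = padded.T.flatten()

print(encoded.shape, padded.shape, flat.shape)   # (3, 256) (5, 256) (1280,)
```

Because every line is padded to the same length, all flattened vectors have the same size and can be stacked into a batch.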

We will implement our network using the TensorFlow 1.x graph-and-session API, accessed through `tensorflow.compat.v1`. This API is more low-level than the one in newer versions, so we have more visibility into what is occurring.


In [None]:
import tensorflow.compat.v1 as tf
import numpy as np
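import urllib.request  # importing urllib alone does not guarantee urllib.request is loaded
tf.disable_eager_execution()  # tf is already the compat.v1 module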

# the number of iterations to train for
numTrainingIters = 10000

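# left over from an RNN version of this code, where it sized the
# recurrent state; the feed-forward layers below do not use it
hiddenUnits = 500

# the number of neurons in the first and second hidden layers
hiddenUnitsLayer1 = 515
hiddenUnitsLayer2 = 250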

# the number of classes that we are learning over
numClasses = 3

# the number of data points in a batch
batchSize = 100

# this function takes two dictionaries (called data and testData), each
# of which contains (dataPointID, (classNumber, matrix)) entries.  Each
# matrix is a sequence of vectors; each vector has a one-hot-encoding of
# an ascii character, and the sequence of vectors corresponds to
# one line of text.  classNumber indicates which file the line of
# text came from.
#
# The argument maxSeqLen is the maximum length of a line of text
# seen so far.  fileName is the URL of a file whose contents
# we want to add to data and testData.  classNum is an indicator of
# the class we are going to associate with text from that file.
# linesToUse controls how many lines are sampled from the file.
#
# The return val is a pair of tuples, (maxSeqLen, data) and
# (maxSeqLen, testData), holding the new maxSeqLen and the two
# dictionaries with the additional lines of text added
def addToData (maxSeqLen, data, testData, fileName, classNum, linesToUse):
    #
    # open the file and read it in
    response = urllib.request.urlopen(fileName)
    content = response.readlines ()
    #
    # sample linesToUse + 1000 distinct line numbers; the first 10000
    # become training lines and the last 2000 become test candidates
    # (with linesToUse = 11000 the two slices do not overlap)
    myInts = np.random.choice(len(content), size=linesToUse + 1000, replace=False)
    testInts = myInts[-2000:]
    trainingInts = myInts[:10000]
    #
    # i is the key of the next line of text to add to the dictionary
    i = len(data)
    #
    # loop thru and add the lines of text to the dictionary
    for whichLine in trainingInts.flat:
        #
        # get the line and ignore it if it has nothing in it
        line = content[whichLine].decode("utf-8")
        if line.isspace () or len(line) == 0:
            continue
        #
        # take note if this is the longest line we've seen
        if len (line) > maxSeqLen:
            maxSeqLen = len (line)
        #
        # create the matrix that will hold this line
        temp = np.zeros((len(line), 256))
        #
        # j is the character we are on
        j = 0
        #
        # loop thru the characters
        for ch in line:
            #
            # non-ascii? ignore
            if ord(ch) >= 256:
                continue
            #
            # one hot!
            temp[j][ord(ch)] = 1
            #
            # move onto the next character
            j = j + 1
            #
        # remember the line of text
        data[i] = (classNum, temp)
        #
        # move onto the next line
        i = i + 1
    #
    # now add the held-out test lines to testData in the same way

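    testI = len(testData)
    #
    # keep at most 1000 test lines per file, so that all three files
    # are represented among the 3000 test lines (a single cap of 3000
    # is reached before the third file contributes anything)
    testLimit = testI + 1000
    for whichLine in testInts.flat:
        if len(testData) == testLimit:
            break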
        line = content[whichLine].decode("utf-8")
        if line.isspace () or len(line) == 0:
            continue
        #
        # take note if this is the longest line we've seen
        if len (line) > maxSeqLen:
            maxSeqLen = len (line)
        #
        # create the matrix that will hold this line
        temp = np.zeros((len(line), 256))
        #
        # j is the character we are on
        j = 0
        #
        # loop thru the characters
        for ch in line:
            #
            # non-ascii? ignore
            if ord(ch) >= 256:
                continue
            #
            # one hot!
            temp[j][ord(ch)] = 1
            #
            # move onto the next character
            j = j + 1
            #
        # remember the line of text
        testData[testI] = (classNum, temp)
        #
        # move onto the next line
        testI = testI + 1

    return (maxSeqLen, data), (maxSeqLen, testData)

# this function takes as input a data set encoded as a dictionary
# (same encoding as the last function) and pre-pends every line of
# text with empty characters so that each line of text is exactly
# maxSeqLen characters in size
def pad (maxSeqLen, data):
    #
    # loop thru every line of text
    for i in data:
        #
        # access the matrix and the label
        temp = data[i][1]
        label = data[i][0]
        #
        # get the number of characters in this line (named seqLen so
        # that the built-in len is not shadowed)
        seqLen = temp.shape[0]
        #
        # and then pad so the line is the correct length; the result
        # is transposed to shape [256, maxSeqLen]
        padding = np.zeros ((maxSeqLen - seqLen,256))
        data[i] = (label, np.transpose (np.concatenate ((padding, temp), axis = 0)))
    #
    # return the new data set
    return data

# this generates a new batch of training data of size batchSize from the
# list of lines of text data. This version of generateData is useful for
# an RNN because the data set x is a NumPy array with dimensions
# [batchSize, 256, maxSeqLen]; it can be unstacked into a series of
# matrices containing one-hot character encodings for each data point
# using tf.unstack(inputX, axis=2)
def generateDataRNN (maxSeqLen, data):
    #
    # randomly sample batchSize lines of text
    myInts = np.random.randint (0, len(data), batchSize)
    #
    # stack all of the text into a matrix of one-hot characters
    x = np.stack ([data[i][1] for i in myInts.flat])
    #
    # and stack all of the labels into a vector of labels
    y = np.stack ([np.array((data[i][0])) for i in myInts.flat])
    #
    # return the pair
    return (x, y)

# this also generates a new batch of training data, but it represents
# the data as a NumPy array with dimensions [batchSize, 256 * maxSeqLen]
# where for each data point, all characters have been appended.  Useful
# for feed-forward network training
def generateDataFeedForward (maxSeqLen, data):
    #
    # randomly sample batchSize lines of text
    myInts = np.random.randint (0, len(data), batchSize)
    #
    # stack all of the text into a matrix of one-hot characters
    x = np.stack ([data[i][1].flatten () for i in myInts.flat])
    #
    # and stack all of the labels into a vector of labels
    y = np.stack ([np.array((data[i][0])) for i in myInts.flat])
    #
    # return the pair
    return (x, y)

# create the data dictionary
maxSeqLen = 0
data = {}
testData = {}

# load up the three data sets and the test data sets
(maxSeqLen, data), (maxSeqLen, testData) = addToData (maxSeqLen, data, testData, "https://s3.amazonaws.com/chrisjermainebucket/text/Holmes.txt", 0, 11000)
(maxSeqLen, data), (maxSeqLen, testData) = addToData (maxSeqLen, data, testData, "https://s3.amazonaws.com/chrisjermainebucket/text/war.txt", 1, 11000)
(maxSeqLen, data), (maxSeqLen, testData) = addToData (maxSeqLen, data, testData, "https://s3.amazonaws.com/chrisjermainebucket/text/william.txt", 2, 11000)

# pad each entry in the dictionary with empty characters as needed so
# that the sequences are all of the same length
data = pad (maxSeqLen, data)
testData = pad (maxSeqLen, testData)


Now that we have processed our data and prepared our training/test datasets, we are ready to define our TensorFlow variables and placeholders. Variables are tensors whose values are learned during training (here, the weights and biases). Placeholders are inputs whose values are fed in each time the graph is run (here, a batch of text lines and its labels).

After that, we are prepared to implement our training process and our testing session. We will use a cross-entropy loss function and the Adagrad (adaptive gradient) optimizer for training.
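
To make the loss and the optimizer concrete, here is a small NumPy sketch of softmax cross-entropy with integer labels, followed by a few Adagrad updates. It is an illustration under stated assumptions, not TensorFlow's implementation: the toy problem optimizes the logits directly, and details such as the zero starting accumulator and the `eps` term are choices made for this sketch.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_cross_entropy(logits, labels):
    """Mean of -log p[true class] over the batch; labels are integer class ids."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def adagrad_step(param, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad update: each entry's step shrinks as its squared gradients accumulate."""
    accum = accum + grad ** 2
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum

# toy problem (an assumption of this sketch): 2 examples, 3 classes,
# and the logits themselves are the parameters being learned
logits = np.zeros((2, 3))
labels = np.array([0, 2])
accum = np.zeros_like(logits)

loss_before = sparse_cross_entropy(logits, labels)
for _ in range(100):
    grad = softmax(logits)
    grad[np.arange(len(labels)), labels] -= 1   # d(loss)/d(logits) = p - one_hot(y)
    grad /= len(labels)
    logits, accum = adagrad_step(logits, grad, accum, lr=0.1)
loss_after = sparse_cross_entropy(logits, labels)

print(loss_before, loss_after)
```

With all logits at zero the three classes are equally likely, so the starting loss is log(3). The loss then falls as the updates raise the logit of the correct class for each example.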

For testing, we will see how many of the 3000 test documents (lines of text) the network labels correctly.
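
Counting correct predictions amounts to comparing the highest-probability class in each row against the true label. A minimal sketch, using made-up probabilities and a hypothetical `count_correct` helper:

```python
import numpy as np

def count_correct(probs, labels):
    """Number of rows whose highest-probability class matches the label."""
    return int(np.sum(np.argmax(probs, axis=1) == labels))

# made-up softmax outputs for 3 lines of text and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.4, 0.5, 0.1]])
labels = np.array([0, 2, 0])

print(count_correct(probs, labels), "out of", len(labels))   # 2 out of 3
```

The third row is counted as wrong because its largest probability belongs to class 1 while the label is class 0.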

In [None]:
# now we build the TensorFlow computation... there are two inputs,
# a batch of text lines and a batch of labels
inputX = tf.placeholder(tf.float32, [batchSize, 256 * maxSeqLen])
inputY = tf.placeholder(tf.int32, [batchSize])

# left over from an RNN version of this code: the initial recurrent
# state; the feed-forward graph below never reads this placeholder
initialState = tf.placeholder(tf.float32, [batchSize, hiddenUnits])

# Define weights and biases for the hidden layers and output layer
W1 = tf.Variable(np.random.normal(0, 0.01, (256 * maxSeqLen, hiddenUnitsLayer1)), dtype=tf.float32)
b1 = tf.Variable(np.zeros((1, hiddenUnitsLayer1)), dtype=tf.float32)

W2 = tf.Variable(np.random.normal(0, 0.01, (hiddenUnitsLayer1, hiddenUnitsLayer2)), dtype=tf.float32)
b2 = tf.Variable(np.zeros((1, hiddenUnitsLayer2)), dtype=tf.float32)

W3 = tf.Variable(np.random.normal(0, 0.01, (hiddenUnitsLayer2, numClasses)), dtype=tf.float32)
b3 = tf.Variable(np.zeros((1, numClasses)), dtype=tf.float32)


# Define the network architecture
hiddenLayer1 = tf.nn.relu(tf.matmul(inputX, W1) + b1)
hiddenLayer2 = tf.nn.relu(tf.matmul(hiddenLayer1, W2) + b2)


# compute the set of outputs
outputs = tf.matmul(hiddenLayer2, W3) + b3

predictions = tf.nn.softmax(outputs)

# compute the loss
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=outputs, labels=inputY)
totalLoss = tf.reduce_mean(losses)

# use the Adagrad optimizer (learning rate 0.01) to train
trainingAlg = tf.compat.v1.train.AdagradOptimizer(0.01).minimize(totalLoss)

# and train!!
with tf.Session() as sess:
    #
    # initialize everything
    sess.run(tf.compat.v1.global_variables_initializer())
    #
    # and run the training iters
    for epoch in range(numTrainingIters):
        #
        # get some data
        x, y = generateDataFeedForward(maxSeqLen, data)
        #
        # run one training step on this batch
        _totalLoss, _trainingAlg, _predictions, _outputs = sess.run(
                [totalLoss, trainingAlg, predictions, outputs],
                feed_dict={
                    inputX:x,
                    inputY:y,
                })
        #
        # just FYI, compute the number of correct predictions: the
        # predicted class is the one with the highest probability
        numCorrect = int(np.sum(np.argmax(_predictions, axis=1) == y))
        #
        # print out to the screen
        print("Step", epoch, "Loss", _totalLoss, "Correct", numCorrect, "out of", batchSize)

    # evaluate on the held-out test lines; index testData (not data)
    # so that no training lines are scored
    testX = np.stack([testData[i][1].flatten() for i in testData])
    testY = np.stack([testData[i][0] for i in testData])
    numCorrect = 0
    testLosses = []
    numTestBatches = len(testX) // batchSize
    for batch in range(numTestBatches):
        x = testX[(batch*batchSize):((batch+1) * batchSize)]
        y = testY[(batch*batchSize):((batch+1) * batchSize)]
        #
        # run the forward pass only; trainingAlg is not in the fetch
        # list, so the weights are not updated
        _totalLoss, _predictions = sess.run(
                [totalLoss, predictions],
                feed_dict={
                    inputX:x,
                    inputY:y,
                })
        testLosses.append(_totalLoss)
        #
        # count the correct predictions in this batch
        numCorrect = numCorrect + int(np.sum(np.argmax(_predictions, axis=1) == y))
    totalLosses = np.mean(testLosses)
    numTested = numTestBatches * batchSize
    print("Loss for", numTested, "randomly chosen documents is", totalLosses, "correct labels is", numCorrect, "out of", numTested)


Streaming output truncated to the last 5000 lines.
Step 5001 Loss 0.10241549 Correct 98 out of 100
Step 5002 Loss 0.11184539 Correct 96 out of 100
Step 5003 Loss 0.22955267 Correct 88 out of 100
Step 5004 Loss 0.1859892 Correct 94 out of 100
Step 5005 Loss 0.17801276 Correct 93 out of 100
Step 5006 Loss 0.28297392 Correct 89 out of 100
Step 5007 Loss 0.08819753 Correct 97 out of 100
Step 5008 Loss 0.1904661 Correct 92 out of 100
Step 5009 Loss 0.22362322 Correct 92 out of 100
Step 5010 Loss 0.17906296 Correct 94 out of 100
Step 5011 Loss 0.12205652 Correct 96 out of 100
Step 5012 Loss 0.2438014 Correct 94 out of 100
Step 5013 Loss 0.09343378 Correct 99 out of 100
Step 5014 Loss 0.20408227 Correct 92 out of 100
Step 5015 Loss 0.20152724 Correct 93 out of 100
Step 5016 Loss 0.08558669 Correct 98 out of 100
Step 5017 Loss 0.28622025 Correct 91 out of 100
Step 5018 Loss 0.15664485 Correct 93 out of 100
Step 5019 Loss 0.18957432 Correct 93 out of 100
Step 5020 Loss 0.252623 Co

# Conclusion

Great! We successfully predicted the source of 2932 out of the 3000 test documents. It is evident that feed forward neural networks are powerful models, and tools like TensorFlow make them relatively easy to implement. Unlike some of the other models in our exploration, however, they are black-box models: their inner processes are not easily interpretable.

Nonetheless, feed forward neural networks are widely used in various applications, including image and speech recognition, natural language processing, and financial forecasting. They also serve as a fundamental building block for more complex neural network architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs).