# Project 3: Building a Neural Network
### Importing previous step

In [1]:
from Project2 import *

TODO: We've included the framework of a class called SentimentNetwork.

* Create a basic neural network much like the networks we've seen in earlier lessons and in Project 1, with an input layer, a 
  hidden layer, and an output layer.
* Do not add a non-linearity in the hidden layer. That is, do not use an activation function when calculating the hidden layer 
  outputs.
* Re-use the code from earlier in this notebook to create the training data.
* Implement the pre_process_data function to create the vocabulary for our training data generating functions.
* Ensure train trains over the entire corpus.
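
The steps in the list above can be sketched on a tiny example before reading the full class. This is a minimal illustration, not the notebook's implementation: `toy_reviews`, `to_counts`, and the hidden-layer size of 3 are hypothetical, and the vocabulary is sorted only to make the index order deterministic (the class below uses an unsorted `list(set)`).

```python
import numpy as np

# Hypothetical toy reviews, used only for illustration
toy_reviews = ["great great movie", "terrible movie"]

# Build the vocabulary and a word -> index lookup
vocab = sorted({word for review in toy_reviews for word in review.split(" ")})
word2index = {word: i for i, word in enumerate(vocab)}

def to_counts(review):
    """Return a 1 x len(vocab) row vector of word counts."""
    layer_0 = np.zeros((1, len(vocab)))
    for word in review.split(" "):
        if word in word2index:  # ignore words outside the vocabulary
            layer_0[0][word2index[word]] += 1
    return layer_0

layer_0 = to_counts("great great movie")

# Linear hidden layer: a plain matrix product, no activation function
rng = np.random.default_rng(0)
weights_0_1 = rng.normal(0.0, 0.1, (len(vocab), 3))
layer_1 = layer_0.dot(weights_0_1)
```

Because no activation is applied, `layer_1` is just a weighted sum of the rows of `weights_0_1`, with each row weighted by how often its word appears in the review.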

In [2]:
import time
import sys
import numpy as np

# Encapsulate our neural network in a class
class SentimentNetwork:
    def __init__(self, reviews,labels,hidden_nodes = 10, learning_rate = 0.1):
        """Create a SentimentNetwork with the given settings.
        Args:
            reviews(list) - List of reviews used for training
            labels(list) - List of POSITIVE/NEGATIVE labels associated with the given reviews
            hidden_nodes(int) - Number of nodes to create in the hidden layer
            learning_rate(float) - Learning rate to use while training
        
        """
        # Assign a seed to our random number generator to ensure we get
        # reproducible results during development
        np.random.seed(1)

        # process the reviews and their associated labels so that everything
        # is ready for training
        self.pre_process_data(reviews, labels)
        
        # Build the network to have the number of hidden nodes and the learning rate that
        # were passed into this initializer. Make the same number of input nodes as
        # there are vocabulary words and create a single output node.
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):
        
        # populate review_vocab with all of the words in the given reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)

        # Convert the vocabulary set to a list so we can access words via indices
        self.review_vocab = list(review_vocab)
        
        # populate label_vocab with all of the words in the given labels.
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        # Convert the label vocabulary set to a list so we can access labels via indices
        self.label_vocab = list(label_vocab)
        
        # Store the sizes of the review and label vocabularies.
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        # Create a dictionary of words in the vocabulary mapped to index positions
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        # Create a dictionary of labels mapped to index positions
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Store the learning rate
        self.learning_rate = learning_rate

        # Initialize weights

        # These are the weights between the input layer and the hidden layer.
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
    
        # These are the weights between the hidden layer and the output layer.
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        # The input layer, a two-dimensional matrix with shape 1 x input_nodes
        self.layer_0 = np.zeros((1,input_nodes))
    
    def update_input_layer(self,review):

        # clear out previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        
        for word in review.split(" "):
            # NOTE: This if-check was not in the version of this method created in Project 2. 
            #       It simply ensures the word is actually a key in word2index before
            #       accessing it, which is important because accessing an invalid key
            #       will raise an exception in Python. This allows us to ignore unknown
            #       words encountered in new reviews.
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] += 1
                
    def get_target_for_label(self,label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
        
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_output_2_derivative(self,output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        # make sure we have a matching number of reviews and labels
        assert(len(training_reviews) == len(training_labels))
        
        # Keep track of correct predictions to display accuracy during training 
        correct_so_far = 0

        # Remember when we started for printing time statistics
        start = time.time()
        
        # loop through all the given reviews and run a forward and backward pass,
        # updating weights for every item
        for i in range(len(training_reviews)):
            
            # Get the next review and its correct label
            review = training_reviews[i]
            label = training_labels[i]
            
            #### Implement the forward pass here ####
            ### Forward pass ###

            # Input Layer
            self.update_input_layer(review)

            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)

            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
            
            #### Implement the backward pass here ####
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label) # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate # update hidden-to-output weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate # update input-to-hidden weights with gradient descent step

            # Keep track of correct predictions.
            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            elif(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            
            # For debug purposes, print out our prediction accuracy and speed 
            # throughout the training process. 
            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) \
                             + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        """
        Attempts to predict the labels for the given testing_reviews,
        and uses the testing_labels to calculate the accuracy of those predictions.
        """
        
        # keep track of how many correct predictions we make
        correct = 0

        # we'll time how many predictions per second we make
        start = time.time()

        # Loop through each of the given reviews and call run to predict
        # its label. 
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            # For debug purposes, print out our prediction accuracy and speed 
            # throughout the prediction process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) \
                             + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        """
        Returns a POSITIVE or NEGATIVE prediction for the given review.
        """
        # Run a forward pass through the network, like in the "train" function.
        
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        # Return POSITIVE for output-layer values greater than or equal to 0.5;
        # return NEGATIVE for other values
        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
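
The backward pass in `train` relies on `sigmoid_output_2_derivative` taking the sigmoid's output rather than its input. The sketch below checks that identity numerically with a central finite difference. It uses standalone copies of the two methods, and the sample points and `eps` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_2_derivative(output):
    # Takes the sigmoid's *output*, not its input
    return output * (1 - output)

x = np.linspace(-4.0, 4.0, 9)
eps = 1e-6

# Central finite-difference estimate of d(sigmoid)/dx
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# Analytic derivative, computed from the sigmoid's output
analytic = sigmoid_output_2_derivative(sigmoid(x))

max_gap = np.max(np.abs(numeric - analytic))
```

If `max_gap` is tiny, the two agree, which is why `layer_2_delta` can be computed from `layer_2` directly without keeping the pre-activation value.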

#### Run the following cell to create a SentimentNetwork that will train on all but the last 1000 reviews (we're saving those for testing). Here we use a learning rate of 0.1.

In [3]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)

Run the following cell to test the network's performance against the last 1000 reviews (the ones we held out from our training set).

#### We have not trained the model yet, so the results should be about 50% as it will just be guessing and there are only two possible values to choose from.

In [4]:
mlp.test(reviews[-1000:],labels[-1000:])

Progress:99.9% Speed(reviews/sec):467.4 #Correct:500 #Tested:1000 Testing Accuracy:50.0%
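
The 50.0% testing accuracy above is consistent with how `init_network` sets up the weights. Since `weights_0_1` starts as all zeros, the hidden layer is zero for every review, so the output is `sigmoid(0) = 0.5` and `run` returns POSITIVE every time. A minimal sketch of that, with hypothetical layer sizes and a made-up count vector:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

input_nodes, hidden_nodes = 5, 10  # hypothetical sizes, for illustration only

# Same initialization pattern as init_network: zeros, then random normal
weights_0_1 = np.zeros((input_nodes, hidden_nodes))
rng = np.random.default_rng(1)
weights_1_2 = rng.normal(0.0, 1.0, (hidden_nodes, 1))

layer_0 = np.array([[3.0, 0.0, 1.0, 2.0, 0.0]])  # made-up word counts
layer_1 = layer_0.dot(weights_0_1)               # all zeros, whatever the input
layer_2 = sigmoid(layer_1.dot(weights_1_2))      # sigmoid(0) = 0.5

prediction = "POSITIVE" if layer_2[0][0] >= 0.5 else "NEGATIVE"
```

So the untrained network is not guessing randomly. It always answers POSITIVE, and its accuracy equals the share of positive reviews in the held-out set.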

Run the following cell to actually train the network. During training, it will display the model's accuracy repeatedly as it trains so we can see how well it's doing.

In [5]:
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):90.22 #Correct:1251 #Trained:2501 Training Accuracy:50.0%
Progress:20.8% Speed(reviews/sec):92.93 #Correct:2501 #Trained:5001 Training Accuracy:50.0%
Progress:31.2% Speed(reviews/sec):93.77 #Correct:3751 #Trained:7501 Training Accuracy:50.0%
Progress:41.6% Speed(reviews/sec):92.21 #Correct:5001 #Trained:10001 Training Accuracy:50.0%
Progress:52.0% Speed(reviews/sec):90.90 #Correct:6251 #Trained:12501 Training Accuracy:50.0%
Progress:62.5% Speed(reviews/sec):90.72 #Correct:7501 #Trained:15001 Training Accuracy:50.0%
Progress:72.9% Speed(reviews/sec):89.97 #Correct:8751 #Trained:17501 Training Accuracy:50.0%
Progress:83.3% Speed(reviews/sec):90.20 #Correct:10001 #Trained:20001 Training Accuracy:50.0%
Progress:93.7% Speed(reviews/sec):89.47 #Correct:11251 #Trained:22501 Training Accuracy:50.0%
Progress:99.9% Speed(reviews/sec):88.84 #Correct:12000 #Trained:24000 Training Ac

That most likely didn't train very well. Part of the reason may be that the learning rate is too high. Run the following cell to recreate the network with a smaller learning rate, 0.01, and then train the new network.

In [6]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):74.75 #Correct:1248 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):75.89 #Correct:2498 #Trained:5001 Training Accuracy:49.9%
Progress:31.2% Speed(reviews/sec):76.22 #Correct:3748 #Trained:7501 Training Accuracy:49.9%
Progress:41.6% Speed(reviews/sec):76.08 #Correct:4998 #Trained:10001 Training Accuracy:49.9%
Progress:52.0% Speed(reviews/sec):76.99 #Correct:6248 #Trained:12501 Training Accuracy:49.9%
Progress:62.5% Speed(reviews/sec):76.70 #Correct:7491 #Trained:15001 Training Accuracy:49.9%
Progress:72.9% Speed(reviews/sec):76.30 #Correct:8746 #Trained:17501 Training Accuracy:49.9%
Progress:83.3% Speed(reviews/sec):76.17 #Correct:9996 #Trained:20001 Training Accuracy:49.9%
Progress:93.7% Speed(reviews/sec):76.13 #Correct:11246 #Trained:22501 Training Accuracy:49.9%
Progress:99.9% Speed(reviews/sec):76.05 #Correct:11995 #Trained:24000 Training Acc

That probably wasn't much different. Run the following cell to recreate the network one more time with an even smaller learning rate, 0.001, and then train the new network.

In [7]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):73.14 #Correct:1282 #Trained:2501 Training Accuracy:51.2%
Progress:20.8% Speed(reviews/sec):74.25 #Correct:2635 #Trained:5001 Training Accuracy:52.6%
Progress:31.2% Speed(reviews/sec):74.60 #Correct:4093 #Trained:7501 Training Accuracy:54.5%
Progress:41.6% Speed(reviews/sec):74.27 #Correct:5636 #Trained:10001 Training Accuracy:56.3%
Progress:52.0% Speed(reviews/sec):74.31 #Correct:7210 #Trained:12501 Training Accuracy:57.6%
Progress:62.5% Speed(reviews/sec):74.47 #Correct:8798 #Trained:15001 Training Accuracy:58.6%
Progress:72.9% Speed(reviews/sec):74.57 #Correct:10362 #Trained:17501 Training Accuracy:59.2%
Progress:83.3% Speed(reviews/sec):74.56 #Correct:12048 #Trained:20001 Training Accuracy:60.2%
Progress:93.7% Speed(reviews/sec):74.45 #Correct:13715 #Trained:22501 Training Accuracy:60.9%
Progress:99.9% Speed(reviews/sec):74.51 #Correct:14782 #Trained:24000 Training A

Try recreating the network one more time with an even smaller learning rate, 0.0001, and then train the new network.

In [8]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.0001)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):76.02 #Correct:1358 #Trained:2501 Training Accuracy:54.2%
Progress:20.8% Speed(reviews/sec):76.13 #Correct:2900 #Trained:5001 Training Accuracy:57.9%
Progress:31.2% Speed(reviews/sec):76.05 #Correct:4562 #Trained:7501 Training Accuracy:60.8%
Progress:41.6% Speed(reviews/sec):75.61 #Correct:6306 #Trained:10001 Training Accuracy:63.0%
Progress:52.0% Speed(reviews/sec):75.63 #Correct:8099 #Trained:12501 Training Accuracy:64.7%
Progress:62.5% Speed(reviews/sec):75.77 #Correct:9896 #Trained:15001 Training Accuracy:65.9%
Progress:72.9% Speed(reviews/sec):75.75 #Correct:11682 #Trained:17501 Training Accuracy:66.7%
Progress:83.3% Speed(reviews/sec):75.77 #Correct:13551 #Trained:20001 Training Accuracy:67.7%
Progress:93.7% Speed(reviews/sec):75.81 #Correct:15421 #Trained:22501 Training Accuracy:68.5%
Progress:99.9% Speed(reviews/sec):75.77 #Correct:16579 #Trained:24000 Training A

### With a learning rate of 0.001, the network should finally have started to improve during training, and 0.0001 improves it a little further. It's still not very good, but it shows that this solution has potential. We will improve it in the next lesson.
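
Why the step size matters can be seen on a toy problem. The sketch below runs gradient descent on f(w) = w**2, which is not the sentiment network. The function `descend` and the learning rates tried are hypothetical, chosen only to show a step that overshoots, one that converges, and one that barely moves.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w -= learning_rate * gradient  # same update rule as in train
    return w

for lr in (1.1, 0.1, 0.001):
    print(lr, descend(lr))
```

With the largest rate each step overshoots the minimum and `w` grows in magnitude. With the smallest, `w` has barely moved after 20 steps. The best rate depends on the problem, which is why the notebook tries several.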