# Project: Sentiment Analysis

# Curating the Dataset

In [36]:
def pretty_print_review_and_label(i):
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# Read one review per line, stripping only the trailing newline
with open('reviews.txt', 'r') as g:  # What we know!
    reviews = [line.rstrip('\n') for line in g]

# Read one label per line and normalize it to upper case
with open('labels.txt', 'r') as g:  # What we WANT to know!
    labels = [line.rstrip('\n').upper() for line in g]

**Note:** The data in `reviews.txt` has already been preprocessed a bit and contains only lowercase characters.

In [37]:
len(reviews)

25000

In [38]:
reviews[0]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

In [39]:
labels[0]

'POSITIVE'

In [40]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...


# The Predictive Theory

We create three `Counter` objects: one for words from positive reviews, one for words from negative reviews, and one for all the words.

Now we examine all the reviews. For each word in a positive review, we increase the count for that word in both our positive counter and the total words counter; likewise, for each word in a negative review, we increase the count for that word in both our negative counter and the total words counter.

In [41]:
from collections import Counter
import numpy as np

# Create three Counter objects to store positive, negative and total counts
positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

# Loop over all the reviews and add each review's words to the appropriate counters
for review, label in zip(reviews, labels):
    words = review.split(' ')
    if label == 'POSITIVE':
        positive_counts.update(words)
    else:
        negative_counts.update(words)
    total_counts.update(words)

Now we run the following two cells to list the words used in positive reviews and negative reviews, respectively, ordered from most to least commonly used.

In [42]:
# Examine the counts of the most common words in positive reviews
positive_counts.most_common()

[('', 550468),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235),
 ('it', 48025),
 ('i', 40743),
 ('that', 35630),
 ('this', 35080),
 ('s', 33815),
 ('as', 26308),
 ('with', 23247),
 ('for', 22416),
 ('was', 21917),
 ('film', 20937),
 ('but', 20822),
 ('movie', 19074),
 ('his', 17227),
 ('on', 17008),
 ('you', 16681),
 ('he', 16282),
 ('are', 14807),
 ('not', 14272),
 ('t', 13720),
 ('one', 13655),
 ('have', 12587),
 ('be', 12416),
 ('by', 11997),
 ('all', 11942),
 ('who', 11464),
 ('an', 11294),
 ('at', 11234),
 ('from', 10767),
 ('her', 10474),
 ('they', 9895),
 ('has', 9186),
 ('so', 9154),
 ('like', 9038),
 ('about', 8313),
 ('very', 8305),
 ('out', 8134),
 ('there', 8057),
 ('she', 7779),
 ('what', 7737),
 ('or', 7732),
 ('good', 7720),
 ('more', 7521),
 ('when', 7456),
 ('some', 7441),
 ('if', 7285),
 ('just', 7152),
 ('can', 7001),
 ('story', 6780),
 ('time', 6515),
 ('my', 6488),
 ('g

In [43]:
# Examine the counts of the most common words in negative reviews
negative_counts.most_common()

[('', 561462),
 ('.', 167538),
 ('the', 163389),
 ('a', 79321),
 ('and', 74385),
 ('of', 69009),
 ('to', 68974),
 ('br', 52637),
 ('is', 50083),
 ('it', 48327),
 ('i', 46880),
 ('in', 43753),
 ('this', 40920),
 ('that', 37615),
 ('s', 31546),
 ('was', 26291),
 ('movie', 24965),
 ('for', 21927),
 ('but', 21781),
 ('with', 20878),
 ('as', 20625),
 ('t', 20361),
 ('film', 19218),
 ('you', 17549),
 ('on', 17192),
 ('not', 16354),
 ('have', 15144),
 ('are', 14623),
 ('be', 14541),
 ('he', 13856),
 ('one', 13134),
 ('they', 13011),
 ('at', 12279),
 ('his', 12147),
 ('all', 12036),
 ('so', 11463),
 ('like', 11238),
 ('there', 10775),
 ('just', 10619),
 ('by', 10549),
 ('or', 10272),
 ('an', 10266),
 ('who', 9969),
 ('from', 9731),
 ('if', 9518),
 ('about', 9061),
 ('out', 8979),
 ('what', 8422),
 ('some', 8306),
 ('no', 8143),
 ('her', 7947),
 ('even', 7687),
 ('can', 7653),
 ('has', 7604),
 ('good', 7423),
 ('bad', 7401),
 ('would', 7036),
 ('up', 6970),
 ('only', 6781),
 ('more', 6730),
 ('
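The empty string at the top of both lists comes from tokenizing with `split(' ')`: splitting on a single space yields an empty token wherever two spaces appear in a row, and the preprocessed reviews contain many such runs. A minimal sketch on a made-up string (the `sample` text is illustrative and not taken from the dataset):

```python
from collections import Counter

# Hypothetical sample with doubled spaces, similar to the preprocessed reviews
sample = "such as  teachers  . my   years"

tokens_single_space = sample.split(' ')   # keeps empty strings between adjacent spaces
tokens_whitespace = sample.split()        # collapses any run of whitespace

empty_count = Counter(tokens_single_space)['']
print(tokens_single_space)
print(tokens_whitespace)
print(empty_count)
```

The notebook keeps `split(' ')`, so `''` stays in the vocabulary. It occurs at a similar rate in both classes (550468 versus 561462 times), so it behaves like any other neutral token in the analysis that follows.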

As you can see, common words like "the" appear very often in both positive and negative reviews. Instead of finding the most common words in positive or negative reviews, what we really want are the words found in positive reviews more often than in negative reviews, and vice versa. To accomplish this, we calculate the **ratios** of word usage between positive and negative reviews.

In [44]:
# Create Counter object to store positive/negative ratios
pos_neg_ratios = Counter()
# Calculate the positive-to-negative ratio for common words.
# A word counts as "common" here if it appears at least 100 times in negative reviews.
for word in positive_counts:
    if negative_counts[word] >= 100:
        # The +1 in the denominator matches the discussion of the ratios below
        ratio = positive_counts[word] / float(negative_counts[word] + 1)
        pos_neg_ratios[word] = ratio

Examining the ratios we calculated for a few words:

In [45]:
print("Pos-to-neg ratio for 'the' = {}".format(pos_neg_ratios["the"]))
print("Pos-to-neg ratio for 'amazing' = {}".format(pos_neg_ratios["amazing"]))
print("Pos-to-neg ratio for 'terrible' = {}".format(pos_neg_ratios["terrible"]))

Pos-to-neg ratio for 'the' = 1.0607993145235326
Pos-to-neg ratio for 'amazing' = 4.022813688212928
Pos-to-neg ratio for 'terrible' = 0.17744252873563218


Looking closely at the values we just calculated, we see the following:

* Words that we would expect to see more often in positive reviews – like "amazing" – have a ratio greater than 1. The more skewed a word is toward positive, the farther above 1 its positive-to-negative ratio will be.
* Words that we would expect to see more often in negative reviews – like "terrible" – have positive values that are less than 1. The more skewed a word is toward negative, the closer to zero its positive-to-negative ratio will be.
* Neutral words, which don't really convey any sentiment because we would expect to see them in all sorts of reviews – like "the" – have values very close to 1. A perfectly neutral word – one that was used exactly as often in positive reviews as in negative reviews – would have a ratio of almost exactly 1. The `+1` added to the denominator slightly biases words toward negative, but it will be a tiny bias and we'll be ignoring words that are too close to neutral anyway.
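To make the effect of the `+1` concrete, here is a small sketch with made-up counts. The helper name `pos_neg_ratio` and all the numbers are assumptions chosen for illustration; only the formula is the one used in the notebook.

```python
def pos_neg_ratio(pos_count, neg_count):
    """Positive-to-negative ratio with +1 in the denominator, as in the notebook."""
    return pos_count / float(neg_count + 1)

# Hypothetical counts, chosen only for illustration
neutral = pos_neg_ratio(1000, 1000)   # used equally often in both classes
positive = pos_neg_ratio(400, 99)     # skewed toward positive reviews
negative = pos_neg_ratio(100, 499)    # skewed toward negative reviews

print(neutral, positive, negative)
```

The neutral word lands just below 1 rather than exactly on it, which is the small negative bias mentioned above; the larger the counts, the smaller that bias becomes.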

The ratios tell us which words are used more often in positive or negative reviews, but the specific values we've calculated are a bit difficult to work with. A very positive word like "amazing" has a value above 4, whereas a very negative word like "terrible" has a value around 0.18. Those values aren't easy to compare, for the following reason:

* Right now, 1 is considered neutral, but the ratios of very positive words lie much farther from 1 than the ratios of very negative words, which are squeezed between 0 and 1. So there is no way to directly compare two numbers and see whether one word conveys the same magnitude of positive sentiment as another word conveys negative sentiment. We therefore want to center the values around neutral, so that a word's distance from neutral indicates how much sentiment (positive or negative) it conveys.

To fix this, we convert all of our ratios to new values using logarithms.

In the end, extremely positive and extremely negative words will have positive-to-negative ratios with similar magnitudes but opposite signs.
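The symmetry follows from the identity log(1/x) = -log(x): a word used several times as often in positive reviews and a word used equally many times as often in negative reviews end up the same distance from zero. A quick check, using the ratio printed earlier for "amazing" together with a hypothetical mirror-image ratio:

```python
import math

amazing_ratio = 4.022813688212928    # value printed earlier for 'amazing'
mirror_ratio = 1 / amazing_ratio     # hypothetical word skewed equally toward negative

log_amazing = math.log(amazing_ratio)
log_mirror = math.log(mirror_ratio)
log_neutral = math.log(1.0)          # a perfectly neutral ratio maps to 0

print(log_amazing, log_mirror, log_neutral)
```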

In [46]:
# Convert ratios to logs
for w in pos_neg_ratios:
    pos_neg_ratios[w]=np.log(pos_neg_ratios[w])

Examining the new ratios we calculated for the same words from before:

In [47]:
print("Pos-to-neg ratio for 'the' = {}".format(pos_neg_ratios["the"]))
print("Pos-to-neg ratio for 'amazing' = {}".format(pos_neg_ratios["amazing"]))
print("Pos-to-neg ratio for 'terrible' = {}".format(pos_neg_ratios["terrible"]))

Pos-to-neg ratio for 'the' = 0.05902269426102881
Pos-to-neg ratio for 'amazing' = 1.3919815802404802
Pos-to-neg ratio for 'terrible' = -1.7291085042663878


Now we run the following cells to see more ratios to check the theory. 

The first cell displays all the words, ordered by how associated they are with positive reviews.

The second cell displays the 30 words most associated with negative reviews by reversing the order of the first list and then looking at the first 30 words.

We continue to see values similar to the earlier ones we checked – neutral words will be close to `0`, words will get more positive as their ratios approach and go above `1`, and words will get more negative as their ratios approach and go below `-1`. That's why we decided to use the logs instead of the raw ratios.

In [48]:
# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()

[('superb', 1.7091514458966952),
 ('wonderful', 1.5645425925262093),
 ('fantastic', 1.5048433868558566),
 ('excellent', 1.4647538505723599),
 ('amazing', 1.3919815802404802),
 ('powerful', 1.2999662776313934),
 ('favorite', 1.2668956297860055),
 ('perfect', 1.246742480713785),
 ('brilliant', 1.2287554137664785),
 ('recommended', 1.2163953243244932),
 ('perfectly', 1.1971931173405572),
 ('subtle', 1.173413501750808),
 ('rare', 1.1566438362402944),
 ('loved', 1.1563661500586044),
 ('highly', 1.1420208631618658),
 ('tony', 1.139749194228599),
 ('today', 1.1050431789984),
 ('awesome', 1.0931328229034842),
 ('unique', 1.0881409888008142),
 ('beauty', 1.050410186850232),
 ('fascinating', 1.0414538748281612),
 ('greatest', 1.0248947127715422),
 ('portrayal', 1.0189810189761024),
 ('incredible', 1.0061677561461084),
 ('harry', 0.9917691930500606),
 ('sweet', 0.9896611048795548),
 ('oscar', 0.9872190511104971),
 ('complex', 0.977618977381478),
 ('solid', 0.9753796482441615),
 ('beautiful', 0.97

In [49]:
# words most frequently seen in a review with a "NEGATIVE" label
list(reversed(pos_neg_ratios.most_common()))[0:30]

[('boll', -4.969813299576001),
 ('uwe', -4.624972813284271),
 ('seagal', -3.644143560272545),
 ('unwatchable', -3.258096538021482),
 ('mst', -2.9502698994772336),
 ('incoherent', -2.9368917735310576),
 ('unfunny', -2.6922395950755678),
 ('waste', -2.6193845640165536),
 ('blah', -2.5704288232261625),
 ('horrid', -2.4849066497880004),
 ('pointless', -2.4553061800117097),
 ('atrocious', -2.4259083090260445),
 ('redeeming', -2.3682390632154826),
 ('prom', -2.3608540011180215),
 ('drivel', -2.3470368555648795),
 ('lousy', -2.307572634505085),
 ('worst', -2.286987896180378),
 ('laughable', -2.264363880173848),
 ('awful', -2.227194247027435),
 ('poorly', -2.2207550747464135),
 ('wasting', -2.204604684633842),
 ('remotely', -2.1972245773362196),
 ('existent', -2.0794415416798357),
 ('boredom', -1.995100393246085),
 ('miserably', -1.9924301646902063),
 ('sucks', -1.987068221548821),
 ('uninspired', -1.9832976811269336),
 ('lame', -1.981767458946166),
 ('insult', -1.978345424808467),
 ('unintere
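As a side note, the same entries can be obtained without building a reversed copy of the whole list, either by slicing from the end or with `heapq.nsmallest`. This is only a sketch on a toy `Counter` (the words and scores are made up); the notebook's own approach works equally well.

```python
from collections import Counter
import heapq

# Toy log-ratios, made up for illustration
toy_ratios = Counter({'superb': 1.7, 'fine': 0.1, 'lame': -1.9,
                      'awful': -2.2, 'boll': -4.9})

n = 3
via_reversed = list(reversed(toy_ratios.most_common()))[0:n]   # approach used in the notebook
via_slice = toy_ratios.most_common()[:-n - 1:-1]               # slice the tail in reverse order
via_heap = heapq.nsmallest(n, toy_ratios.items(), key=lambda kv: kv[1])

print(via_slice)
```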

# Transforming Text into Numbers

## Creating the Input/Output Data

In [52]:
# Create "vocab", a view of all the distinct words from all of the reviews
vocab = total_counts.keys()

vocab_size = len(vocab)
print(vocab_size)

74074


We now create a numpy array called `layer_0` and initialize it to all zeros. 

In [53]:
# Create layer_0 matrix with dimensions 1 by vocab_size, initially filled with zeros
layer_0 = np.zeros((1, vocab_size))

layer_0.shape

(1, 74074)

`layer_0` contains one entry for every word in the vocabulary. We need to make sure we know the index of each word, so we run the following cell to create a lookup table that stores the index of every word.

In [54]:
# Create a dictionary of words in the vocabulary mapped to index positions
# (to be used in layer_0)
word2index = {}
for i,word in enumerate(vocab):
    word2index[word] = i
    
# display the map of words to indices
word2index

{'bromwell': 0,
 'high': 1,
 'is': 2,
 'a': 3,
 'cartoon': 4,
 'comedy': 5,
 '.': 6,
 'it': 7,
 'ran': 8,
 'at': 9,
 'the': 10,
 'same': 11,
 'time': 12,
 'as': 13,
 'some': 14,
 'other': 15,
 'programs': 16,
 'about': 17,
 'school': 18,
 'life': 19,
 '': 20,
 'such': 21,
 'teachers': 22,
 'my': 23,
 'years': 24,
 'in': 25,
 'teaching': 26,
 'profession': 27,
 'lead': 28,
 'me': 29,
 'to': 30,
 'believe': 31,
 'that': 32,
 's': 33,
 'satire': 34,
 'much': 35,
 'closer': 36,
 'reality': 37,
 'than': 38,
 'scramble': 39,
 'survive': 40,
 'financially': 41,
 'insightful': 42,
 'students': 43,
 'who': 44,
 'can': 45,
 'see': 46,
 'right': 47,
 'through': 48,
 'their': 49,
 'pathetic': 50,
 'pomp': 51,
 'pettiness': 52,
 'of': 53,
 'whole': 54,
 'situation': 55,
 'all': 56,
 'remind': 57,
 'schools': 58,
 'i': 59,
 'knew': 60,
 'and': 61,
 'when': 62,
 'saw': 63,
 'episode': 64,
 'which': 65,
 'student': 66,
 'repeatedly': 67,
 'tried': 68,
 'burn': 69,
 'down': 70,
 'immediately': 71,
 're
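The lookup table makes it cheap to go from a word to its slot in `layer_0`. A minimal sketch on a toy vocabulary (the `toy_vocab` list and the `toy_` names are hypothetical), showing the same mapping written as a dict comprehension plus an inverse list for going from an index back to a word:

```python
# Hypothetical three-word vocabulary, standing in for the real `vocab`
toy_vocab = ['bromwell', 'high', 'is']

# Same mapping as the loop above, written as a dict comprehension
toy_word2index = {word: i for i, word in enumerate(toy_vocab)}

# Inverse lookup: position i holds the word whose index is i
toy_index2word = list(toy_vocab)

print(toy_word2index['high'])
print(toy_index2word[toy_word2index['high']])
```

Because `vocab` is a view over the keys of `total_counts`, and dictionaries preserve insertion order in Python 3.7 and later, the indices follow the order in which words were first counted. That is why the first words of the first review receive the lowest indices.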

We now create a function named `update_input_layer`. It counts how many times each word is used in the given review and then stores those counts at the appropriate indices inside `layer_0`.

In [55]:
def update_input_layer(review):
    global layer_0
    # clearing out previous state by resetting the layer to be all 0s
    layer_0 *= 0
    
    # counting how many times each word is used in the given review and store the results in layer_0
    lc = Counter()
    words=review.split(' ')
    for w in words: 
        lc.update({w: 1})
    for word in words:
        layer_0[0][word2index[word]]=lc[word]

We run the following cell to test updating the input layer with the first review.  

In [56]:
update_input_layer(reviews[0])
layer_0

array([[4., 5., 4., ..., 0., 0., 0.]])
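For comparison, here is a sketch of the same idea without the `global` state: a function that builds and returns a fresh count vector. The name `review_to_counts` and the toy vocabulary are assumptions for illustration. The notebook's version reuses a single `layer_0` array instead, which avoids reallocating it for every review.

```python
from collections import Counter
import numpy as np

def review_to_counts(review, word2index):
    """Return a 1 x vocab_size array of word counts for `review`."""
    layer = np.zeros((1, len(word2index)))
    for word, count in Counter(review.split(' ')).items():
        if word in word2index:          # skip words outside the vocabulary
            layer[0, word2index[word]] = count
    return layer

# Hypothetical toy vocabulary
toy_word2index = {'good': 0, 'bad': 1, 'movie': 2}
vec = review_to_counts('good good movie', toy_word2index)
print(vec)
```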

Next we implement the function `get_target_for_label`. It returns `0` or `1`, depending on whether the given label is `NEGATIVE` or `POSITIVE`, respectively.

In [57]:
def get_target_for_label(label):
    if label=='NEGATIVE':
        return 0
    else:
        return 1

In [58]:
labels[0]

'POSITIVE'

In [59]:
get_target_for_label(labels[0])

1

In [60]:
labels[1]

'NEGATIVE'

In [61]:
get_target_for_label(labels[1])

0
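Note that the function above maps every label other than `NEGATIVE` to `1`, including misspelled ones. If that is a concern, a stricter variant could look like the sketch below; `LABEL_TO_TARGET` and `strict_target_for_label` are hypothetical names, not part of the notebook.

```python
# Hypothetical stricter variant: unknown labels raise instead of mapping to 1
LABEL_TO_TARGET = {'NEGATIVE': 0, 'POSITIVE': 1}

def strict_target_for_label(label):
    """Return 0 for 'NEGATIVE', 1 for 'POSITIVE'; raise KeyError otherwise."""
    return LABEL_TO_TARGET[label]

print(strict_target_for_label('POSITIVE'))
print(strict_target_for_label('NEGATIVE'))
```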

# Building the Neural Network

In this section we implement the following:
- Creation of a neural network with an input layer, a hidden layer, and an output layer. 
- `pre_process_data` function to create the vocabulary for our training data generating functions.
- `train` function that trains over the entire dataset.

In [72]:
# Read one review per line, stripping only the trailing newline
with open('reviews.txt', 'r') as g:  # What we know!
    reviews = [line.rstrip('\n') for line in g]

# Read one label per line and normalize it to upper case
with open('labels.txt', 'r') as g:  # What we WANT to know!
    labels = [line.rstrip('\n').upper() for line in g]

import time
import sys
import numpy as np

# neural network class
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1): 
        np.random.seed(1)
        self.pre_process_data(reviews, labels)
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):
        review_vocab = set()
        for review in reviews:
            for word in review.split(' '):
                review_vocab.add(word)
                
        # Converting the vocabulary set to a list so we can access words via indices
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        # populating label_vocab with all of the words in the given labels.
        for label in labels:
            label_vocab.add(label)
        # Converting the label vocabulary set to a list so we can access labels via indices
        self.label_vocab = list(label_vocab)
        
        # Storing the sizes of the review and label vocabularies.
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        # Creating a dictionary of words in the vocabulary mapped to index positions
        self.word2index = {}
        # populating self.word2index with indices for all the words in self.review_vocab
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        # Creating a dictionary of labels mapped to index positions
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Storing the number of nodes in input, hidden, and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Storing the learning rate
        self.learning_rate = learning_rate

        # Initialize weights between the input layer and the hidden layer
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
        
        # Initialize weights between the hidden layer and the output layer
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        # Creating the input layer, a two-dimensional matrix with shape 
        #   1 x input_nodes, with all values initialized to zero
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):
        self.layer_0 *= 0

        # counting how many times each word is used in the given review and store the results in layer_0
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] += 1
                
    def get_target_for_label(self,label):
        if label=='NEGATIVE':
            return 0
        else:
            return 1
        
    def sigmoid(self,x):
        return 1/(1+np.exp(-x))
    
    def sigmoid_output_2_derivative(self,output): 
        return output*(1-output)

    def train(self, training_reviews, training_labels):
        
        # checking we have a matching number of reviews and labels
        assert(len(training_reviews) == len(training_labels))
        
        # Keeping track of correct predictions to display accuracy during training 
        correct_so_far = 0
        
        # Remembering the time when we started for printing time statistics
        start = time.time()

        # looping through all the given reviews and run a forward and backward pass,
        # updating weights for every item
        for i in range(len(training_reviews)):
            
            # Getting the next review and its correct label
            review = training_reviews[i]
            label = training_labels[i]
            # Implementing the forward pass through the network. 
            # That means use the given review to update the input layer, 
            # then calculate values for the hidden layer,
            # and finally calculate the output layer.
            # Input Layer
            self.update_input_layer(review)
            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)
            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
            # Implementing the back propagation pass here. 
            # That means calculate the error for the forward pass's prediction
            # and update the weights in the network according to their
            # contributions toward the error, as calculated via the
            # gradient descent and back propagation algorithms you 
            # learned in class.
            ### Backward pass ###
            
            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label)
            # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error
            # hidden layer gradients - no nonlinearity so it's the same as the error

            # Updating the weights
            
            # update hidden-to-output weights with gradient descent step
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate
            # update input-to-hidden weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate
            # Count the prediction as correct when the output falls on the
            # label's side of the 0.5 threshold.
            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            elif(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            # printing out our prediction accuracy and speed 
            # throughout the training process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) \
                             + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        # keeping track of how many correct predictions we make
        correct = 0

        # we'll time how many predictions per second we make
        start = time.time()

        # Looping through each of the given reviews and calling run to predict
        # its label. 
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            # printing out the prediction accuracy and speed 
            # throughout the prediction process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) \
                             + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
            if(i % 100 == 0):
                print("")
    
    def run(self, review):
         # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"
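To make the shapes in `train` easier to follow, the sketch below runs one forward and one backward pass on a tiny network with made-up sizes (4 input nodes, 2 hidden nodes, 1 output). It mirrors the update rules in the class above but is a standalone illustration under assumed values, not a replacement for the class.

```python
import numpy as np

rng = np.random.default_rng(1)
input_nodes, hidden_nodes, output_nodes = 4, 2, 1
learning_rate = 0.1

weights_0_1 = np.zeros((input_nodes, hidden_nodes))
weights_1_2 = rng.normal(0.0, output_nodes ** -0.5, (hidden_nodes, output_nodes))

layer_0 = np.array([[2.0, 0.0, 1.0, 0.0]])   # toy word counts, shape (1, 4)
target = 1                                     # POSITIVE

# Forward pass: linear hidden layer, sigmoid output
layer_1 = layer_0.dot(weights_0_1)                       # shape (1, 2)
layer_2 = 1 / (1 + np.exp(-layer_1.dot(weights_1_2)))    # shape (1, 1)

# Backward pass
layer_2_delta = (layer_2 - target) * layer_2 * (1 - layer_2)   # shape (1, 1)
layer_1_delta = layer_2_delta.dot(weights_1_2.T)               # shape (1, 2)

# Gradient descent step
weights_1_2 -= layer_1.T.dot(layer_2_delta) * learning_rate    # shape (2, 1)
weights_0_1 -= layer_0.T.dot(layer_1_delta) * learning_rate    # shape (4, 2)

print(layer_2.shape, weights_0_1.shape, weights_1_2.shape)
```

Only the rows of `weights_0_1` that correspond to words present in the review change, because every other input is zero and contributes nothing to the gradient.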


In the following cell we create a `SentimentNetwork` that will train on all but the last 1000 reviews (we're saving those for testing). Here we use a learning rate of `0.001`.

In [70]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)

We run the following cell to test the network's performance against the last 1000 reviews (the ones we held out from our training set). 

**We have not trained the model yet, so the results should be about 50% as it will just be guessing and there are only two possible values to choose from.**

In [71]:
mlp.test(reviews[-1000:],labels[-1000:])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Tested:1 Testing Accuracy:100.%
Progress:10.0% Speed(reviews/sec):407.2 #Correct:51 #Tested:101 Testing Accuracy:50.4%
Progress:20.0% Speed(reviews/sec):432.7 #Correct:101 #Tested:201 Testing Accuracy:50.2%
Progress:30.0% Speed(reviews/sec):441.2 #Correct:151 #Tested:301 Testing Accuracy:50.1%
Progress:40.0% Speed(reviews/sec):447.2 #Correct:201 #Tested:401 Testing Accuracy:50.1%
Progress:50.0% Speed(reviews/sec):441.2 #Correct:251 #Tested:501 Testing Accuracy:50.0%
Progress:60.0% Speed(reviews/sec):446.6 #Correct:301 #Tested:601 Testing Accuracy:50.0%
Progress:70.0% Speed(reviews/sec):446.2 #Correct:351 #Tested:701 Testing Accuracy:50.0%
Progress:80.0% Speed(reviews/sec):450.3 #Correct:401 #Tested:801 Testing Accuracy:50.0%
Progress:90.0% Speed(reviews/sec):451.5 #Correct:451 #Tested:901 Testing Accuracy:50.0%
Progress:99.9% Speed(reviews/sec):451.9 #Correct:500 #Tested:1000 Testing Accuracy:50.0%
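The exact 50% can be traced to the initialization: `weights_0_1` starts as all zeros, so the hidden layer is zero for every review and the sigmoid output is exactly 0.5, which `run` maps to `POSITIVE`. A small sketch with made-up sizes (the input vector and the `weights_1_2` values are arbitrary assumptions):

```python
import numpy as np

layer_0 = np.array([[3.0, 0.0, 1.0]])      # arbitrary toy word counts
weights_0_1 = np.zeros((3, 2))              # zero-initialized, as in init_network
weights_1_2 = np.array([[0.7], [-1.3]])     # arbitrary values; they do not matter here

layer_1 = layer_0.dot(weights_0_1)          # all zeros
layer_2 = 1 / (1 + np.exp(-layer_1.dot(weights_1_2)))

prediction = "POSITIVE" if layer_2[0, 0] >= 0.5 else "NEGATIVE"
print(layer_2, prediction)
```

On a held-out set with an even split of labels, always predicting `POSITIVE` gives 50% accuracy, which is consistent with the 500 correct out of 1000 reported above.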

We run the following cell to actually train the network. During training, it will display the model's accuracy repeatedly as it trains so we can see how well it's doing.

In [73]:
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):59.40 #Correct:1258 #Trained:2501 Training Accuracy:50.2%
Progress:20.8% Speed(reviews/sec):58.94 #Correct:2583 #Trained:5001 Training Accuracy:51.6%
Progress:31.2% Speed(reviews/sec):58.83 #Correct:3942 #Trained:7501 Training Accuracy:52.5%
Progress:41.6% Speed(reviews/sec):59.32 #Correct:5441 #Trained:10001 Training Accuracy:54.4%
Progress:52.0% Speed(reviews/sec):58.65 #Correct:7009 #Trained:12501 Training Accuracy:56.0%
Progress:62.5% Speed(reviews/sec):58.99 #Correct:8613 #Trained:15001 Training Accuracy:57.4%
Progress:72.9% Speed(reviews/sec):58.64 #Correct:10179 #Trained:17501 Training Accuracy:58.1%
Progress:83.3% Speed(reviews/sec):58.65 #Correct:11844 #Trained:20001 Training Accuracy:59.2%
Progress:93.7% Speed(reviews/sec):61.24 #Correct:13538 #Trained:22501 Training Accuracy:60.1%
Progress:99.9% Speed(reviews/sec):62.65 #Correct:14621 #Trained:24000 Training A

In [74]:
mlp.test(reviews[-1000:],labels[-1000:])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Tested:1 Testing Accuracy:100.%
Progress:0.1% Speed(reviews/sec):166.2 #Correct:1 #Tested:2 Testing Accuracy:50.0%
Progress:0.2% Speed(reviews/sec):249.2 #Correct:1 #Tested:3 Testing Accuracy:33.3%
Progress:0.3% Speed(reviews/sec):332.3 #Correct:2 #Tested:4 Testing Accuracy:50.0%
Progress:0.4% Speed(reviews/sec):362.7 #Correct:3 #Tested:5 Testing Accuracy:60.0%
Progress:0.5% Speed(reviews/sec):415.4 #Correct:4 #Tested:6 Testing Accuracy:66.6%
Progress:0.6% Speed(reviews/sec):427.3 #Correct:5 #Tested:7 Testing Accuracy:71.4%
Progress:0.7% Speed(reviews/sec):465.3 #Correct:6 #Tested:8 Testing Accuracy:75.0%
Progress:0.8% Speed(reviews/sec):469.2 #Correct:7 #Tested:9 Testing Accuracy:77.7%
Progress:0.9% Speed(reviews/sec):448.7 #Correct:7 #Tested:10 Testing Accuracy:70.0%
Progress:1.0% Speed(reviews/sec):453.3 #Correct:8 #Tested:11 Testing Accuracy:72.7%
Progress:1.1% Speed(reviews/sec):476.9 #Correct:9 #Tested:12 Testing Accuracy:75.0%
P

Progress:10.1% Speed(reviews/sec):524.6 #Correct:73 #Tested:102 Testing Accuracy:71.5%
Progress:10.2% Speed(reviews/sec):524.3 #Correct:74 #Tested:103 Testing Accuracy:71.8%
Progress:10.3% Speed(reviews/sec):524.1 #Correct:74 #Tested:104 Testing Accuracy:71.1%
Progress:10.4% Speed(reviews/sec):526.5 #Correct:75 #Tested:105 Testing Accuracy:71.4%
Progress:10.5% Speed(reviews/sec):528.8 #Correct:76 #Tested:106 Testing Accuracy:71.6%
Progress:10.6% Speed(reviews/sec):528.5 #Correct:77 #Tested:107 Testing Accuracy:71.9%
Progress:10.7% Speed(reviews/sec):530.9 #Correct:77 #Tested:108 Testing Accuracy:71.2%
Progress:10.8% Speed(reviews/sec):525.4 #Correct:78 #Tested:109 Testing Accuracy:71.5%
Progress:10.9% Speed(reviews/sec):525.1 #Correct:78 #Tested:110 Testing Accuracy:70.9%
Progress:11.0% Speed(reviews/sec):524.9 #Correct:79 #Tested:111 Testing Accuracy:71.1%
Progress:11.1% Speed(reviews/sec):524.6 #Correct:79 #Tested:112 Testing Accuracy:70.5%
Progress:11.2% Speed(reviews/sec):517.1 #C

Progress:20.1% Speed(reviews/sec):532.3 #Correct:135 #Tested:202 Testing Accuracy:66.8%
Progress:20.2% Speed(reviews/sec):532.1 #Correct:136 #Tested:203 Testing Accuracy:66.9%
Progress:20.3% Speed(reviews/sec):532.0 #Correct:136 #Tested:204 Testing Accuracy:66.6%
Progress:20.4% Speed(reviews/sec):533.2 #Correct:137 #Tested:205 Testing Accuracy:66.8%
Progress:20.5% Speed(reviews/sec):533.0 #Correct:137 #Tested:206 Testing Accuracy:66.5%
Progress:20.6% Speed(reviews/sec):532.8 #Correct:138 #Tested:207 Testing Accuracy:66.6%
Progress:20.7% Speed(reviews/sec):532.6 #Correct:138 #Tested:208 Testing Accuracy:66.3%
Progress:20.8% Speed(reviews/sec):533.8 #Correct:139 #Tested:209 Testing Accuracy:66.5%
Progress:20.9% Speed(reviews/sec):533.7 #Correct:139 #Tested:210 Testing Accuracy:66.1%
Progress:21.0% Speed(reviews/sec):533.5 #Correct:140 #Tested:211 Testing Accuracy:66.3%
Progress:21.1% Speed(reviews/sec):533.3 #Correct:141 #Tested:212 Testing Accuracy:66.5%
Progress:21.2% Speed(reviews/se

Progress:30.1% Speed(reviews/sec):531.7 #Correct:198 #Tested:302 Testing Accuracy:65.5%
Progress:30.2% Speed(reviews/sec):531.6 #Correct:199 #Tested:303 Testing Accuracy:65.6%
Progress:30.3% Speed(reviews/sec):531.5 #Correct:199 #Tested:304 Testing Accuracy:65.4%
Progress:30.4% Speed(reviews/sec):532.3 #Correct:200 #Tested:305 Testing Accuracy:65.5%
Progress:30.5% Speed(reviews/sec):533.1 #Correct:200 #Tested:306 Testing Accuracy:65.3%
Progress:30.6% Speed(reviews/sec):533.0 #Correct:201 #Tested:307 Testing Accuracy:65.4%
Progress:30.7% Speed(reviews/sec):532.8 #Correct:201 #Tested:308 Testing Accuracy:65.2%
Progress:30.8% Speed(reviews/sec):533.7 #Correct:202 #Tested:309 Testing Accuracy:65.3%
Progress:30.9% Speed(reviews/sec):533.5 #Correct:202 #Tested:310 Testing Accuracy:65.1%
Progress:31.0% Speed(reviews/sec):534.3 #Correct:203 #Tested:311 Testing Accuracy:65.2%
Progress:31.1% Speed(reviews/sec):535.1 #Correct:203 #Tested:312 Testing Accuracy:65.0%
Progress:31.2% Speed(reviews/se

Progress:40.1% Speed(reviews/sec):557.3 #Correct:254 #Tested:402 Testing Accuracy:63.1%
Progress:40.2% Speed(reviews/sec):557.9 #Correct:255 #Tested:403 Testing Accuracy:63.2%
Progress:40.3% Speed(reviews/sec):558.5 #Correct:255 #Tested:404 Testing Accuracy:63.1%
Progress:40.4% Speed(reviews/sec):558.4 #Correct:256 #Tested:405 Testing Accuracy:63.2%
Progress:40.5% Speed(reviews/sec):559.0 #Correct:256 #Tested:406 Testing Accuracy:63.0%
Progress:40.6% Speed(reviews/sec):559.6 #Correct:257 #Tested:407 Testing Accuracy:63.1%
Progress:40.7% Speed(reviews/sec):560.2 #Correct:257 #Tested:408 Testing Accuracy:62.9%
Progress:40.8% Speed(reviews/sec):560.0 #Correct:258 #Tested:409 Testing Accuracy:63.0%
Progress:40.9% Speed(reviews/sec):560.6 #Correct:259 #Tested:410 Testing Accuracy:63.1%
Progress:41.0% Speed(reviews/sec):561.2 #Correct:260 #Tested:411 Testing Accuracy:63.2%
Progress:41.1% Speed(reviews/sec):561.0 #Correct:260 #Tested:412 Testing Accuracy:63.1%
Progress:41.2% Speed(reviews/se

Progress:60.1% Speed(reviews/sec):597.8 #Correct:382 #Tested:602 Testing Accuracy:63.4%
Progress:60.2% Speed(reviews/sec):598.2 #Correct:382 #Tested:603 Testing Accuracy:63.3%
Progress:60.3% Speed(reviews/sec):598.6 #Correct:383 #Tested:604 Testing Accuracy:63.4%
Progress:60.4% Speed(reviews/sec):599.0 #Correct:384 #Tested:605 Testing Accuracy:63.4%
Progress:60.5% Speed(reviews/sec):598.2 #Correct:384 #Tested:606 Testing Accuracy:63.3%
Progress:60.6% Speed(reviews/sec):598.0 #Correct:384 #Tested:607 Testing Accuracy:63.2%
Progress:60.7% Speed(reviews/sec):597.8 #Correct:384 #Tested:608 Testing Accuracy:63.1%
Progress:60.8% Speed(reviews/sec):597.6 #Correct:385 #Tested:609 Testing Accuracy:63.2%
Progress:60.9% Speed(reviews/sec):597.4 #Correct:386 #Tested:610 Testing Accuracy:63.2%
Progress:61.0% Speed(reviews/sec):597.8 #Correct:386 #Tested:611 Testing Accuracy:63.1%
Progress:61.1% Speed(reviews/sec):597.6 #Correct:386 #Tested:612 Testing Accuracy:63.0%
Progress:61.2% Speed(reviews/se

Progress:70.1% Speed(reviews/sec):610.8 #Correct:444 #Tested:702 Testing Accuracy:63.2%
Progress:70.2% Speed(reviews/sec):611.1 #Correct:445 #Tested:703 Testing Accuracy:63.3%
Progress:70.3% Speed(reviews/sec):610.9 #Correct:445 #Tested:704 Testing Accuracy:63.2%
Progress:70.4% Speed(reviews/sec):611.3 #Correct:446 #Tested:705 Testing Accuracy:63.2%
Progress:70.5% Speed(reviews/sec):611.1 #Correct:446 #Tested:706 Testing Accuracy:63.1%
Progress:70.6% Speed(reviews/sec):611.4 #Correct:447 #Tested:707 Testing Accuracy:63.2%
Progress:70.7% Speed(reviews/sec):611.2 #Correct:448 #Tested:708 Testing Accuracy:63.2%
Progress:70.8% Speed(reviews/sec):611.5 #Correct:449 #Tested:709 Testing Accuracy:63.3%
Progress:70.9% Speed(reviews/sec):611.3 #Correct:449 #Tested:710 Testing Accuracy:63.2%
Progress:71.0% Speed(reviews/sec):611.7 #Correct:450 #Tested:711 Testing Accuracy:63.2%
Progress:71.1% Speed(reviews/sec):611.5 #Correct:451 #Tested:712 Testing Accuracy:63.3%
Progress:71.2% Speed(reviews/se

Progress:80.1% Speed(reviews/sec):603.4 #Correct:505 #Tested:802 Testing Accuracy:62.9%Progress:80.2% Speed(reviews/sec):603.7 #Correct:506 #Tested:803 Testing Accuracy:63.0%Progress:80.3% Speed(reviews/sec):603.9 #Correct:506 #Tested:804 Testing Accuracy:62.9%Progress:80.4% Speed(reviews/sec):604.2 #Correct:507 #Tested:805 Testing Accuracy:62.9%Progress:80.5% Speed(reviews/sec):604.5 #Correct:507 #Tested:806 Testing Accuracy:62.9%Progress:80.6% Speed(reviews/sec):604.8 #Correct:508 #Tested:807 Testing Accuracy:62.9%Progress:80.7% Speed(reviews/sec):604.7 #Correct:508 #Tested:808 Testing Accuracy:62.8%Progress:80.8% Speed(reviews/sec):604.1 #Correct:509 #Tested:809 Testing Accuracy:62.9%Progress:80.9% Speed(reviews/sec):603.5 #Correct:509 #Tested:810 Testing Accuracy:62.8%Progress:81.0% Speed(reviews/sec):603.8 #Correct:510 #Tested:811 Testing Accuracy:62.8%Progress:81.1% Speed(reviews/sec):603.6 #Correct:510 #Tested:812 Testing Accuracy:62.8%Progress:81.2% Speed(reviews/se

Progress:90.1% Speed(reviews/sec):601.0 #Correct:563 #Tested:902 Testing Accuracy:62.4%Progress:90.2% Speed(reviews/sec):600.9 #Correct:564 #Tested:903 Testing Accuracy:62.4%Progress:90.3% Speed(reviews/sec):601.2 #Correct:564 #Tested:904 Testing Accuracy:62.3%Progress:90.4% Speed(reviews/sec):600.2 #Correct:565 #Tested:905 Testing Accuracy:62.4%Progress:90.5% Speed(reviews/sec):600.5 #Correct:565 #Tested:906 Testing Accuracy:62.3%Progress:90.6% Speed(reviews/sec):600.8 #Correct:566 #Tested:907 Testing Accuracy:62.4%Progress:90.7% Speed(reviews/sec):600.6 #Correct:567 #Tested:908 Testing Accuracy:62.4%Progress:90.8% Speed(reviews/sec):600.9 #Correct:568 #Tested:909 Testing Accuracy:62.4%Progress:90.9% Speed(reviews/sec):601.2 #Correct:568 #Tested:910 Testing Accuracy:62.4%Progress:91.0% Speed(reviews/sec):601.0 #Correct:569 #Tested:911 Testing Accuracy:62.4%Progress:91.1% Speed(reviews/sec):601.3 #Correct:569 #Tested:912 Testing Accuracy:62.3%Progress:91.2% Speed(reviews/se

# Reducing Noise in Our Input Data
* Modify `update_input_layer` so that it no longer counts how many times each word is used, and instead only records whether or not a word was used.
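To make the difference concrete, here is a minimal sketch that builds both versions of the input vector for one short review. The toy vocabulary, the review text and the helper names `count_vector` and `binary_vector` are made up for illustration and are not part of the project code.

```python
import numpy as np

# Hypothetical toy vocabulary and review, for illustration only
vocab = ["the", "movie", "was", "excellent", "."]
word2index = {word: i for i, word in enumerate(vocab)}
review = "the movie was excellent . the the the"

def count_vector(review, word2index):
    """Count-based input: each entry holds how often the word appears."""
    layer_0 = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        if word in word2index:
            layer_0[0][word2index[word]] += 1
    return layer_0

def binary_vector(review, word2index):
    """Presence-based input: each entry is 1 if the word appears at all."""
    layer_0 = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        if word in word2index:
            layer_0[0][word2index[word]] = 1
    return layer_0

counts = count_vector(review, word2index)
binary = binary_vector(review, word2index)
print(counts)  # "the" dominates the count-based vector
print(binary)  # every word that appears contributes equally
```

In the count-based vector a frequent filler word such as "the" gets the largest input value, while in the presence-based vector it weighs the same as "excellent".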

The following code is the same as in the previous project, with project-specific changes marked with "New".

In [5]:
g = open('reviews.txt','r') # What we know!
reviews = list(map(lambda x:x[:-1],g.readlines()))
g.close()

g = open('labels.txt','r') # What we WANT to know!
labels = list(map(lambda x:x[:-1].upper(),g.readlines()))
g.close()

import time
import sys
import numpy as np

# neural network class
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1): 
        np.random.seed(1)
        self.pre_process_data(reviews, labels)
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):
        review_vocab = set()
        for review in reviews:
            for word in review.split(' '):
                review_vocab.add(word)
                
        # Converting the vocabulary set to a list so we can access words via indices
        self.review_vocab = list(review_vocab)
        
        label_vocab = set()
        # populating label_vocab with all of the words in the given labels.
        for label in labels:
            label_vocab.add(label)
        # Converting the label vocabulary set to a list so we can access labels via indices
        self.label_vocab = list(label_vocab)
        
        # Storing the sizes of the review and label vocabularies.
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        # Creating a dictionary of words in the vocabulary mapped to index positions
        self.word2index = {}
        # populating self.word2index with indices for all the words in self.review_vocab
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        # Creating a dictionary of labels mapped to index positions
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
        
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Storing the number of nodes in input, hidden, and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Storing the learning rate
        self.learning_rate = learning_rate

        # Initialize weights between the input layer and the hidden layer
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))
        
        # Initialize weights between the hidden layer and the output layer
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5, 
                                                (self.hidden_nodes, self.output_nodes))
        
        # Creating the input layer, a two-dimensional matrix with shape 
        #   1 x input_nodes, with all values initialized to zero
        self.layer_0 = np.zeros((1,input_nodes))
    
        
    def update_input_layer(self,review):
        self.layer_0 *= 0

        ## New : record only whether each word appears in the given review (set its
        #        entry in layer_0 to 1) instead of counting how many times it is used
        for word in review.split(" "):
            if word in self.word2index:
                self.layer_0[0][self.word2index[word]] = 1
                
    def get_target_for_label(self,label):
        if label=='NEGATIVE':
            return 0
        else:
            return 1
        
    def sigmoid(self,x):
        return 1/(1+np.exp(-x))
    
    def sigmoid_output_2_derivative(self,output): 
        return output*(1-output)

    def train(self, training_reviews, training_labels):
        
        # checking we have a matching number of reviews and labels
        assert(len(training_reviews) == len(training_labels))
        
        # Keeping track of correct predictions to display accuracy during training 
        correct_so_far = 0
        
        # Remembering the time when we started for printing time statistics
        start = time.time()

        # looping through all the given reviews and run a forward and backward pass,
        # updating weights for every item
        for i in range(len(training_reviews)):
            
            # Getting the next review and its correct label
            review = training_reviews[i]
            label = training_labels[i]
            # Implementing the forward pass through the network. 
            # That means use the given review to update the input layer, 
            # then calculate values for the hidden layer,
            # and finally calculate the output layer.
            # Input Layer
            self.update_input_layer(review)
            # Hidden layer
            layer_1 = self.layer_0.dot(self.weights_0_1)
            # Output layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
            # Implementing the back propagation pass here. 
            # That means calculate the error for the forward pass's prediction
            # and update the weights in the network according to their
            # contributions toward the error, as calculated via the
            # gradient descent and back propagation algorithms you 
            # learned in class.
            ### Backward pass ###
            
            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label)
            # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error
            # hidden layer gradients - no nonlinearity so it's the same as the error

            # Updating the weights
            
            # update hidden-to-output weights with gradient descent step
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate
            # update input-to-hidden weights with gradient descent step
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate
            # The prediction is correct when the output falls on the same side
            # of 0.5 as the label (equivalently, the absolute value of the output
            # error is less than 0.5). If so, add one to the correct_so_far count.
            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            elif(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            # printing out our prediction accuracy and speed 
            # throughout the training process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) \
                             + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        # keeping track of how many correct predictions we make
        correct = 0

        # we'll time how many predictions per second we make
        start = time.time()

        # Looping through each of the given reviews and calling run to predict
        # its label. 
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            # printing out the prediction accuracy and speed 
            # throughout the prediction process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) \
                             + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
            if(i % 100 == 0):
                print("")
    
    def run(self, review):
        # Input Layer
        self.update_input_layer(review.lower())

        # Hidden layer
        layer_1 = self.layer_0.dot(self.weights_0_1)

        # Output layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE"


In [6]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):103.0 #Correct:1787 #Trained:2501 Training Accuracy:71.4%
Progress:20.8% Speed(reviews/sec):103.5 #Correct:3788 #Trained:5001 Training Accuracy:75.7%
Progress:31.2% Speed(reviews/sec):102.7 #Correct:5892 #Trained:7501 Training Accuracy:78.5%
Progress:41.6% Speed(reviews/sec):101.1 #Correct:8029 #Trained:10001 Training Accuracy:80.2%
Progress:52.0% Speed(reviews/sec):100.8 #Correct:10158 #Trained:12501 Training Accuracy:81.2%
Progress:62.5% Speed(reviews/sec):100.6 #Correct:12292 #Trained:15001 Training Accuracy:81.9%
Progress:72.9% Speed(reviews/sec):100.7 #Correct:14408 #Trained:17501 Training Accuracy:82.3%
Progress:83.3% Speed(reviews/sec):100.6 #Correct:16592 #Trained:20001 Training Accuracy:82.9%
Progress:93.7% Speed(reviews/sec):100.6 #Correct:18771 #Trained:22501 Training Accuracy:83.4%
Progress:99.9% Speed(reviews/sec):100.4 #Correct:20092 #Trained:24000 Training

In [8]:
mlp.test(reviews[-1000:],labels[-1000:])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Tested:1 Testing Accuracy:100.%
Progress:0.1% Speed(reviews/sec):142.4 #Correct:1 #Tested:2 Testing Accuracy:50.0%Progress:0.2% Speed(reviews/sec):249.3 #Correct:2 #Tested:3 Testing Accuracy:66.6%Progress:0.3% Speed(reviews/sec):299.1 #Correct:3 #Tested:4 Testing Accuracy:75.0%Progress:0.4% Speed(reviews/sec):362.6 #Correct:4 #Tested:5 Testing Accuracy:80.0%Progress:0.5% Speed(reviews/sec):383.5 #Correct:5 #Tested:6 Testing Accuracy:83.3%Progress:0.6% Speed(reviews/sec):398.9 #Correct:6 #Tested:7 Testing Accuracy:85.7%Progress:0.7% Speed(reviews/sec):436.3 #Correct:7 #Tested:8 Testing Accuracy:87.5%Progress:0.8% Speed(reviews/sec):443.1 #Correct:8 #Tested:9 Testing Accuracy:88.8%Progress:0.9% Speed(reviews/sec):427.4 #Correct:9 #Tested:10 Testing Accuracy:90.0%Progress:1.0% Speed(reviews/sec):433.6 #Correct:10 #Tested:11 Testing Accuracy:90.9%Progress:1.1% Speed(reviews/sec):457.0 #Correct:11 #Tested:12 Testing Accuracy:91.6%

Progress:10.1% Speed(reviews/sec):620.6 #Correct:89 #Tested:102 Testing Accuracy:87.2%Progress:10.2% Speed(reviews/sec):622.9 #Correct:90 #Tested:103 Testing Accuracy:87.3%Progress:10.3% Speed(reviews/sec):625.2 #Correct:90 #Tested:104 Testing Accuracy:86.5%Progress:10.4% Speed(reviews/sec):627.4 #Correct:91 #Tested:105 Testing Accuracy:86.6%Progress:10.5% Speed(reviews/sec):629.6 #Correct:92 #Tested:106 Testing Accuracy:86.7%Progress:10.6% Speed(reviews/sec):628.1 #Correct:93 #Tested:107 Testing Accuracy:86.9%Progress:10.7% Speed(reviews/sec):630.2 #Correct:94 #Tested:108 Testing Accuracy:87.0%Progress:10.8% Speed(reviews/sec):632.4 #Correct:95 #Tested:109 Testing Accuracy:87.1%Progress:10.9% Speed(reviews/sec):634.5 #Correct:95 #Tested:110 Testing Accuracy:86.3%Progress:11.0% Speed(reviews/sec):636.6 #Correct:96 #Tested:111 Testing Accuracy:86.4%Progress:11.1% Speed(reviews/sec):638.7 #Correct:97 #Tested:112 Testing Accuracy:86.6%Progress:11.2% Speed(reviews/sec):637.1 #C

Progress:20.1% Speed(reviews/sec):684.9 #Correct:175 #Tested:202 Testing Accuracy:86.6%Progress:20.2% Speed(reviews/sec):685.9 #Correct:176 #Tested:203 Testing Accuracy:86.6%Progress:20.3% Speed(reviews/sec):687.0 #Correct:177 #Tested:204 Testing Accuracy:86.7%Progress:20.4% Speed(reviews/sec):688.1 #Correct:178 #Tested:205 Testing Accuracy:86.8%Progress:20.5% Speed(reviews/sec):686.8 #Correct:179 #Tested:206 Testing Accuracy:86.8%Progress:20.6% Speed(reviews/sec):687.8 #Correct:180 #Tested:207 Testing Accuracy:86.9%Progress:20.7% Speed(reviews/sec):688.9 #Correct:181 #Tested:208 Testing Accuracy:87.0%Progress:20.8% Speed(reviews/sec):689.9 #Correct:182 #Tested:209 Testing Accuracy:87.0%Progress:20.9% Speed(reviews/sec):690.9 #Correct:182 #Tested:210 Testing Accuracy:86.6%Progress:21.0% Speed(reviews/sec):691.3 #Correct:182 #Tested:211 Testing Accuracy:86.2%Progress:21.1% Speed(reviews/sec):694.6 #Correct:183 #Tested:212 Testing Accuracy:86.3%Progress:21.2% Speed(reviews/se

Progress:40.1% Speed(reviews/sec):726.9 #Correct:351 #Tested:402 Testing Accuracy:87.3%Progress:40.2% Speed(reviews/sec):728.7 #Correct:352 #Tested:403 Testing Accuracy:87.3%Progress:40.3% Speed(reviews/sec):730.5 #Correct:352 #Tested:404 Testing Accuracy:87.1%Progress:40.4% Speed(reviews/sec):732.3 #Correct:353 #Tested:405 Testing Accuracy:87.1%Progress:40.5% Speed(reviews/sec):734.1 #Correct:354 #Tested:406 Testing Accuracy:87.1%Progress:40.6% Speed(reviews/sec):735.9 #Correct:355 #Tested:407 Testing Accuracy:87.2%Progress:40.7% Speed(reviews/sec):737.8 #Correct:356 #Tested:408 Testing Accuracy:87.2%Progress:40.8% Speed(reviews/sec):739.6 #Correct:357 #Tested:409 Testing Accuracy:87.2%Progress:40.9% Speed(reviews/sec):741.4 #Correct:358 #Tested:410 Testing Accuracy:87.3%Progress:41.0% Speed(reviews/sec):743.2 #Correct:359 #Tested:411 Testing Accuracy:87.3%Progress:41.1% Speed(reviews/sec):724.5 #Correct:360 #Tested:412 Testing Accuracy:87.3%Progress:41.2% Speed(reviews/se

Progress:50.1% Speed(reviews/sec):731.7 #Correct:440 #Tested:502 Testing Accuracy:87.6%Progress:50.2% Speed(reviews/sec):732.1 #Correct:441 #Tested:503 Testing Accuracy:87.6%Progress:50.3% Speed(reviews/sec):731.4 #Correct:442 #Tested:504 Testing Accuracy:87.6%Progress:50.4% Speed(reviews/sec):731.8 #Correct:443 #Tested:505 Testing Accuracy:87.7%Progress:50.5% Speed(reviews/sec):731.1 #Correct:444 #Tested:506 Testing Accuracy:87.7%Progress:50.6% Speed(reviews/sec):730.4 #Correct:445 #Tested:507 Testing Accuracy:87.7%Progress:50.7% Speed(reviews/sec):727.6 #Correct:446 #Tested:508 Testing Accuracy:87.7%Progress:50.8% Speed(reviews/sec):728.0 #Correct:447 #Tested:509 Testing Accuracy:87.8%Progress:50.9% Speed(reviews/sec):728.4 #Correct:448 #Tested:510 Testing Accuracy:87.8%Progress:51.0% Speed(reviews/sec):727.8 #Correct:449 #Tested:511 Testing Accuracy:87.8%Progress:51.1% Speed(reviews/sec):727.1 #Correct:450 #Tested:512 Testing Accuracy:87.8%Progress:51.2% Speed(reviews/se

Progress:70.1% Speed(reviews/sec):726.8 #Correct:605 #Tested:702 Testing Accuracy:86.1%Progress:70.2% Speed(reviews/sec):727.1 #Correct:606 #Tested:703 Testing Accuracy:86.2%Progress:70.3% Speed(reviews/sec):727.4 #Correct:607 #Tested:704 Testing Accuracy:86.2%Progress:70.4% Speed(reviews/sec):727.6 #Correct:608 #Tested:705 Testing Accuracy:86.2%Progress:70.5% Speed(reviews/sec):727.2 #Correct:609 #Tested:706 Testing Accuracy:86.2%Progress:70.6% Speed(reviews/sec):726.7 #Correct:610 #Tested:707 Testing Accuracy:86.2%Progress:70.7% Speed(reviews/sec):727.0 #Correct:611 #Tested:708 Testing Accuracy:86.2%Progress:70.8% Speed(reviews/sec):726.5 #Correct:612 #Tested:709 Testing Accuracy:86.3%Progress:70.9% Speed(reviews/sec):726.8 #Correct:613 #Tested:710 Testing Accuracy:86.3%Progress:71.0% Speed(reviews/sec):727.1 #Correct:613 #Tested:711 Testing Accuracy:86.2%Progress:71.1% Speed(reviews/sec):727.3 #Correct:614 #Tested:712 Testing Accuracy:86.2%Progress:71.2% Speed(reviews/se

Progress:80.1% Speed(reviews/sec):723.0 #Correct:681 #Tested:802 Testing Accuracy:84.9%Progress:80.2% Speed(reviews/sec):722.6 #Correct:682 #Tested:803 Testing Accuracy:84.9%Progress:80.3% Speed(reviews/sec):722.9 #Correct:683 #Tested:804 Testing Accuracy:84.9%Progress:80.4% Speed(reviews/sec):723.1 #Correct:684 #Tested:805 Testing Accuracy:84.9%Progress:80.5% Speed(reviews/sec):722.7 #Correct:685 #Tested:806 Testing Accuracy:84.9%Progress:80.6% Speed(reviews/sec):722.9 #Correct:686 #Tested:807 Testing Accuracy:85.0%Progress:80.7% Speed(reviews/sec):722.5 #Correct:686 #Tested:808 Testing Accuracy:84.9%Progress:80.8% Speed(reviews/sec):722.8 #Correct:687 #Tested:809 Testing Accuracy:84.9%Progress:80.9% Speed(reviews/sec):722.4 #Correct:688 #Tested:810 Testing Accuracy:84.9%Progress:81.0% Speed(reviews/sec):722.6 #Correct:689 #Tested:811 Testing Accuracy:84.9%Progress:81.1% Speed(reviews/sec):722.2 #Correct:690 #Tested:812 Testing Accuracy:84.9%Progress:81.2% Speed(reviews/se

Progress:90.1% Speed(reviews/sec):703.3 #Correct:769 #Tested:902 Testing Accuracy:85.2%Progress:90.2% Speed(reviews/sec):703.0 #Correct:770 #Tested:903 Testing Accuracy:85.2%Progress:90.3% Speed(reviews/sec):703.2 #Correct:771 #Tested:904 Testing Accuracy:85.2%Progress:90.4% Speed(reviews/sec):702.9 #Correct:772 #Tested:905 Testing Accuracy:85.3%Progress:90.5% Speed(reviews/sec):703.1 #Correct:773 #Tested:906 Testing Accuracy:85.3%Progress:90.6% Speed(reviews/sec):702.8 #Correct:774 #Tested:907 Testing Accuracy:85.3%Progress:90.7% Speed(reviews/sec):703.1 #Correct:775 #Tested:908 Testing Accuracy:85.3%Progress:90.8% Speed(reviews/sec):702.7 #Correct:776 #Tested:909 Testing Accuracy:85.3%Progress:90.9% Speed(reviews/sec):702.4 #Correct:776 #Tested:910 Testing Accuracy:85.2%Progress:91.0% Speed(reviews/sec):702.1 #Correct:777 #Tested:911 Testing Accuracy:85.2%Progress:91.1% Speed(reviews/sec):702.3 #Correct:777 #Tested:912 Testing Accuracy:85.1%Progress:91.2% Speed(reviews/se

# Analyzing Inefficiencies in our Network

In [9]:
layer_0 = np.zeros(10)

In [10]:
layer_0

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [11]:
layer_0[4] = 1
layer_0[9] = 1

In [12]:
layer_0

array([0., 0., 0., 0., 1., 0., 0., 0., 0., 1.])

In [13]:
weights_0_1 = np.random.randn(10,5)

In [14]:
layer_0.dot(weights_0_1)

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

In [15]:
indices = [4,9]

In [16]:
layer_1 = np.zeros(5)

In [17]:
for index in indices:
    layer_1 += (1 * weights_0_1[index])

In [18]:
layer_1

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])

In [20]:
layer_1 = np.zeros(5)

In [21]:
for index in indices:
    layer_1 += (weights_0_1[index])

In [22]:
layer_1

array([-0.10503756,  0.44222989,  0.24392938, -0.55961832,  0.21389503])
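The cells above suggest that, for an input made only of zeros and ones, the full dot product gives the same result as summing the weight rows at the non-zero positions. The sketch below checks this on a larger vector; the sizes, the seed and the chosen indices are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: a 1000-word vocabulary and 5 hidden nodes
vocab_size, hidden_nodes = 1000, 5
weights_0_1 = rng.standard_normal((vocab_size, hidden_nodes))

# A sparse binary input: only three of the 1000 entries are switched on
indices = [4, 9, 512]
layer_0 = np.zeros(vocab_size)
layer_0[indices] = 1

# Dense version: multiplies every input entry by its weights, zeros included
dense = layer_0.dot(weights_0_1)

# Sparse version: adds up only the weight rows of the words that are present
sparse = weights_0_1[indices].sum(axis=0)

print(np.allclose(dense, sparse))
```

The dense version performs `vocab_size * hidden_nodes` multiplications, almost all of them by zero, while the sparse version only touches `len(indices)` rows.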

# Making our Network More Efficient
We make the `SentimentNetwork` class more efficient by eliminating unnecessary multiplications and additions that occur during forward and backward propagation. To do that, we do the following:
* Remove the `update_input_layer` function.
* Modify `init_network`:
>* We no longer need a separate input layer, so remove any mention of `self.layer_0`.
>* We will be dealing with the old hidden layer more directly, so create `self.layer_1`, a two-dimensional matrix with shape 1 x hidden_nodes, with all values initialized to zero.
* Modify `train`:
>* Change the name of the input parameter `training_reviews` to `training_reviews_raw`.
>* At the beginning of the function, preprocess the reviews to convert each one to a list of the indices (from `word2index`) of the words that actually appear in it. The code should create a local `list` variable named `training_reviews` that contains one `list` per review in `training_reviews_raw`; each of those lists holds the indices of the words found in that review.
>* Remove call to `update_input_layer`
>* Use `self`'s  `layer_1` instead of a local `layer_1` object.
>* In the forward pass, replace the code that updates `layer_1` with new logic that only adds the weights for the indices used in the review.
>* When updating `weights_0_1`, only update the individual weights that were used in the forward pass.
* Modify `run`:
>* Remove call to `update_input_layer` 
>* Use `self`'s  `layer_1` instead of a local `layer_1` object.
>* Much like we did in `train`, we will need to pre-process the `review` so we can work with word indices, then update `layer_1` by adding weights for the indices used in the review.
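The two key steps in this list, converting a review to word indices and updating only the weight rows that were used, can be sketched in isolation as follows. The toy vocabulary, the review and the random `layer_1_delta` are assumptions for illustration; the sketch only checks that the row-by-row update matches the dense update from the previous version of `train`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy vocabulary, for illustration only
vocab = ["a", "bad", "cast", "great", "movie", "plot", "the", "was"]
word2index = {word: i for i, word in enumerate(vocab)}
hidden_nodes = 3
learning_rate = 0.1

# Step 1: turn the raw review into the unique indices of its known words
review = "great movie great cast unseen"
indices = sorted({word2index[w] for w in review.split(" ") if w in word2index})

# Stand-in for the backpropagated hidden-layer delta (shape 1 x hidden_nodes)
layer_1_delta = rng.standard_normal((1, hidden_nodes))

# Dense update, as in the earlier version: uses the full binary input layer
layer_0 = np.zeros((1, len(vocab)))
layer_0[0, indices] = 1
weights_dense = np.zeros((len(vocab), hidden_nodes))
weights_dense -= layer_0.T.dot(layer_1_delta) * learning_rate

# Step 2: sparse update, touching only the rows used in the forward pass
weights_sparse = np.zeros((len(vocab), hidden_nodes))
for index in indices:
    weights_sparse[index] -= layer_1_delta[0] * learning_rate

print(indices)
print(np.allclose(weights_dense, weights_sparse))
```

Using a set for the indices matters: a word that appears twice must still be added, and updated, only once, which matches the binary input layer it replaces.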

In [45]:
g = open('reviews.txt','r') # What we know!
reviews = list(map(lambda x:x[:-1],g.readlines()))
g.close()

g = open('labels.txt','r') # What we WANT to know!
labels = list(map(lambda x:x[:-1].upper(),g.readlines()))
g.close()

import time
import sys
import numpy as np

# neural network class
class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate = 0.1):
        np.random.seed(1)
        self.pre_process_data(reviews, labels)
        self.init_network(len(self.review_vocab),hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):
        review_vocab = set()
        for review in reviews:
            for word in review.split(' '):
                review_vocab.add(word)

        # Converting the vocabulary set to a list so we can access words via indices
        self.review_vocab = list(review_vocab)

        label_vocab = set()
        # populating label_vocab with all of the words in the given labels.
        for label in labels:
            label_vocab.add(label)
        # Converting the label vocabulary set to a list so we can access labels via indices
        self.label_vocab = list(label_vocab)

        # Storing the sizes of the review and label vocabularies.
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)

        # Creating a dictionary of words in the vocabulary mapped to index positions
        self.word2index = {}
        # populating self.word2index with indices for all the words in self.review_vocab
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        # Creating a dictionary of labels mapped to index positions
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Storing the number of nodes in input, hidden, and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Storing the learning rate
        self.learning_rate = learning_rate

        # Initialize weights between the input layer and the hidden layer
        self.weights_0_1 = np.zeros((self.input_nodes,self.hidden_nodes))

        # Initialize weights between the hidden layer and the output layer
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                                (self.hidden_nodes, self.output_nodes))

        ## New : Removed self.layer_0; added self.layer_1
        # The hidden layer, a two-dimensional matrix with shape 1 x hidden_nodes,
        #   with all values initialized to zero
        self.layer_1 = np.zeros((1,hidden_nodes))

    def get_target_for_label(self,label):
        if label=='NEGATIVE':
            return 0
        else:
            return 1

    def sigmoid(self,x):
        return 1/(1+np.exp(-x))

    def sigmoid_output_2_derivative(self,output):
        return output*(1-output)

    def train(self, training_reviews_raw, training_labels):
        ## New : changed name of first parameter from 'training_reviews'
        #        to 'training_reviews_raw'
        ## New : pre-process training reviews so we can deal directly with the
        #        indices of non-zero inputs
        training_reviews = list()
        for review in training_reviews_raw:
            indices = set()
            for word in review.split(" "):
                if(word in self.word2index.keys()):
                    indices.add(self.word2index[word])
            training_reviews.append(list(indices))
        # checking we have a matching number of reviews and labels
        assert(len(training_reviews) == len(training_labels))

        # Keeping track of correct predictions to display accuracy during training
        correct_so_far = 0

        # Remembering the time when we started for printing time statistics
        start = time.time()

        # looping through all the given reviews and run a forward and backward pass,
        # updating weights for every item
        for i in range(len(training_reviews)):

            # Getting the next review and its correct label
            review = training_reviews[i]
            label = training_labels[i]

            # Implementing the forward pass through the network.
            ## New : Removed call to 'update_input_layer' function because 'layer_0' is no longer used

            # Hidden layer
            ## New : Add in only the weights for non-zero items
            self.layer_1 *= 0
            for index in review:
                self.layer_1 += self.weights_0_1[index]

            # Output layer
            ## New : changed to use 'self.layer_1' instead of 'local layer_1'
            layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

            # Implementing the back propagation pass here.
            ### Backward pass ###

            # Output error
            layer_2_error = layer_2 - self.get_target_for_label(label)
            # Output layer error is the difference between desired target and actual output.
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)

            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error
            # hidden layer gradients - no nonlinearity so it's the same as the error

            # Update the weights
            ## New : changed to use 'self.layer_1' instead of local 'layer_1'
            self.weights_1_2 -= self.layer_1.T.dot(layer_2_delta) * self.learning_rate

            ## New : Only update the weights that were used in the forward pass
            for index in review:
                self.weights_0_1[index] -= layer_1_delta[0] * self.learning_rate

            # The prediction is correct when the output falls on the same side
            # of 0.5 as the label (equivalently, the absolute value of the output
            # error is less than 0.5). If so, add one to the correct_so_far count.
            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            elif(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            # printing out our prediction accuracy and speed
            # throughout the training process.

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) \
                             + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2400 == 0):
                print("")

    def test(self, testing_reviews, testing_labels):
        # keeping track of how many correct predictions we make
        correct = 0

        # we'll time how many predictions per second we make
        start = time.time()

        # Looping through each of the given reviews and calling run to predict
        # its label.
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1

            # printing out the prediction accuracy and speed
            # throughout the prediction process.

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0

            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) \
                             + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")

    def run(self, review):
        ## New: Removed call to update_input_layer function
        #       because layer_0 is no longer used

        # Hidden layer
        ## New: Identify the indices used in the review and then add
        #       just those weights to layer_1
        self.layer_1 *= 0
        unique_indices = set()
        for word in review.lower().split(" "):
            if word in self.word2index:
                unique_indices.add(self.word2index[word])
        for index in unique_indices:
            self.layer_1 += self.weights_0_1[index]

        # Output layer
        ## New : changed to use self.layer_1 instead of local layer_1
        layer_2 = self.sigmoid(self.layer_1.dot(self.weights_1_2))

        if(layer_2[0] >= 0.5):
            return "POSITIVE"
        else:
            return "NEGATIVE" 
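
The index-based forward pass and weight update above should give the same numbers as a dense version that multiplies a 0/1 input row by the whole weight matrix. The following is a minimal sketch of that equivalence, not the notebook's own code: the sizes, the `indices` list and the random values are made up for illustration, and the dense version is assumed to use `layer_0.dot(weights_0_1)` with a 0/1 input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumption); the real vocabulary is far larger.
vocab_size, hidden_nodes = 10, 4
weights_0_1 = rng.normal(size=(vocab_size, hidden_nodes))
indices = [1, 4, 7]  # hypothetical word indices found in one review

# Dense forward pass: 0/1 input row times the full weight matrix.
layer_0 = np.zeros((1, vocab_size))
layer_0[0, indices] = 1
dense_layer_1 = layer_0.dot(weights_0_1)

# Sparse forward pass: add only the weight rows the review uses.
sparse_layer_1 = np.zeros((1, hidden_nodes))
for index in indices:
    sparse_layer_1 += weights_0_1[index]

forward_matches = np.allclose(dense_layer_1, sparse_layer_1)

# Dense update touches every row; rows with a 0 input receive a zero gradient.
layer_1_delta = rng.normal(size=(1, hidden_nodes))
learning_rate = 0.1
dense_weights = weights_0_1 - layer_0.T.dot(layer_1_delta) * learning_rate

# Sparse update touches only the rows that took part in the forward pass.
sparse_weights = weights_0_1.copy()
for index in indices:
    sparse_weights[index] -= layer_1_delta[0] * learning_rate

backward_matches = np.allclose(dense_weights, sparse_weights)
print(forward_matches, backward_matches)
```

Both comparisons hold only when each index appears once per review, which is why `run` collects the indices in a set before summing.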

We run the following cell to recreate the network and train it once again.

In [46]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.1)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.0% Speed(reviews/sec):1639. #Correct:1724 #Trained:2401 Training Accuracy:71.8%
Progress:20.0% Speed(reviews/sec):1562. #Correct:3640 #Trained:4801 Training Accuracy:75.8%
Progress:30.0% Speed(reviews/sec):1543. #Correct:5656 #Trained:7201 Training Accuracy:78.5%
Progress:40.0% Speed(reviews/sec):1550. #Correct:7688 #Trained:9601 Training Accuracy:80.0%
Progress:50.0% Speed(reviews/sec):1563. #Correct:9724 #Trained:12001 Training Accuracy:81.0%
Progress:60.0% Speed(reviews/sec):1560. #Correct:11779 #Trained:14401 Training Accuracy:81.7%
Progress:70.0% Speed(reviews/sec):1549. #Correct:13794 #Trained:16801 Training Accuracy:82.1%
Progress:80.0% Speed(reviews/sec):1555. #Correct:15888 #Trained:19201 Training Accuracy:82.7%
Progress:90.0% Speed(reviews/sec):1556. #Correct:17986 #Trained:21601 Training Accuracy:83.2%
Progress:99.9% Speed(reviews/sec):1559. #Correct:20089 #Trained:24000 Training Acc
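
The odd-looking `100.%` in the first line of the log comes from the way `train` builds the progress string: the accuracy is converted with `str(...)` and then cut to four characters with `[:4]`. A small sketch of that arithmetic, using counts copied from the log above (the helper names `accuracy_label` and `accuracy_label_formatted` are hypothetical):

```python
def accuracy_label(correct, trained):
    # Same expression as in train(): float percentage, string cut to 4 chars.
    return str(correct * 100 / float(trained))[:4] + "%"

def accuracy_label_formatted(correct, trained):
    # Possible alternative (an assumption, not the notebook's code):
    # one decimal place via a format spec. This rounds instead of truncating.
    return "{:.1f}%".format(correct * 100 / float(trained))

print(accuracy_label(1, 1))                    # 100.%
print(accuracy_label(1724, 2401))              # 71.8%
print(accuracy_label(17986, 21601))            # 83.2%
print(accuracy_label_formatted(1, 1))          # 100.0%
print(accuracy_label_formatted(17986, 21601))  # 83.3%
```

The slice truncates rather than rounds, so the logged figures can sit up to 0.1 below the rounded value.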

We run the following cell to test the model on the 1000 held-out reviews.

In [47]:
mlp.test(reviews[-1000:],labels[-1000:])

Progress:0.0% Speed(reviews/sec):0 #Correct:1 #Tested:1 Testing Accuracy:100.%
Progress:0.1% Speed(reviews/sec):997.9 #Correct:1 #Tested:2 Testing Accuracy:50.0%
Progress:0.2% Speed(reviews/sec):1995. #Correct:2 #Tested:3 Testing Accuracy:66.6%
Progress:0.3% Speed(reviews/sec):1494. #Correct:3 #Tested:4 Testing Accuracy:75.0%
Progress:0.4% Speed(reviews/sec):1993. #Correct:4 #Tested:5 Testing Accuracy:80.0%
Progress:0.5% Speed(reviews/sec):2491. #Correct:5 #Tested:6 Testing Accuracy:83.3%
Progress:0.6% Speed(reviews/sec):1994. #Correct:6 #Tested:7 Testing Accuracy:85.7%
Progress:0.7% Speed(reviews/sec):1745. #Correct:7 #Tested:8 Testing Accuracy:87.5%
Progress:0.8% Speed(reviews/sec):1994. #Correct:8 #Tested:9 Testing Accuracy:88.8%
Progress:0.9% Speed(reviews/sec):1495. #Correct:9 #Tested:10 Testing Accuracy:90.0%
Progress:1.0% Speed(reviews/sec):1662. #Correct:10 #Tested:11 Testing Accuracy:90.9%
Progress:1.1% Speed(reviews/sec):1828. #Correct:11 #Tested:12 Testing Accuracy:91.6%
Pr

Progress:46.2% Speed(reviews/sec):2046. #Correct:408 #Tested:463 Testing Accuracy:88.1%
Progress:46.3% Speed(reviews/sec):2051. #Correct:408 #Tested:464 Testing Accuracy:87.9%
Progress:46.4% Speed(reviews/sec):2055. #Correct:409 #Tested:465 Testing Accuracy:87.9%
Progress:46.5% Speed(reviews/sec):2060. #Correct:410 #Tested:466 Testing Accuracy:87.9%
Progress:46.6% Speed(reviews/sec):2064. #Correct:411 #Tested:467 Testing Accuracy:88.0%
Progress:46.7% Speed(reviews/sec):2068. #Correct:412 #Tested:468 Testing Accuracy:88.0%
Progress:46.8% Speed(reviews/sec):2073. #Correct:413 #Tested:469 Testing Accuracy:88.0%
Progress:46.9% Speed(reviews/sec):2077. #Correct:413 #Tested:470 Testing Accuracy:87.8%
Progress:47.0% Speed(reviews/sec):2082. #Correct:414 #Tested:471 Testing Accuracy:87.8%
Progress:47.1% Speed(reviews/sec):2086. #Correct:414 #Tested:472 Testing Accuracy:87.7%
Progress:47.2% Speed(reviews/sec):2091. #Correct:415 #Tested:473 Testing Accuracy:87.7%
Progress:47.3% Speed(reviews/se

Progress:75.8% Speed(reviews/sec):1776. #Correct:648 #Tested:759 Testing Accuracy:85.3%
Progress:75.9% Speed(reviews/sec):1779. #Correct:649 #Tested:760 Testing Accuracy:85.3%
Progress:76.0% Speed(reviews/sec):1781. #Correct:649 #Tested:761 Testing Accuracy:85.2%
Progress:76.1% Speed(reviews/sec):1783. #Correct:650 #Tested:762 Testing Accuracy:85.3%
Progress:76.2% Speed(reviews/sec):1786. #Correct:650 #Tested:763 Testing Accuracy:85.1%
Progress:76.3% Speed(reviews/sec):1788. #Correct:651 #Tested:764 Testing Accuracy:85.2%
Progress:76.4% Speed(reviews/sec):1790. #Correct:652 #Tested:765 Testing Accuracy:85.2%
Progress:76.5% Speed(reviews/sec):1793. #Correct:652 #Tested:766 Testing Accuracy:85.1%
Progress:76.6% Speed(reviews/sec):1795. #Correct:652 #Tested:767 Testing Accuracy:85.0%
Progress:76.7% Speed(reviews/sec):1797. #Correct:653 #Tested:768 Testing Accuracy:85.0%
Progress:76.8% Speed(reviews/sec):1800. #Correct:653 #Tested:769 Testing Accuracy:84.9%
Progress:76.9% Speed(reviews/se