# Sentiment Classification

## Curate a Dataset

In [1]:
def pretty_print_review_and_label(i):
    print(labels[i] + "\t\t" + reviews[i][:80] + "...")

g = open('reviews.txt','r')
reviews = list(map(lambda x:x[:-1],g.readlines()))
g.close()

g = open('labels.txt','r')
labels = list(map(lambda x:x[:-1].upper(),g.readlines()))
g.close()

In [2]:
len(reviews)

25000

In [3]:
reviews[0]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

In [4]:
labels[0]

'POSITIVE'

In [5]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

labels.txt 	 : 	 reviews.txt

NEGATIVE		this movie is terrible but it has some good effects .  ...
POSITIVE		adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE		comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE		excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE		if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE		this schiffer guy is a real genius  the movie is of excellent quality and both e...


In [6]:
from collections import Counter
import numpy as np

In [7]:
# Create three counter objects to store positive, negative and total counts
positive_counts = Counter()
negative_counts = Counter()
total_counts = Counter()

In [8]:
# Loop over all the words in all the reviews and increment the counts in the appropriate counter objects
for i in range(len(reviews)):
    if(labels[i] == 'POSITIVE'):
        for word in reviews[i].split(" "):
            positive_counts[word] += 1
            total_counts[word] += 1
    else:
        for word in reviews[i].split(" "):
            negative_counts[word] += 1
            total_counts[word] += 1

In [9]:
# Examine the counts of the most common words in positive reviews
positive_counts.most_common()

[('', 550468),
 ('the', 173324),
 ('.', 159654),
 ('and', 89722),
 ('a', 83688),
 ('of', 76855),
 ('to', 66746),
 ('is', 57245),
 ('in', 50215),
 ('br', 49235),
 ('it', 48025),
 ('i', 40743),
 ('that', 35630),
 ('this', 35080),
 ('s', 33815),
 ('as', 26308),
 ('with', 23247),
 ('for', 22416),
 ('was', 21917),
 ('film', 20937),
 ('but', 20822),
 ('movie', 19074),
 ('his', 17227),
 ('on', 17008),
 ('you', 16681),
 ('he', 16282),
 ('are', 14807),
 ('not', 14272),
 ('t', 13720),
 ('one', 13655),
 ('have', 12587),
 ('be', 12416),
 ('by', 11997),
 ('all', 11942),
 ('who', 11464),
 ('an', 11294),
 ('at', 11234),
 ('from', 10767),
 ('her', 10474),
 ('they', 9895),
 ('has', 9186),
 ('so', 9154),
 ('like', 9038),
 ('about', 8313),
 ('very', 8305),
 ('out', 8134),
 ('there', 8057),
 ('she', 7779),
 ('what', 7737),
 ('or', 7732),
 ('good', 7720),
 ('more', 7521),
 ('when', 7456),
 ('some', 7441),
 ('if', 7285),
 ('just', 7152),
 ('can', 7001),
 ('story', 6780),
 ('time', 6515),
 ('my', 6488),
 ('g

In [10]:
# Examine the counts of the most common words in negative reviews
negative_counts.most_common()

[('', 561462),
 ('.', 167538),
 ('the', 163389),
 ('a', 79321),
 ('and', 74385),
 ('of', 69009),
 ('to', 68974),
 ('br', 52637),
 ('is', 50083),
 ('it', 48327),
 ('i', 46880),
 ('in', 43753),
 ('this', 40920),
 ('that', 37615),
 ('s', 31546),
 ('was', 26291),
 ('movie', 24965),
 ('for', 21927),
 ('but', 21781),
 ('with', 20878),
 ('as', 20625),
 ('t', 20361),
 ('film', 19218),
 ('you', 17549),
 ('on', 17192),
 ('not', 16354),
 ('have', 15144),
 ('are', 14623),
 ('be', 14541),
 ('he', 13856),
 ('one', 13134),
 ('they', 13011),
 ('at', 12279),
 ('his', 12147),
 ('all', 12036),
 ('so', 11463),
 ('like', 11238),
 ('there', 10775),
 ('just', 10619),
 ('by', 10549),
 ('or', 10272),
 ('an', 10266),
 ('who', 9969),
 ('from', 9731),
 ('if', 9518),
 ('about', 9061),
 ('out', 8979),
 ('what', 8422),
 ('some', 8306),
 ('no', 8143),
 ('her', 7947),
 ('even', 7687),
 ('can', 7653),
 ('has', 7604),
 ('good', 7423),
 ('bad', 7401),
 ('would', 7036),
 ('up', 6970),
 ('only', 6781),
 ('more', 6730),
 ('

As you can see, common words like "the" appear very often in both positive and negative reviews. Instead of finding the most common words in positive or negative reviews, what you really want are the words found in positive reviews more often than in negative reviews, and vice versa. To accomplish this, you'll need to calculate the **ratios** of word usage between positive and negative reviews.
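The idea can be sketched on a tiny corpus before applying it to the real data. The reviews, labels, and the `toy_*` names below are hypothetical and exist only for illustration; the `+ 1` in the denominator is an assumption that guards against division by zero for words never seen in negative reviews.

```python
from collections import Counter

# Hypothetical toy data, not taken from reviews.txt or labels.txt
toy_reviews = ["great great film", "terrible film", "great story", "terrible terrible plot"]
toy_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

toy_positive = Counter()
toy_negative = Counter()
for review, label in zip(toy_reviews, toy_labels):
    # Pick the counter that matches the label and add this review's words to it
    target = toy_positive if label == "POSITIVE" else toy_negative
    target.update(review.split(" "))

def toy_ratio(word):
    # The +1 avoids dividing by zero when a word never appears in negative reviews
    return toy_positive[word] / float(toy_negative[word] + 1)

print(toy_ratio("great"))     # 3 positive uses, 0 negative uses -> 3.0
print(toy_ratio("film"))      # 1 positive use, 1 negative use -> 0.5
print(toy_ratio("terrible"))  # 0 positive uses, 3 negative uses -> 0.0
```

With counts this small, the `+ 1` visibly skews the result (equal usage gives 0.5 rather than 1). That distortion shrinks as the counts grow, which is one reason to restrict the calculation to frequently used words.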

In [12]:
pos_neg_ratios = Counter()

# Calculate the ratios of positive and negative uses of the most common words
# Consider words to be "common" if they've been used more than 100 times
for term, cnt in list(total_counts.most_common()):
    if(cnt > 100):
        pos_neg_ratio = positive_counts[term] / float(negative_counts[term]+1)
        pos_neg_ratios[term] = pos_neg_ratio

Examine the ratios you've calculated for a few words:

In [13]:
print("Pos-to-neg ratio for 'the' = {}".format(pos_neg_ratios["the"]))
print("Pos-to-neg ratio for 'amazing' = {}".format(pos_neg_ratios["amazing"]))
print("Pos-to-neg ratio for 'terrible' = {}".format(pos_neg_ratios["terrible"]))

Pos-to-neg ratio for 'the' = 1.0607993145235326
Pos-to-neg ratio for 'amazing' = 4.022813688212928
Pos-to-neg ratio for 'terrible' = 0.17744252873563218


In [14]:
# Convert ratios to logs
for word, ratio in pos_neg_ratios.most_common():
    pos_neg_ratios[word] = np.log(ratio)

In [15]:
print("Pos-to-neg ratio for 'the' = {}".format(pos_neg_ratios["the"]))
print("Pos-to-neg ratio for 'amazing' = {}".format(pos_neg_ratios["amazing"]))
print("Pos-to-neg ratio for 'terrible' = {}".format(pos_neg_ratios["terrible"]))

Pos-to-neg ratio for 'the' = 0.05902269426102881
Pos-to-neg ratio for 'amazing' = 1.3919815802404802
Pos-to-neg ratio for 'terrible' = -1.7291085042663878
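
Note that the values printed after the log conversion are log ratios, even though the print statements still label them "Pos-to-neg ratio". Raw ratios are awkward to compare: positive words can grow without bound above 1, while negative words are squeezed between 0 and 1. Taking the log puts neutral words near 0 and gives opposite ratios the same magnitude with opposite signs. A minimal sketch with made-up ratios (the values 4.0, 1.0, and 0.25 are illustrative, not from the dataset):

```python
import numpy as np

# Hypothetical ratios: 4x more positive, neutral, and 4x more negative
example_ratios = [4.0, 1.0, 0.25]
log_ratios = {ratio: float(np.log(ratio)) for ratio in example_ratios}

for ratio, log_ratio in log_ratios.items():
    print("ratio = {}\tlog ratio = {}".format(ratio, log_ratio))

# A neutral word maps to exactly 0, and 4.0 and 0.25 mirror each other around 0
print(log_ratios[1.0] == 0.0)
print(np.isclose(log_ratios[4.0], -log_ratios[0.25]))
```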


In [16]:
# words most frequently seen in a review with a "POSITIVE" label
pos_neg_ratios.most_common()

[('edie', 4.6913478822291435),
 ('paulie', 4.07753744390572),
 ('felix', 3.152736022363656),
 ('polanski', 2.8233610476132043),
 ('matthau', 2.80672172860924),
 ('victoria', 2.681021528714291),
 ('mildred', 2.6026896854443837),
 ('gandhi', 2.538973871058276),
 ('flawless', 2.451005098112319),
 ('superbly', 2.26002547857525),
 ('perfection', 2.159484249353372),
 ('astaire', 2.1400661634962708),
 ('captures', 2.038619547159581),
 ('voight', 2.030170492673053),
 ('wonderfully', 2.0218960560332353),
 ('powell', 1.978345424808467),
 ('brosnan', 1.9547990964725592),
 ('lily', 1.9203768470501485),
 ('bakshi', 1.9029851043382795),
 ('lincoln', 1.9014583864844796),
 ('refreshing', 1.8551812956655511),
 ('breathtaking', 1.8481124057791867),
 ('bourne', 1.8478489358790986),
 ('lemmon', 1.8458266904983307),
 ('delightful', 1.8002701588959635),
 ('flynn', 1.7996646487351682),
 ('andrews', 1.7764919970972666),
 ('homer', 1.7692866133759964),
 ('beautifully', 1.7626953362841438),
 ('soccer', 1.757857

## Creating the Input/Output Data

In [17]:
# Create a set named `vocab` that contains every word in the vocabulary
vocab = set(total_counts.keys())
vocab_size = len(vocab)
print(vocab_size)

74074


In [18]:
# Create a numpy array called layer_0 as a 2-dimensional matrix with 1 row and vocab_size columns
layer_0 = np.zeros((1, vocab_size))
print(layer_0.shape)

(1, 74074)


In [19]:
# Create a dictionary of words in the vocabulary mapped to index positions
word2index = {}
for i, word in enumerate(vocab):
    word2index[word] = i

word2index

{'': 0,
 'warheads': 1,
 'teenybopper': 2,
 'buffett': 3,
 'fengler': 4,
 'unintentionally': 5,
 'compiled': 6,
 'unfulfillment': 7,
 'flavia': 8,
 'shya': 9,
 'tarot': 10,
 'verne': 11,
 'plough': 12,
 'timewarped': 13,
 'disparage': 14,
 'almighty': 15,
 'propoghanda': 16,
 'sotto': 17,
 'cancelled': 18,
 'crumbles': 19,
 'distances': 20,
 'kites': 21,
 'stagehands': 22,
 'muslin': 23,
 'neve': 24,
 'kaminsky': 25,
 'panther': 26,
 'margaritas': 27,
 'reine': 28,
 'erikssons': 29,
 'characterised': 30,
 'panels': 31,
 'evolving': 32,
 'vistor': 33,
 'habenera': 34,
 'volpe': 35,
 'rrhs': 36,
 'square': 37,
 'redevelopment': 38,
 'doyon': 39,
 'debtors': 40,
 'growing': 41,
 'formalist': 42,
 'jacking': 43,
 'dayan': 44,
 'ketchup': 45,
 'piccirillo': 46,
 'telford': 47,
 'deader': 48,
 'twined': 49,
 'personage': 50,
 'mckee': 51,
 'pleasaunces': 52,
 'taster': 53,
 'hammerhead': 54,
 'anaemic': 55,
 'manville': 56,
 'audry': 57,
 'schne': 58,
 'upgrades': 59,
 'bashki': 60,
 'cumula

Create a function `update_input_layer` that counts how many times each word is used in the given review and stores those counts at the appropriate indices inside `layer_0`.

In [20]:
def update_input_layer(review):
    global layer_0
    
    # Clear out previous state, reset the layer to be all 0s
    layer_0 *= 0
    
    # count how many times each word is used in the given review and store the results in layer_0
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1

In [23]:
update_input_layer(reviews[0])
layer_0

array([[18.,  0.,  0., ...,  0.,  0.,  0.]])
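
The same counting step can also be written without a global variable. The sketch below is an alternative, not the notebook's method: `counts_vector` and `toy_word2index` are hypothetical names, and skipping out-of-vocabulary words is an assumption made so the function does not raise a `KeyError` on unseen text.

```python
import numpy as np

def counts_vector(review, word2index):
    """Return a 1 x len(word2index) array of word counts for `review`."""
    layer = np.zeros((1, len(word2index)))
    for word in review.split(" "):
        index = word2index.get(word)
        if index is not None:  # assumption: ignore words outside the vocabulary
            layer[0][index] += 1
    return layer

# Hypothetical three-word vocabulary for illustration
toy_word2index = {"good": 0, "bad": 1, "movie": 2}
toy_layer = counts_vector("good good movie unknown", toy_word2index)

# Counts land at the indices given by toy_word2index: good=2, bad=0, movie=1
print(toy_layer)
```

Returning a fresh array costs an allocation per review, whereas the global `layer_0` version reuses one buffer. For a vocabulary of this size the reuse is a reasonable trade-off.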

Create a function `get_target_for_label` that returns 0 or 1, depending on whether the given label is NEGATIVE or POSITIVE, respectively.

In [24]:
def get_target_for_label(label):
    if(label == 'POSITIVE'):
        return 1
    else:
        return 0

In [25]:
labels[0]

'POSITIVE'

In [26]:
get_target_for_label(labels[0])

1

## Building a Neural Network

In [46]:
import time
import sys
import numpy as np

class SentimentNetwork:
    def __init__(self, reviews, labels, hidden_nodes = 10, learning_rate=0.1):
        """Create a SentimentNetwork with the given settings
        Args:
            reviews(list) - List of reviews used for training
            labels(list) - List of POSITIVE/NEGATIVE labels associated with the given reviews
            hidden_nodes(int) - Number of nodes to create in the hidden layer
            learning_rate(float) - Learning rate to use while training
        """
        np.random.seed(1)
        
        # process the reviews and their associated labels so that everything
        # is ready for training
        self.pre_process_data(reviews, labels)
        
        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)
    
    def pre_process_data(self, reviews, labels):
        # populate review_vocab with all of the words in the given reviews
        review_vocab = set()
        for review in reviews:
            for word in review.split(" "):
                review_vocab.add(word)
        
        # Convert the vocabulary set to a list so we can access words via indices
        self.review_vocab = list(review_vocab)
        
        # populate label_vocab with all the words in the given labels
        label_vocab = set()
        for label in labels:
            label_vocab.add(label)
        
        # Convert the label set to a list so we can access labels via indices
        self.label_vocab = list(label_vocab)
        
        # Store the sizes of the review and label vocabularies.
        self.review_vocab_size = len(self.review_vocab)
        self.label_vocab_size = len(self.label_vocab)
        
        # Create a dictionary of words in the vocabulary mapped to index positions
        self.word2index = {}
        for i, word in enumerate(self.review_vocab):
            self.word2index[word] = i
        
        # Create a dictionary of labels mapped to index positions
        self.label2index = {}
        for i, label in enumerate(self.label_vocab):
            self.label2index[label] = i
    
    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes
        
        self.learning_rate = learning_rate
        
        # Initialize the weights
        
        # These are the weights between the input_layer and the hidden_layer
        self.weights_0_1 = np.zeros((self.input_nodes, self.hidden_nodes))
        
        # These are the weights between the hidden layer and the output layer
        self.weights_1_2 = np.random.normal(0.0, self.output_nodes**-0.5,
                                           (self.hidden_nodes, self.output_nodes))
        
        # The input layer, a two-dimensional matrix with shape 1 x input_nodes
        self.layer_0 = np.zeros((1, input_nodes))
    
    def update_input_layer(self, review):
        
        # Clear the previous state, reset the layer to be all 0s
        self.layer_0 *= 0
        
        for word in review.split(" "):
            if(word in self.word2index.keys()):
                self.layer_0[0][self.word2index[word]] += 1
    
    def get_target_for_label(self, label):
        if(label == 'POSITIVE'):
            return 1
        else:
            return 0
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_output_2_derivative(self, output):
        return output * (1 - output)
    
    def train(self, training_reviews, training_labels):
        
        # make sure we have a matching number of reviews and labels
        assert(len(training_reviews) == len(training_labels))
        
        # keep track of correct predictions to display accuracy during training
        correct_so_far = 0
        
        # Remember when we started for printing time statistics
        start = time.time()
        
        # Loop through all the given reviews and run a forward and backward pass,
        # updating weights for every item
        for i in range(len(training_reviews)):
            
            # Get the next review and its correct label
            review = training_reviews[i]
            label = training_labels[i]
            
            ### Implementing the forward pass ###
            
            # Input Layer
            self.update_input_layer(review)
            
            # Hidden Layer
            layer_1 = self.layer_0.dot(self.weights_0_1)
            
            # Output Layer
            layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
            
            ### Implementing the backward pass ###
            
            # Output Layer
            layer_2_error = layer_2 - self.get_target_for_label(label)
            layer_2_delta = layer_2_error * self.sigmoid_output_2_derivative(layer_2)
            
            # Backpropagated error
            layer_1_error = layer_2_delta.dot(self.weights_1_2.T) # errors propagated to the hidden layer
            layer_1_delta = layer_1_error # the hidden layer has no activation function, so delta equals error
            
            # Update the weights
            self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate
            self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate
            
            # keep track of correct prediction
            if(layer_2 >= 0.5 and label == 'POSITIVE'):
                correct_so_far += 1
            elif(layer_2 < 0.5 and label == 'NEGATIVE'):
                correct_so_far += 1
            
            # For debug purposes, print out our prediction accuracy and speed
            # throughout the training process
            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(training_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct_so_far) + " #Trained:" + str(i+1) \
                             + " Training Accuracy:" + str(correct_so_far * 100 / float(i+1))[:4] + "%")
            if(i % 2500 == 0):
                print("")
    
    def test(self, testing_reviews, testing_labels):
        """
        Attempts to predict the labels for the given testing_reviews,
        and uses the testing_labels to calculate the accuracy of those predictions.
        """
        
        # keep track of how many correct predictions we make
        correct = 0

        # we'll time how many predictions per second we make
        start = time.time()

        # Loop through each of the given reviews and call run to predict
        # its label. 
        for i in range(len(testing_reviews)):
            pred = self.run(testing_reviews[i])
            if(pred == testing_labels[i]):
                correct += 1
            
            # For debug purposes, print out our prediction accuracy and speed 
            # throughout the prediction process. 

            elapsed_time = float(time.time() - start)
            reviews_per_second = i / elapsed_time if elapsed_time > 0 else 0
            
            sys.stdout.write("\rProgress:" + str(100 * i/float(len(testing_reviews)))[:4] \
                             + "% Speed(reviews/sec):" + str(reviews_per_second)[0:5] \
                             + " #Correct:" + str(correct) + " #Tested:" + str(i+1) \
                             + " Testing Accuracy:" + str(correct * 100 / float(i+1))[:4] + "%")
    
    def run(self, review):
        """
        Returns a POSITIVE or NEGATIVE prediction for the given review.
        """
        
        # Run the forward pass through the network
        
        # Input Layer
        self.update_input_layer(review.lower())
        
        # Hidden Layer
        layer_1 = self.layer_0.dot(self.weights_0_1)
        
        # Output Layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))
        
        if(layer_2[0] >= 0.5):
            return 'POSITIVE'
        else:
            return 'NEGATIVE'
    

In [47]:
mlp = SentimentNetwork(reviews[:-1000], labels[:-1000], learning_rate=0.1)

In [48]:
mlp.test(reviews[-1000:], labels[-1000:])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Tested:1 Testing Accuracy:100.%Progress:0.1% Speed(reviews/sec):200.5 #Correct:1 #Tested:2 Testing Accuracy:50.0%Progress:0.2% Speed(reviews/sec):334.3 #Correct:2 #Tested:3 Testing Accuracy:66.6%Progress:0.3% Speed(reviews/sec):376.0 #Correct:2 #Tested:4 Testing Accuracy:50.0%Progress:0.4% Speed(reviews/sec):445.5 #Correct:3 #Tested:5 Testing Accuracy:60.0%Progress:0.5% Speed(reviews/sec):501.4 #Correct:3 #Tested:6 Testing Accuracy:50.0%Progress:0.6% Speed(reviews/sec):546.9 #Correct:4 #Tested:7 Testing Accuracy:57.1%Progress:0.7% Speed(reviews/sec):539.9 #Correct:4 #Tested:8 Testing Accuracy:50.0%Progress:0.8% Speed(reviews/sec):534.7 #Correct:5 #Tested:9 Testing Accuracy:55.5%Progress:0.9% Speed(reviews/sec):530.8 #Correct:5 #Tested:10 Testing Accuracy:50.0%Progress:1.0% Speed(reviews/sec):557.0 #Correct:6 #Tested:11 Testing Accuracy:54.5%Progress:1.1% Speed(reviews/sec):580.5 #Correct:6 #Tested:12 Testing Accuracy:50.0%Pr

Progress:14.9% Speed(reviews/sec):803.2 #Correct:75 #Tested:150 Testing Accuracy:50.0%Progress:15.0% Speed(reviews/sec):804.2 #Correct:76 #Tested:151 Testing Accuracy:50.3%Progress:15.1% Speed(reviews/sec):805.3 #Correct:76 #Tested:152 Testing Accuracy:50.0%Progress:15.2% Speed(reviews/sec):802.1 #Correct:77 #Tested:153 Testing Accuracy:50.3%Progress:15.3% Speed(reviews/sec):799.0 #Correct:77 #Tested:154 Testing Accuracy:50.0%Progress:15.4% Speed(reviews/sec):800.0 #Correct:78 #Tested:155 Testing Accuracy:50.3%Progress:15.5% Speed(reviews/sec):797.0 #Correct:78 #Tested:156 Testing Accuracy:50.0%Progress:15.6% Speed(reviews/sec):794.0 #Correct:79 #Tested:157 Testing Accuracy:50.3%Progress:15.7% Speed(reviews/sec):791.0 #Correct:79 #Tested:158 Testing Accuracy:50.0%Progress:15.8% Speed(reviews/sec):792.1 #Correct:80 #Tested:159 Testing Accuracy:50.3%Progress:15.9% Speed(reviews/sec):793.1 #Correct:80 #Tested:160 Testing Accuracy:50.0%Progress:16.0% Speed(reviews/sec):790.2 #C

Progress:31.2% Speed(reviews/sec):836.4 #Correct:157 #Tested:313 Testing Accuracy:50.1%Progress:31.3% Speed(reviews/sec):834.6 #Correct:157 #Tested:314 Testing Accuracy:50.0%Progress:31.4% Speed(reviews/sec):835.1 #Correct:158 #Tested:315 Testing Accuracy:50.1%Progress:31.5% Speed(reviews/sec):835.5 #Correct:158 #Tested:316 Testing Accuracy:50.0%Progress:31.6% Speed(reviews/sec):836.0 #Correct:159 #Tested:317 Testing Accuracy:50.1%Progress:31.7% Speed(reviews/sec):836.4 #Correct:159 #Tested:318 Testing Accuracy:50.0%Progress:31.8% Speed(reviews/sec):836.8 #Correct:160 #Tested:319 Testing Accuracy:50.1%Progress:31.9% Speed(reviews/sec):837.3 #Correct:160 #Tested:320 Testing Accuracy:50.0%Progress:32.0% Speed(reviews/sec):837.7 #Correct:161 #Tested:321 Testing Accuracy:50.1%Progress:32.1% Speed(reviews/sec):838.1 #Correct:161 #Tested:322 Testing Accuracy:50.0%Progress:32.2% Speed(reviews/sec):838.6 #Correct:162 #Tested:323 Testing Accuracy:50.1%Progress:32.3% Speed(reviews/se

Progress:47.5% Speed(reviews/sec):847.4 #Correct:238 #Tested:476 Testing Accuracy:50.0%Progress:47.6% Speed(reviews/sec):846.2 #Correct:239 #Tested:477 Testing Accuracy:50.1%Progress:47.7% Speed(reviews/sec):845.0 #Correct:239 #Tested:478 Testing Accuracy:50.0%Progress:47.8% Speed(reviews/sec):843.8 #Correct:240 #Tested:479 Testing Accuracy:50.1%Progress:47.9% Speed(reviews/sec):844.0 #Correct:240 #Tested:480 Testing Accuracy:50.0%Progress:48.0% Speed(reviews/sec):844.3 #Correct:241 #Tested:481 Testing Accuracy:50.1%Progress:48.1% Speed(reviews/sec):841.6 #Correct:241 #Tested:482 Testing Accuracy:50.0%Progress:48.2% Speed(reviews/sec):840.5 #Correct:242 #Tested:483 Testing Accuracy:50.1%Progress:48.3% Speed(reviews/sec):839.3 #Correct:242 #Tested:484 Testing Accuracy:50.0%Progress:48.4% Speed(reviews/sec):839.6 #Correct:243 #Tested:485 Testing Accuracy:50.1%Progress:48.5% Speed(reviews/sec):838.4 #Correct:243 #Tested:486 Testing Accuracy:50.0%Progress:48.6% Speed(reviews/se

Progress:62.8% Speed(reviews/sec):840.6 #Correct:315 #Tested:629 Testing Accuracy:50.0%Progress:62.9% Speed(reviews/sec):839.7 #Correct:315 #Tested:630 Testing Accuracy:50.0%Progress:63.0% Speed(reviews/sec):838.8 #Correct:316 #Tested:631 Testing Accuracy:50.0%Progress:63.1% Speed(reviews/sec):839.1 #Correct:316 #Tested:632 Testing Accuracy:50.0%Progress:63.2% Speed(reviews/sec):839.3 #Correct:317 #Tested:633 Testing Accuracy:50.0%Progress:63.3% Speed(reviews/sec):838.4 #Correct:317 #Tested:634 Testing Accuracy:50.0%Progress:63.4% Speed(reviews/sec):838.6 #Correct:318 #Tested:635 Testing Accuracy:50.0%Progress:63.5% Speed(reviews/sec):837.7 #Correct:318 #Tested:636 Testing Accuracy:50.0%Progress:63.6% Speed(reviews/sec):836.8 #Correct:319 #Tested:637 Testing Accuracy:50.0%Progress:63.7% Speed(reviews/sec):835.9 #Correct:319 #Tested:638 Testing Accuracy:50.0%Progress:63.8% Speed(reviews/sec):836.2 #Correct:320 #Tested:639 Testing Accuracy:50.0%Progress:63.9% Speed(reviews/se

Progress:79.6% Speed(reviews/sec):851.7 #Correct:399 #Tested:797 Testing Accuracy:50.0%Progress:79.7% Speed(reviews/sec):851.9 #Correct:399 #Tested:798 Testing Accuracy:50.0%Progress:79.8% Speed(reviews/sec):852.1 #Correct:400 #Tested:799 Testing Accuracy:50.0%Progress:79.9% Speed(reviews/sec):852.2 #Correct:400 #Tested:800 Testing Accuracy:50.0%Progress:80.0% Speed(reviews/sec):852.4 #Correct:401 #Tested:801 Testing Accuracy:50.0%Progress:80.1% Speed(reviews/sec):852.5 #Correct:401 #Tested:802 Testing Accuracy:50.0%Progress:80.2% Speed(reviews/sec):851.8 #Correct:402 #Tested:803 Testing Accuracy:50.0%Progress:80.3% Speed(reviews/sec):852.0 #Correct:402 #Tested:804 Testing Accuracy:50.0%Progress:80.4% Speed(reviews/sec):852.1 #Correct:403 #Tested:805 Testing Accuracy:50.0%Progress:80.5% Speed(reviews/sec):851.4 #Correct:403 #Tested:806 Testing Accuracy:50.0%Progress:80.6% Speed(reviews/sec):851.5 #Correct:404 #Tested:807 Testing Accuracy:50.0%Progress:80.7% Speed(reviews/se

Progress:93.7% Speed(reviews/sec):835.1 #Correct:469 #Tested:938 Testing Accuracy:50.0%Progress:93.8% Speed(reviews/sec):834.5 #Correct:470 #Tested:939 Testing Accuracy:50.0%Progress:93.9% Speed(reviews/sec):834.6 #Correct:470 #Tested:940 Testing Accuracy:50.0%Progress:94.0% Speed(reviews/sec):834.8 #Correct:471 #Tested:941 Testing Accuracy:50.0%Progress:94.1% Speed(reviews/sec):834.9 #Correct:471 #Tested:942 Testing Accuracy:50.0%Progress:94.2% Speed(reviews/sec):834.3 #Correct:472 #Tested:943 Testing Accuracy:50.0%Progress:94.3% Speed(reviews/sec):834.5 #Correct:472 #Tested:944 Testing Accuracy:50.0%Progress:94.4% Speed(reviews/sec):833.2 #Correct:473 #Tested:945 Testing Accuracy:50.0%Progress:94.5% Speed(reviews/sec):832.6 #Correct:473 #Tested:946 Testing Accuracy:50.0%Progress:94.6% Speed(reviews/sec):832.7 #Correct:474 #Tested:947 Testing Accuracy:50.0%Progress:94.7% Speed(reviews/sec):832.9 #Correct:474 #Tested:948 Testing Accuracy:50.0%Progress:94.8% Speed(reviews/se
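
One way to see why the accuracy sits at 50%: `mlp.test` was called on a network that had not been trained, and `weights_0_1` starts as all zeros. The sketch below reproduces only the forward pass with toy sizes (the layer sizes and word counts are arbitrary assumptions); it is not the class itself.

```python
import numpy as np

np.random.seed(1)
input_nodes, hidden_nodes, output_nodes = 5, 3, 1  # toy sizes chosen for illustration

# Same initialization scheme as init_network: zeros, then random normal
weights_0_1 = np.zeros((input_nodes, hidden_nodes))
weights_1_2 = np.random.normal(0.0, output_nodes**-0.5, (hidden_nodes, output_nodes))

layer_0 = np.array([[2.0, 0.0, 1.0, 0.0, 3.0]])  # arbitrary word counts
layer_1 = layer_0.dot(weights_0_1)               # every entry is 0 because weights_0_1 is 0
layer_2 = 1 / (1 + np.exp(-layer_1.dot(weights_1_2)))  # sigmoid(0) = 0.5

print(layer_2)
```

Because the hidden layer is all zeros, the output is 0.5 for any input. Since `run` returns 'POSITIVE' whenever the output is at least 0.5, an untrained network predicts POSITIVE for every review. The `#Correct` count in the output above rises on every other review, which would be consistent with held-out labels that alternate between POSITIVE and NEGATIVE.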

The network above was tested without being trained, so its accuracy of about 50% is no better than guessing. The default learning rate of 0.1 may also be too high for training. Run the following cell to recreate the network with a smaller learning rate, 0.01, and then train the new network.

In [49]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.01)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):109.2 #Correct:1248 #Trained:2501 Training Accuracy:49.9%
Progress:20.8% Speed(reviews/sec):109.6 #Correct:2498 #Trained:5001 Training Accuracy:49.9%
Progress:31.2% Speed(reviews/sec):109.6 #Correct:3748 #Trained:7501 Training Accuracy:49.9%
Progress:41.6% Speed(reviews/sec):110.0 #Correct:4998 #Trained:10001 Training Accuracy:49.9%
Progress:52.0% Speed(reviews/sec):110.3 #Correct:6248 #Trained:12501 Training Accuracy:49.9%
Progress:62.5% Speed(reviews/sec):110.5 #Correct:7488 #Trained:15001 Training Accuracy:49.9%
Progress:72.9% Speed(reviews/sec):110.5 #Correct:8738 #Trained:17501 Training Accuracy:49.9%
Progress:83.3% Speed(reviews/sec):110.6 #Correct:9988 #Trained:20001 Training Accuracy:49.9%
Progress:93.7% Speed(reviews/sec):110.6 #Correct:11238 #Trained:22501 Training Accuracy:49.9%
Progress:99.9% Speed(reviews/sec):110.6 #Correct:11988 #Trained:24000 Training Acc

That probably wasn't much different. Run the following cell to recreate the network one more time with an even smaller learning rate, 0.001, and then train the new network.

In [50]:
mlp = SentimentNetwork(reviews[:-1000],labels[:-1000], learning_rate=0.001)
mlp.train(reviews[:-1000],labels[:-1000])

Progress:0.0% Speed(reviews/sec):0.0 #Correct:1 #Trained:1 Training Accuracy:100.%
Progress:10.4% Speed(reviews/sec):111.6 #Correct:1278 #Trained:2501 Training Accuracy:51.0%
Progress:20.8% Speed(reviews/sec):111.4 #Correct:2630 #Trained:5001 Training Accuracy:52.5%
Progress:31.2% Speed(reviews/sec):110.4 #Correct:4055 #Trained:7501 Training Accuracy:54.0%
Progress:41.6% Speed(reviews/sec):109.7 #Correct:5586 #Trained:10001 Training Accuracy:55.8%
Progress:52.0% Speed(reviews/sec):109.7 #Correct:7140 #Trained:12501 Training Accuracy:57.1%
Progress:62.5% Speed(reviews/sec):109.8 #Correct:8758 #Trained:15001 Training Accuracy:58.3%
Progress:72.9% Speed(reviews/sec):109.8 #Correct:10366 #Trained:17501 Training Accuracy:59.2%
Progress:83.3% Speed(reviews/sec):109.9 #Correct:12033 #Trained:20001 Training Accuracy:60.1%
Progress:93.7% Speed(reviews/sec):109.9 #Correct:13737 #Trained:22501 Training Accuracy:61.0%
Progress:99.9% Speed(reviews/sec):109.9 #Correct:14734 #Trained:24000 Training A

With a learning rate of 0.001, the network should finally have started to improve during training. It's still not very good, but it shows that this solution has potential.