# Sentiment Classification & How To "Frame Problems" for a Neural Network

by Andrew Trask

- **Twitter**: @iamtrask
- **Blog**: http://iamtrask.github.io

### What You Should Already Know

- neural networks, forward and back-propagation
- stochastic gradient descent
- mean squared error
- train/test splits

### Where to Get Help If You Need It
- Re-watch previous Udacity Lectures
- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% Off: **traskud17**)
- Shoot me a tweet @iamtrask


### Tutorial Outline:

- Intro: The Importance of "Framing a Problem"


- Curate a Dataset
- Developing a "Predictive Theory"
- **PROJECT 1**: Quick Theory Validation


- Transforming Text to Numbers
- **PROJECT 2**: Creating the Input/Output Data


- Putting it all together in a Neural Network
- **PROJECT 3**: Building our Neural Network


- Understanding Neural Noise
- **PROJECT 4**: Making Learning Faster by Reducing Noise


- Analyzing Inefficiencies in our Network
- **PROJECT 5**: Making our Network Train and Run Faster


- Further Noise Reduction
- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary


- Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

In [1]:
def pretty_print_review_and_label(i):
    # Show the label and the first 80 characters of review i.
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# reviews.txt holds one review per line: what we know!
with open('reviews.txt', 'r') as f:
    reviews = [line.rstrip('\n') for line in f]

# labels.txt holds one label per line: what we WANT to know!
with open('labels.txt', 'r') as f:
    labels = [line.rstrip('\n').upper() for line in f]

In [2]:
len(reviews)

25000

In [8]:
reviews[124]

'we watched this in my women  s health issues class to point out how women are treated inferior to men in many societies  and i absolutely loved this movie . i plan on trying to get a copy of it myself to watch . the story is very touching and i would recommend it to anyone . i am a fan of different cultures and this movie was just what i needed . this is a movie for the whole family despite its rating . this is a movie i will show to my children . the professor of our class meant for the movie to primarily be a too to educate about women  but this movie was more than that . it is one of those movies that will forever stick out in my mind and will be a favorite .  '

In [9]:
labels[124]

'POSITIVE'
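Before building on this data, it can help to confirm that the two files line up one-to-one and to see how the labels are distributed. The following is a minimal sketch of such a check, not part of the original notebook: `summarize_dataset` is a hypothetical helper, and `sample_reviews` / `sample_labels` are toy stand-ins for the real `reviews` and `labels` lists.

```python
from collections import Counter

def summarize_dataset(reviews, labels):
    """Check that reviews and labels align, then count each label."""
    if len(reviews) != len(labels):
        raise ValueError("reviews and labels differ in length")
    return Counter(labels)

# Toy stand-ins (assumption): the real lists come from reviews.txt / labels.txt.
sample_reviews = ["this movie is terrible", "excellent episode", "it is pure trash"]
sample_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE"]

label_counts = summarize_dataset(sample_reviews, sample_labels)
print(label_counts)
```

With the real data, the call would be `summarize_dataset(reviews, labels)`.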

# Lesson: Develop a Predictive Theory

In [7]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)
pretty_print_review_and_label(124)

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...
POSITIVE	:	we watched this in my women  s health issues class to point out how women are tr...
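
The samples above suggest a theory: individual words such as "terrible" and "excellent" correlate with the label. One rough way to test that idea is to count how often a given word appears under each label. This is a sketch under stated assumptions rather than the tutorial's own validation code: `count_word_by_label` is a hypothetical helper, and the toy reviews are shortened or made-up examples in the style of the dataset.

```python
from collections import Counter

def count_word_by_label(reviews, labels, word):
    """Count occurrences of `word` across reviews, grouped by label."""
    counts = Counter()
    for review, label in zip(reviews, labels):
        # Split on single spaces, matching the tokenization used in this notebook.
        counts[label] += review.split(' ').count(word)
    return counts

# Toy data (assumption): stand-ins for the real reviews and labels lists.
sample_reviews = [
    "this movie is terrible but it has some good effects .",
    "adrian pasdar is excellent is this film .",
    "it is pure trash . it is terrible",
    "the movie is of excellent quality",
]
sample_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]

terrible_counts = count_word_by_label(sample_reviews, sample_labels, "terrible")
excellent_counts = count_word_by_label(sample_reviews, sample_labels, "excellent")
print("terrible :", dict(terrible_counts))
print("excellent:", dict(excellent_counts))
```

If the theory holds on the full dataset, a word like "terrible" should show up far more often under `NEGATIVE` than under `POSITIVE`.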


In [27]:
total_word_count = 0.0
positive_feature_table = dict()
negative_feature_table = dict()

In [30]:
def create_features():
    # total_word_count lives at module level (cell In [27]). Declare it global so
    # the assignment below updates it instead of creating a new local variable.
    global total_word_count
    for i in range(len(reviews)):
        label = labels[i]
        review_words = reviews[i].split(' ')
        total_word_count = total_word_count + len(review_words)
        for word in review_words:
            if label == 'POSITIVE':
                # dict.get takes its default positionally, not as a keyword argument
                positive_feature_table[word] = positive_feature_table.get(word, 0) + 1
            else:
                negative_feature_table[word] = negative_feature_table.get(word, 0) + 1
    # Normalize raw counts by the total number of words seen
    for word in positive_feature_table:
        positive_feature_table[word] = positive_feature_table[word] / total_word_count
    for word in negative_feature_table:
        negative_feature_table[word] = negative_feature_table[word] / total_word_count

In [31]:
create_features()
labels[123]
positive_feature_table
for v in positive_feature_table.values():
    print(v)
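
The per-class word frequencies can also be computed without module-level state, which sidesteps scoping problems around a counter like `total_word_count`. This is a minimal sketch, assuming the same normalization as `create_features` (each count divided by the total number of tokens across all reviews). `build_frequency_tables` and the sample data are hypothetical and not part of the original notebook.

```python
from collections import Counter

def build_frequency_tables(reviews, labels):
    """Return (positive, negative) dicts mapping word -> count / total tokens."""
    positive_counts, negative_counts = Counter(), Counter()
    total_tokens = 0
    for review, label in zip(reviews, labels):
        words = review.split(' ')
        total_tokens += len(words)
        # Route the words of this review to the table for its label.
        target = positive_counts if label == 'POSITIVE' else negative_counts
        target.update(words)
    positive = {w: c / total_tokens for w, c in positive_counts.items()}
    negative = {w: c / total_tokens for w, c in negative_counts.items()}
    return positive, negative

# Toy data (assumption): stand-ins for the real reviews and labels lists.
sample_reviews = ["excellent film", "terrible film"]
sample_labels = ["POSITIVE", "NEGATIVE"]

positive_table, negative_table = build_frequency_tables(sample_reviews, sample_labels)
print(positive_table)
print(negative_table)
```

Because the function returns fresh tables on every call, re-running the cell does not accumulate counts from earlier runs.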