# Sentiment Classification & How To "Frame Problems" for a Neural Network

by Andrew Trask

- **Twitter**: @iamtrask
- **Blog**: http://iamtrask.github.io

### What You Should Already Know

- neural networks, forward and back-propagation
- stochastic gradient descent
- mean squared error
- train/test splits

### Where to Get Help if You Need It

- Re-watch previous Udacity lectures
- Use the recommended course reading material, [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% off: **traskud17**)
- Shoot me a tweet @iamtrask


### Tutorial Outline:

- Intro: The Importance of "Framing a Problem"


- Curate a Dataset
- Developing a "Predictive Theory"
- **PROJECT 1**: Quick Theory Validation


- Transforming Text to Numbers
- **PROJECT 2**: Creating the Input/Output Data


- Putting it all together in a Neural Network
- **PROJECT 3**: Building our Neural Network


- Understanding Neural Noise
- **PROJECT 4**: Making Learning Faster by Reducing Noise


- Analyzing Inefficiencies in our Network
- **PROJECT 5**: Making our Network Train and Run Faster


- Further Noise Reduction
- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary


- Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

In [1]:
def pretty_print_review_and_label(i):
    # Show a label next to the first 80 characters of its review.
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# What we know: one review per line.
with open('reviews.txt', 'r') as g:
    reviews = [line.rstrip('\n') for line in g]

# What we WANT to know: one label per line, upper-cased.
with open('labels.txt', 'r') as g:
    labels = [line.rstrip('\n').upper() for line in g]

In [2]:
len(reviews)

25000

In [3]:
reviews[0]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   '

In [4]:
labels[0]

'POSITIVE'
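
Before building on a dataset like this, it can help to confirm that the reviews and labels line up one-to-one and to see how the labels are distributed. The snippet below is a minimal sketch and not part of the original notebook: `summarize_dataset` is a hypothetical helper, and the three-item sample lists are stand-ins for the real `reviews` and `labels` loaded above.

```python
from collections import Counter

def summarize_dataset(reviews, labels):
    """Return (number of examples, Counter of labels) after checking alignment."""
    if len(reviews) != len(labels):
        raise ValueError("reviews and labels must have the same length")
    return len(reviews), Counter(labels)

# Tiny stand-in sample (assumption: the real lists come from reviews.txt and labels.txt).
sample_reviews = [
    "this movie is terrible but it has some good effects .",
    "adrian pasdar is excellent is this film .",
    "it is pure trash .",
]
sample_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE"]

n_examples, label_counts = summarize_dataset(sample_reviews, sample_labels)
print(n_examples, dict(label_counts))
```

Calling `summarize_dataset(reviews, labels)` on the full lists would report the total number of examples alongside the count for each label.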

# Lesson: Develop a Predictive Theory

In [5]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...
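
The examples printed above suggest that individual words such as `terrible` and `excellent` track the label. That is one reading of the output, not something the notebook states outright. As a rough sketch of how to probe that idea, the snippet below counts, per label, how many reviews contain a given word. `label_counts_for_word` is a hypothetical helper, and the sample reuses the six snippets shown above (shortened), so it is only a toy check.

```python
from collections import Counter

def label_counts_for_word(word, reviews, labels):
    """Count, per label, how many reviews contain `word` as a whole token."""
    counts = Counter()
    for review, label in zip(reviews, labels):
        if word in review.split(" "):
            counts[label] += 1
    return counts

# Toy sample built from the six snippets printed above (shortened).
sample = [
    ("NEGATIVE", "this movie is terrible but it has some good effects ."),
    ("POSITIVE", "adrian pasdar is excellent is this film . he makes a fascinating woman ."),
    ("NEGATIVE", "comment this movie is impossible . is terrible  very improbable  bad"),
    ("POSITIVE", "excellent episode movie ala pulp fiction .  days   suicides ."),
    ("NEGATIVE", "if you haven  t seen this  it  s terrible . it is pure trash ."),
    ("POSITIVE", "this schiffer guy is a real genius  the movie is of excellent quality"),
]
sample_labels = [label for label, _ in sample]
sample_reviews = [text for _, text in sample]

# Compare two sentiment-laden words with a neutral one.
for word in ("terrible", "excellent", "movie"):
    print(word, dict(label_counts_for_word(word, sample_reviews, sample_labels)))
```

In this toy sample, a word like `movie` appears under both labels while `terrible` and `excellent` each appear under only one, which is the kind of contrast worth checking on the full dataset.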


In [10]:
import collections

# Word counts for positive reviews, negative reviews, and all reviews combined.
positive_words = collections.Counter()
negative_words = collections.Counter()
total_words = collections.Counter()

for i in range(len(reviews)):
    if labels[i] == 'POSITIVE':
        for word in reviews[i].split(" "):
            positive_words[word] += 1
            total_words[word] += 1
    elif labels[i] == 'NEGATIVE':
        for word in reviews[i].split(" "):
            negative_words[word] += 1
            total_words[word] += 1

# Number of distinct words seen in positive reviews.
print(len(positive_words))

# Of the 1000 most common positive words, show those seen fewer than 500 times.
for word, count in positive_words.most_common(1000):
    if count < 500:
        print(word, count)

55214
wish 499
begins 498
taken 497
sad 497
ways 496
richard 495
knows 494
atmosphere 493
similar 491
surprised 491
taking 491
car 491
george 490
perfectly 490
across 489
team 489
eye 489
sequence 489
room 488
due 488
among 488
serious 488
powerful 488
strange 487
order 487
cannot 487
b 487
beauty 486
famous 485
happened 484
tries 484
herself 484
myself 484
class 483
four 482
cool 481
release 479
anyway 479
theme 479
opening 478
entertainment 477
slow 475
ends 475
unique 475
exactly 475
easily 474
level 474
o 474
red 474
interest 472
happen 471
crime 470
viewing 468
sets 467
memorable 467
stop 466
group 466
problems 463
dance 463
working 463
sister 463
message 463
knew 462
mystery 461
nature 461
bring 460
believable 459
thinking 459
brought 459
mostly 458
disney 457
couldn 457
society 456
lady 455
within 455
blood 454
parents 453
upon 453
viewers 453
meets 452
form 452
peter 452
tom 452
usually 452
soundtrack 452
local 450
certain 448
follow 448
whether 447
possible 446
emotional 445