# Sentiment Analysis with an RNN

In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. Using an RNN rather than a feedforward network can be more accurate because the RNN can use information about the *sequence* of words, not just which words appear. Here we'll use a dataset of movie reviews, accompanied by sentiment labels.

The architecture for this network is shown below.

<img src="assets/network_diagram.png" width=400px>

First, we pass the words into an embedding layer. We need an embedding layer because we have tens of thousands of words, so we need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before in the word2vec lesson. You could train an embedding with word2vec and use it here, but it's good enough to just have an embedding layer and let the network learn the embedding table on its own.
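The claim that an embedding layer is cheaper than one-hot vectors can be checked with a small NumPy sketch. The sizes and the random embedding matrix below are made up for illustration: multiplying a one-hot vector by the embedding matrix just selects one row, so a lookup can skip building the one-hot vectors and index the rows directly.

```python
import numpy as np

# Hypothetical sizes for illustration; the real vocabulary has tens of thousands of words
vocab_size, embed_size = 10, 4
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, embed_size))  # one row per word id

word_ids = np.array([3, 7, 3])  # a short "review" already encoded as integers

# One-hot route: build a (3, vocab_size) matrix, then matrix-multiply
one_hot = np.eye(vocab_size)[word_ids]
via_matmul = one_hot @ embedding

# Lookup route: just take the rows for those word ids
via_lookup = embedding[word_ids]

print(np.allclose(via_matmul, via_lookup))  # True
```

TensorFlow exposes this row lookup as `tf.nn.embedding_lookup`, so the network never has to materialize a one-hot matrix with tens of thousands of columns.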

From the embedding layer, the new representations are passed to LSTM cells. These add recurrent connections to the network, so we can include information about the sequence of words in the data. Finally, the LSTM outputs go to a sigmoid output layer. We use the sigmoid because we're predicting whether the text has positive or negative sentiment, a binary outcome, so the output layer is just a single unit with a sigmoid activation function.

We only care about the sigmoid output at the very last step; we can ignore the rest. We calculate the cost from the output of the last step and the training label.
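As a rough sketch of the output side, here is the computation in NumPy, assuming the LSTM has already produced one hidden vector per time step. The shapes, the random stand-in outputs, and the output weights `w` and `b` are hypothetical; the point is that only the last step feeds the single sigmoid unit, and the cost is the binary cross-entropy between that prediction and the label.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

batch_size, num_steps, lstm_size = 2, 5, 3
rng = np.random.default_rng(1)

# Stand-in for the LSTM outputs: one hidden vector per time step
lstm_outputs = rng.normal(size=(batch_size, num_steps, lstm_size))
batch_labels = np.array([1.0, 0.0])  # 1 = positive, 0 = negative

# Hypothetical output-layer weights: lstm_size inputs -> 1 unit
w = rng.normal(size=(lstm_size,))
b = 0.0

# Ignore every step except the last one
last_step = lstm_outputs[:, -1, :]        # shape (batch_size, lstm_size)
predictions = sigmoid(last_step @ w + b)  # shape (batch_size,), values in (0, 1)

# Binary cross-entropy between predictions and labels, averaged over the batch
cost = -np.mean(batch_labels * np.log(predictions)
                + (1 - batch_labels) * np.log(1 - predictions))
print(predictions, cost)
```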

In [2]:
import numpy as np
import tensorflow as tf

In [3]:
with open('./reviews.txt', 'r') as f:
    reviews = f.read()
with open('./labels.txt', 'r') as f:
    labels = f.read()

In [4]:
reviews[:2000]

'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   \nstory of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is tu

## Data preprocessing

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines `\n`. To deal with those, I'm going to split the text into individual reviews using `\n` as the delimiter. Then I can combine all the reviews back together into one big string.

First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.
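Here is the same pipeline on a hypothetical two-review string, to show what each step produces. Note how the trailing newline at the end of the text yields an empty string as the last "review"; that is most likely the source of the zero-length review found further down.

```python
from string import punctuation

# Toy stand-in for reviews.txt: one review per line, with a trailing newline
raw = "great movie . loved it !\nbad plot , worse acting .\n"

# Drop every punctuation character, then split into reviews on newlines
no_punct = ''.join(c for c in raw if c not in punctuation)
toy_reviews = no_punct.split('\n')

# Join the reviews with spaces and split on whitespace to get the word list
toy_words = ' '.join(toy_reviews).split()

print(toy_reviews)  # ['great movie  loved it ', 'bad plot  worse acting ', '']
print(toy_words)    # ['great', 'movie', 'loved', 'it', 'bad', 'plot', 'worse', 'acting']
```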

In [15]:
reviews[0]

'b'

In [16]:
c = reviews[0]
c

'b'

In [17]:
str1 = ['b', 'o', 'y', 'g', 'i', 'r', 'l']
str1

['b', 'o', 'y', 'g', 'i', 'r', 'l']

> Join the elements of the list using an empty string as the separator.

In [19]:
text =''.join(str1)
text

'boygirl'

In [20]:
from string import punctuation
'\n' in punctuation

False

In [21]:
',' in punctuation

True

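> In the cell below, `c` is a single character of `reviews`.

> The second line joins every character except punctuation back into one string.

> Note that `'\n'` is NOT in `punctuation`, so the newlines survive and can be used to split the text into reviews.

> `reviews` and `words` are the results we want.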

In [5]:
from string import punctuation
all_text = ''.join([c for c in reviews if c not in punctuation])
reviews = all_text.split('\n')

all_text = ' '.join(reviews)
words = all_text.split()

In [31]:
reviews[0]

'bromwell high is a cartoon comedy  it ran at the same time as some other programs about school life  such as  teachers   my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers   the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students  when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled          at           high  a classic line inspector i  m here to sack one of your teachers  student welcome to bromwell high  i expect that many adults of my age think that bromwell high is far fetched  what a pity that it isn  t   '

In [35]:
all_text[0]

'b'

In [36]:
words[:100]

['bromwell',
 'high',
 'is',
 'a',
 'cartoon',
 'comedy',
 'it',
 'ran',
 'at',
 'the',
 'same',
 'time',
 'as',
 'some',
 'other',
 'programs',
 'about',
 'school',
 'life',
 'such',
 'as',
 'teachers',
 'my',
 'years',
 'in',
 'the',
 'teaching',
 'profession',
 'lead',
 'me',
 'to',
 'believe',
 'that',
 'bromwell',
 'high',
 's',
 'satire',
 'is',
 'much',
 'closer',
 'to',
 'reality',
 'than',
 'is',
 'teachers',
 'the',
 'scramble',
 'to',
 'survive',
 'financially',
 'the',
 'insightful',
 'students',
 'who',
 'can',
 'see',
 'right',
 'through',
 'their',
 'pathetic',
 'teachers',
 'pomp',
 'the',
 'pettiness',
 'of',
 'the',
 'whole',
 'situation',
 'all',
 'remind',
 'me',
 'of',
 'the',
 'schools',
 'i',
 'knew',
 'and',
 'their',
 'students',
 'when',
 'i',
 'saw',
 'the',
 'episode',
 'in',
 'which',
 'a',
 'student',
 'repeatedly',
 'tried',
 'to',
 'burn',
 'down',
 'the',
 'school',
 'i',
 'immediately',
 'recalled',
 'at',
 'high']

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Exercise:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [57]:
type(reviews[0])

str

In [55]:
reviews[0].split() == reviews[0].split(' ')

False
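The comparison above is `False` because `str.split()` with no argument splits on runs of whitespace and drops empty strings, while `split(' ')` splits on every single space and leaves `''` wherever two spaces meet. Stripping punctuation leaves many double spaces in the reviews, and since `words` was built with `split()`, those `''` tokens would not be keys in `vocab_to_int`. A quick illustration:

```python
# After punctuation is stripped, reviews contain runs of spaces, e.g. "comedy  it"
s = "comedy  it ran"

print(s.split())     # ['comedy', 'it', 'ran']
print(s.split(' '))  # ['comedy', '', 'it', 'ran']
```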

In [56]:
# Create your dictionary that maps vocab words to integers here
vocab = set(words)
# Start the integers at 1 (i+1) so that 0 is free to mean "padding" later
vocab_to_int = {w: i+1 for i, w in enumerate(vocab)}

# Convert the reviews to integers, same shape as reviews list, but with integers
reviews_ints = []
for review in reviews:
    reviews_ints.append([vocab_to_int[word] for word in review.split()])

## solution

In [6]:
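from collections import Counter
# Count word frequencies and sort the vocabulary from most to least common
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
# Map each word to an integer, starting at 1 so that 0 can be used for padding
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}

# Convert each review from a string of words into a list of integers
reviews_ints = []
for each in reviews:
    reviews_ints.append([vocab_to_int[word] for word in each.split()])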
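To see what the solution cell produces, here is the same `Counter`-based encoding run on a tiny, hypothetical two-review corpus. The `toy_` names are illustrative only and avoid overwriting the notebook's real `vocab_to_int`. The most frequent word gets index 1, and 0 is never assigned, so it stays free for padding.

```python
from collections import Counter

# Hypothetical toy corpus; toy_ names avoid overwriting the notebook's variables
toy_reviews = ["the movie was great", "the plot was thin"]
toy_words = " ".join(toy_reviews).split()

toy_counts = Counter(toy_words)
# Most frequent first; sorted() is stable, so ties keep first-seen order
toy_vocab = sorted(toy_counts, key=toy_counts.get, reverse=True)
# Start numbering at 1 so that 0 stays free for padding
toy_vocab_to_int = {word: ii for ii, word in enumerate(toy_vocab, 1)}

toy_ints = [[toy_vocab_to_int[w] for w in r.split()] for r in toy_reviews]
print(toy_vocab_to_int)  # {'the': 1, 'was': 2, 'movie': 3, 'great': 4, 'plot': 5, 'thin': 6}
print(toy_ints)          # [[1, 3, 2, 4], [1, 5, 2, 6]]
```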

In [89]:
len(reviews_ints)

25001

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.

> **Exercise:** Convert labels from `positive` and `negative` to 1 and 0, respectively.

In [7]:
# Convert labels to 1s and 0s for 'positive' and 'negative'
labels = (np.array(labels.split('\n')) == 'positive').astype(int)
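A quick check on a hypothetical label string shows how this one-liner behaves. It assumes that `labels.txt` ends with a trailing newline, which would be consistent with the 25001 entries in the shape check below and the single zero-length review found later: `split('\n')` then yields one extra empty string, and the comparison maps it to 0.

```python
import numpy as np

# Hypothetical stand-in for the contents of labels.txt, ending in a newline
toy_label_text = "positive\nnegative\npositive\n"

toy_label_list = toy_label_text.split('\n')
print(toy_label_list)   # ['positive', 'negative', 'positive', '']

# 'positive' -> 1, anything else (including the empty string) -> 0
toy_labels = (np.array(toy_label_list) == 'positive').astype(int)
print(toy_labels)       # [1 0 1 0]
```

The trailing 0 is not a real label, which is why the zero-length review and its label are dropped together further down.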

In [8]:
labels.shape

(25001,)

If you built `labels` correctly, you should see the next output.

In [9]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 1
Maximum review length: 2514


Okay, a couple of issues here. We seem to have one review with zero length. And the maximum review length is far too many steps for our RNN. Let's truncate to 200 steps: for reviews shorter than 200 words, we'll pad with 0s; for reviews longer than 200 words, we'll keep only the first 200 words.

> **Exercise:** First, remove the review with zero length from the `reviews_ints` list.

In [93]:
# Filter out that review with 0 length
reviews_ints = [ review for review in reviews_ints if len(review) != 0]

> The labels also need to be filtered with the same indices; otherwise they no longer line up with `reviews_ints`.

## solution

In [31]:
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]
len(non_zero_idx)
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
labels = np.array([labels[ii] for ii in non_zero_idx])
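The key point of this solution is that reviews and labels are filtered with the *same* indices. A boolean mask expresses the same idea; the sketch below runs it on hypothetical toy data in which the last review is empty, as in the real data.

```python
import numpy as np

# Hypothetical toy data: the last review is empty
toy_reviews_ints = [[5, 2, 9], [7, 1], []]
toy_labels = np.array([1, 0, 0])

# True for every review that contains at least one word
keep = np.array([len(r) > 0 for r in toy_reviews_ints])

# Apply the same mask to both, so each review keeps its own label
toy_reviews_ints = [r for r, k in zip(toy_reviews_ints, keep) if k]
toy_labels = toy_labels[keep]
print(toy_reviews_ints, toy_labels)   # [[5, 2, 9], [7, 1]] [1 0]
```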

In [33]:
labels.shape

(25000,)

In [11]:
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))

Zero-length reviews: 0
Maximum review length: 2514


> **Exercise:** Now, create an array `features` that contains the data we'll pass to the network. The data should come from `reviews_ints`, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. For reviews longer than 200 words, use only the first 200 words as the feature vector.

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.
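One of those ways is to preallocate a zero array and right-align each (truncated) review into its row, so the padding zeros end up on the left. This is a minimal sketch; `pad_features` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np

def pad_features(reviews_ints, seq_len):
    """Left-pad each review with 0s, or truncate it to its first seq_len words."""
    features = np.zeros((len(reviews_ints), seq_len), dtype=int)
    for i, review in enumerate(reviews_ints):
        review = review[:seq_len]                 # keep only the first seq_len words
        if len(review) > 0:                       # an empty review stays all zeros
            features[i, -len(review):] = review   # right-align, zeros remain on the left
    return features

print(pad_features([[117, 18, 128], list(range(1, 8))], seq_len=5))
# [[  0   0 117  18 128]
#  [  1   2   3   4   5]]
```

The `len(review) > 0` guard matters: for an empty list, `-0:` would select the whole row rather than nothing.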



In [12]:
reviews_ints[0]

[21025, 308, 6, 3, 1050, 207, 8, 2138, 32, 1,
 171, 57, 15, 49, 81, 5785, 44, 382, 110, 140,
 15, 5194, 60, 154, 9, 1, 4975, 5852, 475, 71,
 5, 260, 12, 21025, 308, 13, 1978, 6, 74, 2395,
 5, 613, 73, 6, 5194, 1, 24103, 5, 1983, 10166,
 1, 5786, 1499, 36, 51, 66, 204, 145, 67, 1199,
 5194, 19869, 1, 37442, 4, 1, 221, 883, 31, 2988,
 71, 4, 1, 5787, 10, 686, 2, 67, 1499, 54,
 10, 216, 1, 383, 9, 62, 3, 1406, 3686, 783,
 5, 3483, 180, 1, 382, 10, 1212, 13583, 32, 308,
 3, 349, 341, 2913, 10, 143, 127, 5, 7690, 30,
 4, 129, 5194, 1406, 2326, 5, 21025, 308, 10, 528,
 12, 109, 1448, 4, 60, 543, 102, 12, 21025, 308,
 6, 227, 4146, 48, 3, 2211, 12, 8, 215, 23]

In [34]:
seq_len = 200
features = list()
for review in reviews_ints:
    len_review = len(review)
    if len_review >= seq_len:
        # Long reviews: keep only the first seq_len words
        features.append(review[:seq_len])
    else:
        # Short reviews: left-pad with zeros up to seq_len
        features.append([0] * (seq_len - len_review) + review)
features = np.array(features)

If you built `features` correctly, it should look like the cell output below.

In [14]:
features[:10,:100]

array([[    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0, 21025,   308,     6,
            3,  1050,   207,     8,  2138,    32,     1,   171,    57,
           15,    49,    81,  5785,    44,   382,   110,   140,    15,
         5194,    60,   154,     9,     1,  4975,  5852,   475,    71,
            5,   260,    12, 21025,   308,    13,  1978,     6,    74,
         2395],
       [    0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     

In [35]:
features.shape

(25000, 200)

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

> **Exercise:** Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, `train_x` and `train_y` for example. Define a split fraction, `split_frac` as the fraction of data to keep in the training set. Usually this is set to 0.8 or 0.9. The rest of the data will be split in half to create the validation and testing data.
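The split can be sketched as a small function: take the first `split_frac` of the rows for training, then halve the remainder into validation and test sets. `split_data` is a hypothetical helper, and the toy arrays only mimic the real row counts (25000 reviews of 200 steps). Like the notebook, it slices contiguously; if the data were ordered by label, you would want to shuffle first.

```python
import numpy as np

def split_data(x, y, split_frac=0.8):
    """Split into train / validation / test; the remainder is halved between val and test."""
    split_idx = int(len(x) * split_frac)
    train_x, rest_x = x[:split_idx], x[split_idx:]
    train_y, rest_y = y[:split_idx], y[split_idx:]

    half = len(rest_x) // 2
    val_x, test_x = rest_x[:half], rest_x[half:]
    val_y, test_y = rest_y[:half], rest_y[half:]
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)

# Toy arrays with the same shapes as the real features and labels
toy_x = np.zeros((25000, 200), dtype=np.int32)
toy_y = np.zeros(25000, dtype=np.int32)
(tr_x, tr_y), (va_x, va_y), (te_x, te_y) = split_data(toy_x, toy_y, 0.8)
print(tr_x.shape, va_x.shape, te_x.shape)   # (20000, 200) (2500, 200) (2500, 200)
```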

In [42]:
split_frac = 0.8
index_split = int(labels.shape[0] * split_frac)
print(index_split)
train_x, val_x = features[: index_split, :], features[index_split:, :]
train_y, val_y = labels[: index_split], labels[index_split:]

index_split2 = int(val_y.shape[0] * 0.5)
print(val_y.shape[0])
print(index_split2)
val_x, test_x = val_x[: index_split2, :], val_x[index_split2:, :]
val_y, test_y = val_y[: index_split2], val_y[index_split2:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

20000
5000
2500
			Feature Shapes:
Train set: 		(20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		(2500, 200)


With train, validation, and test fractions of 0.8, 0.1, 0.1, the final shapes should look like:
```
                    Feature Shapes:
Train set: 		 (20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		  (2500, 200)
```

In [37]:
train_x

array([[    0,     0,     0, ...,     8,   215,    23],
       [    0,     0,     0, ...,    29,   108,  3324],
       [22382,    42, 46418, ...,   483,    17,     3],
       ..., 
       [    0,     0,     0, ...,    28,    77,   384],
       [    0,     0,     0, ...,     1,  1893,  3610],
       [    0,     0,     0, ...,     2,  2428,     8]])

In [38]:
train_x.shape

(20000, 200)

In [39]:
train_y.shape

(20000,)

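> Add a second dimension to the labels, turning shape `(n,)` into `(n, 1)`, so they match the two-dimensional `labels_` placeholder.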

In [43]:
train_y = train_y.reshape(-1, 1)
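`reshape(-1, 1)` and indexing with `[:, None]` are two equivalent ways to add the trailing axis. The sketch below, on a hypothetical four-label array, also shows the pitfall of adding it twice: once labels are shaped `(n, 1)`, they should be fed to `labels_` as-is, because indexing them again with `[:, None]` produces a 3-D array that no longer matches a `[None, 1]` placeholder.

```python
import numpy as np

toy_y = np.array([1, 0, 1, 1])      # 1-D labels, shape (4,)
col = toy_y.reshape(-1, 1)          # column vector, shape (4, 1)
same = toy_y[:, None]               # equivalent way to add the axis
print(col.shape, same.shape)        # (4, 1) (4, 1)

# Adding the axis a second time gives shape (4, 1, 1)
print(col[:, None].shape)           # (4, 1, 1)
```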

In [44]:
train_y.shape

(20000, 1)

In [45]:
val_y = val_y.reshape(-1, 1)
test_y = test_y.reshape(-1, 1)

In [46]:
val_y.shape

(2500, 1)

In [47]:
test_y.shape

(2500, 1)

In [48]:
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		(2500, 200)


In [49]:
print("\t\t\tLabel Shapes:")
print("Train set: \t\t{}".format(train_y.shape),
      "\nValidation set: \t{}".format(val_y.shape),
      "\nTest set: \t\t{}".format(test_y.shape))

			Label Shapes:
Train set: 		(20000, 1) 
Validation set: 	(2500, 1) 
Test set: 		(2500, 1)


## Build the graph

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. Larger is usually better performance-wise. Common values are 128, 256, 512, etc.
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
* `learning_rate`: Step size the Adam optimizer uses when updating the weights.

In [107]:
lstm_size = 256
lstm_layers = 1
batch_size = 500
learning_rate = 0.001

For the network itself, we'll be passing in our 200-element review vectors. Each batch will contain `batch_size` of these vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

> **Exercise:** Create the `inputs_`, `labels_`, and dropout `keep_prob` placeholders using `tf.placeholder`. `labels_` needs to be two-dimensional to work with some functions later. Since `keep_prob` is a scalar (a 0-dimensional tensor), you shouldn't provide a size to `tf.placeholder`.

In [108]:
n_words = len(vocab_to_int) + 1  # Add 1 because 0 is reserved for padding (vocab_to_int starts at 1)

# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(dtype= tf.int32, shape=[None, seq_len], name ='inputs')
    labels_ = tf.placeholder(dtype= tf.int32, shape= [None, 1], name = 'labels')
    keep_prob = tf.placeholder(dtype= tf.float32, name= 'keep_prob')

### Embedding

Now we'll add an embedding layer. We need to do this because there are 74000 words in our vocabulary, and it is massively inefficient to one-hot encode that many words. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But it's fine to just make a new layer and let the network learn the weights.

> **Exercise:** Create the embedding lookup matrix as a `tf.Variable`. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup). This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units and the inputs have shape `[batch_size, seq_len]`, the function will return a tensor with shape `[batch_size, seq_len, 200]`.
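A NumPy sketch makes the lookup concrete. Under the assumption that `tf.nn.embedding_lookup` with a single matrix simply gathers rows (which is what the shape check below, `(None, 200, 300)`, reflects), it corresponds to integer-array indexing. The sizes here are toy values, not the notebook's `n_words` and `embed_size = 300`. The last lines confirm that picking rows gives the same result as multiplying one-hot vectors by the matrix, without ever building those large one-hot vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes standing in for the notebook's n_words and embed_size
toy_n_words, toy_embed_size = 10, 4

# One row per word index; row 0 corresponds to the padding value
toy_embedding = rng.uniform(-1, 1, size=(toy_n_words, toy_embed_size))

# A batch of 2 integer-encoded, left-padded reviews with 5 time steps each
toy_inputs = np.array([[0, 0, 3, 7, 1],
                       [2, 9, 9, 4, 5]])

# Lookup = picking rows: shape [batch_size, seq_len, embed_size]
toy_embed = toy_embedding[toy_inputs]
print(toy_embed.shape)  # (2, 5, 4)

# Same result as one-hot vectors times the matrix, without building them in practice
toy_one_hot = np.eye(toy_n_words)[toy_inputs]               # shape (2, 5, 10)
print(np.allclose(toy_one_hot @ toy_embedding, toy_embed))  # True
```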



In [109]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 300 

with graph.as_default():
    embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs_)

In [110]:
embed.shape

TensorShape([Dimension(None), Dimension(200), Dimension(300)])

### LSTM cell

<img src="assets/network_diagram.png" width=400px>

Next, we'll create our LSTM cells to use in the recurrent network ([TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn)). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.

To create a basic LSTM cell for the graph, you'll want to use `tf.contrib.rnn.BasicLSTMCell`. Looking at the function documentation:

```
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
```

you can see it takes a parameter called `num_units`, the number of units in the cell, called `lstm_size` in this code. So then, you can write something like 

```
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

to create an LSTM cell with `num_units` units. Next, you can add dropout to the cell with `tf.contrib.rnn.DropoutWrapper`. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a really convenient way to regularize your network and reduce overfitting with almost no effort! So you'd do something like

```
drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
```
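As a rough NumPy illustration of output dropout, here is the usual "inverted dropout" rule: each unit is kept with probability `keep_prob` and scaled by `1 / keep_prob`, so the expected value is unchanged. `output_dropout` is a hypothetical helper, not TensorFlow's implementation. It also shows why the training loop feeds `keep_prob: 0.5` while validation and testing feed `keep_prob: 1`: at 1, dropout becomes a no-op.

```python
import numpy as np

def output_dropout(outputs, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, rescale by 1/keep_prob."""
    mask = rng.random(outputs.shape) < keep_prob
    return np.where(mask, outputs / keep_prob, 0.0)

rng = np.random.default_rng(42)
toy_h = np.ones((2, 6))                       # toy LSTM outputs, [batch, lstm_size]
print(output_dropout(toy_h, 0.5, rng))        # every entry is either 0.0 or 2.0
print(output_dropout(toy_h, 1.0, rng))        # keep_prob = 1 leaves the outputs unchanged
```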

Adding layers often improves performance. That's sort of the magic of deep learning: more layers allow the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with `tf.contrib.rnn.MultiRNNCell`:

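```
cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(lstm_size),
                                   output_keep_prob=keep_prob)
     for _ in range(lstm_layers)])
```

Here, the list comprehension builds `lstm_layers` separate LSTM cells, each wrapped with dropout. The `MultiRNNCell` wrapper stacks them into multiple layers of RNN cells, one for each cell in the list. Writing `[drop] * lstm_layers` instead would put the *same* cell object in every slot; that works for a single layer, but later TensorFlow 1.x releases raise an error when one cell object is reused across several layers.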

So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an architectural viewpoint, just with a more complicated graph inside the cell.

> **Exercise:** Below, use `tf.contrib.rnn.BasicLSTMCell` to create an LSTM cell. Then, add dropout to it with `tf.contrib.rnn.DropoutWrapper`. Finally, create multiple LSTM layers with `tf.contrib.rnn.MultiRNNCell`.

Here is [a tutorial on building RNNs](https://www.tensorflow.org/tutorials/recurrent) that will help you out.


In [111]:
with graph.as_default():
    def build_cell():
        # Your basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        # Add dropout to the cell's outputs
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    # Stack up multiple LSTM layers, creating a separate cell for each layer
    cell = tf.contrib.rnn.MultiRNNCell([build_cell() for _ in range(lstm_layers)])

    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)

In [112]:
initial_state

(LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros:0' shape=(500, 256) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(500, 256) dtype=float32>),)

### RNN forward pass

<img src="assets/network_diagram.png" width=400px>

Now we need to actually run the data through the RNN nodes. You can use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to do this. You'd pass in the RNN cell you created (our multiple layered LSTM `cell` for instance), and the inputs to the network.

```
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
```

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. `tf.nn.dynamic_rnn` takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.
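To see what "unrolling" means in terms of shapes, here is a NumPy sketch that loops a cell over the time axis, starting from a zero state. It deliberately swaps in a plain tanh RNN cell rather than an LSTM to keep it short, and all weights and sizes are hypothetical. The same weights are applied at every step; the hidden state carries context forward, and the per-step outputs are stacked into `[batch, time, hidden]`, matching the `(500, 200, 256)` shape of `outputs` below.

```python
import numpy as np

def unroll_rnn(inputs, W_x, W_h, b, h0):
    """Unroll a simple tanh RNN (a stand-in for an LSTM) over the time axis.

    inputs: [batch, time, features] -> outputs [batch, time, hidden], final h [batch, hidden]
    """
    h = h0
    outputs = []
    for t in range(inputs.shape[1]):
        # Same weights at every step; h carries information from earlier words
        h = np.tanh(inputs[:, t] @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
toy_batch, toy_time, toy_feat, toy_hidden = 3, 7, 4, 5
toy_inputs = rng.normal(size=(toy_batch, toy_time, toy_feat))
W_x = rng.normal(size=(toy_feat, toy_hidden))
W_h = rng.normal(size=(toy_hidden, toy_hidden))
b = np.zeros(toy_hidden)
h0 = np.zeros((toy_batch, toy_hidden))        # analogous to cell.zero_state(...)

toy_outputs, toy_final = unroll_rnn(toy_inputs, W_x, W_h, b, h0)
print(toy_outputs.shape)                           # (3, 7, 5)
print(np.allclose(toy_outputs[:, -1], toy_final))  # True for this simple cell
```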

> **Exercise:** Use `tf.nn.dynamic_rnn` to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, `embed`.



In [114]:
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed,
                                             initial_state=initial_state)

In [115]:
outputs.shape

TensorShape([Dimension(500), Dimension(200), Dimension(256)])

In [116]:
type(final_state)

tuple

In [117]:
len(final_state)

1

In [118]:
final_state

(LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_2:0' shape=(500, 256) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_3:0' shape=(500, 256) dtype=float32>),)

### Output

We only care about the final output, since we'll be using it as our sentiment prediction. So we need to grab the last output with `outputs[:, -1]`, then calculate the cost from that and `labels_`.
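The output cell below passes `outputs[:, -1]` through a one-unit fully connected layer with a sigmoid activation and scores it with mean squared error. A NumPy sketch of that computation, with toy sizes and hypothetical weights `out_W` and `out_b`, looks like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
toy_outputs = rng.normal(size=(4, 200, 8))    # [batch, seq_len, lstm_size], toy sizes
toy_labels = np.array([[1], [0], [1], [0]])   # [batch, 1], like labels_

last = toy_outputs[:, -1]                     # [batch, lstm_size]: only the final step
out_W = rng.normal(size=(8, 1)) * 0.1         # hypothetical output-layer weights
out_b = np.zeros(1)
toy_pred = sigmoid(last @ out_W + out_b)      # [batch, 1], values in (0, 1)

# Mean squared error between labels and sigmoid outputs
mse = np.mean((toy_labels - toy_pred) ** 2)
print(last.shape, toy_pred.shape)             # (4, 8) (4, 1)
```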

In [119]:
outputs[:, -1].shape

TensorShape([Dimension(500), Dimension(256)])

In [120]:
labels_.shape

TensorShape([Dimension(None), Dimension(1)])

> The output layer (`fully_connected`) only sees `outputs[:, -1]`, so it works on the LSTM output after all time steps have been processed.

In [121]:
with graph.as_default():
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    cost = tf.losses.mean_squared_error(labels_, predictions)
    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

In [122]:
predictions.shape

TensorShape([Dimension(500), Dimension(1)])

### Validation accuracy

Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.
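In NumPy terms, the accuracy nodes round each sigmoid output to 0 or 1, compare it with the integer label, and average the matches. The predictions below are hypothetical; values of exactly 0.5 are avoided because rounding there goes to the even number (0).

```python
import numpy as np

toy_pred = np.array([[0.91], [0.30], [0.55], [0.49]])   # sigmoid outputs
toy_labels = np.array([[1], [0], [0], [1]])

# Round to the nearest class, compare with the labels, and average the matches
correct = np.round(toy_pred).astype(np.int32) == toy_labels
toy_acc = correct.astype(np.float32).mean()
print(toy_acc)   # 0.5
```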

In [128]:
with graph.as_default():
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

### Batching

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

In [123]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]
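A small check on toy data shows the "full batches only" behavior: with 23 rows and a batch size of 5, the generator yields 4 batches and silently drops the last 3 rows. The function is copied from the cell above so the block runs on its own. With the real data and `batch_size = 500`, that is 20000 / 500 = 40 training batches per epoch and 2500 / 500 = 5 validation or test batches.

```python
import numpy as np

def get_batches(x, y, batch_size=100):
    n_batches = len(x) // batch_size
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

toy_x = np.arange(23).reshape(-1, 1)
toy_y = np.arange(23).reshape(-1, 1)
toy_batches = list(get_batches(toy_x, toy_y, batch_size=5))
print(len(toy_batches))   # 4: rows 20, 21, 22 are dropped
```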

## Training

Below is the typical training code. If you want to implement it yourself, feel free to delete this code and write your own. Before you run it, make sure the `checkpoints` directory exists.

In [126]:
epochs = 10

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)  # reset the LSTM state at the start of every epoch
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y,
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            
            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration%25==0:
                val_acc = []
                val_state = sess.run(initial_state)  # zero state for the validation pass
                for x, y in get_batches(val_x, val_y, batch_size):
                    # val_y is already shaped (n, 1), so feed y directly
                    feed = {inputs_: x,
                            labels_: y,
                            keep_prob: 1,
                            initial_state: val_state}
                    batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
    saver.save(sess, "checkpoints/sentiment.ckpt")


## Testing

In [None]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(initial_state)  # zero state for the test pass
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        # test_y is already shaped (n, 1), so feed y directly
        feed = {inputs_: x,
                labels_: y,
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))