<h1 style="color:brown;">  Recurrent neural nets</h1> 

![Looping network](./img/RNN_colah.png)

##### RNNs can produce amazing results <a href ="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">blog</a>

### Lesson plan 
1. Why classic neural nets are not enough?
2. RNN 
3. Takeaways
4. Hands on RNN

In [1]:
import numpy as np

### Classic nets vs. RNN's

Classic:
    - Inputs and outputs must be fixed-sized vectors
    - No idea of location or time 

RNNs: 

![](./img/diags.jpeg)

### Idea I: Memory - your current choices are based on previous understanding

Add some cell in the network to keep previous memory and combine with current input to predict next word

![](./img/memory_rnn.png)

#### Problem: calculating the derivative (aka gradient) is problematic, either infinite or zero.

Imagine the memory at time t is the memory at time t-1 times a weight vector:
    $h_t = W*h_{t-1}$
Then:
    $h_t = W^t * h_0$ 
    
  $W > 1$ $h_t --> \infty$

### Solution: LSTM/GRU

<a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">LSTM/GRU blog</a>


![](./img/RNNs.png)

![](./img/LSTM_colah.png)

#### Idea II: gates: don't multiply, use addition for memory!

#### well, even if we can include many words (large n-gram), how can we capture context?
#### If the text mentioned queen Mary and few pages later is talking about the queen, how will our network 
#### know her name is Mary? 

##### Components

    - cell state
    - candidates  

##### Gates
- forget - information to throw (0 means throw all from the cell state)
- input - what values we are going to update
- output - filter which values of the cell we are going to output 

The current cell state is the sum of forgetting and updating with new candidates

### Extension: attention

<a href="https://www.youtube.com/watch?v=SysgYptB198">Intuition</a>

######  - Translate part by part
###### -  Use attention weights - how much attention should you give to each word in the input (update weights to each new word)

![](./img/attention.png)

### Takeaways:
    

##### RNN
- Old generation RNNs suffered from exploding/vanishing gradients
- New generation RNNs (commonly LSTM or GRU) are using memory gates to mitigate this problem
- RNNs are just multiple copies of a NN connected by the hidden layer
- Training is done again by backpropogation
- Weights are shared accros all network
- RNN's can be used for any sequence. Unlike time series models can include both time and features.
- Are flexible in input and output sizes
- Amazing results in NLP, recommendations and many more.
- Many flavours - BRNN, CRNN...

##### Attention
- Typicall for translations/images
- Weight all the words in one language to decide how much they should influence input to translated language
- components: word weights, BRNN, RNN, context vectors.

##### Hands on RNN's

data source: https://www.kaggle.com/crowdflower/data

In [38]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import codecs
import csv
import re
import datetime
import tensorflow as tf
import text_to_word_list
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential, Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D, Input, Dropout, BatchNormalization

In [39]:
DATA_FILE = '/Users/omer/Downloads/Sentiment.csv'

In [40]:
data = pd.read_csv('/Users/omer/Downloads/Sentiment.csv')
# Keeping only the neccessary columns
data = data[['text','sentiment']]

In [41]:
data.sentiment.value_counts()

Negative    8493
Neutral     3142
Positive    2236
Name: sentiment, dtype: int64

In [42]:
neg_idx = data[data['sentiment']=='Negative'].index.values
neutral_idx = data[data['sentiment']=='Neutral'].index.values
pos_idx = data[data['sentiment']=='Positive'].index.values

In [43]:
pos_idx_te = np.random.choice(pos_idx, replace=False, size=int(0.1*len(pos_idx)))
pos_idx_tr = np.delete(pos_idx, pos_idx_te)
pos_idx_val = np.random.choice(pos_idx_tr, replace=False, size=int(0.1*len(pos_idx)))
pos_idx_tr = np.delete(pos_idx_tr, pos_idx_val)
n = len(pos_idx_tr)

neut_idx_te = np.random.choice(neutral_idx, replace=False, size=int(0.1*len(neutral_idx)))
neutral_idx_tr = np.delete(neutral_idx, neut_idx_te)
neut_idx_val = np.random.choice(neutral_idx_tr, replace=False, size=int(0.1*len(neutral_idx)))
neutral_idx_tr = np.random.choice(np.delete(neutral_idx_tr, neut_idx_val), size = n, replace = False)


neg_idx_te = np.random.choice(neg_idx, replace=False, size=int(0.1*len(neg_idx)))
neg_idx_tr = np.delete(neg_idx, neg_idx_te)
neg_idx_val = np.random.choice(neg_idx_tr, replace=False, size=int(0.1*len(neg_idx)))
neg_idx_tr = np.random.choice(np.delete(neg_idx_tr, neg_idx_val), replace = False, size = n)


tr_idx = np.concatenate((pos_idx_tr, neg_idx_tr, neutral_idx_tr))
val_idx = np.concatenate((pos_idx_val, neg_idx_val, neut_idx_val))
te_idx = np.concatenate((pos_idx_te, neg_idx_te, neut_idx_te))

  
  after removing the cwd from sys.path.
  
  # Remove the CWD from sys.path while we load stuff.
  
  app.launch_new_instance()


In [44]:
## create list of lists: each sentence is a list containing a list of its words
count = 1 
texts_1_tr, texts_1_val, texts_1_te  = [], [], [] 
labels_tr, labels_val, labels_te = [], [], []
for row in tr_idx:
    texts_1_tr.append(text_to_word_list.text_to_wordlist(data.iloc[row, 0]))
    labels_tr.append((data.iloc[row, 1]))
    count += 1
print('Found %s texts in train.csv' % len(texts_1_tr))
labels_tr = pd.get_dummies(pd.DataFrame(labels_tr))

Found 6459 texts in train.csv


In [45]:
import text_to_word_list
for row in val_idx:
    texts_1_val.append(text_to_word_list.text_to_wordlist(data.iloc[row, 0]))
    labels_val.append((data.iloc[row, 1]))
    count += 1
print('Found %s texts in val.csv' % len(texts_1_val))
labels_val = pd.get_dummies(pd.DataFrame(labels_val))

Found 1386 texts in val.csv


In [46]:
for row in te_idx:
    texts_1_te.append(text_to_word_list.text_to_wordlist(data.iloc[row, 0]))
    labels_te.append((data.iloc[row, 1]))
    count += 1
print('Found %s texts in test.csv' % len(texts_1_te))
labels_te = pd.get_dummies(pd.DataFrame(labels_te))


Found 1386 texts in test.csv


In [47]:
tokenizer = Tokenizer(num_words=200000)
tokenizer.fit_on_texts(texts_1_tr + texts_1_te + texts_1_val)

In [48]:
sequences_1_tr = tokenizer.texts_to_sequences(texts_1_tr)
sequences_1_te = tokenizer.texts_to_sequences(texts_1_te)
sequences_1_val = tokenizer.texts_to_sequences(texts_1_val)

In [50]:
sequences_1_tr[1]

[20,
 200,
 174,
 11,
 3395,
 1136,
 103,
 182,
 29,
 1632,
 157,
 9,
 2,
 184,
 17,
 53,
 5467,
 1,
 228,
 144,
 19]

In [53]:
vocab_size = len(tokenizer.word_index)

10714

In [51]:
maxi = 0 
for listo in range(len(sequences_1_tr)):
    if (len(sequences_1_tr[listo]) > maxi):
        maxi = len(sequences_1_tr[listo])
        
print('the max sequence length is: ' + str(maxi))

the max sequence length is: 30


In [52]:
data_1_tr = pad_sequences(sequences_1_tr, maxlen=maxi, padding='post')
data_1_val = pad_sequences(sequences_1_val, maxlen=maxi, padding='post')
data_1_te = pad_sequences(sequences_1_te, maxlen=maxi, padding='post')

In [16]:
########################################
## define the embedding structure
########################################
embedding_layer = Embedding(vocab_size,
        output_dim = 300,
        input_length=maxi,
        trainable=True, name = 'features')

In [19]:
####
# RNN model
####
lstm_layer = LSTM(64, dropout=0.5, recurrent_dropout=0.5)

sequence_input = Input(shape=(maxi,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
lstm = lstm_layer(embedded_sequences)
drop_lstm = Dropout(0.5)(lstm)
batched_norm = BatchNormalization()(drop_lstm)
full_connected = Dense(32, activation='relu')(batched_norm)
drop_full = Dropout(0.5)(full_connected)
batched_norm_2 = BatchNormalization()(drop_full)
pred = Dense(3, activation='softmax')(batched_norm_2)

W0904 19:23:17.495033 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0904 19:23:17.509901 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0904 19:23:17.512438 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0904 19:23:17.620816 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

In [20]:
model = Model(inputs=[sequence_input], \
        outputs=pred)

model.compile(loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['acc'])
print(model.summary())

W0904 19:23:18.442364 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0904 19:23:18.465545 4387050944 deprecation_wrapper.py:119] From /Users/omer/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 29)                0         
_________________________________________________________________
features (Embedding)         (None, 29, 300)           4525500   
_________________________________________________________________
lstm_1 (LSTM)                (None, 64)                93440     
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 64)                256       
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
__________

In [28]:
tokenizer.index_word

{1: 'gopdebate',
 2: 'the',
 3: 'gopdebates',
 4: 'to',
 5: 't',
 6: 'co',
 7: 'i',
 8: 'a',
 9: 'is',
 10: 'of',
 11: 'and',
 12: 'http',
 13: 'not',
 14: 'it',
 15: 'you',
 16: 'in',
 17: 'trump',
 18: 'for',
 19: 'on',
 20: 'that',
 21: 'this',
 22: 'do',
 23: 'fox',
 24: 'realdonaldtrump',
 25: 'was',
 26: 'are',
 27: 'about',
 28: 'debate',
 29: 'amp',
 30: 'have',
 31: 'we',
 32: 'he',
 33: 'be',
 34: 'from',
 35: 'at',
 36: 'would',
 37: 'news',
 38: 'they',
 39: 'night',
 40: 'me',
 41: 'what',
 42: 'last',
 43: 'candidates',
 44: 'who',
 45: 'up',
 46: 'with',
 47: 'will',
 48: 'but',
 49: 'so',
 50: 'my',
 51: 'has',
 52: 'gop',
 53: 'as',
 54: 'am',
 55: 'did',
 56: 'like',
 57: 'all',
 58: 'if',
 59: 'just',
 60: 'one',
 61: 'bush',
 62: 'megynkelly',
 63: 'foxnews',
 64: 'how',
 65: 'think',
 66: 'when',
 67: 'cruz',
 68: 'rubio',
 69: 'people',
 70: 'should',
 71: 'https',
 72: 'by',
 73: 'out',
 74: 'get',
 75: 'can',
 76: 'no',
 77: 'jeb',
 78: 'need',
 79: 'president',

In [34]:
tsv_file_path = "tensorboard/metadata.tsv"
!mkdir tensorboard
with open(tsv_file_path,'w+', encoding='utf-8') as file_metadata:
    for i, word in enumerate(tokenizer.word_index):
        print(word)
        file_metadata.write(word +'\n')

mkdir: tensorboard: File exists
gopdebate
the
gopdebates
to
t
co
i
a
is
of
and
http
not
it
you
in
trump
for
on
that
this
do
fox
realdonaldtrump
was
are
about
debate
amp
have
we
he
be
from
at
would
news
they
night
me
what
last
candidates
who
up
with
will
but
so
my
has
gop
as
am
did
like
all
if
just
one
bush
megynkelly
foxnews
how
think
when
cruz
rubio
people
should
https
by
out
get
can
no
jeb
need
president
does
carson
ask
their
god
more
your
or
republican
his
these
said
tedcruz
question
know
him
watching
job
time
only
wallace
donald
questions
chris
now
next
tonight
want
huckabee
why
them
most
candidate
our
right
an
women
really
after
see
very
than
america
were
got
go
thanks
kasich
great
set
there
g
ben
good
carlyfiorina
say
tcot
never
megyn
hillary
face
together
other
rid
she
had
won
tell
take
american
anyone
fair
cannot
debates
stage
us
best
talk
presidential
expose
trying
band
obama
via
2
look
any
ratings
ted
watch
paul
where
truth
even
party
her
walker
hear
fiorina
support
balanced


continues
specifics
debacle
hashtag
latino
lacks
hoped
ll
grown
barbaraboxer
stance
goal
commentary
simple
lower
usually
voxdotcom
sc
funniest
invoking
realalexjones
ball
protecting
mil
dnc
lame
bs
reps
smarter
cute
consensus
secure
baltimore
pls
touch
trumpeffect
believes
constitutional
cause
rosieodonnell
hits
amount
unlike
americaonpoint
floor
jihad
east
liking
beer
gold
iacaucus
major
billmaher
ways
successful
write
fyi
places
stream
upset
wiunion
usual
kick
disagrees
fool
gilmore
allow
player
transgender
roast
didn
daddy
straighten
dont
nearly
trend
screaming
fewer
age
figured
popular
shining
contenders
claim
megynkellydebatequestions
genius
often
insult
opportunity
drama
selfie
low
ability
thehill
taught
fellow
offensive
challenge
expectations
neither
ap
loyalty
aerosmith
motleycrue
speaker
per
4th
forbid
cuz
able
jump
interested
television
summary
airtime
22aday
iava
miracles
owns
putin
cold
walks
branch
accomplishments
democracy
planning
stick
coincidence
boss
screw
donations
b

k37hnbe2jy
8jbvwjtj3t
mel8jxm19b
secular
heart
kasichs
creating
calculated
wisdom
believed
photobomb
relatable
ba4o0000wluu
introduced
mental
fn6m0gmknq
commitments
doesnt
exist
exhausting
thursday
blocked
fringe
contender
sheriffclarke
2000vlxwb9000d
zycvny4noq
kqlgcbav7p
lx0rkahzmq
prklk6x4ji
crushes
confidence
trustworthiness
dominate
cl2buzxqyz
42
users
charted
proven
lights
pk45i5zvk0
fbi
trs8vmwjyg
teammarco
vanity
fabulous
vgz4000nfxk7
rnunbpawgp
sfqiwda61g
1on1
pleasant
wv
4ljeug3gye
defines
amendments
enjoy
6000hqpdqpn9
outs
spoton
googling
lmfao
epic
otr
shared
principled
fearless
priebus
risks
surges
hagmnn1tv7
bottom
keeping
industry
refinery29
partyofthefuture
1980
thanever
usvetram
freelion7
quwoclfs8q
strongly
mayor
hero
controversy
arguments
details
yea
vigorous
feminists
0riaoipykq
qusaxyqqvn
xiwha8000jf7
woot
warehouse
gcsszpjtht
aisle
salute
unflinching
cautiously
lessgovmorefun
fortunate
solider
onlyshaneortega
zn
weakness
sake
e8uwgy4chf
principles
b54cr1000x4q
miy

retarded
barf
emoji
moves
doubletalk
supernatural
marks
nor
kinds
jigisup
libby
dementia
riveting
toaster
watchin
hp
wakes
splitting
headache
qjg1hsvqd1
instance
iammgraham
partially
relevant
absolute
nut
minority
clue
feeling
rose
1828
pull
baiting
arguing
poh5ugb9ut
insulter
sized
lesbians
byv2iifmpr
mammogram
machines
hospital
no1
deport
hmm
serve
socail
ofpk4g0shj
newsmax
industrial
uh
showdown
prepare
profiling
perpetrating
incompetent
closet
cared
international
net
tanned
marvel
unc
function
generation
zsff
completed
6ov5hxhicv
harmon
romantic
rival
summer
kellyfil
signed
groupers
positively
jlfhfvz7st
1ai9cuz8by
enacted
oba
icredssplr
qcmxmu8mlp
fdgxg3vv13
gofundme
trcot
5hkcefp1hf
nn6s7do22j
porous
ygnprhiehv
usfree4life
2straighten
teamdrbencarson
collected
empowering
187000
25000
forgettable
extraordinary
preference
critics
opportunism
inconsistency
0a1w02w7sz
startups
conference
dy8000nz122t
knwmideqty
tgowchxf3x
deserved
nreditorial
stzw
brutal
assclown
lesson
gzdltyumnn
sh

evens
championing
experiments
21st
breath
increasingly
careless
generic
stale
tools
80000
boat
leased
50000
luxury
suv
qtn4mpf3e2
dictating
repo
demagogue
infuses
soriya
organizer
laid
workers
outsourced
nubzumszcj
slogan
mofos
appointed
atty
appt
december
fpftx3nfcu
o3rfecfbkp
noseparationofchurchandstate
hecked
kiazkaknxv
hb86fxzb2a
pad
bonniesamerica
tragic
niloy
ic
fundamentalists
weave
downgrades
brilliantly
8xadp0wwtq
kdeags3iee
smelling
patting
veiw
reflecting
aka
tkeadpd5mf
t5piokhd2z
wubsnet
hoe
sewers
rbk7gmyhce
3unfah5r4a
vtzmlujrqi
ml4uvbruvv
ine
whined
curtisellis
stooges
properly
laundromat
grandma
backwards
peaceful
erase
pdi5czn
sleeps
intern
ch
575
56
housecracka
foes
walking
agn0zxchx9
collapse
terminate
crippling
sanctions
allies
govwal
jsc1835
achievement
dxrmdwqvbe
physical
denial
jessicavalenti
coif
hijack
despises
santoru
cwyuul8jkn
trumpmisogonist
hybrid
assholery
naivete
derpness
idiocy
y6mudmr66p
meantweets
extortion
kiplin
kfakvv5tkw
downs
individually
paste


mrchuckd
afrofactz
nicolassarkozy
responding
suggested
nss1u3salh
e3hjmpoezk
sdvtoogm7a
indies
observers
mildness
aggregate
0spghg1dvn
moibjspuol
klazc
tapatiocosteno
payasoorpresident
schiavo
bcqso1az29
tbt
pz96mmlt9t
nurtured
gopbuiltthat
vokolq52xp
cuv3af58ua
andy
devine
xyy8fxpjcd
jalisco
sinaloa
cougar
helicopter
army
launcher
cannabisculture
nhr
probability
fvb6nrmjbc
egu2ziueqh
jylevqszrr
beertweets
rchxaxikpm
9ebtdrkunk
retrial
fla
added
3m
housing
bubble
driver
flips
realizes
overtake
alright
yol6hrkhqv
undemocratic
wewantdebate
nsd1bebdoa
scratch
whopping
moreush
warming
5e7936xkou
ealq1ylv7y
socialistsdemocrats
ilnovju2a9
marthamccallum
georgewill
santo
kushandalcohol
pywbcnrkn8
hurry
bkls8c0tzl
tardiness
cameron
british
leftie
bizarre
stalking
repsanfordsc
nielslesniewski
46jhssbv5c
zac
dbongino
wbalradio
wywlcgtczr
ome5bnryjm
uniteblue2016
dmhzlkvobs
holadonald
jjsn7000it37
mzds5vt9zg
ni5c0olntf
choosing
entertain
riz6aungru
kylwwoj8oq
cpl
resources
foundations
waiting4pal

In [35]:
logdir = "logs/scalars/" + datetime.time().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=logdir, histogram_freq = 10, embeddings_freq = 10, embeddings_layer_names = ['features'], embeddings_data=data_1_te, embeddings_metadata='tensorboard/metadata.tsv')

In [36]:
########################################
## train the model
########################################
early_stopping = EarlyStopping(monitor='val_loss', patience=50)
bst_model_path = 'best_model' + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, save_best_only=True, save_weights_only=True)

hist = model.fit(data_1_tr, labels_tr, \
        validation_data=(data_1_val, labels_val), \
        epochs=200, batch_size=30, shuffle=True, \
                 callbacks=[early_stopping, model_checkpoint, tensorboard_callback])


Train on 6519 samples, validate on 1386 samples
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200


W0904 20:22:39.966748 4387050944 deprecation.py:323] From /Users/omer/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.


Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200


In [None]:
%tensorboard --logdir logs/scalars