# Sequence Models - RNNs

We have developed some fairly sophisticated NLP models, but none has made use of the *order* of words, with the partial exception of word embeddings.

What might our models therefore be missing out on?

- grammatical structure
- negatives and double negatives
- sentence quality
- meaning / context
- source / dialogical context
- linguistic identity
- anaphora (words that refer back to an earlier subject or word), e.g., "The Queen went to sleep, and **she** just woke up."

The idea of an RNN is to _remember_ earlier inputs while it processes a sequence. In addition to inputs and weights, the network maintains an _internal state_ (the hidden state). This state is updated at every time step and carries information about the previous time steps.
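Here is a minimal NumPy sketch of that idea. It assumes a vanilla (Elman) RNN with a tanh activation, where the hidden state `h` plays the role of the internal state and is recomputed at each step from the current input and the previous state. The weight names `W_x`, `W_h`, `b` and the toy sizes are illustrative assumptions. They are not taken from the Keras models below.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla RNN over xs, shape (timesteps, n_features).

    Returns the hidden state after every time step, shape (timesteps, n_hidden).
    """
    h = np.zeros(W_h.shape[0])            # internal state starts at zero
    states = []
    for x_t in xs:                        # inputs are consumed in order
        # the new state depends on the current input AND the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_features, n_hidden, timesteps = 4, 3, 5
W_x = rng.normal(size=(n_hidden, n_features))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

xs = rng.normal(size=(timesteps, n_features))
states = rnn_forward(xs, W_x, W_h, b)
print(states.shape)  # (5, 3)

# Compare final states for the original vs. the reversed sequence
reversed_final = rnn_forward(xs[::-1], W_x, W_h, b)[-1]
print(np.allclose(states[-1], reversed_final))
```

Feeding the same inputs in reverse order generally gives a different final state. A bag-of-words model has no way to express that difference.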

Recurrent models process data *in sequence*, so they are useful for analyzing:

- time-series data
- text
- drawings

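Here's a demo of a drawing-to-'photo' network: [Image-to-Image](https://affinelayer.com/pixsrv/). (That demo is pix2pix, a conditional GAN rather than a recurrent network, but it shows what models trained on drawings can do.)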

In [1]:
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import to_categorical
from sklearn.feature_extraction.text import TfidfVectorizer

Using TensorFlow backend.


## Reading in the Twitter data

In [2]:
import pandas as pd
tweets = pd.read_csv('Tweets.csv')
tweets.head()

|  | tweet_id | airline_sentiment | airline_sentiment_confidence | negativereason | negativereason_confidence | airline | airline_sentiment_gold | name | negativereason_gold | retweet_count | text | tweet_coord | tweet_created | tweet_location | user_timezone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 570306133677760513 | neutral | 1.0 |  |  | Virgin America |  | cairdin |  | 0 | @VirginAmerica What @dhepburn said. |  | 2015-02-24 11:35:52 -0800 |  | Eastern Time (US & Canada) |
| 1 | 570301130888122368 | positive | 0.3486 |  | 0.0 | Virgin America |  | jnardino |  | 0 | @VirginAmerica plus you've added commercials t... |  | 2015-02-24 11:15:59 -0800 |  | Pacific Time (US & Canada) |
| 2 | 570301083672813571 | neutral | 0.6837 |  |  | Virgin America |  | yvonnalynn |  | 0 | @VirginAmerica I didn't today... Must mean I n... |  | 2015-02-24 11:15:48 -0800 | Lets Play | Central Time (US & Canada) |
| 3 | 570301031407624196 | negative | 1.0 | Bad Flight | 0.7033 | Virgin America |  | jnardino |  | 0 | @VirginAmerica it's really aggressive to blast... |  | 2015-02-24 11:15:36 -0800 |  | Pacific Time (US & Canada) |
| 4 | 570300817074462722 | negative | 1.0 | Can't Tell | 1.0 | Virgin America |  | jnardino |  | 0 | @VirginAmerica and it's a really big bad thing... |  | 2015-02-24 11:14:45 -0800 |  | Pacific Time (US & Canada) |


In [3]:
tweets.shape

(14640, 15)

In [5]:
# Let's look at the text of the first tweet.

tweets['text'][0]

'@VirginAmerica What @dhepburn said.'

In [6]:
# We'll use the translate method to eliminate the punctuation
# from our strings.

import string
data = []
for tweet in tweets['text']:
    sentence = tweet.translate(str.maketrans('','',string.punctuation))
    
    data.append(sentence)

In [7]:
# Let's make sure that worked.

data[0]

'VirginAmerica What dhepburn said'
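Here's a quick check of what `str.maketrans('', '', string.punctuation)` does. Its third argument lists characters to delete. One side effect is worth knowing: the apostrophe counts as punctuation, so contractions such as `didn't` in the tweets above get fused rather than split. Also, `string.punctuation` covers ASCII punctuation only, so curly quotes survive.

```python
import string

# Characters in the third argument are mapped to None, i.e. deleted
strip_punct = str.maketrans('', '', string.punctuation)

print('@VirginAmerica What @dhepburn said.'.translate(strip_punct))
# VirginAmerica What dhepburn said
print("@VirginAmerica I didn't today...".translate(strip_punct))
# VirginAmerica I didnt today
```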

In [8]:
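# TF-IDF weights each word by how often it appears in a tweet,
# discounted by how many tweets in the corpus contain it, so words
# that are frequent in one tweet but rare overall score highest.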

tfidf = TfidfVectorizer(stop_words='english')
tfidf.fit(data)

TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.float64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm='l2', preprocessor=None, smooth_idf=True,
        stop_words='english', strip_accents=None, sublinear_tf=False,
        token_pattern='(?u)\\b\\w\\w+\\b', tokenizer=None, use_idf=True,
        vocabulary=None)
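To make the weighting concrete, the sketch below recomputes scikit-learn's TF-IDF by hand on three toy sentences, made up for illustration, and checks the result against `TfidfVectorizer`. It relies on the defaults shown in the repr above (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`). Under those defaults, the weight of term t in document d is its raw count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each row is then scaled to unit length. Stop words are left in here to keep the example small.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ['the flight was late',
        'the crew was great',
        'late flight again and again']

# Raw term counts (same default tokenization and vocabulary order as TfidfVectorizer)
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                    # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1         # smoothed idf
weights = counts * idf                            # tf * idf
weights /= np.linalg.norm(weights, axis=1, keepdims=True)  # l2-normalize each row

sk = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(weights, sk))  # True
```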

In [10]:
# We'll transform the data, turn it into a dense matrix, and
# then into a DataFrame!

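# Note: get_feature_names() was deprecated in scikit-learn 1.0 and removed
# in 1.2; on newer versions use tfidf.get_feature_names_out() instead.
twitter_df = pd.DataFrame(tfidf.transform(data).todense(),
                          columns=tfidf.get_feature_names())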

twitter_df.head()

Unnamed: 0,00,0011,0016,006,0162389030167,0162424965446,0162431184663,0167560070877,0214,021mbps,...,zkatcher,zombie,zone,zones,zoom,zrh,zrhairport,zukes,zurich,zurichnew
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
# How big is our DataFrame now?

twitter_df.shape

(14640, 16349)
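That shape is worth a back-of-the-envelope check, because `.todense()` stores every zero explicitly. The calculation below assumes the default `float64` dtype shown in the vectorizer's repr, which takes 8 bytes per entry:

```python
rows, cols = 14640, 16349          # twitter_df.shape from above
bytes_per_float64 = 8

dense_bytes = rows * cols * bytes_per_float64
print(f'{dense_bytes:,} bytes ≈ {dense_bytes / 1e9:.2f} GB')
# 1,914,794,880 bytes ≈ 1.91 GB
```

By contrast, the sparse matrix returned by `tfidf.transform(data)` stores only the nonzero entries. For short tweets, that is a small fraction of the dense size.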

In [12]:
# Examining our target

tweets['airline_sentiment'].value_counts()

negative    9178
neutral     3099
positive    2363
Name: airline_sentiment, dtype: int64

In [13]:
# Coding up our target

sent_dict = {'negative': 0, 'neutral': 1, 'positive': 2}

tweets['airline_sentiment'] = tweets['airline_sentiment'].map(sent_dict)

In [15]:
# Defining X and y

X = twitter_df
y = tweets['airline_sentiment']

In [16]:
# Train-Test Splitting

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
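`stratify=y` keeps the class proportions (nearly) identical in both splits. The sketch below rebuilds a label vector with the class counts shown above (0 = negative, 1 = neutral, 2 = positive) and compares the proportions. With the default `test_size` of 0.25, it also reproduces the 10980/3660 split reported when training starts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as tweets['airline_sentiment']
y_demo = np.array([0] * 9178 + [1] * 3099 + [2] * 2363)

y_tr, y_te = train_test_split(y_demo, stratify=y_demo, random_state=42)

print(len(y_tr), len(y_te))                # 10980 3660
print(np.bincount(y_demo) / len(y_demo))   # overall class proportions
print(np.bincount(y_te) / len(y_te))       # test-set class proportions
```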

In [17]:
# One-hot encoding our target

y_train_c = to_categorical(y_train)
y_test_c = to_categorical(y_test)
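`to_categorical` one-hot encodes the integer labels: label k becomes a length-3 vector with a 1 in position k. Unless `num_classes` is passed, it infers the number of classes as `max(y) + 1`. The NumPy equivalent below is shown only to illustrate the transformation:

```python
import numpy as np

y_small = np.array([0, 2, 1, 0])   # negative, positive, neutral, negative
one_hot = np.eye(3)[y_small]       # row k of the 3x3 identity matrix encodes class k
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```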

In [19]:
# Building a dense NN

model = Sequential()

inputs = X_train.shape[1]

model.add(Dense(inputs, activation='relu'))  # First hidden layer: one unit per input feature
model.add(Dense(100, activation='relu'))     # Second hidden layer
model.add(Dense(3, activation='softmax'))    # Output: one probability per class
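The size of that first layer goes a long way toward explaining why the fit below is so slow. A `Dense` layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases. Using `inputs = 16349`, the TF-IDF vocabulary size from above, the arithmetic works out as follows:

```python
def dense_params(n_in, n_out):
    """Weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

n_features = 16349  # X_train.shape[1]
layers = [(n_features, n_features),  # Dense(inputs)
          (n_features, 100),          # Dense(100)
          (100, 3)]                   # Dense(3)

counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts)       # [267306150, 1635000, 303]
print(sum(counts))  # 268941453
```

Almost all of the roughly 269 million parameters sit in the first layer. Once the model is built, `model.summary()` should report the same per-layer counts.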

In [20]:
# Compiling

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])



In [21]:
# Recording the history of fit

history_log = model.fit(np.asarray(X_train), y_train_c,
                        validation_data=(np.asarray(X_test), y_test_c),
                        epochs=5, batch_size=2000)

# TAKES A LONG TIME! I interrupted the kernel during epoch 2.

Train on 10980 samples, validate on 3660 samples
Epoch 1/5
Epoch 2/5
 2000/10980 [====>.........................] - ETA: 3:55 - loss: 0.7617 - acc: 0.6280

KeyboardInterrupt: 

## This time with an LSTM

Long short-term memory (LSTM) networks are RNNs whose cells have a *forget* gate in addition to input and output gates. The forget gate controls how much of the previous cell state is carried forward. In doing so, it controls the extent to which earlier time steps affect later ones.
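Here is a single LSTM time step written out in NumPy. It is a sketch of the standard formulation. The stacked gate order (i, f, g, o) and the weight names `W`, `U`, `b` are assumptions for illustration. The forget gate `f` scales the previous cell state `c_prev`. When `f` is near 1 and the input gate `i` is near 0, the cell state passes through almost unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4n, n_in), U: (4n, n), b: (4n,), n = hidden size.

    Assumes the four gates are stacked in the order i, f, g, o.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * n:3 * n])  # candidate content
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the cell to expose
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Demo: zero weights, biases chosen so that f ≈ 1 and i ≈ 0 ("remember")
n_in, n = 4, 3
W = np.zeros((4 * n, n_in))
U = np.zeros((4 * n, n))
b = np.concatenate([np.full(n, -20.0),   # input gate  ≈ 0
                    np.full(n, 20.0),    # forget gate ≈ 1
                    np.zeros(n),         # candidate   = 0
                    np.zeros(n)])        # output gate = 0.5

c_prev = np.array([0.5, -1.0, 2.0])
h, c = lstm_step(np.ones(n_in), np.zeros(n), c_prev, W, U, b)
print(np.allclose(c, c_prev))  # True: the cell state was carried forward
```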

Because of this capacity for memory, LSTMs (and RNNs in general) have been compared to [Turing machines](https://medium.com/@Lordunlocked/turing-machines-what-are-they-and-why-you-should-care-aaf030c37d40), and there is some debate about whether it makes sense to think of RNNs as Turing-complete.

In [22]:
# LSTMs can take a long time to set up and to train; this
# is why I'm only using the first 1000 tweets.

data = []
for tweet in tweets['text'][:1000]:
    sentence = tweet.translate(str.maketrans('','',string.punctuation))
    
    data.append(sentence)

In [23]:
# TF-IDF

tfidf = TfidfVectorizer(stop_words='english')
tfidf.fit(data)

TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.float64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm='l2', preprocessor=None, smooth_idf=True,
        stop_words='english', strip_accents=None, sublinear_tf=False,
        token_pattern='(?u)\\b\\w\\w+\\b', tokenizer=None, use_idf=True,
        vocabulary=None)

In [24]:
# DataFrame

# Note: on scikit-learn >= 1.2 use tfidf.get_feature_names_out() instead.
twitter_df = pd.DataFrame(tfidf.transform(data).todense(),
                          columns=tfidf.get_feature_names())


In [25]:
# X and y

X = twitter_df
y = tweets['airline_sentiment'][:1000]

In [26]:
# Split

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

In [27]:
# One-hot encode

y_train_c = to_categorical(y_train)
y_test_c = to_categorical(y_test)

In [28]:
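# Reshaping to 3-D for the LSTM: (samples, timesteps, features).
# Each tweet becomes a sequence of length 1, so the LSTM sees a single
# time step per tweet and cannot exploit word order here.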

X_train_r = np.reshape(np.asarray(X_train), (X_train.shape[0], 1, X_train.shape[1]))

X_test_r = np.reshape(np.asarray(X_test), (X_test.shape[0], 1, X_test.shape[1]))

y_train_r = np.reshape(np.asarray(y_train_c), (y_train_c.shape[0], 1, y_train_c.shape[1]))

y_test_r = np.reshape(np.asarray(y_test_c), (y_test_c.shape[0], 1, y_test_c.shape[1]))
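Because each TF-IDF row is reshaped to `(1, n_features)`, the LSTM treats every tweet as a one-step sequence. The sketch below shows an order-preserving alternative. Each word is mapped to an integer id, and every tweet is padded to the same length, which gives an array of shape `(samples, timesteps)`. An array like this could feed an embedding layer placed before the LSTM. The helpers `build_vocab` and `to_padded_sequences` and the `maxlen` value are hypothetical, introduced here for illustration.

```python
import numpy as np

def build_vocab(texts):
    """Hypothetical helper: give each lowercase word an integer id (0 = padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def to_padded_sequences(texts, vocab, maxlen):
    """Hypothetical helper: word-id sequences, left-padded/truncated to maxlen."""
    out = np.zeros((len(texts), maxlen), dtype=int)
    for row, text in enumerate(texts):
        ids = [vocab[w] for w in text.lower().split() if w in vocab][-maxlen:]
        if ids:
            out[row, -len(ids):] = ids   # keep the word order, pad on the left
    return out

texts = ['VirginAmerica What dhepburn said',
         'said what']
vocab = build_vocab(texts)
seqs = to_padded_sequences(texts, vocab, maxlen=5)
print(seqs)
# [[0 1 2 3 4]
#  [0 0 0 4 2]]
```

Under this encoding, "said what" and "what said" produce different rows. Their TF-IDF rows would be identical.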

In [29]:
# Model build

rnn = Sequential()

inputs = X_train_r.shape[2]

rnn.add(LSTM(inputs, input_shape=(1, inputs), return_sequences=True))  # First LSTM layer: one unit per input feature
rnn.add(LSTM(200, return_sequences=True))  # Hidden layer
rnn.add(LSTM(30, return_sequences=True))   # Hidden layer
rnn.add(Dense(3, activation='softmax'))    # Output, applied at each time step
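The first LSTM layer has one unit per input feature, so it dominates this model's parameter count. An `LSTM` layer with `units` cells and `n_in` inputs has four gates, and each gate has its own input weights, recurrent weights and bias. That gives 4 × (units × (n_in + units) + units) parameters. The notebook doesn't print `inputs` for the 1000-tweet subset, so the sketch below keeps it as a parameter:

```python
def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (n_in + units) + units)

def dense_params(n_in, units):
    return n_in * units + units

def rnn_param_counts(inputs):
    """Per-layer parameter counts for the model above, given its input width."""
    return [lstm_params(inputs, inputs),   # LSTM(inputs)
            lstm_params(inputs, 200),      # LSTM(200)
            lstm_params(200, 30),          # LSTM(30)
            dense_params(30, 3)]           # Dense(3)

# The last two layers don't depend on the vocabulary size
print(lstm_params(200, 30), dense_params(30, 3))  # 27720 93
```

Calling `rnn_param_counts(X_train_r.shape[2])` would give the full breakdown for the actual vocabulary. The first entry grows as 8 × inputs² + 4 × inputs.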

In [30]:
# Compile

rnn.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])


In [31]:
# Fitting our model

rnn.fit(X_train_r, y_train_r, validation_data=(X_test_r, y_test_r), epochs=5)

Train on 750 samples, validate on 250 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1a85370550>
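The targets were reshaped to `(samples, 1, 3)`, so this model's predictions come back in the same 3-D shape. The sketch below turns such an array into sentiment labels using the inverse of `sent_dict` from earlier. The probabilities are made-up placeholders that stand in for something like the output of `rnn.predict(X_test_r)`.

```python
import numpy as np

sent_dict = {'negative': 0, 'neutral': 1, 'positive': 2}
inv_sent = {v: k for k, v in sent_dict.items()}

# Placeholder probabilities, shape (samples, timesteps=1, classes=3)
probs = np.array([[[0.7, 0.2, 0.1]],
                  [[0.1, 0.3, 0.6]]])

pred_ids = probs[:, 0, :].argmax(axis=1)   # drop the time axis, pick the top class
labels = [inv_sent[i] for i in pred_ids]
print(labels)  # ['negative', 'positive']
```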