# Recurrent neural network (RNN) for sentiment analysis

### Problems of using CNN for NLP tasks?

![cnn-senti.png](attachments/cnn-senti.png)


### RNN is good at processing sequences of variable length

![rnn-unfold.png](attachments/rnn-unfold.png)


### Architecture of RNN for sentiment analysis

![rnn-sent.png](attachments/rnn-sent.png)

### Prepare text data

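The IMDB dataset is used here again for sentiment analysis. Only the `max_features` most frequent words are kept, and every review is padded or truncated to `maxlen` words.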

In [2]:
from __future__ import print_function
from keras.preprocessing import sequence
from keras.datasets import imdb


max_features = 2000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32
hidden_size = 64
embedding_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(
    num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print("x_train shape: ", x_train.shape)

25000 train sequences
25000 test sequences
x_train shape:  (25000, 80)
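To make the padding step concrete, here is a minimal NumPy sketch of what `pad_sequences` does with its default settings (pad on the left with 0, truncate from the front). The helper name `pad_pre` and the toy sequences are illustrative assumptions, not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Left-pad with `value` and keep only the last `maxlen` items of each sequence."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, seq in enumerate(seqs):
        trunc = seq[-maxlen:]              # drop items from the front if too long
        if len(trunc) > 0:
            out[i, -len(trunc):] = trunc   # right-align; padding stays on the left
    return out

toy = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]      # two toy "reviews" as word indices
padded = pad_pre(toy, maxlen=4)
print(padded)
# [[0 5 8 2]
#  [4 9 3 6]]
```

Because the padding value is 0, the `mask_zero=True` option of the embedding layer used below can tell the RNN to skip the padded positions.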


### Build model 

* Each word is represented by a vector of length `embedding_size` by the embedding layer (`max_features` is the vocabulary size), and then fed into the RNN unit.
* The final hidden state of the RNN unit is then fed into a linear feature transformation layer (Dense), which generates a single value representing the score for being positive.
* The final sigmoid activation normalizes the score to be within (0, 1).

In [3]:
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import SimpleRNN

model = Sequential()
model.add(Embedding(max_features, embedding_size, 
                    mask_zero=True, input_length=maxlen))
model.add(SimpleRNN(hidden_size, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 80, 32)            64000     
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 64)                6208      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 70,273
Trainable params: 70,273
Non-trainable params: 0
_________________________________________________________________
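The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes, assuming the usual layout of one input matrix, one recurrent matrix and one bias vector for the simple RNN.

```python
max_features, embedding_size, hidden_size = 2000, 32, 64

# Embedding: one vector of length embedding_size per vocabulary entry
embedding_params = max_features * embedding_size

# SimpleRNN: input weights W, recurrent weights U, bias b
rnn_params = (embedding_size * hidden_size
              + hidden_size * hidden_size
              + hidden_size)

# Dense(1): one weight per hidden unit plus one bias
dense_params = hidden_size * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 64000 6208 65 70273
```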


### Training 

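* loss function: binary_crossentropy is $-t\log p - (1-t)\log(1-p)$, where $t=1$ if the true label is positive and 0 otherwise, and $p$ is the predicted probability of the positive class.
* optimizer: RMSprop (an adaptive variant of SGD) with learning rate 0.01 and gradient norm clipping; the later models use Adam.
* metric: accuracy = number of correct predictions / total number of samples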
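The loss and metric above can be written in a few lines of NumPy. This is a sketch rather than the Keras implementation; the clipping constant `eps` and the 0.5 decision threshold are assumptions made here to keep the example numerically safe and concrete.

```python
import numpy as np

def binary_crossentropy(t, p, eps=1e-7):
    """Per-sample loss -t*log(p) - (1-t)*log(1-p)."""
    t = np.asarray(t, dtype="float64")
    p = np.clip(np.asarray(p, dtype="float64"), eps, 1 - eps)  # avoid log(0)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

def accuracy(t, p, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predicted_positive = np.asarray(p) > threshold
    return float(np.mean(predicted_positive == (np.asarray(t) == 1)))

labels = np.array([1, 0, 1, 0])
probs = np.array([0.9, 0.2, 0.4, 0.6])   # toy model outputs after the sigmoid
losses = binary_crossentropy(labels, probs)
print("mean loss:", losses.mean())
print("accuracy:", accuracy(labels, probs))
```

Confident correct predictions (0.9 for a positive review) contribute a small loss, while predictions on the wrong side of 0.5 are penalized more heavily.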

In [5]:
from keras.optimizers import RMSprop
# try using different optimizers and different optimizer configs
optimizer = RMSprop(lr=0.01, clipnorm=5.)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train, batch_size=batch_size, 
          epochs=10, validation_data=(x_test, y_test))
model.save_weights('ckpt/simplernn.h5')

Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### Prediction

## RNN Unit

![vanilla-rnn.png](attachments/vanilla-rnn.png)

$$h_t = f(h_{t-1}, x_t|\Theta)$$

* Vanilla RNN
* LSTM
* GRU

## Vanilla RNN

$$h_t = a(h_{t-1}U+x_tW+b)$$
$$x_t \in R^{d_x}, h_t\in R^{d_h}, U\in R^{d_h \times d_h}, W\in R^{d_x \times d_h}, b\in R^{d_h}$$
The activation $a(\cdot)$ can be tanh, ReLU, or sigmoid.
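As a minimal sketch of this recurrence, the NumPy code below unrolls it over one sequence. The zero initial state, the random toy weights and the function name `rnn_forward` are assumptions for illustration.

```python
import numpy as np

def rnn_forward(x, W, U, b, activation=np.tanh):
    """Apply h_t = a(h_{t-1} U + x_t W + b) over x of shape (T, d_x).

    Returns all hidden states with shape (T, d_h), starting from h_0 = 0.
    """
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x:                       # one step per position in the sequence
        h = activation(h @ U + x_t @ W + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
d_x, d_h, T = 32, 64, 80                # embedding size, hidden size, sequence length
x = rng.normal(size=(T, d_x))
W = rng.uniform(-0.05, 0.05, size=(d_x, d_h))
U = rng.uniform(-0.05, 0.05, size=(d_h, d_h))
b = np.zeros(d_h)

H = rnn_forward(x, W, U, b)
print(H.shape)                          # (80, 64)
```

For sentiment analysis only the last row `H[-1]` is passed on to the Dense layer.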

In [3]:
import keras
from keras import backend as K  # backend ops (K.dot) used in call()
class VanillaRNNCell(keras.layers.Layer):

    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units
        super(VanillaRNNCell, self).__init__(**kwargs)

    def build(self, input_shape):
        """Create weight matrix"""
        self.W = self.add_weight(shape=(input_shape[-1], self.units), initializer='uniform', name='kernel')
        self.U = self.add_weight(shape=(self.units, self.units), initializer='uniform', name='recurrent_kernel')
        self.built = True

    def call(self, inputs, states):
        """Called per position/timestep for a batch of data"""
        prev_output = states[0]
        h = K.dot(inputs, self.W)
        output = h + K.dot(prev_output, self.U)
        # simplification: the bias b and activation a() from the formula are omitted;
        # can also add dropout here
        return output, [output]

In [4]:
from keras.layers import RNN, Embedding, Dense
from keras import Sequential
from keras import backend as K

model = Sequential()
model.add(Embedding(max_features, embedding_size, 
                    mask_zero=True, input_length=maxlen))
model.add(RNN(VanillaRNNCell(hidden_size)))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train, batch_size=batch_size, 
          epochs=10, validation_data=(x_test, y_test))
model.save_weights('ckpt/vanilla.h5')

Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## LSTM

![lstm1.png](attachments/lstm1.png)
![lstm2.png](attachments/lstm2.png)
![lstm3.png](attachments/lstm3.png)
![lstm4.png](attachments/lstm4.png)

**Gate + States**
![lstm5.png](attachments/lstm5.png)
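The gates and states can also be read as code. Below is a minimal NumPy sketch of one step of the standard LSTM formulation (input, forget and output gates plus a candidate cell update). The stacking order `[i, f, g, o]` of the weight blocks and all variable names are choices made here for illustration and need not match the figures or the Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (d_x, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,)."""
    d_h = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., 0 * d_h:1 * d_h])   # input gate: how much new content to write
    f = sigmoid(z[..., 1 * d_h:2 * d_h])   # forget gate: how much old cell state to keep
    g = np.tanh(z[..., 2 * d_h:3 * d_h])   # candidate cell content
    o = sigmoid(z[..., 3 * d_h:4 * d_h])   # output gate: how much of the cell to expose
    c = f * c_prev + i * g                 # new cell state
    h = o * np.tanh(c)                     # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_x, d_h = 32, 64
W = rng.uniform(-0.05, 0.05, size=(d_x, 4 * d_h))
U = rng.uniform(-0.05, 0.05, size=(d_h, 4 * d_h))
b = np.zeros(4 * d_h)

h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), W, U, b)
print(h.shape, c.shape)                    # (64,) (64,)
print(W.size + U.size + b.size)            # 24832 weights, four times the simple RNN
```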

In [7]:
from keras.layers import LSTM, Embedding, Dense
from keras.models import Sequential
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, embedding_size, 
                    mask_zero=True, input_length=maxlen))
model.add(LSTM(hidden_size, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=10,
          validation_data=(x_test, y_test))
model.save_weights('ckpt/lstm.h5')
          

Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10

Epoch 10/10


## GRU

![gru.png](attachments/gru.png)
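For comparison with the LSTM, here is a minimal NumPy sketch of one GRU step with an update gate `z` and a reset gate `r`. Conventions differ between papers and libraries (in particular whether `z` or `1 - z` multiplies the previous state), so the form below is one common variant and may not match the figure exactly; all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W: (d_x, 3*d_h), U: (d_h, 3*d_h), b: (3*d_h,)."""
    d_h = h_prev.shape[-1]
    xw = x_t @ W + b                                   # input part for [z, r, candidate]
    z = sigmoid(xw[..., :d_h] + h_prev @ U[:, :d_h])   # update gate
    r = sigmoid(xw[..., d_h:2 * d_h] + h_prev @ U[:, d_h:2 * d_h])   # reset gate
    h_tilde = np.tanh(xw[..., 2 * d_h:] + (r * h_prev) @ U[:, 2 * d_h:])  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde            # blend old state and candidate

rng = np.random.default_rng(0)
d_x, d_h = 32, 64
W = rng.uniform(-0.05, 0.05, size=(d_x, 3 * d_h))
U = rng.uniform(-0.05, 0.05, size=(d_h, 3 * d_h))
b = np.zeros(3 * d_h)

h_new = gru_step(rng.normal(size=d_x), np.zeros(d_h), W, U, b)
print(h_new.shape)                                     # (64,)
```

Unlike the LSTM, the GRU keeps a single state vector and has three weight blocks instead of four.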



In [8]:
from keras.layers import GRU


model = Sequential()
model.add(Embedding(max_features, embedding_size, 
                    mask_zero=True, input_length=maxlen))
model.add(GRU(hidden_size, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam', 
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train, batch_size=batch_size, epochs=2, 
          validation_data=(x_test, y_test))
model.save_weights('ckpt/gru.h5')

Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2


## RNN training tricks

* Adaptive learning rate
  * E.g. Adam, RMSProp
* Normalizing the losses
* Use gated RNN units
  * LSTM or GRU (introduced above)
* Stack multiple RNN layers
![rnn-stacks.png](attachments/rnn-stacks.png)
![bptt.png](attachments/bptt.png)
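The RMSprop configurations in this notebook also pass `clipnorm=5.`, which caps the gradient norm so that one very large gradient from backpropagation through time does not derail training. A minimal NumPy sketch of the idea for a single gradient array follows; the function name `clip_by_norm` is illustrative, and whether Keras measures the norm per tensor or over all gradients depends on the version.

```python
import numpy as np

def clip_by_norm(grad, clipnorm):
    """Rescale grad so that its L2 norm is at most clipnorm."""
    norm = np.sqrt(np.sum(grad ** 2))
    if norm > clipnorm:
        return grad * (clipnorm / norm)   # same direction, shorter step
    return grad

g = np.array([6.0, 8.0])                  # L2 norm is 10
clipped = clip_by_norm(g, clipnorm=5.0)
print(clipped)                            # [3. 4.]
```

Gradients whose norm is already below the threshold pass through unchanged.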

# RNN for language modelling

Language modelling is the task of predicting the next word given the preceding words, which can then be used to generate words and sentences.

$$P(w_n|w_{n-1}, w_{n-2}, ..., w_1)$$
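These conditionals are enough to score or generate a whole sentence: by the chain rule of probability, the joint probability of a sequence factorizes into next-word predictions.

$$P(w_1, w_2, ..., w_n) = \prod_{i=1}^{n} P(w_i|w_{i-1}, w_{i-2}, ..., w_1)$$

Generation then proceeds by sampling $w_1$, then $w_2$ given $w_1$, and so on, with the RNN hidden state summarizing the words produced so far.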

Applications include:
* Machine translation
* Question answering
* Image caption generation


## CharRNN

CharRNN is a character-level language model that generates text automatically, e.g. papers, novels, code, etc.

![rnn-ti.png](attachments/rnn-ti.png)

In [6]:
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys

Download the text file.

In [8]:
path = get_file('nietzsche.txt', 
                origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with open(path, encoding="utf-8") as f:  # close the file after reading
    text = f.read().lower()
print('corpus length:', len(text))

corpus length: 600893


Create char to index and index to char mappings

In [10]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
print(chars)
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57
['\n', ' ', '!', '"', "'", '(', ')', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ä', 'æ', 'é', 'ë']


Convert chars into batches of one-hot representation.

In [11]:
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
# builtin bool works across NumPy versions (np.bool is unavailable in some)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1 
print("x shape:", x.shape)
print("y shape:", y.shape)

nb sequences: 200285
Vectorization...
x shape: (200285, 40, 57)
y shape: (200285, 57)
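The same windowing and one-hot encoding is easier to inspect on a toy string. The sketch below repeats the steps above with a short text and a window of 4 characters (both chosen here only for illustration).

```python
import numpy as np

text = "hello world"
maxlen, step = 4, 3
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

# overlapping windows and the character that follows each window
starts = range(0, len(text) - maxlen, step)
windows = [text[i:i + maxlen] for i in starts]
targets = [text[i + maxlen] for i in starts]
print(windows, targets)   # ['hell', 'lo w', 'worl'] ['o', 'o', 'd']

x = np.zeros((len(windows), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(windows), len(chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        x[i, t, char_indices[char]] = True     # one-hot input character
    y[i, char_indices[targets[i]]] = True      # one-hot next character
print(x.shape, y.shape)   # (3, 4, 8) (3, 8)
```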


In [13]:
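from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.layers import Activation, Dense, LSTM

# build the model: a single LSTM followed by a softmax over characters
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
# softmax turns the scores into probabilities, as expected by
# categorical_crossentropy and by sample() below
model.add(Activation('softmax'))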

optimizer = RMSprop(lr=0.01, clipnorm=5.)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

In [14]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

# train the model, output generated text after each iteration
for iteration in range(1, 10):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x[0:x.shape[0]//10], y[0:x.shape[0]//10],
              batch_size=128,
              epochs=1)

    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
print()
model.save_weights('ckpt/charnn.h5')


--------------------------------------------------
Iteration 1
Epoch 1/1

----- diversity: 0.2
----- Generating with seed: "l and witness of every act, every moment"
l and witness of every act, every moment the most the self--and the condince of the condince of the such and the seem to the such a the self--in the such a strended to the self the condince to the most and such a nother the self--and the such a stance the most the condince of the most the most the condition of the constince the most of the contrance to the conture of the self--in the serves of the excerture to the such a so the men the 
----- diversity: 0.5
----- Generating with seed: "l and witness of every act, every moment"
l and witness of every act, every moment to fired to the would the constance of self--and points not the excectus in mean and has one with the regreated, and as with a fine the conception of the matter of they every for the surt decestion, in the exception to be the contruent of the such and in the 



nd usced of 
----- diversity: 1.2
----- Generating with seed: "ng he himself must perhaps have been cri"
ng he himself must perhaps have been crieate won life
langinging to we kiln as entive understand.agrve and and inderious for thinald behool. iazfrorave pist clumons. it sufey thet rate least, even a faludial to vove despaw
cress' have cruelting that mar. europ! wath compless of great appearance iftacthe possenking
asting and it stugung of
our domant that constlect and stroyen
imprevailive whict rellgypport?

"le thee sympathy, science w

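The `diversity` values passed to `sample()` act as a temperature on the predicted distribution, which explains why the text above becomes more repetitive at 0.2 and more erratic at 1.2. The sketch below applies the same reweighting as `sample()` to a toy three-character distribution, without the random draw; the function name `reweight` and the toy probabilities are assumptions for illustration.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Rescale log-probabilities by 1/temperature and renormalize."""
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)

probs = np.array([0.6, 0.3, 0.1])      # toy next-character probabilities
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(probs, temperature), 3))
```

A temperature below 1 concentrates the mass on the most likely character, a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it.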

## Seq2seq model

For machine translation

![seq2seq.png](attachments/seq2seq.png)

Download the [data](http://www.manythings.org/anki/fra-eng.zip) and extract the file into `data/fra-eng/fra.txt`.

In [15]:
%load_ext autoreload
%autoreload 2 

from keras.models import Model
from keras.layers import Input, LSTM, Dense
import fraeng

batch_size = 64  # Batch size for training.
epochs = 3  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.

(encoder_input_data, decoder_input_data, decoder_target_data, 
 input_token_index, target_token_index, input_texts) = fraeng.load_data(num_samples)

print("encoder_input_data shape: ", encoder_input_data.shape)
print("decoder_input_data shape: ", decoder_input_data.shape)
print("decoder_target_data shape: ", decoder_target_data.shape)
print("input text sample:", input_texts[0:3])

Number of samples: 10000
Number of unique input tokens: 71
Number of unique output tokens: 93
Max sequence length for inputs: 16
Max sequence length for outputs: 59
encoder_input_data shape:  (10000, 16, 71)
decoder_input_data shape:  (10000, 59, 93)
decoder_target_data shape:  (10000, 59, 93)
input text sample: ['Go.', 'Run!', 'Run!']
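`decoder_input_data` and `decoder_target_data` have the same shape because the target is typically the decoder input shifted by one time step (teacher forcing). The sketch below builds both arrays for a single sentence, assuming `fraeng.load_data` follows the usual character-level preprocessing with `'\t'` as the start character and `'\n'` as the stop character; the helper module is not shown here, and the translation `"Va !"` is a hypothetical example.

```python
import numpy as np

target_text = "\t" + "Va !" + "\n"    # start char + translation + stop char
target_chars = sorted(set(target_text))
target_token_index = {c: i for i, c in enumerate(target_chars)}

T, V = len(target_text), len(target_chars)
decoder_input = np.zeros((T, V), dtype="float32")
decoder_target = np.zeros((T, V), dtype="float32")
for t, char in enumerate(target_text):
    decoder_input[t, target_token_index[char]] = 1.0
    if t > 0:
        # the target at step t-1 is the character the decoder should emit next
        decoder_target[t - 1, target_token_index[char]] = 1.0

print(decoder_input.shape, decoder_target.shape)   # (6, 6) (6, 6)
```

During training the decoder sees the correct previous character at every step, while at inference time it has to feed back its own predictions, starting from `'\t'`.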


In [17]:
num_encoder_tokens = len(input_token_index)
num_decoder_tokens = len(target_token_index)
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

In [18]:
# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], 
          decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('ckpt/s2s.h5')

Train on 8000 samples, validate on 2000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3



In [19]:
# Next: inference mode (sampling).
# Here's the drill:
# 1) encode input and retrieve initial decoder state
# 2) run one step of decoder with this initial state
# and a "start of sequence" token as target.
# Output will be the next target token
# 3) Repeat with the current target token and current states
import numpy as np
# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict((i, char) for char, 
                                i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, 
                                 i in target_token_index.items())
max_decoder_seq_length = decoder_input_data.shape[1]

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or 
            len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

-
Input sentence: Go.
Decoded sentence: Arentez !

-
Input sentence: Run!
Decoded sentence: Arentez !

-
Input sentence: Run!
Decoded sentence: Arentez !

-
Input sentence: Wow!
Decoded sentence: Arentez !

-
Input sentence: Fire!
Decoded sentence: Arentez !

-
Input sentence: Help!
Decoded sentence: Arentez !

-
Input sentence: Jump.
Decoded sentence: Arerez !

-
Input sentence: Stop!
Decoded sentence: Arentez !

-
Input sentence: Stop!
Decoded sentence: Arentez !

-
Input sentence: Stop!
Decoded sentence: Arentez !

-
Input sentence: Wait!
Decoded sentence: Arentez !

-
Input sentence: Wait!
Decoded sentence: Arentez !

-
Input sentence: I see.
Decoded sentence: Je me suis pas aite.

-
Input sentence: I try.
Decoded sentence: Je me suis pas aite.

-
Input sentence: I won!
Decoded sentence: Je me suis pas aite.

-
Input sentence: I won!
Decoded sentence: Je me suis pas aite.

-
Input sentence: Oh no!
Decoded sentence: Tome te te te paite.

-
Input sentence: Attack!
Decoded sentence: A

# Assignment 3

Tune the training algorithms for all RNN models above.