
# Recurrent Neural Network with Keras

First, you load the dataset, label each example, and shuffle the examples. Then you tokenize the text and vectorize it using the Google Word2vec model. Next, you collect the labels in the same order as the shuffled samples. And finally you split the data 80/20 into training and test sets.

## Setup

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from tensorflow.keras import backend as keras_backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, SimpleRNN
from tensorflow.keras.preprocessing import sequence

import os
import tarfile
import re
import tqdm

import glob
from random import shuffle
from nltk.tokenize import TreebankWordTokenizer

import requests

TensorFlow 2.x selected.


In [2]:
! pip install pugnlp


In [0]:
from pugnlp.futil import path_status, find_files

## Data Preparation

Each data point is prelabeled with a 0 (negative sentiment) or a 1 (positive sentiment). Instead of using the preprocessed IMDB movie review dataset that ships with Keras, you'll use a raw-text version of it, so you can get your hands dirty with the preprocessing of the text as well. And then you'll see if you can use this trained network to classify text it has never seen before.

### Downloading data

In [0]:
BIG_URLS = {
    'w2v': ('https://www.dropbox.com/s/965dir4dje0hfi4/GoogleNews-vectors-negative300.bin.gz?dl=1', 1647046227),
    'slang': ('https://www.dropbox.com/s/43c22018fbfzypd/slang.csv.gz?dl=1', 117633024),
    'tweets': ('https://www.dropbox.com/s/5gpb43c494mc8p0/tweets.csv.gz?dl=1', 311725313),
    'lsa_tweets': ('https://www.dropbox.com/s/rpjt0d060t4n1mr/lsa_tweets_5589798_2003588x200.tar.gz?dl=1', 3112841563),  # 3112841312
    'imdb': ('https://www.dropbox.com/s/yviic64qv84x73j/aclImdb_v1.tar.gz?dl=1', 84125825),  # size reported by the server when downloading
}

In [0]:
# These functions are part of the nlpia package which can be pip installed and run from there.
def dropbox_basename(url):
    filename = os.path.basename(url)
    match = re.findall(r'\?dl=[0-9]$', filename)
    if match:
        return filename[:-len(match[0])]
    return filename

def download_file(url, data_path='.', filename=None, size=None, chunk_size=4096, verbose=True):
    """Uses stream=True and a reasonable chunk size to be able to download large (GB) files over https"""
    if filename is None:
        filename = dropbox_basename(url)
    file_path = os.path.join(data_path, filename)
    if url.endswith('?dl=0'):
        url = url[:-1] + '1'  # noninteractive download
    if verbose:
        print('requesting URL: {}'.format(url))
    r = requests.get(url, stream=True, allow_redirects=True)
    r.raise_for_status()
    size = r.headers.get('Content-Length', None) if size is None else size
    size = int(size) if size is not None else None  # header values are strings; compare sizes as ints
    if verbose:
        print('remote size: {}'.format(size))

    stat = path_status(file_path)
    if verbose:
        print('local size: {}'.format(stat.get('size', None)))
    if stat.get('type') == 'file' and stat.get('size') == size:  # TODO: check md5 or get the right size of remote file
        r.close()
        return file_path

    if verbose:
        print('Downloading to {}'.format(file_path))

    chunks = r.iter_content(chunk_size=chunk_size)
    if verbose:
        # progress bar (tqdm was imported as a module in the Setup cell)
        chunks = tqdm.tqdm(chunks, total=(size // chunk_size + 1) if size else None, unit='chunk')
    with open(file_path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # filter out keep-alive chunks
                f.write(chunk)

    r.close()
    return file_path

def untar(fname):
    if fname.endswith("tar.gz"):
        # `tar` rather than `tf`, so the name can't be confused with the TensorFlow alias
        with tarfile.open(fname) as tar:
            tar.extractall()
    else:
        print("Not a tar.gz file: {}".format(fname))

In [6]:
download_file(BIG_URLS['w2v'][0])

requesting URL: https://www.dropbox.com/s/965dir4dje0hfi4/GoogleNews-vectors-negative300.bin.gz?dl=1
remote size: 1647046227
local size: None
Downloading to ./GoogleNews-vectors-negative300.bin.gz


'./GoogleNews-vectors-negative300.bin.gz'

In [7]:
untar(download_file(BIG_URLS['imdb'][0]))

requesting URL: https://www.dropbox.com/s/yviic64qv84x73j/aclImdb_v1.tar.gz?dl=1
remote size: 84125825
local size: None
Downloading to ./aclImdb_v1.tar.gz


### Preprocessing the loaded documents

The reviews in the train folder are broken up into text files in either the pos or neg folders. You’ll first need to read those in Python with their appropriate label and then shuffle the deck so the samples aren’t all positive and then all negative. Training with the sorted labels will skew training toward whatever comes last, especially when you use certain hyperparameters, such as momentum.

In [8]:
import glob
from random import shuffle

def pre_process_data(filepath):
  '''
  This is dependent on your training data source but we will try to generalize it as best as possible.
  '''
  positive_path = os.path.join(filepath, 'pos')
  negative_path = os.path.join(filepath, 'neg')

  pos_label = 1
  neg_label = 0

  dataset = []

  for filename in glob.glob(os.path.join(positive_path, '*.txt')):
    # explicit encoding, so reading doesn't depend on the platform's default
    with open(filename, 'r', encoding='utf-8') as f:
      dataset.append((pos_label, f.read()))

  for filename in glob.glob(os.path.join(negative_path, '*.txt')):
    with open(filename, 'r', encoding='utf-8') as f:
      dataset.append((neg_label, f.read()))

  shuffle(dataset)

  return dataset

dataset = pre_process_data('./aclImdb/train')
print(dataset[0])

(1, 'This movie of 370 minutes was aired by the Italian public television during the early seventies. It tells you the myth attributed to Homer of the Journey home of Odysseus after the Troy war. It is an epic story about the ancient Minoan and Mycenaean civilizations, told at list 500 years after those events toke place, around 1100 BC.<br /><br />This is a 1969 movie, so if you buy the DVD version you would find that the sound is just mono and there is no other language than Italian, even the close caption is in Italian. Pity. Many people would enjoy this masterpiece if it had at list the English subtitles. But if this is not a problem for you, than I would strongly recommend to watch this movie.')


### Data tokenization and vectorization

The next step is to tokenize and vectorize the data. You'll use the Google News pretrained Word2vec vectors (GoogleNews-vectors-negative300.bin.gz), which you downloaded earlier.

You'll use gensim to unpack the vectors. You can
experiment with the limit argument to the load_word2vec_format method. A
higher number gives you more vectors to play with, but memory quickly becomes an issue, and the return on investment drops off for very high values of limit.

In [9]:
from nltk.tokenize import TreebankWordTokenizer
from gensim.models import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True, limit=200000)

def tokenize_and_vectorize(dataset):
  tokenizer = TreebankWordTokenizer()
  vectorized_data = []

  for sample in dataset:
    tokens = tokenizer.tokenize(sample[1])   # sample is a (label, text) tuple
    sample_vecs = []
    for token in tokens:
      try:
        sample_vecs.append(word_vectors[token])
      except KeyError:
        pass    # No matching token in the Google w2v vocab, so the token is skipped

    vectorized_data.append(sample_vecs)

  return vectorized_data

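Here is a toy illustration of the lookup inside tokenize_and_vectorize. It assumes a hypothetical three-word vocabulary (toy_vectors) in place of the 200,000 Google News vectors. It also uses plain whitespace splitting in place of TreebankWordTokenizer. Out-of-vocabulary tokens are dropped silently, so a vectorized review can be shorter than its token list. A review with no known tokens at all becomes an empty list.

```python
# Hypothetical toy "embedding" table standing in for word_vectors (real vectors are 300-d)
toy_vectors = {
    'the': [0.1, 0.2],
    'movie': [0.3, 0.4],
    'great': [0.5, 0.6],
}

def toy_vectorize(text, vectors):
    """Look up each whitespace-separated token; silently skip out-of-vocabulary tokens."""
    sample_vecs = []
    for token in text.split():
        try:
            sample_vecs.append(vectors[token])
        except KeyError:
            pass  # OOV token: dropped, so the sequence gets shorter
    return sample_vecs

vecs = toy_vectorize('the movie was great', toy_vectors)
print(len(vecs))  # 3, because 'was' is not in the toy vocabulary
```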


In [10]:
word_vectors['dog']

array([ 5.12695312e-02, -2.23388672e-02, -1.72851562e-01,  1.61132812e-01,
       -8.44726562e-02,  5.73730469e-02,  5.85937500e-02, -8.25195312e-02,
       -1.53808594e-02, -6.34765625e-02,  1.79687500e-01, -4.23828125e-01,
       -2.25830078e-02, -1.66015625e-01, -2.51464844e-02,  1.07421875e-01,
       -1.99218750e-01,  1.59179688e-01, -1.87500000e-01, -1.20117188e-01,
        1.55273438e-01, -9.91210938e-02,  1.42578125e-01, -1.64062500e-01,
       -8.93554688e-02,  2.00195312e-01, -1.49414062e-01,  3.20312500e-01,
        3.28125000e-01,  2.44140625e-02, -9.71679688e-02, -8.20312500e-02,
       -3.63769531e-02, -8.59375000e-02, -9.86328125e-02,  7.78198242e-03,
       -1.34277344e-02,  5.27343750e-02,  1.48437500e-01,  3.33984375e-01,
        1.66015625e-02, -2.12890625e-01, -1.50756836e-02,  5.24902344e-02,
       -1.07421875e-01, -8.88671875e-02,  2.49023438e-01, -7.03125000e-02,
       -1.59912109e-02,  7.56835938e-02, -7.03125000e-02,  1.19140625e-01,
        2.29492188e-01,  

You also need to collect the target values—0 for a negative review, 1 for a positive review—in the same order as the training samples.

In [0]:
def collect_expected(dataset):
  '''Peel off the target values from the dataset'''
  expected = []
  for sample in dataset:
    expected.append(sample[0])
  
  return expected

And then you simply pass your data into those functions:

In [0]:
vectorized_data = tokenize_and_vectorize(dataset)
expected = collect_expected(dataset)

### Train/Test splitting

Next you'll split the prepared data into a training set and a test set. You're just going to split your imported dataset 80/20, which ignores the separate folder of test data (aclImdb/test) that ships with the dataset.

In [0]:
split_point = int(len(vectorized_data) * .8)

x_train = vectorized_data[:split_point]
y_train = expected[:split_point]
x_test = vectorized_data[split_point:]
y_test = expected[split_point:]
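Here is a quick check of the arithmetic behind this split, assuming the 25,000 reviews in aclImdb/train. That matches the 20,000 training and 5,000 validation samples reported in the training log later in this notebook. split_80_20 is a hypothetical helper that does the same slicing as the cell above. The integer samples and alternating labels are placeholders. Slicing both lists at the same index is what keeps each label aligned with its sample.

```python
def split_80_20(samples, labels, train_fraction=0.8):
    """Split parallel lists at the same index so samples and labels stay aligned."""
    split_point = int(len(samples) * train_fraction)
    return (samples[:split_point], labels[:split_point],
            samples[split_point:], labels[split_point:])

n = 25000  # placeholder data with the size of aclImdb/train
x_tr, y_tr, x_te, y_te = split_80_20(list(range(n)), [i % 2 for i in range(n)])
print(len(x_tr), len(x_te))  # 20000 5000
```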

### Hyper-parameters

The next cell sets most of the hyperparameters for the net.

In [0]:
maxlen = 400          # Maximum number of tokens per review (longer reviews are truncated, shorter ones padded)
batch_size = 32       # How many samples to show the net before backpropagating the error and updating the weights
embedding_dims = 300  # Length of each token vector (the Google News Word2vec vectors are 300-dimensional)
epochs = 2            # Number of times we will pass the entire training dataset through the network

### Padding and truncating token sequences (sequences of vectors)

Keras has a preprocessing helper method, pad_sequences, that in theory could be
used to pad your input data. But it is built around sequences of integer token IDs (its default dtype is 'int32'), and you have sequences of floating-point vectors.

Let’s write a helper function of your own to pad your input data.

In [0]:
def pad_trunc(data, maxlen):
  '''For a given dataset pad with zero vectors or truncate to maxlen'''
  new_data = []

  # Create a vector of 0's the length of our word vectors, taken from the first
  # non-empty sample (a review is empty if none of its tokens were in the vocab)
  vector_len = next(len(sample[0]) for sample in data if len(sample) > 0)
  zero_vector = [0.0] * vector_len

  for sample in data:
    if len(sample) > maxlen:
      temp = sample[:maxlen]   # truncate
    else:
      # pad a copy with zero vectors, so the caller's list isn't modified in place
      temp = list(sample) + [zero_vector] * (maxlen - len(sample))
    new_data.append(temp)

  return new_data
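An alternative is to write the padded and truncated vectors straight into one preallocated float32 NumPy array. That lets you skip the separate np.reshape step. float32 also uses half the memory of float64, which matters at this scale: 20,000 × 400 × 300 values for the training set. pad_trunc_array is a hypothetical name, and this is only a sketch. It assumes every non-empty sample holds vectors of length embedding_dims, and it treats empty samples as all padding.

```python
import numpy as np

def pad_trunc_array(data, maxlen, embedding_dims, dtype=np.float32):
    """Pad with zero vectors / truncate each sample to maxlen, writing into one array.

    data: list of samples, each a list of embedding_dims-long vectors (possibly empty).
    Returns an array of shape (len(data), maxlen, embedding_dims).
    """
    out = np.zeros((len(data), maxlen, embedding_dims), dtype=dtype)  # zeros act as padding
    for i, sample in enumerate(data):
        n = min(len(sample), maxlen)  # truncate long samples
        if n:
            out[i, :n] = np.asarray(sample[:n], dtype=dtype)
    return out

toy = [[[1.0, 1.0]] * 3, [[2.0, 2.0]] * 6, []]  # lengths 3, 6 and 0, with 2-d vectors
arr = pad_trunc_array(toy, maxlen=4, embedding_dims=2)
print(arr.shape)  # (3, 4, 2)
```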

Then you need to pass your train and test data into the padder/truncator. After that you can convert it to NumPy arrays to make Keras happy. The result is a tensor with the shape (number of samples, sequence length, word vector length) that you need for your RNN.

In [0]:
x_train = pad_trunc(x_train, maxlen)
x_test = pad_trunc(x_test, maxlen)

In [0]:
x_train = np.reshape(x_train, (len(x_train), maxlen, embedding_dims))
y_train = np.array(y_train)

x_test = np.reshape(x_test, (len(x_test), maxlen, embedding_dims))
y_test = np.array(y_test)

In [0]:
keras_backend.clear_session()

Phew; finally you’re ready to build a neural network.

## Recurrent neural network architecture

Sequential is one of the base classes for neural networks in Keras. From here you can start to layer on the magic.

```python
model = Sequential()
```
And then, as before, the Keras magic handles the complexity of assembling a neural net: you just need to add the recurrent layer you want to your network.

```python
num_neurons = 50
model.add(SimpleRNN(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims)))
```

Now the infrastructure is set up to take each input, pass it into a simple recurrent neural net, and, for each token, gather
the output into a vector. Because your sequences are 400 tokens long and you're using 50 hidden neurons, your output from this layer will be a vector 400 elements long. Each of those elements is a vector 50 elements long, with one output for each of the neurons.

Notice here the keyword argument return_sequences. It tells the network to return its output at each time step, hence the 400 vectors, each 50 long. If return_sequences were set to False (the Keras default behavior), only a single 50-dimensional vector, the output of the last time step, would be returned.
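To make these shapes concrete, here is a minimal NumPy sketch of what a SimpleRNN-style layer computes for one sample. It assumes a tanh activation, which is the default for Keras's SimpleRNN, and uses random, untrained weights. The function and weight names are illustrative, not Keras internals. The weight shapes also account for the layer's parameter count: 300 × 50 input weights, 50 × 50 feedback weights and 50 biases.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b, return_sequences=True):
    """Minimal SimpleRNN-style forward pass for one sample (tanh activation).

    x:   (timesteps, input_dim) token vectors
    W_x: (input_dim, units) input weights
    W_h: (units, units) feedback weights from the previous time step's output
    b:   (units,) biases
    """
    h = np.zeros(W_h.shape[0])  # no previous output at t=0, so start from zeros
    outputs = []
    for x_t in x:  # one token vector per time step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    # return_sequences=True: every time step's output; False: only the last one
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
maxlen, embedding_dims, num_neurons = 400, 300, 50
x = rng.normal(size=(maxlen, embedding_dims))  # stand-in for one padded review
W_x = rng.normal(scale=0.01, size=(embedding_dims, num_neurons))  # random, untrained
W_h = rng.normal(scale=0.01, size=(num_neurons, num_neurons))
b = np.zeros(num_neurons)

print(simple_rnn_forward(x, W_x, W_h, b).shape)                          # (400, 50)
print(simple_rnn_forward(x, W_x, W_h, b, return_sequences=False).shape)  # (50,)
```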


When using a recurrent neural net, truncating and padding isn't usually necessary. You can provide training data of varying lengths and unroll the net until you hit the end of the input. Keras will handle this automatically, as long as the sequences within a single batch share a length. The catch is that the length of the recurrent layer's output will vary from sample to sample with the length of the input. A four-token input will output a sequence four elements long. A 100-token sequence will produce a sequence of 100 elements. If you need to pass this into another layer, one that expects a uniform input, it won't work. There are cases where that's acceptable, and even preferred. But back to your classifier.


```python
model.add(Dropout(.2))

model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
```

You requested that the simple RNN return full sequences, but to prevent overfitting you add a Dropout layer. It zeroes out 20% of those inputs, randomly chosen for each training example, and is active only during training. And then finally you add a classifier. In this case, you have a single binary output: “Yes - Positive Sentiment - 1” or “No - Negative Sentiment - 0,” so you choose a layer with one neuron (Dense(1)) and a sigmoid activation function.

But a Dense layer expects a “flat” vector of n elements (each element a float) as input. The data coming out of the SimpleRNN is a tensor 400 elements long, and each of those elements is 50 elements long. A feedforward network doesn't care about the order of elements, as long as you're consistent with that order. So you use the convenience layer Flatten(), which Keras provides, to flatten the input from a 400 x 50 tensor to a vector 20,000 elements long. And that's what you pass into the final layer that'll make the classification.

The Flatten layer is just a mapping; it has no weights of its own. That means the error is backpropagated from the last layer back to the appropriate output in the RNN layer. Each of those backpropagated errors is then backpropagated through time from the appropriate point in the output.
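Here is a NumPy sketch of the reshape that Flatten performs on a batch. It keeps the batch axis and merges the time and neuron axes in row-major order, so element (t, j) of the 400 x 50 output always lands at position t * 50 + j. This fixed, consistent ordering is all the Dense layer needs. The batch of 2 and the arange values are arbitrary stand-ins.

```python
import numpy as np

# Stand-in for a batch of 2 SimpleRNN outputs: (batch, timesteps, units)
batch = np.arange(2 * 400 * 50, dtype=np.float32).reshape(2, 400, 50)
flat = batch.reshape(batch.shape[0], -1)  # keep the batch axis, merge the rest
print(flat.shape)  # (2, 20000)
```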

Passing the “thought vector” produced by the recurrent neural network layer into a feedforward network no longer keeps the order of the input you tried so hard to incorporate. But the important takeaway is to notice that the “learning” related to
the sequence of tokens happens in the RNN layer itself. The aggregation of errors via backpropagation through time encodes that relationship in the network and expresses
it in the “thought vector” itself. Your decision based on the thought vector, via the classifier, provides feedback on the “quality” of that thought vector with respect to your specific classification problem.

### Putting things together

In [19]:
num_neurons = 50

model = Sequential()
model.add(SimpleRNN(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn (SimpleRNN)       (None, 400, 50)           17550     
_________________________________________________________________
dropout (Dropout)            (None, 400, 50)           0         
_________________________________________________________________
flatten (Flatten)            (None, 20000)             0         
_________________________________________________________________
dense (Dense)                (None, 1)                 20001     
Total params: 37,551
Trainable params: 37,551
Non-trainable params: 0
_________________________________________________________________


In the SimpleRNN layer, you requested 50 neurons. At each time step, each of those neurons
receives (and applies a weight to) every element of the input. In an RNN, the input at each
time step is one token. Your tokens are represented by word vectors in this case, each
300 elements long (300-dimensional). Each neuron will need 300 weights:

50 * 300 = 15,000

Each neuron also has the bias term, which always has an input value of 1 (that’s what
makes it a bias) but has a trainable weight:

15,000 + 50 (bias weights) = 15,050

15,050 weights in the first time step of the first layer. Now each of those 50 neurons
will feed its output into the network’s next time step. Each neuron accepts the full
input vector as well as the full output vector. In the first time step, the feedback from
the output doesn’t exist yet. It’s initiated as a vector of zeros, its length the same as the
length of the output.

Each neuron in the hidden layer now has weights for each token embedding dimension:
that’s 300 weights. It also has 1 bias for each neuron. And you have the 50 weights
for the output results in the previous time step (or zeros for the first t=0 time step).
These 50 weights are the key feedback step in a recurrent neural network. That gives us

300 + 1 + 50 = 351

351 times 50 neurons gives:

351 * 50 = 17,550

17,550 parameters to train. You're unrolling this net 400 time steps. That is probably too many, given the problems associated with vanishing gradients, but this network still turns out to be effective. Those 17,550 parameters are the same in each of the unrollings, and they remain the same until all the backpropagations have been calculated.
The updates to the weights occur all at once, at the end of the sequence's forward propagation and subsequent backpropagation. Although you're adding complexity to
the backpropagation algorithm, you're saved by the fact you're not training a net with
a little over 7 million parameters (17,550 * 400). That is what it would look like if the
unrollings each had their own weight sets.

The final layer in the summary is reporting 20,001 parameters to train, which is relatively straightforward. After the Flatten() layer, the input is a 20,000-dimensional vector plus the one bias input. Because you only have one neuron in the output layer, the total number of parameters is 

(20,000 input elements + 1 bias unit) * 1 neuron = 20,001 parameters
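The parameter arithmetic above can be reproduced with two small helper functions (hypothetical names). They follow the reasoning in the text: each recurrent neuron has input weights, feedback weights and a bias, and each Dense neuron has one weight per flattened input plus a bias. The 100-neuron line corresponds to the larger network built later in this notebook. The last line is the unshared-weights count mentioned above.

```python
def simple_rnn_params(units, input_dim):
    # per neuron: input_dim input weights + units feedback weights + 1 bias
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # per neuron: one weight per input element + 1 bias
    return (input_dim + 1) * units

maxlen, embedding_dims = 400, 300
for num_neurons in (50, 100):
    rnn = simple_rnn_params(num_neurons, embedding_dims)
    dense = dense_params(maxlen * num_neurons, 1)  # Flatten feeds maxlen * num_neurons values
    print(num_neurons, rnn, dense, rnn + dense)
# 50 17550 20001 37551
# 100 40100 40001 80101

# If each of the 400 unrollings had its own weights instead of sharing them:
print(simple_rnn_params(50, embedding_dims) * maxlen)  # 7020000
```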

Those parameter counts can be a little misleading as a guide to computation time. Backpropagation through time adds many extra steps compared to convolutional neural networks or standard feedforward networks. But computation time shouldn't be a
deal killer. Recurrent nets' special talent at memory is the start of a bigger world in NLP and in any other kind of sequence data.


### Training and saving the model

OK, now it’s time to actually train that recurrent network that we so carefully assembled
in the previous section. As with your other Keras models, you need to give the
.fit() method your data and tell it how long you want to run training (epochs).

In [20]:
# train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

Train on 20000 samples, validate on 5000 samples
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f8df42ba3c8>

You'll want to save the model state after training.
Because you aren't going to hold the model in memory for now, you can save its
structure in a JSON file and its trained weights in another file for later reinstantiation.

In [0]:
model_structure = model.to_json()   # Note that this doesn’t save the weights of the network, only the structure.

# Save your trained model before you lose it!
with open('simple_rnn_model1.json', 'w') as json_file:
  json_file.write(model_structure)
model.save_weights('simple_rnn_weights1.h5')

Now your trained model will be persisted on disk; should it converge, you won’t have to train it again.

## Prediction

Let’s make up a sentence with an obvious negative sentiment and see what the network has to say about it.

In [0]:
# loading model
from tensorflow.keras.models import model_from_json

with open('simple_rnn_model1.json', 'r') as json_file:
  json_string = json_file.read()
model = model_from_json(json_string)

model.load_weights('simple_rnn_weights1.h5')

In [0]:
sample_1 = """
I'm hate that the dismal weather that had me down for so long, when will it break! Ugh, when does happiness return?  
The sun is blinding and the puffy clouds are too thin.  I can't wait for the weekend.
"""

With the model pretrained, testing a new sample is quick. There are still thousands and
thousands of calculations to do, but for each sample you only need one forward pass
and no backpropagation to get a result.

In [27]:
# You pass a dummy value in the first element of the tuple just because
# your helper expects it from the way you processed the initial data.
# That value won’t ever see the network, so it can be anything.
vec_list = tokenize_and_vectorize([(1, sample_1)])

# Tokenize returns a list of the data (length 1 here)
test_vec_list = pad_trunc(vec_list, maxlen)

test_vec = np.reshape(test_vec_list, (len(test_vec_list), maxlen, embedding_dims))
model.predict(test_vec)

array([[0.23188677]], dtype=float32)

In [28]:
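# predict_classes() is deprecated and missing from newer TensorFlow releases; for a single
# sigmoid output it is equivalent to thresholding the predicted probability at 0.5
(model.predict(test_vec) > 0.5).astype('int32')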

array([[0]], dtype=int32)

### Build a larger network

In [30]:
num_neurons = 100

model1 = Sequential()
model1.add(SimpleRNN(num_neurons, return_sequences=True, input_shape=(maxlen, embedding_dims)))
model1.add(Dropout(0.2))

model1.add(Flatten())
model1.add(Dense(1, activation='sigmoid'))

model1.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model1.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn_2 (SimpleRNN)     (None, 400, 100)          40100     
_________________________________________________________________
dropout_2 (Dropout)          (None, 400, 100)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 40000)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 40001     
Total params: 80,101
Trainable params: 80,101
Non-trainable params: 0
_________________________________________________________________


Train your larger network

In [31]:
model1.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

Train on 20000 samples, validate on 5000 samples
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f8ddc3bb3c8>

The validation accuracy of 78.24% is only 0.04% better after we doubled the complexity of our model in one of the layers. This negligible improvement should lead you to think the model (for this network layer) is too complex for the data.

In [0]:
model_structure = model1.to_json()   # Note that this doesn’t save the weights of the network, only the structure.

# Save your trained model before you lose it!
with open('simple_rnn_model2.json', 'w') as json_file:
  json_file.write(model_structure)
model1.save_weights('simple_rnn_weights2.h5')

In [0]:
# loading model
from tensorflow.keras.models import model_from_json

with open('simple_rnn_model2.json', 'r') as json_file:
  json_string = json_file.read()
model = model_from_json(json_string)

model.load_weights('simple_rnn_weights2.h5')

In [0]:
sample_1 = """
I'm hate that the dismal weather that had me down for so long, when will it break! Ugh, when does happiness return?  
The sun is blinding and the puffy clouds are too thin.  I can't wait for the weekend.
"""

In [35]:
vec_list = tokenize_and_vectorize([(1, sample_1)])

# Tokenize returns a list of the data (length 1 here)
test_vec_list = pad_trunc(vec_list, maxlen)

test_vec = np.reshape(test_vec_list, (len(test_vec_list), maxlen, embedding_dims))
model.predict(test_vec)

array([[0.46618623]], dtype=float32)

In [36]:
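# predict_classes() is deprecated and missing from newer TensorFlow releases; for a single
# sigmoid output it is equivalent to thresholding the predicted probability at 0.5
(model.predict(test_vec) > 0.5).astype('int32')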

array([[0]], dtype=int32)

If you feel the model is overfitting the training data but you can’t find a way to make
your model simpler, you can always try increasing the Dropout(percentage). This is
a sledgehammer (actually a shotgun) that can mitigate the risk of overfitting while
allowing your model to have as much complexity as it needs to match the data. If you
set the dropout percentage much above 50%, the model starts to have a difficult time
learning. Your learning will slow and validation error will bounce around a lot. But
20% to 50% is a pretty safe range for a lot of NLP problems for recurrent networks.
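To see what Dropout(0.2) does to a layer's output during training, here is a NumPy sketch of inverted dropout. Keras documents that its Dropout layer randomly sets inputs to 0 with probability rate during training. It also scales the kept inputs by 1 / (1 - rate), and it does nothing at inference time. The function name dropout_train and the all-ones stand-in activations are illustrative assumptions.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Inverted dropout as applied at training time: zero out a random `rate`
    fraction of the elements and scale the survivors by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
acts = np.ones((400, 50))  # stand-in for one sample's SimpleRNN output
dropped = dropout_train(acts, rate=0.2, rng=rng)
print(round(float((dropped == 0).mean()), 3))  # fraction zeroed, close to 0.2
```

Raising rate zeroes more of the signal on every training step, which is why very high dropout rates make learning slow and noisy.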

## Statefulness

Sometimes you want to remember information from one input sample to the next, not just from one time step (token) to the next within a single sample.

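Keras provides a keyword argument in the base RNN layer called stateful. It defaults to False. If you flip this to True
when adding the SimpleRNN layer to your model, the last output of one sample is passed back
into the layer at the first time step of the next sample, along with the first token input, just as it would be in the
middle of a sample. More precisely, Keras uses the final state of the sample at position i in one batch as the initial state of the sample at position i in the next batch.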

Setting stateful to True can be a good idea when you want to model a large document
that has been split into paragraphs or sentences for processing. And you might even use it to model the meaning of an entire corpus of related documents. But you
wouldn’t want to train a stateful RNN on unrelated documents or passages without
resetting the state of the model between samples.
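The following NumPy sketch mimics the state hand-off. It is not Keras's stateful API; it only shows the idea. One 10-token "document" is split into two consecutive 5-token samples. If the second sample starts from the state the first sample ended in, the final state is identical to running the whole document at once. If the state restarts from zeros (the stateless default), the result differs. The weights are random and untrained, and run_rnn is a hypothetical helper.

```python
import numpy as np

def run_rnn(x, W_x, W_h, b, h0):
    """tanh-RNN over the token vectors in x, starting from state h0; returns the final state."""
    h = h0
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(1)
input_dim, units = 300, 50
W_x = rng.normal(scale=0.05, size=(input_dim, units))
W_h = rng.normal(scale=0.2, size=(units, units))
b = np.zeros(units)
zeros = np.zeros(units)

doc = rng.normal(size=(10, input_dim))  # one "document" of 10 token vectors
part1, part2 = doc[:5], doc[5:]         # split into two consecutive samples

whole = run_rnn(doc, W_x, W_h, b, zeros)
# stateful: the second sample starts from the state the first sample ended in
stateful = run_rnn(part2, W_x, W_h, b, run_rnn(part1, W_x, W_h, b, zeros))
# stateless (the default): every sample starts over from zeros
stateless = run_rnn(part2, W_x, W_h, b, zeros)

print(np.allclose(whole, stateful), np.allclose(whole, stateless))  # True False
```

The same mechanism is why unrelated samples should not share state. Carrying over the final state of an unrelated review would feed the network context that doesn't belong to the next sample.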


## Two-way street

So far we’ve discussed relationships between words and what has come before. But
can’t a case be made for flipping those word dependencies?

*They wanted to pet the dog whose fur was brown.*

As you get to the token “fur,” you have encountered “dog” already and know something
about it. But the sentence also contains the information that the dog has fur,
and that the dog’s fur is brown. And that information is relevant to the previous action
of petting and the fact that “they” wanted to do the petting. Perhaps “they” only like to
pet soft, furry brown things and don’t like petting prickly green things like cacti.

Humans read the sentence in one direction but are capable of flitting back to earlier
parts of the text in their brain as new information is revealed. Humans can deal
with information that isn’t presented in the best possible order. It would be nice if you
could allow your model to flit back across the input as well. That is where bidirectional
recurrent neural nets come in.

<img src='https://github.com/rahiakela/img-repo/blob/master/bidirectional-recurrent-neural-net.png?raw=1' width='800'/>

The basic idea is you arrange two RNNs right next to each other, passing the input into one as normal and the same input backward into the other net.

The outputs of the two nets are then concatenated at each time step, pairing the outputs that belong to the same input token. So the output of the forward net's final time step
is concatenated with the output the backward net generated for that same token, which is the backward net's first time step.
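Here is a NumPy sketch of that arrangement, with random, untrained weights and 10 units per direction. The second net runs over the reversed input. Its outputs are flipped back so that row t of both nets refers to the same token, and then the two are concatenated. Concatenation is the default merge_mode of Keras's Bidirectional wrapper. The helper names are hypothetical, and this is an illustration rather than the Keras implementation.

```python
import numpy as np

def rnn_sequence(x, W_x, W_h, b):
    """tanh-RNN output at every time step, shape (timesteps, units)."""
    h = np.zeros(W_h.shape[0])
    outs = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outs.append(h)
    return np.stack(outs)

def make_params(rng, input_dim, units):
    """Random, untrained weights for one direction (illustrative only)."""
    return (rng.normal(scale=0.01, size=(input_dim, units)),
            rng.normal(scale=0.01, size=(units, units)),
            np.zeros(units))

def bidirectional(x, fwd, bwd):
    forward = rnn_sequence(x, *fwd)
    # run the second net over the reversed input, then flip its outputs back
    # so that row t of both arrays belongs to the same input token
    backward = rnn_sequence(x[::-1], *bwd)[::-1]
    return np.concatenate([forward, backward], axis=-1)

rng = np.random.default_rng(0)
timesteps, input_dim, units = 400, 300, 10
x = rng.normal(size=(timesteps, input_dim))
fwd, bwd = make_params(rng, input_dim, units), make_params(rng, input_dim, units)

out = bidirectional(x, fwd, bwd)
print(out.shape)  # (400, 20)
```

The last row pairs the forward net's final output with the backward net's first output. Both belong to the last input token.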

**Note**


---

Keras also has a go_backwards keyword argument. If this is set to True, Keras automatically flips the input sequences and inputs them into the network in reverse order. This is the second half of a bidirectional layer.

If you’re not using a bidirectional wrapper, this keyword can be useful, because a recurrent neural network (due to the vanishing gradients problem) is more receptive to data at the end of the sample than at the beginning. 

If you have padded your samples at the end (with `<PAD>` tokens, or with zero vectors as in this notebook), all the good, juicy stuff is buried deep in the input loop, far from the final output. go_backwards can be a quick way around this problem.

---
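Here is a tiny NumPy illustration of why reversing helps with end-padded samples. It uses toy sizes rather than this notebook's 400 × 300. After reversal, the real token vectors are the last ones the RNN sees, right before it produces its final output.

```python
import numpy as np

maxlen, dims = 6, 2
sample = np.zeros((maxlen, dims))  # zero vectors = padding
sample[:2] = 1.0                   # only the first 2 time steps hold real token vectors
# go_backwards=True feeds the sequence to the RNN in reverse time order
reversed_sample = sample[::-1]
print(reversed_sample[:, 0])  # [0. 0. 0. 0. 1. 1.]
```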

Keras provides a layer wrapper, Bidirectional, that automatically flips
around the necessary inputs and outputs to assemble a bidirectional
RNN for you.



In [42]:
from tensorflow.keras.layers import Bidirectional

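num_neurons = 10
maxlen = 400          # must match the length x_train/x_test were padded to above (redefining it as 100 would not re-pad the data)
embedding_dims = 300

model2 = Sequential()
# input_shape goes on the Bidirectional wrapper, because the wrapper (not the wrapped SimpleRNN) is the model's first layer
model2.add(Bidirectional(SimpleRNN(num_neurons, return_sequences=True), input_shape=(maxlen, embedding_dims)))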
model2.add(Dropout(0.2))

model2.add(Flatten())
model2.add(Dense(1, activation='sigmoid'))

model2.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model2.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

model2.summary()

Train on 20000 samples, validate on 5000 samples
Epoch 1/2
Epoch 2/2
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional_3 (Bidirection multiple                  6220      
_________________________________________________________________
dropout_6 (Dropout)          multiple                  0         
_________________________________________________________________
flatten_6 (Flatten)          multiple                  0         
_________________________________________________________________
dense_6 (Dense)              multiple                  8001      
Total params: 14,221
Trainable params: 14,221
Non-trainable params: 0
_________________________________________________________________


In [0]:
model_structure = model2.to_json()   # Note that this doesn’t save the weights of the network, only the structure.

# Save your trained model before you lose it!
with open('simple_rnn_model3.json', 'w') as json_file:
  json_file.write(model_structure)
model2.save_weights('simple_rnn_weights3.h5')

In [0]:
# loading model
from tensorflow.keras.models import model_from_json

with open('simple_rnn_model3.json', 'r') as json_file:
  json_string = json_file.read()
model = model_from_json(json_string)

model.load_weights('simple_rnn_weights3.h5')

In [0]:
sample_1 = """
I'm hate that the dismal weather that had me down for so long, when will it break! Ugh, when does happiness return?  
The sun is blinding and the puffy clouds are too thin.  I can't wait for the weekend.
"""

In [0]:
vec_list = tokenize_and_vectorize([(1, sample_1)])

# Tokenize returns a list of the data (length 1 here)
test_vec_list = pad_trunc(vec_list, maxlen)

test_vec = np.reshape(test_vec_list, (len(test_vec_list), maxlen, embedding_dims))
model.predict(test_vec)

In [0]:
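# predict_classes() is deprecated and missing from newer TensorFlow releases; for a single
# sigmoid output it is equivalent to thresholding the predicted probability at 0.5
(model.predict(test_vec) > 0.5).astype('int32')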

With these tools you’re well on your way to not just predicting and classifying text, but
actually modeling language itself and how it’s used. And with that deeper algorithmic
understanding, instead of just parroting text your model has seen before, you can
generate completely new statements!