# A simple example illustrating how to train an RNN in Keras
- An RNN is designed to retain context from earlier inputs in order to predict what comes next.
- Here, we will first use a special kind of RNN called an LSTM.
- Credits: based on the GitHub repo https://github.com/WillKoehrsen/recurrent-neural-networks
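
To make "retaining context" concrete, here is a minimal NumPy sketch of a plain recurrent step. It is not the LSTM used in this notebook and not Keras code; the sizes, weight names (`W_x`, `W_h`, `b`) and the `rnn_step` helper are all illustrative assumptions. The point is only that the hidden state `h` is fed back in at every step, so it carries information from earlier inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3  # hypothetical sizes, chosen for illustration

# Randomly initialised weights (assumed names, not taken from the notebook)
W_x = rng.normal(size=(input_dim, hidden_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One step of a simple recurrence: the new state depends on the input and the old state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

sequence = rng.normal(size=(5, input_dim))  # 5 time steps of toy input
h = np.zeros(hidden_dim)                    # initial state: no context yet
for x_t in sequence:
    h = rnn_step(x_t, h)                    # h now summarises everything seen so far

print(h.shape)
```

An LSTM replaces this single `tanh` update with gated updates, which helps it keep context over longer sequences.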

In [1]:
import warnings
warnings.filterwarnings('ignore', category = RuntimeWarning)
warnings.filterwarnings('ignore', category = UserWarning)

In [17]:
import pandas as pd
import numpy as np
from utils import get_data, generate_output, guess_human, seed_sequence, get_embeddings, find_closest, format_sequence

# Fetch the data
- 3000+ patents total

In [4]:
data = pd.read_csv('data/neural_network_patent_query.csv')
data.head()

| Unnamed: 0 | patent_abstract | patent_date | patent_number | patent_title |
|---|---|---|---|---|
| 0 | " A ""Barometer"" Neuron enhances stability in... | 1996-07-09 | 5535303 | """Barometer"" neuron for a neural network" |
| 1 | " This invention is a novel high-speed neural ... | 1993-10-19 | 5255349 | "Electronic neural network for solving ""trave... |
| 2 | An optical information processor for use as a ... | 1995-01-17 | 5383042 | 3 layer liquid crystal neural network with out... |
| 3 | A method and system for intelligent control of... | 2001-01-02 | 6169981 | 3-brain architecture for an intelligent decisi... |
| 4 | A method and system for intelligent control of... | 2003-06-17 | 6581048 | 3-brain architecture for an intelligent decisi... |


In [5]:
training_dict, word_idx, idx_word, sequences = get_data('data/neural_network_patent_query.csv', training_len = 50)

There are 16192 unique words.
There are 318563 sequences.


- Sequences of text are represented as integers
- word_idx maps words to integers
- idx_word maps integers to words
- Features are integer sequences of length 50
- Label is next word in sequence
- Labels are one-hot encoded
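
The representation described above can be sketched on a toy sentence. This is only an illustration under stated assumptions: the internals of `get_data` are not shown in this notebook, so the tokenisation, the choice to leave index 0 unused, and the short `training_len = 3` (the notebook uses 50) are all assumptions made for the example.

```python
import numpy as np

# Toy corpus standing in for the patent abstracts
text = "the user trains the network and the network learns".split()
training_len = 3  # the notebook uses 50

# word_idx maps words to integers; index 0 is left unused here (an assumption)
word_idx = {}
for w in text:
    if w not in word_idx:
        word_idx[w] = len(word_idx) + 1
# idx_word maps integers back to words
idx_word = {i: w for w, i in word_idx.items()}

encoded = [word_idx[w] for w in text]

# Each feature is a window of training_len indices; the label is the next index
features, labels = [], []
for i in range(training_len, len(encoded)):
    features.append(encoded[i - training_len:i])
    labels.append(encoded[i])

X = np.array(features)
num_words = len(word_idx) + 1
# One-hot encode the labels
y = np.zeros((len(labels), num_words), dtype=np.int8)
y[np.arange(len(labels)), labels] = 1

print(X.shape, y.shape)
print([idx_word[i] for i in X[0]], '->', idx_word[int(np.argmax(y[0]))])
```

One-hot labels are what `categorical_crossentropy` expects, at the cost of a label array whose width equals the vocabulary size.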

In [6]:
print(training_dict['X_train'][:2])
print(training_dict['y_train'][:2])

[[  117     7   141   277     4    18    81   110    10   219    29     1
    952  2453    19     5     6     1   117    10   182  2166    21     1
     81   178     4    13   117   894    14  6163     7   302     1     9
      8    29    33    23    74   428     7   692     1    81   183     4
     13   117]
 [    6    41     2    87     3  1340    79     7     1   409   543    22
    484     6     2  2113   728    24     1   178     3     1  1820    55
     14 13942  7240   244     5    14 13943  7240   244     5     2  2113
   7240   244     5     2    38  9292   244     2    49  9292   244    14
     22 13944]]
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [7]:
for i, sequence in enumerate(training_dict['X_train'][:2]):
    text = []
    for idx in sequence:
        text.append(idx_word[idx])
        
    print('Features: ' + ' '.join(text) + '\n')
    print('Label: ' + idx_word[np.argmax(training_dict['y_train'][i])] + '\n')
    

Features: user to provide samples . A recognition operation is performed on the user's handwritten input , and the user is not satisfied with the recognition result . The user selects an option to train the neural network on one or more characters to improve the recognition results . The user

Label: is

Features: and includes a number of amplifiers corresponding to the N bit output sum and a carry generation from the result of the adding process an augend input-synapse group , an addend input-synapse group , a carry input-synapse group , a first bias-synapse group a second bias-synapse group an output feedback-synapse

Label: group



# Make RNN (LSTM)
- Embedding dimension = 100
- 64 LSTM cells in one layer
- Dropout and recurrent dropout in the LSTM layer for regularization
- Fully connected layer with 64 units on top of the LSTM, with 'relu' activation
- Dropout for regularization
- Output layer produces a probability for each word in the vocabulary, with 'softmax' activation
- Adam optimizer with defaults
- Categorical cross entropy loss
- Monitor accuracy

In [8]:
from tensorflow import keras
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout, Embedding, Masking, Bidirectional
from keras.optimizers import Adam

from keras.utils import plot_model

In [9]:
model = Sequential()


# Embedding layer
model.add(Embedding(input_dim=len(word_idx) + 1,output_dim=100,weights=None,trainable=True))

# Recurrent layer
model.add(LSTM(64, return_sequences=False, dropout=0.1,recurrent_dropout=0.1))

# Fully connected layer
model.add(Dense(64, activation='relu'))

# Dropout for regularization
model.add(Dropout(0.5))

# Output layer
model.add(Dense(len(word_idx) + 1, activation='softmax'))

# Compile the model
model.compile(
    optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 100)         1619200   
_________________________________________________________________
lstm_1 (LSTM)                (None, 64)                42240     
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 16192)             1052480   
Total params: 2,718,080
Trainable params: 2,718,080
Non-trainable params: 0
_________________________________________________________________
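
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes, taking the vocabulary size of 16192 from the output shape of the last layer in the summary; the variable names are only for illustration.

```python
vocab_size = 16192   # width of the output layer reported by model.summary()
embed_dim = 100
lstm_units = 64
dense_units = 64

# Embedding: one vector of length embed_dim per vocabulary index
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Dense layers: weights plus biases
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total)
```

Most of the parameters sit in the embedding and output layers, both of which scale with the vocabulary size rather than with the size of the LSTM.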


# Load in Pre-Trained Model
Rather than waiting several hours to train the model, we can load a model that has already been trained for 150 epochs. We'll demonstrate how to continue training it for one more epoch, which shouldn't take too long, depending on your hardware.



In [12]:
from keras.models import load_model

# Load in model and demonstrate training
model = load_model('models/train-embeddings-rnn.h5')
#model.load_weights('models/train-embeddings-rnn.h5')

h = model.fit(training_dict['X_train'], training_dict['y_train'], epochs = 1, batch_size = 2048, 
          validation_data = (training_dict['X_valid'], training_dict['y_valid']), 
          verbose = 1)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Train on 222994 samples, validate on 95569 samples
Epoch 1/1


# See the model performance

In [13]:
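# evaluate() returns [loss, accuracy]. A notebook cell displays only the value of
# its last expression, so the training result is computed here but not displayed.
print('Model Performance: Log Loss and Accuracy on training data')
model.evaluate(training_dict['X_train'], training_dict['y_train'], batch_size = 2048)

print('\nModel Performance: Log Loss and Accuracy on validation data')
model.evaluate(training_dict['X_valid'], training_dict['y_valid'], batch_size = 2048)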


Model Performance: Log Loss and Accuracy on training data

Model Performance: Log Loss and Accuracy on validation data


[5.122893927639151, 0.2681204080581665]
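
To put the log loss in context, the short sketch below converts it to a perplexity and compares it with the loss of a model that guesses uniformly over the vocabulary. It assumes the list above is `[loss, accuracy]` from the last `evaluate` call (the validation data), that the loss uses the natural logarithm, and that the vocabulary size is the 16192 reported by the model summary.

```python
import math

loss, accuracy = 5.122893927639151, 0.2681204080581665  # values shown above
vocab_size = 16192                                       # from model.summary()

# Perplexity: the exponential of the cross-entropy loss
perplexity = math.exp(loss)

# Loss of a model that assigns probability 1 / vocab_size to every word
uniform_loss = math.log(vocab_size)

print(f'perplexity: {perplexity:.1f}')
print(f'uniform-guess loss: {uniform_loss:.2f}')
```

A loss well below the uniform-guess value indicates that the model has learned something about which words tend to follow a given context.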

# Generate an output

In [14]:
from IPython.display import HTML

## Here you can input your own starting sequence for the network. The network will produce num_words of text.

In [34]:
s = 'This patent provides a basis for using a LSTM neural network to '
HTML(seed_sequence(model, s, word_idx, idx_word, diversity = 0.75, num_words = 20))

# Inspect the seed_sequence

In [35]:
s = 'This patent provides a basis for using a neural network to '
diversity = 0.75
num_words = 20

start = format_sequence(s).split()
print(start)

['This', 'patent', 'provides', 'a', 'basis', 'for', 'using', 'a', 'neural', 'network', 'to']


In [36]:
import re

def remove_spaces(s):
    """Remove spaces around punctuation"""
    s = re.sub(r'\s+([.,;?])', r'\1', s)
    return s

def addContent(old_html, raw_html):
    old_html += raw_html
    return old_html

def header(text, color = 'black', gen_text = None):
    """Build a centered <h1>; gen_text, if given, is appended in red"""
    if gen_text:
        raw_html = f'<h1 style="color: {color};"><p><center>' + str(
            text) + '<span style="color: red">' + str(gen_text) + '</span></center></p></h1>'
    else:
        raw_html = f'<h1 style="color: {color};"><center>' + str(
            text) + '</center></h1>'
    return raw_html


def box(text, gen_text=None):
    """Build a bordered <div>; gen_text, if given, is appended in red"""
    if gen_text:
        raw_html = '<div style="border:1px inset black;padding:1em;font-size: 20px;"> <p>' + str(
            text) +'<span style="color: red">' + str(gen_text) + '</span></p></div>'

    else:
        raw_html = '<div style="border:1px inset black;padding:1em;font-size: 20px;">' + str(
            text) + '</div>'
    return raw_html

In [37]:
gen = []
s = start[:]

# Generate output
for _ in range(num_words):
    # Convert the words so far to an array of word indices (0 for unknown words)
    x = np.array([word_idx.get(word, 0) for word in s]).reshape((1, -1))
    #print('x is:', x.shape)
    
    # Make predictions: next-word probabilities
    preds = model.predict(x)[0].astype(float)
    #print('preds is:', preds.shape)
    
    # Diversify: rescale the log probabilities by the diversity
    preds = np.log(preds) / diversity
    exp_preds = np.exp(preds)
    # Softmax: renormalize so the probabilities sum to 1
    preds = exp_preds / np.sum(exp_preds)
    # Sample the next word index from the reweighted distribution
    next_idx = np.argmax(np.random.multinomial(1, preds, size = 1))
    s.append(idx_word[next_idx])
    gen.append(idx_word[next_idx])

# Formatting in html
start = remove_spaces(' '.join(start)) + ' '
gen = remove_spaces(' '.join(gen)) 
html = ''
html = addContent(html, header('Input Seed ', color = 'black', gen_text = 'Network Output'))
html = addContent(html, box(start, gen))

In [38]:
HTML(html)
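
The effect of `diversity` in the generation loop can be seen on a toy distribution. The sketch below isolates the reweighting step in a hypothetical helper called `reweight` (a name introduced only for this example) and applies it to three made-up probabilities rather than to real model output.

```python
import numpy as np

def reweight(preds, diversity):
    """Rescale log probabilities by diversity, then renormalise with a softmax."""
    preds = np.asarray(preds, dtype=float)
    scaled = np.log(preds) / diversity
    exp_preds = np.exp(scaled)
    return exp_preds / np.sum(exp_preds)

probs = np.array([0.5, 0.3, 0.2])   # made-up next-word probabilities

sharp = reweight(probs, 0.5)   # diversity < 1 favours the most likely word
same = reweight(probs, 1.0)    # diversity = 1 leaves the distribution unchanged
flat = reweight(probs, 2.0)    # diversity > 1 moves towards uniform

print(sharp, same, flat)

# Sample one index, as the generation loop does with diversity = 0.75
rng = np.random.default_rng(0)
next_idx = int(np.argmax(rng.multinomial(1, reweight(probs, 0.75))))
print(next_idx)
```

Lower values make the generated text more predictable and repetitive; higher values make it more varied but also more error-prone.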