**Name: Nidhi Rajkumar Saini**

In [None]:
import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.metrics import categorical_accuracy
from keras.utils import np_utils
import re, string
import os
import sys

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [None]:
!pip install PyDrive



In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

1. **Generative Models for Text**<br>
(a) *In this problem, we are trying to build a generative model that mimics the writing style of the prominent British mathematician, philosopher, prolific writer, and
political activist Bertrand Russell.*

(b) Load the following books of Bertrand Russell in text format:<br>
*i. The Problems of Philosophy <br> ii. The Analysis of Mind <br> iii. Mysticism and Logic and Other Essays <br> iv. Our Knowledge of the External World as a Field for Scientific Method in Philosophy<br>Load the following books from The Library of Congress and convert them to text files:<br>i. The History of Western Philosophy<br> ii. The Analysis of Matter<br> iii. An Inquiry into Meaning and Truth*

(c) **LSTM**<br>
*Train an LSTM to mimic Russell’s style and thoughts.*

Concatenate your text files to create a corpus of Russell’s writings.

In [None]:
books = ['1MzLAL9DCAsyE7MczLQhTo6U44hbJC182', '1Ovi24DG7yMKt8ttr5lQbu9mXzkZDPn8D', '1VpasnabN4DJQWcjoMUevaoo5qe8konfJ', '1r9VpB5Yxs0mVr5Tr4tNS2e8BE7ohXTMc', '1XV9CV93VaWGriEqQ5F0LmtB8RdFqExHk']
raw_data = ''
print("Reading Bertrand Russell's famous books!!")
for each in books:
    download = drive.CreateFile({'id': each})
    download.GetContentFile(each)
    raw_text = open(each, 'r', encoding = "utf-8", errors='ignore').read()
    print("\nFinished reading book of length", len(raw_text), "...")
    raw_data += raw_text.lower()
print("\nTotal length of combined book is", len(raw_data))

Reading Bertrand Russell's famous books!!

Finished reading book of length 747034 ...

Finished reading book of length 412289 ...

Finished reading book of length 405988 ...

Finished reading book of length 514653 ...

Finished reading book of length 766542 ...

Total length of combined book is 2846506


Remove non-ASCII characters and punctuation to clean the input data.

In [None]:
# drop non-ASCII characters
raw_data = raw_data.encode("ascii", "ignore").decode()
# strip punctuation
raw_data = raw_data.translate(str.maketrans('', '', string.punctuation))
# remove newlines (words separated only by a line break end up joined)
raw_data = re.sub(r'\n', '', raw_data)
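
As a quick check of what the cleaning steps above do, here is a minimal sketch that applies the same three operations to a short made-up string. The helper name `clean_text` and the sample sentence are assumptions for illustration and are not part of the notebook.

```python
import re
import string

def clean_text(text):
    """Lowercase, then drop non-ASCII characters, punctuation, and newlines."""
    text = text.lower().encode("ascii", "ignore").decode()
    text = text.translate(str.maketrans('', '', string.punctuation))
    return re.sub(r'\n', '', text)

# Made-up sample containing curly quotes, punctuation, and a line break
sample = "Russell\u2019s \u201clogic\u201d,\nand mind."
cleaned = clean_text(sample)
print(cleaned)
```

Because newlines are deleted rather than replaced by a space, words separated only by a line break are merged (`logic` and `and` above). Whether that matters for the trained model is not something this sketch can show.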

Use a character-level representation for this model. Each character will be encoded into an integer using its ASCII code.

In [None]:
# create mapping of unique chars to integer using its ASCII code
chars = sorted(list(set(raw_data)))
char_to_int = dict((c, ord(c)) for c in chars)

In [None]:
char_to_int

{' ': 32,
 '0': 48,
 '1': 49,
 '2': 50,
 '3': 51,
 '4': 52,
 '5': 53,
 '6': 54,
 '7': 55,
 '8': 56,
 '9': 57,
 'a': 97,
 'b': 98,
 'c': 99,
 'd': 100,
 'e': 101,
 'f': 102,
 'g': 103,
 'h': 104,
 'i': 105,
 'j': 106,
 'k': 107,
 'l': 108,
 'm': 109,
 'n': 110,
 'o': 111,
 'p': 112,
 'q': 113,
 'r': 114,
 's': 115,
 't': 116,
 'u': 117,
 'v': 118,
 'w': 119,
 'x': 120,
 'y': 121,
 'z': 122}
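
With the ASCII mapping above, the labels range up to 122, and `to_categorical` sizes the one-hot vector as the largest label plus one, so the output layer gets 123 columns although only 37 characters occur. A common alternative, shown here only as a sketch on a made-up string, is to map each character to a dense index. The names `ascii_map` and `dense_map` are illustrative.

```python
# Made-up sample text; the notebook uses the full cleaned corpus instead
chars = sorted(set("the problems of philosophy 1912"))

ascii_map = {c: ord(c) for c in chars}                 # mapping used in this notebook
dense_map = {c: i for i, c in enumerate(chars)}        # alternative: 0 .. len(chars) - 1

# Width of the one-hot output vector under each mapping
one_hot_width_ascii = max(ascii_map.values()) + 1
one_hot_width_dense = len(dense_map)

print("ASCII mapping width:", one_hot_width_ascii)
print("Dense mapping width:", one_hot_width_dense)
```

The assignment asks for ASCII codes, so the notebook keeps that mapping. The dense variant is only meant to show where the extra, never-used output columns come from.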

In [None]:
n_chars = len(raw_data)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

Total Characters:  2682224
Total Vocab:  37


Choose a window size, e.g., W = 100. Inputs to the network will be the first
W − 1 = 99 characters of each sequence, and the output of the network will be the Wth character of the sequence. Basically, we are training the network to predict each character using the 99 characters that precede it. Slide the window in strides of S = 1 on the text.


In [None]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 99
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_data[i:i + seq_length]
    seq_out = raw_data[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  2682125
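
To make the windowing concrete, the following sketch runs the same loop on a seven-character string with a window of W = 5, so each input has W - 1 = 4 characters. The function name `make_windows` is an assumption for illustration.

```python
def make_windows(text, window=5, stride=1):
    """Return (input, target) pairs: window - 1 input chars, then the next char."""
    seq_length = window - 1
    pairs = []
    for i in range(0, len(text) - seq_length, stride):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs

pairs = make_windows("russell", window=5)
for seq_in, seq_out in pairs:
    print(repr(seq_in), "->", repr(seq_out))
```

The number of pairs is `len(text) - seq_length`, which matches the count above: 2682224 - 99 = 2682125.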


In [None]:
# for validating model later
first_data_pt = dataX[0]
first_res = dataY[0]

Rescale the integers to the range [0,1], because LSTM uses a sigmoid activation function. LSTM will receive the rescaled integers as its input.<br>
Note that the output has to be encoded using a one-hot encoding scheme with
N = 256 (or less) elements. This means that the network reads integers, but
outputs a vector of N = 256 (or less) elements.

In [None]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
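# normalize; note that the ASCII codes here range from 32 to 122, so dividing
# by n_vocab (37) gives values of about 0.86 to 3.3 rather than [0, 1]
# (the generation cells reuse this scaling, so the two must stay consistent)
X = X / float(n_vocab)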
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
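
The sketch below illustrates the two encodings on three ASCII codes: dividing by the vocabulary size (as in the cell above) versus dividing by 255, plus a hand-written one-hot encoder. The helper `one_hot` is a NumPy stand-in written for this illustration, not the Keras function, and the divisor 255 is one possible choice rather than what the notebook uses.

```python
import numpy as np

codes = np.array([32, 97, 122])        # ASCII codes for ' ', 'a', 'z'

scaled_by_vocab = codes / 37.0         # notebook's choice: n_vocab = 37
scaled_by_255 = codes / 255.0          # assumed alternative that stays within [0, 1]

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Like to_categorical without num_classes, size the vector as max label + 1
y_demo = one_hot(codes, num_classes=codes.max() + 1)

print("divided by 37: ", scaled_by_vocab)
print("divided by 255:", scaled_by_255)
print("one-hot shape: ", y_demo.shape)
```

Changing the divisor would require retraining, since the saved weights were fitted to inputs scaled by `n_vocab`.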

Use a single hidden layer for the LSTM with N = 256 (or less) memory units.<br>
Use a Softmax output layer to yield a probability prediction for each of the
characters between 0 and 1. This is actually a character classification problem
with N classes. Choose log loss (cross entropy) as the objective function for
the network (research what it means).

In [None]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
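
For a one-hot target, categorical cross entropy reduces to the negative log of the probability the model assigns to the correct character. The following is a minimal worked example in plain Python; the function name and the three-class probabilities are made up for illustration.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross entropy for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]                      # the correct class is index 1
confident = [0.05, 0.90, 0.05]          # model puts 90% on the correct class
uniform = [1 / 3, 1 / 3, 1 / 3]         # model has no preference

loss_confident = categorical_cross_entropy(y_true, confident)
loss_uniform = categorical_cross_entropy(y_true, uniform)

print("confident prediction:", round(loss_confident, 4))
print("uniform prediction:  ", round(loss_uniform, 4))
```

By the same formula, a model that guessed uniformly over the 37 characters that occur in the corpus would score ln(37), about 3.61, which gives a rough baseline for reading the loss values in the training log.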

We do not use a test dataset. We use the whole training dataset to
learn the probability of each character in a sequence. We are not seeking
a very accurate model; instead, we are interested in a generalization of the
dataset that can mimic the gist of the text.

Choose a reasonable number of epochs for training, considering your computational power.<br>Use model checkpointing to save the network weights each time
an improvement in loss is observed at the end of an epoch. Find the best set
of weights in terms of loss.

In [None]:
batch_size = 128  # mini-batch size
num_epochs = 50   # maximum number of epochs
# the file name records the training loss, while the callbacks monitor val_loss
file_path = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
# early stopping to avoid overfitting
callbacks = [EarlyStopping(patience=4, monitor='val_loss'),
             ModelCheckpoint(file_path, monitor='val_loss', verbose=1, save_best_only=True, mode='min')]
# fit the model, holding out the last 10% of the samples for validation
history = model.fit(X, y,
                    batch_size=batch_size,
                    shuffle=True,
                    epochs=num_epochs,
                    callbacks=callbacks,
                    validation_split=0.1)

Epoch 1/50

Epoch 00001: val_loss improved from inf to 2.38181, saving model to weights-improvement-01-2.5723-bigger.hdf5
Epoch 2/50

Epoch 00002: val_loss improved from 2.38181 to 2.14788, saving model to weights-improvement-02-2.3437-bigger.hdf5
Epoch 3/50

Epoch 00003: val_loss improved from 2.14788 to 2.00715, saving model to weights-improvement-03-2.1833-bigger.hdf5
Epoch 4/50

Epoch 00004: val_loss improved from 2.00715 to 1.93010, saving model to weights-improvement-04-2.0880-bigger.hdf5
Epoch 5/50

Epoch 00005: val_loss improved from 1.93010 to 1.88618, saving model to weights-improvement-05-2.0270-bigger.hdf5
Epoch 6/50

Epoch 00006: val_loss improved from 1.88618 to 1.83885, saving model to weights-improvement-06-1.9826-bigger.hdf5
Epoch 7/50

Epoch 00007: val_loss improved from 1.83885 to 1.81572, saving model to weights-improvement-07-1.9480-bigger.hdf5
Epoch 8/50

Epoch 00008: val_loss improved from 1.81572 to 1.78174, saving model to weights-improvement-08-1.9195-bigger.h
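
Since every checkpoint name embeds the epoch and the training loss, the lowest-loss file can be picked by parsing the names. The sketch below assumes the naming pattern from the cell above; the helper `best_checkpoint` is hypothetical, and the example list is copied from the first three log lines. Note that the number in the name is the training loss, not the monitored `val_loss`.

```python
import re

# Matches names such as weights-improvement-03-2.1833-bigger.hdf5
CHECKPOINT_RE = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)-bigger\.hdf5$")

def best_checkpoint(filenames):
    """Return the file name with the lowest embedded loss, or None if none match."""
    scored = []
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match:
            scored.append((float(match.group(2)), int(match.group(1)), name))
    if not scored:
        return None
    return min(scored)[2]

names = [
    "weights-improvement-01-2.5723-bigger.hdf5",
    "weights-improvement-02-2.3437-bigger.hdf5",
    "weights-improvement-03-2.1833-bigger.hdf5",
]
print(best_checkpoint(names))
```

In practice the list could come from `os.listdir('.')` in the working directory where the checkpoints were written.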

In [None]:
# load the network weights
filename = "weights-improvement-30-1.7166-bigger.hdf5"
model.load_weights(filename) 
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [None]:
int_to_char = dict((ord(c), c) for c in chars)

In [None]:
# testing model on first train data pt
pattern = list(first_data_pt)  # copy so the stored training sample is not mutated
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)  # greedy choice: most probable character
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the oldest one
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
" introduction the present work is intended as an investigation of certain problems concerning empiri "
cal propositions and the semse if the semse if the semte is the semte if the semte is the semte if the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte if the semte if the semte is the semte if the semte is the semte if teet aelieve the semte i

Use the network with the best weights to generate 1000 characters, using the
following text as initialization of the network:<br>
There are those who take mental phenomena naively, just as they
would physical phenomena. This school of psychologists tends not to
emphasize the object.

In [None]:
pattern = 'There are those who take mental phenomena naively, just as they would physical phenomena. This school of psychologists tends not to emphasize the object.'
# clean the seed the same way as the corpus and keep its last seq_length characters
pattern = pattern.translate(str.maketrans('', '', string.punctuation))
pattern = [char_to_int[c] for c in pattern[-seq_length:].lower()]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)  # greedy choice: most probable character
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the oldest one
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
" ust as they would physical phenomena this school of psychologists tends not to emphasize the object "
 of the perception of the semse if the semse if the semte if the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence is the semtence 
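
Both generated texts fall into a repeating loop. A likely contributor is that `argmax` always picks the single most probable character, so once the window repeats, the output repeats too. One common alternative, which is not what this notebook does, is to sample from the predicted distribution with a temperature. The sketch below uses a made-up three-class distribution; `sample_with_temperature` and its parameters are illustrative names.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw a class index from probs after rescaling by the given temperature."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # Work in log space; a lower temperature sharpens the distribution
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()              # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

probs = [0.6, 0.3, 0.1]                 # made-up softmax output
rng = np.random.default_rng(0)
draws = [sample_with_temperature(probs, 1.0, rng) for _ in range(1000)]
print("share of class 0 at temperature 1.0:", draws.count(0) / len(draws))
```

In the generation loop this would replace `numpy.argmax(prediction)` with something like `sample_with_temperature(prediction[0], temperature)`. Because the output layer here has columns for ASCII codes that never occur in the corpus, a sampled index might not be a key of `int_to_char`, so those columns would need to be masked out first.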

**References**<br>
https://wiki.pathmind.com/lstm<br>
https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/<br>
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/<br>
https://towardsdatascience.com/long-short-term-memory-lstm-in-keras-2b5749e953ac