
# LSTM Network 

Here I try to generate text with a small LSTM network. I download the plain-text edition of the novel from Project Gutenberg and store it locally as a text file.


In [1]:

import numpy
import sys
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM, GRU
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

In [2]:
path_to_file = tf.keras.utils.get_file('Alice in the Wonder Land.txt', 'https://www.gutenberg.org/files/11/11-0.txt')

Downloading data from https://www.gutenberg.org/files/11/11-0.txt


### Here I summarize the text. It contains many characters that would have to be removed to obtain a cleaner vocabulary.

In [3]:
# Read, then decode for py2 compat.
raw_text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(raw_text)} characters')

Length of text: 167808 characters


Now I print the first 10,000 characters of `raw_text`.

In [6]:
print(raw_text[:10000])

﻿The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this eBook.

Title: Alice’s Adventures in Wonderland

Author: Lewis Carroll

Release Date: January, 1991 [eBook #11]
[Most recently updated: October 12, 2020]

Language: English

Character set encoding: UTF-8

Produced by: Arthur DiBianca and David Widger

*** START OF THE PROJECT GUTENBERG EBOOK ALICE’S ADVENTURES IN WONDERLAND ***

[Illustration]




Alice’s Adventures in Wonderland

by Lewis Carroll

THE MILLENNIUM FULCRUM EDITION 3.0

Conten

The LSTM model cannot work on raw characters directly, so the characters have to be converted to integers before training.

Every distinct character in the text, including punctuation and other symbols that the model cannot read as-is, is mapped to an integer. The resulting integer sequences are the patterns used to train the model on the novel.

In [7]:
raw_text = raw_text.lower()

In [None]:
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))
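
To make the mapping concrete, here is a minimal sketch on a toy string. The string `toy_text` and the `toy_`-prefixed names are made up for illustration and are not part of the notebook's data.

```python
# Toy illustration of the character <-> integer mapping built above.
# `toy_text` is a made-up string, not taken from the novel.
toy_text = "alice was"

toy_chars = sorted(set(toy_text))                       # unique characters, sorted
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

encoded = [toy_char_to_int[c] for c in toy_text]        # text -> integers
decoded = "".join(toy_int_to_char[i] for i in encoded)  # integers -> text

print(toy_chars)   # [' ', 'a', 'c', 'e', 'i', 'l', 's', 'w']
print(encoded)     # [1, 5, 4, 2, 3, 0, 7, 1, 6]
print(decoded)     # alice was
```

The two dictionaries are inverses of each other, so encoding and then decoding returns the original string.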

In [8]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  167808
Total Vocab:  65


In [9]:
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
	seq_in = raw_text[i:i + seq_length]
	seq_out = raw_text[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  167708
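
The loop above slides a window of `seq_length` characters over the text one step at a time, and the character right after each window becomes the target. A minimal sketch of the same idea on a toy string follows; `toy_text` and `toy_seq_length` are made-up values, much smaller than the ones in the notebook.

```python
# Toy illustration of the sliding-window split.
toy_text = "hello world"
toy_seq_length = 4

windows = []
for i in range(len(toy_text) - toy_seq_length):
    seq_in = toy_text[i:i + toy_seq_length]   # input: 4 consecutive characters
    seq_out = toy_text[i + toy_seq_length]    # target: the character that follows
    windows.append((seq_in, seq_out))

for seq_in, seq_out in windows[:3]:
    print(repr(seq_in), "->", repr(seq_out))
print("patterns:", len(windows))              # patterns: 7

# The same arithmetic explains the count printed by the notebook.
print(167808 - 100)                           # 167708
```

The number of patterns is always the text length minus the window length, which is why 167,808 characters with `seq_length = 100` give 167,708 patterns.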


In [None]:
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))

In [None]:
X = X / float(n_vocab)

In [10]:
y = np_utils.to_categorical(dataY)
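
The three preparation steps above (reshape, rescale, one-hot encode) can be checked on toy data. This is only a sketch: the `toy_` values are made up, and `numpy.eye` stands in for `np_utils.to_categorical` so that the example needs nothing but NumPy.

```python
import numpy

# Made-up toy values: 3 patterns of length 4 over a 5-character vocabulary.
toy_dataX = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_dataY = [4, 0, 1]
toy_n_vocab = 5

# Reshape to [samples, time steps, features], the layout the LSTM layer is given.
toy_X = numpy.reshape(toy_dataX, (len(toy_dataX), 4, 1))
# Scale the integer codes into the range [0, 1).
toy_X = toy_X / float(toy_n_vocab)
# One-hot encode the targets: row i of the identity matrix is the vector for class i.
toy_y = numpy.eye(toy_n_vocab)[toy_dataY]

print(toy_X.shape)   # (3, 4, 1)
print(toy_y.shape)   # (3, 5)
print(toy_y[0])      # [0. 0. 0. 0. 1.]
```

One difference worth keeping in mind: `to_categorical` infers the number of classes from the largest label it sees unless `num_classes` is passed, whereas the sketch fixes the width to the vocabulary size.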

### Defining LSTM Model

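Below I define the LSTM model that learns the character patterns in the text of the novel: a single LSTM layer with 256 units, a dropout layer with rate 0.2, and a softmax output layer that gives a probability for each possible next character.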

In [11]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
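
To get a feel for the size of this model, its trainable parameters can be counted by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and a bias) and assumes the output layer has 65 units, one per vocabulary entry. The real `y.shape[1]` could be smaller, because `to_categorical` infers the class count from the largest label present in `dataY`.

```python
# Hand count of trainable parameters under the assumptions stated above.
units = 256        # LSTM units
input_dim = 1      # one feature per time step (the scaled character code)
n_outputs = 65     # assumed equal to the vocabulary size printed earlier

# Each of the 4 gates has input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (units * (units + input_dim) + units)
# Dense layer: one weight per (LSTM unit, output) pair plus one bias per output.
dense_params = units * n_outputs + n_outputs
# Dropout has no trainable parameters.
total_params = lstm_params + dense_params

print(lstm_params)    # 264192
print(dense_params)   # 16705
print(total_params)   # 280897
```

Calling `model.summary()` in the notebook would show the actual counts for comparison.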

In [12]:

filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
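
The `filepath` template contains format placeholders that are filled in with the epoch number and the monitored loss each time a checkpoint is written. A small sketch of how such a template expands, using plain `str.format` and illustrative values:

```python
# Same template string as the `filepath` passed to ModelCheckpoint above.
template = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# Illustrative values for the epoch number and the loss.
name = template.format(epoch=1, loss=2.63546)
print(name)  # weights-improvement-01-2.6355.hdf5
```

`{epoch:02d}` pads the epoch number to two digits and `{loss:.4f}` rounds the loss to four decimals. Because `save_best_only=True` and `monitor='loss'`, a new file is only written when the training loss improves.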

## Fit the model

I train the model for 10 epochs on the first 100,000 patterns and check the output of each epoch, which shows the training loss reached so far. No accuracy is reported because no metric was passed to `compile`.

In [17]:
model.fit(X[:100000], y[:100000], epochs=10, batch_size=128, callbacks=callbacks_list)

Epoch 1/10

Epoch 00001: loss improved from 2.68790 to 2.63546, saving model to weights-improvement-01-2.6355.hdf5
Epoch 2/10

Epoch 00002: loss improved from 2.63546 to 2.59643, saving model to weights-improvement-02-2.5964.hdf5
Epoch 3/10

Epoch 00003: loss improved from 2.59643 to 2.56357, saving model to weights-improvement-03-2.5636.hdf5
Epoch 4/10

Epoch 00004: loss improved from 2.56357 to 2.53284, saving model to weights-improvement-04-2.5328.hdf5
Epoch 5/10

Epoch 00005: loss improved from 2.53284 to 2.50325, saving model to weights-improvement-05-2.5033.hdf5
Epoch 6/10

Epoch 00006: loss improved from 2.50325 to 2.46999, saving model to weights-improvement-06-2.4700.hdf5
Epoch 7/10

Epoch 00007: loss improved from 2.46999 to 2.44134, saving model to weights-improvement-07-2.4413.hdf5
Epoch 8/10

Epoch 00008: loss improved from 2.44134 to 2.41318, saving model to weights-improvement-08-2.4132.hdf5
Epoch 9/10

Epoch 00009: loss improved from 2.41318 to 2.38442, saving model to 

<keras.callbacks.History at 0x7f54b6d1cb90>
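
To put the logged loss values in perspective, they can be compared with the loss of a model that guesses uniformly among all characters. The sketch below assumes the loss is the average categorical cross-entropy in natural-log units and uses numbers copied from the outputs above. The variable names are introduced here for illustration.

```python
import math

# Values copied from the notebook output above.
vocab_size = 65               # "Total Vocab"
last_logged_loss = 2.38442    # loss reported for epoch 9

# Loss of a model that assigns equal probability to every character.
uniform_loss = math.log(vocab_size)
# Perplexity: the effective number of equally likely choices per character.
perplexity = math.exp(last_logged_loss)

print(f"uniform baseline loss: {uniform_loss:.3f}")
print(f"perplexity at loss {last_logged_loss}: {perplexity:.1f}")
```

A loss well below the uniform baseline means the model has learned something about character frequencies, but a perplexity this far above 1 is still consistent with text that is not yet readable.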

In [18]:
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
# copy the pattern so that generation does not modify the training data in dataX
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

Seed:
" dest tea-party i ever was at in
all my life!”

just as she said this, she noticed that one of the "


In [19]:
# generate characters
for i in range(1000):
	# shape the current window as [samples, time steps, features] and scale it like the training data
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(n_vocab)
	prediction = model.predict(x, verbose=0)
	# greedy decoding: always take the most probable next character
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	sys.stdout.write(result)
	# slide the window: drop the oldest character and append the predicted one
	pattern = pattern[1:] + [int(index)]
print("\nDone.")


and the was io toe was soe tas io the woel  the was soen the was oo the
toee  and the wose to tee toee  the was soen the was ani the was oo the
toeee to tee toe toee  and the was io toe toee  and the was ani the was
toe toe toe to tee was an the cad  no toe tas io the woel  and the woeee
noe the was io toe toee  and the was io toe toee  and the was ani the
aad to tee was ani the was io toe toee  and the was io toe toee  and the
aot she was io toe toe toe to tee woel  the was ani the was oo the 
aaree an the woel  the wose to tee toee  the was soen the was oo the
toeee to tee toee  the was soen the wosed toe tas io the woee  the wos

toe toe toe to toe toe toe to tee was an the cad                                                                                                                                                                                                                      
Done.
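
The generated text falls into repeating loops. One likely contributor is that the loop always picks the single most probable character (`numpy.argmax`), so the same window always yields the same continuation. A common alternative, which is not what this notebook does, is to sample the next character from the predicted distribution, optionally sharpened or flattened by a temperature. A minimal sketch, where `sample_with_temperature` and `toy_prediction` are hypothetical names introduced here:

```python
import numpy

def sample_with_temperature(probabilities, temperature=1.0, rng=None):
    """Draw a class index from `probabilities` after temperature rescaling."""
    if rng is None:
        rng = numpy.random.default_rng()
    probs = numpy.asarray(probabilities, dtype="float64")
    # Rescale in log space; the small epsilon avoids log(0).
    logits = numpy.log(probs + 1e-12) / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    scaled = numpy.exp(logits)
    scaled /= scaled.sum()          # renormalize to a probability distribution
    return int(rng.choice(len(scaled), p=scaled))

# Made-up distribution over three characters, standing in for a model prediction.
toy_prediction = [0.70, 0.20, 0.10]
rng = numpy.random.default_rng(0)

draws = [sample_with_temperature(toy_prediction, 1.0, rng) for _ in range(1000)]
counts = numpy.bincount(draws, minlength=3)
print(counts / 1000.0)              # roughly follows toy_prediction

# A very low temperature behaves almost like argmax.
cold = [sample_with_temperature(toy_prediction, 0.01, rng) for _ in range(50)]
print(set(cold))
```

In the generation loop, this would replace the `numpy.argmax(prediction)` call with something like `sample_with_temperature(prediction[0], temperature)`. Whether it improves the output here is untested. With a loss this high, the sampled text may simply become noisier rather than more readable.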
