## Char Prediction using LSTM

1. Download the text of Alice in Wonderland or Dracula in plain-text format from https://www.gutenberg.org/browse/scores/top
2. Create a char_to_int map that maps each character used in the novel to an integer, for example {'a': 3}.
3. Read the data from the text file and do the following:
    3.1 Create a sliding window that takes the first 100 characters as the input sequence and the 101st character as the output. The window slides forward one character at a time.
    For example: 
        "Avul Pakir Jainulabdeen Abdul Kalam better known as A.P.J. Abdul Kalam"
        The first window runs from "A" to the 100th character, and the 101st character is its output.
        The second window runs from "v" to the 101st character, and the 102nd character is its output.
    Convert the input and output sequences to their integer representation using the char_to_int map.
    This gives two arrays, seqIn and seqOut, whose elements hold the integer representation of 100 characters and of 1 character respectively.
    seqIn = [[10........15], [5.....25]...] seqOut = [5, 2, 5]
4. Reshape seqIn to (NumberOfSamples, 100, 1), which gives [[[10]........[15]], [[5]..... [25]]...]
5. One-hot encode seqOut using np_utils.to_categorical.

6. Now create a simple model with LSTM followed by a Dense layer.

7. Then, given a seed sentence, predict the next character using the trained model.
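
Steps 2 to 4 can be sketched in a few lines of plain Python and NumPy. This is a minimal sketch, not the notebook's own code: the helper names `build_char_to_int` and `make_windows` are hypothetical, and a window of 10 is used only because the sample sentence is shorter than 100 characters.

```python
import numpy as np

def build_char_to_int(text):
    # Sort the distinct characters so the mapping is reproducible across runs.
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}

def make_windows(text, char_to_int, window=100):
    # Slide one character at a time: `window` characters in, the next one out.
    encoded = [char_to_int[ch] for ch in text]
    n_samples = len(encoded) - window
    seq_in = [encoded[i:i + window] for i in range(n_samples)]
    seq_out = [encoded[i + window] for i in range(n_samples)]
    return seq_in, seq_out

sample = "Avul Pakir Jainulabdeen Abdul Kalam better known as A.P.J. Abdul Kalam"
char_to_int = build_char_to_int(sample)
seq_in, seq_out = make_windows(sample, char_to_int, window=10)

# Step 4: reshape to (number_of_samples, window, 1) for the LSTM.
x = np.array(seq_in).reshape(len(seq_in), 10, 1)
print(x.shape, len(seq_out))
```

Each window overlaps the previous one in all but one position, which is why the number of samples is close to the number of characters in the text.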


In [41]:
import numpy as np
# train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
from keras.utils import np_utils

In [45]:
text = """ CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, and what is the use of a book, thought Alice without pictures or
conversations?

So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
CHAPTER II. The Pool of Tears

Curiouser and curiouser! cried Alice (she was so much surprised, that
for the moment she quite forgot how to speak good English); now I am
opening out like the largest telescope that ever was! Good-bye, feet!
(for when she looked down at her feet, they seemed to be almost out of
sight, they were getting so far off). Oh, my poor little feet, I wonder
who will put on your shoes and stockings for you now, dears? I am sure
I shall not be able! I shall be a great deal too far off to trouble
myself about you: you must manage the best way you can;--but I must be
kind to them, thought Alice, or perhaps they will not walk the way I want
to go! Let me see: I will give them a new pair of boots every Christmas.

And she went on planning to herself how she would manage it. They must
go by the carrier, she thought; and how funny it will seem, sending
presents to ones own feet! And how odd the directions will look!

     
*    *    *    *    *    *    * """

In [46]:
def ascii(text):
    # Encode each character as its Unicode code point.
    # Note: this name shadows Python's built-in ascii().
    return [ord(ch) for ch in text]

In [47]:
text_list = ascii(text)
max(text_list)

121
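
The `ord`-based encoding above stands in for the char_to_int map from step 2. Because the largest code point is 121, the label space has to cover every value up to that maximum, although fewer distinct characters actually occur. A minimal sketch of a compact re-indexing with `np.unique` follows; the short sample string and the names `vocab` and `ids` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

sample = "Alice was beginning to get very tired"
codes = np.array([ord(ch) for ch in sample])

# `vocab` holds the distinct code points in sorted order;
# `ids` gives, for every character, its index into `vocab`.
vocab, ids = np.unique(codes, return_inverse=True)
print(len(vocab), codes.max())

# The mapping is reversible: indexing `vocab` with `ids` restores the codes.
decoded = "".join(chr(c) for c in vocab[ids])
```

With this encoding the number of output classes equals `len(vocab)` rather than a fixed upper bound on the code points.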

In [48]:
def sliding_window(text, length):
    # Each input is `length` consecutive items; the target is the item that follows.
    sqin = []
    sqout = []
    for i in range(len(text) - length):
        sqin.append(text[i:i + length])
        sqout.append(text[i + length])
    return sqin, sqout

In [49]:
sqin,sqout = sliding_window(text_list,100)
sqout

[116, 104, 101, 10, 98, 97, 110, 107, 44, 32, 97, 110, 100, 32, 111, 102,
 ...]

In [50]:
sqin_ar = np.array(sqin)
sqin_ar = sqin_ar.reshape(len(sqin),100,1)
sqout_ar = np.array(sqout)


In [51]:
sqout_ar

array([116, 104, 101, ...,  32,  42,  32])

In [52]:
x_train,x_test,y_train, y_test = train_test_split(sqin_ar, sqout_ar, test_size=0.3, random_state=42)
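
Consecutive windows share 99 of their 100 characters, so a shuffled split places near-duplicates of the training samples in the test set. One possible alternative, shown here as a sketch with a hypothetical helper `chronological_split` and toy arrays, is to cut the samples in order so that the test windows come from the end of the text.

```python
import numpy as np

def chronological_split(x, y, test_size=0.3):
    # Keep the first (1 - test_size) fraction for training and the rest for testing.
    cut = int(len(x) * (1 - test_size))
    return x[:cut], x[cut:], y[:cut], y[cut:]

# Toy data standing in for sqin_ar and sqout_ar.
x = np.arange(20).reshape(10, 2, 1)
y = np.arange(10)
x_tr, x_te, y_tr, y_te = chronological_split(x, y)
print(len(x_tr), len(x_te))
```

Windows that straddle the cut still overlap slightly; dropping the first `length` test samples would remove that overlap as well.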

In [53]:
xtrain = x_train
xtest = x_test

In [54]:
from keras.utils import np_utils
ytrain = np_utils.to_categorical(y_train, num_classes=255)
ytest = np_utils.to_categorical(y_test, num_classes=255)

In [55]:
ytrain.shape

(1055, 255)
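
To make the shape above concrete, the following sketch reproduces the idea behind one-hot encoding with plain NumPy. The helper `one_hot` is hypothetical and is meant only to illustrate what `to_categorical` produces for integer labels, not to replace it.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

y = one_hot([116, 104, 101], num_classes=255)
print(y.shape)  # (3, 255)
```

Each row contains a single 1 at the index of its label, so `np.argmax` along the last axis recovers the original integers.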

In [56]:
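model = Sequential()
# LSTM over sequences of shape (100 timesteps, 1 feature per timestep)
model.add(LSTM(64, activation='relu', input_shape=(xtrain.shape[1], xtrain.shape[2])))
model.add(Dense(500, activation='relu'))
model.add(Dense(500, activation='relu'))
# Softmax over the 255 one-hot classes
model.add(Dense(ytrain.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
# batch_size=1 updates the weights after every sample; 20% of the training data is held out for validation
model.fit(xtrain, ytrain, epochs=5, batch_size=1, verbose=2, validation_split=0.2)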

Train on 844 samples, validate on 211 samples
Epoch 1/5
54s - loss: 13.2248 - acc: 0.1754 - val_loss: 13.2917 - val_acc: 0.1754
Epoch 2/5
55s - loss: 13.0243 - acc: 0.1919 - val_loss: 13.2917 - val_acc: 0.1754
Epoch 3/5
52s - loss: 13.0243 - acc: 0.1919 - val_loss: 13.2917 - val_acc: 0.1754
Epoch 4/5
56s - loss: 13.0243 - acc: 0.1919 - val_loss: 13.2917 - val_acc: 0.1754
Epoch 5/5
52s - loss: 13.0243 - acc: 0.1919 - val_loss: 13.2917 - val_acc: 0.1754


<keras.callbacks.History at 0x7f9128970f60>
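
In the log above, the training and validation loss stop changing after the first epoch, so the network is not learning from this setup. One preprocessing step that the cells above do not apply is scaling the integer inputs into the [0, 1] range; raw values as large as 121 fed into an LSTM with a ReLU activation can make training unstable. The sketch below shows only the scaling step, with the hypothetical helper `scale_inputs` and an assumed divisor `max_code`. It has not been verified here that this change alone fixes the flat loss.

```python
import numpy as np

def scale_inputs(x, max_code=255.0):
    # Map integer character codes into [0, 1] as float32.
    return np.asarray(x, dtype="float32") / max_code

# Toy batch standing in for xtrain: 1 sample, 3 timesteps, 1 feature.
x = np.array([[[83], [111], [32]]])
scaled = scale_inputs(x)
print(scaled.shape, scaled.min(), scaled.max())
```

The same scaling would have to be applied to any seed text passed to `model.predict`, since the model only sees inputs in the range it was trained on.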

In [74]:
predict = """So she was considering in her own mind (as well as she could, for the
hot day made her feel very sleepy and stupid), whether the pleasure
of making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
CHAPTER II. The Pool of Tears

Curiouser and curiouser! cried Alice (she was so much surprised, that
for the moment she quite forgot how to speak good English); now I am
opening out like the largest telescope that ever was! Good-bye, feet!
(for when she looked down at her feet, """

In [75]:
pred = ascii(predict)
pred

[83, 111, 32, 115, 104, 101, 32, 119, 97, 115, 32, 99, 111, 110, 115, 105,
 ...]

In [76]:
pred_slide,pred2 = sliding_window(pred,100)
len(pred_slide)

469

In [77]:
pred_a = np.array(pred_slide)
pred_ar = pred_a.reshape(len(pred_slide),100,1)
pred_ar

array([[[ 83],
        [111],
        [ 32],
        ..., 
        [115],
        [108],
        [101]],

       [[111],
        [ 32],
        [115],
        ..., 
        [108],
        [101],
        [101]],

       [[ 32],
        [115],
        [104],
        ..., 
        [101],
        [101],
        [112]],

       ..., 
       [[103],
        [ 32],
        [111],
        ..., 
        [102],
        [101],
        [101]],

       [[ 32],
        [111],
        [117],
        ..., 
        [101],
        [101],
        [116]],

       [[111],
        [117],
        [116],
        ..., 
        [101],
        [116],
        [ 44]]])

In [78]:
result = model.predict(pred_ar, batch_size=1, verbose=2)
result.shape

(469, 255)

In [79]:
for probs in result:
    # Pick the most probable class and convert its code point back to a character
    print(chr(np.argmax(probs)))
    print("---")

 
---
 
---
 
---
...
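
The cells above predict one character for each window of existing text. To generate new text from a seed, as step 7 describes, each prediction can be appended to the sequence and fed back in. The following is a minimal sketch under stated assumptions: `generate` and `dummy_predict` are hypothetical names, and `dummy_predict` is a stand-in that always favours the space character so that the example runs without a trained model.

```python
import numpy as np

def generate(predict_fn, seed_codes, n_chars, window=100):
    # predict_fn is assumed to take an array of shape (1, window, 1) and
    # return class probabilities of shape (1, num_classes).
    # The seed must contain at least `window` codes.
    codes = list(seed_codes)
    for _ in range(n_chars):
        x = np.array(codes[-window:]).reshape(1, window, 1)
        probs = predict_fn(x)
        codes.append(int(np.argmax(probs[0])))
    # Return only the newly generated characters.
    return "".join(chr(c) for c in codes[len(seed_codes):])

def dummy_predict(x):
    # Stand-in for a trained model: always predicts code 32 (space).
    probs = np.zeros((1, 255))
    probs[0, 32] = 1.0
    return probs

seed = [ord(ch) for ch in "a" * 100]
generated = generate(dummy_predict, seed, n_chars=5)
print(repr(generated))
```

With the model trained above, `model.predict` could be passed as `predict_fn` and the first 100 entries of `pred` as the seed.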