This is a simple example of using an LSTM for character-level text generation.

Load TensorFlow, Keras and Python libraries.

In [0]:
import numpy as np

In [0]:
import tensorflow as tf

In [0]:
from tensorflow import keras

In [0]:
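# The standalone keras.utils.np_utils module is missing from newer Keras releases,
# so alias the utils module bundled with TensorFlow instead.
np_utils = keras.utils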

In [0]:
import string

Read Alice in Wonderland from a text file.

In [0]:
with open('/content/sample_data/Alice.txt', 'r', encoding='utf-8') as f:
    text = f.read()

Convert to lower case.

In [0]:
text=text.lower()

Remove punctuation.

In [0]:
trans=str.maketrans('','',string.punctuation)

In [0]:
text=text.translate(trans)
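
As a quick check of the two cleanup steps above, the same calls can be tried on a short string. The sample sentence here is made up for illustration and is not read from the file.

```python
import string

# Hypothetical sample sentence, not taken from Alice.txt
sample = "Oh dear! Oh dear! I shall be late!"

# Same steps as above: lowercase, then delete every ASCII punctuation character
trans = str.maketrans('', '', string.punctuation)
cleaned = sample.lower().translate(trans)

print(cleaned)
```

Note that `string.punctuation` covers only ASCII punctuation, so typographic quotes or dashes, if the file contains any, would stay in the text and end up in the vocabulary.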

Prepare sorted list of unique characters.

In [0]:
char=set(text)

In [0]:
char=sorted(list(char))

In [0]:
print(char)

Create character to integer dictionary.

In [0]:
x1=enumerate(char)

In [0]:
char_int = {c: i for i, c in x1}

In [0]:
print(char_int)

Create integer to character dictionary.

In [0]:
x1=enumerate(char)

In [0]:
int_char = dict(x1)

In [0]:
print(int_char)
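
The two dictionaries are inverses of each other, which a small round trip makes visible. This is a minimal sketch on a toy string; `toy_text`, `char_to_int` and `int_to_char` are illustrative names, not variables from the notebook.

```python
# Toy text standing in for the book (an assumption for illustration)
toy_text = "hello world"

# Sorted unique characters, then the two lookup tables
chars = sorted(set(toy_text))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for i, c in enumerate(chars)}

# Encode to integers and decode back to text
encoded = [char_to_int[c] for c in toy_text]
decoded = ''.join(int_to_char[i] for i in encoded)

print(chars)
print(encoded)
print(decoded)
```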

Total number of characters.

In [0]:
n_char=len(text)
print(n_char)

Number of unique characters.

In [0]:
n_vocab=len(char)
print(n_vocab)

Prepare the X and Y datasets: each input is a sequence of 100 characters and its target is the single character that follows.

In [0]:
seq_length = 100
dataX = []
dataY = []
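# slide a window of seq_length characters over the text, one character at a time
for i in range(0, n_char - seq_length, 1):
    seq_in = text[i:i + seq_length]   # input: seq_length consecutive characters
    seq_out = text[i + seq_length]    # target: the character right after the window
    dataX.append([char_int[c] for c in seq_in])
    dataY.append(char_int[seq_out])
n_patterns = len(dataX)
print(n_patterns)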
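
The sliding window is easier to see on a tiny input. The sketch below uses a made-up six-character string and a window of 3 instead of 100; `toy_text` and `toy_seq_length` are illustrative names.

```python
# Toy input (an assumption for illustration): 6 characters, window of 3
toy_text = "abcdef"
toy_seq_length = 3

windows = []
for i in range(len(toy_text) - toy_seq_length):
    seq_in = toy_text[i:i + toy_seq_length]   # the input window
    seq_out = toy_text[i + toy_seq_length]    # the character to predict
    windows.append((seq_in, seq_out))

for seq_in, seq_out in windows:
    print(seq_in, '->', seq_out)
```

A text of length n yields n minus the window length patterns, which is why the loop above stops at `n_char - seq_length`.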

Preprocess the X and Y datasets into the form the LSTM expects: a 3-D input array scaled to the range 0 to 1, and one-hot encoded targets.

In [0]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
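
The effect of these three steps can be inspected on toy values. This sketch uses NumPy only; the identity-matrix lookup is a stand-in for `to_categorical`, and all the `toy_` values are assumptions chosen for illustration.

```python
import numpy as np

# Toy values: 2 patterns, 3 time steps, a 4-character vocabulary
toy_X = [[0, 1, 2], [1, 2, 3]]
toy_Y = [3, 0]
toy_vocab = 4

# Reshape to [samples, time steps, features] and scale by the vocabulary size
X_toy = np.reshape(toy_X, (len(toy_X), 3, 1)) / float(toy_vocab)

# One-hot encode the targets by picking rows of an identity matrix
y_toy = np.eye(toy_vocab)[toy_Y]

print(X_toy.shape)
print(X_toy[0].ravel())
print(y_toy)
```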

Set up LSTM model.

In [0]:
l1= keras.layers.LSTM(units=256,input_shape=(X.shape[1],X.shape[2]))

In [0]:
l2=keras.layers.Dropout(0.2)

In [0]:
l3=keras.layers.Dense(y.shape[1],activation='softmax')

In [0]:
model=keras.models.Sequential([l1,l2,l3])

In [0]:
model.compile(optimizer='adam',loss='categorical_crossentropy')

In [0]:
model.summary()
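
To help read the summary, the trainable parameter counts can be estimated by hand. This is a rough sketch assuming the standard LSTM formulation with biases (four gates, each with input weights, recurrent weights and a bias vector). The vocabulary size of 45 is a placeholder, since the real `n_vocab` depends on the text file.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    # One weight per input-output pair plus one bias per output
    return inputs * outputs + outputs

units = 256
input_dim = 1        # one feature per time step, as in the reshape above
example_vocab = 45   # hypothetical vocabulary size, for illustration only

print(lstm_params(units, input_dim))
print(dense_params(units, example_vocab))
```

The Dropout layer has no trainable parameters, so it contributes nothing to the total.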

Train model.

In [0]:
model.fit(X,y,epochs=20,batch_size=128)

Define the seed text. It must be exactly 100 characters long and use only characters from the vocabulary.

In [0]:
text1='alice was beginning to get very tired of sitting by her sister on the bank  and of having nothing to'

Convert the seed text to integers and prepare it for feeding to the model.

In [0]:
# map each seed character to its integer code, giving a flat list of 100 integers
pattern = [char_int[c] for c in text1]
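
Since the generation step reshapes the seed to a fixed length and looks every character up in the dictionary, it can help to validate the seed first. The helper below is a hypothetical sketch, not part of the original notebook; `encode_seed` and the toy vocabulary are illustrative names.

```python
def encode_seed(seed, char_to_int, seq_length):
    """Map a seed string to integer codes, checking length and vocabulary."""
    if len(seed) != seq_length:
        raise ValueError(f"seed has {len(seed)} characters, expected {seq_length}")
    unknown = sorted(set(seed) - set(char_to_int))
    if unknown:
        raise ValueError(f"characters not in vocabulary: {unknown}")
    return [char_to_int[c] for c in seed]

# Toy vocabulary and seed (assumptions for illustration)
toy_map = {' ': 0, 'a': 1, 'b': 2}
print(encode_seed('ab a', toy_map, 4))
```

With the notebook's variables, the equivalent call would be `encode_seed(text1, char_int, 100)`.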

Feed the seed to the model and generate 500 characters of output text, one character at a time.

In [0]:
out=[]

In [0]:
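for i in range(500):
  # shape the current window as [samples, time steps, features] and normalize
  x = np.reshape(pattern, (1, 100, 1))
  x = x / float(n_vocab)
  # predict the next-character distribution and take the most likely index
  p = model.predict(x)
  a = np.argmax(p)
  r = int_char[a]
  # slide the window: append the prediction and drop the oldest character
  pattern.append(a)
  pattern = pattern[1:]
  out.append(r)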



In [0]:
print(*out,sep='')
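
The generation loop can be traced without a trained model by swapping in a stand-in predictor. In the sketch below, `fake_predict` is a hypothetical replacement for `model.predict` that always favors the character after the last one in the window. The three-character vocabulary is likewise made up for illustration.

```python
import numpy as np

# Toy vocabulary (an assumption for illustration)
toy_chars = ['a', 'b', 'c']
toy_int_char = dict(enumerate(toy_chars))
toy_vocab = len(toy_chars)

def fake_predict(x):
    # Stand-in for model.predict: recover the last index in the window
    # and put most of the probability on the next character, cyclically
    last = int(round(x[0, -1, 0] * toy_vocab))
    probs = np.full((1, toy_vocab), 0.1)
    probs[0, (last + 1) % toy_vocab] = 0.8
    return probs

window = [0, 1, 2]  # encodes 'abc'
generated = []
for _ in range(5):
    # Same steps as the loop above: reshape, normalize, predict, take argmax
    x = np.reshape(window, (1, len(window), 1)) / float(toy_vocab)
    next_index = int(np.argmax(fake_predict(x)))
    generated.append(toy_int_char[next_index])
    # Slide the window forward by one character
    window = window[1:] + [next_index]

print(''.join(generated))
```

Because each step takes the argmax, the output is fully determined by the seed. With a real model, this greedy choice can cause the generated text to fall into repeating phrases.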