## Recurrent Neural Network (LSTM) that generates haiku (Japanese poems) in Keras/TensorFlow

### Information about the dataset

- 10,000 haikus by Issa were used to train the RNN
- Poems were taken from this website: http://haikuguy.com/issa/searchenglish2.php

### Keras implementation

**Importing main libraries**

In [1]:
import sys
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


**Loading the text and converting it to lowercase**

In [2]:
filename = "issa.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

**Creating a unique integer ID for every character**

In [3]:
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
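A minimal sketch of the same mapping on a toy string (a hypothetical stand-in for `issa.txt`). Sorting the set is what makes the IDs reproducible: the iteration order of a `set` of strings can differ between Python processes because string hashing is randomized, so an unsorted vocabulary could assign different IDs when the notebook is rerun against saved weights.

```python
# Hypothetical toy text standing in for issa.txt
toy_text = "old pond\nfrog jumps in"

# Sorted vocabulary -> the same character IDs on every run
chars = sorted(set(toy_text))
char_to_int = dict((c, i) for i, c in enumerate(chars))

encoded = [char_to_int[c] for c in "frog"]
print(chars)
print(encoded)  # [3, 12, 10, 4]
```

Newline and space sort before the letters, so they receive IDs 0 and 1.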

In [4]:
n_chars = len(raw_text)
print("Total number of characters in the text: ", n_chars)

Total number of characters in the text:  541081


In [5]:
n_vocab = len(chars)
print("Total number of unique characters: ", n_vocab)

Total number of unique characters:  36


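**Preparing the input: encoding the characters and sliding a 54-character window over the text one character at a time (each input is 54 characters long; its target is the character that follows it)** <br>
54 is the average number of characters in one haiku (541,081 characters / 10,000 haikus ≈ 54).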

In [6]:
seq_length = 54 #average
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  541027
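With a step of 1, every character after the first `seq_length` becomes a target exactly once, so the number of patterns is `n_chars - seq_length` (541,081 - 54 = 541,027, matching the output above). The same loop, wrapped in a hypothetical helper `make_windows` and run on a five-character toy string:

```python
def make_windows(text, seq_length, step=1):
    """Return (input_window, next_char) pairs, mirroring the loop above."""
    pairs = []
    for i in range(0, len(text) - seq_length, step):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs

pairs = make_windows("haiku", seq_length=3)
print(pairs)  # [('hai', 'k'), ('aik', 'u')]

# The last window ("iku") has no following character, hence n - seq_length pairs
assert len(pairs) == len("haiku") - 3
print(541081 - 54)  # 541027
```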


In [7]:
len(dataX[1])

54

The input is reshaped into the form [samples, time steps, features] expected by an LSTM network.<br>

Then the input is scaled to the range [0, 1) by dividing each character ID by the vocabulary size.<br>

Lastly, the output pattern is one-hot encoded.
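The three transformations can be traced on a tiny example. This is a plain-Python sketch with made-up IDs and a vocabulary of 4, not the notebook's NumPy/Keras code; with the real vocabulary of 36 the largest scaled value is 35/36 ≈ 0.97. Note that `np_utils.to_categorical` infers the number of classes as the largest label plus one unless `num_classes` is passed.

```python
# Hypothetical toy data: 2 encoded windows of 3 time steps, vocabulary of 4
data_x = [[0, 1, 3], [1, 3, 2]]
data_y = [2, 0]
n_vocab = 4

# [samples, time steps, features]: each ID becomes a 1-element feature vector
X = [[[v] for v in window] for window in data_x]

# Scale every ID into [0, 1) by dividing by the vocabulary size
X_scaled = [[[v / n_vocab for v in step] for step in window] for window in X]

def one_hot(index, size):
    """Vector of `size` zeros with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

# One row per target, one column per character
y = [one_hot(t, n_vocab) for t in data_y]
print(X_scaled[0])  # [[0.0], [0.25], [0.75]]
print(y)            # [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
```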

**Reshape X to be [samples, time steps, features]**

In [8]:
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))

**Normalization**

In [9]:
X = X / float(n_vocab)

**One hot encode the output variable**

In [10]:
y = np_utils.to_categorical(dataY)

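The LSTM model: <br>
1 LSTM layer with 256 units <br>
Dropout of 0.2 <br>
A dense output layer with one unit per character and "softmax" activation, trained with categorical cross-entropy and the Adam optimizer <br>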

In [11]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
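As a sanity check on the model size, the parameter count follows from the standard LSTM formula: four gates, each with weights on the input and on the previous hidden state, plus a bias. The sketch below assumes Keras's default LSTM layout (one bias vector per gate) and this notebook's shapes (1 input feature, 256 units, 36 output characters); under those assumptions the totals should agree with what `model.summary()` reports, although that output is not shown here.

```python
def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with input weights,
    # recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

lstm = lstm_params(input_dim=1, units=256)     # one feature per time step
dense = dense_params(input_dim=256, units=36)  # 36 = n_vocab
# Dropout has no trainable parameters
print(lstm, dense, lstm + dense)  # 264192 9252 273444
```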

**Define a checkpoint that saves the weights whenever the training loss improves**

In [12]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
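The filename template is an ordinary Python format string filled in from the epoch number and the logged metrics. The sketch below imitates `save_best_only=True, mode='min'` on a list of made-up losses (hypothetical values, not results from this notebook). Keras passes the epoch to the template counted from 1, so the weights file loaded later, `weights-improvement-14-1.7628.hdf5`, corresponds to the 14th epoch with a training loss of 1.7628.

```python
# Filename template from the cell above
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# Hypothetical per-epoch training losses (illustrative only)
losses = [2.95, 2.60, 2.61, 2.41]

best = float("inf")
saved = []
for epoch_index, loss in enumerate(losses):
    # mode='min' with save_best_only=True: write a file only when loss improves
    if loss < best:
        best = loss
        # epochs are numbered from 1 in the filename
        saved.append(filepath.format(epoch=epoch_index + 1, loss=loss))
print(saved)
```

Epoch 3 is skipped because its loss (2.61) is worse than the best so far (2.60).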

**Fit the model**

In [None]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

**The pre-trained model is loaded:**

In [13]:
filename = "weights-improvement-14-1.7628.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

**Mapping integer IDs back to characters**

In [14]:
int_to_char = dict((i, c) for i, c in enumerate(chars))
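Because the IDs are 0, 1, ..., n-1 assigned in sorted order, `int_to_char` is simply the inverse of `char_to_int`, and indexing `chars` directly would work equally well. A round-trip check on a toy string (hypothetical text, not from the dataset):

```python
toy_text = "old pond\nfrog jumps in"  # hypothetical stand-in for issa.txt
chars = sorted(set(toy_text))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

encoded = [char_to_int[c] for c in "jumps"]
decoded = "".join(int_to_char[i] for i in encoded)
print(encoded, decoded)  # [6, 14, 8, 11, 13] jumps

# IDs are list positions, so indexing the sorted list gives the same result
assert all(int_to_char[i] == chars[i] for i in range(len(chars)))
```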

In [15]:
start = numpy.random.randint(0, len(dataX))
# copy the pattern so that appending below does not modify dataX itself
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate 50 characters, always picking the most likely next one
for i in range(50):
    # same shape and scaling as the training input
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the first one
    pattern.append(index)
    pattern = pattern[1:len(pattern)]

Seed:
" ce field
the greatest sight of all!
summer's early daw "
n


the siae field soow
fer she sorw falls...
poum
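The generation loop above is greedy decoding: predict, take the `argmax`, append that character to the window and drop the oldest one. The sketch below keeps only that window logic and swaps the LSTM for a hypothetical bigram-count predictor, so it runs without Keras or trained weights; it illustrates the loop, not the model's output. Ties in the argmax go to the lowest character ID, as they do with `numpy.argmax`. It also copies the seed before extending it, since `pattern.append(index)` on a list taken straight from `dataX` would modify the stored training pattern.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the LSTM: predicts the next character from
# bigram counts of the last character in the window (NOT the trained model)
corpus = "the frog the fog the dew"
chars = sorted(set(corpus))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = dict(enumerate(chars))

bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    bigrams[char_to_int[a]][char_to_int[b]] += 1

def predict(window):
    """Return a probability list over the vocabulary (toy model)."""
    counts = bigrams[window[-1]]
    total = sum(counts.values()) or 1
    return [counts[i] / total for i in range(len(chars))]

seed = [char_to_int[c] for c in "the "]
pattern = list(seed)  # copy, so the seed itself is not mutated
generated = []
for _ in range(6):
    probs = predict(pattern)
    # greedy argmax; max() keeps the first (lowest-ID) maximum on ties
    index = max(range(len(probs)), key=probs.__getitem__)
    generated.append(int_to_char[index])
    pattern = pattern[1:] + [index]  # slide the window by one character
print("".join(generated))  # fog fo
```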

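### TensorFlow implementation

This section uses the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`, `tf.contrib.rnn`). `tf.contrib` was removed in TensorFlow 2.0, so the cells below run as written only under TensorFlow 1.x.

**Importing main libraries**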

In [16]:
import tensorflow as tf
import numpy as np

**Set parameters**

In [17]:
#set hyperparameters
max_len = 50
step = 2
num_units = 256
learning_rate = 0.001
batch_size = 128
epoch = 20
temperature = 0.5
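With `step = 2`, a window starts at every second character, so this pipeline produces roughly half as many training samples as the Keras one. The arithmetic below assumes the same lowercased `issa.txt` of 541,081 characters reported in the Keras section; it also shows how many samples per epoch fall after the last full batch, since training only runs `int(len(train_data)/batch_size)` batches.

```python
n_chars = 541081   # lowercased issa.txt, as printed in the Keras section (assumed identical here)
max_len = 50
step = 2
batch_size = 128

# window start positions: 0, 2, 4, ... while i < n_chars - max_len
n_samples = len(range(0, n_chars - max_len, step))
# same as int(len(train_data)/batch_size)
num_batches = n_samples // batch_size
# samples after the last full batch are never fed to the optimizer
skipped = n_samples - num_batches * batch_size

print(n_samples, num_batches, skipped)  # 270516 2113 52
```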

**Loading the text and converting it to lowercase**

In [18]:
filename = "issa.txt"
text = open(filename, 'r').read()
text = text.lower()

**Building the character vocabulary, cutting the text into 50-character windows (step 2) and one-hot encoding inputs and targets**

In [19]:
unique_chars = list(set(text))
len_unique_chars = len(unique_chars)

input_chars = []
output_char = []

for i in range(0, len(text) - max_len, step):
    input_chars.append(text[i:i+max_len])
    output_char.append(text[i+max_len])

train_data = np.zeros((len(input_chars), max_len, len_unique_chars))
target_data = np.zeros((len(input_chars), len_unique_chars))

for i , each in enumerate(input_chars):
    for j, char in enumerate(each):
        train_data[i, j, unique_chars.index(char)] = 1
    target_data[i, unique_chars.index(output_char[i])] = 1
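One-hot encoding the inputs makes `train_data` large: one number per sample, time step and character. Assuming 270,516 windows (541,081 characters, `max_len = 50`, `step = 2`) and the 36 unique characters reported in the Keras section, the size of `train_data` for a few dtypes works out as follows (`target_data` is much smaller and is ignored). `np.zeros` uses float64 unless a `dtype` is given.

```python
n_samples = 270516  # assumed: 541,081 characters, max_len = 50, step = 2
max_len = 50
n_vocab = 36        # unique characters reported in the Keras section

elements = n_samples * max_len * n_vocab  # entries in train_data
for dtype_name, bytes_per_item in [("float64", 8), ("float32", 4), ("bool", 1)]:
    print(f"{dtype_name}: {elements * bytes_per_item / 1e9:.2f} GB")
```

That is about 3.9 GB at float64 versus about 1.9 GB at float32.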

**Define RNN**

In [20]:
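def rnn(x, weight, bias, len_unique_chars):
    # [batch, max_len, chars] -> [max_len, batch, chars]
    x = tf.transpose(x, [1, 0, 2])
    # -> [max_len * batch, chars]
    x = tf.reshape(x, [-1, len_unique_chars])
    # -> list of max_len tensors of shape [batch, chars], as static_rnn expects
    x = tf.split(x, max_len, 0)

    cell = tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0)
    outputs, states = tf.contrib.rnn.static_rnn(cell, x, dtype=tf.float32)
    # project the output of the last time step to one logit per character
    prediction = tf.matmul(outputs[-1], weight) + bias
    return prediction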
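`static_rnn` takes a Python list with one `[batch, features]` tensor per time step, and the transpose, reshape and split calls build exactly that list from a `[batch, time, features]` batch. The sketch below replays the three steps on nested Python lists with hypothetical sizes (2 samples, 3 time steps, 4 features), labelling each entry so its origin is visible.

```python
# Toy batch: 2 samples x 3 time steps x 4 features (hypothetical sizes)
batch, time_steps, features = 2, 3, 4
x = [[[f"s{b}t{t}f{k}" for k in range(features)]
      for t in range(time_steps)]
     for b in range(batch)]

# tf.transpose(x, [1, 0, 2]): [batch, time, features] -> [time, batch, features]
x_t = [[x[b][t] for b in range(batch)] for t in range(time_steps)]

# tf.reshape(x, [-1, features]) then tf.split(x, time_steps, 0):
# flatten to [time * batch, features], then cut into time_steps equal pieces
flat = [row for step_rows in x_t for row in step_rows]
inputs = [flat[t * batch:(t + 1) * batch] for t in range(time_steps)]

print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 3 2 4
print(inputs[1][0][0])  # s0t1f0: piece 1 holds every sample's step-1 features
```

Only `outputs[-1]`, the LSTM output after the final time step, reaches the dense projection, so the network is many-to-one: a whole window in, one next-character distribution out.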

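**Helper function to sample a character from the predicted logits, using temperature**

In [21]:
def sample(predicted):
    '''
    Rescale the logits by `temperature`, apply softmax and draw one character.
    Returns a one-hot array of shape (1, len_unique_chars).
    '''
    # subtracting the max before exp avoids overflow and does not change the result
    scaled = predicted / temperature
    exp_predicted = np.exp(scaled - np.max(scaled))
    predicted = exp_predicted / np.sum(exp_predicted)
    probabilities = np.random.multinomial(1, predicted, 1)
    return probabilities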
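`prediction` is the raw output of `tf.matmul(...) + bias`, i.e. logits, so `sample` is a softmax with temperature: dividing by `temperature = 0.5` before exponentiating sharpens the distribution, while values above 1 would flatten it. A stdlib sketch with made-up logits; `random.choices` stands in for `np.random.multinomial` here purely for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical logits for a 3-character vocabulary
for t in (1.0, 0.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Draw one character index from the temperature-0.5 distribution
rng = random.Random(0)
probs = softmax_with_temperature(logits, 0.5)
index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
print(index)
```

With these logits the top probability rises from about 0.66 at temperature 1 to about 0.86 at temperature 0.5.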


In [22]:
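# inputs: batches of one-hot windows; targets: one-hot next characters
x = tf.placeholder("float", [None, max_len, len_unique_chars])
y = tf.placeholder("float", [None, len_unique_chars])
# dense layer mapping the last LSTM output to one logit per character
weight = tf.Variable(tf.random_normal([num_units, len_unique_chars]))
bias = tf.Variable(tf.random_normal([len_unique_chars]))

prediction = rnn(x, weight, bias, len_unique_chars)
# per-example cross-entropy of softmax(logits) against the one-hot labels;
# this op is deprecated in favour of softmax_cross_entropy_with_logits_v2 (see the warning below)
softmax = tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y)
cost = tf.reduce_mean(softmax)
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost)

init_op = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init_op)

# number of full batches per epoch; leftover samples are skipped
num_batches = int(len(train_data)/batch_size)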


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
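What `softmax_cross_entropy_with_logits` computes per example, and what `tf.reduce_mean` then averages, can be checked by hand: apply softmax to the logits and take minus the log of the probability at the labelled position. The `_v2` variant named in the warning uses the same formula and differs only in letting gradients flow into the labels. A stdlib sketch with hypothetical logits and labels:

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """-sum(label * log(softmax(logits))) for a single example."""
    # log-sum-exp with the max subtracted, for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log_softmax(z) = z - log_sum_exp
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, one_hot_label))

# Hypothetical logits for a batch of two examples over a 3-character vocabulary
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
batch_labels = [[1, 0, 0], [0, 0, 1]]

losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)  # what tf.reduce_mean does over the batch
print(losses, cost)
```

The first example puts its largest logit on the correct character (loss about 0.42); the second is confidently wrong (loss about 2.70), which dominates the mean.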



In [None]:
for i in range(epoch):
    print("Epoch {0}/{1}".format(i+1, epoch))
    count = 0
    for _ in range(num_batches):
        train_batch, target_batch = train_data[count:count+batch_size], target_data[count:count+batch_size]
        count += batch_size
        sess.run([optimizer], feed_dict={x: train_batch, y: target_batch})

    # take the first window of the last training batch as the seed
    seed = train_batch[:1]

    # decode the 50-character seed (argmax of each one-hot row)
    seed_chars = ''
    for each in seed[0]:
        seed_chars += unique_chars[np.argmax(each)]
    print("Seed:", seed_chars)

    # predict the next 100 characters
    for j in range(100):
        if j > 0:
            # slide the window: drop the first character, append the sampled one
            remove_first_char = seed[:, 1:, :]
            seed = np.append(remove_first_char, np.reshape(probabilities, [1, 1, len_unique_chars]), axis=1)
        predicted = sess.run([prediction], feed_dict={x: seed})
        predicted = np.asarray(predicted[0]).astype('float64')[0]
        probabilities = sample(predicted)
        predicted_chars = unique_chars[np.argmax(probabilities)]
        seed_chars += predicted_chars
    print('Result:', seed_chars)
sess.close()
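During generation the seed is a `(1, max_len, len_unique_chars)` one-hot array. Decoding takes the position of the 1 in each row (what `np.argmax` returns for a one-hot row), and each step shifts the window by one row. A list-based sketch of both operations on a hypothetical four-character vocabulary:

```python
unique_chars = ["a", "e", "o", "\n"]  # hypothetical 4-character vocabulary

def one_hot(index, size):
    return [1 if i == index else 0 for i in range(size)]

def decode(window):
    # the position of the 1 in each one-hot row is the character ID
    return "".join(unique_chars[row.index(1)] for row in window)

seed = [one_hot(unique_chars.index(c), len(unique_chars)) for c in "oea"]
print(repr(decode(seed)))  # 'oea'

# drop the first row and append the sampled character's row
# (list equivalent of np.append(seed[:, 1:, :], new_row, axis=1))
sampled = one_hot(unique_chars.index("\n"), len(unique_chars))  # pretend sampler output
seed = seed[1:] + [sampled]
print(repr(decode(seed)))  # 'ea\n'
```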

## Some generated haikus

_three men <br>
use it for a pillow...<br>
green rice field_<br>
<br>
  _willow tree<br>
catch the blossom-scented wind<br>
of the cherry_
<br>
<br>
_honeybees--<br>
but right next door<br>
hornets_
<br><br>
_following<br>
the setting sun...<br>
a frog_