# Ismail Kadare - Gjakftohtesia

This notebook builds a character-wise RNN trained on "Gjakftohtesia" by the Albanian author Ismail Kadare. The trained network is then used to generate a new chapter.

This network is based on https://github.com/udacity/deep-learning/tree/master/intro-to-rnns

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

Create the dictionaries

In [2]:
with open('Ismail Kadare - Gjakftohtesia.txt', 'r', encoding="iso-8859-1") as f:
    text=f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [3]:
print('vocab', vocab)
print('vocab_to_int', vocab_to_int)
print('int_to_vocab', int_to_vocab)

vocab ['\n', '\x0c', ' ', '!', '"', '#', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '>', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '~', '\x86', '\x88', '\x89', '\x8a', '\x8e', '\x8f', '\x90', '\x93', '\x95', '\x97', '\x98', '\x99', '\x9a', '\x9d', '\x9e', '\x9f', '¡', '£', '¤', '¥', '§', 'Ç', 'È', 'ã', 'å', 'æ', 'ë', 'ó']
vocab_to_int {'+': 12, 'H': 39, ':': 27, '¡': 106, 't': 82, 'h': 70, 'Z': 57, 'A': 32, 'f': 68, '3': 20, 'j': 72, 'È': 112, '%': 6, 'K': 42, 'm': 75, 'b': 64, 'k': 73, 'Ç': 111, '\x0c': 1, 'ó': 117, 'E': 36, '&': 7, ',': 13, 'Q': 48, '¥': 109, '~': 89, '[': 58, '<': 29, '§': 110, 'p': 78, '\x97': 99, ' ': 2, '-': 14, 'X': 55, '9': 26, 'z': 88, 'D':

In [4]:
text[:100]

'ISMAIL \nKADARE \n\nGJAKFTOHTESIA \n\nNOVELA \n\nSHTEPIA BOTUESE NAIM FRASHERI \n\n\x0c\nRedaktor \n\n\x0c\ns \n\nSilva e'

Characters encoded as integers

In [5]:
encoded[:100]

array([40, 50, 44, 32, 40, 43,  2,  0, 42, 32, 35, 32, 49, 36,  2,  0,  0,
       38, 41, 32, 42, 37, 51, 46, 39, 51, 36, 50, 40, 32,  2,  0,  0, 45,
       46, 53, 36, 43, 32,  2,  0,  0, 50, 39, 51, 36, 47, 40, 32,  2, 33,
       46, 51, 52, 36, 50, 36,  2, 45, 32, 40, 44,  2, 37, 49, 32, 50, 39,
       36, 49, 40,  2,  0,  0,  1,  0, 49, 67, 66, 63, 73, 82, 77, 80,  2,
        0,  0,  1,  0, 81,  2,  0,  0, 50, 71, 74, 84, 63,  2, 67], dtype=int32)
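
As a quick sanity check on the two dictionaries, here is a minimal sketch that encodes a toy string and decodes it back. The names `toy_text`, `toy_vocab` and friends are made up for this example; the notebook itself works on the full text of the novel.

```python
# Toy stand-in for the novel's text; the notebook reads the real file instead.
toy_text = "gjak dhe gjak"

# Same construction as above: sorted unique characters, then two lookup tables.
toy_vocab = sorted(set(toy_text))
toy_vocab_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))

# Encode to integers, then decode back to characters.
toy_encoded = [toy_vocab_to_int[c] for c in toy_text]
toy_decoded = ''.join(toy_int_to_vocab[i] for i in toy_encoded)

print(toy_vocab)
print(toy_encoded)
print(toy_decoded == toy_text)
```

If the round trip does not give back the original string, the two dictionaries are out of sync.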

How many character classes?

In [6]:
len(vocab)

118

## Making training mini-batches

In [7]:
def get_batches(arr, batch_size, n_steps):
    '''Create a generator that returns batches of size
       batch_size x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = batch_size * n_steps
    n_batches = len(arr) // characters_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches*characters_per_batch]
    
    # Reshape into batch_size rows
    arr = arr.reshape(batch_size,-1)
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        y[:, -1] = x[:, 0]
        yield x, y

Test get_batches()

In [8]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)

In [9]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[40 50 44 32 40 43  2  0 42 32]
 [ 2 81 67  2 42 77 81 82 63 76]
 [34  8 67 81 70 82 67  2 73 72]
 [74 67 70 82 67 81 83 63 80  2]
 [63 80  2 84 67 82 67 82 71 75]
 [73  2 74 77 66 70 72 63 31  2]
 [78 71  2 38 72 77 80 69 83 76]
 [83 66 70 67 81  2  0 66 70 67]
 [80 81 67 13  2 75 77 80 71  2]
 [ 2 81 70 83 75 67  2 75 67  2]]

y
 [[50 44 32 40 43  2  0 42 32 35]
 [81 67  2 42 77 81 82 63 76 66]
 [ 8 67 81 70 82 67  2 73 72 77]
 [67 70 82 67 81 83 63 80  2 81]
 [80  2 84 67 82 67 82 71 75 63]
 [ 2 74 77 66 70 72 63 31  2  0]
 [71  2 38 72 77 80 69 83 76  2]
 [66 70 67 81  2  0 66 70 67 13]
 [81 67 13  2 75 77 80 71  2 68]
 [81 70 83 75 67  2 75 67  2 71]]
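
One detail of `get_batches` is worth spelling out: the last target column of each window is set to `x[:, 0]`, so it wraps around to the start of the same window instead of holding the true next character. The sketch below is a possible variant, not the notebook's method. `get_batches_next_char` is a hypothetical name, and it reserves one extra character so every position gets its real successor.

```python
import numpy as np

def get_batches_next_char(arr, batch_size, n_steps):
    """Hypothetical variant of get_batches: every target, including the
    last column of each window, is the character that really follows."""
    chars_per_batch = batch_size * n_steps
    # Reserve one extra character so the final window still has a target.
    n_batches = (len(arr) - 1) // chars_per_batch
    n_chars = n_batches * chars_per_batch
    x_all = arr[:n_chars].reshape(batch_size, -1)
    y_all = arr[1:n_chars + 1].reshape(batch_size, -1)
    for n in range(0, x_all.shape[1], n_steps):
        yield x_all[:, n:n + n_steps], y_all[:, n:n + n_steps]

# With np.arange every value equals its position, so y should be x + 1.
x_demo, y_demo = next(get_batches_next_char(np.arange(41), 2, 5))
print(x_demo)
print(y_demo)
```

For a long text the difference affects only one target in every `n_steps`, so the wrap-around is a small simplification rather than a serious error.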


In [10]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    targets = tf.placeholder(tf.int32, [batch_size, num_steps])
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, targets, keep_prob

### LSTM Cell

In [11]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    def build_cell(num_units, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
        # Add dropout to the cell outputs
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
        return drop
    
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state
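
To give a feel for what a single LSTM layer computes at each step, here is a NumPy sketch of the textbook LSTM update. It is not the TensorFlow implementation: the gate layout inside `W`, the forget-gate bias of 1.0, and the names `lstm_step` and `sigmoid` are assumptions made for this illustration, and dropout is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b, forget_bias=1.0):
    """One LSTM time step for a whole batch (illustrative sketch).

    x: (batch, input_size) inputs, h and c: (batch, num_units) states,
    W: (input_size + num_units, 4 * num_units), b: (4 * num_units,)
    """
    gates = np.concatenate([x, h], axis=1) @ W + b
    # Assumed layout: input gate, candidate values, forget gate, output gate.
    i, g, f, o = np.split(gates, 4, axis=1)
    c_new = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
batch, input_size, num_units = 3, 118, 8
# One-hot inputs, like the characters fed to the network.
x = np.eye(input_size)[rng.integers(0, input_size, size=batch)]
h = np.zeros((batch, num_units))
c = np.zeros((batch, num_units))
W = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
b = np.zeros(4 * num_units)

h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Stacking layers, as `MultiRNNCell` does, amounts to feeding the `h` of one layer in as the `x` of the next.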

### RNN Output

In [12]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output, axis=1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output, [-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')
    
    return out, logits
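
The shape bookkeeping in `build_output` is easy to lose track of, so here is a small NumPy sketch of the same reshape, matrix multiply and softmax on random data. The sizes are tiny placeholders chosen for the example, not the notebook's hyperparameters.

```python
import numpy as np

batch_size, num_steps, lstm_size, num_classes = 2, 3, 4, 5
rng = np.random.default_rng(0)

# Stand-in for the RNN output: one lstm_size vector per sequence and step.
lstm_output = rng.normal(size=(batch_size, num_steps, lstm_size))

# Flatten to one row per (sequence, step) pair.
flat = lstm_output.reshape(-1, lstm_size)            # (6, 4)

softmax_w = rng.normal(scale=0.1, size=(lstm_size, num_classes))
softmax_b = np.zeros(num_classes)
logits = flat @ softmax_w + softmax_b                # (6, 5)

# Numerically stable softmax over the class axis.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(flat.shape, logits.shape, probs.shape)
```

Each row of `probs` is a distribution over the characters for one step of one sequence.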

### Training loss

In [13]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped =  tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)
    
    return loss
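
A useful reference point for this loss is the value a network gets when it knows nothing and predicts every character with equal probability. With 118 classes that is ln(118). The sketch below computes it with a hand-written NumPy cross-entropy. The function `softmax_cross_entropy` is written for this example and works on integer targets instead of one-hot rows.

```python
import numpy as np

def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy for integer targets; logits has shape (rows, classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

num_classes = 118
# All-zero logits give a uniform distribution over the classes.
uniform_logits = np.zeros((4, num_classes))
targets = np.array([0, 5, 17, 117])

baseline = softmax_cross_entropy(uniform_logits, targets)
print(round(float(baseline), 4))  # 4.7707, i.e. ln(118)
```

A training loss well below this baseline means the network has learned something about which characters follow which.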

### Optimizer

In [14]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Maximum global norm allowed for the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer
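
Clipping by global norm rescales all gradients together when their combined norm exceeds the threshold, so their relative directions are preserved. Here is a NumPy sketch of that rule. This `clip_by_global_norm` is a stand-alone re-implementation for illustration, not the TensorFlow function itself.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale a list of gradient arrays so that their joint L2 norm
    is at most clip_norm. Returns the scaled list and the original norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # The scale is 1 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Two gradient arrays of different shapes whose joint norm is 5.
grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
clipped, norm = clip_by_global_norm(grads, clip_norm=2.5)

print(norm)
print(clipped)
```

Here the global norm is 5 and the limit is 2.5, so every gradient is halved. With `grad_clip=5`, as in `CharRNN`, these gradients would pass through unchanged.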

### Build the network

In [15]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

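        # Build the LSTM cell, passing the keep_prob placeholder so that the
        # value fed at run time (0.5 for training, 1.0 for sampling) takes effect
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)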

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn 
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes) 
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

## Hyperparameters

In [16]:
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001    # Learning rate
keep_prob = 0.5          # Dropout keep probability

## Training

In [None]:
epochs = 500
# Print losses every N iterations
print_every_n = 50

# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            if (counter % print_every_n == 0):
                end = time.time()
                print('Epoch: {}/{}... '.format(e+1, epochs),
                      'Training Step: {}... '.format(counter),
                      'Training loss: {:.4f}... '.format(batch_loss),
                      '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Epoch: 1/500...  Training Step: 50...  Training loss: 3.0784...  0.3110 sec/batch
Epoch: 2/500...  Training Step: 100...  Training loss: 2.9962...  0.3145 sec/batch
Epoch: 2/500...  Training Step: 150...  Training loss: 2.6809...  0.3153 sec/batch
Epoch: 3/500...  Training Step: 200...  Training loss: 2.4355...  0.3152 sec/batch
Epoch: 3/500...  Training Step: 250...  Training loss: 2.3322...  0.3192 sec/batch
Epoch: 4/500...  Training Step: 300...  Training loss: 2.2571...  0.3193 sec/batch
Epoch: 4/500...  Training Step: 350...  Training loss: 2.2289...  0.3219 sec/batch
Epoch: 5/500...  Training Step: 400...  Training loss: 2.2030...  0.3209 sec/batch
Epoch: 6/500...  Training Step: 450...  Training loss: 2.1202...  0.3198 sec/batch
Epoch: 6/500...  Training Step: 500...  Training loss: 2.1189...  0.3205 sec/batch
Epoch: 7/500...  Training Step: 550...  Training loss: 2.0472...  0.3232 sec/batch
Epoch: 7/500...  Training Step: 600...  Training loss: 1.9949...  0.3215 sec/batch
Epoch

Epoch: 56/500...  Training Step: 4900...  Training loss: 1.1941...  0.3228 sec/batch
Epoch: 57/500...  Training Step: 4950...  Training loss: 1.1991...  0.3279 sec/batch
Epoch: 57/500...  Training Step: 5000...  Training loss: 1.1781...  0.3250 sec/batch
Epoch: 58/500...  Training Step: 5050...  Training loss: 1.1562...  0.3220 sec/batch
Epoch: 58/500...  Training Step: 5100...  Training loss: 1.1670...  0.3259 sec/batch
Epoch: 59/500...  Training Step: 5150...  Training loss: 1.1617...  0.3247 sec/batch
Epoch: 60/500...  Training Step: 5200...  Training loss: 1.1632...  0.3221 sec/batch
Epoch: 60/500...  Training Step: 5250...  Training loss: 1.1646...  0.3268 sec/batch
Epoch: 61/500...  Training Step: 5300...  Training loss: 1.1795...  0.3243 sec/batch
Epoch: 61/500...  Training Step: 5350...  Training loss: 1.1429...  0.3222 sec/batch
Epoch: 62/500...  Training Step: 5400...  Training loss: 1.1756...  0.3267 sec/batch
Epoch: 62/500...  Training Step: 5450...  Training loss: 1.1331..
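
The log does not say how many batches make up an epoch, but the epoch/step pairs it prints are consistent with 88 batches per epoch. That figure is an inference from the log lines above, not something the notebook states, and `epoch_of_step` is a helper made up for this check.

```python
def epoch_of_step(step, steps_per_epoch):
    """1-based epoch that contains a given 1-based training step."""
    return (step - 1) // steps_per_epoch + 1

# Assumption: inferred from the logged epoch/step pairs, not stated anywhere.
steps_per_epoch = 88

# (training step, epoch) pairs copied from the log output.
logged = [(50, 1), (100, 2), (600, 7), (4900, 56), (5450, 62)]
for step, epoch in logged:
    assert epoch_of_step(step, steps_per_epoch) == epoch

print(epoch_of_step(9600, steps_per_epoch))  # 110
```

By the same arithmetic, a checkpoint written at step 9600 would belong to epoch 110.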

#### Saved checkpoints

In [17]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i9600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l512.ckpt"
all_model_checkpoint_pa

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that prediction back in to predict the character after it, and keep doing this to generate entirely new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us a probability for each character in the vocabulary. To reduce noise and make things a little less random, I only choose the next character from the top N most likely characters.
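
The feedback loop described above can be sketched without TensorFlow by replacing the trained network with a fixed table of next-character probabilities. Everything in this block (`toy_vocab`, `transition`, `generate`) is a hypothetical stand-in. Unlike the LSTM, the table has no hidden state, so the priming text only matters through its last character.

```python
import numpy as np

# Hypothetical stand-in for the trained network: row i holds the
# next-character probabilities after seeing character i.
toy_vocab = ['a', 'b', 'c']
toy_vocab_to_int = {ch: i for i, ch in enumerate(toy_vocab)}
transition = np.array([[0.1, 0.8, 0.1],   # after 'a'
                       [0.1, 0.1, 0.8],   # after 'b'
                       [0.8, 0.1, 0.1]])  # after 'c'

def generate(prime, n_samples, seed=0):
    """Start from the priming text, then repeatedly sample a character
    and feed it back in as the next input."""
    rng = np.random.default_rng(seed)
    samples = list(prime)
    c = toy_vocab_to_int[prime[-1]]
    for _ in range(n_samples):
        c = rng.choice(len(toy_vocab), p=transition[c])
        samples.append(toy_vocab[c])
    return ''.join(samples)

print(generate("ab", 10))
```

In the real sampler the table lookup is replaced by a `sess.run` call that also returns the updated LSTM state.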



In [18]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
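
To see what the top-N filter does to a distribution, here is a small sketch on a five-character toy example. `top_n_distribution` is a hypothetical helper that returns the filtered probabilities instead of sampling from them. It also works on a copy, whereas `pick_top_n` zeroes entries through the view returned by `np.squeeze` and so modifies `preds` in place.

```python
import numpy as np

def top_n_distribution(preds, top_n=5):
    """Zero all but the top_n most likely entries and renormalize.
    Works on a copy, so the caller's array is left untouched."""
    p = np.squeeze(preds).astype(np.float64)  # astype returns a new array
    p[np.argsort(p)[:-top_n]] = 0
    return p / p.sum()

# Toy prediction over five characters, shaped (1, 5) like the network output.
preds = np.array([[0.10, 0.40, 0.05, 0.30, 0.15]])
filtered = top_n_distribution(preds, top_n=2)

print(filtered)
print(preds)
```

Only the two most likely characters keep any probability, in the ratio 4:3, and `preds` is unchanged afterwards.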

In [19]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = CharRNN(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

Here, pass in the path to a checkpoint and sample from the network.

In [20]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints/i9600_l512.ckpt'

In [72]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Njeriu")
print(samp)

Njeriu qe kishte 
mos e lente te gjithe ne fjalet e te celit te kureshtjes, per 
menxoncat, dhe kishin pasur nje myre ja shume me mendjen e 
tij? Ndasheone me kete, por kaq vajtoni. Katundi per 
ajo njeri pjere imshem. Ate mjeft nje derendajle te ngul


anda qe, sipaq i shkreti, sidomos sipas kanucit, por ai ishte 
mire qaske dhe ne te gjitha ajer, prane ato fleheje qe drejt 
tyre, sic duket e zgjatesha.S'kam ardhur atje porse kishte 
njoftim, por njeletjeset, kur mbari kur e kishte marre krahun me te 
cilin njeriu qe po kryqet. 

Kanuni kishte pershkueshem pastaj shume me ne fund. 
 S'kushe trumbllisi, apothaven e keshtu me rruge, 
dasha. Dukej sikur vertiteti te metronjet. Kush e kim nevajut? 
Ata e veshtroi menjehere. 

Paktazet. Ai u kthy nje levizje e vogel te piote, 
sikur ta distimnin. 

Por kjo eshte e cidijur shkonte gjer ne me 
te aterruksin e tij. 
 Ataj degjohej me e panjohurin, thuase ne kembe, 
me nje fustan e nderi ne mendje. 

 Si eshte e mundur qe jame keshtu?e pyeti 