## RNN definition

In a language model we want to estimate the joint probability of a sequence of symbols (words, characters, phonemes, etc.). By the chain rule, it factorizes into a product of conditional probabilities:
$$ P\left(x_{1},x_{2},\dots,x_{T}\right) = \prod_{t=1}^{T}P\left(x_{t}|x_{1},\dots,x_{t-1}\right) $$
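
As a small worked example of this factorization, the sketch below sums the conditional log-probabilities of a sequence, which is the usual way to avoid underflow when multiplying many small numbers. The names `sequence_log_prob`, `uniform_cond_prob` and `VOCAB` are hypothetical, and the uniform model is only a stand-in for a real language model.

```python
import math

def sequence_log_prob(sequence, cond_prob):
    """Sum log P(x_t | x_1..x_{t-1}) over the sequence (chain rule in log space)."""
    total = 0.0
    for t, symbol in enumerate(sequence):
        prefix = sequence[:t]  # x_1 .. x_{t-1}
        total += math.log(cond_prob(symbol, prefix))
    return total

VOCAB = ['a', 'b', 'c', 'd']  # hypothetical toy vocabulary

def uniform_cond_prob(symbol, prefix):
    # Toy stand-in for a real model: every symbol is equally likely,
    # whatever the prefix is.
    return 1.0 / len(VOCAB)

log_p = sequence_log_prob('abc', uniform_cond_prob)
joint_p = math.exp(log_p)  # equals (1/4) ** 3 up to floating-point error
```

Any function that returns $ P\left(x_{t}|x_{1},\dots,x_{t-1}\right) $ can be passed as `cond_prob`, including one backed by a neural network.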

A recurrent neural language model approximates each conditional probability as:

$$ P\left(x_{t}|x_{1},\dots,x_{t-1}\right) = f_\Theta\left(x_{t}, h_{t - 1} \right) $$

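where $ f_\Theta $ represents the neural network transformations, $ \Theta $ is the set of parameters to be learned, and $ h_t $ is the hidden state at time $ t $. The state $ h_{t-1} $ stands in for the whole history $ x_1,\dots,x_{t-1} $, so the network never has to look at the full prefix again. The initial state $ h_0 $ is itself a learned parameter of the model ($ h_0 \in \Theta $).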
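
To make the role of the hidden state concrete, here is a minimal sketch of a state update for a plain (Elman-style) RNN cell. This is an assumption made for illustration only: the model used later in this notebook is a stack of LSTM layers, and the names `theta`, `next_state` and the toy sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8  # hypothetical toy sizes

# Theta: every learnable parameter, including the initial hidden state h_0.
theta = {
    'W_xh': rng.normal(scale=0.1, size=(hidden_size, vocab_size)),
    'W_hh': rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
    'b_h': np.zeros(hidden_size),
    'h_0': np.zeros(hidden_size),
}

def next_state(x_idx, h_prev, theta):
    """Compute h_t from the symbol index x_t and the previous state h_{t-1}."""
    x_onehot = np.zeros(vocab_size)
    x_onehot[x_idx] = 1.0
    return np.tanh(theta['W_xh'] @ x_onehot + theta['W_hh'] @ h_prev + theta['b_h'])

# Fold a short sequence of symbol indices into the state, starting from h_0.
h = theta['h_0']
for x_idx in [0, 3, 1]:
    h = next_state(x_idx, h, theta)
```

After the loop, `h` depends on all three symbols even though each call only saw one symbol and the previous state.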

With a character-based RNN language model, the neural network architecture would be:

![alt text](unroll.jpeg "Unrolled network")

Here $ p_t \in \mathbb{R}^{\left| V \right|} $ is the probability distribution of the next character. It contains as many values as there are characters in the vocabulary (i.e. $ \left| V \right| $), and those values sum to 1.
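
The model defined later in this notebook ends with a softmax activation, which is what turns one score per character into the distribution $ p_t $. The sketch below shows that step in isolation; the vocabulary and the score values are made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Map a vector of real-valued scores to a probability distribution."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

vocab = ['a', 'b', 'c', ' ']             # hypothetical vocabulary, |V| = 4
scores = np.array([2.0, 0.5, 0.5, 1.0])  # hypothetical output-layer scores at step t
p_t = softmax(scores)

# The most likely next character is the one with the largest probability.
most_likely = vocab[int(np.argmax(p_t))]
```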

## Playing with a RNN model

This notebook provides a wrapper class, ```Sampler```, for a language model trained on a small [Shakespeare corpus](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt). You can think of this class as the object represented by the gray circle in the previous figure: at each step the network receives the input char $ c_t $ and the previous state $ h_{t-1} $, and outputs the probability distribution $ p_t $ and the hidden state $ h_t $. The class exposes this through the function ```sample(num_chars, header='')```, which first feeds the characters of ```header``` to the network and then samples ```num_chars``` characters one at a time. The stateful LSTM layers carry the hidden state from one step to the next, and passing ```returnProb=True``` also returns the last distribution $ p_t $.
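
The step-by-step interface described above (character and previous state in, distribution and new state out) leads to a simple generation loop. The sketch below illustrates that loop with a hypothetical `toy_step` function standing in for the network; it is not the notebook's model, and its "state" is only a step counter.

```python
import numpy as np

vocab = ['a', 'b', 'c']  # hypothetical toy vocabulary

def toy_step(char_idx, state):
    """Hypothetical stand-in for one network step: returns (p_t, h_t).

    The distribution always favours the character that follows the input
    in `vocab`, and the "state" simply counts the steps taken so far.
    """
    p = np.full(len(vocab), 0.1)
    p[(char_idx + 1) % len(vocab)] = 0.8
    return p, state + 1

def generate(num_chars, first_char, rng):
    idx = vocab.index(first_char)
    state = 0  # plays the role of the initial state h_0
    out = [first_char]
    for _ in range(num_chars):
        p_t, state = toy_step(idx, state)
        idx = int(rng.choice(len(vocab), p=p_t))  # sample the next character from p_t
        out.append(vocab[idx])
    return ''.join(out), state

text, final_state = generate(10, 'a', np.random.default_rng(0))
```

Each sampled character is fed back as the next input, which is the same feedback pattern the wrapper class uses.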

In [None]:
import os
import json

import numpy as np
import pandas as pd
import cufflinks as cf
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation, Embedding

cf.set_config_file(offline=True, world_readable=True)


corpus = './corpus1'

class Sampler(object):
    def __init__(self):
        with open(os.path.join(corpus, 'data', 'char_to_idx.json')) as f:
            self.char_to_idx = json.load(f)
        self.idx_to_char = { i: ch for (ch, i) in self.char_to_idx.items() }
        self.vocab_size = len(self.char_to_idx)
        
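        # The architecture must match the saved weights. It is built for
        # one-character-at-a-time inference: batch size 1, sequence length 1,
        # with stateful LSTMs that remember the hidden state between calls.
        model = Sequential()
        model.add(Embedding(self.vocab_size, 512, batch_input_shape=(1, 1)))
        for i in range(3):
            # The last LSTM layer returns a single vector instead of a sequence.
            model.add(LSTM(256, return_sequences=(i != 2), stateful=True))
            model.add(Dropout(0.2))  # inactive at prediction time

        # One score per character, normalized into the distribution p_t.
        model.add(Dense(self.vocab_size))
        model.add(Activation('softmax'))
        self.model = model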
    

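    def sample(self, num_chars, header='', epoch=100, returnProb=False):
        self.model.load_weights(os.path.join(corpus, 'model', 'weights.{}.h5'.format(epoch)))
        # The LSTM layers are stateful, so clear the state left over from any
        # previous call; otherwise it would leak into this sample.
        self.model.reset_states()

        sampled = [self.char_to_idx[c] for c in header]
        # Warm up the hidden state on every header character except the last,
        # which is fed as the first input of the sampling loop.
        for c in header[:-1]:
            batch = np.zeros((1, 1))
            batch[0, 0] = self.char_to_idx[c]
            self.model.predict_on_batch(batch)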

        prob = None  # stays None when num_chars == 0
        for i in range(num_chars):
            batch = np.zeros((1, 1))
            if sampled:
                batch[0, 0] = sampled[-1]
            else:
                # No header: start from a uniformly random character.
                batch[0, 0] = np.random.randint(self.vocab_size)
            # prob is p_t, the distribution over the next character.
            prob = self.model.predict_on_batch(batch).ravel()
            sample = np.random.choice(range(self.vocab_size), p=prob)
            sampled.append(sample)

        text = ''.join(self.idx_to_char[c] for c in sampled)
        if returnProb:
            return text, prob
        else:
            return text

sampler = Sampler()

In [None]:
print(sampler.sample(1000))

In [None]:
sample, prob = sampler.sample(1, header=' p', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar', size=(100, 2))

In [None]:
sample, prob = sampler.sample(1, header=' pe', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar')

In [None]:
sample, prob = sampler.sample(1, header=' per', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar')

In [None]:
sample, prob = sampler.sample(1, header=' perd', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar')

In [None]:
sample, prob = sampler.sample(1, header=' perdo', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar')

In [None]:
sample, prob = sampler.sample(1, header=' perdon', returnProb=True)
dist = pd.Series(prob, index=[sampler.idx_to_char[ix] for ix in range(sampler.vocab_size)])
dist.sort_values(ascending=False).iloc[:20].iplot(kind='bar')

In [None]:
print(sampler.sample(1000, header='y jesús dijo:'))

## What else to do with language models
 - Complete missing characters, e.g. `B i e n _ v e n t _ r a d o ...`
 - Given a set of characters, find the most likely word, e.g. `{m,r,a,o}`
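
Both tasks reduce to scoring candidate strings with the language model and keeping the most likely one. The sketch below illustrates the idea with a tiny character-bigram model instead of the RNN, purely so that it runs on its own. The function names, the add-alpha smoothing and the one-word training corpus `['amor']` are assumptions made for illustration; in practice the log-probability would come from the trained network.

```python
import itertools
import math
from collections import Counter

def train_bigram(words):
    """Count character bigrams, with '^' and '$' marking word boundaries."""
    pair_counts, context_counts = Counter(), Counter()
    for word in words:
        padded = '^' + word + '$'
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts

def log_prob(word, model, num_symbols, alpha=1.0):
    """Add-alpha smoothed log-probability of a word under the bigram model."""
    pair_counts, context_counts = model
    padded = '^' + word + '$'
    return sum(
        math.log((pair_counts[(prev, cur)] + alpha)
                 / (context_counts[prev] + alpha * num_symbols))
        for prev, cur in zip(padded, padded[1:])
    )

def most_likely_word(chars, model, num_symbols):
    """Score every ordering of the given characters and keep the best one."""
    candidates = (''.join(p) for p in itertools.permutations(chars))
    return max(candidates, key=lambda w: log_prob(w, model, num_symbols))

def fill_blanks(pattern, alphabet, model, num_symbols):
    """Try every way of filling the '_' slots and keep the most likely result."""
    def candidates():
        for fill in itertools.product(alphabet, repeat=pattern.count('_')):
            chars = iter(fill)
            yield ''.join(next(chars) if ch == '_' else ch for ch in pattern)
    return max(candidates(), key=lambda w: log_prob(w, model, num_symbols))

alphabet = 'amor'
model = train_bigram(['amor'])   # hypothetical one-word training corpus
num_symbols = len(alphabet) + 1  # the letters plus the end marker '$'

best_word = most_likely_word('mrao', model, num_symbols)
completed = fill_blanks('a_o_', alphabet, model, num_symbols)
```

Exhaustive enumeration is only practical for short strings and few blanks; longer inputs would need a search strategy such as beam search.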