A good post about RNNs: http://karpathy.github.io/2015/05/21/rnn-effectiveness/


<img src="pic/rnn.png" width="600">

<img src="pic/diags.jpeg" width="600">

<img src="pic/bptt.png" width="600">


## Truncated Backprop Through Time

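**Truncation saves memory and computation: gradients are propagated back through a fixed number of steps instead of the whole sequence. As you will see, training an RNN is a very long process.**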

<img src="pic/tbptt.png" width="600">
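
The windowing idea behind truncated backprop can be sketched in plain Python. This is only an illustration under stated assumptions: `tbptt_windows` is a hypothetical helper, not something from this notebook, and the toy string stands in for the real text. Each window would be one training step; the hidden state is carried from one window to the next, while gradients stop at the window boundary.

```python
def tbptt_windows(sequence, seq_length):
    """Yield (inputs, targets) windows; targets are the inputs shifted by one symbol."""
    p = 0
    # Stop when a full window plus one extra target symbol no longer fits.
    while p + seq_length + 1 <= len(sequence):
        yield sequence[p:p + seq_length], sequence[p + 1:p + seq_length + 1]
        p += seq_length  # windows do not overlap


# Hypothetical toy data: 12 characters, windows of 4 steps.
windows = list(tbptt_windows("hello world!", seq_length=4))
for x, y in windows:
    print(repr(x), "->", repr(y))
```

With `seq_length=4` the 12-character string yields two windows; the last four characters are dropped because no target symbol follows them.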


## Char RNN (Embedding Layer vs One Hot Encoding)

**At each step we use all the information seen so far to predict the next symbol.**

<img src="pic/crnn.png" width="600">
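
The "Embedding Layer vs One Hot Encoding" comparison comes down to one identity: multiplying a one-hot row by a weight matrix selects a single row of that matrix, which is what an embedding lookup does directly. A minimal NumPy sketch, assuming toy sizes and made-up names (`toy_vocab`, `toy_hidden`, `W`) that are not part of the notebook:

```python
import numpy as np

toy_vocab, toy_hidden = 5, 3                 # toy sizes, not the notebook's
rng = np.random.RandomState(0)
W = rng.randn(toy_vocab, toy_hidden)         # input weights, or equivalently an embedding table

ids = np.array([2, 0, 4])                    # three symbol indices
one_hot_rows = np.eye(toy_vocab)[ids]        # shape (3, toy_vocab)

via_matmul = one_hot_rows @ W                # one-hot encoding followed by a dense layer
via_lookup = W[ids]                          # embedding layer: direct row lookup

print(np.allclose(via_matmul, via_lookup))
```

The results are the same; the lookup simply avoids building the one-hot matrix and multiplying by zeros, which matters more as the vocabulary grows.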


## Captioning (Embedding Layer vs One Hot Encoding)

**This part relies on convolution; you can find more information here: https://habrahabr.ru/post/309508/. Convolution is very good at feature extraction, especially for images.**

<img src="pic/cap.png" width="600">

## We are going to build a char RNN for text generation.

## It is interesting to generate new laws, isn't it?

In [1]:
import os
import random
import numpy as np
import tensorflow as tf



**Read data**

In [2]:
# Read every file in the "codex" directory into one string.
data = ""
for fname in os.listdir("codex"):
    with open("codex/"+fname, encoding='cp1251') as fin:
        data += fin.read()

In [3]:
print(data[500:800])

ство, исходя из общепризнанных принципов равноправия и самоопределения народов, чтя память предков, передавших нам любовь и уважение к Отечеству, веру в добро и справедливость, возрождая суверенную государственность России и утверждая незыблемость ее демократической основы, стремясь обеспечить благо


**Preprocessing**

In [4]:
chars = sorted(list(set(data)))

In [6]:
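# Turn a list of indices into one-hot rows (rows of the identity matrix).
# vocab_size is defined in the next cell; it is looked up when one_hot is called.
def one_hot(v):
    return np.eye(vocab_size)[v]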

In [7]:
data_size, vocab_size = len(data), len(chars)
print('Data has %d characters, %d unique.' % (data_size, vocab_size))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

Data has 3331133 characters, 101 unique.
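
To see the two dictionaries at work, here is a small round trip on a toy string. The `toy_*` names are made up for this illustration so that they do not overwrite the notebook's variables.

```python
toy_text = "abracadabra"                      # hypothetical stand-in for the real data
toy_chars = sorted(set(toy_text))             # ['a', 'b', 'c', 'd', 'r']
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_chars)}
toy_ix_to_char = {i: ch for i, ch in enumerate(toy_chars)}

# Encode characters to indices, then decode the indices back to text.
encoded = [toy_char_to_ix[ch] for ch in toy_text]
decoded = "".join(toy_ix_to_char[i] for i in encoded)

print(encoded)
print(decoded == toy_text)
```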


In [8]:
# Hyper-parameters
hidden_size   = 100  # hidden layer's size
seq_length    = 25   # number of steps to unroll

inputs = tf.placeholder(shape=[None, vocab_size], dtype=tf.float32, name="inputs")
targets = tf.placeholder(shape=[None, vocab_size], dtype=tf.float32, name="targets")
init_state = tf.placeholder(shape=[1, hidden_size], dtype=tf.float32, name="state")

initializer = tf.random_normal_initializer(stddev=0.1)

**Next, we will write our own simple RNN. If you want to see an RNN used as a black box, have a look at the file Simple_LSTM.ipynb.**

**$\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}$**

**$hs_t = \tanh(W_{xh} \cdot xs_t + W_{hh} \cdot hs_{t-1} + b_h)$**

**$ys_t = W_{hy} \cdot hs_t + b_y$**
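
The update equations above can be tried out for a single time step in NumPy before building the TensorFlow graph. This is a sketch under assumptions: `rnn_step`, the toy sizes and the `W_*`/`b_*` names are made up for the example, and inputs are treated as row vectors (`x @ W`), the same layout the TensorFlow cell below uses.

```python
import numpy as np


def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN step: new hidden state and output logits."""
    h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)
    y_t = h_t @ W_hy + b_y
    return h_t, y_t


toy_vocab, toy_hidden = 4, 3                   # toy sizes, not the notebook's
rng = np.random.RandomState(0)
W_xh = rng.randn(toy_vocab, toy_hidden) * 0.1
W_hh = rng.randn(toy_hidden, toy_hidden) * 0.1
W_hy = rng.randn(toy_hidden, toy_vocab) * 0.1
b_h = np.zeros(toy_hidden)
b_y = np.zeros(toy_vocab)

x_t = np.eye(toy_vocab)[[1]]                   # one-hot row for symbol 1, shape (1, toy_vocab)
h_prev = np.zeros((1, toy_hidden))             # initial hidden state

h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y)
print(h_t.shape, y_t.shape)
```

Because the previous state is zero and the input is one-hot, the new hidden state here is just `tanh` of row 1 of `W_xh`.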

In [9]:
# A variable scope is a handy namespace for variables.
with tf.variable_scope("RNN") as scope:
    hs_t = init_state
    ys = []
    for t, xs_t in enumerate(tf.split(inputs, seq_length, axis=0)):
        if t > 0: scope.reuse_variables()  # Share the same weights across all time steps
        # Get the existing variable with this name, or create it on the first step.
        Wxh = tf.get_variable("Wxh", [vocab_size, hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [hidden_size, hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [hidden_size, vocab_size], initializer=initializer)
        bh  = tf.get_variable("bh", [hidden_size], initializer=initializer)
        by  = tf.get_variable("by", [vocab_size], initializer=initializer)
        
        # One RNN step: update the hidden state, then compute the output logits.
        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)
        ys_t = tf.matmul(hs_t, Why) + by
        ys.append(ys_t)

In [10]:
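# Final hidden state; it is fed back in as init_state for the next window.
hprev = hs_t
# Softmax over the logits of the last step, used for sampling.
output_softmax = tf.nn.softmax(ys[-1])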
outputs = tf.concat(ys, axis=0)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=outputs))

# Minimizer
minimizer = tf.train.AdamOptimizer(learning_rate=0.005)
grads_and_vars = minimizer.compute_gradients(loss)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.
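
The loss above is softmax cross-entropy averaged over the time steps. A NumPy sketch of the same quantity, assuming a hypothetical helper `softmax_cross_entropy` and toy inputs (this is not TensorFlow's implementation):

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot_targets):
    """Per-row cross-entropy between softmax(logits) and one-hot targets."""
    # Subtract the row maximum for numerical stability; softmax is shift-invariant.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_targets * log_probs).sum(axis=1)


toy_logits = np.zeros((2, 4))                  # two steps, four symbols, all logits equal
toy_targets = np.eye(4)[[0, 3]]                # one-hot targets for symbols 0 and 3
losses = softmax_cross_entropy(toy_logits, toy_targets)
print(losses.mean())                           # equals log(4), about 1.386
```

Equal logits mean a uniform prediction, so the loss is the logarithm of the number of symbols; a trained model should do better than that.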



In [11]:
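# Gradient clipping
# We prevent gradients from becoming too large by clipping every element
# to the range [-5, 5] (tf.clip_by_value). Rescaling by the L2 norm,
# gradients * threshold / l2_norm(gradients), would be tf.clip_by_norm instead.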
grad_clipping = tf.constant(5.0, name="grad_clipping")
clipped_grads_and_vars = []
for grad, var in grads_and_vars:
    clipped_grad = tf.clip_by_value(grad, -grad_clipping, grad_clipping)
    clipped_grads_and_vars.append((clipped_grad, var))

# Gradient updates
updates = minimizer.apply_gradients(clipped_grads_and_vars)

# Session
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
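
Elementwise clipping (what `tf.clip_by_value` does) and rescaling by the L2 norm are different operations. A small NumPy sketch of the difference, with hypothetical helper names and a toy gradient:

```python
import numpy as np


def clip_by_value(grad, threshold):
    """Clip every element to [-threshold, threshold]; the direction may change."""
    return np.clip(grad, -threshold, threshold)


def clip_by_norm(grad, threshold):
    """Rescale the whole vector if its L2 norm exceeds threshold; the direction is kept."""
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * threshold / norm


g = np.array([3.0, 4.0, 12.0])                 # toy gradient, L2 norm is 13
by_value = clip_by_value(g, 5.0)               # only the last element is changed
by_norm = clip_by_norm(g, 5.0)                 # every element is scaled by 5/13

print(by_value)
print(by_norm, np.linalg.norm(by_norm))
```

Value clipping only touches the elements that exceed the threshold, so the clipped gradient can point in a different direction; norm clipping shrinks the whole vector and keeps its direction.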

In [12]:
# Initial values
n, p = 0, 0
hprev_val = np.zeros([1, hidden_size])

while n < 200000:
    # Reset the hidden state and the position at the start or when the data runs out
    if p + seq_length + 1 >= len(data) or n == 0:
        hprev_val = np.zeros([1, hidden_size])
        p = 0  # reset

    # Prepare inputs
    input_vals  = [char_to_ix[ch] for ch in data[p:p + seq_length]]
    target_vals = [char_to_ix[ch] for ch in data[p + 1:p + seq_length + 1]]

    input_vals  = one_hot(input_vals)
    target_vals = one_hot(target_vals)

    hprev_val, loss_val, _ = sess.run([hprev, loss, updates],
                                      feed_dict={inputs: input_vals,
                                                 targets: target_vals,
                                                 init_state: hprev_val})
    if n % 10000 == 0:
        # Progress
        print('iter: %d, p: %d, loss: %f' % (n, p, loss_val))

        # Do sampling
        sample_length = 200
        start_ix      = random.randint(0, len(data) - seq_length)
        sample_seq_ix = [char_to_ix[ch] for ch in data[start_ix:start_ix + seq_length]]
        ixes          = []
        sample_prev_state_val = np.copy(hprev_val)

        for t in range(sample_length):
            sample_input_vals = one_hot(sample_seq_ix)
            sample_output_softmax_val, sample_prev_state_val = \
                sess.run([output_softmax, hprev],
                         feed_dict={inputs: sample_input_vals, init_state: sample_prev_state_val})

            ix = np.random.choice(range(vocab_size), p=sample_output_softmax_val.ravel())
            ixes.append(ix)
            sample_seq_ix = sample_seq_ix[1:] + [ix]

        txt = ''.join(ix_to_char[ix] for ix in ixes)
        print('----\n %s \n----\n' % (txt,))

    p += seq_length
    n += 1

iter: 0, p: 0, loss: 3.849723
----
 О!+щЖ 'лМРcЧЖ'“СЧ!У'Е„ЖЧ-“--0Э+IIЙ%СIЧP/---Р6Н
©P-„8оАОэ+7-!-г/--Е-тЫ8Усч84 !э-;ЫчЫc-УI-Е-Ч"I--у+Е---а(--Ч«
РЕу-84I„ЕIуЫ+У'--"Еуо--ЕI-у+/%Ы-P/эФ-Е-рц-----уеЛ-мЕЧт%уРэ-,P--!-»ЕЖ--/–Ч-Э-/Йсу-Е-Ж-8P-85гж 
----

iter: 10000, p: 250000, loss: 1.680023
----
 огогласпровренные,.
.2.
 Седвия.
 Сденным ооящероляля татьненнях скспреде, в в ния ниявлидия в, неензыемя.
 2), и си, обевохся)хеддомовяроя, обяелятьящия освия дозлечения ращания прытогы,, тазозония,, 
----

iter: 20000, p: 500000, loss: 2.792428
----
 а, вана, (29. аевееждавоздаезаюьннообьты, поссий вотасьнеменовов пельтьставьногогобда оедездавые ошльсое пречаемения0505 в в обыльногодльные споссеоаванаемых сеедльзоссеожльтодномодия ездемаюьспотодан 
----

iter: 30000, p: 750000, loss: 3.330765
----
 ожерогденяя оссий дерятогойстодаведедеря сра срй дередедередаляченея обля дерсямедерення Коредакожеяваров в боракоданеся, в Кож с0000ря 220диящего содаовебов.
 ащественя рговаря порваямобо госсложера  
---

KeyboardInterrupt: 
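
The sampling step inside the training loop draws the next symbol from the softmax distribution with `np.random.choice` instead of always taking the most likely one. A small sketch of the difference, assuming a made-up three-symbol distribution and a fixed seed:

```python
import numpy as np

rng = np.random.RandomState(0)                 # fixed seed so the run is repeatable
probs = np.array([0.7, 0.2, 0.1])              # hypothetical softmax output

# Greedy decoding: always pick the most likely symbol.
greedy_ix = int(np.argmax(probs))

# Sampling: draw symbols in proportion to their probabilities.
sampled = rng.choice(len(probs), size=1000, p=probs)
freqs = np.bincount(sampled, minlength=len(probs)) / 1000.0

print(greedy_ix)
print(freqs)
```

Greedy decoding always returns index 0 here, while sampling returns each index roughly in proportion to its probability, which tends to give more varied generated text.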