### 1. Recurrent Neural Network (RNN): Generating the next character or next word of text
RNNs are used extensively for building language models. A language model allows us to
**predict the probability of a word in a text given the previous words**. Language models are important
for various higher-level tasks such as machine translation and spelling correction.
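
Written out, this is the standard chain-rule factorization of the probability of a text. The symbols $w_1, \dots, w_n$ for the words of the text are notation introduced here for illustration only:

$$P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})$$

Each factor on the right is one "next word given previous words" prediction, which is the quantity the language model estimates.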

**A side effect** of the ability to predict the next word given previous words is a generative model that
allows us **to generate text by sampling from the output probabilities.** In language modeling, our input
is typically a sequence of words and the output is a sequence of predicted words. **The training data**
used is existing **unlabeled text**, where we set the label $y_t$ at time $t$ to be the input $x_{t+1}$ at time $t+1$.
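
As a minimal sketch of how labels come from unlabeled text, the snippet below shifts a toy string by one position so that the label at step t is the input at step t+1. The string `toy_text` is a made-up stand-in for the corpus, not part of the notebook.

```python
# Toy text standing in for the corpus (an assumption for illustration).
toy_text = "alice"

# Inputs are every character except the last; labels are the same
# sequence shifted one step to the left, so y[t] == x[t + 1].
x = list(toy_text[:-1])
y = list(toy_text[1:])

for t, (x_t, y_t) in enumerate(zip(x, y)):
    print("t=%d  input=%r  label=%r" % (t, x_t, y_t))
```

No human annotation is involved: the text itself supplies every label.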

For our first example of using Keras for building RNNs, **we will train a character-based language
model** on the text of "Alice in Wonderland" **to predict the next character given 10 previous characters.**
We have chosen to build a character-based model here because it has a smaller vocabulary and trains
faster. The idea is the same as using a word-based language model, except we use characters
instead of words. We will then use the trained model to generate some text in the same style.

**Import Modules**

In [2]:
from __future__ import print_function
from keras.layers import Dense, Activation, SimpleRNN
from keras.models import Sequential
# from keras.utils.visualize_util import plot    # disabled: this module could not be found
import numpy as np

Using TensorFlow backend.


**Read Input File**: http://www.gutenberg.org/files/11/11-0.txt

The file contains line breaks and non-ASCII characters, so we do some preliminary cleanup and write out the contents into a variable called _text_:

In [12]:
INPUT_FILE = "data/alice_in_wonderland.txt"
# extract the input as a stream of characters
print("Extracting text from input...")
lines = []
with open(INPUT_FILE, 'rb') as fin:
    for line in fin:
        # strip surrounding whitespace, lowercase, and drop non-ASCII bytes
        line = line.strip().lower()
        line = line.decode("ascii", "ignore")
        if len(line) == 0:
            continue
        lines.append(line)
text = " ".join(lines)

Extracting text from input...


**Creating the index for the characters:**

Since we are building a character-level RNN, our vocabulary is the set of characters that occur in the
text. There are 42 of them in our case. Since we will be dealing with the indexes of these characters
rather than the characters themselves, the following code snippet creates the necessary lookup tables:

In [13]:
chars = sorted(set(text))    # sorted so that the index assignment is the same on every run
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
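
To see the two lookup tables working together, here is a small round-trip sketch on a toy string. `toy_text` is a hypothetical stand-in for the corpus, and the characters are sorted here only so the indices are the same on every run.

```python
toy_text = "hello world"  # hypothetical stand-in for the corpus

chars = sorted(set(toy_text))
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))

# Encode the text as indexes, then decode it back to characters.
encoded = [char2index[c] for c in toy_text]
decoded = "".join(index2char[i] for i in encoded)
print(len(chars), encoded, decoded)
```

The decoded string matches the original, which confirms that the two dictionaries are inverses of each other.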

**Create the input and label texts**

We do this by stepping through the text by a number
of characters given by the STEP variable (1 in our case) and then extracting a span of text whose size is
determined by the SEQLEN variable (10 in our case). The next character after the span is our label
character.

For example, for the input text _it turned into a pig_ (spans are quoted so that leading and trailing spaces are visible):
- "it turned " -> the next character is "i"
- "t turned i" -> the next character is "n"
- " turned in" -> the next character is "t"
- "turned int" -> the next character is "o"
- "urned into" -> the next character is " "
- "rned into " -> the next character is "a"
- "ned into a" -> the next character is " "
- "ed into a " -> the next character is "p"
- "d into a p" -> the next character is "i"
- " into a pi" -> the next character is "g"

In [14]:
SEQLEN = 10
STEP = 1
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])
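
As a quick check, the same loop can be run on the example sentence alone. This sketch replaces the full corpus with the single sentence, which is an assumption made only to keep the output short.

```python
SEQLEN = 10
STEP = 1
text = "it turned into a pig"   # example sentence instead of the full corpus

input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])    # span of SEQLEN characters
    label_chars.append(text[i + SEQLEN])      # the character right after the span

# %r keeps the quotes, so leading and trailing spaces stay visible
for span, label in zip(input_chars, label_chars):
    print("%r -> %r" % (span, label))
```

The 20-character sentence yields 10 spans, and reading the labels in order spells out the rest of the sentence after the first span.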

**Vectorize the input and label texts**

Each row of the input to the RNN corresponds to one of the input texts shown previously. There are _SEQLEN_ characters in this input, and since our vocabulary size is given by _nb_chars_, we represent each input character as a one-hot encoded vector of size (nb_chars). **Thus each input row is a tensor of size (SEQLEN, nb_chars)**. Our output label is a single character, so, like each character of our input, it is represented
as a one-hot vector of size (nb_chars). Thus, the shape of each label is (nb_chars).


In [15]:
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)      # input is a tensor with size: SEQLEN x nb_chars
y = np.zeros((len(input_chars), nb_chars), dtype=bool)              # output with size: nb_chars
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1
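
The shapes are easier to inspect on a tiny example. The sketch below repeats the vectorization on a five-character toy corpus with a window of 2. Both values are assumptions chosen only to keep the arrays small.

```python
import numpy as np

text = "abcab"                       # toy corpus (assumption)
SEQLEN, STEP = 2, 1                  # small window so the arrays are easy to read
chars = sorted(set(text))
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))

input_chars = [text[i:i + SEQLEN] for i in range(0, len(text) - SEQLEN, STEP)]
label_chars = [text[i + SEQLEN] for i in range(0, len(text) - SEQLEN, STEP)]

X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1          # one-hot row for character j of span i
    y[i, char2index[label_chars[i]]] = 1     # one-hot row for the label of span i

print(X.shape, y.shape)
print(X[0].astype(int))                      # the span "ab" as two one-hot rows
```

Each span becomes a (SEQLEN, nb_chars) slice of `X` with exactly one nonzero entry per row, and each label becomes one row of `y`.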

**Building the RNN Model**

The RNN's output dimension (HIDDEN_SIZE in the code) needs to be determined by experimentation. In general, if we
choose too small a size, then the model does not have sufficient capacity for generating good text, and
we will see long runs of repeating characters or runs of repeating word groups. On the other hand, if
the value chosen is too large, the model has too many parameters and needs a lot more data to train
effectively.
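
To get a rough feel for how the parameter count grows with the output dimension, the sketch below counts the weights of a simple recurrent layer followed by a dense softmax layer. It assumes the standard layout (input weights, recurrent weights, and one bias vector per layer). The helper name `simple_rnn_param_count` and the alternative sizes 32 and 512 are hypothetical, chosen for comparison.

```python
def simple_rnn_param_count(hidden_size, nb_chars):
    """Trainable weights of a simple RNN layer plus a dense softmax layer."""
    # recurrent layer: input weights + recurrent weights + bias
    rnn = nb_chars * hidden_size + hidden_size * hidden_size + hidden_size
    # dense layer: weights + bias
    dense = hidden_size * nb_chars + nb_chars
    return rnn + dense

# nb_chars=42 is the vocabulary size reported for this text
for hidden_size in (32, 128, 512):
    print(hidden_size, simple_rnn_param_count(hidden_size, nb_chars=42))
```

The recurrent weight matrix grows with the square of the hidden size, so it dominates the total for larger values.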

We want to return a single character as output, not a sequence of characters, so
**return_sequences=False**. We have already seen that the input to the RNN is of shape (SEQLEN, nb_chars).
In addition, we set **unroll=True**, which **can speed up the RNN on the TensorFlow backend** for short sequences such as ours, at the cost of more memory.
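
To make the effect of return_sequences concrete, here is a NumPy-only sketch of the forward pass of a simple recurrent layer with a tanh activation. It is an illustration under stated assumptions, not Keras's implementation: the function `simple_rnn_forward`, the weight names `W`, `U`, `b`, and the random weights are all hypothetical.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """x: (seqlen, input_dim), W: (input_dim, hidden), U: (hidden, hidden), b: (hidden,)."""
    h = np.zeros(U.shape[0])                  # initial hidden state
    states = []
    for x_t in x:                             # one step per input character
        h = np.tanh(np.dot(x_t, W) + np.dot(h, U) + b)
        states.append(h)
    # all hidden states, or only the last one
    return np.stack(states) if return_sequences else h

rng = np.random.default_rng(0)
SEQLEN, nb_chars, HIDDEN_SIZE = 10, 42, 128
x = np.eye(nb_chars)[rng.integers(nb_chars, size=SEQLEN)]    # random one-hot rows
W = rng.normal(scale=0.1, size=(nb_chars, HIDDEN_SIZE))
U = rng.normal(scale=0.1, size=(HIDDEN_SIZE, HIDDEN_SIZE))
b = np.zeros(HIDDEN_SIZE)

last = simple_rnn_forward(x, W, U, b)
seq = simple_rnn_forward(x, W, U, b, return_sequences=True)
print(last.shape, seq.shape)
```

With return_sequences=False only the final hidden state is passed on, which is what a single next-character prediction needs.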

In [16]:
HIDDEN_SIZE = 128                                               # RNN's output dimension
BATCH_SIZE = 128
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

In [17]:
model = Sequential()

model.add(SimpleRNN(HIDDEN_SIZE, 
                    return_sequences=False,                     # we want only one character, not a sequence
                    input_shape=(SEQLEN, nb_chars),             # input is a tensor with size: SEQLEN x nb_chars
                    unroll=True))                               # to improve the performance on the TensorFlow backend

model.add(Dense(nb_chars))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

Our training approach is a little different from what we have seen so far. 
- So far, our approach has been to train a model for a fixed number of epochs, then evaluate it against a portion of held-out test data.
- Since we don't have any labeled data here, we train the model for one epoch (NUM_EPOCHS_PER_ITERATION=1), then test it.

We continue training like this for 25 iterations (NUM_ITERATIONS=25),
stopping once we see intelligible output. So effectively, we are training for NUM_ITERATIONS epochs and
testing the model after each epoch.

Our test consists of
- generating a character from the model given a random input, 
- then dropping the first character from the input 
- and appending the predicted character from our previous run, 
- and generating another character from the model.
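
The sliding-window procedure in the list above can be sketched without a trained model. Everything here is an assumption for illustration: `fake_predict` is a hypothetical stand-in for the model's predict call, the three-character vocabulary is made up, and the `sample` flag shows the alternative of drawing from the output probabilities instead of taking the most likely character.

```python
import numpy as np

chars = sorted(set("abc"))                    # toy vocabulary (assumption)
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
SEQLEN = 3

def fake_predict(Xtest):
    """Hypothetical stand-in for a trained model: puts most of the probability
    on the character that follows the last input character (a -> b -> c -> a)."""
    last = int(np.argmax(Xtest[0, -1]))
    probs = np.full(nb_chars, 0.05)
    probs[(last + 1) % nb_chars] = 1.0 - 0.05 * (nb_chars - 1)
    return probs[np.newaxis, :]

def generate(seed, num_preds, sample=False, rng=None):
    test_chars, out = seed, seed
    for _ in range(num_preds):
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        pred = fake_predict(Xtest)[0]
        if sample:
            idx = rng.choice(nb_chars, p=pred)    # draw from the distribution
        else:
            idx = np.argmax(pred)                 # greedy: most likely character
        ypred = index2char[int(idx)]
        out += ypred
        test_chars = test_chars[1:] + ypred       # drop first char, append prediction
    return out

print(generate("abc", 6))
print(generate("abc", 6, sample=True, rng=np.random.default_rng(0)))
```

The greedy variant always produces the same continuation for a given seed, while the sampled variant can occasionally pick a less likely character.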

We repeat this 100 times (NUM_PREDS_PER_EPOCH=100), then
print the resulting string. The string gives us an indication of the quality of the model:

**We train the model in batches and test output generated at each step**

In [18]:
# We train the model in batches and test output generated at each step
for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: %d" % (iteration))
    model.fit(X, y, 
              batch_size=BATCH_SIZE, 
              epochs=NUM_EPOCHS_PER_ITERATION)
    
    # testing model
    # randomly choose a row from input_chars, then use it to 
    # generate text from model for next 100 chars
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: %s" % (test_chars))
    print(test_chars, end="")
    for i in range(NUM_PREDS_PER_EPOCH):
        Xtest = np.zeros((1, SEQLEN, nb_chars))                      # input is a tensor with size: SEQLEN x nb_chars
        for j, ch in enumerate(test_chars):                          # j avoids shadowing the outer loop variable i
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move forward with test_chars + ypred
        test_chars = test_chars[1:] + ypred
    print()                                                          # end the generated line before the next iteration

Iteration #: 0
Epoch 1/1
Generating from seed: no meaning
Iteration #: 1
Epoch 1/1
Generating from seed:  felt quit
Iteration #: 2
Epoch 1/1
Generating from seed: an wrappin
Iteration #: 3
Epoch 1/1
Generating from seed: after-time
Iteration #: 4
Epoch 1/1
Generating from seed: at alice q
Iteration #: 5
Epoch 1/1
Generating from seed: e was goin
Iteration #: 6
Epoch 1/1
Generating from seed: tners-- --
Iteration #: 7
Epoch 1/1
Generating from seed: oes your w
Iteration #: 8
Epoch 1/1
Generating from seed: o a farmer
Iteration #: 9
Epoch 1/1
Generating from seed: rning to t
Iteration #: 10
Epoch 1/1
Generating from seed: ou, she sa
Iteration #: 11
Epoch 1/1
Generating from seed: rried on, 
Iteration #: 12
Epoch 1/1
Generating from seed: and then a
Iteration #: 13
Epoch 1/1
Generating from seed:  these wor
Iteration #: 14
Epoch 1/1
Generating from seed: ould be fr
Iteration #: 15
Epoch 1/1
Generating from seed: s hardly r
Iteration #: 16
Epoch 1/1
Generating from seed: t tree in 
Iterati

Generating the next character or next word of text is not the only thing you can do with this sort of
model. This kind of model has been successfully used to
- make stock predictions (Financial Market Time Series Prediction with Recurrent Neural Networks, by A. Bernal, S. Fok, and R. Pidaparthi, 2012)
- generate classical music (DeepBach: A Steerable Model for Bach Chorales Generation, by G. Hadjeres and F. Pachet, arXiv:1612.01010, 2016)

Andrej Karpathy covers a few other fun examples, such as generating fake Wikipedia pages, algebraic geometry proofs, and Linux source code, in his blog post The Unreasonable Effectiveness of Recurrent Neural Networks at http://karpathy.github.io/2015/05/21/rnn-effectiveness/.