In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

We will use TensorFlow to build the RNN and NumPy to prepare our own data. The code uses the TensorFlow 1.x API (tf.placeholder, tf.contrib).

In [2]:
num_examples = 3000
seq_length = 10
sequences = np.empty((num_examples,seq_length))
for i in range(num_examples):
    seq = np.random.randint(2, size=(seq_length)).astype('float32')
    sequences[i] = seq

print("shape of input data:",sequences.shape)
print("first element:", sequences[0])

shape of input data: (3000, 10)
first element: [ 0.  0.  0.  1.  1.  0.  1.  0.  0.  1.]


Our objective is to classify sequences of length 10. For this purpose we create a dataset of 3,000 example arrays. Each array consists of 0s and 1s, drawn at random. The number of 1s determines the array's class, so there are exactly 11 possible classes, 0 through 10, because the array length is 10. (Don't forget that an array can be all zeros.)
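
Because every position is drawn independently and uniformly from {0, 1}, the class (the count of 1s) follows a binomial distribution, so the 11 classes are far from equally frequent. A minimal sketch of the expected class frequencies, assuming fair and independent bits as produced by np.random.randint(2, ...):

```python
from math import comb

seq_length = 10  # same sequence length as in the notebook

# Number of distinct 0/1 sequences of this length
total = 2 ** seq_length

# Probability that a uniformly random sequence contains exactly k ones
class_probs = [comb(seq_length, k) / total for k in range(seq_length + 1)]

for k, p in enumerate(class_probs):
    print(k, round(p, 4))
```

Classes 0 and 10 each have probability 1/1024, so in 3,000 examples they are expected to show up only about three times each, while class 5 covers roughly a quarter of the data.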

In [3]:
target_classes = []
for seq in sequences:
    # Count the 1s in this sequence; the count is its class label
    target = (seq == 1).sum()
    target_classes.append(target)

target_classes = np.asarray(target_classes)
target_classes

array([4, 5, 7, ..., 3, 5, 5])

For supervised learning we need to know the answers, which means we need the correct class of each sequence for training. So we count the number of 1s in each array and collect the counts in a 1D array.
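
The same labels can also be computed without a Python loop. The following is a sketch, not the notebook's own code: it regenerates random data with a fixed seed (an assumption made only for reproducibility) and counts the 1s of every row in a single call.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, only so the sketch is reproducible
sequences = rng.randint(2, size=(3000, 10)).astype('float32')

# Count the 1s in every row at once; the result is one integer label per sequence
target_classes = (sequences == 1).sum(axis=1)

# How many sequences fall into each of the 11 classes (0 through 10)
class_counts = np.bincount(target_classes, minlength=11)
print(target_classes.shape, class_counts)
```

np.bincount also gives a quick look at how unevenly the classes are populated.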

Now we need to encode our label array with 1-hot encoding, as is common for class labels in machine learning. One reason is that in this problem the network predicts a probability for each class rather than a single exact class label, like below:

In [4]:
sample = np.random.exponential(2,10)
sample /= sample.sum()
sample

array([ 0.10612717,  0.06763875,  0.06630916,  0.07286086,  0.18861263,
        0.03540382,  0.24893265,  0.12198714,  0.0880622 ,  0.00406562])
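
The vector printed above sums to 1, so it can be read as one probability per class. As a hedged illustration of how such a prediction is compared with a 1-hot label, here is a small NumPy sketch. The `predicted` values are made up, and the vector has 11 entries to match the 11 classes of this problem.

```python
import numpy as np

# A hypothetical predicted distribution over the 11 classes (sums to 1)
predicted = np.full(11, 0.05)
predicted[4] = 0.5

# 1-hot label: the true class is 4
one_hot = np.zeros(11)
one_hot[4] = 1.0

# Cross-entropy between label and prediction; only the true class contributes
cross_entropy = -np.sum(one_hot * np.log(predicted))
print(cross_entropy)
```

Since the label puts all of its weight on class 4, the loss reduces to -log(0.5), and it shrinks toward 0 as the predicted probability of the true class approaches 1.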

I think you can see why 1-hot encoding is useful in this case. You can think of it like this: with 1-hot encoding we say "This is an apple 100% and this is a banana 0%" instead of just saying "This is an apple". Now let's see how we can encode our label array.

In [5]:
np.eye(11)

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])

The function above just creates an identity matrix whose size is given as a parameter. But if you look carefully, its rows are the 1-hot encodings of the classes 0 through 10: row i has a 1 only in position i. So this matrix already contains our 1-hot encoded class labels. We just need to encode our label array using this matrix.

In [6]:
target_classes = np.eye(11)[target_classes]
target_classes[0]

array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.])

That's it! Indexing the identity matrix with the label array picks row i for every label i. This is why Python is awesome.
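
To check that the encoding loses nothing, the integer labels can be recovered with argmax. A minimal sketch with a few hypothetical labels:

```python
import numpy as np

labels = np.array([4, 0, 10, 7])   # hypothetical integer class labels
one_hot = np.eye(11)[labels]       # row i of the identity matrix encodes class i

# argmax along each row recovers the original integer label
decoded = one_hot.argmax(axis=1)
print(one_hot.shape, decoded)
```

This is the same argmax step the training code later applies to y to get the correct labels back.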

In [7]:
n_training = 2000
trainX  = sequences[:n_training]
trainY = target_classes[:n_training]
validX = sequences[n_training:2900]
validY = target_classes[n_training:2900]
testX = sequences[2900:]
testY = target_classes[2900:]

trainX = trainX.reshape(1,trainX.shape[0],trainX.shape[1])
validX = validX.reshape(1,validX.shape[0],validX.shape[1])
testX = testX.reshape(1,testX.shape[0],testX.shape[1])

trainX.shape,validX.shape,testX.shape,trainY.shape,validY.shape,testY.shape

((1, 2000, 10), (1, 900, 10), (1, 100, 10), (2000, 11), (900, 11), (100, 11))

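We've split our data into three parts: 2,000 examples for training, 900 for validation and the remaining 100 for testing. We've also reshaped the input data into 3D arrays, because TensorFlow's RNN functions require 3D input. By default tf.nn.dynamic_rnn reads that shape as [batch, time, features], so here the batch dimension is 1 and the examples lie along the second axis.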
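
The choice of 3D layout decides what the RNN treats as a time step. The sketch below contrasts the layout used in this notebook with one common alternative. The alternative is shown only for comparison, it is not what the code here does, and the zero-filled array is a stand-in for the real training split.

```python
import numpy as np

sequences = np.zeros((2000, 10), dtype='float32')  # stand-in for the training split

# Layout used in this notebook: a single batch entry whose second axis runs over the examples
as_one_batch = sequences.reshape(1, sequences.shape[0], sequences.shape[1])

# Alternative (not used here): each example is its own batch entry and each bit is one time step
per_example = sequences.reshape(sequences.shape[0], sequences.shape[1], 1)

print(as_one_batch.shape, per_example.shape)
```

With the first layout each whole 10-element array is fed to the cell as one feature vector. With the second the cell would read one bit at a time.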

In [8]:
batch_size = 100
n_hidden = 128
n_chunks = 28
x = tf.placeholder("float32", [None, None, seq_length])
y = tf.placeholder("int32", [None, 11])

In [9]:
def next_batch(batch_index):
    # Return the inputs and labels of one training batch
    if batch_index*batch_size + batch_size > trainX.shape[1]:
        # The batch runs past the end of the data: finish this epoch
        # Get the remaining examples of this epoch
        rest_num_examples = trainX.shape[1] - batch_index*batch_size
        input_rest_part = trainX[:,trainX.shape[1]-rest_num_examples:trainX.shape[1]]
        labels_rest_part = trainY[trainY.shape[0]-rest_num_examples:trainY.shape[0]]
        # Start the next epoch with its first full batch
        input_new_part = trainX[:,0:batch_size]
        labels_new_part = trainY[0:batch_size]
        return np.concatenate((input_rest_part, input_new_part), axis=1), np.concatenate(
            (labels_rest_part, labels_new_part), axis=0)
    else:
        # The batch fits: examples lie on axis 1 of trainX and axis 0 of trainY
        start = batch_index*batch_size
        end = start+batch_size
        return trainX[:,start:end], trainY[start:end]

We need two placeholders: x for the input and y for the class labels. 'n_hidden' is the number of hidden units in the LSTM cell, not the number of hidden layers.

In [10]:
weights = tf.Variable(tf.random_normal([n_hidden, 11]))
biases = tf.Variable(tf.random_normal([11]))

Calling next_batch for the first three batch indices returns batches like this:

In [11]:
for i in range(3):
    ex,ey = next_batch(i)
    print(ex.shape)
    print(ey.shape)

(1, 100, 10)
(100, 11)
(1, 100, 10)
(100, 11)
(1, 100, 10)
(100, 11)


We've defined our weight and bias variables. They form the output layer that maps the n_hidden outputs of the LSTM cell to the 11 class scores.
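
As a shape check for this output layer, here is a NumPy sketch in which random numbers stand in for the LSTM outputs. The names mirror the notebook, but the values are hypothetical.

```python
import numpy as np

n_hidden, n_classes, batch = 128, 11, 100

rng = np.random.RandomState(0)
hidden = rng.randn(batch, n_hidden)        # stand-in for the LSTM outputs
weights = rng.randn(n_hidden, n_classes)   # same shape as the TensorFlow variable
biases = rng.randn(n_classes)

# One score (logit) per class for every row; the bias is broadcast across rows
logits = hidden @ weights + biases
print(logits.shape)
```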

The next_batch function splits the training data into batches of batch_size examples. The batch index passed as a parameter selects which slice of the data is returned.
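
The slicing itself can be tried on small made-up data. The sketch below is a simplified stand-in for next_batch, assuming 250 hypothetical examples so that the last slice comes out short:

```python
import numpy as np

batch_size = 100
data = np.arange(250 * 10, dtype='float32').reshape(1, 250, 10)  # hypothetical inputs
labels = np.arange(250)                                          # hypothetical labels

def simple_batch(batch_index):
    # Plain slice; the examples lie on axis 1 of `data` and axis 0 of `labels`
    start = batch_index * batch_size
    end = start + batch_size
    return data[:, start:end], labels[start:end]

for i in range(3):
    bx, by = simple_batch(i)
    print(i, bx.shape, by.shape)
```

The third slice holds only 50 examples because NumPy clips a slice at the end of the array. That short batch is the case the first branch of next_batch handles, by appending the first full batch of the next pass over the data.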

In [12]:
def RNN(x, weights, biases):
    
    # Define a lstm cell with tensorflow
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = tf.nn.dynamic_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear output layer applied to outputs[-1], the last entry along the first (batch) axis
    return tf.matmul(outputs[-1], weights) + biases

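'outputs[-1]' takes the last entry along the first axis of the outputs tensor. tf.nn.dynamic_rnn returns outputs with shape [batch, time, n_hidden] by default, and our batch dimension is 1, so outputs[-1] is a [time, n_hidden] matrix with one row of LSTM output per example.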
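
NumPy indexing follows the same rule, so the difference between indexing the first axis and indexing the time axis can be checked on a small array. The shapes below are hypothetical and chosen only to keep the output readable.

```python
import numpy as np

batch, time_steps, n_hidden = 1, 5, 3
outputs = np.arange(batch * time_steps * n_hidden).reshape(batch, time_steps, n_hidden)

last_batch_entry = outputs[-1]    # shape (time_steps, n_hidden): every step of the last batch entry
last_time_step = outputs[:, -1]   # shape (batch, n_hidden): only the final step of every entry

print(last_batch_entry.shape, last_time_step.shape)
```

In this notebook the first form is the one in use, which is why the network returns one row of class scores per example.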

In [13]:
def train(x):
    # Logits from the network; softmax turns them into class probabilities
    pred = RNN(x,weights,biases)
    softmax = tf.nn.softmax(pred)
    index_of_max_prob = tf.argmax(softmax, 1)
    correct_labels = tf.argmax(y, 1)

    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=y) )
    optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)

    # Build the accuracy ops once, instead of adding new graph nodes at every display step
    correct = tf.equal(index_of_max_prob, correct_labels)
    accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

    hm_epochs = 200
    display_step = 20
    with tf.variable_scope('training'):
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            print("Before training|Prediction for first 10 sequence:",index_of_max_prob.eval({x:testX[:,0:10]}))
            for epoch in range(hm_epochs):
                epoch_loss = 0
                for batch_index in range(int(n_training/batch_size)):
                    epoch_x, epoch_y = next_batch(batch_index)
                    _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                    epoch_loss += c

                if epoch % display_step == 0:
                    print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)
                    print('Training accuracy:',accuracy.eval({x:trainX, y:trainY}))
                    print('Validation accuracy:',accuracy.eval({x:validX, y:validY}))
                    print('Test accuracy:',accuracy.eval({x:testX, y:testY}))
                    print("--------------------------------------------------------")

            # Printed after the loop: the last epoch (199) is not a display step
            print("Optimization finished!")
            print("After training|Prediction for first 10 sequence:",index_of_max_prob.eval({x:testX[:,0:10]}))
            print("Correct labels for first 10 sequence",correct_labels.eval({y:testY[:10]}))

We train our network and make predictions with it.
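
The accuracy reported during training compares the index of the largest score with the index of the 1 in the 1-hot label. A small NumPy sketch of that computation, with made-up scores for 4 examples and 3 classes:

```python
import numpy as np

logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 3.0, 0.5],
                   [1.0, 0.9, 0.8]])       # hypothetical scores: 4 examples, 3 classes
one_hot_labels = np.eye(3)[[0, 2, 1, 2]]   # hypothetical true classes

predicted = logits.argmax(axis=1)          # class with the highest score per example
actual = one_hot_labels.argmax(axis=1)     # position of the 1 in each label
accuracy = (predicted == actual).mean()    # fraction of matching rows
print(predicted, accuracy)
```

Three of the four predictions match here, so the accuracy is 0.75. Taking argmax of the raw scores gives the same result as taking it after softmax, because softmax preserves the ordering of the scores.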

In [14]:
train(x)

Before training|Prediction for first 10 sequence: [9 1 1 1 9 9 9 9 9 9]
Epoch 0 completed out of 200 loss: 67.9889593124
Training accuracy: 0.2035
Validation accuracy: 0.188889
Test accuracy: 0.18
--------------------------------------------------------
Epoch 20 completed out of 200 loss: 35.622640729
Training accuracy: 0.3035
Validation accuracy: 0.281111
Test accuracy: 0.27
--------------------------------------------------------
Epoch 40 completed out of 200 loss: 30.9267612696
Training accuracy: 0.3575
Validation accuracy: 0.343333
Test accuracy: 0.32
--------------------------------------------------------
Epoch 60 completed out of 200 loss: 28.0669082403
Training accuracy: 0.4235
Validation accuracy: 0.39
Test accuracy: 0.39
--------------------------------------------------------
Epoch 80 completed out of 200 loss: 26.2641035318
Training accuracy: 0.4785
Validation accuracy: 0.448889
Test accuracy: 0.42
--------------------------------------------------------
Epoch 100 completed