# A short & practical introduction to TensorFlow!

Part 6

Example of a single-layer bidirectional long short-term memory network (LSTM) trained with
[connectionist temporal classification](http://www.cs.toronto.edu/~graves/icml_2006.pdf) (CTC) to predict character sequences from nFeatures x nFrames
arrays of Mel-Frequency Cepstral Coefficients (MFCCs). This is test code meant to run on the
8-item data set in the "sample_data" directory, for those without access to TIMIT.

Author: [Jon Rein](https://github.com/jonrein/tensorflow_CTC_example) 

Adapted by: Pablo M. Olmos (olmos@tsc.uc3m.es)

Date: March 2017






In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf
from tensorflow.python.ops import ctc_ops as ctc  # CTC loss and decoders (also exposed as tf.nn.ctc_*)
from utils import load_batched_data  # local helper module that loads and batches the data

In [None]:
# Let's check which version of TensorFlow is installed. The scripts below use the TF 1.x graph API (tf.Session, tf.placeholder), so they should run with TF 1.0 and later 1.x releases

print(tf.__version__)

In [None]:
batchSize = 4  # number of sequences per batch

## Change these paths to the folder where you saved the provided dataset
INPUT_PATH = '../../DataSets/MCC_sample_data_phoneme_recog/mfcc' #directory of MFCC nFeatures x nFrames 2-D array .npy files
TARGET_PATH = '../../DataSets/MCC_sample_data_phoneme_recog/char_y/' #directory of nCharacters 1-D array .npy files


####Load data
print('Loading data')
batchedData, maxTimeSteps, totalN = load_batched_data(INPUT_PATH, TARGET_PATH, batchSize) 
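The training loop further down unpacks each entry of `batchedData` as `(batchInputs, (ixs, vals, shape), batchSeqLengths)` and feeds `batchInputs[i,:,:]` to time step `i`. A minimal sketch of what one such batch could look like, assuming that layout: `make_batch` is a hypothetical helper written for illustration, not the implementation inside `utils.load_batched_data`.

```python
import numpy as np

def make_batch(mfcc_list, label_list, max_steps):
    """Hypothetical helper, NOT the real utils.load_batched_data.

    mfcc_list:  list of nFeatures x nFrames arrays, one per utterance
    label_list: list of 1-D integer arrays of character indices
    max_steps:  number of time steps to pad to (maxTimeSteps over the whole dataset)
    Returns (inputs, (indices, values, shape), seq_lengths), with `inputs`
    time-major: [max_steps, batchSize, nFeatures].
    """
    batch_size = len(mfcc_list)
    n_features = mfcc_list[0].shape[0]
    seq_lengths = np.array([m.shape[1] for m in mfcc_list], dtype=np.int32)

    # Zero-pad each utterance to max_steps frames and store it time-major
    inputs = np.zeros((max_steps, batch_size, n_features), dtype=np.float32)
    for j, m in enumerate(mfcc_list):
        inputs[:m.shape[1], j, :] = m.T

    # Sparse targets: index [j, pos] holds character label_list[j][pos]
    indices = [[j, pos] for j, labels in enumerate(label_list) for pos in range(len(labels))]
    values = [int(c) for labels in label_list for c in labels]
    shape = [batch_size, max(len(labels) for labels in label_list)]
    return (inputs,
            (np.array(indices, dtype=np.int64), np.array(values, dtype=np.int32),
             np.array(shape, dtype=np.int64)),
            seq_lengths)
```

The `(indices, values, shape)` triple is exactly what the `targetIxs`, `targetVals` and `targetShape` placeholders turn into a `tf.SparseTensor`.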

In [None]:
####Learning Parameters
learningRate = 0.001
momentum = 0.9
nEpochs = 200

####Network Parameters
nFeatures = 26  # 12 MFCCs + energy (13 values), plus their first derivatives
nHidden = 128   # memory cells per LSTM direction
nClasses = 28   # 27 characters, plus the "blank" for CTC (TF reserves the last index, 27, for the blank)

## Creating the computation graph

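We will create a bidirectional LSTM layer: a forward and a backward LSTM, each with 128 memory cells. On top of them, we use a fully connected softmax output layer (the softmax itself is applied inside the CTC loss, so the graph only computes the logits).

We use [CTC](http://www.cs.toronto.edu/~graves/icml_2006.pdf) to define a loss function that we can minimize by gradient descent. This loss is already implemented in TensorFlow ([CTC documentation](https://www.tensorflow.org/versions/r0.10/api_docs/python/nn/conectionist_temporal_classification__ctc_)); the notebook imports it from `tensorflow.python.ops.ctc_ops`, and it is also exposed as `tf.nn.ctc_loss`.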
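To make concrete what the CTC loss measures: it is $-\log p(\boldsymbol{l}\mid\boldsymbol{x})$, where $p(\boldsymbol{l}\mid\boldsymbol{x})$ sums the probabilities of every frame-level path that collapses to the target $\boldsymbol{l}$ once repeated symbols are merged and blanks are removed. The brute-force sketch below enumerates those paths for toy sizes only. It is an illustration, not how TensorFlow computes the loss (it uses the forward-backward recursion from the Graves et al. paper), and the function names are hypothetical. As in TensorFlow, the last class index is taken to be the blank.

```python
import itertools
import numpy as np

def collapse(path, blank):
    """CTC many-to-one map: merge repeated symbols, then remove blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_prob_bruteforce(probs, label):
    """p(label | x): sum over every frame-level path that collapses to `label`.

    probs: [T, C] per-frame softmax outputs; class C-1 is the blank (TF's convention).
    Enumerates all C**T paths, so it is only usable for toy sizes.
    """
    T, C = probs.shape
    blank = C - 1
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        if collapse(path, blank) == tuple(label):
            total += np.prod([probs[t, s] for t, s in enumerate(path)])
    return total
```

The corresponding CTC loss would be `-np.log(ctc_prob_bruteforce(probs, label))`. With $T=2$ frames the labeling `[0, 0]` gets probability 0: two identical consecutive characters need a blank between them, hence at least three frames.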

### LSTMs 

Recall the standard LSTM cell:


<img src="files/figLSTM.png">
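For reference, the cell computed by `lstm_cell` in the graph below can be written as follows. The weight names follow its code comments and the row-vector convention of `tf.matmul(i, ix)`, so the symbols may not match the figure exactly; as noted in its docstring, peephole connections are omitted.

\begin{align}
\boldsymbol{i}_t &= \sigma(\boldsymbol{x}_t\boldsymbol{W}^{ix} + \boldsymbol{h}_{t-1}\boldsymbol{W}^{ih} + \boldsymbol{b}_i)\\
\boldsymbol{f}_t &= \sigma(\boldsymbol{x}_t\boldsymbol{W}^{fx} + \boldsymbol{h}_{t-1}\boldsymbol{W}^{fh} + \boldsymbol{b}_f)\\
\boldsymbol{g}_t &= \tanh(\boldsymbol{x}_t\boldsymbol{W}^{gx} + \boldsymbol{h}_{t-1}\boldsymbol{W}^{gh} + \boldsymbol{b}_g)\\
\boldsymbol{o}_t &= \sigma(\boldsymbol{x}_t\boldsymbol{W}^{ox} + \boldsymbol{h}_{t-1}\boldsymbol{W}^{oh} + \boldsymbol{b}_o)\\
\boldsymbol{s}_t &= \boldsymbol{f}_t\odot\boldsymbol{s}_{t-1} + \boldsymbol{i}_t\odot\boldsymbol{g}_t\\
\boldsymbol{h}_t &= \boldsymbol{o}_t\odot\tanh(\boldsymbol{s}_t)
\end{align}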

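If a target label $\boldsymbol{y}_t^{(n)}$ were available for every frame, the unregularized cost function over $N$ training sequences would be

\begin{align}
J(\boldsymbol{\theta})=\frac{1}{N}\sum_{n=1}^N\sum_{t=1}^{T_n}d(\boldsymbol{y}_t^{(n)},\text{softmax}(\boldsymbol{W}_h \boldsymbol{h}_t^{(n)}+\mathbf{b}))
\end{align}
where $d(\cdot,\cdot)$ is the cross-entropy loss function and $T_n$ is the length of sequence $n$. Here we only have the character sequence of each utterance, not a frame-level alignment, which is why this framewise cost is replaced by the CTC loss.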
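A NumPy sketch of this framewise cost, assuming for simplicity that all $N$ sequences have the same length $T$ and that targets are given as class indices rather than one-hot vectors. The notebook itself does not use it (it trains with CTC), and `framewise_cost` is a hypothetical name.

```python
import numpy as np

def log_softmax(a, axis=-1):
    """Numerically stable log-softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=axis, keepdims=True))

def framewise_cost(H, Y, W_h, b):
    """Unregularized framewise cross-entropy J(theta).

    Simplification: all N sequences share the same length T (T_n = T).
    H:   [N, T, nHidden] hidden states h_t^(n)
    Y:   [N, T] integer targets y_t^(n) (class indices instead of one-hot vectors)
    W_h: [nClasses, nHidden], so W_h h_t is computed as H @ W_h.T
    b:   [nClasses]
    """
    N, T = Y.shape
    logp = log_softmax(H @ W_h.T + b)  # [N, T, nClasses]
    # d(y, p) = -log p[y]; pick logp[n, t, Y[n, t]] for every n, t
    picked = logp[np.arange(N)[:, None], np.arange(T)[None, :], Y]
    return -picked.sum() / N
```

With all-zero logits every frame contributes $\log(\text{nClasses})$, so $J = T\log(\text{nClasses})$, which is a handy sanity check.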

### Bi-directional LSTMs 

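A bidirectional LSTM runs a second, backward LSTM over the reversed sequence and combines both outputs at every frame:

\begin{align}
J(\boldsymbol{\theta})=\frac{1}{N}\sum_{n=1}^N\sum_{t=1}^{T_n}d(\boldsymbol{y}_t^{(n)},\text{softmax}(\boldsymbol{W}_h \boldsymbol{h}_t^{(n)}+\boldsymbol{W}_z \boldsymbol{z}_t^{(n)}+\mathbf{b})),
\end{align}
where $\boldsymbol{z}_t^{(n)}$ is the output at time $t$ of the backward LSTM, which reads the sequence from $t=T_n$ down to $t=1$, and $\boldsymbol{W}_z$ is its own output weight matrix (`b_w` in the code, while $\boldsymbol{W}_h$ corresponds to `w`).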
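A minimal NumPy sketch of how the two directions are combined at every frame. To keep it short, a plain tanh RNN step stands in for the LSTM cell, and both directions start from a zero state (the notebook instead starts the backward LSTM from the forward LSTM's final output and state). All names are hypothetical. It uses the row-vector convention of the TensorFlow code, so `W_h` and `W_z` play the roles of `w` and `b_w`.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    """Simplified tanh RNN step, standing in for lstm_cell to keep the sketch short."""
    return np.tanh(x @ Wx + h @ Wh)

def bidirectional_logits(X, fwd, bwd, W_h, W_z, b):
    """Per-frame logits h_t W_h + z_t W_z + b for one sequence (row-vector convention).

    X: [T, nFeatures]; fwd, bwd: (Wx, Wh) weight pairs; W_h, W_z: [nHidden, nClasses].
    """
    T = X.shape[0]
    n_hidden = fwd[1].shape[0]
    h, hs = np.zeros(n_hidden), []
    for t in range(T):              # forward direction: t = 1, ..., T
        h = rnn_step(X[t], h, *fwd)
        hs.append(h)
    z, zs = np.zeros(n_hidden), []
    for t in reversed(range(T)):    # backward direction: t = T, ..., 1
        z = rnn_step(X[t], z, *bwd)
        zs.append(z)
    zs = zs[::-1]                   # realign so that zs[t] is z_t
    return np.stack(hs) @ W_h + np.stack(zs) @ W_z + b
```

The `zs[::-1]` step mirrors `b_outputs=b_outputs[::-1]` in the graph: without it, frame $t$ would be paired with the backward output of frame $T-t+1$.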







In [None]:
####Define graph
print('Defining graph')
graph = tf.Graph()
with graph.as_default():

    

    #### We start by encoding the forward LSTM with nHidden cells
    
    
    # i(t) parameters
    # Input gate: input, previous output, and bias.
    # All weights are drawn from a truncated normal with mean 0 and stddev 0.05.
    ix = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^ix
    im = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^ih
    ib = tf.Variable(tf.zeros([1, nHidden]))  ## b_i
    
    # f(t) parameters
    # Forget gate: input, previous output, and bias.
    fx = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^fx
    fm = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^fh
    fb = tf.Variable(tf.zeros([1, nHidden]))  ## b_f
    
    # g(t) parameters
    # Candidate cell update: input, previous output, and bias.
    cx = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^gx
    cm = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^gh
    cb = tf.Variable(tf.zeros([1, nHidden]))  ## b_g
    
    # o(t) parameters
    # Output gate: input, previous output, and bias.
    ox = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^ox
    om = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^oh
    ob = tf.Variable(tf.zeros([1, nHidden]))  ## b_o
    
    # Non-trainable variables that carry h(t) and s(t) over from one session.run call
    # (i.e. one batch) to the next.
    saved_output = tf.Variable(tf.zeros([batchSize, nHidden]), trainable=False)  # h(t)
    saved_state = tf.Variable(tf.zeros([batchSize, nHidden]), trainable=False)   # s(t)
    
    ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  #### 
    
    # Definition of the cell computation.
    def lstm_cell(i, o, state):
        """Create an LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the peephole connections between the
        previous state and the gates."""
        
        input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
        forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
        update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb       
        state = forget_gate * state + input_gate * tf.tanh(update)    #tf.tanh(update) is g(t)
        output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
        return output_gate * tf.tanh(state), state      #h(t) is output_gate * tf.tanh(state) 
    
        
    #### Now the backward LSTM with nHidden cells
    
    # i(t) parameters
    # Input gate: input, previous output, and bias.
    b_ix = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^ix
    b_im = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^ih
    b_ib = tf.Variable(tf.zeros([1, nHidden]))  ## b_i
    
    # f(t) parameters
    # Forget gate: input, previous output, and bias.
    b_fx = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^fx
    b_fm = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^fh
    b_fb = tf.Variable(tf.zeros([1, nHidden]))  ## b_f
    
    # g(t) parameters
    # Candidate cell update: input, previous output, and bias.
    b_cx = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^gx
    b_cm = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^gh
    b_cb = tf.Variable(tf.zeros([1, nHidden]))  ## b_g
    
    # o(t) parameters
    # Output gate: input, previous output, and bias.
    b_ox = tf.Variable(tf.truncated_normal([nFeatures, nHidden], mean=0.0, stddev=0.05))  ## W^ox
    b_om = tf.Variable(tf.truncated_normal([nHidden, nHidden], mean=0.0, stddev=0.05))    ## W^oh
    b_ob = tf.Variable(tf.zeros([1, nHidden]))  ## b_o
    
    # Non-trainable variables that keep the final h(t) and s(t) of the backward loop.
    b_saved_output = tf.Variable(tf.zeros([batchSize, nHidden]), trainable=False)  # h(t)
    b_saved_state = tf.Variable(tf.zeros([batchSize, nHidden]), trainable=False)   # s(t)
    
    
    # Definition of the backward_cell computation.
    def lstm_cell_back(i, o, state):
        """Create an LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the peephole connections between the
        previous state and the gates."""
        
        input_gate = tf.sigmoid(tf.matmul(i, b_ix) + tf.matmul(o, b_im) + b_ib)
        forget_gate = tf.sigmoid(tf.matmul(i, b_fx) + tf.matmul(o, b_fm) + b_fb)
        update = tf.matmul(i, b_cx) + tf.matmul(o, b_cm) + b_cb       
        state = forget_gate * state + input_gate * tf.tanh(update)    #tf.tanh(update) is g(t)
        output_gate = tf.sigmoid(tf.matmul(i, b_ox) + tf.matmul(o, b_om) + b_ob)
        return output_gate * tf.tanh(state), state      #h(t) is output_gate * tf.tanh(state) 
    
    
    ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  #### 
    
    # Classifier weights and biases: map the forward output h(t) (via w) and the
    # backward output z(t) (via b_w) to nClasses logits
    w = tf.Variable(tf.truncated_normal([nHidden, nClasses], mean=0.0, stddev=0.05))
    b = tf.Variable(tf.zeros([nClasses]))
    b_w = tf.Variable(tf.truncated_normal([nHidden, nClasses], mean=0.0, stddev=0.05))

    ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  ####  #### 
    
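    # Now we define the placeholders for the input data:
    # one placeholder per time step, each holding a [batchSize, nFeatures] slice of MFCC frames
    train_data = list()
    for _ in range(maxTimeSteps):
        train_data.append(tf.placeholder(tf.float32, shape=[batchSize, nFeatures]))

    # Target character sequences as a sparse tensor: (indices, values, dense shape)
    targetIxs = tf.placeholder(tf.int64)
    targetVals = tf.placeholder(tf.int32)
    targetShape = tf.placeholder(tf.int64)
    targetY = tf.SparseTensor(targetIxs, targetVals, targetShape)
    # Number of valid frames in each sequence of the batch
    seqLengths = tf.placeholder(tf.int32, shape=(batchSize,))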
        
    # Given the input, we indicate how to compute the hidden states  
        
    # Unrolled forward LSTM loop.
    outputs = list()
    output = saved_output
    state = saved_state
    for i in train_data:
        output, state = lstm_cell(i, output, state)
        outputs.append(output)

        
    with tf.control_dependencies([saved_output.assign(output),saved_state.assign(state)]):

        # Unrolled backward LSTM loop, initialized with the last output and state of the forward loop.
        # The control dependencies make sure the forward loop's final output and state are
        # stored in saved_output/saved_state before any op created in this block runs.
        
        b_outputs = list()
        b_output = output
        b_state = state
        for i in reversed(train_data):
            b_output, b_state = lstm_cell_back(i, b_output, b_state)
            b_outputs.append(b_output)        
        
        b_outputs = b_outputs[::-1]  # reverse so that b_outputs[t] is the backward output z(t)

        with tf.control_dependencies([b_saved_output.assign(b_output),b_saved_state.assign(b_state)]):        

            logits = tf.reshape(tf.matmul(tf.concat(axis=0,values=outputs),w)+
                                tf.matmul(tf.concat(axis=0,values=b_outputs),b_w)+b,[-1,batchSize,nClasses]) 

            #https://www.tensorflow.org/versions/r0.050/api_docs/python/nn/conectionist_temporal_classification__ctc_
            loss = tf.reduce_mean(ctc.ctc_loss(inputs=logits, labels=targetY,  sequence_length=seqLengths))

    #### Optimizing
    optimizer = tf.train.MomentumOptimizer(learningRate, momentum).minimize(loss)


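    #### Evaluating
    # Framewise argmax of the first sequence in the batch, truncated to its length
    logitsMaxTest = tf.slice(tf.argmax(logits, 2), [0, 0], [seqLengths[0], 1])
    # Beam-search CTC decoding; [0][0] selects the best path as a SparseTensor
    predictions = tf.to_int32(ctc.ctc_beam_search_decoder(logits, seqLengths)[0][0])
    # Label error rate: total edit distance divided by the total number of target characters
    errorRate = tf.reduce_sum(tf.edit_distance(predictions, targetY, normalize=False)) / tf.to_float(tf.size(targetY.values))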
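The `logits` line in the graph stacks the per-step outputs, projects them, and reshapes the result to `[maxTimeSteps, batchSize, nClasses]`, the time-major layout that `ctc_loss` expects by default. The following NumPy sketch, with toy sizes, checks that this concat-then-reshape keeps each row attached to the right time step and sequence.

```python
import numpy as np

# Toy sizes standing in for maxTimeSteps, batchSize, nHidden, nClasses
T, B, H, C = 3, 4, 5, 2
rng = np.random.default_rng(1)
outputs = [rng.standard_normal((B, H)) for _ in range(T)]  # like `outputs`: one [B, H] array per step
w = rng.standard_normal((H, C))

stacked = np.concatenate(outputs, axis=0)   # [T*B, H]; rows: (t=0, all sequences), (t=1, ...), ...
logits = (stacked @ w).reshape(-1, B, C)    # [T, B, C], time-major

# Check: logits[t, j] is the projection of sequence j at time step t
for t in range(T):
    for j in range(B):
        assert np.allclose(logits[t, j], outputs[t][j] @ w)
```

Concatenating along axis 0 keeps time as the slowest-varying index, which is why reshaping with `[-1, batchSize, nClasses]` recovers a time-major tensor rather than a batch-major one.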

In [None]:
####Run session
with tf.Session(graph=graph) as session:
    print('Initializing')
    tf.global_variables_initializer().run()
    
    for epoch in range(nEpochs):
        print('Epoch', epoch+1, '...')
        batchErrors = np.zeros(len(batchedData))
        batchRandIxs = np.random.permutation(len(batchedData)) #randomize batch order
        for batch, batchOrigI in enumerate(batchRandIxs):
            batchInputs, batchTargetSparse, batchSeqLengths = batchedData[batchOrigI]
            batchTargetIxs, batchTargetVals, batchTargetShape = batchTargetSparse
            feedDict = {targetIxs: batchTargetIxs, targetVals: batchTargetVals,
                        targetShape: batchTargetShape, seqLengths: batchSeqLengths}
            for i in range(maxTimeSteps):
                feedDict[train_data[i]] = batchInputs[i,:,:]
                
            _, l, er, lmt,logits_out = session.run([optimizer, loss, errorRate, logitsMaxTest,logits], feed_dict=feedDict)
            print(np.unique(lmt)) #print unique argmax values of first sample in batch; should be blank for a while, then spit out target values
            if (batch % 1) == 0:  # prints every minibatch; use a larger modulus to print less often
                print('Minibatch', batch, '(data batch', batchOrigI, ') loss:', l)
                print('Minibatch', batch, '(data batch', batchOrigI, ') error rate:', er)
            batchErrors[batch] = er*len(batchSeqLengths)
        epochErrorRate = batchErrors.sum() / totalN
        print('Epoch', epoch+1, 'error rate:', epochErrorRate)
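The `errorRate` op divides the summed, unnormalized edit distances of a batch by the total number of target characters. A pure-Python sketch of the same quantity, assuming predictions and targets are given as sequences of character indices; the function names are hypothetical, and this is not how `tf.edit_distance` is implemented internally.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution (free if equal)
        prev = cur
    return prev[-1]

def label_error_rate(predictions, targets):
    """Sum of unnormalized edit distances divided by the total target length."""
    errors = sum(edit_distance(p, t) for p, t in zip(predictions, targets))
    return errors / sum(len(t) for t in targets)
```

In the loop above, each batch rate is then weighted by the number of sequences in the batch (`er*len(batchSeqLengths)`) before averaging over `totalN`.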

In [None]:
# Let's look at the framewise argmax for the first sequence of the last batch processed during training
np.argmax(logits_out[:,0,:],axis=1)
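The framewise argmax still contains repeated symbols and blanks. Collapsing it the CTC way (merge repeats, then drop blanks) gives the greedy, or best-path, transcription. This is a simpler approximation than the beam search used for `predictions` in the graph (TensorFlow also provides `tf.nn.ctc_greedy_decoder` for it); the helper name is hypothetical.

```python
import numpy as np

def greedy_ctc_decode(logits_t, blank):
    """Best-path decoding for one sequence: argmax per frame, merge repeats, drop blanks.

    logits_t: [T, nClasses] logits of a single sequence; blank: index of the blank class.
    """
    best = np.argmax(logits_t, axis=1)
    decoded, prev = [], None
    for s in best:
        if s != prev and s != blank:
            decoded.append(int(s))
        prev = s
    return decoded
```

For the sequence above, `greedy_ctc_decode(logits_out[:batchSeqLengths[0], 0, :], blank=nClasses - 1)` would return its character indices, since `logits_out` and `batchSeqLengths` both come from the last batch of the training loop.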

In [None]:
import matplotlib.pyplot as plt

In [None]:
%matplotlib notebook
plt.plot(logits_out[:,0,27],'k-',label='blank')
plt.plot(logits_out[:,0,10],'b-',label='Index 10')
plt.plot(logits_out[:,0,9],'r-',label='Index 9')
plt.plot(logits_out[:,0,26],'g-',label='Index 26')
plt.legend(loc=4)
plt.xlabel('Time Steps')
plt.ylabel('Output Logits')
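The plot above shows raw logits. Converting them to per-frame posteriors with a softmax puts all classes on a common scale (the values of each frame sum to one), which can make the blank-dominated frames and the character spikes easier to compare. A small helper, assuming NumPy:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)
```

For instance, `probs = softmax(logits_out[:, 0, :])` followed by `plt.plot(probs[:, 27], 'k-', label='blank')` plots the blank posterior instead of its logit.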

For more about the spikes that tend to appear at the RNN output when using CTC, see:

[Supervised Sequence Labelling](https://www.cs.toronto.edu/~graves/preprint.pdf)