# Temporal Action Segmentation in Videos Using Recurrent Neural Networks

Welcome! In this lab, you'll learn how to train a recurrent neural network on the [Breakfast dataset](http://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/) for the task of temporal action segmentation and recognition. Our goal is to predict one action label for each frame of a long untrimmed video showing a person preparing breakfast. We will go through all the steps: loading the data, building and training a model, calculating the accuracy, and making predictions. We will use the [TensorFlow library](https://github.com/tensorflow/tensorflow) to build and train our model.

## Breakfast Dataset Overview

The [Breakfast dataset](http://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/) is a video dataset containing 1712 long untrimmed videos, each showing a sequence of multiple actions. The actions are related to breakfast preparation and are performed by 52 different individuals in 18 different kitchens. The videos were recorded with multiple uncalibrated cameras whose positions vary with the location. In total there are ∼77 hours of video (> 4 million frames), at a resolution of 320×240 pixels and a frame rate of 15 fps.

For this lab, we will use precomputed features describing each frame of the video and our goal is to predict a label for the action performed by the person in each frame of a video.
![Example output](http://serre-lab.clps.brown.edu/wp-content/uploads/2012/04/juice_frame550.png)


### Installing software and importing libraries

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import logging
from numpy import array, vstack, sum, prod, argmax
import os
import time
from collections import namedtuple, defaultdict
import pdb

import tensorflow as tf
from tensorflow.python.ops import clip_ops, array_ops, nn_ops, math_ops
from tensorflow.python.framework import ops
from tensorflow.contrib.rnn import LSTMCell, GRUCell, MultiRNNCell

from datasets.breakfast import BreakfastDataset
from datasets.batch_generator import FrameSequenceBatchGenerator
from utils.preprocessing import pad_sequences, DataPreprocessor
from utils.metrics import per_frame_accuracy
from utils.tf_utils import debug_nans, to_categorical, clip_gradients, \
    stack_bidirectional_dynamic_rnn
from utils.plot_utils import plot_optimization_log_frame
from utils.my_io_utils import save_to_pickle
from utils.misc import frame_labels_to_segments

### 1) Loading data

# TODO: download data with command

In [None]:
training_annotations_json_file = '../data/breakfast/annotations_train_split0.json'
testing_annotations_json_file = '../data/breakfast/annotations_test_split0.json'
downsampling_factor = 15

# BreakfastDataset class implements methods: __len__(): returning the number of training/testing samples/videos
# and __getitem__: returning a dictionary with keys: 'video_name', 'feat', 'labels', 'frame_indices'
training_set = BreakfastDataset(dataset_path='../data/breakfast', downsampling_factor=downsampling_factor,
                 annotations_json_file=training_annotations_json_file)

testing_set = BreakfastDataset(dataset_path='../data/breakfast', downsampling_factor=downsampling_factor,
                 annotations_json_file=testing_annotations_json_file)

#### Let's look into one training sample

In [None]:
nb_training_samples = len(training_set)
print('Nb training videos: {:d}'.format(nb_training_samples))

In [None]:
training_sample_ind = 0
training_sample = training_set[training_sample_ind]
segs, seg_labels = frame_labels_to_segments(training_sample['labels'])
action_names = training_set.get_action_names()
action_names_sequence = [action_names[seg_label] for seg_label in seg_labels]

In [None]:
print('Training sample {} video name: {}'.format(training_sample_ind, training_sample['video_name']))

In [None]:
print('Training sample {} feature shape (nb_frames, feat_dim): {}'.format(training_sample_ind, str(training_sample['feat'].shape)))

In [None]:
print('Training sample {} per frame labels: {}'.format(training_sample_ind, str(training_sample['labels'])))

In [None]:
print('Training sample {} sequence of actions : {}'.format(training_sample_ind, str(action_names_sequence)))

In [None]:
print('Training sample {} frame indices: {}'.format(training_sample_ind, str(training_sample['frame_indices'])))

In [None]:
print('\n'.join(action_names))
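
The helper `frame_labels_to_segments` used above comes from the lab's `utils.misc` module, whose source is not shown here. The following is only a sketch of what such a conversion might look like, assuming it collapses runs of identical consecutive labels into half-open `(start, end)` frame ranges. The name `labels_to_segments_sketch` and the return format are assumptions made for illustration.

```python
from itertools import groupby


def labels_to_segments_sketch(frame_labels):
    """Collapse runs of equal consecutive labels into segments.

    Hypothetical stand-in for utils.misc.frame_labels_to_segments;
    the real helper may use a different return format.
    Returns (segments, segment_labels), where each segment is a
    half-open (start, end) pair of frame positions.
    """
    segments, segment_labels = [], []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((start, start + length))
        segment_labels.append(label)
        start += length
    return segments, segment_labels


segs_demo, seg_labels_demo = labels_to_segments_sketch([0, 0, 3, 3, 3, 1, 0])
print(segs_demo)        # [(0, 2), (2, 5), (5, 6), (6, 7)]
print(seg_labels_demo)  # [0, 3, 1, 0]
```

Mapping each entry of the segment labels through `action_names` then gives the ordered sequence of actions in the video.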

### 2) Preprocessing data

Preprocessing includes padding the feature sequences so that they all have the same length, and converting labels from an integer format (e.g., "2") to a [one-hot encoding](https://en.wikipedia.org/wiki/One-hot) (e.g., "0, 0, 1, 0, 0, 0, 0, 0, 0, 0"). We will define a class that implements a preprocess method with arguments features_lst and frame_labels_lst.
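
To make the two steps concrete, here is a small sketch in plain Python. It does not use the lab's `pad_sequences` or `to_categorical` helpers (their source is not shown), and the names `one_hot` and `pad_and_encode` are hypothetical. It assumes zeros are appended at the end of each sequence; the lab's helper may pad differently.

```python
def one_hot(label, nb_classes):
    """Return a list of length nb_classes with a single 1.0 at index `label`."""
    return [1.0 if c == label else 0.0 for c in range(nb_classes)]


def pad_and_encode(label_sequences, nb_classes, max_len):
    """One-hot encode each label sequence and pad it to max_len.

    Returns (encoded, mask): mask holds 1.0 for real frames and 0.0
    for padded frames. Sequences are assumed to be no longer than max_len.
    """
    encoded, mask = [], []
    for labels in label_sequences:
        nb_pad = max_len - len(labels)
        encoded.append([one_hot(y, nb_classes) for y in labels]
                       + [[0.0] * nb_classes for _ in range(nb_pad)])
        mask.append([1.0] * len(labels) + [0.0] * nb_pad)
    return encoded, mask


encoded_demo, mask_demo = pad_and_encode([[2, 0], [1, 1, 2]],
                                         nb_classes=3, max_len=4)
print(encoded_demo[0])  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(mask_demo)        # [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
```

The mask plays the role of the sample weights: it lets the loss ignore the padded frames later on.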

In [None]:
class RNNDataPreprocessor(DataPreprocessor):
    def __init__(self, preprocessor_params):
        """
        :param preprocessor_params: dict with keys:
                       'encoder_nb_classes' and 'max_nb_frames',
                       encoder_nb_classes: number of classes for encoder
                                          (number of action labels)
                       max_nb_frames: maximum number of frames

        """
        super().__init__(preprocessor_params)

    def preprocess(self, features_lst, frame_labels_lst):
        """
        Pads feature sequences and frame_labels to max_nb_frames
        Converts frame_labels to one-hot encoding

        :param features_lst: list with nb_samples elements. Each element
                              is a numpy array of size (timesteps, feat_dim)
        :param frame_labels_lst: a list with nb_samples elements.
                    Each element is a (nb_timesteps,) numpy array or list,
                    each element of which list is a list of the labels for
                    this timestep
        :return: feat: numpy array with shape (nb_samples, max_nb_frames,
                       feat_dim)
                 frame_labels: numpy array with shape
                              (nb_samples, max_nb_frames, encoder_nb_classes)
                 frame_sample_weights: numpy binary array with shape
                   (nb_samples, max_nb_frames, 1),
                   having zeroes at padded indices and ones elsewhere
                 frame_sequence_lengths: numpy array with shape = (nb_samples, )
                   with true length (nb_timesteps) per sample sequence
        """
        encoder_nb_classes = self.params['encoder_nb_classes']
        max_nb_frames = self.params['max_nb_frames']

        # ----------------------------------------------------------------------
        # Process features and frame_labels (Pad and one-hot encoding)
        frame_sequence_lengths = [x.shape[0] for x in features_lst]
        feat, frame_sample_weights = pad_sequences(
            features_lst, max_len=max_nb_frames,
            dtype='float32', value=0.)

        # Format labels
        # Go from an integer label y_t to a one-hot vector
        # (e.g. y_t = 2 -> [0, 0, 1, 0]).
        # Result: list of arrays, each of shape (timesteps, num_classes)
        frame_labels_lst = [to_categorical(y, encoder_nb_classes)
                            for y in frame_labels_lst]

        # Pad frame labels sequences
        frame_labels, _ = pad_sequences(frame_labels_lst, max_len=max_nb_frames,
                                        dtype='float32', value=0.)

        return feat, frame_labels, frame_sample_weights, \
            frame_sequence_lengths

In [None]:
nb_classes = len(action_names)
preprocessor_obj = RNNDataPreprocessor(
    preprocessor_params={
        'encoder_nb_classes': nb_classes,
    })

### 3) Define data generators, providing batches of data

We will now create the training and testing data generators. Each one implements the method `__next__()`, which yields one batch of data per call. A batch contains features: (batch_size, nb_timesteps, feat_dim), frame_labels: (batch_size, nb_timesteps, nb_classes), frame_sample_weights: (batch_size, nb_timesteps) and frame_sequence_lengths: (batch_size,).

In [None]:
batch_size = 32

training_batch_generator_obj = FrameSequenceBatchGenerator(batch_size, dataset_obj=training_set, preprocessor_obj=preprocessor_obj, nb_classes=nb_classes,
                 shuffle=True, seed=42)

In [None]:

testing_batch_generator_obj = FrameSequenceBatchGenerator(batch_size, dataset_obj=testing_set, preprocessor_obj=preprocessor_obj, nb_classes=nb_classes,
                 shuffle=False, seed=42)

In [None]:
max_nb_frames_train = training_batch_generator_obj.get_max_nb_frames()
print('Maximum number of frames in training videos: {:d}'.format(max_nb_frames_train))
max_nb_frames_test = testing_batch_generator_obj.get_max_nb_frames()
print('Maximum number of frames in testing videos: {:d}'.format(max_nb_frames_test))
max_nb_frames = max(max_nb_frames_train, max_nb_frames_test)
training_batch_generator_obj.configure_preprocessing(_max_nb_frames=max_nb_frames)
testing_batch_generator_obj.configure_preprocessing(_max_nb_frames=max_nb_frames)

feat_dim = training_batch_generator_obj.get_feat_dim()

### 4) Build tensorflow model

We will now declare our Recurrent Neural Network model. TensorFlow uses a dataflow graph to represent our computation in terms of the dependencies between individual operations. 

4a) First, we will set the values of our hyperparameters.

In [None]:
DEBUG = 0

# Output returned after running a forward pass through our model.
InferenceOutput = namedtuple(
    "InferenceOutput",
    "frame_logits frame_predictions frame_y_pred")

# Results on validation set
ValFrameResults = namedtuple(
    "ValFrameLevelResults",
    "val_frame_metric val_frame_loss val_frame_predictions val_frame_y_pred "
    "val_frame_y_true val_frame_sample_weights "
    "val_frame_logits"
)

# Results on training set
TrainFrameResults = namedtuple(
    "TrainFrameLevelResults",
    "train_frame_metric train_frame_loss train_frame_predictions "
    "train_frame_y_pred train_frame_y_true train_frame_sample_weights "
    "train_frame_logits"
)

# Output of rnn encoder
EncoderOutput = namedtuple(
    "EncoderOutput",
    "outputs final_state")


params = {
    "feat_dim": feat_dim,
    "encoder_nb_classes": nb_classes,
    "max_nb_frames": max_nb_frames,
    # Model parameters
    "encoder_nb_hidden_units": 256,
    "encoder_cell_type": 'gru',
    "encoder_is_bidirectional": True,
    "encoder_activation": 'tanh',
    "encoder_initializer": 'random_uniform',
    "encoder_init_scale": 0.1,
    "encoder_clip_gradients": 1,
    "encoder_nb_layers": 2,
    # Training/Optimizer parameters
    "dropout_rate": 0.3,
    "optimizer_name": "Adam",
    "learning_rate": 0.001,
    "nb_epochs": 50,
    "shuffle": True,
    "momentum": 0.0,
    "decay_period": 10.0,
    "decay_rate": 0.5,
    "nesterov": False,
    "seed": 42,
    "log_dir": '../data/breakfast/results',
}

if params['encoder_initializer'] == 'None':
    params['encoder_initializer'] = None
if params['encoder_clip_gradients'] == -1:
    params['encoder_clip_gradients'] = None
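
The parameters `decay_period` and `decay_rate` are later passed to `tf.train.exponential_decay` with `staircase=True`. As a rough illustration of what that schedule does to the learning rate, the sketch below reimplements the documented formula in plain Python. The function name `staircase_exponential_decay` is hypothetical and is not part of the lab code.

```python
def staircase_exponential_decay(base_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates with a staircase schedule.

    Mirrors the documented formula of tf.train.exponential_decay with
    staircase=True: base_lr * decay_rate ** floor(step / decay_steps).
    """
    return base_lr * decay_rate ** (step // decay_steps)


for step in (0, 9, 10, 25, 100):
    lr = staircase_exponential_decay(0.001, step, decay_steps=10,
                                     decay_rate=0.5)
    print(step, lr)
```

With the values above, the rate is halved every 10 optimizer updates, so after 100 updates it is already below 1e-6. If the global step counts mini-batches rather than epochs, a decay per epoch would need `decay_period` to be a multiple of the number of batches per epoch; whether that is the intent is not stated here.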


4b) Then, we will define our network architecture, i.e., all the operations required to go from an input sequence of features to an output sequence of class predictions per frame.

In [None]:
def output_fc_layer(layer_inputs, initializer):
    # Output Fully Connected layer
    layer_name = "encoder_time_fully_connected"
    with tf.name_scope(layer_name):
        # Apply dense layer to each timestep
        # kernel: (nb_hidden_states , nb_classes)
        # bias: (nb_classes,)

        unstacked = tf.unstack(layer_inputs, axis=1)
        dense_res = [tf.layers.dense(
            inputs=s, units=params['encoder_nb_classes'],
            activation=None,
            use_bias=True, kernel_initializer=initializer,
            bias_initializer=tf.zeros_initializer(),
            kernel_regularizer=None, bias_regularizer=None,
            activity_regularizer=None, trainable=True,
            name=layer_name, reuse=(i != 0))
            for (i, s) in enumerate(unstacked)]

        # output: 3D Tensor (batch_size, nb_timesteps,
        # nb_classes)
        frame_logits = tf.stack(dense_res, axis=1)
        frame_logits = debug_nans(frame_logits, "frame_logits", debug=DEBUG)
        print("Logits static shape: ",
              frame_logits.shape)
        frame_predictions = tf.nn.softmax(frame_logits)
        frame_y_pred = tf.cast(tf.argmax(frame_logits, axis=-1), tf.int32)

        # Create summaries for tensorboard visualization
        fc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                    layer_name)
        tf.summary.histogram('kernel', fc_vars[0])
        tf.summary.histogram('bias', fc_vars[1])
        tf.summary.histogram('frame_logits', frame_logits)
        tf.summary.histogram('frame_predictions', frame_predictions)

        return frame_logits, frame_predictions, frame_y_pred
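
The output layer above applies one dense layer, with shared weights, to every timestep and then takes the argmax over classes. A plain-Python sketch of that idea follows; the names `time_distributed_dense` and `argmax_index` and the toy weights are made up for illustration and are not the lab's implementation.

```python
def time_distributed_dense(sequence, kernel, bias):
    """Apply the same affine map to every timestep of one sequence.

    sequence: [nb_timesteps][nb_hidden]
    kernel:   [nb_hidden][nb_classes]
    bias:     [nb_classes]
    Returns logits of shape [nb_timesteps][nb_classes].
    """
    nb_classes = len(bias)
    return [[sum(h * kernel[i][c] for i, h in enumerate(step)) + bias[c]
             for c in range(nb_classes)]
            for step in sequence]


def argmax_index(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


hidden_states = [[1.0, 0.0], [0.0, 0.2], [0.3, 0.3]]  # 3 timesteps, 2 units
kernel_demo = [[1.0, 0.0], [0.0, 1.0]]                # toy weights
bias_demo = [0.0, 0.5]

logits_demo = time_distributed_dense(hidden_states, kernel_demo, bias_demo)
print([argmax_index(frame) for frame in logits_demo])  # [0, 1, 1]
```

Because the kernel and bias are the same at every timestep, the number of parameters does not depend on the length of the video.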

In [None]:
def get_rnn_cell(rnn_cell_params, is_training, layer_name='rnn_cell'):
    with tf.variable_scope(layer_name):

        # Initialize cell weights/biases
        if rnn_cell_params['initializer'] is None:
            initializer = None
        elif rnn_cell_params['initializer'] == "random_uniform":
            init_scale = rnn_cell_params['init_scale'] / rnn_cell_params[
                'nb_hidden_units']
            initializer = tf.random_uniform_initializer(
                -init_scale, init_scale)
        elif rnn_cell_params['initializer'] == 'random_normal':
            initializer = tf.random_normal_initializer(
                0, rnn_cell_params['init_scale'])
        else:
            raise ValueError("Invalid initializer %s" %
                             rnn_cell_params['initializer'])
        
        # Get forward and backward cells for bidirectional RNN. Weights are not shared.
        if rnn_cell_params['cell_type'] == 'lstm':
            # LSTM Cell
            # Default activation: tanh
            if rnn_cell_params['activation'] == 'tanh':
                activation = tf.nn.tanh
            else:
                activation = None

            cell_fw = LSTMCell(num_units=rnn_cell_params['nb_hidden_units'],
                               use_peepholes=False,
                               cell_clip=None, initializer=initializer,
                               num_proj=None, proj_clip=None,
                               forget_bias=1.0, state_is_tuple=True,
                               activation=activation)
            cell_bw = LSTMCell(num_units=rnn_cell_params['nb_hidden_units'],
                               use_peepholes=False,
                               cell_clip=None, initializer=initializer,
                               num_proj=None, proj_clip=None,
                               forget_bias=1.0, state_is_tuple=True,
                               activation=activation)
        elif rnn_cell_params['cell_type'] == 'gru':
            # GRU Cell
            # If bias_initializer is None, then it starts with bias
            # of 1.0
            # to not reset and not update.
            # Default activation: tanh
            if rnn_cell_params['activation'] == 'tanh':
                activation = tf.nn.tanh
            else:
                activation = None
            cell_fw = GRUCell(num_units=rnn_cell_params['nb_hidden_units'],
                              activation=activation,
                              reuse=None,
                              kernel_initializer=initializer,
                              bias_initializer=None)
            cell_bw = GRUCell(num_units=rnn_cell_params['nb_hidden_units'],
                              activation=activation,
                              reuse=None,
                              kernel_initializer=initializer,
                              bias_initializer=None)
        else:
            raise ValueError('Not supported cell type: %s' %
                             rnn_cell_params['cell_type'])

        # Add dropout
        output_keep_prob = 1 - rnn_cell_params['dropout_rate']
        dropout_cell_fw = tf.contrib.rnn.DropoutWrapper(
            cell_fw, input_keep_prob=1.0,
            output_keep_prob=tf.cond(
                is_training,
                lambda: tf.constant(output_keep_prob),
                lambda: tf.constant(1.0)),
            state_keep_prob=1.0,
            variational_recurrent=True, input_size=None,
            dtype=tf.float32, seed=None)
        dropout_cell_bw = tf.contrib.rnn.DropoutWrapper(
            cell_bw, input_keep_prob=1.0,
            output_keep_prob=tf.cond(
                is_training,
                lambda: tf.constant(output_keep_prob),
                lambda: tf.constant(1.0)),
            state_keep_prob=1.0,
            variational_recurrent=True, input_size=None,
            dtype=tf.float32, seed=None)

        return dropout_cell_fw, dropout_cell_bw
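
The wrappers above are created with `variational_recurrent=True`, which means the same dropout mask is reused at every timestep of a sequence instead of being resampled per step. The sketch below illustrates that behavior on plain Python lists. The function `variational_output_dropout` is hypothetical and only approximates what the TensorFlow wrapper does on cell outputs.

```python
import random


def variational_output_dropout(outputs, keep_prob, rng):
    """Apply one dropout mask to every timestep of a sequence.

    outputs: list of timesteps, each a list of hidden-unit activations.
    Kept units are scaled by 1 / keep_prob (inverted dropout), so the
    expected activation is unchanged.
    Returns (dropped_outputs, mask).
    """
    nb_units = len(outputs[0])
    # Sample the mask once per sequence, not once per timestep.
    mask = [1.0 / keep_prob if rng.random() < keep_prob else 0.0
            for _ in range(nb_units)]
    dropped = [[h * m for h, m in zip(step, mask)] for step in outputs]
    return dropped, mask


rng_demo = random.Random(42)
sequence_demo = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]  # 3 steps, 4 units
dropped_demo, mask_demo = variational_output_dropout(
    sequence_demo, keep_prob=0.7, rng=rng_demo)
print(mask_demo)     # one mask for the whole sequence
print(dropped_demo)  # the same units are zeroed at every timestep
```

At test time the keep probability is 1.0, so the mask is all ones and the outputs pass through unchanged, which is what the `tf.cond` on `is_training` achieves.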

In [None]:
def rnn_encoder(input_sequences, is_training, sequence_lengths, encoder_params):
    # layer_inputs: [batch_size, max_len, feat_dim]
    layer_inputs = input_sequences
    layer_inputs = debug_nans(layer_inputs, "layer_inputs", debug=DEBUG)
    print("Layer inputs static shape: ", layer_inputs.shape)
    layer_name = "encoder_rnn"

    # Get forward/backward cells for each layer
    cells_fw = []
    cells_bw = []
    for _ in range(encoder_params['nb_layers']):
        cell_fw, cell_bw = get_rnn_cell(encoder_params, is_training=is_training,
                                        layer_name=layer_name)
        cells_fw.append(cell_fw)
        cells_bw.append(cell_bw)

    if encoder_params['is_bidirectional']:
        # outputs: concatenated fw and bw hidden states of last layer
        # output_state_fw, output_state_bw = final_state
        # output_states_fw is the final states, one tensor per layer,
        # of the forward rnn.
        # output_states_bw is the final states, one tensor per layer,
        # of the backward rnn.

        outputs, output_states_fw, output_states_bw, _ = \
            stack_bidirectional_dynamic_rnn(
                cells_fw,
                cells_bw,
                inputs=layer_inputs,
                initial_states_fw=None,
                initial_states_bw=None,
                dtype=tf.float32,
                sequence_length=sequence_lengths,
                parallel_iterations=None,
                scope=layer_name)
        final_state = (output_states_fw, output_states_bw)
    else:
        multi_rnn_cell = MultiRNNCell(cells_fw, state_is_tuple=True)
        outputs, final_state = tf.nn.dynamic_rnn(
            cell=multi_rnn_cell, inputs=layer_inputs,
            sequence_length=sequence_lengths, initial_state=None,
            dtype=tf.float32, parallel_iterations=None,
            time_major=False, scope=layer_name)

    return EncoderOutput(
        outputs=outputs,
        final_state=final_state)

In [None]:
def _inference(input_sequences, is_training,
               sequence_lengths=None):
    """

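    Runs a forward pass: RNN encoder followed by a per-frame
    fully connected layer.

    :param input_sequences: object with attribute `features`, a tensor
                            of shape (batch_size, max_nb_frames, feat_dim)
    :param is_training: boolean scalar tensor, enables dropout when True
    :param sequence_lengths: object with attribute
                             `frame_sequence_lengths`, a tensor of shape
                             (batch_size,); required despite the None default
    :return: InferenceOutput namedtuple with frame_logits,
             frame_predictions and frame_y_pred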
    """

    input_features = input_sequences.features
    frame_sequence_lengths = sequence_lengths.frame_sequence_lengths

    # Encoder
    encoder_params = {
        "feat_dim": params['feat_dim'],
        "nb_classes": params['encoder_nb_classes'],
        "max_nb_frames": params['max_nb_frames'],
        # Model parameters
        "nb_hidden_units": params['encoder_nb_hidden_units'],
        "cell_type": params['encoder_cell_type'],
        "is_bidirectional": params['encoder_is_bidirectional'],
        "activation": params['encoder_activation'],
        "initializer": params['encoder_initializer'],
        "init_scale": params['encoder_init_scale'],
        "clip_gradients": params['encoder_clip_gradients'],
        "nb_layers": params['encoder_nb_layers'],
        "seed": params['seed'],
        "log_dir": params['log_dir'],
        "dropout_rate": params['dropout_rate']
    }

    encoder_output = rnn_encoder(
        input_sequences=input_features, is_training=is_training,
        sequence_lengths=frame_sequence_lengths,
        encoder_params=encoder_params)

    # Encoder Time Distributed Fully Connected Layer
    # for segmentation task
    frame_logits, frame_predictions, frame_y_pred = output_fc_layer(
        encoder_output.outputs, initializer=None)

    return InferenceOutput(
        frame_logits=frame_logits,
        frame_predictions=frame_predictions,
        frame_y_pred=frame_y_pred)

4c) Now we will define our loss and training operations. These ops are only used during training.

In [None]:
def _loss(logits, labels, sample_weights=None):
    """
    Args:
        logits: encoder output logits
                (3D Tensor [batch_size, max_len, nb_classes])
        labels: ground truth frame level labels in one-hot encoding
                [batch_size, max_len, nb_classes]
        sample_weights: 2D Tensor [batch_size, max_len] with
                        0 for all timesteps that are padded.
    Returns:
        loss_op: masked cross-entropy loss op
    """
    # [batch_size, max_nb_frames, nb_classes]
    frame_logits = logits.frame_logits
    # [batch_size, max_nb_frames, nb_classes]
    frame_labels = labels.frame_labels
    # [batch_size, max_nb_frames]
    frame_sample_weights = sample_weights.frame_sample_weights
    
    with tf.name_scope("cross_entropy_sequence_loss"):
        # Reshape logits, labels, sample_weights
        # [batch_size, max_len, nb_classes] ->
        # [batch_size*max_len, nb_classes]
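        # tf.Print is an identity op that only prints when its output is
        # used downstream, so keep the returned tensors (debug mode only).
        if DEBUG:
            frame_logits = tf.Print(
                frame_logits, [tf.shape(frame_logits)], "Logits shape in loss")
            frame_labels = tf.Print(
                frame_labels, [tf.shape(frame_labels)], "Labels shape in loss")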
        nb_classes = array_ops.shape(frame_logits)[2]
        flat_frame_logits = array_ops.reshape(frame_logits, [-1, nb_classes])
        flat_frame_labels = array_ops.reshape(frame_labels, [-1, nb_classes])
        # [batch_size, max_len] -> [batch_size*max_len]
        # Compute cross-entropy for each frame separately
        # flat_xent: 1D Tensor (batch_size*max_len,)
        flat_frame_sample_weights = array_ops.reshape(
            frame_sample_weights, [-1])
        flat_frame_xent = nn_ops.softmax_cross_entropy_with_logits(
            labels=flat_frame_labels, logits=flat_frame_logits)
        # Apply masking on xent, setting to zero the summands of
        # the cross-entropy which correspond to padded frames
        weighted_frame_xent = tf.multiply(flat_frame_xent, 
                                          flat_frame_sample_weights)
        # Compute average cross-entropy loss over batches and non-padded
        # timesteps
        frame_xent = math_ops.reduce_sum(weighted_frame_xent)
        nb_unpadded_frames = math_ops.reduce_sum(flat_frame_sample_weights)
        nb_unpadded_frames += 1e-12  # avoid division by 0 for all-0 weights
        frame_xent /= nb_unpadded_frames
        frame_loss_op = frame_xent
        tf.summary.scalar('frame_loss', frame_loss_op)

    return frame_loss_op
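
The loss above is a cross-entropy averaged over the unpadded frames only. The following plain-Python sketch reproduces the same computation on nested lists so the arithmetic can be checked by hand. The function name `masked_cross_entropy` and the toy inputs are illustrative assumptions, not part of the lab code.

```python
import math


def masked_cross_entropy(logits, one_hot_labels, weights):
    """Average softmax cross-entropy over frames with non-zero weight.

    logits, one_hot_labels: [nb_frames][nb_classes] nested lists
    weights: [nb_frames] list with 1.0 for real frames, 0.0 for padding
    """
    total = 0.0
    for frame_logits, frame_labels, w in zip(logits, one_hot_labels, weights):
        # Numerically stable log-sum-exp for the softmax normalizer
        m = max(frame_logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in frame_logits))
        xent = -sum(y * (v - log_z)
                    for y, v in zip(frame_labels, frame_logits))
        total += w * xent
    # Divide by the number of unpadded frames, not by all frames
    return total / (sum(weights) + 1e-12)


logits_demo = [[0.0, 0.0], [0.0, 0.0], [9.0, -9.0]]
labels_demo = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # last frame is padding
weights_demo = [1.0, 1.0, 0.0]
print(masked_cross_entropy(logits_demo, labels_demo, weights_demo))
# close to log(2), about 0.693
```

Without the mask, the sum would be divided by three frames instead of two, so videos with more padding would appear to have a lower loss.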

In [None]:
def _grad_vars():
    g_vars = tf.trainable_variables()
    return g_vars


def _train_op(loss):
    # Add a scalar summary for loss.
    tf.summary.scalar('loss', loss)

    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)

    # Learning rate scheduler
    if params['decay_rate'] > 0:
        learning_rate = tf.train.exponential_decay(
            learning_rate=params['learning_rate'],
            global_step=global_step,
            decay_steps=params['decay_period'],
            decay_rate=params['decay_rate'], staircase=True)
    else:
        learning_rate = params['learning_rate']
    # Create optimizer
    optimizer_name = params['optimizer_name']
    if optimizer_name == 'sgd':
        optimizer = tf.train.MomentumOptimizer(
            learning_rate=learning_rate,
            momentum=params['momentum'],
            use_nesterov=params['nesterov'])
    elif optimizer_name == 'rmsprop':
        optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
    elif optimizer_name == 'Adagrad':
        optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)
    elif optimizer_name == 'Adadelta':
        optimizer = tf.train.AdadeltaOptimizer(learning_rate=learning_rate)
    elif optimizer_name == 'Adam':
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    else:
        raise ValueError("Not supported optimizer: %s" % optimizer_name)

    # Use the optimizer to apply the gradients that minimize the loss
    # and to increment the global step counter as a single training step.
    # train_op = optimizer.minimize(loss, global_step=global_step,
    # name=name)
    g_vars = _grad_vars()

    # Compute gradients
    grads_and_vars = optimizer.compute_gradients(
        loss, var_list=g_vars)

    vars_with_grad = [v for g, v in grads_and_vars if g is not None]
    print("vars_with_grad: ", vars_with_grad)
    if not vars_with_grad:
        raise ValueError(
            "No gradients provided for any variable, "
            "check your graph for ops "
            "that do not support gradients, "
            "between variables %s and loss %s." %
            ([str(v) for _, v in grads_and_vars], loss))

    # Optionally clip gradients by global norm.
    if params['encoder_clip_gradients'] is not None:
        clipped_grads_and_vars = clip_gradients(
            grads_and_vars, params['encoder_clip_gradients'])
    else:
        clipped_grads_and_vars = grads_and_vars

    # Add histograms for variables, gradients and gradient norms
    for gradient, variable in clipped_grads_and_vars:
        if isinstance(gradient, ops.IndexedSlices):
            grad_values = gradient.values
        else:
            grad_values = gradient

        if grad_values is not None:
            var_name = variable.name.replace(":", "_")
            tf.summary.histogram("gradients/%s" % var_name,
                                 grad_values)
            tf.summary.scalar("gradient_norm/%s" % var_name,
                              clip_ops.global_norm([grad_values]))

    if params['encoder_clip_gradients'] is not None:
        for gradient, variable in grads_and_vars:
            if isinstance(gradient, ops.IndexedSlices):
                grad_values = gradient.values
            else:
                grad_values = gradient

            if grad_values is not None:
                var_name = variable.name.replace(":", "_")
                tf.summary.histogram("gradients_before_clip/%s" % var_name,
                                     grad_values)
                tf.summary.scalar("gradient_norm_before_clip/%s" % var_name,
                                  clip_ops.global_norm([grad_values]))

    # Create gradient updates.
    train_op = optimizer.apply_gradients(clipped_grads_and_vars,
                                         global_step=global_step,
                                         name="train")
    return train_op
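
The helper `clip_gradients` comes from the lab's `utils.tf_utils` module and, according to the comment in `_train_op`, clips by global norm. Its source is not shown, so the following is only a sketch of the standard global-norm rule as documented for `tf.clip_by_global_norm`, written in plain Python. The name `clip_by_global_norm_sketch` is hypothetical.

```python
import math


def clip_by_global_norm_sketch(gradients, clip_norm):
    """Rescale gradient vectors so their joint L2 norm is at most clip_norm.

    Follows the documented formula of tf.clip_by_global_norm:
    g * clip_norm / max(global_norm, clip_norm).
    Returns (clipped_gradients, global_norm).
    """
    global_norm = math.sqrt(sum(v * v for g in gradients for v in g))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[v * scale for v in g] for g in gradients], global_norm


clipped_demo, norm_demo = clip_by_global_norm_sketch(
    [[3.0, 0.0], [0.0, 4.0]], clip_norm=1.0)
print(norm_demo)     # 5.0
print(clipped_demo)  # approximately [[0.6, 0.0], [0.0, 0.8]]
```

All gradients are scaled by the same factor, so the update direction is preserved and only its length is limited. Gradients whose global norm is already below `clip_norm` are left unchanged.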

4d) Finally, we will declare placeholders for feeding data to our graph during training/testing.

In [None]:
def _placeholder_inputs():
    """Generate placeholder variables to represent the input tensors.
    These placeholders are used as inputs for the model.
    Returns: dictionary of placeholders with keys/values:
      'features_pl': Sequences placeholder.
      'frame_labels_pl': Frame labels placeholder.
      'frame_sample_weights_pl': Frame sample weight placeholder
      'frame_sequence_lengths_pl': Frame sequence lengths placeholder
    """

    features_placeholder = tf.placeholder(
        tf.float32,
        [None, params['max_nb_frames'], params['feat_dim']],
        name="features_pl")

    frame_labels_placeholder = tf.placeholder(
        tf.int32,
        [None, params['max_nb_frames'],
         params['encoder_nb_classes']],
        name="frame_labels_pl")
    
    frame_sample_weights = tf.placeholder(
        tf.float32,
        [None, params['max_nb_frames']],
        name="frame_sample_weights_pl")
    
    frame_sequence_lengths_placeholder = tf.placeholder(
        tf.int32,
        [None],
        name="frame_sequence_lengths_pl")

    is_training_placeholder = tf.placeholder(
        tf.bool, [],
        name="is_training_pl")

    placeholders = {
        'features_pl': features_placeholder,
        'frame_labels_pl': frame_labels_placeholder,
        'frame_sample_weights_pl': frame_sample_weights,
        'frame_sequence_lengths_pl':
            frame_sequence_lengths_placeholder,
        'is_training_pl':
            is_training_placeholder,
    }
    return placeholders


def _fill_feed_dict(generator, placeholders_dict, is_training):
    """Fills the feed_dict for training the given step.
    A feed_dict takes the form of:
    feed_dict = {
        <placeholder>: <tensor of values to be passed for placeholder>,
        ....
    }
    Args:
        generator: instance of SequenceBatchGenerator class
                   use methods .__next__() for getting next batch from
                   data and .get_all_samples() for getting
                   all samples
        placeholders_dict: dictionary with keys corresponding to placeholder
         names and values consisting of the actual placeholders
    Returns:
      feed_dict: The feed dictionary mapping from placeholders to values.
      values_dict: The same dictionary but with strings as keys and
                   not placeholders
    """

    # Create the feed_dict for the placeholders filled with the next
    # `batch size` examples.
    features, frame_labels, frame_sample_weights, \
        frame_sequence_lengths = next(generator)
    feed_dict = {
        placeholders_dict['features_pl']: features,
        placeholders_dict['frame_labels_pl']: frame_labels,
        placeholders_dict[
            'frame_sample_weights_pl']: frame_sample_weights,
        placeholders_dict[
            'frame_sequence_lengths_pl']: frame_sequence_lengths,
        placeholders_dict['is_training_pl']: is_training,
    }
    values_dict = {
        'features': features,
        'frame_labels': frame_labels,
        'frame_sample_weights': frame_sample_weights,
        'frame_sequence_lengths': frame_sequence_lengths,
        'is_training': is_training,
    }


    return feed_dict, values_dict

### 5) Functions for launching graph and training model

In [None]:
# Building, launching and executing operations on the graph
def _do_train_eval(sess, train_op, loss, eval_metric, predictions,
                   y_pred, training_generator, validation_generator,
                   placeholders_dict, summary, summary_writer, logits):
    """Performs training
    Args:
      sess: The session in which the model has been initialized
      train_op: Training operation
      loss: namedtuple ("frame_loss")
      eval_metric: namedtuple ("frame_eval_metric")
      predictions: namedtuple ("frame_predictions")
      y_pred: namedtuple ("frame_y_pred")
      training_generator: instance of class SequenceBatchGenerator
      validation_generator: instance of class SequenceBatchGenerator
      placeholders_dict: dictionary of placeholders
      summary_writer, summary
      logits: namedtuple ("frame_logits")
    Returns:
        avg_metric: average eval metric evaluated over all validation
                    sequences ("frame_avg_metric")
        train_loss: average training loss over all training sequences
                  ("train_frame_loss")
        predictions: 3D Tensor [nb_train_samples, max_len, nb_classes]
                  ("frame_predictions")
        logits: 3D Tensor [nb_train_samples, max_len, nb_classes]
    """

    frame_loss = loss.frame_loss
    frame_eval_metric = eval_metric.frame_eval_metric
    frame_predictions = predictions.frame_predictions
    frame_logits = logits.frame_logits
    frame_y_pred = y_pred.frame_y_pred

    # Start the training loop.
    nb_samples = len(training_generator)
    nb_steps = training_generator.steps
    logging.info('Nb of mini-batches per epoch: %d', nb_steps)

    # Train
    epoch = 0
    avg_frame_metric = 0
    train_frame_loss = 0
    train_frame_loss_sum = 0
    train_frame_y_true = []
    train_frame_y_pred = []
    train_frame_sample_weights = []

    val_frame_res = None

    start_time = time.time()

    optimization_log = defaultdict(list)
    for iteration in range(params['nb_epochs'] * nb_steps):
        logging.debug("Iteration: %d", iteration)
        if iteration % nb_steps == 0:
            epoch += 1

            train_frame_loss_sum = 0
            train_frame_y_true = []
            train_frame_y_pred = []
            train_frame_sample_weights = []

            start_time = time.time()

        # Fill a feed dictionary with the batch of training data
        # for this particular training step.
        training_feed_dict, training_values_dict = _fill_feed_dict(
            training_generator, placeholders_dict, is_training=1)

        # Run one step of the model. The return value of `train_op` is
        # discarded; the remaining values are the loss, predictions,
        # logits and predicted labels for the current batch. To inspect
        # other Ops or variables, add them to the list passed to
        # sess.run(); their values are returned in the same order.

        # Train with one batch and evaluate
        sess_run_ops = [train_op, frame_loss,
                        frame_predictions, frame_logits, frame_y_pred]
        sess_run_res = sess.run(sess_run_ops, feed_dict=training_feed_dict)
        (_, train_frame_loss_batch,
         train_frame_predictions_batch, train_frame_logits_batch,
         train_frame_y_pred_batch) = sess_run_res

        train_frame_loss_sum += train_frame_loss_batch

        train_frame_y_pred.append(train_frame_y_pred_batch)
        train_frame_y_true.append(argmax(
            training_values_dict['frame_labels'], axis=-1))
        train_frame_sample_weights.append(
            training_values_dict['frame_sample_weights'])

        # Write the summaries, print an overview and evaluate the model
        # at the end of every epoch
        if (iteration + 1) % nb_steps == 0:
            duration = time.time() - start_time
            # Results aggregation over all batches
            train_frame_loss = float(train_frame_loss_sum) / nb_steps

            # train_frame_y_pred: (nb_samples, nb_timesteps)
            train_frame_y_pred = vstack(train_frame_y_pred)
            # train_frame_y_true: (nb_samples, nb_timesteps)
            train_frame_y_true = vstack(train_frame_y_true)
            train_frame_sample_weights = vstack(train_frame_sample_weights)
            avg_frame_metric = frame_eval_metric(
                train_frame_y_true, train_frame_y_pred,
                sample_weights=train_frame_sample_weights)

            logging.info('Epoch %d: '
                         'nb_train_samples: %d, train_frame_metric: %f, '
                         'train_frame_loss: %f, '
                         'duration: %f',
                         epoch,  nb_samples,
                         avg_frame_metric,
                         train_frame_loss,
                         duration)

            optimization_log['train_frame_loss'].append(train_frame_loss)
            optimization_log['train_frame_metric'].append(avg_frame_metric)

            # Update the events file.
            summary_str = sess.run(summary,
                                   feed_dict=training_feed_dict)
            summary_writer.add_summary(summary_str, iteration)
            summary_writer.flush()

            # Evaluate on the validation data if a generator was provided
            if validation_generator is not None:
                val_frame_res = _do_eval(
                    sess=sess, loss=loss, eval_metric=eval_metric,
                    predictions=predictions, y_pred=y_pred,
                    generator=validation_generator,
                    placeholders_dict=placeholders_dict,
                    logits=logits)

                optimization_log['val_frame_loss'].append(
                    val_frame_res.val_frame_loss)
                optimization_log['val_frame_metric'].append(
                    val_frame_res.val_frame_metric)

            # TODO: Log output using output_logger if available

    train_frame_res = TrainFrameResults(
        train_frame_metric=avg_frame_metric,
        train_frame_loss=train_frame_loss,
        train_frame_predictions=None,
        train_frame_y_pred=train_frame_y_pred,
        train_frame_y_true=train_frame_y_true,
        train_frame_sample_weights=train_frame_sample_weights,
        train_frame_logits=None,
    )

    plot_optimization_log_frame(optimization_log,
                                params['log_dir'])
    save_to_pickle(os.path.join(params['log_dir'], 'optimization_log.p'),
                   optimization_log)

    return train_frame_res, val_frame_res


def _do_eval(sess, loss, eval_metric, predictions, y_pred,
             generator, placeholders_dict, logits):
    """Runs one evaluation against all data samples, batch by batch.
    Args:
      sess: The session in which the model has been trained.
      loss: namedtuple ("frame_loss")
      eval_metric: namedtuple ("frame_eval_metric")
      predictions: namedtuple ("frame_predictions")
      y_pred: namedtuple ("frame_y_pred")
      generator: instance of class SequenceBatchGenerator
      placeholders_dict: dictionary of placeholders
      logits: namedtuple ("frame_logits")
    Returns:
        frame_res: ValFrameResults with the fields
            val_frame_metric: eval metric computed over all evaluated
                              sequences
            val_frame_loss: loss averaged over all batches
            val_frame_predictions: 3D array
                                   [nb_val_samples, max_len, nb_classes]
            val_frame_y_pred: 2D array [nb_val_samples, max_len]
            val_frame_y_true: 2D array [nb_val_samples, max_len]
            val_frame_sample_weights: sample weights of all evaluated
                                      frames
            val_frame_logits: 3D array
                              [nb_val_samples, max_len, nb_classes]
    """

    frame_loss = loss.frame_loss
    frame_eval_metric = eval_metric.frame_eval_metric
    frame_predictions = predictions.frame_predictions
    frame_logits = logits.frame_logits
    frame_y_pred = y_pred.frame_y_pred

    # nb_samples: number of samples to evaluate against
    # nb_steps: number of steps until we see all nb_samples
    nb_samples = len(generator)
    nb_steps = generator.steps

    # And run one epoch of eval.

    val_frame_loss_sum = 0
    val_frame_predictions = []
    val_frame_logits = []
    val_frame_y_pred = []
    val_frame_y_true = []
    val_frame_sample_weights = []

    for step in range(nb_steps):

        feed_dict, values_dict = _fill_feed_dict(
            generator, placeholders_dict, is_training=0)

        # Frame segmentation evaluation
        val_frame_loss_batch, \
            val_frame_predictions_batch, val_frame_logits_batch, \
            val_frame_y_pred_batch = sess.run(
                [frame_loss, frame_predictions, frame_logits, frame_y_pred],
                feed_dict=feed_dict)

        val_frame_loss_sum += val_frame_loss_batch
        val_frame_predictions.append(array(val_frame_predictions_batch))
        val_frame_logits.append(array(val_frame_logits_batch))
        val_frame_y_pred.append(val_frame_y_pred_batch)
        val_frame_y_true.append(argmax(values_dict['frame_labels'], axis=-1))
        val_frame_sample_weights.append(values_dict['frame_sample_weights'])

    # Results aggregation over all batches
    val_frame_loss = float(val_frame_loss_sum) / nb_steps
    val_frame_predictions = vstack(val_frame_predictions)
    val_frame_logits = vstack(val_frame_logits)
    val_frame_y_pred = vstack(val_frame_y_pred)
    val_frame_y_true = vstack(val_frame_y_true)
    val_frame_sample_weights = vstack(val_frame_sample_weights)

    avg_frame_metric = frame_eval_metric(
        val_frame_y_true, val_frame_y_pred,
        sample_weights=val_frame_sample_weights)

    logging.info('nb_val_samples: %d, val_frame_metric: %f, '
                 'val_frame_loss: %f',
                 nb_samples, avg_frame_metric,
                 val_frame_loss)

    frame_res = ValFrameResults(
        val_frame_metric=avg_frame_metric,
        val_frame_loss=val_frame_loss,
        val_frame_predictions=val_frame_predictions,
        val_frame_y_pred=val_frame_y_pred,
        val_frame_y_true=val_frame_y_true,
        val_frame_sample_weights=val_frame_sample_weights,
        val_frame_logits=val_frame_logits,
    )

    return frame_res
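
The aggregation step in `_do_eval` collects the true labels, predicted labels and sample weights of every batch, stacks them along the sample axis, and then computes one metric over all frames. The sketch below illustrates that idea with plain Python lists instead of NumPy arrays. `masked_frame_accuracy` is a hypothetical stand-in for `per_frame_accuracy`, and the sketch assumes that padded frames carry a sample weight of 0. The batch contents are invented for illustration.

```python
def masked_frame_accuracy(y_true, y_pred, sample_weights):
    """Weighted fraction of frames whose predicted label is correct.

    All arguments are lists of per-video lists of equal shape. Frames
    with weight 0 (assumed to be padding) do not affect the result.
    """
    correct = 0.0
    total = 0.0
    for true_seq, pred_seq, weight_seq in zip(y_true, y_pred, sample_weights):
        for t, p, w in zip(true_seq, pred_seq, weight_seq):
            correct += w * (t == p)
            total += w
    return correct / total if total > 0 else 0.0


# Two hypothetical mini-batches, each with one video padded to 4 frames
batches = [
    {"y_true": [[0, 1, 1, 0]], "y_pred": [[0, 1, 2, 0]],
     "weights": [[1, 1, 1, 0]]},
    {"y_true": [[2, 2, 0, 0]], "y_pred": [[2, 2, 1, 1]],
     "weights": [[1, 1, 0, 0]]},
]

# Collect the per-batch results and stack them along the sample axis
all_true, all_pred, all_weights = [], [], []
for batch in batches:
    all_true.extend(batch["y_true"])
    all_pred.extend(batch["y_pred"])
    all_weights.extend(batch["weights"])

accuracy = masked_frame_accuracy(all_true, all_pred, all_weights)
print(accuracy)
```

Computing the metric once over the stacked results weights every valid frame equally, whereas averaging per-batch accuracies would favour batches with fewer valid frames.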

In [None]:
def fit_generator(training_generator, validation_generator=None):
    """Fit SupSeq2SeqTF model to the training data and validate it
    on the validation data.

    Args:
        training_generator: training data generator, an instance of
          class SequenceBatchGenerator or of any class implementing:
          __next__(): get the next batch
            (X, y, sample_weights, sequence_lengths),
          get_all_samples(): get all samples
            (X, y, sample_weights, sequence_lengths),
          __len__(): get the number of samples,
          steps: number of steps required to see all samples when they
            are loaded batch by batch,
          batch_size: batch size
        validation_generator: validation data generator
    Returns:
        val_frame_res: ValFrameResults for the last epoch
        train_frame_res: TrainFrameResults for the last epoch
    """

    nb_steps = training_generator.steps
    # Convert decay_period from epochs to iterations
    # (this updates the global params dictionary in place)
    params['decay_period'] *= nb_steps

    # Tell TensorFlow that the model will be built into the default Graph.
    with tf.Graph().as_default():
        tf.set_random_seed(params['seed'])

        # Generate placeholders for training/validation data
        placeholders_dict = _placeholder_inputs()

        # Build a Graph that computes predictions from the inference model.
        input_sequences_tuple = namedtuple("InputSequences",
                                           "features")
        input_sequences = input_sequences_tuple(
            features=placeholders_dict['features_pl']
        )
        sequence_lengths_tuple = namedtuple(
            "SequenceLengths",
            "frame_sequence_lengths")
        sequence_lengths = sequence_lengths_tuple(
            frame_sequence_lengths=placeholders_dict[
                'frame_sequence_lengths_pl'],
        )

        inference_output = _inference(
            input_sequences=input_sequences,
            is_training=placeholders_dict['is_training_pl'],
            sequence_lengths=sequence_lengths)

        # Parse inference output
        frame_logits = inference_output.frame_logits
        frame_predictions = inference_output.frame_predictions
        frame_y_pred = inference_output.frame_y_pred

        # Create inputs for _loss and _do_train_eval functions
        logits_tuple = namedtuple("logits",
                                  "frame_logits")
        logits = logits_tuple(
            frame_logits=frame_logits,
        )
        labels_tuple = namedtuple("labels",
                                  "frame_labels")
        labels = labels_tuple(
            frame_labels=placeholders_dict['frame_labels_pl'],
        )
        sample_weights_tuple = namedtuple(
            "sample_weights",
            "frame_sample_weights")
        sample_weights = sample_weights_tuple(
            frame_sample_weights=placeholders_dict[
                'frame_sample_weights_pl'],
        )

        # Add to the Graph the Ops for loss calculation.
        frame_loss = _loss(
            logits=logits, labels=labels,
            sample_weights=sample_weights
        )

        grad_vars = _grad_vars()
        # Add to the Graph the Ops that calculate and apply gradients.
        train_op = _train_op(frame_loss)

        # Add the Op to compare predictions to labels during evaluation.
        loss_tuple = namedtuple("loss",
                                "frame_loss")
        loss = loss_tuple(
            frame_loss=frame_loss,
        )
        eval_metric_tuple = namedtuple("eval_metric",
                                       "frame_eval_metric")
        eval_metric = eval_metric_tuple(
            frame_eval_metric=per_frame_accuracy,
        )
        predictions_tuple = namedtuple("predictions",
                                       "frame_predictions")
        predictions = predictions_tuple(
            frame_predictions=frame_predictions,
        )
        y_pred_tuple = namedtuple("y_pred",
                                  "frame_y_pred")
        y_pred = y_pred_tuple(
            frame_y_pred=frame_y_pred,
        )

        # Build the summary Tensor based on the TF collection of Summaries.
        summary = tf.summary.merge_all()

        # Add the variable initializer Op.
        init = tf.global_variables_initializer()
        init_l = tf.local_variables_initializer()

        # Create a saver for writing training checkpoints.
        saver = tf.train.Saver()

        # Create a session for running Ops on the Graph.
        sess = tf.Session()

        # Instantiate a SummaryWriter to output summaries and the Graph.
        summary_writer = tf.summary.FileWriter(params['log_dir'],
                                               sess.graph)

        # And then after everything is built:
        # Run the Op to initialize the variables.
        sess.run(init)
        sess.run(init_l)

        # Print number of trainable parameters
        nb_trainable_params = sum([prod(v.get_shape().as_list())
                                   for v in grad_vars])
        logging.info("Number of trainable params: %d", nb_trainable_params)
        # Train and evaluate at each epoch
        nb_training_steps_per_epoch = training_generator.steps
        train_frame_res, val_frame_res = \
            _do_train_eval(
                sess=sess, train_op=train_op, loss=loss,
                eval_metric=eval_metric, predictions=predictions,
                y_pred=y_pred,
                training_generator=training_generator,
                validation_generator=validation_generator,
                placeholders_dict=placeholders_dict,
                summary=summary, summary_writer=summary_writer,
                logits=logits)

        # Save checkpoint
        checkpoint_file = os.path.join(params['log_dir'],
                                       'multilabel_rnn_model.ckpt')
        nb_total_steps = params['nb_epochs'] * nb_training_steps_per_epoch
        saver.save(sess, checkpoint_file,
                   global_step=nb_total_steps - 1)

        # TODO: Log output using output_logger if available

        return val_frame_res, train_frame_res
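
`fit_generator` multiplies `params['decay_period']` by the number of steps per epoch because the optimizer counts iterations, not epochs. The sketch below shows the effect of that conversion on a learning rate schedule. It assumes a staircase exponential decay in the style of `tf.train.exponential_decay`. The actual schedule is defined in `_train_op`, which is not shown in this section, so the schedule and all values in `hypothetical_params` are illustrative only.

```python
def decayed_learning_rate(base_lr, decay_rate, decay_period_steps, step):
    """Staircase exponential decay: the rate drops once per decay period."""
    return base_lr * decay_rate ** (step // decay_period_steps)


# Hypothetical settings; decay_period is expressed in epochs
hypothetical_params = {
    "learning_rate": 0.01,
    "decay_rate": 0.5,
    "decay_period": 2,
}
nb_steps = 100  # hypothetical number of mini-batches per epoch

# Convert the decay period from epochs to iterations
decay_period_steps = hypothetical_params["decay_period"] * nb_steps


def lr_at(step):
    return decayed_learning_rate(
        hypothetical_params["learning_rate"],
        hypothetical_params["decay_rate"],
        decay_period_steps,
        step)


for step in (0, 199, 200, 400):
    print(step, lr_at(step))
```

The sketch keeps the converted value in a separate variable. Because `fit_generator` rescales `params['decay_period']` in place, calling it twice in the same notebook session would apply the conversion twice.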

### 6) Train and evaluate model

In [None]:
val_frame_res, train_frame_res = fit_generator(
    training_generator=training_batch_generator_obj,
    validation_generator=testing_batch_generator_obj)
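
The per-frame labels stored in `val_frame_res.val_frame_y_pred` can be turned into action segments by merging runs of identical consecutive labels. The sketch below shows one possible way to do this for a single video. `frames_to_segments` is a hypothetical helper, the example rows are invented, and the sketch assumes that padded frames sit at the end of the sequence with a sample weight of 0.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_weights):
    """Collapse per-frame labels into (label, start, end) segments.

    `end` is exclusive. Frames with weight 0 (assumed to be trailing
    padding) are dropped before the runs are computed.
    """
    valid = [label for label, w in zip(frame_labels, frame_weights) if w > 0]
    segments = []
    start = 0
    for label, run in groupby(valid):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical row of predicted labels and its sample weights
y_pred_row = [3, 3, 3, 7, 7, 1, 0, 0]
weights_row = [1, 1, 1, 1, 1, 1, 0, 0]

segments = frames_to_segments(y_pred_row, weights_row)
print(segments)
```

With real results, the same helper could be applied to each row of `val_frame_res.val_frame_y_pred` together with the matching row of `val_frame_res.val_frame_sample_weights`.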