# NIMA: Neural Image Assessment
replicate model for NIMA image assessment
see: 
* https://research.googleblog.com/2017/12/introducing-nima-neural-image-assessment.html
* https://arxiv.org/abs/1709.05424



## Table of Contents
<a href="#Install">Installation and setup</a><br>
<a href="#Conversion">`TFRecord` Conversion</a><br>
<a href="#Model">Model & Loss</a><br>
<a href="#Train">Training</a><br>

## Overview
### Components
tensorflow models for VGG16, Inception-v2, MobileNet
* https://github.com/tensorflow/models/tree/master/research/slim

training datasets
* AVA: A largescale database for aesthetic visual analysis
  * http://refbase.cvc.uab.es/files/MMP2012a.pdf
  * https://github.com/mtobeiyf/ava_downloader
  * https://mega.nz/#F!hIEhQTLY key `!RkOnZv8Fz7EbYreHsiEzvA` (32GB)
  * https://mega.nz/#!MUcXyBSB key `!0Q0Nq8_zBuSGiKmEHuKXKoAg8SDsB-21GwlJ22AJegU`
  
* TID2013: http://www.ponomarenko.info/tid2013.htm
  * http://www.ponomarenko.info/tid2013/tid2013.rar (1GB)

### Pipeline
* input images are rescaled to 256 × 256, and then a crop of size 224 × 224 crop is randomly extracted.
* random data augmentation in our training process is horizontal flipping of the image crops.

### Score
* mean quality score = `sum_N( s_i*p_i)`


### Loss function
* EMD (Earth Movers Distance) penalize mis-classifications according to class distances.
  * https://gist.github.com/mjdietzx/a8121604385ce6da251d20d018f9a6d6
  * https://www.tensorflow.org/api_docs/python/tf/distributions/Distribution
* CMD, cumulative distr function, N_ava=10, N_tid=9
    ```
    EMD(p,phat) = (1/N.*sum_k( abs(CDF_p(k)-CDF_phat(k)).^2 )).^0.5
    ```

### Training
* 80/20 train/test split on AVA and TID datasets

hyperparameters
  * `momentum=0.9, lambda=3e-7` 
  * `dropout=0.75` applied to last layer of baseline network

FC layer, n=10, followed by softmax activations
  * `lambda_fc=3e-6`

lambda `decay=0.95` after every 10 epochs

**???: how many epochs**


  

<a id='Install'></a>
## Installation and Setup

### Download datasets
* AVA dataset is 32GB, 256K images
* TID dataset is about 1GB, about 3K images


In [1]:
# set key paths
import os
if not 'HOME' in globals(): 
    HOME = %pwd
SLIM = HOME + '/models/research/slim'
CHECKPOINTS = os.path.join(HOME, 'ckpt')
TRAIN_LOG = os.path.join(HOME, 'log')
TMP = HOME + '/tmp'
TID=os.path.join(HOME, 'data', 'tid')
AVA=os.path.join(HOME, 'data', 'ava')

In [None]:
# !mkdir -p $AVA
# !mkdir -p $TID

### TID dataset

In [None]:
%cd $DATA
# download tid, 1GB 
!wget http://www.ponomarenko.info/tid2013/tid2013.rar

In [None]:
# rar archiver for python/conda, https://anaconda.org/pypi/unrar
!pip install -i https://pypi.anaconda.org/pypi/simple unrar
#!conda install -c mlgill rarfile 

### AVA dataset
The AVA dataset is available on MEGA.nz as a 32GB download split into 64 `7z` archive files. You must first register with MEGA and install the desktop client in order to get enough transfer bandwidth to download. Overall, it will take a few days. see https://github.com/mtobeiyf/ava_downloader.

Once the archive files are available, use a `7z` unarchiver to extract. The dataset is 255,000 JPG files in **one** directory.

The images are converted into `TF_Records` for learning. For NIMA, an optional step is to resize all images to `(256,256,3)` before creating `TF_Records` to minimize upload times for cloud=based training. On OSX, this can be done via the following shell script:

```
  export SOURCE=/Volumes/data/DATASETS/AVA/images
  export TARGET=/Volumes/data/DATASETS/AVA/images-256
  mkdir -p $TARGET
  for f in $SOURCE/*.jpg; do
    sips -z 256 256 --setProperty formatOptions high $f --out $TARGET
  done
```

### Tensorflow Models with Pre-trained Weights

In [None]:
%cd $HOME
!git clone https://github.com/mixuala/models  # https://github.com/tensorflow/models/

In [None]:
# check tf-slim install
%cd $SLIM
!python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"


In [None]:
%cd $SLIM
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time

from datasets import dataset_utils

# Main slim library
from tensorflow.contrib import slim

In [None]:
# download tensorflow checkpoints, i.e. pre-trained weights
!mkdir -p $CKPT
!mkdir -p $TMP
%cd $TMP

### download model checkpoint
# vgg16
!wget http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
!tar -xvf vgg_16_2016_08_28.tar.gz
%mv vgg_16.ckpt $CKPT
%rm vgg_16_2016_08_28.tar.gz

# inception-v2
!wget http://download.tensorflow.org/models/inception_v2_2016_08_28.tar.gz
!tar -xvf inception_v2_2016_08_28.tar.gz
%mv inception_v2.ckpt $CKPT
%rm inception_v2_2016_08_28.tar.gz

# MobileNet
!wget http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz
!tar -xvf mobilenet_v1_1.0_224_2017_06_14.tar.gz
%mv mobilenet_v1_1.0_224.ckpt.* $CKPT
%rm mobilenet_v1_1.0_224_2017_06_14.tar.gz  

<a id='Conversion'></a>
## Dataset Conversion to `TFRecord`

* Choosing shard values to get TFRecord files of size ~100MB see: https://www.tensorflow.org/performance/performance_guide#input_pipeline_optimization
* 20% of dataset reserved for testing


In [None]:
# config env
import os
if not 'HOME' in globals(): 
    HOME = %pwd
SLIM = HOME + '/models/research/slim'
CHECKPOINTS = os.path.join(HOME, 'ckpt')
TMP = HOME + '/tmp'
TID = os.path.join(HOME, 'data', 'tid')
AVA = os.path.join(HOME, 'data', 'ava')   # dev dataset  "/snappi.ai/tensorflow/nima/data/ava"
# AVA = "/Volumes/data/DATASETS/AVA"        # 32GB dataset


In [None]:
# convert dataset to `TFRecord`
%cd $SLIM
import tensorflow as tf
from datasets import dataset_utils, convert_nima_tid, convert_nima_ava

# convert_nima_tid.run(TID)
# convert_nima_ava.run(AVA, resized=True, shards=16)

In [None]:
# verify conversion by checking sample data
# NOTE: _mean_image_subtraction() will cause color shifts
%cd $SLIM
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time

from datasets import dataset_utils, nima_tid, nima_ava
import tensorflow as tf
from preprocessing import preprocessing_factory
import preprocessing.nima_preprocessing as nima_pre
from tensorflow.contrib import slim

import sys

use_resized_images = False
is_training = True
split_name = 'train' if is_training else 'validation'

nima_preprocessing = preprocessing_factory.get_preprocessing('nima')


with tf.Graph().as_default(): 
    dataset_name = "AVA"
    if dataset_name == "AVA":
        dataset = nima_ava.get_split(split_name, AVA, 
                                     resized=use_resized_images)
        data_provider = slim.dataset_data_provider.DatasetDataProvider(
            dataset, common_queue_capacity=32, common_queue_min=1)
        image, id, ratings, mean, stddev, height, width = data_provider.get(
            ['image', 'id', 'ratings', 'mean', 'stddev', 'height', 'width'])

    elif dataset_name == "TID":
        dataset = nima_tid.get_split(split_name, TID, file_pattern='nima_tid_%s_*.tfrecord')
        data_provider = slim.dataset_data_provider.DatasetDataProvider(
            dataset, common_queue_capacity=32, common_queue_min=1)
        image, id, mean, stddev, height, width = data_provider.get(
            ['image', 'id', 'mean', 'stddev', 'height', 'width'])
    else:
        exit
    
    
    # apply preprocessing
    image = nima_preprocessing(image, 224,224,
            is_training=is_training,
            resized=use_resized_images)

        
    with tf.Session() as sess:    
        with slim.queues.QueueRunners(sess):
            for i in range(4):
                if dataset_name=="AVA":
                    np_image, np_id, np_mean, np_stddev, np_ratings, np_h, np_w = sess.run(
                        [image, id, mean, stddev, ratings, height, width])
                    title = '%s: %d x %d, %f/%f, (%s) %s' % (
                        np_id.decode("utf-8"), np_h, np_w, np_mean, np_stddev, np_ratings, tf.shape(image))
                else:
                    np_image, np_id, np_mean, np_stddev, np_h, np_w = sess.run(
                        [image, id, mean, stddev, height, width])
                    title = '%s: %d x %d, %f/%f, %s' % (
                        np_id.decode("utf-8"), np_h, np_w, np_mean, np_stddev, tf.shape(image))
                    
                h,w, _ = np_image.shape
                
                plt.figure()
                plt.imshow(np_image.astype(np.uint8))
                plt.title(title)
                plt.axis('off')
                plt.show()

<a id='Model'></a>
## Model & Loss
Nima was developed to work with multiple pre-trained CNNs, including `Vgg16`, `Inception-v2`, and `MobileNet`



### Helper Functions

In [5]:
# from datasets.nima import load_batch


# modified from slim_walkthrough
%cd $SLIM
import tensorflow as tf
from preprocessing import preprocessing_factory
from tensorflow.contrib import slim

nima_preprocessing = preprocessing_factory.get_preprocessing('nima')

def load_batch(dataset, batch_size=32, height=224, width=224, 
            is_training=False, 
            resized=True,
            model="vgg16",
            label_name="ratings"):
    """Loads a single batch of data.
    
    Args:
      dataset: The dataset to load.
      batch_size: The number of images in the batch.
      height: The size of each image after preprocessing.
      width: The size of each image after preprocessing.
      is_training: Whether or not we're currently training or evaluating.
      resized: Whether the TFRecords were converted with images already resized to (256,256,3)
    
    Returns:
      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
      labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes.
    """
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8)
    image_raw, label = data_provider.get(['image', label_name])
    
    # Preprocess image for usage by the appropriate model.
    image = {
      'vgg16': nima_preprocessing(image_raw, height, width, is_training=is_training,
                              resized=resized),
      'inception': None,
      'mobilenet': None,
    }[model]

        
    # Preprocess the image for display purposes.
    image_raw = tf.expand_dims(image_raw, 0)
    image_raw = tf.image.resize_images(image_raw, [height, width])
    image_raw = tf.squeeze(image_raw)

    # Batch it up.
    images, images_raw, labels = tf.train.batch(
          [image, image_raw, label],
          batch_size=batch_size,
          num_threads=1,
          capacity=2 * batch_size)
    
    return images, images_raw, labels



/snappi.ai/tensorflow/nima/models/research/slim


The Nima model specifies a different `learning rate` for the `finetune` layers of the model, and also a `momentum` value, which was not part of the original `vgg_16.ckpt`. This seems to suggest that we need to use 2 separate optimizers with 2 separate gradient calculations.

In [6]:
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.training import training_util

def slim_learning_create_train_op_with_manual_grads( total_loss, optimizers, grads_and_vars,
            global_step=0,                                                            
#                     update_ops=None,
#                     variables_to_train=None,
            clip_gradient_norm=0,
            summarize_gradients=False,
            gate_gradients=1,               # tf.python.training.optimizer.Optimizer.GATE_OP,
            aggregation_method=None,
            colocate_gradients_with_ops=False,
            gradient_multipliers=None,
            check_numerics=True):
    """Runs the training loop
            modified from slim.learning.create_train_op() to work with
            a matched list of optimizers and grads_and_vars

    Returns:
        train_ops - the value of the loss function after training.
    """

    def transform_grads_fn(grads):
        if gradient_multipliers:
            with ops.name_scope('multiply_grads'):
                grads = multiply_gradients(grads, gradient_multipliers)

        # Clip gradients.
        if clip_gradient_norm > 0:
            with ops.name_scope('clip_grads'):
                grads = clip_gradient_norms(grads, clip_gradient_norm)
        return grads

    if global_step is None:
        global_step = training_util.get_or_create_global_step()

    assert len(optimizers)==len(grads_and_vars)

    ### order of processing:
    # 0. grads = opt.compute_gradients() 
    # 1. grads = transform_grads_fn(grads)
    # 2. add_gradients_summaries(grads)
    # 3. grads = opt.apply_gradients(grads, global_step=global_step) 

    grad_updates = []
    for i in range(len(optimizers)):
        grads = grads_and_vars[i]                               # 0. kvarg, from opt.compute_gradients()
        grads = transform_grads_fn(grads)                       # 1. transform_grads_fn()
        if summarize_gradients:
            with ops.name_scope('summarize_grads'):
                slim.learning.add_gradients_summaries(grads)    # 2. add_gradients_summaries()
        if i==0:
            grad_update = optimizers[i].apply_gradients( grads, # 3. optimizer.apply_gradients()
                        global_step=global_step)                #    update global_step only once
        else:
            grad_update = optimizers[i].apply_gradients( grads )
        grad_updates.append(grad_update)

    with ops.name_scope('train_op'):
        total_loss = array_ops.check_numerics(total_loss,
                                        'LossTensor is inf or nan')
        train_op = control_flow_ops.with_dependencies(grad_updates, total_loss)

    # Add the operation used for training to the 'train_op' collection    
    train_ops = ops.get_collection_ref(ops.GraphKeys.TRAIN_OP)
    if train_op not in train_ops:
        train_ops.append(train_op)

    return train_op

#### EMD loss function

The loss function uses a normalized `Earth Movers Distance` to penalize mis-classifications according to class distances. This is based on the difference between the `Cumulative Distribution Function (CDF)` of the `y` and `y_hat` values for the ratings distribution.

In [7]:
# %cd $HOME
# from nima_utils import NimaUtils

import tensorflow as tf
import numpy as np

""" _CDF in tensorflow """
#
# private methods used by class NimaUtils()
#
def _weighted_score(x):
  m,n = tf.convert_to_tensor(x).get_shape().as_list()
  return tf.multiply(x, tf.range(1, n+1 , dtype=tf.float32))  # (None,10)

def _CDF (k, x):
  # assert k <= tf.shape(x)[1]
  m,n = tf.convert_to_tensor(x).get_shape().as_list()
  w_score = _weighted_score(x)        # (None,10)
  cum_k_score = tf.reduce_sum(w_score[:,:k], axis=1)  # (None)
  total = tf.reduce_sum(w_score, axis=1)  # (None)
  cdf = tf.divide(cum_k_score, total)     # (None)
  return tf.reshape(cdf, [m,1] ) # (None,1)

def _cum_CDF (x):
  # y = tf.concat( [   _CDF(i,x)    for i in tf.range(1, tf.shape(x)[1]+1) ] )
  x = tf.to_float(x)
  m,n = tf.convert_to_tensor(x).get_shape().as_list()
  y = tf.concat( [_CDF(1,x),_CDF(2,x),_CDF(3,x),_CDF(4,x),_CDF(5,x),
      _CDF(6,x),_CDF(7,x),_CDF(8,x),_CDF(9,x),_CDF(10,x)], 
      axis=1 )
  return tf.reshape(y, [m,n] )

def _emd(y, y_hat):
    """Returns the earth mover distance between to arrays of ratings, 
    based on cumulative distribution function
    
    Args:
      y, y_hat: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      float 
    """
    r = 2.
    m,n = tf.convert_to_tensor(y).get_shape().as_list()
    N = tf.to_float(n)
    cdf_loss = tf.subtract(_cum_CDF(y), _cum_CDF(y_hat))
    emd_loss = tf.pow( tf.divide( tf.reduce_sum( tf.pow(cdf_loss, r), axis=1 ), N), 1/r)
  #   return tf.reshape(emd_loss, [m,1])
    return tf.reduce_mean(emd_loss)


class NimaUtils(object):
  """Help Class for Nima calculations
    NimaUtils.emd(y, y_hat) return float
    NimaUtils.score( y ) returns [[mean, std]]
  """
  @staticmethod
  def emd(y, y_hat):
    return _emd(y, y_hat)

  @staticmethod
  def mu(y, shape=None):
    """mean quality score for ratings
    
    Args:
      y, y_hat: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      array of [mean] floats for each row in y
    """
    y = tf.convert_to_tensor(y)
    m,n = y.get_shape().as_list()
    mean = tf.reduce_sum(_weighted_score(y), axis=1)/tf.reduce_sum(y, axis=1)
    return tf.reshape(mean, [m,1])
  
  @staticmethod
  def sigma(y, shape=None):
    """standard deviation of ratings
    
    Args:
      y, y_hat: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      array of [stddev] floats for each row in y
    """    
    y = tf.convert_to_tensor(y)
    m,n = y.get_shape().as_list()    
    mean = mu(y)
    s = tf.range(1, n+1 , dtype=tf.float32)
    p_score = tf.divide(y, tf.reshape(tf.reduce_sum(y, axis=1),[m,1]))
    stddev = tf.sqrt(tf.reduce_sum( tf.multiply(tf.square(tf.subtract(s,mean)),p_score), axis=1))
    return tf.reshape(stddev, [m,1])

  @staticmethod
  def score(y):
    """returns [mean quality score, stddev] for each row"""
    return tf.concat([mu(y), sigma(y)], axis=1)


### Hyperparameters

In [17]:
# Hyperparams
h_params = {
    "regularization": "?",
    "momentum": 0.9,    # weight and bias   
    "dropout_keep": 0.75,    # applied to last layer of baseline network only, scope="dropout7"
    "learning_rate": {  
        "baseline": 3e-7,    # baseline CNN layers
        "finetune": 3e-6,    # last fc layer only
    },
    "learning_rate_decay": 0.95,  # applied to both learning rates, after every 10 epochs
}

In [9]:
dataset_params = {
    'ava':{
        'path': AVA,
        'max_score': 10
    },
    'tid':{
        'path': TID,
        'max_score': 9
    },
}

### Vgg16
Finetune `Vgg16` for `Nima`, using AVA dataset
* the `baseline` model is `Vgg16` with the top (fc8) layer removed
* `finetune` with a `fully_connected` layer with `softmax` activations, `n=10` for AVA
* **???:** `n=9` for TID2013. TID2013 is based on ratings from 1-9, but these score distributions are approximated from a mean/stddev target value using `maximum entropy optimization`
* remove `Vgg16/fc8` and add `fc8` for 10 classes to learn ratings
* restore `vgg_16.ckpt` weights for all but last layer



In [10]:
# Nima Model based on Vgg16 for training against AVA dataset
%cd $SLIM
from nets import vgg
from tensorflow.contrib import slim

def nima_vgg16(inputs, dataset_params, h_params):
    #
    # load the model, use the default arg scope to configure the batch norm parameters.
    #
    net = { 
        "baseline": "baseline image classifier with last layer removed", 
        "finetune": "finetuning layer with softmax activations, n=10 for AVA, 9 for TID2013"
    }
    with slim.arg_scope(vgg.vgg_arg_scope()):
        # define baseline network with last layer removed
        net["baseline"], end_points = vgg.vgg_16(inputs, 
#                                      dropout_keep_prob=0.5,
                                     num_classes=None,
                                     )
        
        # define finetuning network, dropout, fc, softmax
        num_classes=dataset_params['ava']['max_score']
        dropout_keep_prob=h_params["dropout_keep"]
        net["finetune"] = slim.dropout( net["baseline"], dropout_keep_prob,
                                       is_training=is_training,
                                       scope='nima/dropout7' )
        net["finetune"] = slim.fully_connected( net["finetune"], num_classes,
                                               activation_fn=tf.nn.softmax,
                                               scope='nima/fc8' )
        end_points["nima/fc8"] = net["finetune"] = tf.squeeze( net["finetune"], [1, 2], name='nima/fc8/squeezed')
    return net, end_points

#     predictions = net["finetune"]

/snappi.ai/tensorflow/nima/models/research/slim


<a id='Train'></a>
## Training
Using `tf.slim.learning`-style training loop



In [18]:
%cd $HOME
# from nima_utils import NimaUtils

%cd $SLIM
import numpy as np
import os
import tensorflow as tf
from tensorflow.contrib import slim
from datasets import dataset_utils, nima_tid, nima_ava

# training params
TRAIN_LOG = os.path.join(HOME, 'log')
checkpoints_dir = CHECKPOINTS
log_dir = TRAIN_LOG
use_resized_images = True
is_training = True
split_name = 'train' if is_training else 'validation'

if not tf.gfile.Exists(log_dir):
    tf.gfile.MakeDirs(log_dir)

    
tf.reset_default_graph()
with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)

    #
    # prepare mini-batches
    #
    dataset = nima_ava.get_split(split_name, dataset_params["ava"]["path"], resized=use_resized_images)
    images, images_raw, labels = load_batch(dataset, 
                batch_size=32,
                is_training=is_training,
                resized=use_resized_images )


    
    #
    # load the model, use the default arg scope to configure the batch norm parameters.
    #
    #     net = { 
    #         "baseline": "baseline image classifier with last layer removed", 
    #         "finetune": "finetuning layer with softmax activations, n=10 for AVA, 9 for TID2013"
    #     }
    #
    net, end_points = nima_vgg16(images, dataset_params, h_params)
    predictions = net["finetune"]
#     print(end_points)

    
    #
    # define loss functions––include with slim.losses.get_total_loss()
    #
    emd_loss =  NimaUtils.emd(labels, predictions) 
    tf.losses.add_loss(emd_loss) # Letting TF-Slim know about the emd loss.
    total_loss = tf.losses.get_total_loss( add_regularization_losses=True )
    
    
    #
    # configure training loop MANUALLY & apply hyperparams:
    #   - exponential_decay of learning_rate
    #   - learning_rate by layers
    # 
    # apply learning rate decay    
    for k in h_params["learning_rate"]:
        h_params["learning_rate"][k] = tf.train.exponential_decay( h_params["learning_rate"][k],
                                    global_step, 10, h_params["learning_rate_decay"], staircase=True)

        
    #
    # configure training loop MANUALLY, apply learning rates by layer
    #
    #   see: https://stackoverflow.com/questions/34945554/how-to-set-layer-wise-learning-rate-in-tensorflow
    split_index = -2     # last layer weights & bias, count=2
    training = {"baseline":{}, "finetune":{}}
    # vars
    trainable = tf.trainable_variables()
    training["baseline"]["vars"] = trainable[:split_index]
    training["finetune"]["vars"] = trainable[-split_index:]
    # grads
    gradients = tf.gradients( total_loss, trainable )
    training["baseline"]["grads"] = gradients[:split_index]
    training["finetune"]["grads"] = gradients[-split_index:]
    # optimizers
    
    training["baseline"]["opt"] = tf.train.GradientDescentOptimizer(
                                    learning_rate=h_params["learning_rate"]["baseline"])
    # ???: Does this really work as I expect? 
    # I only want to apply momentum to the finetune layers, and the vgg_16.ckpt did not include momentum...
    training["finetune"]["opt"] = tf.train.MomentumOptimizer(
                                    learning_rate=h_params["learning_rate"]["finetune"],
                                                   momentum=h_params["momentum"]
                                    )

    grads_and_vars = [ zip(training["baseline"]["grads"], training["baseline"]["vars"]), 
                       zip(training["finetune"]["grads"], training["finetune"]["vars"]) ]
    optimizers = [ training["baseline"]["opt"], training["finetune"]["opt"] ]

    train_op = slim_learning_create_train_op_with_manual_grads(total_loss, 
                                                               optimizers, grads_and_vars, 
                                                               global_step=global_step)

    
    
    # restore Vgg16 to net["baseline"], excludes last layer, fc8
    init_fn = slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'vgg_16.ckpt'),
        slim.get_variables_to_restore(exclude=['fc8']),
        ignore_missing_vars=True
        )
    
    
    # start training
    slim.learning.train(train_op, log_dir, 
                        init_fn=init_fn,
                        global_step=global_step,
                        number_of_steps=1,
                        save_summaries_secs=300,
                        save_interval_secs=600                       
                       )


/snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima/models/research/slim
>> TFRecord_dir=/snappi.ai/tensorflow/nima/data/ava/TFRecords_resized, 
>> pattern=nima_ava_train_*.tfrecord
INFO:tensorflow:Restoring parameters from /snappi.ai/tensorflow/nima/log/model.ckpt-200
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Recording summary at step 200.
INFO:tensorflow:Variable/sec: 0
INFO:tensorflow:global step 201: loss = 0.8296 (53.901 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.


## Comments
I'm not sure if my work is correct because the loss does not seem to be decreasing. It seems the learning rate apecified by the paper is very low. But that may be the case for finetuning.

example:

```
cwd= /snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima/models/research/slim
>> TFRecord_dir=/snappi.ai/tensorflow/nima/data/ava/TFRecords_resized, 
>> pattern=nima_ava_train_*.tfrecord
WARNING:tensorflow:Variable Variable missing in checkpoint /snappi.ai/tensorflow/nima/ckpt/vgg_16.ckpt
  ...
WARNING:tensorflow:Variable nima/fc8/biases/Momentum missing in checkpoint /snappi.ai/tensorflow/nima/ckpt/vgg_16.ckpt
INFO:tensorflow:Restoring parameters from /snappi.ai/tensorflow/nima/ckpt/vgg_16.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 0.2411 (52.944 sec/step)
INFO:tensorflow:global step 2: loss = 0.2706 (47.458 sec/step)
INFO:tensorflow:global step 3: loss = 0.2439 (46.986 sec/step)
INFO:tensorflow:global step 4: loss = 0.2590 (46.905 sec/step)
INFO:tensorflow:global step 5: loss = 0.2604 (46.879 sec/step)
INFO:tensorflow:global step 6: loss = 0.2965 (47.169 sec/step)
INFO:tensorflow:Recording summary at step 6.
INFO:tensorflow:global step 7: loss = 0.2557 (48.756 sec/step)
INFO:tensorflow:global step 8: loss = 0.2793 (48.956 sec/step)
INFO:tensorflow:global step 9: loss = 0.2354 (47.047 sec/step)
INFO:tensorflow:global step 10: loss = 0.2429 (47.182 sec/step)
INFO:tensorflow:global step 11: loss = 0.2738 (47.103 sec/step)
INFO:tensorflow:global step 12: loss = 0.2293 (47.438 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 12.
INFO:tensorflow:global step 13: loss = 0.2434 (48.358 sec/step)
INFO:tensorflow:global step 14: loss = 0.2394 (46.977 sec/step)
INFO:tensorflow:global step 15: loss = 0.2367 (47.118 sec/step)
INFO:tensorflow:global step 16: loss = 0.2863 (46.911 sec/step)
INFO:tensorflow:global step 17: loss = 0.2633 (46.842 sec/step)
INFO:tensorflow:global step 18: loss = 0.2641 (47.154 sec/step)
INFO:tensorflow:Recording summary at step 18.
INFO:tensorflow:global step 19: loss = 0.2886 (47.331 sec/step)
INFO:tensorflow:global step 20: loss = 0.2568 (47.216 sec/step)
INFO:tensorflow:global step 21: loss = 0.2487 (47.125 sec/step)
INFO:tensorflow:global step 22: loss = 0.2566 (47.144 sec/step)
INFO:tensorflow:global step 23: loss = 0.2630 (47.150 sec/step)
INFO:tensorflow:global step 24: loss = 0.2882 (47.146 sec/step)
INFO:tensorflow:global step 25: loss = 0.2494 (47.674 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 25.
INFO:tensorflow:global step 26: loss = 0.2635 (48.129 sec/step)
INFO:tensorflow:global step 27: loss = 0.3057 (46.899 sec/step)
INFO:tensorflow:global step 28: loss = 0.2511 (46.734 sec/step)
INFO:tensorflow:global step 29: loss = 0.2301 (46.627 sec/step)
INFO:tensorflow:global step 30: loss = 0.2742 (46.537 sec/step)
INFO:tensorflow:global step 31: loss = 0.2958 (46.652 sec/step)
INFO:tensorflow:Recording summary at step 31.
INFO:tensorflow:global step 32: loss = 0.2851 (46.692 sec/step)
INFO:tensorflow:global step 33: loss = 0.2647 (46.899 sec/step)
INFO:tensorflow:global step 34: loss = 0.2555 (46.852 sec/step)
INFO:tensorflow:global step 35: loss = 0.2908 (46.984 sec/step)
INFO:tensorflow:global step 36: loss = 0.2725 (47.227 sec/step)
INFO:tensorflow:global step 37: loss = 0.2378 (47.406 sec/step)
INFO:tensorflow:global step 38: loss = 0.2399 (47.037 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 38.
INFO:tensorflow:global step 39: loss = 0.2363 (48.432 sec/step)
INFO:tensorflow:global step 40: loss = 0.2476 (46.969 sec/step)
INFO:tensorflow:global step 41: loss = 0.2826 (46.525 sec/step)
INFO:tensorflow:global step 42: loss = 0.2775 (46.459 sec/step)
INFO:tensorflow:global step 43: loss = 0.2431 (46.489 sec/step)
INFO:tensorflow:global step 44: loss = 0.3016 (46.677 sec/step)
INFO:tensorflow:Recording summary at step 44.
INFO:tensorflow:global step 45: loss = 0.2622 (46.657 sec/step)
INFO:tensorflow:global step 46: loss = 0.2686 (46.656 sec/step)
INFO:tensorflow:global step 47: loss = 0.3073 (46.638 sec/step)
INFO:tensorflow:global step 48: loss = 0.2798 (46.625 sec/step)
INFO:tensorflow:global step 49: loss = 0.2584 (46.735 sec/step)
INFO:tensorflow:global step 50: loss = 0.2532 (46.627 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 50.
INFO:tensorflow:global step 51: loss = 0.2988 (47.766 sec/step)
INFO:tensorflow:global step 52: loss = 0.2730 (46.951 sec/step)
INFO:tensorflow:global step 53: loss = 0.2739 (46.550 sec/step)
INFO:tensorflow:global step 54: loss = 0.2351 (46.631 sec/step)
INFO:tensorflow:global step 55: loss = 0.2330 (46.575 sec/step)
INFO:tensorflow:global step 56: loss = 0.3008 (46.410 sec/step)
INFO:tensorflow:global step 57: loss = 0.2345 (46.577 sec/step)
INFO:tensorflow:Recording summary at step 57.
INFO:tensorflow:global step 58: loss = 0.2517 (46.613 sec/step)
INFO:tensorflow:global step 59: loss = 0.2318 (46.482 sec/step)
INFO:tensorflow:global step 60: loss = 0.2236 (46.541 sec/step)
INFO:tensorflow:global step 61: loss = 0.2356 (46.566 sec/step)
INFO:tensorflow:global step 62: loss = 0.2926 (46.569 sec/step)
INFO:tensorflow:global step 63: loss = 0.2860 (46.663 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 63.
INFO:tensorflow:global step 64: loss = 0.2564 (47.574 sec/step)
INFO:tensorflow:global step 65: loss = 0.2384 (46.611 sec/step)
INFO:tensorflow:global step 66: loss = 0.2528 (46.527 sec/step)
INFO:tensorflow:global step 67: loss = 0.3030 (46.552 sec/step)
INFO:tensorflow:global step 68: loss = 0.2381 (46.524 sec/step)
INFO:tensorflow:global step 69: loss = 0.2605 (46.387 sec/step)
INFO:tensorflow:global step 70: loss = 0.2575 (46.646 sec/step)
INFO:tensorflow:Recording summary at step 70.
INFO:tensorflow:global step 71: loss = 0.2514 (46.583 sec/step)
INFO:tensorflow:global step 72: loss = 0.2616 (46.463 sec/step)
INFO:tensorflow:global step 73: loss = 0.2593 (46.600 sec/step)
INFO:tensorflow:global step 74: loss = 0.2676 (46.619 sec/step)
INFO:tensorflow:global step 75: loss = 0.2774 (46.669 sec/step)
INFO:tensorflow:global step 76: loss = 0.2359 (48.814 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 76.
INFO:tensorflow:global step 77: loss = 0.2480 (48.683 sec/step)
INFO:tensorflow:global step 78: loss = 0.2611 (48.136 sec/step)
INFO:tensorflow:global step 79: loss = 0.2570 (46.677 sec/step)
INFO:tensorflow:global step 80: loss = 0.2615 (46.352 sec/step)
INFO:tensorflow:global step 81: loss = 0.2978 (47.083 sec/step)
INFO:tensorflow:global step 82: loss = 0.2632 (46.399 sec/step)
INFO:tensorflow:Recording summary at step 82.
INFO:tensorflow:global step 83: loss = 0.2473 (46.549 sec/step)
INFO:tensorflow:global step 84: loss = 0.2491 (46.496 sec/step)
INFO:tensorflow:global step 85: loss = 0.2741 (46.842 sec/step)
INFO:tensorflow:global step 86: loss = 0.2594 (46.641 sec/step)
INFO:tensorflow:global step 87: loss = 0.2812 (46.693 sec/step)
INFO:tensorflow:global step 88: loss = 0.2529 (46.740 sec/step)
INFO:tensorflow:global step 89: loss = 0.2584 (46.629 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 89.
INFO:tensorflow:global step 90: loss = 0.2596 (47.670 sec/step)
INFO:tensorflow:global step 91: loss = 0.2453 (47.042 sec/step)
INFO:tensorflow:global step 92: loss = 0.2478 (46.811 sec/step)
INFO:tensorflow:global step 93: loss = 0.2633 (46.666 sec/step)
INFO:tensorflow:global step 94: loss = 0.2550 (47.038 sec/step)
INFO:tensorflow:global step 95: loss = 0.2699 (46.861 sec/step)
INFO:tensorflow:Recording summary at step 95.
INFO:tensorflow:global step 96: loss = 0.2561 (46.975 sec/step)
INFO:tensorflow:global step 97: loss = 0.2732 (46.860 sec/step)
INFO:tensorflow:global step 98: loss = 0.2523 (46.889 sec/step)
INFO:tensorflow:global step 99: loss = 0.2449 (46.868 sec/step)
INFO:tensorflow:global step 100: loss = 0.2529 (48.063 sec/step)
INFO:tensorflow:global step 101: loss = 0.2441 (48.094 sec/step)
INFO:tensorflow:global step 102: loss = 0.2421 (46.797 sec/step)
INFO:tensorflow:Recording summary at step 102.
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:global step 103: loss = 0.2573 (48.506 sec/step)
INFO:tensorflow:global step 104: loss = 0.2563 (46.535 sec/step)
INFO:tensorflow:global step 105: loss = 0.2886 (46.542 sec/step)
INFO:tensorflow:global step 106: loss = 0.2864 (46.441 sec/step)
INFO:tensorflow:global step 107: loss = 0.2621 (46.477 sec/step)
INFO:tensorflow:global step 108: loss = 0.2334 (46.507 sec/step)
INFO:tensorflow:Recording summary at step 108.
INFO:tensorflow:global step 109: loss = 0.2396 (46.706 sec/step)
INFO:tensorflow:global step 110: loss = 0.2778 (46.587 sec/step)
INFO:tensorflow:global step 111: loss = 0.2857 (46.772 sec/step)
INFO:tensorflow:global step 112: loss = 0.2342 (46.642 sec/step)
INFO:tensorflow:global step 113: loss = 0.2274 (46.854 sec/step)
INFO:tensorflow:global step 114: loss = 0.2599 (46.784 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 114.
INFO:tensorflow:global step 115: loss = 0.2395 (47.684 sec/step)
INFO:tensorflow:global step 116: loss = 0.2639 (46.540 sec/step)
INFO:tensorflow:global step 117: loss = 0.2711 (46.579 sec/step)
INFO:tensorflow:global step 118: loss = 0.2284 (46.787 sec/step)
INFO:tensorflow:global step 119: loss = 0.2465 (46.986 sec/step)
INFO:tensorflow:global step 120: loss = 0.2572 (46.728 sec/step)
INFO:tensorflow:global step 121: loss = 0.2249 (46.698 sec/step)
INFO:tensorflow:Recording summary at step 121.
INFO:tensorflow:global step 122: loss = 0.2752 (46.561 sec/step)
INFO:tensorflow:global step 123: loss = 0.2563 (46.660 sec/step)
INFO:tensorflow:global step 124: loss = 0.2370 (46.818 sec/step)
INFO:tensorflow:global step 125: loss = 0.2938 (46.732 sec/step)
INFO:tensorflow:global step 126: loss = 0.2634 (46.920 sec/step)
INFO:tensorflow:global step 127: loss = 0.2872 (46.627 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 127.
INFO:tensorflow:global step 128: loss = 0.2540 (47.629 sec/step)
INFO:tensorflow:global step 129: loss = 0.2590 (48.070 sec/step)
INFO:tensorflow:global step 130: loss = 0.2224 (47.305 sec/step)
INFO:tensorflow:global step 131: loss = 0.2426 (46.822 sec/step)
INFO:tensorflow:global step 132: loss = 0.2783 (46.753 sec/step)
INFO:tensorflow:global step 133: loss = 0.2514 (46.809 sec/step)
INFO:tensorflow:global step 134: loss = 0.2502 (46.836 sec/step)
INFO:tensorflow:Recording summary at step 134.
INFO:tensorflow:global step 135: loss = 0.2566 (47.107 sec/step)
INFO:tensorflow:global step 136: loss = 0.2465 (47.012 sec/step)
INFO:tensorflow:global step 137: loss = 0.2588 (47.115 sec/step)
INFO:tensorflow:global step 138: loss = 0.2283 (47.062 sec/step)
INFO:tensorflow:global step 139: loss = 0.2709 (46.848 sec/step)
INFO:tensorflow:global step 140: loss = 0.2637 (47.212 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 140.
INFO:tensorflow:global step 141: loss = 0.2839 (47.479 sec/step)
INFO:tensorflow:global step 142: loss = 0.2624 (46.992 sec/step)
INFO:tensorflow:global step 143: loss = 0.2373 (46.566 sec/step)
INFO:tensorflow:global step 144: loss = 0.2567 (46.647 sec/step)
INFO:tensorflow:global step 145: loss = 0.2655 (46.627 sec/step)
INFO:tensorflow:global step 146: loss = 0.2429 (46.757 sec/step)
INFO:tensorflow:Recording summary at step 146.
INFO:tensorflow:global step 147: loss = 0.2256 (46.796 sec/step)
INFO:tensorflow:global step 148: loss = 0.2358 (46.966 sec/step)
INFO:tensorflow:global step 149: loss = 0.2805 (47.077 sec/step)
INFO:tensorflow:global step 150: loss = 0.2479 (47.119 sec/step)
INFO:tensorflow:global step 151: loss = 0.2413 (47.154 sec/step)
INFO:tensorflow:global step 152: loss = 0.2692 (47.452 sec/step)
INFO:tensorflow:global step 153: loss = 0.2399 (48.899 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 153.
INFO:tensorflow:global step 154: loss = 0.2430 (48.641 sec/step)
INFO:tensorflow:global step 155: loss = 0.2650 (48.286 sec/step)
INFO:tensorflow:global step 156: loss = 0.2597 (46.670 sec/step)
INFO:tensorflow:global step 157: loss = 0.2479 (46.360 sec/step)
INFO:tensorflow:global step 158: loss = 0.2526 (46.614 sec/step)
INFO:tensorflow:global step 159: loss = 0.2642 (46.503 sec/step)
INFO:tensorflow:Recording summary at step 159.
INFO:tensorflow:global step 160: loss = 0.2464 (46.415 sec/step)
INFO:tensorflow:global step 161: loss = 0.2572 (46.646 sec/step)
INFO:tensorflow:global step 162: loss = 0.2588 (46.315 sec/step)
INFO:tensorflow:global step 163: loss = 0.2801 (46.375 sec/step)
INFO:tensorflow:global step 164: loss = 0.2692 (46.635 sec/step)
INFO:tensorflow:global step 165: loss = 0.2676 (46.785 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 165.
INFO:tensorflow:global step 166: loss = 0.2570 (47.218 sec/step)
INFO:tensorflow:global step 167: loss = 0.2452 (47.525 sec/step)
INFO:tensorflow:global step 168: loss = 0.2619 (46.151 sec/step)
INFO:tensorflow:global step 169: loss = 0.2715 (46.252 sec/step)
INFO:tensorflow:global step 170: loss = 0.2813 (46.462 sec/step)
INFO:tensorflow:global step 171: loss = 0.2525 (46.510 sec/step)
INFO:tensorflow:global step 172: loss = 0.2742 (47.134 sec/step)
INFO:tensorflow:Recording summary at step 172.
INFO:tensorflow:global step 173: loss = 0.2351 (48.038 sec/step)
INFO:tensorflow:global step 174: loss = 0.2663 (47.943 sec/step)
INFO:tensorflow:global step 175: loss = 0.2885 (51.293 sec/step)
INFO:tensorflow:global step 176: loss = 0.2470 (47.242 sec/step)
INFO:tensorflow:global step 177: loss = 0.2360 (46.514 sec/step)
INFO:tensorflow:global step 178: loss = 0.2460 (46.257 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 178.
INFO:tensorflow:global step 179: loss = 0.2464 (47.571 sec/step)
INFO:tensorflow:global step 180: loss = 0.2763 (49.329 sec/step)
INFO:tensorflow:global step 181: loss = 0.2708 (46.327 sec/step)
INFO:tensorflow:global step 182: loss = 0.2671 (46.133 sec/step)
INFO:tensorflow:global step 183: loss = 0.2594 (46.992 sec/step)
INFO:tensorflow:global step 184: loss = 0.2418 (46.241 sec/step)
INFO:tensorflow:Recording summary at step 184.
INFO:tensorflow:global step 185: loss = 0.2411 (46.249 sec/step)
INFO:tensorflow:global step 186: loss = 0.2856 (46.189 sec/step)
INFO:tensorflow:global step 187: loss = 0.2530 (46.418 sec/step)
INFO:tensorflow:global step 188: loss = 0.2348 (46.300 sec/step)
INFO:tensorflow:global step 189: loss = 0.2668 (46.452 sec/step)
INFO:tensorflow:global step 190: loss = 0.2922 (46.487 sec/step)
INFO:tensorflow:global step 191: loss = 0.2145 (46.411 sec/step)
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Recording summary at step 191.
INFO:tensorflow:global step 192: loss = 0.2491 (47.197 sec/step)
INFO:tensorflow:global step 193: loss = 0.2705 (47.039 sec/step)
INFO:tensorflow:global step 194: loss = 0.2641 (46.245 sec/step)
INFO:tensorflow:global step 195: loss = 0.2299 (46.284 sec/step)
INFO:tensorflow:global step 196: loss = 0.2646 (46.379 sec/step)
INFO:tensorflow:global step 197: loss = 0.2417 (46.250 sec/step)
INFO:tensorflow:Recording summary at step 197.
INFO:tensorflow:global step 198: loss = 0.2045 (46.297 sec/step)
INFO:tensorflow:global step 199: loss = 0.2479 (46.342 sec/step)
INFO:tensorflow:global step 200: loss = 0.2837 (46.310 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.

```