# NIMA: Neural Image Assessment
Replicates the NIMA model for image assessment.
See:
* https://research.googleblog.com/2017/12/introducing-nima-neural-image-assessment.html
* https://arxiv.org/abs/1709.05424



## Table of Contents
<a href="#Install">Installation and setup</a><br>
<a href="#Conversion">`TFRecord` Conversion</a><br>
<a href="#Model">Model & Loss</a><br>
<a href="#Train">Training</a><br>
<a href="#Evaluation">Evaluation</a><br>

## Overview
### Components
TensorFlow-Slim models for VGG16, Inception-v2, and MobileNet
* https://github.com/tensorflow/models/tree/master/research/slim

training datasets
* AVA: A Large-Scale Database for Aesthetic Visual Analysis
  * http://refbase.cvc.uab.es/files/MMP2012a.pdf
  * https://github.com/mtobeiyf/ava_downloader
  * https://mega.nz/#F!hIEhQTLY key `!RkOnZv8Fz7EbYreHsiEzvA` (32GB)
  * https://mega.nz/#!MUcXyBSB key `!0Q0Nq8_zBuSGiKmEHuKXKoAg8SDsB-21GwlJ22AJegU`
  
* TID2013: http://www.ponomarenko.info/tid2013.htm
  * http://www.ponomarenko.info/tid2013/tid2013.rar (1GB)

### Pipeline
* input images are rescaled to 256 × 256, and then a 224 × 224 crop is randomly extracted.
* the only other random data augmentation during training is horizontal flipping of the image crops.
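
The two augmentation steps above can be sketched in plain NumPy. This is only an illustration, not the `nima_preprocessing` code used later in this notebook; the function name `random_crop_and_flip` is hypothetical, and the image is assumed to be already rescaled to 256 × 256.

```python
import numpy as np

def random_crop_and_flip(image, crop_size=224, rng=None):
    """Randomly crop a square patch and flip it horizontally half the time.

    `image` is an (H, W, C) array that has already been rescaled to 256 x 256.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    # Pick the top-left corner so that the crop stays inside the image.
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size]
    # A horizontal flip reverses the width axis.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop

rescaled = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a rescaled image
patch = random_crop_and_flip(rescaled, rng=np.random.default_rng(0))
print(patch.shape)  # (224, 224, 3)
```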

### Score
* mean quality score = `sum_N( s_i*p_i)`
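
As a worked example of the score formula, the NumPy sketch below computes the mean score and its standard deviation from a histogram of votes. The helper name `score_stats` and the sample histogram are made up for illustration; the TensorFlow version used for training appears later in `NimaUtils`.

```python
import numpy as np

def score_stats(ratings):
    """Return (mean, stddev) of the score distribution for one image.

    `ratings` holds the vote counts (or probabilities) for scores 1..N.
    """
    ratings = np.asarray(ratings, dtype=np.float64)
    p = ratings / ratings.sum()      # normalize to probabilities p_i
    s = np.arange(1, len(p) + 1)     # scores s_i = 1..N
    mean = np.sum(s * p)             # sum_N(s_i * p_i)
    stddev = np.sqrt(np.sum((s - mean) ** 2 * p))
    return mean, stddev

# 10 votes: one 3, two 4s, four 5s, two 6s, one 7
mean, stddev = score_stats([0, 0, 1, 2, 4, 2, 1, 0, 0, 0])
print(mean, stddev)  # mean is 5.0, stddev is sqrt(1.2)
```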


### Loss function
* EMD (Earth Mover's Distance) penalizes misclassifications according to class distances.
  * https://gist.github.com/mjdietzx/a8121604385ce6da251d20d018f9a6d6
  * https://www.tensorflow.org/api_docs/python/tf/distributions/Distribution
* CDF, cumulative distribution function, N_ava=10, N_tid=9
    ```
    EMD(p,phat) = (1/N.*sum_k( abs(CDF_p(k)-CDF_phat(k)).^2 )).^0.5
    ```
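
A minimal NumPy sketch of the EMD formula, assuming `CDF(k)` is the cumulative sum of the normalized ratings for scores 1..k. The function name `emd` and the toy distributions are illustrative only and are not the TensorFlow loss defined later in this notebook.

```python
import numpy as np

def emd(p, p_hat, r=2):
    """Normalized EMD between two batches of score distributions, shape (batch, N)."""
    p = np.asarray(p, dtype=np.float64)
    p_hat = np.asarray(p_hat, dtype=np.float64)
    # Normalize rows so that vote counts and probabilities are both accepted.
    p = p / p.sum(axis=1, keepdims=True)
    p_hat = p_hat / p_hat.sum(axis=1, keepdims=True)
    # CDF(k) is the cumulative sum of probabilities for scores 1..k.
    cdf_diff = np.cumsum(p, axis=1) - np.cumsum(p_hat, axis=1)
    return np.mean(np.abs(cdf_diff) ** r, axis=1) ** (1.0 / r)

same = emd([[0, 1, 0, 0]], [[0, 1, 0, 0]])   # identical distributions
near = emd([[1, 0, 0, 0]], [[0, 1, 0, 0]])   # off by one class
far = emd([[1, 0, 0, 0]], [[0, 0, 0, 1]])    # off by three classes
print(same[0], near[0], far[0])
```

Predictions that miss by more classes receive a larger loss, which is the property that a plain cross-entropy loss lacks.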

### Training
* 80/20 train/test split on AVA and TID datasets

hyperparameters
  * `momentum=0.9`, baseline learning rate `lambda=3e-7`
  * `dropout=0.75` applied to the last layer of the baseline network

FC layer, n=10, followed by softmax activations
  * learning rate `lambda_fc=3e-6`

both learning rates decay by a factor of `decay=0.95` after every 10 epochs
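
The decay schedule can be illustrated with a few lines of Python. This sketch assumes a staircase schedule, where the rate drops once per 10-epoch block; the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(initial_rate, epoch, decay=0.95, decay_epochs=10):
    """Staircase decay: multiply the rate by `decay` once every `decay_epochs` epochs."""
    return initial_rate * decay ** (epoch // decay_epochs)

for epoch in (0, 9, 10, 25):
    baseline = decayed_learning_rate(3e-7, epoch)   # baseline CNN layers
    finetune = decayed_learning_rate(3e-6, epoch)   # last FC layer
    print(epoch, baseline, finetune)
```

Note that a schedule driven by the training step counter needs the decay interval converted from epochs to steps, i.e. multiplied by the number of mini-batches per epoch.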

**???: how many epochs**


  

<a id='Install'></a>
## Installation and Setup

### Download datasets
* AVA dataset is 32GB, about 255K images
* TID dataset is about 1GB, about 3K images


In [5]:
# set key paths
import os
if 'HOME' not in globals(): 
    HOME = %pwd
SLIM = HOME + '/models/research/slim'
CHECKPOINTS = os.path.join(HOME, 'ckpt')
TRAIN_LOG = os.path.join(HOME, 'log')
TMP = HOME + '/tmp'
TID=os.path.join(HOME, 'data', 'tid')
AVA=os.path.join(HOME, 'data', 'ava')

In [None]:
# !mkdir -p $AVA
# !mkdir -p $TID

### TID dataset

In [None]:
# DATA was never defined; download into the TID directory set above
!mkdir -p $TID
%cd $TID
# download TID2013, about 1GB
!wget http://www.ponomarenko.info/tid2013/tid2013.rar

In [None]:
# rar archiver for python/conda, https://anaconda.org/pypi/unrar
!pip install -i https://pypi.anaconda.org/pypi/simple unrar
#!conda install -c mlgill rarfile 

### AVA dataset
The AVA dataset is available on MEGA.nz as a 32GB download split into 64 `7z` archive files. You must first register with MEGA and install the desktop client to get enough transfer bandwidth for the download, which takes a few days overall. See https://github.com/mtobeiyf/ava_downloader.

Once the archive files are available, use a `7z` unarchiver to extract. The dataset is 255,000 JPG files in **one** directory.

The images are converted into `TFRecord` files for training. For NIMA, an optional step is to resize all images to `(256,256,3)` before creating the `TFRecord` files, which minimizes upload times for cloud-based training. On OSX, this can be done via the following shell script:

```
  export SOURCE=/Volumes/data/DATASETS/AVA/images
  export TARGET=/Volumes/data/DATASETS/AVA/images-256
  mkdir -p $TARGET
  for f in $SOURCE/*.jpg; do
    sips -z 256 256 --setProperty formatOptions high $f --out $TARGET
  done
```

### Tensorflow Models with Pre-trained Weights

In [None]:
%cd $HOME
!git clone https://github.com/mixuala/models  # https://github.com/tensorflow/models/

In [None]:
# check tf-slim install
%cd $SLIM
!python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"


In [None]:
%cd $SLIM
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time

from datasets import dataset_utils

# Main slim library
from tensorflow.contrib import slim

In [None]:
# download tensorflow checkpoints, i.e. pre-trained weights
# NOTE: uses CHECKPOINTS as defined in the "set key paths" cell
!mkdir -p $CHECKPOINTS
!mkdir -p $TMP
%cd $TMP

### download model checkpoint
# vgg16
!wget http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
!tar -xvf vgg_16_2016_08_28.tar.gz
%mv vgg_16.ckpt $CHECKPOINTS
%rm vgg_16_2016_08_28.tar.gz

# inception-v2
!wget http://download.tensorflow.org/models/inception_v2_2016_08_28.tar.gz
!tar -xvf inception_v2_2016_08_28.tar.gz
%mv inception_v2.ckpt $CHECKPOINTS
%rm inception_v2_2016_08_28.tar.gz

# MobileNet
!wget http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz
!tar -xvf mobilenet_v1_1.0_224_2017_06_14.tar.gz
%mv mobilenet_v1_1.0_224.ckpt.* $CHECKPOINTS
%rm mobilenet_v1_1.0_224_2017_06_14.tar.gz  

<a id='Conversion'></a>
## Dataset Conversion to `TFRecord`

* Choose shard counts so that each `TFRecord` file is about 100MB, see: https://www.tensorflow.org/performance/performance_guide#input_pipeline_optimization
* 20% of dataset reserved for testing
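
A rough way to pick shard counts is to divide each split's estimated size by the 100MB target. The helper below is a hypothetical sketch and not part of `convert_nima_ava`; the 20KB average size for a resized JPG is an assumption that should be replaced by a value measured on the actual dataset.

```python
import math

def plan_conversion(num_images, avg_image_bytes, test_fraction=0.2,
                    target_shard_bytes=100 * 1024 ** 2):
    """Return per-split image counts and shard counts for ~100MB TFRecord files."""
    num_test = round(num_images * test_fraction)
    num_train = num_images - num_test
    plan = {}
    for split, count in (("train", num_train), ("test", num_test)):
        total_bytes = count * avg_image_bytes
        plan[split] = {
            "images": count,
            # Round up so that no shard exceeds the target size; at least 1 shard.
            "shards": max(1, math.ceil(total_bytes / target_shard_bytes)),
        }
    return plan

# Assumed numbers: 255,000 AVA images at about 20KB each after resizing.
plan = plan_conversion(num_images=255000, avg_image_bytes=20 * 1024)
print(plan)
```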


In [None]:
# config env
import os
if 'HOME' not in globals(): 
    HOME = %pwd
SLIM = HOME + '/models/research/slim'
CHECKPOINTS = os.path.join(HOME, 'ckpt')
TMP = HOME + '/tmp'
TID = os.path.join(HOME, 'data', 'tid')
AVA = os.path.join(HOME, 'data', 'ava')   # dev dataset  "/snappi.ai/tensorflow/nima/data/ava"
# AVA = "/Volumes/data/DATASETS/AVA"        # 32GB dataset
AVA = "/Volumes/data/data/ava"

In [None]:
# convert dataset to `TFRecord`
%cd $SLIM
import tensorflow as tf
from datasets import dataset_utils, convert_nima_tid, convert_nima_ava
AVA = "/Volumes/data/data/ava"
# convert_nima_tid.run(TID)
convert_nima_ava.run(AVA, resized=True, shards=1)

In [None]:
# verify conversion by checking sample data
# NOTE: _mean_image_subtraction() will cause color shifts
%cd $SLIM
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time

from datasets import dataset_utils, nima_tid, nima_ava
import tensorflow as tf
from preprocessing import preprocessing_factory
import preprocessing.nima_preprocessing as nima_pre
from tensorflow.contrib import slim

import sys

use_resized_images = True
is_training = True
split_name = 'train' if is_training else 'validation'

nima_preprocessing = preprocessing_factory.get_preprocessing('nima')

dataset_name = "AVA"
with tf.Graph().as_default(): 
    
    if dataset_name == "AVA":
        dataset = nima_ava.get_split(split_name, AVA, 
                                     resized=use_resized_images)
        data_provider = slim.dataset_data_provider.DatasetDataProvider(
            dataset, common_queue_capacity=32, common_queue_min=1)
        image, id, ratings, mean, stddev, height, width = data_provider.get(
            ['image', 'id', 'ratings', 'mean', 'stddev', 'height', 'width'])

    elif dataset_name == "TID":
        dataset = nima_tid.get_split(split_name, TID, file_pattern='nima_tid_%s_*.tfrecord')
        data_provider = slim.dataset_data_provider.DatasetDataProvider(
            dataset, common_queue_capacity=32, common_queue_min=1)
        image, id, mean, stddev, height, width = data_provider.get(
            ['image', 'id', 'mean', 'stddev', 'height', 'width'])
    else:
        raise ValueError("unknown dataset_name: %s" % dataset_name)
    
    
    # apply preprocessing
    image = nima_preprocessing(image, 224,224,
            is_training=is_training,
            resized=use_resized_images)

        
    with tf.Session() as sess:    
        with slim.queues.QueueRunners(sess):
            for i in range(4):
                # NOTE: report the NumPy shape; calling tf.shape(image) here would
                #       add a new op to the graph on every iteration and print a Tensor
                if dataset_name=="AVA":
                    np_image, np_id, np_mean, np_stddev, np_ratings, np_h, np_w = sess.run(
                        [image, id, mean, stddev, ratings, height, width])
                    title = '%s: %d x %d, %f/%f, (%s) %s' % (
                        np_id.decode("utf-8"), np_h, np_w, np_mean, np_stddev, np_ratings, np_image.shape)
                else:
                    np_image, np_id, np_mean, np_stddev, np_h, np_w = sess.run(
                        [image, id, mean, stddev, height, width])
                    title = '%s: %d x %d, %f/%f, %s' % (
                        np_id.decode("utf-8"), np_h, np_w, np_mean, np_stddev, np_image.shape)
                    
                
                plt.figure()
                plt.imshow(np_image.astype(np.uint8))
                plt.title(title)
                plt.axis('off')
                plt.show()

<a id='Model'></a>
## Model & Loss
NIMA was developed to work with multiple pre-trained CNNs, including `Vgg16`, `Inception-v2`, and `MobileNet`.



### Helper Functions

In [6]:
# from datasets.nima import load_batch


# modified from slim_walkthrough
%cd $SLIM
import tensorflow as tf
from preprocessing import preprocessing_factory
from tensorflow.contrib import slim

nima_preprocessing = preprocessing_factory.get_preprocessing('nima')

def load_batch(dataset, batch_size=32, height=224, width=224, 
            is_training=False, 
            resized=True,
            model="vgg16",
            label_name="ratings"):
    """Loads a single batch of data.
    
    Args:
      dataset: The dataset to load.
      batch_size: The number of images in the batch.
      height: The height of each image after preprocessing.
      width: The width of each image after preprocessing.
      is_training: Whether or not we're currently training or evaluating.
      resized: Whether the TFRecords were converted with images already resized to (256,256,3)
      model: The baseline model to preprocess for; only "vgg16" is currently supported.
      label_name: The dataset item to use as the label, e.g. "ratings".
    
    Returns:
      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
      labels: A Tensor with one row per image holding the `label_name` values; for "ratings", the count of votes for each score.
    """
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8)
    image_raw, label = data_provider.get(['image', label_name])
    
    # Preprocess image for usage by the appropriate model.
    image = {
      'vgg16': nima_preprocessing(image_raw, height, width, is_training=is_training,
                              resized=resized),
      'inception': None,
      'mobilenet': None,
    }[model]

        
    # Preprocess the image for display purposes.
    image_raw = tf.expand_dims(image_raw, 0)
    image_raw = tf.image.resize_images(image_raw, [height, width])
    image_raw = tf.squeeze(image_raw)

    # Batch it up.
    images, images_raw, labels = tf.train.batch(
          [image, image_raw, label],
          batch_size=batch_size,
          num_threads=1,
          capacity=2 * batch_size)
    
    return images, images_raw, labels



/snappi.ai/tensorflow/nima/models/research/slim


The NIMA paper specifies a different `learning rate` for the `finetune` layer than for the baseline layers, and also a `momentum` value, whose accumulator variables are not part of the original `vgg_16.ckpt`. This suggests using 2 separate optimizers, each applying its own subset of the gradients.

In [7]:
%cd $HOME
# from nima_utils import slim_learning_create_train_op_with_manual_grads

from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.training import training_util

def slim_learning_create_train_op_with_manual_grads( total_loss, optimizers, grads_and_vars,
            global_step=None,               # None: use (or create) the default global_step
#                     update_ops=None,
#                     variables_to_train=None,
            clip_gradient_norm=0,
            summarize_gradients=False,
            gate_gradients=1,               # tf.python.training.optimizer.Optimizer.GATE_OP,
            aggregation_method=None,
            colocate_gradients_with_ops=False,
            gradient_multipliers=None,
            check_numerics=True):
    """Creates a train_op that applies precomputed gradients with multiple optimizers.
            modified from slim.learning.create_train_op() to work with
            a matched list of optimizers and grads_and_vars

    Args:
        total_loss: the loss tensor
        optimizers: list of optimizers
        grads_and_vars: list with one sequence of (gradient, variable) pairs per optimizer

    Returns:
        train_op - an op that applies the gradient updates and returns the value of total_loss.
    """

    def transform_grads_fn(grads):
        if gradient_multipliers:
            with ops.name_scope('multiply_grads'):
                grads = slim.learning.multiply_gradients(grads, gradient_multipliers)

        # Clip gradients.
        if clip_gradient_norm > 0:
            with ops.name_scope('clip_grads'):
                grads = slim.learning.clip_gradient_norms(grads, clip_gradient_norm)
        return grads

    if global_step is None:
        global_step = training_util.get_or_create_global_step()

    assert len(optimizers)==len(grads_and_vars)

    ### order of processing:
    # 0. grads = opt.compute_gradients() 
    # 1. grads = transform_grads_fn(grads)
    # 2. add_gradients_summaries(grads)
    # 3. grads = opt.apply_gradients(grads, global_step=global_step) 

    grad_updates = []
    for i in range(len(optimizers)):
        grads = list(grads_and_vars[i])                         # 0. kvarg, from opt.compute_gradients(); list() so a zip iterator can be read more than once
        grads = transform_grads_fn(grads)                       # 1. transform_grads_fn()
        if summarize_gradients:
            with ops.name_scope('summarize_grads'):
                slim.learning.add_gradients_summaries(grads)    # 2. add_gradients_summaries()
        if i==0:
            grad_update = optimizers[i].apply_gradients( grads, # 3. optimizer.apply_gradients()
                        global_step=global_step)                #    update global_step only once
        else:
            grad_update = optimizers[i].apply_gradients( grads )
        grad_updates.append(grad_update)

    with ops.name_scope('train_op'):
        if check_numerics:
            total_loss = array_ops.check_numerics(total_loss,
                                        'LossTensor is inf or nan')
        train_op = control_flow_ops.with_dependencies(grad_updates, total_loss)

    # Add the operation used for training to the 'train_op' collection    
    train_ops = ops.get_collection_ref(ops.GraphKeys.TRAIN_OP)
    if train_op not in train_ops:
        train_ops.append(train_op)

    return train_op

/snappi.ai/tensorflow/nima


#### EMD loss function

The loss function uses a normalized `Earth Movers Distance` to penalize mis-classifications according to class distances. This is based on the difference between the `Cumulative Distribution Function (CDF)` of the `y` and `y_hat` values for the ratings distribution.

In [8]:
# %cd $HOME
# from nima_utils import NimaUtils

import tensorflow as tf
import numpy as np

""" _CDF in tensorflow """
#
# private methods used by class NimaUtils()
#
def _weighted_score(x):
  m,n = tf.convert_to_tensor(x).get_shape().as_list()
  return tf.multiply(x, tf.range(1, n+1 , dtype=tf.float32))  # (None,10)

def _CDF (k, x):
  # CDF at score k: share of the ratings mass at scores 1..k (not score-weighted),
  # matching CDF_p(k) in the EMD formula
  x = tf.to_float(x)
  cum_k = tf.reduce_sum(x[:,:k], axis=1, keep_dims=True)  # (None,1)
  total = tf.reduce_sum(x, axis=1, keep_dims=True)        # (None,1)
  return tf.divide(cum_k, total)                          # (None,1)

def _cum_CDF (x):
  # CDF over all scores 1..n; tf.cumsum works for any n (10 for AVA, 9 for TID)
  x = tf.to_float(x)
  p = tf.divide(x, tf.reduce_sum(x, axis=1, keep_dims=True))  # (None,n)
  return tf.cumsum(p, axis=1)                                  # (None,n)

def _emd(y, y_hat):
    """Returns the earth mover's distance between two arrays of ratings,
    based on the cumulative distribution function
    
    Args:
      y, y_hat: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      scalar float Tensor, the mean EMD over the mini-batch
    """
    r = 2.
    m,n = tf.convert_to_tensor(y).get_shape().as_list()
    N = tf.to_float(n)
    cdf_loss = tf.subtract(_cum_CDF(y), _cum_CDF(y_hat))
    emd_loss = tf.pow( tf.divide( tf.reduce_sum( tf.pow(cdf_loss, r), axis=1 ), N), 1/r)
  #   return tf.reshape(emd_loss, [m,1])
    return tf.reduce_mean(emd_loss)


class NimaUtils(object):
  """Helper class for Nima calculations
    NimaUtils.emd(y, y_hat) return float
    NimaUtils.score( y ) returns [[mean, std]]
  """
  @staticmethod
  def emd(y, y_hat):
    return _emd(y, y_hat)

  @staticmethod
  def mu(y, shape=None):
    """mean quality score for ratings
    
    Args:
      y: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      array of [mean] floats for each row in y
    """
    y = tf.convert_to_tensor(y)
    m,n = y.get_shape().as_list()
    mean = tf.reduce_sum(_weighted_score(y), axis=1)/tf.reduce_sum(y, axis=1)
    return tf.reshape(mean, [m,1])
  
  @staticmethod
  def sigma(y, shape=None):
    """standard deviation of ratings
    
    Args:
      y: a mini-batch of ratings, each composed of a count of scores 
                shape = (None, n), array of count of scores for score from 1..n

    Returns:
      array of [stddev] floats for each row in y
    """    
    y = tf.convert_to_tensor(y)
    m,n = y.get_shape().as_list()    
    mean = NimaUtils.mu(y)
    s = tf.range(1, n+1 , dtype=tf.float32)
    p_score = tf.divide(y, tf.reshape(tf.reduce_sum(y, axis=1),[m,1]))
    stddev = tf.sqrt(tf.reduce_sum( tf.multiply(tf.square(tf.subtract(s,mean)),p_score), axis=1))
    return tf.reshape(stddev, [m,1])

  @staticmethod
  def score(y):
    """returns [mean quality score, stddev] for each row"""
    return tf.concat([NimaUtils.mu(y), NimaUtils.sigma(y)], axis=1)


### Vgg16
Finetune `Vgg16` for `Nima`, using AVA dataset
* the `baseline` model is `Vgg16` with the top (fc8) layer removed
* `finetune` with a `fully_connected` layer with `softmax` activations, `n=10` for AVA
* **???:** `n=9` for TID2013. TID2013 is based on ratings from 1-9, but these score distributions are approximated from a mean/stddev target value using `maximum entropy optimization`
* remove `Vgg16/fc8` and add `fc8` for 10 classes to learn ratings
* restore `vgg_16.ckpt` weights for all but last layer
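
Selecting the layers by scope name, rather than by position in the variable list, is one way to separate the restored `baseline` variables from the new `finetune` layer. The sketch below works on plain name strings so it runs without TensorFlow; `split_by_scope` is a hypothetical helper, and the names follow the `vgg_16/<layer>/weights` pattern used by the slim VGG model.

```python
def split_by_scope(var_names, finetune_scope="vgg_16/fc8"):
    """Partition variable names into (baseline, finetune) lists by scope prefix."""
    prefix = finetune_scope + "/"
    baseline = [name for name in var_names if not name.startswith(prefix)]
    finetune = [name for name in var_names if name.startswith(prefix)]
    return baseline, finetune

names = [
    "vgg_16/conv1/conv1_1/weights", "vgg_16/conv1/conv1_1/biases",
    "vgg_16/fc7/weights", "vgg_16/fc7/biases",
    "vgg_16/fc8/weights", "vgg_16/fc8/biases",
]
baseline, finetune = split_by_scope(names)
print(finetune)  # ['vgg_16/fc8/weights', 'vgg_16/fc8/biases']
```

The `baseline` list corresponds to the weights restored from `vgg_16.ckpt`, and the `finetune` list to the randomly initialized `fc8` layer.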



In [9]:
# Nima Model based on Vgg16 for training against AVA dataset

%cd $HOME
# from nima_utils import nima_vgg_16

%cd $SLIM
from nets import vgg
from tensorflow.contrib import slim

# copied from nets.vgg.vgg_16 with slight modifications
def nima_vgg_16(inputs,
           num_classes=10,
           is_training=True,
           dropout_keep_prob=0.5,
           dropout7_keep_prob=0.5,        # added kvarg to change value for dropout7 only
           spatial_squeeze=True,
           scope='vgg_16',
           fc_conv_padding='VALID',
           global_pool=False):
  """Oxford Net VGG 16-Layers version D Example.

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes. If 0 or None, the logits layer is
      omitted and the input features to the logits layer are returned instead.
    is_training: whether or not the model is being trained.
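    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    dropout7_keep_prob: the probability that activations are kept in `dropout7`
      only, i.e. the dropout applied to the last layer of the baseline network.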
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
    fc_conv_padding: the type of padding to use for the fully connected layer
      that is implemented as a convolutional layer. Use 'SAME' padding if you
      are applying the network in a fully convolutional manner and want to
      get a prediction map downsampled by a factor of 32 as an output.
      Otherwise, the output prediction map will be (input / 32) - 6 in case of
      'VALID' padding.
    global_pool: Optional boolean flag. If True, the input to the classification
      layer is avgpooled to size 1x1, for any input size. (This is not part
      of the original VGG architecture.)

  Returns:
    net: the output of the logits layer (if num_classes is a non-zero integer),
      or the input to the logits layer (if num_classes is 0 or None).
    end_points: a dict of tensors with intermediate activations.
  """
  with tf.variable_scope(scope, 'vgg_16', [inputs]) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=end_points_collection):
      net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
      net = slim.max_pool2d(net, [2, 2], scope='pool1')
      net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
      net = slim.max_pool2d(net, [2, 2], scope='pool2')
      net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
      net = slim.max_pool2d(net, [2, 2], scope='pool3')
      net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
      net = slim.max_pool2d(net, [2, 2], scope='pool4')
      net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [2, 2], scope='pool5')

      # Use conv2d instead of fully_connected layers.
      net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6')
      net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                         scope='dropout6')
      net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if global_pool:
        net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool')
        end_points['global_pool'] = net
      if num_classes:
        net = slim.dropout(net, dropout7_keep_prob, is_training=is_training,  # override dropout_keep_prob
                           scope='dropout7')
        net = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          scope='fc8')
        if spatial_squeeze and num_classes is not None:
          net = tf.squeeze(net, [1, 2], name='fc8/squeezed')
        end_points[sc.name + '/fc8'] = net
      return net, end_points

#     predictions = tf.nn.softmax(net)

/snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima/models/research/slim


### Hyperparameters

In [16]:
# load config and hyperparams
dataset_params = {
    'ava':{ 'path': AVA, 'max_score': 10 },
    'tid':{ 'path': TID, 'max_score': 9  },
    'is_training': True,
    'use_resized_images': True,
}
# Hyperparams from research paper
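h_params = {
    "regularization": "?",
    "momentum": 0.9,         # weight and bias   
    # NOTE: this value is passed to slim.dropout as a *keep* probability; if the paper's
    #       0.75 is meant as a dropout rate, the matching keep probability would be 0.25
    "dropout7_keep": 0.75,    # applied to last layer of baseline network only, scope="dropout7"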
    "learning_rate": {  
        "baseline": 3e-7,    # baseline CNN layers
        "finetune": 3e-6,    # last fc layer only
    },
    "learning_rate_decay": 0.95,  # applied to both learning rates, after every 10 epochs
}

<a id='Train'></a>
## Training
Using `tf.slim.learning`-style training loop



In [17]:
# load helper functions (also defined above)
%cd $HOME
from nima_utils import NimaUtils, nima_vgg_16
from nima_utils import slim_learning_create_train_op_with_manual_grads

%cd $SLIM
from datasets.nima import load_batch

/snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima/models/research/slim


In [18]:
# runtime adjustments
checkpoints_dir = CHECKPOINTS
log_dir = TRAIN_LOG = os.path.join(HOME, 'log')

# learning adjustments
dataset_params['is_training'] = True
# h_params['learning_rate']['finetune'] = 3e-3
dataset_name = "ava"  # ava or tid

In [None]:
%cd $HOME
# from nima_utils import NimaUtils

# Vgg16 (last layer removed) > FC(10) > softmax activations > ratings predictions
%cd $SLIM
import numpy as np
import tensorflow as tf

from preprocessing import preprocessing_factory
from datasets import dataset_utils, nima_tid, nima_ava
from datasets.nima import load_batch as load_nima_batch

from tensorflow.contrib import slim


if not tf.gfile.Exists(log_dir):  tf.gfile.MakeDirs(log_dir)
    
    
# derived params
split_name = 'train' if dataset_params["is_training"] else 'validation'
num_classes_finetune = dataset_params[dataset_name]["max_score"]
dataset_path = dataset_params[dataset_name]["path"]


# build graph    
tf.reset_default_graph()
with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)

    #
    # prepare mini-batches
    #
    dataset = nima_ava.get_split(split_name, dataset_path, 
                                 resized=dataset_params['use_resized_images'])
    images, images_raw, labels = load_batch(dataset, 
                batch_size=32,
                is_training=dataset_params["is_training"],
                resized=dataset_params['use_resized_images'] )


  
        
    # load vgg_16 with modified value for dropout7
    net, end_points = nima_vgg_16(images, 
                                  num_classes=num_classes_finetune, 
                                  dropout7_keep_prob=h_params["dropout7_keep"],
                                  is_training=dataset_params["is_training"])
    predictions = tf.nn.softmax(net)


    #
    # define loss functions––include with slim.losses.get_total_loss()
    #
    emd_loss =  NimaUtils.emd(labels, predictions) 
    tf.losses.add_loss(emd_loss) # Letting TF-Slim know about the emd loss.
    total_loss = tf.losses.get_total_loss( add_regularization_losses=True )

    #
    # configure training loop MANUALLY & apply hyperparams:
    #   - exponential_decay of learning_rate
    #   - learning_rate by layers
    # 
    # apply learning rate decay
    # NOTE: decay_steps (10 below) is counted in training steps, not epochs.
    #       To decay every 10 epochs, use 10 * dataset.num_samples // batch_size instead.
    lr_decay = {}
    for k in ["baseline", "finetune"]:
        lr_decay[k] = tf.train.exponential_decay( h_params['learning_rate'][k],
                                    global_step, 10, 
                                    h_params["learning_rate_decay"], 
                                    staircase=True)


    #
    # configure training loop MANUALLY, apply learning rates by layer
    #
    #   see: https://stackoverflow.com/questions/34945554/how-to-set-layer-wise-learning-rate-in-tensorflow
    split_index = -2     # last layer weights & bias, count=2
    training = {"baseline":{}, "finetune":{}}
    # vars
    trainable = tf.trainable_variables()
    training["baseline"]["vars"] = trainable[:split_index]
    training["finetune"]["vars"] = trainable[split_index:]    # last 2 variables only (fc8)
    # grads
    gradients = tf.gradients( total_loss, trainable )
    training["baseline"]["grads"] = gradients[:split_index]
    training["finetune"]["grads"] = gradients[split_index:]
    # optimizers

    training["baseline"]["opt"] = tf.train.GradientDescentOptimizer(
                                    learning_rate=lr_decay["baseline"])
    # I only want to apply momentum to the finetune layers, and the vgg_16.ckpt did not include momentum...
#         training["finetune"]["opt"] = tf.train.GradientDescentOptimizer(
#                                         learning_rate=lr_decay["finetune"],
#                                         )
    training["finetune"]["opt"] = tf.train.MomentumOptimizer(
                                    learning_rate=lr_decay["finetune"],
                                    momentum=h_params["momentum"]
                                    )

    # list() so the (grad, var) pairs can be iterated more than once in Python 3
    grads_and_vars = [ list(zip(training["baseline"]["grads"], training["baseline"]["vars"])), 
                       list(zip(training["finetune"]["grads"], training["finetune"]["vars"])) ]
    optimizers = [ training["baseline"]["opt"], training["finetune"]["opt"] ]

    train_op = slim_learning_create_train_op_with_manual_grads(total_loss, 
                               optimizers, grads_and_vars, 
                               global_step=global_step)

    
    restore_vars = slim.get_variables_to_restore(exclude=['vgg_16/fc8', 'Variable'])
    
    # restore Vgg16 to net["baseline"], excludes last layer, fc8
    init_fn = slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'vgg_16.ckpt'),
        restore_vars,
        ignore_missing_vars=True
        )
    
    
    # start training
    final_loss = slim.learning.train(train_op, log_dir, 
                        init_fn=init_fn,
                        global_step=global_step,
                        number_of_steps=25,
                        save_summaries_secs=300,
                        save_interval_secs=600                       
                       )

    print('Finished training. Last batch loss %f' % final_loss)
    print('>   dataset=', dataset_path )
    print('>   hyperparams=%s' % h_params)

/snappi.ai/tensorflow/nima
/snappi.ai/tensorflow/nima/models/research/slim
>> TFRecord_dir=/snappi.ai/tensorflow/nima/data/ava/TFRecords_resized, 
>> pattern=nima_ava_train_*.tfrecord
INFO:tensorflow:Restoring parameters from /snappi.ai/tensorflow/nima/ckpt/vgg_16.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /snappi.ai/tensorflow/nima/log/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:Variable/sec: 0
INFO:tensorflow:global step 1: loss = 0.2510 (56.180 sec/step)


## training log, small dataset