<a href="https://colab.research.google.com/github/muksmuks/computer_vision/blob/master/project_14/Week14.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import time, math
from tqdm import tqdm_notebook as tqdm

import tensorflow as tf
import tensorflow.contrib.eager as tfe

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



####Assignment Objectives
* Run the following code: Colab 93%, 24 Epochs, 858 seconds
* Move the code to Tensorflow Eager Mode
* Move the data to TFRecords
* Submit

#####Enable Eager mode

In [0]:
tf.enable_eager_execution()

#####Hyperparameters

In [0]:
BATCH_SIZE = 512 #@param {type:"integer"}
MOMENTUM = 0.9 #@param {type:"number"}
LEARNING_RATE = 0.4 #@param {type:"number"}
WEIGHT_DECAY = 5e-4 #@param {type:"number"}
EPOCHS = 24 #@param {type:"integer"}

#####Parameter initialization


In [0]:
def init_pytorch(shape, dtype=tf.float32, partition_info=None):
  fan = np.prod(shape[:-1])
  bound = 1 / math.sqrt(fan)
  return tf.random.uniform(shape, minval=-bound, maxval=bound, dtype=dtype)

#####Model

For enabling the model for eager execution, one need to pass __dynamic=True__ in all object instantization

From TF documentation:

__dynamic__: Set this to True if your layer should only be run eagerly, and should not be used to generate a static computation graph. This would be the case for a Tree-RNN or a recursive network, for example, or generally for any layer that manipulates tensors using Python control flow. If False, we assume that the layer can safely be used to generate a static computation graph.

In [0]:
class ConvBN(tf.keras.Model):
  def __init__(self, c_out):
    super().__init__(dynamic=True)
    self.conv = tf.keras.layers.Conv2D(filters=c_out, kernel_size=3, padding="SAME", kernel_initializer=init_pytorch, use_bias=False, dynamic=True)
    self.bn = tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, dynamic=True)

  def call(self, inputs):
    return tf.nn.relu(self.bn(self.conv(inputs)))

In [0]:
class ResBlk(tf.keras.Model):
  def __init__(self, c_out, pool, res = False):
    super().__init__()
    self.conv_bn = ConvBN(c_out)
    self.pool = pool
    self.res = res
    if self.res:
      self.res1 = ConvBN(c_out)
      self.res2 = ConvBN(c_out)

  def call(self, inputs):
    h = self.pool(self.conv_bn(inputs))
    if self.res:
      h = h + self.res2(self.res1(h))
    return h

In [0]:
class DavidNet(tf.keras.Model):
  def __init__(self, c=64, weight=0.125):
    super().__init__(dynamic=True)
    pool = tf.keras.layers.MaxPooling2D(dynamic=True)
    self.init_conv_bn = ConvBN(c)
    self.blk1 = ResBlk(c*2, pool, res = True)
    self.blk2 = ResBlk(c*4, pool)
    self.blk3 = ResBlk(c*8, pool, res = True)
    self.pool = tf.keras.layers.GlobalMaxPool2D(dynamic=True)
    self.linear = tf.keras.layers.Dense(10, kernel_initializer=init_pytorch, use_bias=False, dynamic=True)
    self.weight = weight

  def call(self, x, y):
    h = self.pool(self.blk3(self.blk2(self.blk1(self.init_conv_bn(x)))))
    h = self.linear(h) * self.weight
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h, labels=y)
    loss = tf.reduce_sum(ce)
    correct = tf.reduce_sum(tf.cast(tf.math.equal(tf.argmax(h, axis = 1), y), tf.float32))
    return loss, correct

#####TFRecord
__TFRecord__ file format, Tensorflow’s own binary storage format

Binary data takes up less space on disk, takes less time to copy and can be read much more efficiently from disk.

It is optimized for use with Tensorflow in multiple ways. To start with, it makes it easy to combine multiple datasets and integrates seamlessly with the data import and preprocessing functionality provided by the library. Especially for datasets that are too large to be stored fully in memory this is an advantage as only the data that is required at the time (e.g. a batch) is loaded from disk and then processed. 

#####TF Record Dataset
  1. Build a dataset builder and download from the TensorFlow repository and not Keras (idea is to work on files instead of loaded memory as ndarray)

  2. Load the CIFAR-10 data from the TFRecord files separating test and train set

  3. Read each record as tuple of image and label Tensors
  4. Add normalization, padding and data augmentation methods in the Tensor format to ensure they can be computed on demand

  5. Add batching and prefetch for subsequent iteration and data read

Till actual iteration begins data is not accessible or loaded (other than prefetched data). Hence, all the computation is carried out in the delayed execution mode.

In [0]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow.data import  TFRecordDataset
from tensorflow_datasets.core import DatasetBuilder

import os

tf.enable_eager_execution()
assert tf.executing_eagerly()

def parse_fn(example):
    example_fmt = {
        'image': tf.io.FixedLenFeature((), tf.string, ""),
        'label': tf.io.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.io.decode_image(parsed['image'])
    image = tf.reshape(image, [32, 32, 3])
    return image, parsed['label']

def make_dataset(path):
    dataset = tf.data.TFRecordDataset(path)
    dataset = dataset.map(parse_fn)
    return dataset

In [0]:
import tensorflow_datasets as tfds
from glob import glob

cifar10_bldr = tfds.builder("cifar10")
cifar10_bldr.download_and_prepare()
cifar10_info = cifar10_bldr.info

name = cifar10_info.name
version = '{}'.format(cifar10_info.version)

len_train, len_test = cifar10_info.splits['train'].num_examples, cifar10_info.splits['test'].num_examples

cifar10_bldr.download_and_prepare()

train_path = os.path.join(os.getenv("HOME"), 'tensorflow_datasets', name, version, '*train.tfrecord*')
test_path  = os.path.join(os.getenv("HOME"), 'tensorflow_datasets', name, version, '*test.tfrecord*')

train_fn = glob(train_path)
test_fn  = glob(test_path)


#####Read the TFRecordDataset

In [10]:
train_tfr, test_tfr = make_dataset(train_fn), make_dataset(test_fn)







#####Rest of the steps remains unchanged

In [11]:
train_mean = (0.4914, 0.4822, 0.4465)
train_std  = (0.2023, 0.1994, 0.2010)

data_aug  = lambda x, y: (tf.image.random_flip_left_right(tf.random_crop(x, [32, 32, 3])), y)
normalize = lambda x, y: ((tf.cast(x, tf.float32)/255 - train_mean) / train_std, y)
pad4      = lambda x, y: (tf.pad(x, [[4, 4], [4, 4], [0, 0]], mode='reflect'), y)

train_ds = train_tfr.map(pad4).map(normalize).map(data_aug)
test_ds  = test_tfr.map(normalize)







In [0]:
model = DavidNet()
batches_per_epoch = len_train//BATCH_SIZE + 1

lr_schedule = lambda t: np.interp([t], [0, (EPOCHS+1)//5, EPOCHS], [0, LEARNING_RATE, 0])[0]
global_step = tf.train.get_or_create_global_step()
lr_func = lambda: lr_schedule(global_step/batches_per_epoch)/BATCH_SIZE
opt = tf.train.MomentumOptimizer(lr_func, momentum=MOMENTUM, use_nesterov=True)

In [13]:
t = time.time()
test_set = test_ds.batch(BATCH_SIZE)

for epoch in range(EPOCHS):
  train_loss = test_loss = train_acc = test_acc = 0.0
  train_set = train_ds.shuffle(len_train).batch(BATCH_SIZE).prefetch(1)
  tf.keras.backend.set_learning_phase(1)
  for (x, y) in tqdm(train_set):
    with tf.GradientTape() as tape:
      loss, correct = model(x, y)

    var = model.trainable_variables
    grads = tape.gradient(loss, var)
    for g, v in zip(grads, var):
      g += v * WEIGHT_DECAY * BATCH_SIZE
    opt.apply_gradients(zip(grads, var), global_step=global_step)

    train_loss += loss.numpy()
    train_acc += correct.numpy()

  tf.keras.backend.set_learning_phase(0)
  for (x, y) in test_set:
    loss, correct = model(x, y)
    test_loss += loss.numpy()
    test_acc += correct.numpy()
    
  print('epoch:', epoch+1, 'lr:', lr_schedule(epoch+1), 'train loss:', train_loss / len_train, 'train acc:', train_acc / len_train, 'val loss:', test_loss / len_test, 'val acc:', test_acc / len_test, 'time:', time.time() - t)

HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 1 lr: 0.08 train loss: 1.5721419348144532 train acc: 0.43224 val loss: 1.254530386352539 val acc: 0.5459 time: 62.98925232887268


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 2 lr: 0.16 train loss: 0.8431776782226562 train acc: 0.70044 val loss: 0.7955343536376953 val acc: 0.7168 time: 120.00355410575867


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 3 lr: 0.24 train loss: 0.6381248666381836 train acc: 0.77844 val loss: 0.5895912750244141 val acc: 0.8021 time: 177.20718669891357


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 4 lr: 0.32 train loss: 0.5412614306640625 train acc: 0.81476 val loss: 0.6035086227416993 val acc: 0.7928 time: 234.22142338752747


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 5 lr: 0.4 train loss: 0.4811138626098633 train acc: 0.83482 val loss: 0.6609371795654296 val acc: 0.7949 time: 291.1169469356537


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 6 lr: 0.37894736842105264 train loss: 0.3988603063964844 train acc: 0.86326 val loss: 0.5354397994995117 val acc: 0.8237 time: 347.7831470966339


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 7 lr: 0.35789473684210527 train loss: 0.32192429763793945 train acc: 0.88848 val loss: 0.4894889495849609 val acc: 0.8473 time: 404.2696752548218


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 8 lr: 0.33684210526315794 train loss: 0.2685733576965332 train acc: 0.90626 val loss: 0.3827788650512695 val acc: 0.8685 time: 461.0879006385803


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 9 lr: 0.31578947368421056 train loss: 0.2353031251525879 train acc: 0.91814 val loss: 0.3785165344238281 val acc: 0.8784 time: 517.6616575717926


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 10 lr: 0.2947368421052632 train loss: 0.2000919107055664 train acc: 0.93098 val loss: 0.34703148193359373 val acc: 0.8852 time: 573.6234633922577


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 11 lr: 0.2736842105263158 train loss: 0.17404380767822267 train acc: 0.94036 val loss: 0.3438713439941406 val acc: 0.8902 time: 630.0318131446838


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 12 lr: 0.25263157894736843 train loss: 0.15070886863708496 train acc: 0.94638 val loss: 0.3284852951049805 val acc: 0.8925 time: 685.7305450439453


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 13 lr: 0.23157894736842108 train loss: 0.1284814682006836 train acc: 0.95686 val loss: 0.3415565658569336 val acc: 0.8945 time: 740.9515132904053


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 14 lr: 0.2105263157894737 train loss: 0.11068025794982911 train acc: 0.96264 val loss: 0.4176715873718262 val acc: 0.8803 time: 797.4449036121368


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 15 lr: 0.18947368421052635 train loss: 0.09452359306335449 train acc: 0.96842 val loss: 0.31381525650024417 val acc: 0.9083 time: 852.5909922122955


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 16 lr: 0.16842105263157897 train loss: 0.0820290566253662 train acc: 0.97224 val loss: 0.31963213272094726 val acc: 0.9059 time: 907.2129213809967


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 17 lr: 0.1473684210526316 train loss: 0.06788154857635498 train acc: 0.97734 val loss: 0.2958775573730469 val acc: 0.9123 time: 963.1323714256287


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 18 lr: 0.12631578947368421 train loss: 0.05772969253540039 train acc: 0.98042 val loss: 0.26740488739013674 val acc: 0.9194 time: 1018.0559492111206


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 19 lr: 0.10526315789473689 train loss: 0.04610778419494629 train acc: 0.98546 val loss: 0.2675219009399414 val acc: 0.92 time: 1073.0491678714752


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 20 lr: 0.08421052631578951 train loss: 0.03724197769165039 train acc: 0.9889 val loss: 0.25897873153686524 val acc: 0.9259 time: 1128.5794761180878


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 21 lr: 0.06315789473684214 train loss: 0.02998944305419922 train acc: 0.99172 val loss: 0.26045743713378905 val acc: 0.9249 time: 1183.9294168949127


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 22 lr: 0.04210526315789476 train loss: 0.026339801807403564 train acc: 0.99298 val loss: 0.24677541580200196 val acc: 0.9288 time: 1241.1666643619537


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 23 lr: 0.02105263157894738 train loss: 0.022443459854125977 train acc: 0.99432 val loss: 0.23998183212280275 val acc: 0.9312 time: 1299.6373777389526


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


epoch: 24 lr: 0.0 train loss: 0.020659445877075196 train acc: 0.99496 val loss: 0.24168466720581055 val acc: 0.9314 time: 1357.3244097232819


#####Validation Accuracy : 93.14, time taken : 1357 sec