
In [0]:
import numpy as np
import time, math
from tqdm import tqdm_notebook as tqdm

import tensorflow as tf
import tensorflow.contrib.eager as tfe

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



## With eager execution enabled, operations execute immediately and return their values to Python, without building a graph and calling `Session.run()`.

### Other benefits:

*   Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
*   Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.



In [0]:
tf.enable_eager_execution()

In [0]:
BATCH_SIZE = 512 #@param {type:"integer"}
MOMENTUM = 0.9 #@param {type:"number"}
LEARNING_RATE = 0.4 #@param {type:"number"}
WEIGHT_DECAY = 5e-4 #@param {type:"number"}
EPOCHS = 24 #@param {type:"integer"}

Reference: https://mc.ai/tutorial-1-cifar10-with-google-colabs-free-gpu%E2%80%8A-%E2%80%8A92-5/

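## `init_pytorch` initializes the kernel weights of the convolution and dense layers: each weight is drawn uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)], where fan_in is the product of all kernel dimensions except the last. We pass this function as the `kernel_initializer` argument when building the layers of the model.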

In [0]:
def init_pytorch(shape, dtype=tf.float32, partition_info=None):
  fan = np.prod(shape[:-1])
  bound = 1 / math.sqrt(fan)
  return tf.random.uniform(shape, minval=-bound, maxval=bound, dtype=dtype)
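
A NumPy sketch of the same initializer, for illustration (the name `init_pytorch_np` and the fixed random seed are assumptions, not part of the notebook). For a 3x3 kernel mapping 64 to 128 channels, fan_in = 3*3*64 = 576, so the bound is 1/24, about 0.0417. As far as we know, this range matches PyTorch's default initialization for `Conv2d` and `Linear` weights, which is presumably where the name comes from.

```python
import math
import numpy as np

def init_pytorch_np(shape, rng=None):
    """NumPy analogue of init_pytorch: U(-1/sqrt(fan_in), 1/sqrt(fan_in))."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Keras conv kernels are laid out as (kh, kw, c_in, c_out) and dense kernels
    # as (n_in, n_out), so every dimension except the last contributes to fan-in.
    fan_in = int(np.prod(shape[:-1]))
    bound = 1 / math.sqrt(fan_in)
    weights = rng.uniform(-bound, bound, size=shape).astype(np.float32)
    return weights, bound

# A 3x3 convolution mapping 64 -> 128 channels, as in the first residual block.
kernel, bound = init_pytorch_np((3, 3, 64, 128))
print(kernel.shape, bound)  # fan_in = 576, so bound = 1/24
```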

## The `ConvBN` class below subclasses `tf.keras.Model`: its constructor defines the layers and its overridden `call` method applies them.
### `call` takes the input and applies a 3x3 convolution (`SAME` padding, no bias, since the offset learned by batch normalization makes a bias redundant), dropout with rate 0.05, batch normalization and finally a ReLU activation.

In [0]:
class ConvBN(tf.keras.Model):
  def __init__(self, c_out):
    super().__init__()
    self.conv = tf.keras.layers.Conv2D(filters=c_out, kernel_size=3, padding="SAME", kernel_initializer=init_pytorch, use_bias=False)
    self.bn = tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5)
    self.drop = tf.keras.layers.Dropout(0.05)

  def call(self, inputs):
    return tf.nn.relu(self.bn(self.drop(self.conv(inputs))))
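
The batch-normalization and ReLU steps can be sketched in NumPy as below. This is an illustration, not the Keras implementation: it assumes the default initial scale (gamma = 1) and offset (beta = 0), skips dropout, and uses the biased batch variance throughout, whereas TensorFlow's fused kernel may differ in such details. The moving averages follow the Keras convention `moving = momentum * moving + (1 - momentum) * batch_stat`, with `momentum=0.9` and `epsilon=1e-5` as in `ConvBN`.

```python
import numpy as np

def batch_norm_train(x, moving_mean, moving_var, momentum=0.9, epsilon=1e-5):
    """Training-mode batch normalization over an NHWC batch (gamma=1, beta=0)."""
    # Statistics are computed per channel, over batch, height and width.
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    y = (x - mean) / np.sqrt(var + epsilon)
    # Moving averages, which would be used instead of batch stats at inference.
    new_mean = momentum * moving_mean + (1 - momentum) * mean
    new_var = momentum * moving_var + (1 - momentum) * var
    return y, new_mean, new_var

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 16))  # a fake conv output
y, m, v = batch_norm_train(x, np.zeros(16), np.ones(16))
out = np.maximum(y, 0.0)  # ReLU, the last step of ConvBN.call
```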

## `ResBlk` builds a residual block: a `ConvBN` layer followed by pooling. The `res` flag passed to the constructor controls whether a residual branch of two further `ConvBN` layers is added; when `res=True` the branch output is added back to the pooled output (`h + res2(res1(h))`), otherwise the block is just convolution plus pooling. By default (`res=False`) no skip connection is added.

In [0]:
class ResBlk(tf.keras.Model):
  def __init__(self, c_out, pool, res = False):
    super().__init__()
    self.conv_bn = ConvBN(c_out)
    self.pool = pool
    self.res = res
    if self.res:
      self.res1 = ConvBN(c_out)
      self.res2 = ConvBN(c_out)

  def call(self, inputs):
    h = self.pool(self.conv_bn(inputs))
    if self.res:
      h = h + self.res2(self.res1(h))
    return h
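
To make the data flow of `ResBlk.call` concrete, here is a NumPy sketch in which the `ConvBN` layers are replaced by hypothetical stand-in functions (`widen` and `relu` are not real convolutions). The 2x2 max pooling matches the defaults of `tf.keras.layers.MaxPooling2D()` (2x2 window, stride 2). Because the residual branch runs after pooling and preserves the shape, the block output has the same shape with or without the skip connection.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an NHWC array (H and W assumed even)."""
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

def res_blk(x, conv_bn, res1=None, res2=None):
    """Mirrors ResBlk.call: conv -> pool, then optionally h + res2(res1(h))."""
    h = max_pool_2x2(conv_bn(x))
    if res1 is not None and res2 is not None:
        h = h + res2(res1(h))  # skip connection: branch output added to h
    return h

# Hypothetical stand-ins for ConvBN layers (not real convolutions):
widen = lambda x: np.concatenate([x, x], axis=-1)  # doubles the channel count
relu = lambda x: np.maximum(x, 0.0)

x = np.random.default_rng(0).normal(size=(2, 32, 32, 64))
plain = res_blk(x, widen)
residual = res_blk(x, widen, relu, relu)
print(plain.shape, residual.shape)
```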

## The `DavidNet` class builds the full model. Its constructor defines the following layers and attributes:
* A 2D max pooling layer (`pool`), which is passed to each residual block.
* The initial `ConvBN` layer, with `c` (64) output channels.
* 1st residual block, with a skip connection and `c*2` (128) output channels, double its input channel count.
* 2nd residual block, without a skip connection and with `c*4` (256) output channels.
* 3rd residual block, with a skip connection and `c*8` (512) output channels.
* A global max pooling layer (`GlobalMaxPool2D`) for the final layer of the model.
* A fully connected layer (no bias) with 10 outputs, one per class in our dataset.
* `weight`, the constant (0.125 by default) used to scale the logits.

## The `call` method then builds the forward pass in the following sequence:
- Initial convolution, 1st residual block, 2nd residual block, 3rd residual block, global max pooling.
- Pass the output of step 1 through the fully connected layer to get a vector of 10 logits, one per class in the dataset.
- Multiply the logits by `weight` (0.125). This scales the logits down by a factor of 8, which softens the softmax output (similar to a temperature). It is not a regularization term; weight decay is applied separately in the training loop.
- **sparse_softmax_cross_entropy_with_logits** measures the per-example probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class); the labels are integer class indices.
- `loss` is the sum of the cross-entropy over all examples in the batch.
- `correct` counts the number of correct predictions in the batch.
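
The scaling, loss and accuracy steps can be reproduced in NumPy on a toy batch of three examples and three classes (the logits and labels below are made up for illustration). The helper `sparse_softmax_cross_entropy` is a hypothetical stand-in for `tf.nn.sparse_softmax_cross_entropy_with_logits`, written with the usual max-subtraction trick for numerical stability.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy with integer labels, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

raw = np.array([[8.0, 0.0, 0.0],    # confident and correct
                [0.0, 4.0, 0.0],    # predicts class 1, but the label is 2
                [0.0, 0.0, 16.0]])  # confident and correct
labels = np.array([0, 2, 2])

logits = raw * 0.125                # same role as `self.weight` in DavidNet.call
ce = sparse_softmax_cross_entropy(logits, labels)
loss = ce.sum()                     # a sum over the batch, not a mean
correct = np.sum(np.argmax(logits, axis=1) == labels)
print(loss, correct)
```

Only the second example is misclassified, so `correct` is 2, while every example still contributes a positive term to `loss`.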

 

In [0]:
class DavidNet(tf.keras.Model):
  def __init__(self, c=64, weight=0.125):
    super().__init__()
    pool = tf.keras.layers.MaxPooling2D()
    self.init_conv_bn = ConvBN(c)
    self.blk1 = ResBlk(c*2, pool, res = True)
    self.blk2 = ResBlk(c*4, pool)
    self.blk3 = ResBlk(c*8, pool, res = True)
    self.pool = tf.keras.layers.GlobalMaxPool2D()
    self.linear = tf.keras.layers.Dense(10, kernel_initializer=init_pytorch, use_bias=False)
    self.weight = weight

  def call(self, x, y):
    h = self.pool(self.blk3(self.blk2(self.blk1(self.init_conv_bn(x)))))
    h = self.linear(h) * self.weight 
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h, labels=y)
    loss = tf.reduce_sum(ce)
    correct = tf.reduce_sum(tf.cast(tf.math.equal(tf.argmax(h, axis = 1), y), tf.float32))
    return loss, correct
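
A pure-Python shape trace of the forward pass, assuming 32x32x3 CIFAR-10 inputs and the default `c=64`. The helper functions are hypothetical and only track shapes: `SAME`-padded 3x3 convolutions keep height and width, each block's pooling halves them, and global max pooling collapses the remaining spatial grid to one value per channel.

```python
def conv_bn_shape(shape, c_out):
    """3x3 conv with SAME padding keeps height and width, changes channels."""
    h, w, _ = shape
    return (h, w, c_out)

def pool_shape(shape):
    """MaxPooling2D() with its default 2x2 window and stride halves H and W."""
    h, w, c = shape
    return (h // 2, w // 2, c)

def davidnet_shapes(c=64, input_shape=(32, 32, 3), num_classes=10):
    shapes = [("input", input_shape)]
    s = conv_bn_shape(input_shape, c)
    shapes.append(("init_conv_bn", s))
    for name, mult in [("blk1", 2), ("blk2", 4), ("blk3", 8)]:
        # The residual branch (when present) preserves the shape, so only the
        # ConvBN + pooling at the start of each block changes it.
        s = pool_shape(conv_bn_shape(s, c * mult))
        shapes.append((name, s))
    s = (s[-1],)  # GlobalMaxPool2D keeps one value per channel
    shapes.append(("pool", s))
    shapes.append(("linear", (num_classes,)))
    return shapes

for name, shape in davidnet_shapes():
    print(f"{name:>12}: {shape}")
```

This gives 32x32x64 after the initial layer, then 16x16x128, 8x8x256 and 4x4x512 after the three blocks, 512 features after pooling and 10 logits at the output.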

In [0]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
len_train, len_test = len(x_train), len(x_test)
y_train = y_train.astype('int64').reshape(len_train)
y_test = y_test.astype('int64').reshape(len_test)

train_mean = np.mean(x_train, axis=(0,1,2))
train_std = np.std(x_train, axis=(0,1,2))

normalize = lambda x: ((x - train_mean) / train_std).astype('float32') 
pad4 = lambda x: np.pad(x, [(0, 0), (4, 4), (4, 4), (0, 0)], mode='reflect')

x_train = normalize(pad4(x_train))
x_test = normalize(x_test)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
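
The effect of `normalize` and `pad4` can be checked on a small random stand-in for the CIFAR-10 images (the random array is an assumption, not the real data). Normalization uses one mean and one standard deviation per RGB channel, computed on the training set and reused for the test set. Reflect padding by 4 pixels on each side turns the 32x32 training images into 40x40 ones, leaving room for the random 32x32 crops used for augmentation; the test images are not padded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for CIFAR-10: 100 uint8 images of 32x32x3.
x = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

mean = np.mean(x, axis=(0, 1, 2))  # one value per RGB channel
std = np.std(x, axis=(0, 1, 2))
normalize = lambda a: ((a - mean) / std).astype('float32')
pad4 = lambda a: np.pad(a, [(0, 0), (4, 4), (4, 4), (0, 0)], mode='reflect')

x_norm = normalize(x)
x_padded = normalize(pad4(x))
print(x_norm.mean(axis=(0, 1, 2)), x_norm.std(axis=(0, 1, 2)))
print(x_padded.shape)  # 32 + 4 + 4 = 40 pixels per side
```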




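*   Batches per epoch = `len_train // BATCH_SIZE + 1` (integer division). For CIFAR-10 this is 50000 // 512 + 1 = 98, the last batch being a partial one of 336 images.
*   LR schedule: the learning rate rises linearly from 0 to `LEARNING_RATE` (0.4) at epoch `(EPOCHS+1)//5` = 5 and then falls linearly back to 0 at epoch `EPOCHS` (24). `lr_func` evaluates the schedule at the fractional epoch `global_step / batches_per_epoch` and divides it by `BATCH_SIZE`, because the loss is summed rather than averaged over the batch.
* `global_step` counts the number of batches (optimizer steps) processed so far. Each batch updates the weights in the direction that reduces the loss, and because `global_step` is passed to `apply_gradients()` (as it could be to `minimize()`), the optimizer increments it by one after every update.
* Momentum optimizer: `tf.train.MomentumOptimizer` with momentum 0.9 and Nesterov momentum enabled (`use_nesterov=True`).
* Data augmentation, applied on the fly to each training example:
    - Randomly crop a 32x32x3 patch from the 40x40 reflect-padded image.
    - Randomly flip the crop from left to right (with probability 0.5).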
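
The arithmetic behind the batch count and the learning-rate schedule, using the notebook's hyperparameters and the 50,000-image CIFAR-10 training set. The `step` value is a hypothetical point halfway through the fourth epoch.

```python
import numpy as np

BATCH_SIZE, LEARNING_RATE, EPOCHS = 512, 0.4, 24
len_train = 50000  # CIFAR-10 training set size

batches_per_epoch = len_train // BATCH_SIZE + 1  # 97 full batches + 1 partial
last_batch = len_train - (batches_per_epoch - 1) * BATCH_SIZE

def lr_schedule(t):
    """Linear 0 -> LEARNING_RATE at epoch (EPOCHS+1)//5, then linear -> 0 at EPOCHS."""
    return np.interp([t], [0, (EPOCHS + 1) // 5, EPOCHS], [0, LEARNING_RATE, 0])[0]

# The optimizer sees lr_schedule(step / batches_per_epoch) / BATCH_SIZE,
# because the loss is a sum over the batch rather than a mean.
step = 3 * batches_per_epoch + 49  # hypothetical: halfway through epoch 4
print(batches_per_epoch, last_batch)
print([round(lr_schedule(e), 4) for e in (1, 5, 6, 24)])
print(lr_schedule(step / batches_per_epoch) / BATCH_SIZE)
```

The schedule values at epochs 1, 5 and 6 (0.08, 0.4 and about 0.3789) agree with the `lr` values printed in the training log. Since 50000 is not a multiple of 512, the `+ 1` here gives the same result as `math.ceil(len_train / BATCH_SIZE)`.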





In [0]:
model = DavidNet()
batches_per_epoch = len_train//BATCH_SIZE + 1

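# Piecewise-linear schedule: 0 -> LEARNING_RATE at epoch (EPOCHS+1)//5, then back to 0 at EPOCHS
lr_schedule = lambda t: np.interp([t], [0, (EPOCHS+1)//5, EPOCHS], [0, LEARNING_RATE, 0])[0]
global_step = tf.train.get_or_create_global_step()
# Evaluate the schedule at the fractional epoch; divide by BATCH_SIZE because the loss is summed over the batch
lr_func = lambda: lr_schedule(global_step/batches_per_epoch)/BATCH_SIZE
opt = tf.train.MomentumOptimizer(lr_func, momentum=MOMENTUM, use_nesterov=True)
# Random 32x32 crop of the 40x40 padded image, followed by a random horizontal flip
data_aug = lambda x, y: (tf.image.random_flip_left_right(tf.random_crop(x, [32, 32, 3])), y)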
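
A NumPy re-implementation of what `data_aug` does to a single image, for illustration only (the function name `random_crop_flip` is hypothetical; the notebook itself uses `tf.random_crop` and `tf.image.random_flip_left_right`). A 40x40 padded image allows 9 x 9 = 81 possible crop positions, and each crop is mirrored with probability 0.5. Because `train_set` is rebuilt with `.map(data_aug)` in every epoch, each epoch sees a fresh set of crops and flips.

```python
import numpy as np

def random_crop_flip(image, rng, size=32):
    """Random size x size crop followed by a left-right flip with probability 0.5."""
    h, w, _ = image.shape
    top = rng.integers(0, h - size + 1)   # 0..8 for a 40x40 padded image
    left = rng.integers(0, w - size + 1)
    crop = image[top:top + size, left:left + size]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]  # mirror along the width axis
    return crop

rng = np.random.default_rng(0)
padded = np.arange(40 * 40 * 3, dtype=np.float32).reshape(40, 40, 3)
out = random_crop_flip(padded, rng)
print(out.shape)
```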



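* Create the test set batches using the batch size.
* Loop over the epochs.
* In each epoch, build the augmented and shuffled training set batches using the batch size.
* Loop over each training batch:
   - Record the forward pass with TensorFlow's `GradientTape` so that the gradients of the loss with respect to all trainable parameters can be computed.
   - The momentum optimizer used here does not apply weight decay itself, so we add `WEIGHT_DECAY * BATCH_SIZE * v` to the gradient of each variable `v`. Weight decay does not discard variables; it shrinks every weight toward zero at each step. The factor `BATCH_SIZE` offsets the division of the learning rate by `BATCH_SIZE`. For SGD with momentum, adding this term to the gradient is equivalent to an L2 penalty on the loss; weight decay and L2 regularization only behave differently with adaptive optimizers such as Adam.
   - `apply_gradients()` uses the gradients computed by `tape.gradient()` to update each variable, and increments `global_step`.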
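
A NumPy sketch of one parameter update, showing why the gradients must be rebuilt rather than modified inside a `for` loop, and how the decay term shrinks the weights. The momentum rule assumes TensorFlow's Nesterov variant (`accum = momentum * accum + grad`, then `var -= lr * (grad + momentum * accum)`); treat it as an approximation of `MomentumOptimizer(use_nesterov=True)`, not its actual code. The function name `sgd_nesterov_step` is hypothetical.

```python
import numpy as np

WEIGHT_DECAY, BATCH_SIZE, MOMENTUM = 5e-4, 512, 0.9

# Pitfall: with immutable values (Python floats here, EagerTensors in TF),
# `g += ...` inside a for loop only rebinds the loop variable.
grads = [1.0, 2.0]
for g in grads:
    g += 100.0
assert grads == [1.0, 2.0]  # the list is unchanged

def sgd_nesterov_step(var, grad, accum, lr):
    """Weight decay folded into the gradient, then an (assumed) Nesterov update."""
    grad = grad + var * WEIGHT_DECAY * BATCH_SIZE  # build a new gradient value
    accum = MOMENTUM * accum + grad
    var = var - lr * (grad + MOMENTUM * accum)
    return var, accum

var = np.array([1.0, -2.0])
accum = np.zeros_like(var)
lr = 0.4 / BATCH_SIZE  # peak learning rate, divided by the batch size
var, accum = sgd_nesterov_step(var, np.zeros_like(var), accum, lr)
print(var)  # with a zero loss gradient, weight decay alone shrinks the weights
```

With a zero loss gradient and the peak learning rate, one step multiplies each weight by 1 - 0.00038.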



In [0]:

t = time.time()
test_set = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)

for epoch in range(EPOCHS):
  train_loss = test_loss = train_acc = test_acc = 0.0
  train_set = tf.data.Dataset.from_tensor_slices((x_train, y_train)).map(data_aug).shuffle(len_train).batch(BATCH_SIZE).prefetch(1)

  tf.keras.backend.set_learning_phase(1)
  for (x, y) in tqdm(train_set):
    with tf.GradientTape() as tape:
      loss, correct = model(x, y)

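    var = model.trainable_variables
    grads = tape.gradient(loss, var)
    # Add weight decay to every gradient. EagerTensors are immutable, so `g += ...` in a
    # for loop would only rebind `g` and leave `grads` unchanged; build a new list instead.
    grads = [g + v * WEIGHT_DECAY * BATCH_SIZE for g, v in zip(grads, var)]
    opt.apply_gradients(zip(grads, var), global_step=global_step)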

    train_loss += loss.numpy()
    train_acc += correct.numpy()

  tf.keras.backend.set_learning_phase(0)
  for (x, y) in test_set:
    loss, correct = model(x, y)
    test_loss += loss.numpy()
    test_acc += correct.numpy()
    
  print('epoch:', epoch+1, 'lr:', lr_schedule(epoch+1), 'train loss:', train_loss / len_train, 'train acc:', train_acc / len_train, 'val loss:', test_loss / len_test, 'val acc:', test_acc / len_test, 'time:', time.time() - t)




epoch: 1 lr: 0.08 train loss: 1.6257620965576172 train acc: 0.40744 val loss: 1.3651089630126954 val acc: 0.5326 time: 44.06200909614563


epoch: 2 lr: 0.16 train loss: 0.8928479779052735 train acc: 0.68458 val loss: 0.8895318939208985 val acc: 0.6926 time: 73.62545609474182


epoch: 3 lr: 0.24 train loss: 0.6700631002807618 train acc: 0.7655 val loss: 0.8067521636962891 val acc: 0.7428 time: 103.18359637260437


epoch: 4 lr: 0.32 train loss: 0.5700838354492187 train acc: 0.80376 val loss: 1.2034398132324218 val acc: 0.6445 time: 132.78622126579285


epoch: 5 lr: 0.4 train loss: 0.5038436508178711 train acc: 0.8263 val loss: 0.5967511062622071 val acc: 0.8002 time: 162.1881217956543


epoch: 6 lr: 0.37894736842105264 train loss: 0.42996320709228514 train acc: 0.8528 val loss: 0.6407176742553711 val acc: 0.7923 time: 191.64645171165466


epoch: 7 lr: 0.35789473684210527 train loss: 0.3448586151123047 train acc: 0.8813 val loss: 0.7550178451538085 val acc: 0.7804 time: 221.0552089214325


epoch: 8 lr: 0.33684210526315794 train loss: 0.29600581985473634 train acc: 0.89898 val loss: 0.4324545310974121 val acc: 0.8552 time: 251.15765976905823


epoch: 9 lr: 0.31578947368421056 train loss: 0.2619182469177246 train acc: 0.9088 val loss: 0.3984594123840332 val acc: 0.8715 time: 280.69828724861145


epoch: 10 lr: 0.2947368421052632 train loss: 0.22652004592895508 train acc: 0.92182 val loss: 0.4735183151245117 val acc: 0.8539 time: 310.0713093280792


epoch: 11 lr: 0.2736842105263158 train loss: 0.20078069709777832 train acc: 0.93032 val loss: 0.3711290267944336 val acc: 0.8835 time: 339.1911473274231


epoch: 12 lr: 0.25263157894736843 train loss: 0.17583042610168456 train acc: 0.93892 val loss: 0.31387169189453124 val acc: 0.8989 time: 368.65085768699646


epoch: 13 lr: 0.23157894736842108 train loss: 0.1559106150817871 train acc: 0.94598 val loss: 0.41876822509765627 val acc: 0.8728 time: 397.6375365257263


epoch: 14 lr: 0.2105263157894737 train loss: 0.13656528129577636 train acc: 0.9525 val loss: 0.3523848037719727 val acc: 0.8957 time: 426.5815622806549


epoch: 15 lr: 0.18947368421052635 train loss: 0.12045220352172852 train acc: 0.95882 val loss: 0.35112862854003907 val acc: 0.8947 time: 455.7363238334656


epoch: 16 lr: 0.16842105263157897 train loss: 0.10550074382781982 train acc: 0.96344 val loss: 0.32586183166503907 val acc: 0.9005 time: 484.763347864151


epoch: 17 lr: 0.1473684210526316 train loss: 0.09353348903656006 train acc: 0.96746 val loss: 0.32835676193237306 val acc: 0.9021 time: 513.8829939365387


epoch: 18 lr: 0.12631578947368421 train loss: 0.07915886672973632 train acc: 0.9733 val loss: 0.2875952430725098 val acc: 0.9159 time: 543.3617272377014


epoch: 19 lr: 0.10526315789473689 train loss: 0.06807763833999633 train acc: 0.97762 val loss: 0.26786767807006834 val acc: 0.9184 time: 572.9406402111053


epoch: 20 lr: 0.08421052631578951 train loss: 0.05851832050323486 train acc: 0.98094 val loss: 0.27373801040649415 val acc: 0.9221 time: 602.5555322170258


epoch: 21 lr: 0.06315789473684214 train loss: 0.049228293361663816 train acc: 0.9841 val loss: 0.25219807586669923 val acc: 0.9252 time: 631.810084104538


epoch: 22 lr: 0.04210526315789476 train loss: 0.04086958978652954 train acc: 0.98814 val loss: 0.2582120223999023 val acc: 0.9237 time: 661.0380907058716


epoch: 23 lr: 0.02105263157894738 train loss: 0.03703957044601441 train acc: 0.9892 val loss: 0.24948596878051757 val acc: 0.9289 time: 689.9963872432709


epoch: 24 lr: 0.0 train loss: 0.034272400665283205 train acc: 0.9907 val loss: 0.24836608238220215 val acc: 0.9279 time: 719.2472357749939
