In [0]:
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

In [0]:
# Load the TensorBoard notebook extension
%load_ext tensorboard 

In [0]:
import os
import sys
import math
import time
import datetime

import tensorflow as tf
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

from tensorflow import keras
from sklearn.preprocessing import OneHotEncoder

%matplotlib inline

# ImageNet Challenge

![](https://drive.google.com/uc?export=view&id=1LK_eQgSZXykX20g-T4myDVz_Lo0HOde5)

## AlexNet

![](https://www.researchgate.net/profile/Jaime_Gallego2/publication/318168077/figure/fig1/AS:578190894927872@1514862859810/AlexNet-CNN-architecture-layers.png)

## VGG

![](https://qph.fs.quoracdn.net/main-qimg-ba81c87204be1a5d11d64a464bca39eb)

## ResNet (Deep Residual Network)

Is learning better networks as easy as stacking more layers?

The answer is no. When training deep networks, we run into the following problems:

*   Vanishing/exploding gradients, which hamper convergence from the very beginning (largely addressed by normalized initialization, such as Xavier initialization, and by intermediate normalization layers, such as Batch Norm).
*   The degradation problem: as network depth increases, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, this degradation is not caused by overfitting: adding more layers to a suitably deep model leads to higher *training* error.

The degradation problem is interesting: it indicates that not all systems are equally easy to optimize. Consider a shallower architecture and a deeper counterpart that adds more layers on top of it. There is a solution by construction for the deeper model: the added layers are identity mappings, and the other layers are copied from the learned shallower model. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart. However, experiments show that current solvers are unable to find solutions that are comparably good or better than the constructed solution (or are unable to do so in feasible time).


In the deep residual learning framework, instead of hoping that each few stacked layers directly fit a desired underlying mapping, the authors explicitly let these layers fit a residual mapping. More specifically, if the desired underlying mapping is H(x), then instead of fitting it directly, the authors let the stacked nonlinear layers fit another mapping F(x) := H(x) − x. The original mapping is then recast as F(x) + x, which is implemented by a shortcut connection that adds the input x to the output of the stacked layers.

The authors hypothesize that it is easier to optimize the residual mapping than the original, unreferenced mapping.

In the extreme case, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers.
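To make the identity argument concrete, here is a toy NumPy sketch (an illustration, not code from the paper). The residual branch F is assumed to be two dense layers with a ReLU in between, standing in for the convolutions of a real block, and the ReLU after the addition is left out. With all weights at zero, F(x) = 0 and the block returns its input unchanged.

```python
import numpy as np

def residual_branch(x, W1, W2):
    """Toy two-layer residual branch: F(x) = W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def residual_block(x, W1, W2):
    """Residual formulation: H(x) = F(x) + x (shortcut adds the input back)."""
    return residual_branch(x, W1, W2) + x

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# Zero weights push the residual F(x) to zero, so the block reduces to the identity.
W1 = np.zeros((4, 4))
W2 = np.zeros((4, 4))
y = residual_block(x, W1, W2)
print(np.allclose(y, x))  # True
```

A plain two-layer ReLU stack, by contrast, would need carefully chosen nonzero weights to reproduce x exactly.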

![](https://drive.google.com/uc?export=view&id=1c0P4N4Ax9Ox2Wu6MSACXVKIzBZzz-SXq)

### What is a residual layer

![](https://drive.google.com/uc?export=view&id=1M82_hOttWl-_7_bHFkCxPivPJ3pm7USI)

### ResNet architecture

![](https://cdn-images-1.medium.com/max/1400/1*S3TlG0XpQZSIpoDIUCQ0RQ.jpeg)

### Batch normalization

[How to use BatchNorm with activation functions and dropout](https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout)

![](https://drive.google.com/uc?export=view&id=1iTOczI0iM-cr258vHyyN4t-gauslkOXL)
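The core training-time computation of batch normalization can be sketched in NumPy as follows. This is a simplified illustration assuming an NHWC tensor: statistics are taken per channel over the batch and spatial axes, then a learned scale `gamma` and shift `beta` are applied. The `eps=1e-3` value mirrors the Keras default; at inference time Keras uses moving averages of the statistics instead, which this sketch omits. The residual layer later in this notebook places BatchNorm between the convolution and the ReLU.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for an NHWC tensor.

    Mean and variance are computed per channel over the batch, height and width axes.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # learned per-channel scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 16))  # batch of 8 feature maps, 16 channels
gamma = np.ones(16)
beta = np.zeros(16)
y = batch_norm_train(x, gamma, beta)
print(y.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(y.std(axis=(0, 1, 2)))   # close to 1 for every channel
```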

# TensorBoard

The computations you'll use TensorFlow for, like training a massive deep neural network, can be complex and confusing. To make TensorFlow programs easier to understand, debug, and optimize, Google has included a suite of visualization tools called TensorBoard. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data, such as images, that pass through it. When TensorBoard is fully configured, it looks like this:

![](https://www.tensorflow.org/images/mnist_tensorboard.png)

**In Colab, we launch TensorBoard as follows, where *logs/fit* is the directory in which the logs of our model are saved**

In [0]:
%tensorboard --logdir logs/fit
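TensorBoard reads event files from the directory passed to `--logdir`. Besides the Keras callback used later, you can write your own scalars with `tf.summary`. A minimal sketch, where the run name `summary_demo` and the logged values are made up for illustration:

```python
import os
import tensorflow as tf

# Hypothetical run directory under logs/fit, so it shows up next to the Keras runs.
log_dir = os.path.join("logs", "fit", "summary_demo")
writer = tf.summary.create_file_writer(log_dir)

with writer.as_default():
    for step in range(100):
        # A made-up decaying value, only to have something to plot.
        tf.summary.scalar("demo/decaying_value", 1.0 / (step + 1), step=step)
writer.flush()

print(os.listdir(log_dir))  # event file(s) named events.out.tfevents.*
```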

# Create a ResNet for CIFAR-10 image classification

## Load the CIFAR-10 dataset

In [0]:
cifar = keras.datasets.cifar10

(X_train, y_train), (X_test, y_test) = cifar.load_data()

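# Scale pixel values from [0, 255] to [0, 1]; float32 halves the memory of the default float64
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255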

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

## Define the network

### Define the function that creates the residual layer

The function should take 3 arguments:
*   **x_in** - input tensor
*   **channels_out** - number of channels in the tensor returned by the layer
*   **strides** - layer stride size


The function should:
1.  Take the *x_in* tensor and apply a convolutional layer [keras.layers.Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) with *channels_out* filters, kernel size 3, same padding, stride equal to *strides*, and **without** any activation function.
2.  Apply [Batch Normalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) to the output of the first layer.
3.  Apply the [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) activation function.
4.  Apply another convolutional layer with *channels_out* filters, kernel size 3, same padding, stride 1, and **without** any activation function. Only the first convolution is strided; striding both would shrink the feature map by a factor of 4 instead of 2.
5.  Apply Batch Normalization again.
6.  If the shape of the tensor returned in step 5 differs from the shape of the input tensor, rescale the input with a 1x1 convolution (a convolutional layer with kernel size 1) with *channels_out* filters and the proper stride.
7.  Add the (rescaled) input tensor to the tensor returned in step 5.
8.  Apply the ReLU activation.
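One possible sketch of these steps with the Keras functional API. It assumes default `Conv2D` settings apart from those listed (bias enabled, default initializer), uses stride 1 in the second convolution as in the ResNet paper so that a strided layer halves the spatial size once, and applies no Batch Normalization on the 1x1 shortcut (adding one there is a common variant worth trying):

```python
import tensorflow as tf
from tensorflow import keras

def residual_layer(x_in, channels_out, strides=1):
    """Conv-BN-ReLU-Conv-BN, add the (projected) input, then ReLU."""
    # Steps 1-3: 3x3 convolution (strided if strides > 1), batch norm, ReLU.
    x = keras.layers.Conv2D(channels_out, kernel_size=3, strides=strides, padding="same")(x_in)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.ReLU()(x)
    # Steps 4-5: second 3x3 convolution with stride 1, then batch norm.
    x = keras.layers.Conv2D(channels_out, kernel_size=3, strides=1, padding="same")(x)
    x = keras.layers.BatchNormalization()(x)
    # Step 6: rescale the input with a strided 1x1 convolution when shapes differ.
    shortcut = x_in
    if tuple(x_in.shape[1:]) != tuple(x.shape[1:]):
        shortcut = keras.layers.Conv2D(channels_out, kernel_size=1, strides=strides, padding="same")(x_in)
    # Steps 7-8: add the shortcut, then apply ReLU.
    x = keras.layers.Add()([x, shortcut])
    return keras.layers.ReLU()(x)
```

For example, `residual_layer(x, 32, strides=2)` on a 32x32x16 tensor returns a 16x16x32 tensor.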


**Note:** You can also try to implement the residual layer by defining a custom layer instead of creating this function.
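For the custom-layer route, a minimal sketch could subclass `keras.layers.Layer`. The class name `ResidualLayer` is hypothetical, and the 1x1 projection is created in `build()` because the number of input channels is only known there:

```python
import tensorflow as tf
from tensorflow import keras

class ResidualLayer(keras.layers.Layer):
    """Hypothetical custom-layer version of the residual layer."""

    def __init__(self, channels_out, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.channels_out = channels_out
        self.strides = strides
        self.conv1 = keras.layers.Conv2D(channels_out, 3, strides=strides, padding="same")
        self.bn1 = keras.layers.BatchNormalization()
        self.conv2 = keras.layers.Conv2D(channels_out, 3, strides=1, padding="same")
        self.bn2 = keras.layers.BatchNormalization()
        self.projection = None  # created in build() only if the shapes will differ

    def build(self, input_shape):
        # A 1x1 projection is needed when the stride or the channel count changes the shape.
        if self.strides != 1 or input_shape[-1] != self.channels_out:
            self.projection = keras.layers.Conv2D(
                self.channels_out, 1, strides=self.strides, padding="same"
            )
        super().build(input_shape)

    def call(self, inputs, training=None):
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        x = self.bn2(self.conv2(x), training=training)
        shortcut = inputs if self.projection is None else self.projection(inputs)
        return tf.nn.relu(x + shortcut)
```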

In [0]:
def residual_layer(x_in, channels_out, strides=1):
    ##
    
    return x

### Define the ResNet-20 architecture

**Define the input layer with a proper input shape**

In [0]:
x_in = ##

**Apply the first convolutional layer**

Apply a classic (non-residual) convolutional layer with 16 filters, kernel size 3, same padding, and stride 1. Remember to add BatchNorm and a ReLU activation.
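A sketch of the input layer plus this first convolutional block, assuming the 32x32x3 CIFAR-10 images loaded above:

```python
from tensorflow import keras

x_in = keras.Input(shape=(32, 32, 3))  # CIFAR-10 images are 32x32 RGB
x = keras.layers.Conv2D(16, kernel_size=3, strides=1, padding="same")(x_in)  # no activation here
x = keras.layers.BatchNormalization()(x)
x = keras.layers.ReLU()(x)
print(tuple(x.shape))  # (None, 32, 32, 16)
```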

In [0]:
x = ##

**Apply the first block of residual layers**

The residual layers in this block should operate on 32x32 feature maps with 16 channels.

The block should be made of 3 residual layers.

In [0]:
##

**Apply the second block of residual layers**

The residual layers in this block should operate on 16x16 feature maps with 32 channels.

The block should be made of 3 residual layers.

In [0]:
##

**Apply the third block of residual layers**

The residual layers in this block should operate on 8x8 feature maps with 64 channels.

The block should be made of 3 residual layers.
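The three blocks differ only in channel count and in the stride of their first residual layer. The sketch below uses a bare `Conv2D` as a stand-in for that first residual layer, to show how a stride of 2 with same padding halves the spatial size from one block to the next. The `stages` list is a hypothetical helper, and the comment shows how it could drive the notebook's `residual_layer` function.

```python
from tensorflow import keras

# Hypothetical stage list: (channels, stride of the first residual layer in the stage).
stages = [(16, 1), (32, 2), (64, 2)]

# Stand-in for the output of the first convolutional layer: 32x32 maps with 16 channels.
x = keras.Input(shape=(32, 32, 16))
for channels, stride in stages:
    # Stand-in for the first residual layer of the stage. With the notebook's function,
    # a full stage would be:
    #   for i in range(3):
    #       x = residual_layer(x, channels, strides=stride if i == 0 else 1)
    x = keras.layers.Conv2D(channels, 3, strides=stride, padding="same")(x)
    print(channels, tuple(x.shape[1:3]))
# Prints 16 (32, 32), then 32 (16, 16), then 64 (8, 8)
```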

In [0]:
##

**Apply the global average pooling**

[GlobalAveragePooling in tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling2D)
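Global average pooling averages each channel over the height and width axes, turning an 8x8x64 feature map into a 64-dimensional vector. The sketch below, on random stand-in features, checks that `GlobalAveragePooling2D` matches a NumPy mean over axes 1 and 2:

```python
import numpy as np
from tensorflow import keras

# Random stand-in for the output of the third block: batch of 2, 8x8 maps, 64 channels.
features = np.random.default_rng(0).normal(size=(2, 8, 8, 64)).astype("float32")

pooled = keras.layers.GlobalAveragePooling2D()(features)
manual = features.mean(axis=(1, 2))  # average over height and width

print(tuple(pooled.shape))                                # (2, 64)
print(np.allclose(np.asarray(pooled), manual, atol=1e-5))  # True
```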

In [0]:
x = ##

**Apply the output layer**

To the feature vector returned by the global average pooling layer, apply a [dense layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) with the proper number of units and activation function.
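For CIFAR-10 the output layer needs 10 units, and a softmax activation turns its outputs into class probabilities. A sketch on random stand-in pooled features of width 64:

```python
import numpy as np
from tensorflow import keras

num_classes = 10  # CIFAR-10 has 10 classes
output_layer = keras.layers.Dense(num_classes, activation="softmax")

# Random stand-in for a batch of 3 pooled feature vectors with 64 channels.
pooled = np.random.default_rng(0).normal(size=(3, 64)).astype("float32")
probs = np.asarray(output_layer(pooled))
print(probs.shape)        # (3, 10)
print(probs.sum(axis=1))  # every row sums to (approximately) 1
```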

In [0]:
x_out = ##

**Define keras model**

You should pass the proper input and output tensors to the initializer.
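`keras.Model` takes the tensor returned by `keras.Input` and the final output tensor. A sketch with a tiny stand-in graph (not ResNet-20), so that the parameter count is easy to verify by hand:

```python
from tensorflow import keras

# Tiny stand-in graph; in the notebook, x_in and x_out come from the cells above.
x_in = keras.Input(shape=(32, 32, 3))
x = keras.layers.GlobalAveragePooling2D()(x_in)          # 3 features, one per RGB channel
x_out = keras.layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs=x_in, outputs=x_out)
print(model.output_shape)    # (None, 10)
print(model.count_params())  # 40 = 3 * 10 weights + 10 biases
```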

In [0]:
model = ##

**Call the summary function**

We can also plot a diagram of the model with the following code (`keras.utils.plot_model` requires the `pydot` and `graphviz` packages):


```
keras.utils.plot_model(model, to_file='model.png', show_shapes=True)
plt.figure(figsize=(10,20))
img = Image.open('model.png')
plt.imshow(img)
```


In [0]:
model.summary()

### Train the model

**Before training, compile the model with a proper loss function and optimizer**
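A compile sketch, assuming integer labels as returned by `cifar10.load_data()` and the SGD settings reported in the ResNet paper for CIFAR-10 (momentum 0.9, initial learning rate 0.1). A tiny stand-in model is defined so the snippet runs on its own; in the notebook, only the `model.compile(...)` call is needed. The paper also uses a weight decay of 0.0001, which this sketch leaves out; in Keras it could be approximated with an L2 `kernel_regularizer` (`keras.regularizers.l2`) on the convolutional layers.

```python
from tensorflow import keras

# Tiny stand-in model so the snippet runs on its own.
inputs = keras.Input(shape=(32, 32, 3))
outputs = keras.layers.Dense(10, activation="softmax")(keras.layers.GlobalAveragePooling2D()(inputs))
model = keras.Model(inputs, outputs)

model.compile(
    # SGD with momentum 0.9 and initial learning rate 0.1, as reported in the ResNet paper.
    optimizer=keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
    # Labels from cifar10.load_data() are integer class ids of shape (N, 1),
    # so the sparse loss works without one-hot encoding.
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```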

In [0]:
##

**The following line creates the tensorboard callback**

With this callback, TensorBoard can display the losses, metrics, and graph of our model.

[TensorBoard callback](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard)

In [0]:
log_dir="logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

**Train the model**

The original paper used a batch size of 128. Remember to pass the TensorBoard callback to `fit`.
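A training sketch with the batch size from the paper. Random stand-in arrays and a tiny model replace the real data and ResNet so the snippet runs quickly on its own, and the run name `fit_demo` is made up. In the notebook, only the `model.fit(...)` call with `X_train`, `y_train`, `X_test`, `y_test`, and `tensorboard_callback` is needed.

```python
import numpy as np
from tensorflow import keras

# Small random stand-ins for the CIFAR-10 arrays loaded earlier.
X_train = np.random.rand(256, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(256, 1))
X_test = np.random.rand(64, 32, 32, 3).astype("float32")
y_test = np.random.randint(0, 10, size=(64, 1))

# Tiny stand-in model, compiled as in the previous step.
inputs = keras.Input(shape=(32, 32, 3))
outputs = keras.layers.Dense(10, activation="softmax")(keras.layers.GlobalAveragePooling2D()(inputs))
model = keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

tensorboard_callback = keras.callbacks.TensorBoard(log_dir="logs/fit/fit_demo")

history = model.fit(
    X_train, y_train,
    batch_size=128,  # batch size used in the original paper
    epochs=2,        # far too few for real training; increase in practice
    validation_data=(X_test, y_test),
    callbacks=[tensorboard_callback],
)
print(sorted(history.history.keys()))
```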

In [0]:
##

**Note 1:** To get the best classification result you can, you could experiment with other ResNet architectures, data augmentation (using [tf datasets](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) to [train the Keras model](https://www.tensorflow.org/alpha/guide/keras/training_and_evaluation#training_evaluation_from_tfdata_datasets)), regularization methods, dropout, optimizers, and [learning rate schedulers](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler).
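As a starting point, here is a sketch of a `tf.data` input pipeline with the augmentation described in the ResNet paper for CIFAR-10 (4-pixel padding, random 32x32 crop, random horizontal flip), plus a step learning-rate schedule. The helper names `augment`, `make_train_dataset`, and `step_schedule`, and the epoch boundaries 80 and 120, are assumptions: the paper divides the learning rate by 10 at 32k and 48k iterations, which with batch size 128 on 50,000 images lands roughly at epochs 82 and 123.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def augment(image, label):
    # Pad 4 pixels on each side (32x32 -> 40x40), take a random 32x32 crop, randomly flip.
    image = tf.image.pad_to_bounding_box(image, 4, 4, 40, 40)
    image = tf.image.random_crop(image, size=(32, 32, 3))
    image = tf.image.random_flip_left_right(image)
    return image, label

def make_train_dataset(x, y, batch_size=128):
    """Shuffled, augmented, batched training pipeline (hypothetical helper)."""
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.shuffle(len(x)).map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

def step_schedule(epoch, lr):
    # Hypothetical boundaries: divide the current learning rate by 10 at epochs 80 and 120.
    if epoch in (80, 120):
        return lr * 0.1
    return lr

lr_callback = keras.callbacks.LearningRateScheduler(step_schedule)
```

It could then be used as `model.fit(make_train_dataset(X_train, y_train), epochs=..., validation_data=(X_test, y_test), callbacks=[tensorboard_callback, lr_callback])`.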

**Note 2:** You could also try to implement model training in a more [TensorFlow-idiomatic way](https://www.tensorflow.org/alpha/guide/keras/training_and_evaluation#part_ii_writing_your_own_training_evaluation_loops_from_scratch), by writing your own training loop.
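A minimal custom training loop with `tf.GradientTape` might look like this. The tiny model and random data are stand-ins; with the real ResNet, calling `model(images, training=True)` makes BatchNormalization use batch statistics during training.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Tiny stand-in model and random data so the loop runs on its own.
inputs = keras.Input(shape=(32, 32, 3))
outputs = keras.layers.Dense(10, activation="softmax")(keras.layers.GlobalAveragePooling2D()(inputs))
model = keras.Model(inputs, outputs)

x = np.random.rand(256, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(256, 1))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(256).batch(128)

loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
train_acc = keras.metrics.SparseCategoricalAccuracy()

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        probs = model(images, training=True)  # training mode for layers like BatchNormalization
        loss = loss_fn(labels, probs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_acc.update_state(labels, probs)
    return loss

for epoch in range(2):
    train_acc.reset_state()
    for images, labels in dataset:
        loss = train_step(images, labels)
    print(f"epoch {epoch}: loss={float(loss):.4f} acc={float(train_acc.result()):.4f}")
```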

# Image sources

Images used in this notebook come from the following web pages and papers:
1.  https://www.researchgate.net/figure/AlexNet-CNN-architecture-layers_fig1_318168077
2.  https://www.quora.com/What-is-the-VGG-neural-network
3.  https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c
4.  [ResNet paper](https://arxiv.org/pdf/1512.03385.pdf) - Text is also inspired by this publication
5.  [BatchNorm paper](https://arxiv.org/pdf/1502.03167.pdf)
