
# **From Paper To Keras: DenseNets With TensorFlow**

[<img src="https://github.com/shubham0204/Privacy_Policy_Texts/blob/master/notebook_button_two.png?raw=true" width="170" height="50" align="center">](https://medium.com/@equipintelligence/exploring-densenets-from-paper-to-keras-dcc01725488b)
[<img src="https://github.com/shubham0204/Privacy_Policy_Texts/raw/master/read_the_paper_button.png" width="150" height="40" align="center">](https://arxiv.org/abs/1608.06993)

---

In this notebook, we'll build the popular [DenseNet](https://arxiv.org/abs/1608.06993) architecture from scratch! We'll walk through its structure step by step and implement it using `tf.keras`.

We'll need a GPU hardware accelerator to train the model. Change the runtime type to GPU by going to `Runtime > Change runtime type > Hardware accelerator > GPU`.

***Note: It is highly recommended that you read the research paper once, as this notebook uses several expressions taken from it.***






## 1) Importing the Packages



We import TensorFlow and NumPy. Other packages are imported as and when needed.


In [None]:

import tensorflow as tf
import numpy as np
import os



## 2) Loading the Data

We'll train our model on the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset, which can be loaded easily using the `tf.keras.datasets` module.

You can try more datasets via [TensorFlow Datasets](https://www.tensorflow.org/datasets) and see the list of image datasets on their [catalog](https://www.tensorflow.org/datasets/catalog/overview).




In [None]:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# One-hot encoding for 10 classes.

y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
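
As a rough illustration of what the one-hot encoding above does to a single label, here is a minimal pure-Python sketch. The helper `one_hot` is hypothetical and written only for this example; the notebook itself relies on `tf.keras.utils.to_categorical`.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with a 1.0 at index `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

# CIFAR-10 labels are integers from 0 to 9; label 3 is used here as an example.
encoded = one_hot(3, 10)
print(encoded)
```

Each label becomes a vector of length 10 with a single non-zero entry, which is the target format that the categorical cross-entropy loss expects.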



## 3) The DenseNet Model





### a) The $H$ function ( $BatchNorm \to ReLU \to Conv$ )

The name "DenseNet" itself hints at the structure: a DenseNet is made of Dense Blocks, where a "block" is simply a group of layers.

In a traditional CNN, information ( or feature maps ) flows in a single direction, from each layer to the next. Dense Blocks instead let every convolution layer access the outputs of all the preceding layers in the block. Information flow improves because each layer is connected to all the earlier layers, not only to the one directly before it.

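In the paper, you'll come across the expression,

$x_l = H( x_{l-1} ) + x_{l-1}$

This is the skip connection of a ResNet, which the paper uses as a point of comparison. DenseNet replaces the addition with a concatenation of all the preceding feature maps, as described in the Dense Block section below. In both cases, $H$ represents a composite function which takes in an image/feature map ( $x$ ) and performs some operations on it.

$x \to Batch \ Normalization \to ReLU \to Zero \ Padding \to 3 \times 3 \ Convolution \to Dropout$


A bottleneck layer ( a $1 \times 1$ convolution placed before the $3 \times 3$ convolution ) could be added too. The function below implements only the basic version without it.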



In [None]:

def H(inputs, num_filters, dropout_rate):
    # BatchNorm -> ReLU -> ZeroPadding -> 3x3 Conv -> Dropout.
    # `eps` is a global defined with the other hyperparameters further below.
    x = tf.keras.layers.BatchNormalization(epsilon=eps)(inputs)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.ZeroPadding2D((1, 1))(x)
    x = tf.keras.layers.Conv2D(num_filters, kernel_size=(3, 3), use_bias=False, kernel_initializer='he_normal')(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    return x



### b) The Transition Layers

The Transition layers downsample the feature maps coming from the previous block. The `compression_factor` argument below is the compression factor $\theta$ from the paper.

Hence, if $m$ feature maps go into the transition layer, it produces $\lfloor \theta m \rfloor$ feature maps, where $\lfloor \ \rfloor$ denotes the floor function.
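
To make the compression arithmetic concrete, here is a small sketch. The helper name `compressed_feature_maps` and the input values are assumptions chosen purely for illustration; they do not come from the paper.

```python
import math

def compressed_feature_maps(m, theta):
    """Number of feature maps leaving a transition layer: floor(theta * m)."""
    return math.floor(theta * m)

# Hypothetical values chosen for illustration.
print(compressed_feature_maps(176, 0.5))  # 88
print(compressed_feature_maps(45, 0.5))   # 22 (22.5 rounded down)
```

With $\theta = 1$ the number of feature maps is left unchanged, so compression only takes effect for $\theta < 1$.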



In [None]:

def transition(inputs, num_filters, compression_factor, dropout_rate):
    # compression_factor is the 'θ' from the paper.
    x = tf.keras.layers.BatchNormalization(epsilon=eps)(inputs)
    x = tf.keras.layers.Activation('relu')(x)
    # The value of 'm': the channel axis is the last one in the channels-last layout.
    num_feature_maps = inputs.shape[-1]

    # np.int was removed from recent NumPy releases, so the built-in int() is used instead.
    x = tf.keras.layers.Conv2D(int(np.floor(compression_factor * num_feature_maps)),
                               kernel_size=(1, 1), use_bias=False, padding='same', kernel_initializer='he_normal', kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)

    x = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(x)
    return x



### c) Finally, the Dense Block

Each block receives feature maps as input, either from the initial convolution or from the previous transition layer. These inputs go through the $H$ function to produce an output ( $x_1$ ).

$x_1 = H( x_{0} )$

Next, $x_1$ is concatenated with $x_{0}$ and the result goes into the $H$ function again. So $x_2$ is produced as,

$x_2 = H( \ concat( \ x_1 , x_0 \ ) \ )$

Similarly, $x_l$ is produced from the concatenation of the output feature maps of all the previous layers ( as well as the inputs $x_0$ ),

$x_l = H( \ concat( \ x_0 \ , x_1\  , \ x_2 , ... , \ x_{l-1} \  ) \ )$


After the last layer of a block, the concatenated feature maps are passed through the transition layer. The outputs of the transition layer then flow into the next block.



In [None]:

def dense_block( inputs, num_layers, num_filters, growth_rate , dropout_rate ):
    for i in range(num_layers): # num_layers is the value of 'l'
        conv_outputs = H(inputs, num_filters , dropout_rate )
        inputs = tf.keras.layers.Concatenate()([conv_outputs, inputs])
        num_filters += growth_rate # To increase the number of filters for each layer.
    return inputs, num_filters
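
To see how quickly the channel count grows inside one block, the following sketch traces the bookkeeping of `dense_block` without building any layers. The helper `dense_block_channels` is hypothetical; it mirrors the loop above and assumes the first block's settings used later in this notebook (16 input channels, 4 layers, 16 initial filters, growth rate 16).

```python
def dense_block_channels(in_channels, num_layers, num_filters, growth_rate):
    """Trace channel counts through dense_block without building any layers."""
    channels = in_channels
    history = []
    for _ in range(num_layers):
        # H outputs `num_filters` maps, which are concatenated with its input.
        channels += num_filters
        history.append(channels)
        num_filters += growth_rate
    return history, num_filters

history, next_filters = dense_block_channels(16, 4, 16, 16)
print(history)       # [32, 64, 112, 176]
print(next_filters)  # 80
```

Because every layer's output is concatenated with its input, the channel count grows with each layer, which is why the transition layers compress the feature maps between blocks.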



We'll add Dense Blocks and Transition layers one after the other. A `GlobalAveragePooling2D` layer then averages each feature map down to a single value, so that the outputs are 2D ( batch size $\times$ channels ), and finally a softmax layer produces the class probabilities.


In [None]:

input_shape = ( 32 , 32 , 3 ) 
num_blocks = 3
num_layers_per_block = 4
growth_rate = 16
dropout_rate = 0.4
compress_factor = 0.5
eps = 1.1e-5

num_filters = 16

inputs = tf.keras.layers.Input( shape=input_shape )
x = tf.keras.layers.Conv2D( num_filters , kernel_size=( 3 , 3 ) , use_bias=False, kernel_initializer='he_normal' , kernel_regularizer=tf.keras.regularizers.l2( 1e-4 ) )( inputs )

for i in range( num_blocks ):
    x, num_filters = dense_block( x, num_layers_per_block , num_filters, growth_rate , dropout_rate )
    x = transition(x, num_filters , compress_factor , dropout_rate )

x = tf.keras.layers.GlobalAveragePooling2D()( x ) 
x = tf.keras.layers.Dense( 10 )( x ) # Num Classes for CIFAR-10
outputs = tf.keras.layers.Activation( 'softmax' )( x )
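
As a sanity check on the architecture above, this sketch traces the height/width of the feature maps through the network. The helper `spatial_sizes` is hypothetical; it assumes the Keras defaults used in this notebook, namely `'valid'` padding for the first convolution and a stride equal to the pool size in `AveragePooling2D`.

```python
def spatial_sizes(size, num_blocks):
    """Trace the feature-map height/width through the network."""
    sizes = [size]
    size = size - 2          # initial 3x3 convolution with default 'valid' padding
    sizes.append(size)
    for _ in range(num_blocks):
        # Dense blocks keep the size: 1-pixel zero padding offsets the 3x3 convolution.
        size = size // 2     # 2x2 average pooling in the transition layer
        sizes.append(size)
    return sizes

print(spatial_sizes(32, 3))  # [32, 30, 15, 7, 3]
```

The final $3 \times 3$ maps are what `GlobalAveragePooling2D` averages into one value per channel.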



Everything compiled into a beautiful `tf.keras.models.Model`!


In [None]:

model = tf.keras.models.Model( inputs , outputs )
model.compile( loss=tf.keras.losses.categorical_crossentropy , optimizer=tf.keras.optimizers.Adam( learning_rate=0.0001 ) , metrics=[ 'acc' ] )
model.summary()

# Uncomment the line below if you want an image of your model's structure.

#tf.keras.utils.plot_model( model , show_shapes=True )



## 4) Training the Model

We'll train the model now.


In [None]:

batch_size = 64
epochs = 100

model.fit( x_train , y_train , epochs=epochs , batch_size=batch_size , validation_data=( x_test , y_test ) )



## 5) Evaluate the Model



In [None]:

results = model.evaluate(x_test, y_test, batch_size=batch_size)
print( 'Loss = {} and Accuracy = {} %'.format( results[0] , results[1] * 100 ) )
