### **U-net architecture**

Developing the Model (UNet) Using Keras Functional API

For this example, we are going to implement a popular architecture: UNet. In a sense, it is not the best choice for a tutorial, since this model is very heavy, but I found the exercise interesting, especially because we are going to use the functional API provided by Keras.

This architecture was introduced in the paper **U-Net: Convolutional Networks for Biomedical Image Segmentation**, which you can read here: https://arxiv.org/abs/1505.04597  

You can also read my notes on this paper here: https://yann-leguilly.gitlab.io/post/2019-12-11-unet-biomedical-images/   
Basically we are going to reproduce this:  


<p align="center"><img src="https://yann-leguilly.gitlab.io/img/unet_1/figure_1.png" width="60%"/></p>


In [None]:
import tensorflow as tf

#### **Load ResNet50**


<p align="center"><img src="https://i.stack.imgur.com/gI4zT.png" width="60%"/></p>

In [None]:
model = tf.keras.applications.ResNet50(weights="imagenet")

#### **Model summary**

In [None]:
model.summary()

Model: "resnet50"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_5 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_5[0][0]                    
__________________________________________________________________________________________________
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
conv1_bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1_conv[0][0]                 
___________________________________________________________________________________________
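The spatial sizes in the summary above (224 to 230 to 112) can be checked by hand with the usual convolution output-size formula. The sketch below is only an illustration: the helper name `conv_output_size` is hypothetical, and the stem settings (3 pixels of zero padding per side, then a 7x7 convolution with stride 2 and no extra padding) are assumed because they are consistent with the shapes and parameter count printed above.

```python
def conv_output_size(size, kernel, stride=1, pad_before=0, pad_after=0):
    """Spatial output size of a convolution applied after explicit zero padding."""
    padded = size + pad_before + pad_after
    return (padded - kernel) // stride + 1


# Assumed ResNet50 stem: ZeroPadding2D with 3 pixels per side,
# followed by a 7x7 convolution with stride 2.
padded_size = 224 + 3 + 3
stem_size = conv_output_size(224, kernel=7, stride=2, pad_before=3, pad_after=3)

print(padded_size, stem_size)  # 230 112
```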

### **VGG-16**

<p align="center"><img src="https://www.researchgate.net/profile/Max_Ferguson/publication/322512435/figure/fig3/AS:697390994567179@1543282378794/Fig-A1-The-standard-VGG-16-network-architecture-as-proposed-in-32-Note-that-only.png" width="60%"/></p>

#### **Load VGG16**

In [None]:
model = tf.keras.applications.VGG16(weights="imagenet")

#### **Model summary**

In [None]:
'''
NOTE:

VGG16 was trained on inputs of shape (224, 224, 3). As long as the fully
connected layers are kept (the default), we have to pass inputs of the
same shape when doing transfer learning.
'''

model.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     
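The `Param #` column of the summary above can be reproduced with a little arithmetic. This is a minimal sketch, assuming 3x3 kernels and one bias per filter, as VGG16 uses; the helper name `conv2d_params` is hypothetical.

```python
def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Number of trainable parameters of a square 2D convolution."""
    weights = kernel * kernel * in_channels * filters
    return weights + (filters if use_bias else 0)


# (layer name, input channels, filters) for the first two VGG16 blocks
vgg_convs = [
    ("block1_conv1", 3, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
]

param_counts = {name: conv2d_params(3, cin, f) for name, cin, f in vgg_convs}
for name, count in param_counts.items():
    print(name, count)
```

The pooling layers show 0 parameters because max pooling has nothing to learn.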

We use VGG-16 as the encoder in U-Net, so we do not have to train the encoder part, which saves a lot of time. We add decoder layers and train only those, which is similar to transfer learning.


In [None]:
for i,layer in enumerate(model.layers):
  print("{} \t {} \t {} \t {}".format(i, layer.name,layer.trainable, layer.output))

0 	 input_6 	 True 	 Tensor("input_6:0", shape=(None, 224, 224, 3), dtype=float32)
1 	 block1_conv1 	 True 	 Tensor("block1_conv1_3/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
2 	 block1_conv2 	 True 	 Tensor("block1_conv2_3/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
3 	 block1_pool 	 True 	 Tensor("block1_pool_3/Identity:0", shape=(None, 112, 112, 64), dtype=float32)
4 	 block2_conv1 	 True 	 Tensor("block2_conv1_3/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
5 	 block2_conv2 	 True 	 Tensor("block2_conv2_3/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
6 	 block2_pool 	 True 	 Tensor("block2_pool_3/Identity:0", shape=(None, 56, 56, 128), dtype=float32)
7 	 block3_conv1 	 True 	 Tensor("block3_conv1_3/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
8 	 block3_conv2 	 True 	 Tensor("block3_conv2_3/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
9 	 block3_conv3 	 True 	 Tensor("block3_conv3_3/Identity:0", shape=(None, 56, 56,

#### **Freezing layers**

Before training the network, you may want to freeze some of its layers, depending on the task. Once a layer is frozen, its weights are not updated during training.

Setting `layer.trainable = False` excludes that layer's weights from the gradient updates, so the pre-trained weights stay unchanged.
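To make the effect of freezing concrete, here is a framework-free toy sketch of one gradient-descent step. It is not how Keras implements freezing; the names `sgd_step`, `encoder_w` and `decoder_w` are hypothetical and only illustrate that frozen parameters are skipped by the update.

```python
def sgd_step(params, grads, trainable, lr=0.5):
    """Apply one gradient-descent update, leaving frozen parameters untouched."""
    return {
        name: (value - lr * grads[name]) if trainable[name] else value
        for name, value in params.items()
    }


params = {"encoder_w": 1.0, "decoder_w": 1.0}
grads = {"encoder_w": 1.0, "decoder_w": 1.0}
trainable = {"encoder_w": False, "decoder_w": True}  # the encoder is frozen

updated = sgd_step(params, grads, trainable)
print(updated)  # {'encoder_w': 1.0, 'decoder_w': 0.5}
```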


In [None]:
'''
We freeze the first 10 layers (indices 0 to 9) by setting layer.trainable = False.
'''

for layer in model.layers[:10]:
  layer.trainable = False

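'''
Alternative: freeze by name prefix.
str.startswith accepts a tuple of prefixes. This also marks block3_pool as
frozen, which changes nothing because pooling layers have no weights.

    for layer in model.layers:
      if layer.name.startswith(('input', 'block1', 'block2', 'block3')):
        layer.trainable = False

'''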

for i,layer in enumerate(model.layers):
  print("{} \t {} \t {} \t {}".format(i, layer.name,layer.trainable, layer.output))

0 	 input_2 	 False 	 Tensor("input_2:0", shape=(None, 224, 224, 3), dtype=float32)
1 	 block1_conv1 	 False 	 Tensor("block1_conv1/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
2 	 block1_conv2 	 False 	 Tensor("block1_conv2/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
3 	 block1_pool 	 False 	 Tensor("block1_pool/Identity:0", shape=(None, 112, 112, 64), dtype=float32)
4 	 block2_conv1 	 False 	 Tensor("block2_conv1/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
5 	 block2_conv2 	 False 	 Tensor("block2_conv2/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
6 	 block2_pool 	 False 	 Tensor("block2_pool/Identity:0", shape=(None, 56, 56, 128), dtype=float32)
7 	 block3_conv1 	 False 	 Tensor("block3_conv1/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
8 	 block3_conv2 	 False 	 Tensor("block3_conv2/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
9 	 block3_conv3 	 False 	 Tensor("block3_conv3/Identity:0", shape=(None, 56, 56, 256), d

In [None]:
### We can extract the outputs of intermediate layers (here, two pooling layers)

layer_names = ['block1_pool','block2_pool']
layer_output = [model.get_layer(layer).output for layer in layer_names]

'''
  inputs  -> the model input, of shape (None, 224, 224, 3)
  outputs -> the outputs of the block1_pool and block2_pool layers
  This does not give actual predictions, because we have not passed any image yet;
  it only gives the symbolic output tensors.
'''

model_practice = tf.keras.Model(inputs = model.input , outputs = layer_output)

# Here we get 2 outputs because we extracted 2 layers
print(model_practice.output)


[<tf.Tensor 'block1_pool_3/Identity:0' shape=(None, 112, 112, 64) dtype=float32>, <tf.Tensor 'block2_pool_3/Identity:0' shape=(None, 56, 56, 128) dtype=float32>]


#### **Sanity check**

In [None]:
rand_tensor = tf.random.normal(shape=[1,224,224,3])
out_ = model(rand_tensor)
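'''
We feed a random input tensor of shape (1, 224, 224, 3); the leading 1 means one sample.
`model` is still the full VGG16 here, so out_[0] holds its 1000 softmax class probabilities.
Softmax outputs are strictly positive, so we expect 1000 positive values and 0 negative ones.
The same check on a ReLU output, ReLU(x) = max(0, x), would also report 0 negative values:
    ReLU(x) = 0   if x < 0
              x   if x >= 0
'''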
print(tf.reduce_sum(tf.cast(out_[0]>0,tf.float32)))
print(tf.reduce_sum(tf.cast(out_[0]<0,tf.float32)))


tf.Tensor(1000.0, shape=(), dtype=float32)
tf.Tensor(0.0, shape=(), dtype=float32)
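The counts printed above (1000 positive, 0 negative) are what a softmax output should give. The following standard-library sketch reproduces this on random logits; it does not call VGG16, and the helper name `softmax` and the seed are arbitrary choices for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for 1000 class scores
probs = softmax(logits)

n_positive = sum(p > 0 for p in probs)
n_negative = sum(p < 0 for p in probs)
print(n_positive, n_negative)  # 1000 0
```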


#### **How to add layers** 

In [None]:
from tensorflow.keras.layers import *

model = tf.keras.applications.VGG16(weights="imagenet")
model = tf.keras.Model(inputs= model.input , outputs=model.get_layer('block4_pool').output)

### Extract output from the block4_pool
'''
Another way
x = model(model.input)
'''
x = model.output 

x = Conv2D(filters=64,kernel_size=(3,3),activation='relu',name='extended_1')(x)
x = Conv2D(filters=64,kernel_size=(3,3),name='extended_2')(x)
x = BatchNormalization(name = 'BN_')(x)
x = Conv2D(filters=64,kernel_size=(3,3),use_bias=True,name='extended_3')(x)

model_new = tf.keras.Model(inputs = model.input,outputs = x)

In [None]:
model_new.summary()

Model: "model_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   

### **Embedding vgg16 as encoder**

<p align="center"><img src="https://www.pyimagesearch.com/wp-content/uploads/2017/03/imagenet_vggnet_table1.png" width="50%"/></p>

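Here we have 5 convolutional blocks (each made of conv layers followed by a pooling layer), with 3 fully connected layers attached at the end.

We extract the output of the second conv layer of each block. This is the last conv layer in blocks 1 and 2; blocks 3 to 5 also contain a third conv layer (`block3_conv3`, `block4_conv3`, `block5_conv3`), which could be tapped instead.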

In [None]:

model = tf.keras.applications.VGG16(weights="imagenet")

layers_name = ['block1_conv2','block2_conv2','block3_conv2','block4_conv2','block5_conv2']
layers_outputs = [model.get_layer(name).output for name in layers_name]

model = tf.keras.Model(inputs = model.input , outputs = layers_outputs)

model.summary()

Model: "model_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_8 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0   

In [None]:
for i,layer in enumerate(model.layers):
  layer.trainable = False
  print("{} \t {} \t {} \t {}".format(i, layer.name,layer.trainable, layer.output))

0 	 input_8 	 False 	 Tensor("input_8:0", shape=(None, 224, 224, 3), dtype=float32)
1 	 block1_conv1 	 False 	 Tensor("block1_conv1_5/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
2 	 block1_conv2 	 False 	 Tensor("block1_conv2_5/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
3 	 block1_pool 	 False 	 Tensor("block1_pool_5/Identity:0", shape=(None, 112, 112, 64), dtype=float32)
4 	 block2_conv1 	 False 	 Tensor("block2_conv1_5/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
5 	 block2_conv2 	 False 	 Tensor("block2_conv2_5/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
6 	 block2_pool 	 False 	 Tensor("block2_pool_5/Identity:0", shape=(None, 56, 56, 128), dtype=float32)
7 	 block3_conv1 	 False 	 Tensor("block3_conv1_5/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
8 	 block3_conv2 	 False 	 Tensor("block3_conv2_5/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
9 	 block3_conv3 	 False 	 Tensor("block3_conv3_5/Identity:0", shape=(Non

In [None]:
x = model.output

for i in x:
  print(i)

Tensor("block1_conv2_5/Identity:0", shape=(None, 224, 224, 64), dtype=float32)
Tensor("block2_conv2_5/Identity:0", shape=(None, 112, 112, 128), dtype=float32)
Tensor("block3_conv2_5/Identity:0", shape=(None, 56, 56, 256), dtype=float32)
Tensor("block4_conv2_5/Identity:0", shape=(None, 28, 28, 512), dtype=float32)
Tensor("block5_conv2_5/Identity:0", shape=(None, 14, 14, 512), dtype=float32)


We take the encoder part of the VGG16 model and connect it to a latent (bottleneck) layer, which sits between the encoder and the decoder, and then to the decoder layers, to get a representation similar to the U-Net architecture.

<p align="center"><img src="https://www.pyimagesearch.com/wp-content/uploads/2020/02/keras_autoencoder_arch_flow.png" width="50%"/></p>

Now we have:

input -> 5 encoder blocks -> 1 latent layer -> 5 decoder blocks -> output

### **Concatenation of encoder with decoder**

In [None]:

# -- Keras Functional API -- #
# -- UNet Implementation -- #

# If you want to know more about why we are using `he_normal`: 
# https://stats.stackexchange.com/questions/319323/whats-the-difference-between-variance-scaling-initializer-and-xavier-initialize/319849#319849  
# Or the excellent fastai course: 
# https://github.com/fastai/course-v3/blob/master/nbs/dl2/02b_initializing.ipynb

initializer = 'he_normal'
'''
padding='same': the output has the same height and width as the input
(for stride 1), so the spatial size is preserved.
A (3x3) kernel needs 1 pixel of padding on each side,
a (5x5) kernel needs 2, and so on.

padding='valid': no padding is added, so the spatial size shrinks.
'''

# -- Encoder -- #
'''
5 encoder blocks: x[0], ..., x[4]
'''
x = model.output
# -- Encoder -- #

# -----Latent layer------ #
maxpool = MaxPooling2D(pool_size=(2, 2))(x[4])
conv = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(maxpool)
conv = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv)
# ----------- #

'''
First we upsample, then we concatenate.

Concatenation along the channel axis:

one input of shape       (224, 224, 3)
another input of shape   (224, 224, 3)
concatenating both gives (224, 224, 6)
'''

# -- Decoder -- #

# Block decoder 1
'''
UpSampling2D(size=(2, 2)) doubles the height and width: a 2x2 input
becomes a 4x4 output, using interpolation (nearest-neighbour by default).
'''
up_dec_1 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = initializer)(UpSampling2D(size = (2,2))(conv))
merge_dec_1 = concatenate([x[4], up_dec_1], axis = 3)
conv_dec_1 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(merge_dec_1)
conv_dec_1 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_1)

# Block decoder 2
up_dec_2 = Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = initializer)(UpSampling2D(size = (2,2))(conv_dec_1))
merge_dec_2 = concatenate([x[3], up_dec_2], axis = 3)
conv_dec_2 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(merge_dec_2)
conv_dec_2 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_2)

# Block decoder 3
up_dec_3 = Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = initializer)(UpSampling2D(size = (2,2))(conv_dec_2))
merge_dec_3 = concatenate([x[2], up_dec_3], axis = 3)
conv_dec_3 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(merge_dec_3)
conv_dec_3 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_3)

# Block decoder 4
up_dec_4 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = initializer)(UpSampling2D(size = (2,2))(conv_dec_3))
merge_dec_4 = concatenate([x[1], up_dec_4], axis = 3)
conv_dec_4 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(merge_dec_4)
conv_dec_4 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_4)
# NOTE: this convolution narrows the features to 2 channels before the last upsampling step.
conv_dec_4 = Conv2D(2, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_4)

# Block decoder 5
up_dec_5 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = initializer)(UpSampling2D(size = (2,2))(conv_dec_4))
merge_dec_5 = concatenate([x[0], up_dec_5], axis = 3)
conv_dec_5 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(merge_dec_5)
conv_dec_5 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_5)
conv_dec_5 = Conv2D(2, 3, activation = 'relu', padding = 'same', kernel_initializer = initializer)(conv_dec_5)
# -- Decoder -- #

'''
The output layer is commented out because we are not passing any image
through the network yet; we are only defining the architecture.
'''
# -----Output----#
# output = Conv2D(N_CLASSES, 1, activation = 'softmax')(conv_dec_5)
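The two operations every decoder block relies on, upsampling and channel concatenation, can be sketched without any framework. This is only an illustration on nested lists (height x width x channels), assuming nearest-neighbour interpolation; the helper names `upsample_nearest` and `concat_channels` are hypothetical.

```python
def upsample_nearest(fmap, factor=2):
    """Repeat every pixel `factor` times along height and width."""
    out = []
    for row in fmap:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        out.extend([list(wide_row) for _ in range(factor)])
    return out


def concat_channels(a, b):
    """Concatenate two H x W x C maps along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


small = [[[1], [2]],
         [[3], [4]]]                                     # 2 x 2 x 1
big = upsample_nearest(small)                            # 4 x 4 x 1
skip = [[[0, 0] for _ in range(4)] for _ in range(4)]    # 4 x 4 x 2
merged = concat_channels(skip, big)                      # 4 x 4 x 3

print(len(big), len(big[0]), len(merged[0][0]))  # 4 4 3
```

Concatenation only works when both maps have the same height and width, which is why each decoder block upsamples before merging with the encoder output.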

In [None]:
'''
Here the VGG16 encoder and the decoder are joined into a single model.
'''

model = tf.keras.Model(inputs= model.input , outputs = conv_dec_5)
model.summary()

Model: "model_8"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_8 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 224, 224, 64) 1792        input_8[0][0]                    
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, 224, 224, 64) 36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, 112, 112, 64) 0           block1_conv2[0][0]               
____________________________________________________________________________________________
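Since the summary is long, it can help to trace the decoder shapes by hand. The sketch below is plain arithmetic, not a Keras call: it takes the five encoder output shapes printed earlier and the filter counts used in the decoder blocks, and checks that every upsampled map matches the size of the encoder output it is concatenated with.

```python
# Encoder taps as (height/width, channels), copied from the printed model outputs.
skips = [(224, 64), (112, 128), (56, 256), (28, 512), (14, 512)]
decoder_filters = [512, 256, 128, 64, 64]  # filters of the five decoder blocks

size = skips[-1][0] // 2  # latent block: 2x2 max pooling halves 14 to 7
trace = []
for filters, (skip_size, skip_channels) in zip(decoder_filters, reversed(skips)):
    size *= 2  # UpSampling2D(size=(2, 2)) doubles height and width
    if size != skip_size:
        raise ValueError("concatenate needs equal spatial sizes")
    trace.append((size, skip_channels + filters))  # channels right after concatenate

for step, (side, merged) in enumerate(trace, start=1):
    print(f"decoder block {step}: {side}x{side}, {merged} channels after merge")
```

The last block ends at 224x224, the same height and width as the input image, which is what a per-pixel segmentation output needs.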


### **NOTE :**

**Question**: The pre-trained model learned its weights for a particular input size. Is it a problem if we pass images of a different size to the pre-trained model?

**Answer**: A different image size is only a problem when the model ends with fully connected (dense) layers. A convolution layer does not depend on the input size: its kernel slides over whatever feature map it receives, and the output size simply scales with the input.

So once we remove the fully connected layers from the pre-trained model, there is no problem.

Suppose VGG16 was trained on inputs of shape (224,224,3). We get an output of shape (7,7,512) from the last pooling layer; we then flatten it and pass it to the FC (fully connected) layers to get the predictions.

But when we pass an input of shape (448,448,3), we get an output of shape (14,14,512) from the last pooling layer. Flattening it and passing it to the FC layers raises an error, because the flattened size no longer matches the size the FC layer expects.

So in segmentation we do not have any problem related to the input size, because we remove the FC layers and use only the conv layers.
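The size mismatch described above is easy to quantify. This is a small arithmetic sketch, assuming five 2x2 pooling stages and 512 channels at the end of the convolutional part, as in VGG16; the helper name `flattened_size` is hypothetical.

```python
def flattened_size(input_size, n_pools=5, channels=512):
    """Length of the flattened feature vector after the last pooling layer."""
    side = input_size // (2 ** n_pools)
    return side * side * channels


expected = flattened_size(224)  # 7 * 7 * 512
received = flattened_size(448)  # 14 * 14 * 512

print(expected, received)  # 25088 100352
```

The first dense layer has one weight per input element for each of its units, so it cannot accept a vector four times longer than the one it was built for.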