# <center> Building and Training a UNet Model (W7 S2)

---



### About the Dataset

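This dataset contains 9,766 realistic renders of lunar landscapes and their segmentation masks, labelled with three classes: sky, small rocks, and bigger rocks. Pixels outside those three classes (the ground) are treated as background, which is why four classes are used later in this notebook. A CSV file of bounding boxes and cleaned versions of the ground-truth masks are also provided.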

An interesting feature of this dataset is that the images are synthetic: they were created with Planetside Software's Terragen. This is not immediately obvious because the renderings are highly realistic, but it makes sense given how scarce real space imagery is.

Acknowledgment: Romain Pessia and Genya Ishigami of the Space Robotics Group, Keio University, Japan. You can find the dataset at https://www.kaggle.com/romainpessia/artificial-lunar-rocky-landscape-dataset

### Reminder: turn on the GPU accelerator under Settings, on the right-hand side of your Kaggle notebook.

### Importing libraries



In [None]:
import os
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import tensorflow as tf
import keras 

from tqdm import tqdm
import glob
from sklearn.model_selection import train_test_split

## Installing segmentation_models module

Make sure the internet is turned on in the Kaggle settings on the right-hand side. If you don't see the Internet option in Settings, verify your Kaggle profile by updating your phone number.

### You'll learn more about segmentation_models at the end of this lecture and in the next one. In this lecture we only use segmentation_models for iou_score.

In [None]:

!pip install segmentation_models

# SM_FRAMEWORK is read when segmentation_models is imported, so set it before the import
os.environ["SM_FRAMEWORK"] = "tf.keras"

import segmentation_models as sm

sm.set_framework('tf.keras')
keras.backend.set_image_data_format('channels_last')

## Data Preprocessing Pipeline

In [None]:
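'''
Collect the dataset file paths here. The files are loaded into arrays and
split into training and validation sets further below. '''

img_dir = '../input/artificial-lunar-rocky-landscape-dataset/images/render'
mask_dir = '../input/artificial-lunar-rocky-landscape-dataset/images/clean'


# get the image paths and mask paths from their directories, each in sorted order,
# so that images[i] and masks[i] refer to the same scene
# (assumption: all files are .png, like render0041.png / clean0041.png used later)
images = sorted(glob.glob(os.path.join(img_dir, '*.png')))
masks = sorted(glob.glob(os.path.join(mask_dir, '*.png')))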

In [None]:
# let's look at the first 5 images and masks from the lists
images[:5], masks[:5]
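
Sorting matters here because the i-th image must line up with the i-th mask. A quick sanity check along the following lines can catch a mismatch early. It is only a sketch: `paths_are_paired` is a hypothetical helper, and it assumes file names end in a shared number, as in `render0041.png` and `clean0041.png`.

```python
import os
import re


def paths_are_paired(image_paths, mask_paths):
    """Return True when every image/mask pair shares the same numeric id."""

    def file_id(path):
        # Assumption: names look like render0041.png / clean0041.png
        match = re.search(r"(\d+)\.png$", os.path.basename(path))
        return match.group(1) if match else None

    if len(image_paths) != len(mask_paths):
        return False
    return all(
        file_id(img) is not None and file_id(img) == file_id(msk)
        for img, msk in zip(image_paths, mask_paths)
    )


# Stand-in paths for illustration only
demo_images = ["render/render0001.png", "render/render0002.png"]
demo_masks = ["clean/clean0001.png", "clean/clean0002.png"]
print(paths_are_paired(demo_images, demo_masks))
```

In the notebook you could call it as `paths_are_paired(images, masks)` before loading any pixels.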

## For this session, we will use just the first 2000 images and masks as our dataset

In [None]:
images = images[:2000]
masks = masks[:2000]

In [None]:
sample_img = Image.open(images[0])
sample_img

In [None]:
sample_mask = Image.open(masks[0])
sample_mask

In [None]:
sample_img.size

### Originally, our images have size (720, 480), but we will reduce the size for faster processing. Since we are focusing on the clean masks, this will not affect the results much.

### Ground masks are more detailed and much noisier. We'll keep things easy for this lecture. However, feel free to use the ground masks and explore further.

In [None]:
# we will use this shape of data for our model
H = 256 # height of image
W = 256 # width of image


# for images and labels, to store our dataset
X_img = []
y_mask = []

# loop over the path pairs to read, preprocess and store each image and mask in X_img and y_mask
for x, y in tqdm(zip(images, masks), total=len(images)):
    # reading images
    img = cv2.imread(x, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (W, H))
    img = img / 255.0
    img = img.astype(np.float32) # pixel values are now in [0, 1], so they must be stored as floats
    
    # reading masks (as single-channel grayscale, the same way predict_image reads them below)
    mask = cv2.imread(y, cv2.IMREAD_GRAYSCALE)
    mask = cv2.resize(mask, (W, H))
    mask = mask.astype(np.int32)  # mask pixel values stay in the 0-255 integer range, so an integer dtype is used
    
    # storing images and masks
    X_img.append(img)
    y_mask.append(mask)


In [None]:
X_img = np.array(X_img)
y_mask = np.array(y_mask)

# 1600 datapoints as training dataset and 400 for validation dataset 
X_train = X_img[:1600]
X_valid = X_img[1600:]

y_train = y_mask[:1600]
y_valid = y_mask[1600:]
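
The slices above keep the original file order. If you would rather draw the validation set at random, one possible approach is to shuffle the indices with a fixed seed and apply the same permutation to images and masks. This is a sketch on small stand-in arrays; `shuffled_split` and its parameters are illustrative names, not part of the notebook.

```python
import numpy as np


def shuffled_split(X, y, valid_fraction=0.2, seed=42):
    """Split X and y into train/validation parts using one shared random permutation."""
    assert len(X) == len(y), "images and masks must have the same length"
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_valid = int(round(len(X) * valid_fraction))
    valid_idx, train_idx = order[:n_valid], order[n_valid:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]


# Stand-in data: 20 "images" and matching "masks" (mask value = 10 * image value)
X_demo = np.arange(20).reshape(20, 1)
y_demo = np.arange(20) * 10
X_tr, X_va, y_tr, y_va = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

With 2000 samples and `valid_fraction=0.2` this gives the same 1600/400 sizes as the slicing above, only in random order.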


In [None]:
X_train.shape
# number of images, height, width, channels

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2, figsize = (10, 10))

ax1.set_title('Image')
ax2.set_title('Mask')

ax1.imshow(X_train[1])
ax2.imshow(y_train[1], cmap = 'gray')

plt.show()

To learn more about building an optimized data pipeline with tf.data, see this guide:
https://www.tensorflow.org/guide/data_performance

# Data Pipeline

### One hot encoding

![](https://i.imgur.com/mtimFxh.png)

#### Similarly, we'll one-hot encode our labels into 4 channels, one for each of the four classes
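
To make the idea concrete, here is a minimal NumPy sketch of one-hot encoding a tiny label mask. It assumes the mask already holds integer class indices from 0 to `num_classes - 1`; the helper name `one_hot_mask` is illustrative.

```python
import numpy as np


def one_hot_mask(mask, num_classes):
    """Turn an (H, W) array of class indices into an (H, W, num_classes) one-hot array."""
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[mask]


tiny_mask = np.array([[0, 1],
                      [2, 3]])
encoded = one_hot_mask(tiny_mask, num_classes=4)
print(encoded.shape)
```

Each pixel now carries a vector with a single 1 in the position of its class, and `np.argmax(encoded, axis=-1)` recovers the original indices.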

In [None]:
batch_size = 4
num_classes = 4 

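'''Here from_tensor_slices is called to make dataset objects from our training and validation sets'''
# each dataset element is one (image, mask) pair
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))

valid_dataset = tf.data.Dataset.from_tensor_slices((X_valid, y_valid))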


Read more about prefetching and AUTOTUNE here: https://www.tensorflow.org/guide/data_performance#optimize_performance

## Naive Approach
![](https://www.tensorflow.org/guide/images/data_performance/naive.svg)


## After prefetching

![](https://www.tensorflow.org/guide/images/data_performance/prefetched.svg)

In [None]:
train_dataset = 
valid_dataset = 
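
One possible way to finish the pipeline is sketched below: one-hot encode each mask, batch, repeat, and prefetch with `tf.data.AUTOTUNE`. This is an assumption about what the step could look like, not the notebook's definitive solution. `make_dataset` is a hypothetical helper, the demo arrays are stand-ins, and the masks are assumed to contain class indices from 0 to `num_classes - 1`.

```python
import numpy as np
import tensorflow as tf


def make_dataset(images, masks, num_classes=4, batch_size=4, training=False):
    """Build a batched, prefetched dataset of (image, one-hot mask) pairs."""

    def to_one_hot(image, mask):
        # Assumption: mask holds integer class indices in [0, num_classes)
        mask = tf.one_hot(tf.cast(mask, tf.int32), num_classes, dtype=tf.float32)
        return image, mask

    ds = tf.data.Dataset.from_tensor_slices((images, masks))
    if training:
        ds = ds.shuffle(buffer_size=len(images))
    ds = ds.map(to_one_hot, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    ds = ds.repeat()  # fit() is later given steps_per_epoch, so the data must not run out
    return ds.prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the current one trains


# Stand-in data: 8 small images with random class-index masks
demo_images = np.zeros((8, 16, 16, 3), dtype=np.float32)
demo_masks = np.random.randint(0, 4, size=(8, 16, 16)).astype(np.int32)
demo_ds = make_dataset(demo_images, demo_masks, training=True)
x_batch, y_batch = next(iter(demo_ds))
print(x_batch.shape, y_batch.shape)
```

Prefetching overlaps data preparation with training, which is the difference between the two diagrams above.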

In [None]:
sample = iter(valid_dataset)
data = next(sample)
data[0].shape
# batch size, height, width, channels

In [None]:
data[1].shape
# batch size, height, width, channels/classes

## Creating U-net Architecture

In [None]:
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPool2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

'''conv_block creates one block of two convolution layers, each
followed by BatchNormalization and a ReLU activation.
If pooling is required, MaxPool2D is applied and its output is returned as well.'''
# function to create convolution block
def conv_block(inputs, filters, pool=True):
    x = 
    x = 
    x = 

    x = 
    x = 
    x = 

    if pool:
        p = 
        return x, p
    else:
        return x

'''build_unet creates the U-Net architecture.'''
# function to build U-net
def build_unet(shape, num_classes):
    inputs = Input(shape)

    """ Encoder """
    x1, p1 = 
    x2, p2 = 
    x3, p3 = 
    x4, p4 = 

    """ Bridge """
    b1 = 

    """ Decoder """
    # Reference for UpSampling2D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling2D
    # with nearest interpolation, it simply repeats the rows and columns of the data by size[0] and size[1] respectively
    # see the cell below for the difference between bilinear and nearest interpolation
    u1 = 
    c1 = 
    x5 = 

    u2 = 
    c2 = 
    x6 = 

    u3 = 
    c3 = 
    x7 = 

    u4 = 
    c4 = 
    x8 = 

    """ Output layer """
    output = 
    
    return Model(inputs, output)
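
For reference, here is one way the two functions could be completed. Treat it as a sketch under stated assumptions rather than the notebook's answer: the filter counts (16, 32, 48, 64 and 128 for the bridge), the 3x3 kernels, the bilinear upsampling and the softmax output are all choices made for illustration, and the functions carry a `_sketch` suffix so they do not replace the ones above.

```python
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, UpSampling2D, Concatenate)
from tensorflow.keras.models import Model


def conv_block_sketch(inputs, filters, pool=True):
    """Two rounds of Conv2D -> BatchNormalization -> ReLU, optionally followed by pooling."""
    x = inputs
    for _ in range(2):
        x = Conv2D(filters, 3, padding="same")(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
    if pool:
        return x, MaxPooling2D((2, 2))(x)
    return x


def build_unet_sketch(shape, num_classes, filters=(16, 32, 48, 64), bridge_filters=128):
    """Assemble encoder, bridge and decoder; filter counts are assumed, not from the notebook."""
    inputs = Input(shape)

    # Encoder: keep each block output for the skip connection, pass the pooled map on
    skips = []
    x = inputs
    for f in filters:
        skip, x = conv_block_sketch(x, f, pool=True)
        skips.append(skip)

    # Bridge
    x = conv_block_sketch(x, bridge_filters, pool=False)

    # Decoder: upsample, concatenate with the matching encoder output, then convolve
    for f, skip in zip(reversed(filters), reversed(skips)):
        x = UpSampling2D((2, 2), interpolation="bilinear")(x)
        x = Concatenate()([x, skip])
        x = conv_block_sketch(x, f, pool=False)

    # Output: one 1x1 filter per class, softmax over the class axis
    outputs = Conv2D(num_classes, 1, padding="same", activation="softmax")(x)
    return Model(inputs, outputs)


demo_model = build_unet_sketch((64, 64, 3), 4)
print(demo_model.output_shape)
```

Because four pooling steps halve the resolution four times, the input height and width should be divisible by 16 so that the upsampled maps match the skip connections.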

In [None]:
input_shape = (1, 3, 2, 1)
x = np.arange(np.prod(input_shape)).reshape(input_shape)
print(x)

b = 
print(b)

n = 
print(n)
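
To see what nearest interpolation does without any framework, the same repetition of rows and columns can be reproduced in plain NumPy. This sketch mirrors the behaviour described in the comment in `build_unet`; the factor of 2 per axis is an assumed value, and `upsample_nearest` is an illustrative name.

```python
import numpy as np


def upsample_nearest(x, size=(2, 2)):
    """Repeat rows size[0] times and columns size[1] times on a (batch, H, W, C) array."""
    x = np.repeat(x, size[0], axis=1)  # repeat rows
    x = np.repeat(x, size[1], axis=2)  # repeat columns
    return x


input_shape = (1, 3, 2, 1)
x_demo = np.arange(np.prod(input_shape)).reshape(input_shape)
up = upsample_nearest(x_demo)
print(up.shape)
print(up[0, :, :, 0])
```

Bilinear interpolation instead blends neighbouring values, so it produces smoother transitions rather than exact copies.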


**For Contracting Path:** the **conv_block** function is called four times with pooling (pool=True), which creates four blocks. Each call returns both the block output, kept for the skip connection, and its pooled version, which feeds the next block.

**For Bridge:** the **conv_block** function is called one time without pooling (pool=False).

**For Expansive Path:** **UpSampling2D** is used to expand the size of the feature maps. The expanded feature map is concatenated with the corresponding feature map from the contracting path; the reason is to combine the information from the earlier layers in order to get a more precise prediction. The **conv_block** function is then called without pooling (pool=False). The process is repeated 3 more times.

The last step is to map the features to per-pixel class predictions. The last layer is a convolution layer with 1x1 filters, one per class (num_classes filters in total), so that the output has one channel for each class. It is typically given a softmax activation, which matches the categorical cross-entropy loss used below.
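
A 1x1 convolution is simply a per-pixel linear map across channels. The NumPy sketch below illustrates this, assuming one 1x1 filter per class (which the four-channel one-hot labels require) followed by a softmax; the feature depth of 16 and the random weights are stand-ins.

```python
import numpy as np


def pointwise_classifier(features, weights, bias):
    """Apply a 1x1 convolution plus softmax: (H, W, C) features -> (H, W, K) probabilities."""
    logits = features @ weights + bias                      # same weights at every pixel
    logits = logits - logits.max(axis=-1, keepdims=True)    # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
demo_features = rng.normal(size=(256, 256, 16))   # stand-in for the last decoder output
demo_weights = rng.normal(size=(16, 4))           # one column per class
probs = pointwise_classifier(demo_features, demo_weights, np.zeros(4))
print(probs.shape)
```

Every pixel ends up with four probabilities that sum to 1, and taking the argmax over the last axis gives the predicted class.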

In [None]:
# calling build_unet function
model = build_unet((256, 256, 3), 4)
model.summary()

## Load model and compile

In [None]:
# importing libraries
from segmentation_models.metrics import iou_score

""" Hyperparameters """
lr = 1e-4
epochs = 5

"""Model"""
model.compile(loss="categorical_crossentropy", 
              optimizer=tf.keras.optimizers.Adam(lr), 
              metrics=[iou_score])


train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size
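
The `iou_score` metric measures intersection over union between predicted and true masks. The sketch below shows the idea in NumPy on one-hot arrays. It is a simplified illustration, not the exact segmentation_models implementation, and the small `smooth` term is an assumed safeguard against division by zero.

```python
import numpy as np


def mean_iou(y_true, y_pred, smooth=1e-5):
    """Mean intersection-over-union across classes for one-hot arrays (..., num_classes)."""
    axes = tuple(range(y_true.ndim - 1))            # sum over everything except the class axis
    intersection = np.sum(y_true * y_pred, axis=axes)
    union = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes) - intersection
    return float(np.mean((intersection + smooth) / (union + smooth)))


labels = np.array([[0, 1], [2, 3]])
truth = np.eye(4)[labels]                 # one-hot ground truth
perfect = truth.copy()                    # identical prediction
wrong = np.eye(4)[np.array([[1, 0], [3, 2]])]   # every pixel misclassified
print(mean_iou(truth, perfect), mean_iou(truth, wrong))
```

A score close to 1 means the prediction overlaps the ground truth almost perfectly, while a score near 0 means there is almost no overlap.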

## Train model

In [None]:
'''model.fit is used to train the model'''
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs
    )

## Predict from model

In [None]:
# function to predict result 
def predict_image(img_path, mask_path, model):
    H = 256
    W = 256
    num_classes = 4

    img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (W, H))
    img = img / 255.0
    img = img.astype(np.float32)

    ## Read mask
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    mask = cv2.resize(mask, (W, H))   ## (256, 256)
    mask =  ## (256, 256, 1)
    mask = mask * (255/num_classes)
    mask = mask.astype(np.int32)
    mask = 

    ## Prediction
    pred_mask = model.predict(np.expand_dims(img, axis=0))[0]
    pred_mask = np.argmax(pred_mask, axis=-1)
    pred_mask = 
    pred_mask = pred_mask * (255/num_classes)
    pred_mask = pred_mask.astype(np.int32)
    pred_mask = 

    return img, mask, pred_mask
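
After `np.argmax`, the prediction is a 2-D array of class indices, which is too dark to view directly. One way to make it displayable next to the RGB input is to add a channel axis, scale the indices, and repeat the channel three times. The sketch below is an assumption about how such a step might look, not necessarily the notebook's exact code; `mask_to_display` is a hypothetical helper.

```python
import numpy as np


def mask_to_display(mask, num_classes=4):
    """Convert an (H, W) class-index mask into an (H, W, 3) grayscale image for plotting."""
    mask = np.expand_dims(mask, axis=-1)                 # (H, W) -> (H, W, 1)
    mask = mask * (255 / num_classes)                    # spread class indices over 0-255
    mask = mask.astype(np.int32)
    return np.concatenate([mask, mask, mask], axis=-1)   # (H, W, 1) -> (H, W, 3)


demo_mask = np.array([[0, 1],
                      [2, 3]])
shown = mask_to_display(demo_mask)
print(shown.shape)
```

The same conversion can be applied to both the true mask and the predicted mask so that they are directly comparable in the plot.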

In [None]:
# function to display result
def display(display_list):
    plt.figure(figsize=(12, 10))

    title = ['Input Image', 'True Mask', 'Predicted Mask', 'Mask On Image']

    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i+1)
        plt.title(title[i])
        plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
        plt.axis('off')
    plt.show()

In [None]:
img_path = '../input/artificial-lunar-rocky-landscape-dataset/images/render/render0041.png'
mask_path = '../input/artificial-lunar-rocky-landscape-dataset/images/clean/clean0041.png'

img, mask, pred_mask = predict_image(img_path, mask_path, model)

display([img, mask, pred_mask])

## segmentation_models

segmentation_models is a Python library with Neural Networks for Image Segmentation based on Keras and TensorFlow.

The main features of this library are:

* High level API (just two lines of code to create model for segmentation)
* 4 model architectures for binary and multi-class image segmentation (including the legendary Unet)
* 25 available backbones for each architecture
* All backbones have pre-trained weights for faster and better convergence
* Helpful segmentation losses (Jaccard, Dice, Focal) and metrics (IoU, F-score)

Learn more: https://segmentation-models.readthedocs.io/en/latest/tutorial.html

## A practical note: different backbones in modern U-Nets

So far, you have looked at how the U-Net architecture was implemented in the original work by Ronneberger et al. Over the years, many people have experimented with different setups for U-Nets, including pretraining on datasets such as ImageNet and then fine-tuning on their specific image segmentation tasks.

This means that today, you will likely use a U-Net that no longer utilizes the original architecture as proposed above - but it's still a good starting point, because the contractive path, expansive path and the skip connections remain the same.

**Common backbones for U-Net architectures these days are ResNet, ResNeXt, EfficientNet and DenseNet architectures. Often, these have been pretrained on the ImageNet dataset, so that many common features have already been learned. By using these backbone U-Nets, initialized with pretrained weights, it's likely that you can reach convergence on your segmentation problem much faster.**

That's it! You now have a high-level understanding of U-Net and its components.

## In the next session, we will learn how to use segmentation_models with transfer learning to build a UNet architecture with different pretrained models as the backbone.

## Next week, we will learn about different techniques to improve the accuracy of our model.