Hello World 😃! This is my first Kaggle notebook, in which I implement Fully Convolutional Networks (FCNs) for semantic segmentation of images using `Keras`. The model is built on the `cityscapes-image-pairs` dataset, which contains 2975 images for training and 500 images for validating the performance of the model.

![FCN](https://www.jeremyjordan.me/content/images/2018/05/Screen-Shot-2018-05-16-at-10.34.02-PM.png)

Image segmentation can be interpreted as a classification problem, where the task is to classify each pixel of the image into a particular class. To build an end-to-end pixel-to-pixel segmentation network, our model must be capable of extracting rich spatial information from the images. A typical CNN used for classification takes an image as input, passes it through a series of convolutional and pooling layers and uses fully-connected layers in the end to output a fixed length vector, thus discarding all the spatial information from the original image. However, if the fully-connected layers at the end are replaced by convolutional layers, we get coarse spatial features as output instead of vectors, which can be further upsampled to form the classification maps corresponding to our image.
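To make the last point concrete, here is a minimal NumPy sketch. It is not part of the notebook's pipeline, and the sizes and the `conv_valid` helper are made up for illustration. It shows that a fully-connected layer applied to a flattened feature map gives the same numbers as a convolution whose kernel spans the whole map, and that the same kernel applied to a larger input yields a coarse spatial grid of outputs instead of a single vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
h, w, c, n_out = 4, 4, 3, 5
features = rng.normal(size=(h, w, c))
dense_weights = rng.normal(size=(h * w * c, n_out))

# Fully-connected layer: flatten, then one matrix product -> fixed-length vector.
dense_out = features.reshape(-1) @ dense_weights

# The same weights viewed as an (h, w, c, n_out) convolution kernel.
kernel = dense_weights.reshape(h, w, c, n_out)


def conv_valid(x, k):
    """Naive convolution without padding (cross-correlation, as in deep learning libraries)."""
    kh, kw, _, n = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, k, axes=([0, 1, 2], [0, 1, 2]))
    return out


conv_out = conv_valid(features, kernel)      # shape (1, 1, n_out): same values as dense_out
larger = rng.normal(size=(6, 7, c))
coarse_map = conv_valid(larger, kernel)      # shape (3, 4, n_out): a coarse spatial map
print(conv_out.shape, coarse_map.shape)
```

The dense layer only works for one input size, while the convolutional view accepts any input at least as large as the kernel and keeps track of where each output came from.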

![Segmentation Architecture](https://csdl-images.computer.org/trans/tp/2017/04/figures/shelh3-2572683.gif)

The architecture used in this notebook is shown in the figure above. The segmentation operation performed by the network can be divided into two parts -

* **Downsampling operation -** This part is simple. The image is propagated through a series of convolutional and pooling layers to extract spatial features. For this I have used the `VGG-16` architecture initialized with weights pretrained on the ImageNet dataset. The final fully connected layers of the network are discarded and two new convolutional layers, `conv6` and `conv7`, are added in their place. During training all layers are updated, so the pretrained weights are fine-tuned on our dataset.

* **Upsampling operation -** This generates the classification map from the feature map produced by the downsampling operation. The operation is also referred to as deconvolution, transposed convolution or fractionally strided convolution: features of smaller spatial resolution are mapped to a larger spatial resolution. Depending upon the upsampling strategy, our network can be of three types, as shown in the figure -

    * **FCN-32s** - Produces the segmentation map directly from the `conv7` layer by using a transposed convolution with stride 32
    * **FCN-16s** - Adds a 1 x 1 convolution on `pool4` and fuses it with the 2X upsampled `conv7`. The segmentation map is then produced by using a transposed convolution with stride 16 on the result
    * **FCN-8s** - Adds a 1 x 1 convolution on `pool3` and fuses it with the 2X upsampled fused predictions of `conv7` and `pool4`. The segmentation map is then produced by using a transposed convolution with stride 8 on the result
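The three strategies can be checked with a little shape arithmetic. The sketch below is plain Python with hypothetical helper names (`pooled_sizes`, `transposed_conv_size`). It assumes a 256 x 256 input, five 2 x 2 pooling stages as in VGG-16, and transposed convolutions whose kernel size equals their stride, which matches the `fcn` function defined later in this notebook.

```python
def pooled_sizes(input_size, num_pools=5):
    """Spatial size after each 2x2, stride-2 max-pooling stage of VGG-16."""
    sizes = {}
    size = input_size
    for i in range(1, num_pools + 1):
        size //= 2
        sizes["pool%d" % i] = size
    return sizes


def transposed_conv_size(size, kernel, stride):
    """Output length of a transposed convolution without padding."""
    return size * stride + max(kernel - stride, 0)


sizes = pooled_sizes(256)   # 256 is the image size used in this notebook
conv7 = sizes["pool5"]      # assumes conv6/conv7 keep the spatial size ('same' padding)

# FCN-32s: a single 32x upsampling step from conv7.
fcn32 = transposed_conv_size(conv7, kernel=32, stride=32)

# FCN-16s: 2x upsample conv7, fuse with pool4, then 16x upsample.
up2 = transposed_conv_size(conv7, kernel=2, stride=2)
assert up2 == sizes["pool4"]   # shapes must match for the element-wise addition
fcn16 = transposed_conv_size(up2, kernel=16, stride=16)

# FCN-8s: 2x upsample the fused map, fuse with pool3, then 8x upsample.
up4 = transposed_conv_size(up2, kernel=2, stride=2)
assert up4 == sizes["pool3"]
fcn8 = transposed_conv_size(up4, kernel=8, stride=8)

print(sizes)
print(fcn32, fcn16, fcn8)
```

All three variants end at the input resolution; they differ only in how much fine-grained information from `pool4` and `pool3` is mixed in before the final upsampling step.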

So let's dive into the code and see how the above model can be implemented 😁.

## Import Dependencies

In [5]:
import numpy as np
import os
import matplotlib.pyplot as plot
from PIL import Image
import cv2
import random
import seaborn as sns

In [6]:
from keras.models import Model
from keras.optimizers import Adam
from keras.utils import plot_model
from keras.callbacks import ModelCheckpoint
from keras.layers import Input, Conv2D, Activation, Add, Conv2DTranspose
from keras.applications.vgg16 import VGG16

## Initialize Variables

After importing the libraries, we initialize all the necessary variables - 
* `train_folder` - Path for training images
* `valid_folder` - Path for validation images
* `width` - Width of an image
* `height` - Height of an image
* `classes` - No. of discrete pixel values in the segmentation maps (no. of labels)
* `batch_size` - Size of a single batch
* `num_of_training_samples` - Total number of training samples
* `num_of_testing_samples` - Total number of validation samples

In [7]:
train_folder="/kaggle/input/cityscapes-image-pairs/cityscapes_data/cityscapes_data/train"
valid_folder="/kaggle/input/cityscapes-image-pairs/cityscapes_data/cityscapes_data/val"
width = 256
height = 256
classes = 13
batch_size = 10
num_of_training_samples = len(os.listdir(train_folder)) 
num_of_testing_samples = len(os.listdir(valid_folder))

## Helper Functions

For preprocessing the dataset and defining the model, we have defined several helper functions -

* `LoadImage` - Loads a single image and its corresponding segmentation map
    * **Arguments** :
        * `name` - Name of the image file
        * `path` - Path to the image directory
    * **Returns** - A tuple of 2 numpy arrays (image and segmentation map)


* `bin_image` - Bins a segmentation map (maps pixel values from the range 0-255 to class indices from 0 to `classes - 1`)
    * **Arguments** :
        * `mask` - Original segmentation map
    * **Returns** - New segmentation mask after binning pixel values


* `getSegmentationArr` - Converts a binned segmentation mask to the categorical (one-hot) map used for training our model
    * **Arguments** :
        * `image` - Segmentation mask after binning
        * `classes` - Number of categories or unique pixel values (13)
        * `width` - Width of segmentation map
        * `height` - Height of segmentation map
    * **Returns** - Categorical segmentation map (height, width, classes)


* `give_color_to_seg_img` - Converts a map of class indices back to a colored segmentation map
    * **Arguments** :
        * `seg` - Map of class indices (height, width), e.g. the `argmax` of a categorical segmentation map
        * `n_classes` - Number of categories or unique pixel values (13)
    * **Returns** - Colored segmentation map (height, width, 3)


* `DataGenerator` - Yields data in the form of batches
    * **Arguments** :
        * `path` - Location or path of the image directory
        * `batch_size` - Size of each batch
        * `classes` - Number of categories or unique pixel values (13)
    * **Returns** - Generator that yields tuples of up to `batch_size` images and their categorical segmentation maps


* `fcn` - Creates the FCN model
    * **Arguments** :
        * `vgg` - VGG16 pretrained model
        * `classes` - Number of categories or unique pixel values (13)
        * `fcn8` - Set True to use FCN-8s model
        * `fcn16` - Set True to use FCN-16s model
    * **Returns** - FCN model
    * **Note** - If both the `fcn8` and `fcn16` arguments are set to False, it returns the FCN-32s model by default

## Load Image and Segmentation Mask

In [8]:
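def LoadImage(name, path):
    # Each file stores the photo and its color-coded mask side by side
    img = Image.open(os.path.join(path, name))
    img = np.array(img)
    
    image = img[:,:256]   # left 256 columns: input image
    mask = img[:,256:]    # remaining columns: segmentation mask
    
    return image, mask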
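The slicing in `LoadImage` can be tried without the dataset. The sketch below uses a synthetic array as a stand-in for one file, assuming each file is 256 pixels high and 512 pixels wide with three color channels (an assumption for illustration; the real files are read with PIL above).

```python
import numpy as np

# Synthetic stand-in for one dataset file: 256 rows x 512 columns x 3 channels.
pair = np.zeros((256, 512, 3), dtype=np.uint8)
pair[:, 256:] = 255   # paint the right half white so the two halves differ

# Same slicing as in LoadImage: columns 0..255 and columns 256..511.
image, mask = pair[:, :256], pair[:, 256:]
print(image.shape, mask.shape)
```

Slicing only the second axis leaves the rows and color channels untouched, so both halves keep three channels.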

## Bin Segmentation Mask 

In [9]:
def bin_image(mask):
    # 12 bin edges split the 0-255 range into 13 intervals, i.e. class indices 0..12
    bins = np.array([20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240])
    new_mask = np.digitize(mask, bins)
    return new_mask
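A few hand-picked pixel values show how `np.digitize` assigns bins. The bin edges are the ones used in `bin_image`; the sample pixel values are made up for illustration.

```python
import numpy as np

bins = np.array([20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240])
pixels = np.array([0, 19, 20, 130, 239, 240, 255])

# With the default right=False, a value x gets index i such that bins[i-1] <= x < bins[i].
labels = np.digitize(pixels, bins)
print(labels.tolist())  # [0, 0, 1, 6, 11, 12, 12]
```

Values below 20 fall into class 0 and values of 240 or more into class 12, which gives the 13 classes used throughout the notebook.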

## Segmentation Masks to Categorical Arrays 

In [10]:
def getSegmentationArr(image, classes, width=width, height=height):
    seg_labels = np.zeros((height, width, classes))
    img = image[:, : , 0]

    for c in range(classes):
        seg_labels[:, :, c] = (img == c ).astype(int)
    return seg_labels
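Note that `getSegmentationArr` reads only channel 0 of the binned mask. As a possible alternative to the loop over classes, the same one-hot encoding can be obtained by indexing an identity matrix. The sketch below compares both on a tiny made-up label map; `one_hot_loop` and `one_hot_eye` are hypothetical names, not functions from this notebook.

```python
import numpy as np


def one_hot_loop(label_map, classes):
    """Same loop as in getSegmentationArr, applied to a 2D integer label map."""
    out = np.zeros(label_map.shape + (classes,))
    for c in range(classes):
        out[:, :, c] = (label_map == c).astype(int)
    return out


def one_hot_eye(label_map, classes):
    """Vectorized alternative: pick rows of an identity matrix by class index."""
    return np.eye(classes)[label_map]


label_map = np.array([[0, 2], [12, 5]])   # made-up 2 x 2 map of class indices
a = one_hot_loop(label_map, 13)
b = one_hot_eye(label_map, 13)
print(a.shape, np.array_equal(a, b))
```

Taking `np.argmax` over the last axis recovers the original label map, which is what the visualization cells in this notebook rely on.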

## Categorical Arrays to Colored Segmentation Masks

In [11]:
def give_color_to_seg_img(seg, n_classes=13):
    # seg is a 2D map of class indices; the result is a float RGB image with values in [0, 1]
    seg_img = np.zeros((seg.shape[0], seg.shape[1], 3)).astype('float')
    colors = sns.color_palette("hls", n_classes)
    
    for c in range(n_classes):
        segc = (seg == c)  # boolean mask of the pixels that belong to class c
        seg_img[:,:,0] += segc * colors[c][0]
        seg_img[:,:,1] += segc * colors[c][1]
        seg_img[:,:,2] += segc * colors[c][2]

    return seg_img
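The coloring step is essentially a palette lookup. The sketch below shows the same idea with integer-array indexing instead of a loop over classes. `make_palette` and `colorize` are hypothetical helpers, and the palette is built with the standard-library `colorsys` module, so its colors are not necessarily the ones seaborn's `"hls"` palette returns.

```python
import colorsys

import numpy as np


def make_palette(n_classes):
    """Hypothetical palette: n evenly spaced hues with fixed lightness and saturation."""
    return np.array([colorsys.hls_to_rgb(i / n_classes, 0.6, 0.65) for i in range(n_classes)])


def colorize(label_map, palette):
    """Look up one RGB triple per pixel; the result has shape (height, width, 3)."""
    return palette[label_map]


palette = make_palette(13)
label_map = np.array([[0, 1], [12, 12]])   # made-up 2 x 2 map of class indices
colored = colorize(label_map, palette)
print(colored.shape)
```

Pixels with the same class index receive the same color, and the values stay in [0, 1], which is the range `imshow` expects for float images.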

## Generator function to generate data batches

In [12]:
def DataGenerator(path, batch_size=10, classes=13):
    files = os.listdir(path)
    while True:
        for i in range(0, len(files), batch_size):
            batch_files = files[i : i+batch_size]
            imgs=[]
            segs=[]
            for file in batch_files:
                #file = random.sample(files,1)[0]
                image, mask = LoadImage(file, path)
                mask_binned = bin_image(mask)
                labels = getSegmentationArr(mask_binned, classes)

                imgs.append(image)
                segs.append(labels)

            yield np.array(imgs), np.array(segs)
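The batching arithmetic of the generator is worth a quick check. The sketch below assumes the training folder holds exactly 2975 files and uses a hypothetical helper, `batch_slices`, that mirrors the `range(0, len(files), batch_size)` loop above.

```python
def batch_slices(num_files, batch_size):
    """Start/stop indices of the batches produced by one pass over the file list."""
    return [(i, min(i + batch_size, num_files)) for i in range(0, num_files, batch_size)]


slices = batch_slices(2975, 10)                    # 2975 training images, batch size 10
batches_per_pass = len(slices)                     # batches in one full pass over the files
last_batch_size = slices[-1][1] - slices[-1][0]    # size of the final, partial batch
steps_per_epoch = 2975 // 10                       # value passed to Keras later on

print(batches_per_pass, last_batch_size, steps_per_epoch)  # 298 5 297
```

Under these assumptions one pass yields 298 batches, the last with only 5 images, while an epoch consumes 297 steps. Because the generator loops forever, each epoch therefore starts one batch later in the file list than the previous one.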

## Visualize Data Samples

In [13]:
train_gen = DataGenerator(train_folder, batch_size=batch_size)
val_gen = DataGenerator(valid_folder, batch_size=batch_size)

In [14]:
imgs, segs = next(train_gen)
imgs.shape, segs.shape

In [15]:
image = imgs[0]
mask = give_color_to_seg_img(np.argmax(segs[0], axis=-1))
masked_image = cv2.addWeighted(image/255, 0.5, mask, 0.5, 0)

fig, axs = plot.subplots(1, 3, figsize=(20,20))
axs[0].imshow(image)
axs[0].set_title('Original Image')
axs[1].imshow(mask)
axs[1].set_title('Segmentation Mask')
axs[2].imshow(masked_image)
axs[2].set_title('Masked Image')
plot.show()
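For float inputs, `cv2.addWeighted(src1, alpha, src2, beta, gamma)` computes `src1 * alpha + src2 * beta + gamma` element-wise. The NumPy sketch below reproduces the 50/50 blend used for the masked image; `blend` is a hypothetical helper and the pixel values are made up.

```python
import numpy as np


def blend(image, mask, alpha=0.5):
    """Weighted sum of two float arrays in [0, 1]: alpha * image + (1 - alpha) * mask."""
    return alpha * image + (1.0 - alpha) * mask


image = np.full((2, 2, 3), 0.8)   # made-up light gray image, already scaled to [0, 1]
mask = np.zeros((2, 2, 3))
mask[..., 0] = 1.0                # made-up pure red mask
overlay = blend(image, mask)
print(overlay[0, 0])
```

This is also why the image is divided by 255 before blending: the colored mask is a float array in [0, 1], so both inputs need the same scale and type.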

## Segmentation model - FCN+Transfer Learning

In [16]:
def fcn(vgg, classes = 13, fcn8 = False, fcn16 = False):
    pool5 = vgg.get_layer('block5_pool').output 
    pool4 = vgg.get_layer('block4_pool').output
    pool3 = vgg.get_layer('block3_pool').output
    
    conv_6 = Conv2D(1024, (7, 7), activation='relu', padding='same', name="conv_6")(pool5)
    conv_7 = Conv2D(1024, (1, 1), activation='relu', padding='same', name="conv_7")(conv_6)
    
    conv_8 = Conv2D(classes, (1, 1), activation='relu', padding='same', name="conv_8")(pool4)
    conv_9 = Conv2D(classes, (1, 1), activation='relu', padding='same', name="conv_9")(pool3)
    
    deconv_7 = Conv2DTranspose(classes, kernel_size=(2,2), strides=(2,2))(conv_7)
    add_1 = Add()([deconv_7, conv_8])
    deconv_8 = Conv2DTranspose(classes, kernel_size=(2,2), strides=(2,2))(add_1)
    add_2 = Add()([deconv_8, conv_9])
    deconv_9 = Conv2DTranspose(classes, kernel_size=(8,8), strides=(8,8))(add_2)
    
    if fcn8 :
        output_layer = Activation('softmax')(deconv_9)
    elif fcn16 :
        deconv_10 = Conv2DTranspose(classes, kernel_size=(16,16), strides=(16,16))(add_1)
        output_layer = Activation('softmax')(deconv_10)
    else :
        deconv_11 = Conv2DTranspose(classes, kernel_size=(32,32), strides=(32,32))(conv_7)
        output_layer = Activation('softmax')(deconv_11)
    
    model = Model(inputs=vgg.input, outputs=output_layer)
    return model
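The `Conv2DTranspose` layers in `fcn` use a kernel size equal to the stride, so every input pixel is expanded into its own non-overlapping block of the output. The NumPy sketch below illustrates this for a single channel without bias; `transposed_conv2d` is a hypothetical helper written for this example and assumes the kernel is at least as large as the stride.

```python
import numpy as np


def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution without padding."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - 1) * stride + kh
    out_w = (x.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Each input value stamps a scaled copy of the kernel onto the output.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out


x = np.array([[1.0, 2.0], [3.0, 4.0]])   # made-up 2 x 2 feature map
kernel = np.ones((2, 2))                  # made-up kernel; learned in the real layer
up = transposed_conv2d(x, kernel, stride=2)
print(up)
```

With an all-ones kernel this reduces to nearest-neighbor upsampling; in the real layers the kernel weights are learned, so the network can produce smoother or sharper upsampling as needed.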

In [17]:
vgg = VGG16(include_top=False, weights='imagenet', input_shape=(height, width, 3))  # Keras expects (rows, cols, channels)

In [18]:
model = fcn(vgg, fcn8=True)
model.summary()

In [19]:
plot_model(model)

## Train our model

In [20]:
adam = Adam(learning_rate=0.001, decay=1e-06)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
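Since the output layer applies a softmax over the class axis at every pixel, the loss and the accuracy are both computed per pixel and then averaged. The NumPy sketch below is a rough illustration of that computation, not Keras' actual implementation; the helper names and the tiny 2 x 2 label map are made up.

```python
import numpy as np


def softmax(logits):
    """Softmax over the last (class) axis, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def pixelwise_cross_entropy(one_hot, probs, eps=1e-7):
    """Mean over all pixels of -sum_c y_c * log(p_c)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(one_hot * np.log(probs)).sum(axis=-1).mean())


def pixel_accuracy(one_hot, probs):
    """Fraction of pixels whose most likely class matches the label."""
    return float((one_hot.argmax(axis=-1) == probs.argmax(axis=-1)).mean())


classes = 13
labels = np.array([[0, 3], [7, 12]])              # made-up 2 x 2 label map
one_hot = np.eye(classes)[labels]
uniform = softmax(np.zeros((2, 2, classes)))      # a model that knows nothing
confident = softmax(20.0 * one_hot)               # a model that is right everywhere

print(pixelwise_cross_entropy(one_hot, uniform), np.log(classes))
print(pixel_accuracy(one_hot, confident))
```

A model that predicts a uniform distribution has a loss of `log(13)`, roughly 2.56, which is a useful baseline when reading the training logs.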

In [21]:
filepath = "best-model-vgg.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
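With `save_best_only=True` and `mode='max'`, the checkpoint is written only when the monitored value strictly improves on the best value seen so far. The plain-Python sketch below mimics that rule; `epochs_that_save` is a hypothetical helper and the accuracy values are invented for illustration, not results from this notebook.

```python
def epochs_that_save(val_accuracies):
    """Return the 0-based epochs at which a 'save best only, mode=max' rule would save."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:   # strict improvement: a tie does not trigger a save
            best = acc
            saved.append(epoch)
    return saved


made_up_history = [0.61, 0.70, 0.68, 0.74, 0.74]
print(epochs_that_save(made_up_history))  # [0, 1, 3]
```

The file on disk therefore always holds the weights of the best epoch so far, which is what gets reloaded before the validation plots.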

In [None]:
history = model.fit_generator(train_gen, epochs=20, steps_per_epoch=num_of_training_samples//batch_size,
                       validation_data=val_gen, validation_steps=num_of_testing_samples//batch_size,
                       callbacks=callbacks_list, use_multiprocessing=True)

## Validation and Visualization

In [None]:
model.load_weights("best-model-vgg.hdf5")

In [None]:
loss = history.history["val_loss"]
acc = history.history["val_accuracy"] #accuracy

plot.figure(figsize=(12, 6))
plot.subplot(211)
plot.title("Val. Loss")
plot.plot(loss)
plot.xlabel("Epoch")
plot.ylabel("Loss")

plot.subplot(212)
plot.title("Val. Accuracy")
plot.plot(acc)
plot.xlabel("Epoch")
plot.ylabel("Accuracy")

plot.tight_layout()
plot.savefig("learn.png", dpi=150)
plot.show()

In [None]:
#val_gen = DataGenerator(valid_folder)
max_show = 1
imgs, segs = next(val_gen)
pred = model.predict(imgs)

for i in range(max_show):
    _p = give_color_to_seg_img(np.argmax(pred[i], axis=-1))
    _s = give_color_to_seg_img(np.argmax(segs[i], axis=-1))

    predimg = cv2.addWeighted(imgs[i]/255, 0.5, _p, 0.5, 0)
    trueimg = cv2.addWeighted(imgs[i]/255, 0.5, _s, 0.5, 0)
    
    plot.figure(figsize=(12,6))
    plot.subplot(121)
    plot.title("Prediction")
    plot.imshow(predimg)
    plot.axis("off")
    plot.subplot(122)
    plot.title("Original")
    plot.imshow(trueimg)
    plot.axis("off")
    plot.tight_layout()
    plot.savefig("pred_"+str(i)+".png", dpi=150)
    plot.show()

## References

* [Fully Convolutional Networks for Semantic Segmentation](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf)
* [Fully Convolutional Networks (FCN) for 2D segmentation](http://www.deeplearning.net/tutorial/fcn_2D_segm.html)
* [Learn about Fully Convolutional Networks for semantic segmentation](https://fairyonice.github.io/Learn-about-Fully-Convolutional-Networks-for-semantic-segmentation.html)
* [An Introduction to different Types of Convolutions in Deep Learning](https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d)