*Note: To explain more about the competition, I created a separate notebook where I do exploratory data analysis (link: https://www.kaggle.com/ekhtiar/eda-find-me-in-the-clouds). I will be using this notebook for a workshop. I am making it public so new Kagglers can also follow along.*

## What Is Semantic Segmentation?

Image classification, semantic segmentation, object detection, and instance segmentation are four different types of problems we deal with in image data. Semantic segmentation is the task of classifying each and every pixel in an image into a class. For example, in the image below, you can see that all balloons are classified as one class, shown in blue. Instance segmentation is a more difficult task: different objects of the same class get different labels (balloon1, balloon2, and so on) and hence different colours. The picture below illustrates the difference between semantic and instance segmentation, as well as classification and object detection.

![](https://miro.medium.com/max/548/1*OnuIJiFVpy7m83LSCUgi6w.png)
Source: https://towardsdatascience.com/semantic-segmentation-popular-architectures-dff0a75f39d0
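
To make the distinction concrete, here is a small sketch on a toy label grid. The grid and the variable names are made up for illustration and are not taken from the competition data. A semantic map stores one class id per pixel, so both balloons share the same id; an instance map gives each balloon its own id.

```python
import numpy as np

# Toy "image" with two separate balloons (hypothetical data for illustration).
# Semantic segmentation: every balloon pixel gets the same class id (1).
semantic_map = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each balloon gets its own id (1 and 2).
instance_map = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Collapsing instance ids to "object or background" recovers the semantic map.
recovered = (instance_map > 0).astype(int)

num_classes = len(np.unique(semantic_map)) - 1    # exclude background id 0
num_instances = len(np.unique(instance_map)) - 1  # exclude background id 0
print(num_classes, num_instances)
```

Going from an instance map to a semantic map only throws information away, which is one way to see why instance segmentation is the harder of the two tasks.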

If you want more fundamental knowledge about semantic segmentation, [this presentation](http://www.cs.toronto.edu/~tingwuwang/semantic_segmentation.pdf) is also a great place to start.

In this competition, we are asked to segment different types of cloud formations appearing in the image. Our task is to label each pixel of the image as one of five classes: Fish, Flower, Gravel, Sugar, or Nothing! I am sure you already know that Convolutional Neural Networks (CNNs) are great at image-related tasks. There are many great CNN architectures for semantic segmentation; you can find a list of them, with links to practical implementations, [here](https://github.com/mrgloom/awesome-semantic-segmentation). In this tutorial, I will use a much improved version of a very popular CNN architecture: UNet++, or Nested U-Net. More about the architecture of our network later. First, let's set our notebook up by doing a few basic imports and creating a few configuration parameters.
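
As a rough sketch of how these five outcomes can be represented, the data generator later in this notebook builds one binary channel per cloud type, so "Nothing" is simply a pixel where all four channels are zero. The toy array below is hypothetical and only illustrates that layout.

```python
import numpy as np

class_names = ['Fish', 'Flower', 'Gravel', 'Sugar']

# Toy 2x3 target with one binary channel per cloud type (hypothetical values).
target = np.zeros((2, 3, len(class_names)), dtype=np.uint8)
target[0, 0, 0] = 1   # top-left pixel is Fish
target[0, 1, 3] = 1   # next pixel is Sugar
target[1, 2, 1] = 1   # bottom-right pixel is Flower
target[1, 2, 3] = 1   # ...and also Sugar: channels are independent in this layout

# "Nothing" is implicit: pixels where no channel is set.
nothing = ~target.any(axis=-1)
print(nothing)
```

This is also why the network built below ends in four sigmoid outputs rather than a five-way softmax: each channel is treated as its own yes/no question.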

## TensorFlow - A Few Words

TensorFlow is an open source software library released in 2015 by Google. TensorFlow enables users to express arbitrary computation as a graph of data flows. Nodes in this graph represent mathematical operations, whereas edges represent data that is communicated from one node to another. Data in TensorFlow are represented as tensors, which are multidimensional arrays. Although this framework for thinking about computation is valuable in many different fields, TensorFlow is primarily used for deep learning in practice and research. [1]

Although TensorFlow was always powerful, it was not always the most intuitive deep learning framework to use. The TensorFlow development team started addressing this issue by working toward a more stable and intuitive release, TensorFlow 2.0. One of the major changes going forward is the integration of Keras. Keras is an open-source neural-network library written in Python. It is capable of running on top of many other deep learning frameworks, including TensorFlow. At the time TensorFlow was initially released, Keras was much more user-friendly, modular, and extensible. Now you can use Keras as one of the TensorFlow APIs. Along with this, many other exciting improvements came in the much anticipated TensorFlow 2.0 release. The stable release was made on 30th September 2019, and you can read about the announcement here: https://towardsdatascience.com/announcement-tensorflow-2-0-has-arrived-ee59283fd83a.

TensorFlow is also becoming much more complete and end-to-end, with an emphasis on simplifying model deployment and productization. TensorFlow 2.0 standardized the SavedModel file format as the format across all deployment options on various platforms (cloud, web, browser, Node.js, mobile and embedded systems). It also supports high performance training, like multi-GPU training, through the [Distribution Strategy API](https://www.tensorflow.org/guide/distributed_training).

![](https://miro.medium.com/max/960/0*C7GCWYlsMrhUYRYi)

On Kaggle, TensorFlow 1.14 is still the default version, so we will use that for this tutorial. However, the APIs that I will be using are all from `tensorflow.keras`, so this code should run without any changes in TensorFlow 2.0.

[1] TensorFlow for deep learning: implementing neural networks - by Nikhil Buduma. Publisher: O'Reilly Media, Inc.



## Import & Configurations 

Enough information, let's get started by importing some libraries and setting up some parameters for our network.

In [None]:
# basic imports
import pandas as pd
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt

In [None]:
# tensorflow imports
from tensorflow import reduce_sum
from tensorflow.keras.utils import Sequence
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, MaxPool2D, Dropout, concatenate, Flatten
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from sklearn.model_selection import train_test_split

In [None]:
# image directory paths
data_path = '../input/understanding_cloud_organization'
train_csv_path = os.path.join(data_path, 'train.csv')
train_image_path = os.path.join(data_path, 'train_images')

In [None]:
# network configuration parameters
# original image is 2100x1400, so we will resize it
img_w = 384 # resized width
img_h = 256 # resized height
batch_size = 10 # batch size for training unet
epochs = 25
k_size = 3 # kernel size 3x3
val_size = .20 # split of training set between train and validation set
# network hyper parameters
smooth = 1.
dropout_rate = 0.5

In [None]:
# saving and loading model
load_pretrained_model = False # load a pre-trained model
save_model = True # save the model after training
pretrained_model_path = './nested_unet.h5' # path of pretrained model
model_save_path = './nested_unet.h5' # path of model to save

## Load Data & Utility Functions

In this section, we will load the metadata about the images into a pandas dataframe. We will also process it a little bit to make our lives easier.

In [None]:
# load full data and label no mask as -1
train_df = pd.read_csv(train_csv_path).fillna(-1)

In [None]:
# image id and class id are two separate entities, so it is easier to split them into two columns
train_df['ImageId'] = train_df['Image_Label'].apply(lambda x: x.split('_')[0])
train_df['Label'] = train_df['Image_Label'].apply(lambda x: x.split('_')[1])
# let's pair each class label with its encoded pixels so we can group all the masks per image
train_df['Label_EncodedPixels'] = train_df.apply(lambda row: (row['Label'], row['EncodedPixels']), axis = 1)

In [None]:
# group together all masks for each image
grouped_EncodedPixels = train_df.groupby('ImageId')['Label_EncodedPixels'].apply(list)
grouped_EncodedPixels.head()

Utility Functions for RLE Encoding & Decoding 

In [None]:
# from https://www.kaggle.com/robertkag/rle-to-mask-converter
def rle_to_mask(rle_string,height,width):
    '''
    convert RLE (run length encoding) string to numpy array

    Parameters: 
    rle_string (str): run length encoding of the mask, or -1 if there is no mask
    height (int): height of the mask
    width (int): width of the mask 

    Returns: 
    numpy.array: numpy array of the mask (255 - mask, 0 - background)
    '''
    rows, cols = height, width
    if rle_string == -1:
        return np.zeros((height, width))
    else:
        rleNumbers = [int(numstring) for numstring in rle_string.split(' ')]
        rlePairs = np.array(rleNumbers).reshape(-1,2)
        img = np.zeros(rows*cols,dtype=np.uint8)
        for index,length in rlePairs:
            index -= 1
            img[index:index+length] = 255
        img = img.reshape(cols,rows)
        img = img.T
        return img

In [None]:
# Thanks to the authors of: https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode
def mask_to_rle(mask):
    '''
    Convert a mask into RLE
    
    Parameters: 
    mask (numpy.array): binary mask of numpy array where 1 - mask, 0 - background

    Returns: 
    string: run length encoding 
    '''
    pixels= mask.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)
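
To see what these two utilities do, here is a self-contained round trip on a tiny mask. It is a sketch that mirrors the logic above under the same convention (pixels are numbered column by column, starting at 1); the helper names `encode_rle` and `decode_rle` are introduced only for this example, and the decoder writes 1 instead of 255 to keep the comparison simple.

```python
import numpy as np

def encode_rle(mask):
    # Flatten column by column (mask.T.flatten()), as mask_to_rle does.
    pixels = np.concatenate([[0], mask.T.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1   # 1-indexed change points
    runs[1::2] -= runs[::2]                             # turn run ends into run lengths
    return ' '.join(str(x) for x in runs)

def decode_rle(rle_string, height, width):
    numbers = [int(n) for n in rle_string.split(' ')]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in np.array(numbers).reshape(-1, 2):
        flat[start - 1:start - 1 + length] = 1          # starts are 1-indexed
    # Undo the column-major flattening.
    return flat.reshape(width, height).T

mask = np.array([[0, 1, 0, 0],
                 [1, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=np.uint8)

rle = encode_rle(mask)
restored = decode_rle(rle, height=3, width=4)
print(rle)   # 2 1 4 2 11 2
```

Read column by column, the mask is `0 1 0 | 1 1 0 | 0 0 0 | 0 1 1`, so the runs of ones start at positions 2, 4 and 11 with lengths 1, 2 and 2.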

## Data Generator
To push the data to our model, we will create a custom data generator. A generator lets us load data progressively, instead of loading it all into memory at once. A custom generator also lets us fit in more customization at the time the data is loaded. While the model is being trained on the GPU, the generator can pre-process the next images on the CPU. We can also take advantage of multiple processors to parallelize our pre-processing.

In [None]:
class DataGenerator(Sequence):
    def __init__(self, list_ids, labels, image_dir, batch_size=32,
                 img_h=256, img_w=512, shuffle=True):
        
        self.list_ids = list_ids
        self.labels = labels
        self.image_dir = image_dir
        self.batch_size = batch_size
        self.img_h = img_h
        self.img_w = img_w
        self.shuffle = shuffle
        self.on_epoch_end()
    
    def __len__(self):
        'denotes the number of batches per epoch'
        return len(self.list_ids) // self.batch_size
    
    def __getitem__(self, index):
        'generate one batch of data'
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # get list of IDs
        list_ids_temp = [self.list_ids[k] for k in indexes]
        # generate data
        X, y = self.__data_generation(list_ids_temp)
        # return data 
        return X, y
    
    def on_epoch_end(self):
        'update indexes after each epoch'
        self.indexes = np.arange(len(self.list_ids))
        if self.shuffle:
            np.random.shuffle(self.indexes)
            
    def __data_generation(self, list_ids_temp):
        'generate data containing batch_size samples'
        X = np.empty((self.batch_size, self.img_h, self.img_w, 1))
        y = np.empty((self.batch_size, self.img_h, self.img_w, 4))
        
        for idx, id in enumerate(list_ids_temp):
            file_path =  os.path.join(self.image_dir, id)
            image = cv2.imread(file_path, 0)
            image_resized = cv2.resize(image, (self.img_w, self.img_h))
            image_resized = np.array(image_resized, dtype=np.float64)
            # standardization of the image
            image_resized -= image_resized.mean()
            image_resized /= image_resized.std()
            
            mask = np.empty((self.img_h, self.img_w, 4))
            
            for idm, image_class in enumerate(['Fish', 'Flower', 'Gravel', 'Sugar']):
                rle = self.labels.get(id + '_' + image_class)
                # if there is no mask create empty mask
                if rle is None:
                    class_mask = np.zeros((1400, 2100))
                else:
                    class_mask = rle_to_mask(rle, width=2100, height=1400)
             
                class_mask_resized = cv2.resize(class_mask, (self.img_w, self.img_h))
                mask[...,idm] = class_mask_resized
            
            X[idx,] = np.expand_dims(image_resized, axis=2)
            y[idx,] = mask
        
        # normalize Y
        y = (y > 0).astype(int)
            
        return X, y
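
The batching arithmetic in `__len__` and `__getitem__` can be sketched without Keras. The helper `batch_indexes` and the sample count below are hypothetical; the point is only to show which samples end up in which batch and what happens to the leftover samples.

```python
import numpy as np

def batch_indexes(num_samples, batch_size, batch_index, order=None):
    """Return the sample positions that make up one batch."""
    if order is None:
        order = np.arange(num_samples)   # unshuffled order, for readability
    return order[batch_index * batch_size:(batch_index + 1) * batch_size]

num_samples, batch_size = 23, 10                 # hypothetical sizes
batches_per_epoch = num_samples // batch_size    # floor division, as in __len__

first = batch_indexes(num_samples, batch_size, 0)
last = batch_indexes(num_samples, batch_size, batches_per_epoch - 1)
dropped = num_samples - batches_per_epoch * batch_size
print(batches_per_epoch, last, dropped)
```

With these numbers there are two full batches and the three leftover samples are skipped in that epoch. Because the generator reshuffles its indexes in `on_epoch_end` when `shuffle=True`, a different set of samples is left out each time.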

In [None]:
# split the training data into train and validation sets (random split, not stratified)
train_image_ids = train_df['ImageId'].unique()
X_train, X_val = train_test_split(train_image_ids, test_size=val_size, random_state=42)

In [None]:
# create a dict of all the masks
masks = {}
for index, row in train_df[train_df['EncodedPixels']!=-1].iterrows():
    masks[row['Image_Label']] = row['EncodedPixels']

In [None]:
params = {'img_h': img_h,
          'img_w': img_w,
          'image_dir': train_image_path,
          'batch_size': batch_size,
          'shuffle': True}

# Get Generators
training_generator = DataGenerator(X_train, masks, **params)
validation_generator = DataGenerator(X_val, masks, **params)

In [None]:
# check out the shapes
x, y = training_generator.__getitem__(0)
print(x.shape, y.shape)

In [None]:
# visualize cloud image with four classes of cloud formations in separate columns
def viz_cloud_img_mask(img, masks):
    # the generator yields single-channel images, so convert grayscale to RGB for plotting
    img = cv2.cvtColor(img.astype('float32'), cv2.COLOR_GRAY2RGB)
    fig, ax = plt.subplots(nrows=1, ncols=4, sharey=True, figsize=(20,10))
    cmaps = ["Reds", "Blues", "Greens", "Purples"]
    for idx, mask in enumerate(masks):
        ax[idx].imshow(img)
        ax[idx].imshow(mask, alpha=0.3, cmap=cmaps[idx])

In [None]:
# let's visualize some images with their cloud formation masks to make sure our data generator is working like it should
for ix in range(0,batch_size):
    if y[ix].sum() > 0:
        img = x[ix]
        masks_temp = [y[ix][...,i] for i in range(0,4)]
        viz_cloud_img_mask(img, masks_temp)

## UNet++: A Nested U-Net Architecture

To detect the cloud formations in our images, we need a convolutional neural network. In this section we will write the code to build this network, or model. The most used architecture for semantic segmentation tasks is U-Net. U-Net++ makes significant improvements on this, and it will be our architecture of choice for this experiment. [This Medium article](https://medium.com/@sh.tsang/review-unet-a-nested-u-net-architecture-biomedical-image-segmentation-57be56859b20) explains the differences and improvements of the U-Net++ architecture over U-Net. It also explains how the architecture works with "deep supervision", a term you will see as a configurable parameter in our model. The article is a good read if you want to learn more about U-Net++.

![UNet++: A Nested U-Net Architecture](https://miro.medium.com/max/658/1*ExIkm6cImpPgpetFW1kwyQ.png)

The code below has been adapted from this GitHub repo: https://github.com/CarryHJR/Nested-UNet. So a big shoutout to the authors!

In [None]:
def standard_unit(input_tensor, stage, nb_filter, kernel_size=3):

    act = 'elu'

    x = Conv2D(nb_filter, (kernel_size, kernel_size), activation=act, name='conv'+stage+'_1', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(input_tensor)
    x = Dropout(dropout_rate, name='dp'+stage+'_1')(x)
    x = Conv2D(nb_filter, (kernel_size, kernel_size), activation=act, name='conv'+stage+'_2', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(x)
    x = Dropout(dropout_rate, name='dp'+stage+'_2')(x)

    return x

In [None]:
def Nest_Net(img_rows, img_cols, color_type=1, num_class=1, deep_supervision=False):

    nb_filter = [32,64,128,256,512]
    act = 'elu'

    bn_axis = 3
    img_input = Input(shape=(img_rows, img_cols, color_type), name='main_input')

    conv1_1 = standard_unit(img_input, stage='11', nb_filter=nb_filter[0])
    pool1 = MaxPool2D((2, 2), strides=(2, 2), name='pool1')(conv1_1)

    conv2_1 = standard_unit(pool1, stage='21', nb_filter=nb_filter[1])
    pool2 = MaxPool2D((2, 2), strides=(2, 2), name='pool2')(conv2_1)

    up1_2 = Conv2DTranspose(nb_filter[0], (2, 2), strides=(2, 2), name='up12', padding='same')(conv2_1)
    conv1_2 = concatenate([up1_2, conv1_1], name='merge12', axis=bn_axis)
    conv1_2 = standard_unit(conv1_2, stage='12', nb_filter=nb_filter[0])

    conv3_1 = standard_unit(pool2, stage='31', nb_filter=nb_filter[2])
    pool3 = MaxPool2D((2, 2), strides=(2, 2), name='pool3')(conv3_1)

    up2_2 = Conv2DTranspose(nb_filter[1], (2, 2), strides=(2, 2), name='up22', padding='same')(conv3_1)
    conv2_2 = concatenate([up2_2, conv2_1], name='merge22', axis=bn_axis)
    conv2_2 = standard_unit(conv2_2, stage='22', nb_filter=nb_filter[1])

    up1_3 = Conv2DTranspose(nb_filter[0], (2, 2), strides=(2, 2), name='up13', padding='same')(conv2_2)
    conv1_3 = concatenate([up1_3, conv1_1, conv1_2], name='merge13', axis=bn_axis)
    conv1_3 = standard_unit(conv1_3, stage='13', nb_filter=nb_filter[0])

    conv4_1 = standard_unit(pool3, stage='41', nb_filter=nb_filter[3])
    pool4 = MaxPool2D((2, 2), strides=(2, 2), name='pool4')(conv4_1)

    up3_2 = Conv2DTranspose(nb_filter[2], (2, 2), strides=(2, 2), name='up32', padding='same')(conv4_1)
    conv3_2 = concatenate([up3_2, conv3_1], name='merge32', axis=bn_axis)
    conv3_2 = standard_unit(conv3_2, stage='32', nb_filter=nb_filter[2])

    up2_3 = Conv2DTranspose(nb_filter[1], (2, 2), strides=(2, 2), name='up23', padding='same')(conv3_2)
    conv2_3 = concatenate([up2_3, conv2_1, conv2_2], name='merge23', axis=bn_axis)
    conv2_3 = standard_unit(conv2_3, stage='23', nb_filter=nb_filter[1])

    up1_4 = Conv2DTranspose(nb_filter[0], (2, 2), strides=(2, 2), name='up14', padding='same')(conv2_3)
    conv1_4 = concatenate([up1_4, conv1_1, conv1_2, conv1_3], name='merge14', axis=bn_axis)
    conv1_4 = standard_unit(conv1_4, stage='14', nb_filter=nb_filter[0])

    conv5_1 = standard_unit(pool4, stage='51', nb_filter=nb_filter[4])

    up4_2 = Conv2DTranspose(nb_filter[3], (2, 2), strides=(2, 2), name='up42', padding='same')(conv5_1)
    conv4_2 = concatenate([up4_2, conv4_1], name='merge42', axis=bn_axis)
    conv4_2 = standard_unit(conv4_2, stage='42', nb_filter=nb_filter[3])

    up3_3 = Conv2DTranspose(nb_filter[2], (2, 2), strides=(2, 2), name='up33', padding='same')(conv4_2)
    conv3_3 = concatenate([up3_3, conv3_1, conv3_2], name='merge33', axis=bn_axis)
    conv3_3 = standard_unit(conv3_3, stage='33', nb_filter=nb_filter[2])

    up2_4 = Conv2DTranspose(nb_filter[1], (2, 2), strides=(2, 2), name='up24', padding='same')(conv3_3)
    conv2_4 = concatenate([up2_4, conv2_1, conv2_2, conv2_3], name='merge24', axis=bn_axis)
    conv2_4 = standard_unit(conv2_4, stage='24', nb_filter=nb_filter[1])

    up1_5 = Conv2DTranspose(nb_filter[0], (2, 2), strides=(2, 2), name='up15', padding='same')(conv2_4)
    conv1_5 = concatenate([up1_5, conv1_1, conv1_2, conv1_3, conv1_4], name='merge15', axis=bn_axis)
    conv1_5 = standard_unit(conv1_5, stage='15', nb_filter=nb_filter[0])

    nestnet_output_1 = Conv2D(num_class, (1, 1), activation='sigmoid', name='output_1', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(conv1_2)
    nestnet_output_2 = Conv2D(num_class, (1, 1), activation='sigmoid', name='output_2', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(conv1_3)
    nestnet_output_3 = Conv2D(num_class, (1, 1), activation='sigmoid', name='output_3', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(conv1_4)
    nestnet_output_4 = Conv2D(num_class, (1, 1), activation='sigmoid', name='output_4', kernel_initializer = 'he_normal', padding='same', kernel_regularizer=l2(1e-4))(conv1_5)

    if deep_supervision:
        model = Model(img_input, [nestnet_output_1,nestnet_output_2,nestnet_output_3,nestnet_output_4])
    else:
        model = Model(img_input, [nestnet_output_4])
    
    return model
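
The wiring in `Nest_Net` follows a regular pattern that is easier to see when written out as a rule. The sketch below does not build a model; it only lists, for each nested node `conv{level}_{column}`, which tensors are concatenated before it and how many channels that concatenation has. The helpers `node_inputs` and `concat_channels` are names made up for this illustration.

```python
nb_filter = [32, 64, 128, 256, 512]   # filters per depth level, as in Nest_Net
depth = len(nb_filter)

def node_inputs(level, column):
    """Names of the tensors concatenated before node conv{level}_{column}."""
    if column == 1:
        return []   # backbone node: fed by the input image or by max pooling
    # one upsampled tensor from the level below, previous column ...
    inputs = ['up(conv{}_{})'.format(level + 1, column - 1)]
    # ... plus every earlier node on the same level (the dense skip connections)
    inputs += ['conv{}_{}'.format(level, c) for c in range(1, column)]
    return inputs

def concat_channels(level, column):
    """Channels after concatenation: every input carries nb_filter[level - 1] channels."""
    return len(node_inputs(level, column)) * nb_filter[level - 1]

for level in range(1, depth):
    for column in range(2, depth - level + 2):
        name = 'conv{}_{}'.format(level, column)
        print(name, node_inputs(level, column), concat_channels(level, column))
```

For example, `conv1_5` receives the upsampled `conv2_4` plus `conv1_1` through `conv1_4`, which matches the `merge15` layer above. A plain U-Net would only concatenate the upsampled tensor with `conv1_1`; the extra same-level inputs are what make the architecture "nested".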

## Loss Functions

A loss function allows our network to measure its error, which gradient descent then reduces. This competition is evaluated on the mean [Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient). The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. Since the Dice coefficient is the evaluation metric, we will build our loss on it: the model is trained with Dice loss combined with binary cross-entropy. There are also loss functions like Tversky and Focal Tversky that you can experiment with for a better result.

In [None]:
# Dice similarity coefficient loss, brought to you by: https://github.com/nabsabraham/focal-tversky-unet
def dsc(y_true, y_pred):
    smooth = 1.
    y_true_f = Flatten()(y_true)
    y_pred_f = Flatten()(y_pred)
    intersection = reduce_sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (reduce_sum(y_true_f) + reduce_sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    loss = 1 - dsc(y_true, y_pred)
    return loss

def bce_dice_loss(y_true, y_pred):
    loss = binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
    return loss
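
To get a feel for the numbers, here is the same Dice formula in plain NumPy, worked on three tiny cases. It is only an illustration: `dice_coefficient` is a stand-in name, and the inputs are hypothetical flattened masks.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

y_true = [1, 1, 0, 0]
perfect = dice_coefficient(y_true, [1, 1, 0, 0])       # (2*2 + 1) / (2 + 2 + 1) = 1.0
partial = dice_coefficient(y_true, [1, 0, 0, 0])       # (2*1 + 1) / (2 + 1 + 1) = 0.75
empty = dice_coefficient([0, 0, 0, 0], [0, 0, 0, 0])   # (0 + 1) / (0 + 0 + 1) = 1.0
print(perfect, partial, empty, 1 - partial)
```

The `smooth` term is what keeps the last case well defined: without it, an empty ground truth with an empty prediction would be 0/0. The Dice loss is simply one minus the coefficient, so the partial prediction above has a loss of 0.25.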

## Compile & Fit The Model
Now that we have our data generator, network architecture, and loss function defined, we will compile and train the model in this section. You will notice we are using Adam as our optimizer. If you want to read more about Adam, or understand more about what optimizers are and why they are needed, this is a good article: https://medium.com/datadriveninvestor/overview-of-different-optimizers-for-neural-networks-e0ed119440c3.

In [None]:
# get an instance of the model
model = Nest_Net(img_h, img_w, color_type=1, num_class=4, deep_supervision=False)
# define optimizer 
adam = Adam(lr = 0.05, epsilon = 0.1)
model.compile(optimizer=adam, loss=bce_dice_loss, metrics=[dice_loss])
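
For intuition about what the optimizer does with `lr` and `epsilon`, here is a single Adam update written in NumPy. This is a sketch of the textbook update rule, not of the Keras internals (which may place `epsilon` slightly differently); `adam_step` is a made-up helper, the `lr` and `epsilon` defaults mirror the values used above, and the betas are the usual defaults of 0.9 and 0.999.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, epsilon=0.1):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0                     # hypothetical scalar weight
param, m, v = adam_step(param, grad=0.5, m=m, v=v, t=1)
print(param)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the step is `lr * 0.5 / (0.5 + 0.1)`, roughly 0.0417. A relatively large `epsilon` like 0.1 shrinks the step when gradients are small, which partly offsets the high learning rate.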

In [None]:
if load_pretrained_model:
    try:
        model.load_weights(pretrained_model_path)
        print('pre-trained model loaded!')
    except OSError:
        print('No pre-trained weights found; train and save the model first')

In [None]:
history = model.fit_generator(generator=training_generator, validation_data=validation_generator, epochs=epochs, verbose=1)

In [None]:
if save_model: 
    model.save(model_save_path)

## Model Insights
Using the history object of the model, we can review how we did in each epoch. In this section we will draw two plots to show the loss and the Dice loss metric for our training and validation sets per epoch.

In [None]:
# summarize history for the combined loss (BCE + Dice)
plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')

# summarize history for the dice loss metric
plt.subplot(1,2,2)
plt.plot(history.history['dice_loss'])
plt.plot(history.history['val_dice_loss'])
plt.title('model dice loss')
plt.ylabel('dice loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')

## TODO

Now that we have our model, you can extend this notebook to do the following:

1. Visualize the segmentations for the validation set.
2. Use the model to predict segmentations for the test set.
3. Use the `mask_to_rle` function to create a submission file.

I have done the above for another challenge in this kernel: https://www.kaggle.com/ekhtiar/resunet-a-baseline-on-tensorflow. Use it to help you solve the three challenges above.
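
As a starting point for the last step, here is a rough sketch of how predictions could be turned into submission rows. Several things are assumptions: the 0.5 threshold, the helper `submission_rows`, the example image name, and the toy prediction array (a real one would come from `model.predict`). The column names follow `train.csv`. Check the competition rules for the mask size expected in the submission before encoding.

```python
import csv
import io
import numpy as np

class_names = ['Fish', 'Flower', 'Gravel', 'Sugar']

def mask_to_rle(mask):
    # same column-major run-length encoding as the utility defined earlier
    pixels = np.concatenate([[0], mask.T.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def submission_rows(image_id, probabilities, threshold=0.5):
    """Yield (Image_Label, EncodedPixels) pairs for one image's (h, w, 4) prediction."""
    for channel, name in enumerate(class_names):
        binary = (probabilities[..., channel] > threshold).astype(np.uint8)
        yield image_id + '_' + name, mask_to_rle(binary)

# Hypothetical prediction for a 2x2 image.
probabilities = np.zeros((2, 2, 4))
probabilities[:, 0, 0] = 0.9      # left column predicted as Fish

buffer = io.StringIO()            # stand-in for an open submission.csv file
writer = csv.writer(buffer)
writer.writerow(['Image_Label', 'EncodedPixels'])
rows = list(submission_rows('example.jpg', probabilities))
writer.writerows(rows)
print(buffer.getvalue())
```

Classes with no predicted pixels get an empty `EncodedPixels` string, which matches how missing masks appear in `train.csv` before the `fillna(-1)` step.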