# Introduction

Deep learning has been very successful at solving many complex real-world problems. However, solving most of these problems requires data, and having data alone is not enough: it has to be cleansed, annotated, and organized. Moreover, data in many domains is not accessible due to privacy constraints (especially in the medical domain). Hence it becomes pertinent to create models that can learn from a limited amount of data.

<br>

## Problem Statement:
In this task, you are given a small dataset of microscopic views of cells.

*Your goal is to build a model that accurately predicts the cell regions as shown in the **Label Image**.*

#### **Input Image**

<a href="https://imgur.com/aLDNHwu"><img src="https://i.imgur.com/aLDNHwu.png" title="source: imgur.com" /></a>

#### **Label Image**

<a href="https://imgur.com/s1mIkFE"><img src="https://i.imgur.com/s1mIkFE.png" title="source: imgur.com" /></a>


## Dataset:

The dataset contains 30 training images along with the labels.

[**Dataset Link**](https://drive.google.com/drive/folders/1678Tggykj46SpJZS9mKMKHw7YFmiGMc8?usp=sharing)


## Evaluation

Share the submission Jupyter/Colab notebook with the relevant explanation, code, and models. We will evaluate your model on our **test dataset**.

<br><br>
**Note that this is an interview assignment and is not involved in the development of any software or solution.**

### Instructions

1. Make a copy of this notebook to start editing and add your solution.
2. A dataset folder has been shared with you to train and test.
3. You can use any framework to develop the solution (PyTorch, Keras, TensorFlow, Theano, Caffe, etc.).
4. This assignment is a great medium to get to know you better. Please feel free to connect, interact, and develop the solution. I would be more than willing to help you with any issues or problems that you face while solving the challenge. You can connect with me at suraj.donthi@elementals.ai.
5. The goal of this task is to understand how you approach solving a problem. The more you connect while developing the solution, the better I will be able to understand you.
6. Submission Files:
    - Colab Notebook Link with Solutions Approach and Code Solution.
    - Trained Model Link.
    - Any other necessary files.
7. **Submission deadline: Within 5 days of receiving the assignment, no later than 12 AM IST on the due date. The exact due date shall be mentioned in the email or the portal where you receive the assignment.** (You can share the Colab Notebook with the Gmail address surajdonthi.th@gmail.com)

## Explain the technique you will use to solve the problem in detail.

- Include any model architectures, equations, diagrams, etc. that are required to explain how you are going to solve the problem.



# Cell Segmentation using U-Net: Solution Explanation

In this assignment, the task is to build a model that accurately predicts cell regions in microscopic images. We are given a small dataset of microscopic cell images along with their corresponding labeled masks. The goal is to develop a deep learning model that can segment the cells in the images.

Here's a detailed explanation of the approach taken to solve this problem:

## Model Selection: U-Net Architecture
Before starting on the solution, as good practice I referred to a few research papers ("A deep learning-based algorithm for 2-D cell segmentation in microscopy images" and "U-Net: Convolutional Networks for Biomedical Image Segmentation") to get a good understanding of a starting point.

From this reading, I found that U-Net, a variant of the fully convolutional neural network (FCNN), is well suited to this kind of task, especially given the limited dataset provided.
The U-Net architecture is a popular choice for image segmentation tasks due to its ability to capture both high-level and low-level features. It consists of an encoder-decoder structure with skip connections, making it effective for precise pixel-wise segmentation. Two variations of the U-Net architecture are considered: `UNet()` and `UNet2()`. Both follow the same basic U-Net structure but differ in the number of filters used in each layer. `UNet2()` uses more filters per layer, which can capture more complex features, so it is the one retained in the final notebook (in the code below it appears as `UNet2_with_STN()`, which also feeds a spatial-transformer output into the network).

Here's an explanation of the architecture:

1. **Input Layer**:
   - The model takes grayscale images as input, and the input shape is `(image_size, image_size, 1)`, where `image_size` is the side length of the input image (512, giving 512 x 512 inputs).

2. **Encoder (Downsampling Path)**:
   - The encoder consists of a series of down blocks, where each down block performs the following operations:
     - Convolutional Layer: Applies a 2D convolution operation with a specified number of filters and a ReLU activation function. This layer is responsible for capturing features in the input image.
     - Convolutional Layer (Again): Another convolution operation is applied to further enhance feature extraction.
     - Max Pooling: Downsamples the feature maps using max-pooling with a pool size of (2, 2). This operation reduces the spatial dimensions of the feature maps.
     - Dropout: Applies dropout regularization to the pooled feature maps to prevent overfitting.

   - The number of filters in the convolutional layers increases as we go deeper into the encoder (from `f[0]` to `f[3]` in the architecture).

3. **Bottleneck**:
   - After the encoder, there is a bottleneck layer. This layer typically has the highest number of filters (`f[4]`) and is intended to capture the most abstract features in the image.

4. **Decoder (Upsampling Path)**:
   - The decoder consists of a series of up blocks, where each up block performs the following operations:
     - UpSampling: Increases the spatial dimensions of the feature maps by a factor of 2 using up-sampling.
     - Concatenation: Concatenates the up-sampled feature maps with the corresponding feature maps from the encoder's down block (skip connections).
     - Convolutional Layer: Applies a 2D convolution operation to the concatenated feature maps with a ReLU activation.
     - Convolutional Layer (Again): Another convolution operation is applied for further feature refinement.
     - Dropout: Applies dropout regularization.

   - The number of filters in the convolutional layers decreases as we go deeper into the decoder (from `f[3]` to `f[0]` in the architecture).

5. **Output Layer**:
   - The final output layer consists of a 2D convolutional layer with a single filter and a sigmoid activation function. This layer produces the predicted binary masks, where each pixel value represents the probability that the corresponding pixel in the input image belongs to the object of interest (e.g., a cell nucleus).

In summary, this U-Net architecture combines both the encoder and decoder paths, with skip connections between corresponding encoder and decoder blocks. This allows the model to capture and retain fine details while simultaneously learning high-level features, making it well-suited for image segmentation tasks like cell segmentation. The model outputs binary masks that can be used to identify and segment objects of interest in input images.

The table below shows how the spatial dimensions of the image change after each layer.

| Layer            | Input Size (H x W) | Output Size (H x W) |
|------------------|--------------------|---------------------|
| Input            | 512 x 512          | 512 x 512           |
| Down Block 1     | 512 x 512          | 256 x 256           |
| Down Block 2     | 256 x 256          | 128 x 128           |
| Down Block 3     | 128 x 128          | 64 x 64             |
| Down Block 4     | 64 x 64            | 32 x 32             |
| Bottleneck       | 32 x 32            | 32 x 32             |
| Up Block 1       | 32 x 32            | 64 x 64             |
| Up Block 2       | 64 x 64            | 128 x 128           |
| Up Block 3       | 128 x 128          | 256 x 256           |
| Up Block 4       | 256 x 256          | 512 x 512           |
| Output (Sigmoid) | 512 x 512          | 512 x 512           |
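
The sizes in the table follow directly from the pooling and up-sampling factors. The following is a minimal sketch that recomputes them. `trace_unet_shapes` is a hypothetical helper, not part of the notebook, and it assumes `"same"`-padded convolutions so that only the 2x2 pooling and 2x up-sampling steps change the spatial size.

```python
def trace_unet_shapes(image_size=512, depth=4):
    """Return (layer name, input size, output size) rows for a U-Net with `depth` levels.

    Assumes 'same'-padded convolutions, 2x2 max pooling, and 2x up-sampling.
    """
    rows = []
    size = image_size
    # Encoder: each down block halves the spatial size
    for i in range(1, depth + 1):
        rows.append((f"Down Block {i}", size, size // 2))
        size //= 2
    # Bottleneck: convolutions only, size unchanged
    rows.append(("Bottleneck", size, size))
    # Decoder: each up block doubles the spatial size
    for i in range(1, depth + 1):
        rows.append((f"Up Block {i}", size, size * 2))
        size *= 2
    rows.append(("Output (Sigmoid)", size, size))
    return rows


for name, size_in, size_out in trace_unet_shapes():
    print(f"{name:<17} {size_in} x {size_in} -> {size_out} x {size_out}")
```

Because four pooling steps divide the side length by 16, this sketch only round-trips cleanly when `image_size` is a multiple of 16, which holds for 512.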

## Data Preprocessing:

### Data Augmentation:

Data augmentation techniques such as rotation, scaling, and flipping were applied to artificially increase the size of the training dataset and improve model generalization. The augmentation code is not shown in this notebook; I performed it in a separate notebook, which I'll attach as a link. Keeping it separate simplifies training and tuning of the model here.
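
Since the augmentation notebook is not included here, the following is only a sketch of what paired augmentation could look like, not the code that was actually used. `augment_pair` is a hypothetical helper, and it is restricted to flips and 90-degree rotations (scaling is omitted). The key point it illustrates is that the image and its mask must receive exactly the same transform.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Copy so the results are contiguous arrays rather than views
    return image.copy(), mask.copy()


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # synthetic stand-in image
mask = (image > 127).astype(np.uint8) * 255                  # synthetic stand-in mask
aug_image, aug_mask = augment_pair(image, mask, rng)
```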

### Data Loading:

A custom data generator class (`DataGen`) is implemented to load and preprocess the dataset. This class inherits from `keras.utils.Sequence` and provides batch-wise data loading. The following steps are performed during data loading:

- Load both the input image and its corresponding labeled mask.
- Resize both the image and mask to a specified image size (512x512 in this case).
- Apply histogram equalization to enhance image contrast.
- Normalize the pixel values of both the image and mask to the range [0, 1].
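
The equalization and normalization steps above can be sketched with NumPy alone. This is an illustrative approximation under stated assumptions: `equalize_hist` is a hypothetical stand-in for `cv2.equalizeHist` and is not guaranteed to match it bit for bit, and the resize step is left out because it needs OpenCV.

```python
import numpy as np


def equalize_hist(image):
    """Histogram-equalize an 8-bit grayscale image (approximate NumPy stand-in)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    denom = max(cdf[-1] - cdf_min, 1)  # guard against constant images
    lut = np.round((cdf - cdf_min) / denom * 255.0).clip(0, 255).astype(np.uint8)
    return lut[image]


def preprocess(image, mask):
    """Equalize the image, then scale image and mask to [0, 1] with a channel axis."""
    image = equalize_hist(image).astype(np.float32) / 255.0
    mask = mask.astype(np.float32) / 255.0
    return image[..., np.newaxis], mask[..., np.newaxis]


rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 150, size=(64, 64), dtype=np.uint8)  # synthetic image
labels = ((low_contrast > 125) * 255).astype(np.uint8)                # synthetic mask
x, y = preprocess(low_contrast, labels)
print(x.shape, float(x.min()), float(x.max()))
```

The synthetic image only occupies intensities 100 to 149, yet after equalization its values span the full [0, 1] range, which is the contrast enhancement the step is meant to provide.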

## Model Training:

### Loss Function and Metrics:

The model is trained using the binary cross-entropy loss function, which is commonly used for binary image segmentation tasks. The optimizer chosen for training is Adam. Additionally, accuracy and mean IoU are tracked as metrics to monitor the model's performance during training.
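
To make the loss concrete, here is a small NumPy sketch of per-pixel binary cross-entropy, $-[y \log p + (1 - y) \log(1 - p)]$, averaged over all pixels. The function name and the clipping constant `eps` are assumptions for illustration; the notebook itself relies on Keras' built-in `"binary_crossentropy"`.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a mask and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())


y_true = np.array([[1.0, 0.0], [1.0, 0.0]])
confident = np.array([[0.9, 0.1], [0.8, 0.2]])  # mostly correct, confident predictions
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])  # uninformative predictions

print(binary_cross_entropy(y_true, confident))
print(binary_cross_entropy(y_true, uncertain))
```

Confident, correct probabilities give a lower loss than uninformative ones, whose loss equals $\ln 2$.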

### Training Data Split:

The dataset is split into a training set and a validation set. In this implementation, the 1,980 image IDs are shuffled; 576 images are reserved for validation and the remaining 1,404 are used for training. You can adjust these numbers as needed.
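
A split of this kind can be sketched as below. `split_ids` and its `seed` parameter are hypothetical and not part of the notebook; seeding the shuffle is an added assumption that makes the split reproducible between runs. The example uses ten IDs purely for illustration.

```python
import numpy as np


def split_ids(ids, val_size, seed=0):
    """Shuffle `ids` reproducibly and return (train_ids, valid_ids)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ids))
    shuffled = [ids[i] for i in order]
    return shuffled[val_size:], shuffled[:val_size]


ids = [f"train-volume_{i}.tif" for i in range(1, 11)]
train_ids, valid_ids = split_ids(ids, val_size=3)
print(len(train_ids), len(valid_ids))
```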

### Training Loop:

The model is initially set to train for 150 epochs with early stopping enabled (patience of 10 epochs, monitoring the validation loss and restoring the best weights). For each epoch, the training data is divided into batches, and the model's weights are updated using backpropagation. The validation data is used to monitor the model's performance and prevent overfitting.
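
The patience rule can be illustrated in plain Python. This sketch mirrors the idea behind early stopping rather than Keras' internals; `early_stopping_epoch` is a hypothetical helper, and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Made-up curve: improves for three epochs, then plateaus
losses = [1.0, 0.8, 0.7] + [0.75] * 20
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=10)
print(stop_epoch, best_epoch)
```

With restoring of the best weights enabled, the model kept after stopping would be the one from `best_epoch`, not from the last epoch run.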

## Model Evaluation:

### Evaluation Metrics:

To evaluate the model's performance, two evaluation metrics are used:

1. Intersection over Union (IoU): IoU measures the overlap between the predicted mask and the ground truth mask. It is a commonly used metric for image segmentation tasks and provides a measure of segmentation accuracy.

2. Warping Error: Warping error quantifies the absolute pixel-wise difference between the predicted and ground truth masks. It is used to assess how well the predicted mask aligns with the ground truth mask.
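
Both metrics can be checked by hand on a tiny example. The sketch below uses hypothetical helper names and treats the warping error exactly as described above, that is, as the mean absolute pixel-wise difference between two binary masks.

```python
import numpy as np


def iou(y_true, y_pred, eps=1e-8):
    """Intersection over Union of two binary masks."""
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return float(intersection / (union + eps))


def mean_abs_diff(y_true, y_pred):
    """Mean absolute pixel-wise difference between two masks."""
    return float(np.mean(np.abs(y_true.astype(np.float32) - y_pred.astype(np.float32))))


y_true = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]], dtype=np.uint8)
y_pred = np.array([[1, 1, 1, 0],
                   [1, 0, 0, 0]], dtype=np.uint8)

# 3 pixels are set in both masks and 5 in at least one; 2 of the 8 pixels disagree
print(iou(y_true, y_pred), mean_abs_diff(y_true, y_pred))
```

The two numbers move in opposite directions: a perfect prediction gives an IoU of 1 and a difference of 0.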

### Visualization:

For each validation sample, the following steps are performed:

- Load the ground truth mask.
- Get the predicted mask from the model.
- Apply morphological closing to the predicted mask.
- Binarize the predicted mask using a threshold (0.5 in the code below; 0.6 was also tried).
- Calculate both IoU and warping error.
- Visualize the ground truth mask and the predicted mask side by side for visual inspection.
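
The closing and thresholding steps can be sketched in NumPy as follows. This is an illustration under assumptions, not the notebook's code: the helpers are hypothetical, and a 3x3 square structuring element is used here so that the effect is visible, whereas a 1x1 kernel leaves a mask unchanged.

```python
import numpy as np


def _neighbourhoods(mask, pad_value):
    """Stack the 3x3 neighbourhood of every pixel along a new leading axis."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])


def closing_3x3(prob_mask):
    """Grayscale closing (dilation, then erosion) of a probability mask in [0, 1]."""
    dilated = _neighbourhoods(prob_mask, pad_value=0.0).max(axis=0)
    # Pad with 1.0 so the image border is not eroded
    return _neighbourhoods(dilated, pad_value=1.0).min(axis=0)


def postprocess(prob_mask, threshold=0.5):
    """Close small holes, then binarize the mask."""
    return (closing_3x3(prob_mask) > threshold).astype(np.uint8)


prob = np.full((7, 7), 0.9)
prob[3, 3] = 0.1  # a one-pixel hole inside a confident region
cleaned = postprocess(prob)
print(int(cleaned[3, 3]))
```

Closing fills holes smaller than the structuring element, so the isolated low-probability pixel ends up inside the predicted region.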

## Model Save:

After training, the model's weights are saved to a file named "UNet+stn_model_finalcall.h5" for future use, and the architecture is serialized to "model_final.json".

## Conclusion:

The U-Net architecture is a powerful choice for image segmentation tasks such as cell segmentation. This implementation provides a structured approach to loading, preprocessing, training, and evaluating the model. The combination of binary cross-entropy loss, the Adam optimizer, and evaluation metrics like IoU and warping error helps in assessing and improving the model's performance.







In [None]:
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Concatenate, Conv2D, Dense, Dropout, Flatten,
                                     Input, Lambda, MaxPooling2D, Reshape, UpSampling2D)
from tensorflow.keras.metrics import MeanIoU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

In [None]:


# Defining dataset path and other parameters
dataset_path = '/kaggle/input/ds-cell/dataset1980'
image_size = 512
batch_size = 16
epochs = 150


In [None]:
#Model arch-10
# Defining the U-Net model architecture
def down_block(x, filters, kernel_size=(3, 3), padding="same", strides=1,dropout_rate=0.2):
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    p = MaxPooling2D((2, 2), (2, 2))(c)
    p = Dropout(dropout_rate)(p)
    return c, p

def up_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1,dropout_rate=0.2):
    us = UpSampling2D((2, 2))(x)
    concat = Concatenate()([us, skip])
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    c = Dropout(dropout_rate)(c)
    return c

def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1,dropout_rate=0.2):
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    c = Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    c = Dropout(dropout_rate)(c)
    return c


In [None]:

# Spatial transformer network (STN)
def spatial_transformer_network(input_layer):
    # Localization network: predicts the parameters of a projective transform
    loc_net = Conv2D(8, (3, 3), activation='relu')(input_layer)
    loc_net = MaxPooling2D(pool_size=(2, 2))(loc_net)
    loc_net = Conv2D(10, (3, 3), activation='relu')(loc_net)
    loc_net = MaxPooling2D(pool_size=(2, 2))(loc_net)

    loc_net = Flatten()(loc_net)
    loc_net = Dense(50, activation='relu')(loc_net)

    # tfa.image.transform expects 8 parameters [a0, a1, a2, b0, b1, b2, c0, c1] per image.
    # A zero kernel with an identity bias makes the STN start as the identity transform;
    # an all-zero bias would map every output pixel to input pixel (0, 0).
    identity = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=np.float32)
    theta = Dense(8, weights=[np.zeros((50, 8), dtype=np.float32), identity])(loc_net)

    # Warp the input with the predicted transform
    x = Lambda(lambda args: tfa.image.transform(args[0], args[1]))([input_layer, theta])

    return x

# Defining the U-Net model architecture with STN
def UNet2_with_STN():
    f = [64, 128, 256, 512, 1024]
    inputs = Input((image_size, image_size, 1))

    # Apply STN to the input
    stn_output = spatial_transformer_network(inputs)

    # Concatenate STN output with the original input
    inputs_transformed = Concatenate()([inputs, stn_output])

    p0 = inputs_transformed
    c1, p1 = down_block(p0, f[0])
    c2, p2 = down_block(p1, f[1])
    c3, p3 = down_block(p2, f[2])
    c4, p4 = down_block(p3, f[3])

    bn = bottleneck(p4, f[4])

    u1 = up_block(bn, c4, f[3])
    u2 = up_block(u1, c3, f[2])
    u3 = up_block(u2, c2, f[1])
    u4 = up_block(u3, c1, f[0])

    outputs = Conv2D(1, (1, 1), padding="same", activation="sigmoid")(u4)
    model = Model(inputs, outputs)
    return model


In [None]:

class DataGen(keras.utils.Sequence):
    def __init__(self, ids, path, batch_size=16, image_size=512):
        self.ids = ids
        self.path = path
        self.batch_size = batch_size
        self.image_size = image_size
        self.on_epoch_end()

    def __load__(self, id_name):
        # Paths to the image and its mask
        image_path = os.path.join(self.path, 'trainvolume/', id_name)
        mask_name = f'train-labels_{id_name.split("_")[1]}'
        mask_path = os.path.join(self.path, 'trainlabels/', mask_name)

        # Read the image as grayscale and resize it
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        image = cv2.resize(image, (self.image_size, self.image_size))

        # Read the label image as grayscale and resize it
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        mask = cv2.resize(mask, (self.image_size, self.image_size))

        # Histogram equalization to enhance image contrast
        image = cv2.equalizeHist(image)

        # Add a channel axis so both arrays have shape (H, W, 1)
        image = np.expand_dims(image, axis=-1)
        mask = np.expand_dims(mask, axis=-1)

        # Normalize to [0, 1]
        image = image / 255.0
        mask = mask / 255.0

        return image, mask

    def __getitem__(self, index):
        # Slicing truncates the last batch automatically, so self.batch_size
        # is never modified and later batches are indexed correctly.
        start = index * self.batch_size
        files_batch = self.ids[start:start + self.batch_size]

        image = []
        mask = []

        for id_name in files_batch:
            _img, _mask = self.__load__(id_name)
            image.append(_img)
            mask.append(_mask)

        image = np.array(image)
        mask = np.array(mask)

        return image, mask

    def on_epoch_end(self):
        pass

    def __len__(self):
        return int(np.ceil(len(self.ids) / float(self.batch_size)))


In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# Image IDs, shuffled before splitting
train_ids = [f'train-volume_{i}.tif' for i in range(1, 1981)]
np.random.shuffle(train_ids)

# Validation data size
val_data_size = 576
valid_ids = train_ids[:val_data_size]
train_ids = train_ids[val_data_size:]
np.random.shuffle(valid_ids)
np.random.shuffle(train_ids)

gen = DataGen(train_ids, dataset_path, batch_size=batch_size, image_size=image_size)
valid_gen = DataGen(valid_ids, dataset_path, batch_size=batch_size, image_size=image_size)

# Define and compile the U-Net model
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = UNet2_with_STN()

    # Note: MeanIoU compares integer class labels, so with sigmoid probabilities
    # it is only a rough indicator during training.
    mean_iou = MeanIoU(num_classes=2)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc", mean_iou])

train_steps = len(train_ids) // batch_size
valid_steps = len(valid_ids) // batch_size

early_stopping = EarlyStopping(monitor='val_loss',  # Monitor validation loss
                               patience=10,         # Number of epochs with no improvement to wait before stopping
                               restore_best_weights=True)  # Restore model weights to the best observed during training

model.fit(gen, validation_data=valid_gen, steps_per_epoch=train_steps,
          validation_steps=valid_steps, epochs=epochs, callbacks=[early_stopping])
# Saving the trained model weights
model.save_weights("UNet+stn_model_finalcall.h5")

In [None]:
import cv2
import numpy as np

def intersection_over_union(y_true, y_pred):
    # Resize the predicted mask to the shape of the true mask.
    # Nearest-neighbour interpolation keeps the mask binary.
    y_pred_resized = cv2.resize(y_pred, (y_true.shape[1], y_true.shape[0]),
                                interpolation=cv2.INTER_NEAREST)

    # Convert masks to binary images
    y_true_binary = (y_true > 0).astype(np.uint8)
    y_pred_binary = (y_pred_resized > 0).astype(np.uint8)

    # Intersection via element-wise multiplication; union by inclusion-exclusion
    intersection = np.sum(y_true_binary * y_pred_binary)
    union = np.sum(y_true_binary) + np.sum(y_pred_binary) - intersection
    iou = intersection / (union + 1e-8)  # Small epsilon avoids division by zero

    return iou


In [None]:

def calculate_warping_error(true_image, warped_image):
    # Convert to single-channel images if needed
    if len(true_image.shape) == 3:
        true_image = cv2.cvtColor(true_image, cv2.COLOR_BGR2GRAY)
    if len(warped_image.shape) == 3:
        warped_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2GRAY)

    # Bring the prediction to the ground-truth size instead of silently returning 0
    if true_image.shape != warped_image.shape:
        warped_image = cv2.resize(warped_image, (true_image.shape[1], true_image.shape[0]),
                                  interpolation=cv2.INTER_NEAREST)

    # Mean absolute pixel-wise difference
    diff = np.abs(true_image.astype(np.float32) - warped_image.astype(np.float32))
    warping_error = np.mean(diff)

    return warping_error



In [None]:
import matplotlib.pyplot as plt

def plot_masks(true_mask, pred_mask):

    fig, axes = plt.subplots(1, 2, figsize=(10, 5))

    # Plot the ground truth mask on the first subplot
    axes[0].imshow(true_mask, cmap='gray')
    axes[0].set_title('Ground Truth Mask')
    axes[0].axis('off')

    # Plot the predicted mask on the second subplot
    axes[1].imshow(pred_mask, cmap='gray')
    axes[1].set_title('Predicted Mask')
    axes[1].axis('off')

    plt.show()

In [None]:

predicted_masks = model.predict(valid_gen)

# A 1x1 kernel leaves the mask unchanged; use a larger kernel to actually fill holes
kernel = np.ones((1, 1), np.uint8)

processed_masks = []

for i in range(len(predicted_masks)):
    pred_mask = predicted_masks[i].squeeze()

    processed_mask = cv2.morphologyEx(pred_mask, cv2.MORPH_CLOSE, kernel)

    processed_masks.append(processed_mask)

processed_masks = np.array(processed_masks)

iou_scores = []

for i in range(len(valid_ids)):
    # Loading the ground truth mask
    mask_name = f'train-labels_{valid_ids[i].split("_")[1]}'
    mask_path = os.path.join(dataset_path, 'trainlabels/', mask_name)
    true_mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    true_mask = (true_mask > 128).astype(np.uint8)

    pred_mask = processed_masks[i]
    pred_mask = (pred_mask > 0.5).astype(np.uint8)  # Binarizing the predicted mask

    iou = intersection_over_union(true_mask, pred_mask)
    # Compute the warping error before it is printed below
    warping_error = calculate_warping_error(true_mask, pred_mask)
    iou_scores.append(iou)

    # Plot and report every 64th sample
    if i % 64 == 0:
        plot_masks(true_mask, pred_mask)
        print("IoU for one sample:", iou)
        print("Warping Error:", warping_error)

# Calculating the mean IoU across all validation samples
mean_iou = np.mean(iou_scores)
print(f"Mean IoU: {mean_iou}")


In [None]:
# serializing model to JSON
model_json = model.to_json()
with open("model_final.json", "w") as json_file:
    json_file.write(model_json)


In [None]:
# Mean IoU: 0.9651099867947772 - kernel size = (1,1) - threshold value = 0.6
# Mean IoU: 0.97500000000000 - kernel size = (1,1) - threshold value = 0.5
