Problem 1 - Learning and Estimation

In [23]:
import os
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

# Function to preprocess images (both images and labels)
def preprocess_images(image_path, target_size=(256, 256)):
    images = []
    for file_name in sorted(os.listdir(image_path)):
        image = load_img(os.path.join(image_path, file_name), target_size=target_size)
        image = img_to_array(image) / 255.0  # Normalize the image
        images.append(image)
    return np.array(images)

# Function to load images and labels
def load_data(images_dir, labels_dir, image_size=(256, 256)):
    # Load the images and labels
    images = preprocess_images(images_dir, image_size)
    labels = preprocess_images(labels_dir, image_size)

    # Check if the arrays are empty
    if images.size == 0:
        raise ValueError("Error: No images found in the directory")
    if labels.size == 0:
        raise ValueError("Error: No labels found in the directory")

    # Check if images and labels have the same number of elements
    if images.shape[0] != labels.shape[0]:
        raise ValueError("Error: Number of images and labels do not match")

    # Expand the dimensions of labels if they are grayscale images (3D array)
    if len(labels.shape) == 3:  # If labels are grayscale (3D array)
        labels = np.expand_dims(labels, axis=-1)  # Add a channel dimension to labels (4D array)

    return images, labels

# Path to dataset
train_images_path = "data/membrane/train/image"
train_labels_path = "data/membrane/train/label"
test_images_path = "data/membrane/test"

# Load data
try:
    train_images, train_labels = load_data(train_images_path, train_labels_path)
    print(f"train_images shape: {train_images.shape}")
    print(f"train_labels shape: {train_labels.shape}")

except ValueError as e:
    print(e)


train_images shape: (30, 256, 256, 3)
train_labels shape: (30, 256, 256, 3)


In [34]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Flatten, Dense, Reshape, Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

# Assuming train_images and val_images are already loaded as numpy arrays and have shape (num_samples, 256, 256, 3)

# Normalize the images to the range [0, 1]
train_images = train_images / 255.0
val_images = val_images / 255.0

# Assuming you have corresponding labels for the images
train_labels = train_labels / 255.0
val_labels = val_labels / 255.0

# Define the model architecture
input_layer = Input(shape=(256, 256, 3))  # Input shape for RGB images
model = Sequential()

# Contracting Path (Encoder)
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Bottom (Bottleneck)
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))

# Expanding Path (Decoder)
model.add(UpSampling2D(size=(2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))

model.add(UpSampling2D(size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))

model.add(UpSampling2D(size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))

model.add(UpSampling2D(size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))

# Output layer
model.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same'))  # Output has 3 channels for RGB

# Compile the model
model.compile(optimizer=Adam(), loss='mean_squared_error', metrics=['accuracy'])

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Fit the model using the augmented training data
model.fit(
    datagen.flow(train_images, train_labels, batch_size=32),
    epochs=50,  # Number of epochs, adjust as needed
    validation_data=(val_images, val_labels)
)

# Save the trained model after training
model.save('unet_model.h5')

# You can later load the model using:
model = tf.keras.models.load_model('unet_model.h5')


  self._warn_if_super_not_called()


Epoch 1/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 42s/step - accuracy: 1.0000 - loss: 0.2500 - val_accuracy: 0.0000e+00 - val_loss: 0.2487
Epoch 2/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 34s/step - accuracy: 0.0000e+00 - loss: 0.2487 - val_accuracy: 0.0000e+00 - val_loss: 0.2442
Epoch 3/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 34s/step - accuracy: 0.0000e+00 - loss: 0.2442 - val_accuracy: 0.9370 - val_loss: 0.2220
Epoch 4/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 29s/step - accuracy: 0.9370 - loss: 0.2220 - val_accuracy: 0.9491 - val_loss: 0.1318
Epoch 5/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 24s/step - accuracy: 0.9491 - loss: 0.1318 - val_accuracy: 0.0327 - val_loss: 0.0110
Epoch 6/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 34s/step - accuracy: 0.0327 - loss: 0.0110 - val_accuracy: 7.6294e-05 - val_loss: 7.5813e-04
Epoch 7/50
[1m1/1[0m



Problem 2 - Code Reading

Summary of "U-Net: Convolutional Networks for Biomedical Image Segmentation"

The U-Net paper introduces a convolutional neural network architecture that is especially designed for biomedical image segmentation. Here's a breakdown of the key elements:

Architecture Overview:

Contracting Path (Encoder): This part of the network uses convolution and max-pooling layers to capture context and reduce the spatial dimensions of the input image. It essentially extracts features from the image.

Bottleneck: This is where the model processes the most abstract features, often with smaller spatial dimensions.

Expansive Path (Decoder): This part of the network uses transposed convolutions (up-sampling) to recover spatial dimensions and refine the output by adding context back into the segmented regions. The expansive path essentially "decodes" the high-level features into pixel-wise segmentation maps.

Skip Connections: These connections directly pass feature maps from the contracting path to the corresponding layers in the expansive path, which helps to recover fine-grained details during decoding. This helps U-Net avoid losing spatial information during down-sampling.

Final Output:

U-Net produces pixel-wise binary classification maps (for binary segmentation tasks), with the final layer typically being a softmax or sigmoid activation function for segmentation tasks.
Loss Function:

The most common loss function for U-Net is binary cross-entropy for binary segmentation or categorical cross-entropy for multi-class segmentation tasks.
Training Strategy:

U-Net is often trained on a relatively small number of images, relying on data augmentation techniques like rotations, flips, and scaling to increase the variety of training data.


Applicability:

While originally designed for biomedical image segmentation, U-Net has been successfully applied in many other domains, such as satellite image segmentation, road detection, and more.

Code Walkthrough Based on U-Net Architecture

1. Model Architecture:
Input Layer: The input to the network is a 2D or 3D image.
Encoder: The encoder typically consists of convolutional layers followed by max-pooling operations to downsample the input.
Bottleneck: After downsampling, the network processes the most abstract features at a bottleneck point, where the spatial dimensions are reduced.
2. 
Decoder: Transposed convolution layers (also called deconvolution layers) are used to upsample the data and recover spatial resolution.
Skip Connections: These connections pass features from the encoder directly to the corresponding decoder layers.