In [None]:
# IMPORTANT: SOME KAGGLE DATA SOURCES ARE PRIVATE
# RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES.
import kagglehub
kagglehub.login()


In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

soumikrakshit_human_segmentation_path = kagglehub.dataset_download('soumikrakshit/human-segmentation')
suhaaskarthikeyan_best_checkpoint_path = kagglehub.dataset_download('suhaaskarthikeyan/best-checkpoint')

print('Data source import complete.')


## Human body segmentation task
In this notebook we build and train a model that predicts masks to segment humans from an image. The training data consists of images and their respective masks, which must be converted to pure black-and-white format (1 and 0). After preprocessing the data and creating the training and testing datasets, we build our U-Net model. U-Net was originally designed for biomedical image segmentation, but it works well for a wide variety of segmentation tasks. We add data augmentation and checkpoint callbacks (learning rate schedulers are a possible extension, but they are not used in this notebook). After training the model we test it on our own custom images.

The dataset consists of approximately 28,000 images, so training on all of it needs a fast GPU with plenty of memory; on smaller hardware, train on a portion of the dataset. I have already trained this model on the full training dataset using the A100 GPU provided by Google Colab Pro, and it reached good results. Please play around with this notebook; suggestions and improvements are appreciated.


We load 5,000 training images. You can change this number (more training images generally give better accuracy), but you must account for RAM and GPU speed. We shuffle the files and split them 90/10 into training and testing sets.

In [None]:
import random
import os

#you can select the number of training samples needed by changing from 5000
files = os.listdir('/kaggle/input/human-segmentation/instance-level_human_parsing/instance-level_human_parsing/Training/Images')[:5000]
files = [f'/kaggle/input/human-segmentation/instance-level_human_parsing/instance-level_human_parsing/Training/Images/{file}' for file in files]
random.seed(123)
random.shuffle(files)
train_sample = int(len(files)*0.9)
training_files = files[:train_sample]
testing_files = files[train_sample:]
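
One caveat with the split above: `os.listdir` returns entries in arbitrary order, so slicing its raw output can select a different subset of files on another machine even with a fixed seed. A minimal sketch of a reproducible variant, assuming a hypothetical helper `split_files` and made-up file names (neither is part of this notebook):

```python
import random

def split_files(names, n_samples, train_fraction=0.9, seed=123):
    """Hypothetical helper: pick n_samples names and split them reproducibly."""
    # Sort first so the chosen subset does not depend on directory order.
    subset = sorted(names)[:n_samples]
    # A private Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(subset)
    cut = int(len(subset) * train_fraction)
    return subset[:cut], subset[cut:]

# Made-up file names standing in for the image directory listing.
names = [f"{i:07d}.jpg" for i in range(100)]
train, test = split_files(names, n_samples=50)
print(len(train), len(test))
```

Sorting costs almost nothing and means the same seed gives the same training and testing files everywhere.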

We do not load the images into memory up front. Instead, tf.data builds a lazy input pipeline, so reading and preprocessing each file happens while the dataset is iterated, that is, during model fitting. The preprocessing function calls .numpy() and uses Python string operations, which only work in eager mode, so we wrap it in a py_function to make it usable inside the pipeline.

In [None]:
import tensorflow as tf
def preprocess(file):
    # Decode file path
    path = file.numpy().decode("utf-8")

    # Load and process the real image
    real_img = tf.io.read_file(path)
    real_img = tf.image.decode_jpeg(real_img, channels=3)
    real_img = tf.image.resize(real_img, (256, 256))
    real_img = tf.cast(real_img, tf.float32) / 255.0  # Normalize to [0,1]

    # Load and process the mask image
    fileno = (path.split('/')[-1]).split('.')[0]
    img_path = f'/kaggle/input/human-segmentation/instance-level_human_parsing/instance-level_human_parsing/Training/Human/{fileno}.png'

    mask_img = tf.io.read_file(img_path)
    mask_img = tf.image.decode_png(mask_img, channels=1)  # Read as grayscale
    mask_img = tf.image.resize(mask_img, (256, 256))

    # Normalize mask image and apply threshold
    mask_img = tf.cast(mask_img, tf.float32) / 255.0  # Normalize to [0,1]
    mask_img = tf.where(mask_img >0.0, 1.0, 0.0)  # Thresholding-  to convert grayscale to pure black and white

    return real_img, mask_img

def load_data(file_path):
    img, mask = tf.py_function(
        func=preprocess,
        inp=[file_path],
        Tout=[tf.float32, tf.float32]
    )

    # Explicitly set tensor shapes
    img.set_shape((256, 256, 3))
    mask.set_shape((256, 256, 1))

    return img, mask
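
The thresholding step in `preprocess` maps every non-zero mask pixel to 1, so any labelled region counts as foreground. A small NumPy sketch of the same rule, using a made-up 2x3 mask (the pixel values are illustrative and not taken from the dataset):

```python
import numpy as np

# Made-up mask: 0 is background, other values stand for labelled regions.
raw_mask = np.array([[0, 1, 2],
                     [0, 0, 17]], dtype=np.uint8)

# Same rule as the notebook: scale to [0, 1], then mark every non-zero pixel as 1.
scaled = raw_mask.astype(np.float32) / 255.0
binary_mask = np.where(scaled > 0.0, 1.0, 0.0).astype(np.float32)

print(binary_mask)
```

Because the threshold is applied after resizing, pixels that bilinear interpolation blends along the edge of a region also end up as 1.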

Here we build the training and testing datasets: we convert each file list into a TensorFlow dataset, map it through the preprocessing function, and batch it into groups of 64. Prefetching lets the next batch load while the current one is being processed.

In [None]:
# Load and prepare training dataset
training_dataset = tf.data.Dataset.from_tensor_slices(training_files)
training_dataset = training_dataset.map(load_data, num_parallel_calls=tf.data.AUTOTUNE)
training_dataset = training_dataset.batch(64).prefetch(tf.data.AUTOTUNE)

# Load and prepare testing dataset
testing_dataset = tf.data.Dataset.from_tensor_slices(testing_files)
testing_dataset = testing_dataset.map(load_data, num_parallel_calls=tf.data.AUTOTUNE)
testing_dataset = testing_dataset.batch(64).prefetch(tf.data.AUTOTUNE)
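
To get a feel for what these settings produce, here is the batch arithmetic as a quick sketch, assuming the image directory holds at least 5000 files so the slice is full. `batch(64)` keeps the final partial batch by default, hence the ceiling division:

```python
import math

n_files = 5000                      # number of files selected earlier
n_train = int(n_files * 0.9)        # 4500 training files
n_test = n_files - n_train          # 500 testing files
batch_size = 64

# batch() keeps the last, smaller batch unless drop_remainder=True is passed.
train_batches = math.ceil(n_train / batch_size)   # 71 steps per epoch
test_batches = math.ceil(n_test / batch_size)     # 8 validation steps
last_train_batch = n_train - (train_batches - 1) * batch_size  # 20 images

print(train_batches, test_batches, last_train_batch)
```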

We can take a sample from our training dataset and visualise the images along with their masks

In [None]:
import matplotlib.pyplot as plt
# Show every image in the first training batch next to its binary mask
for images, masks in training_dataset.take(1):
  for image, mask in zip(images, masks):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[0].imshow(tf.squeeze(mask), cmap="gray")
    ax[0].set_title("Mask (thresholded)")
    ax[0].axis("off")
    ax[1].imshow(image)  # already float32 in [0, 1]
    ax[1].set_title("Image")
    ax[1].axis("off")
    plt.show()

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.losses import BinaryCrossentropy
import tensorflow.keras.backend as K

Data augmentation steps. These layers run inside the model, so they only ever see the input image and never the mask. That makes pixel-value augmentations (contrast, brightness) safe, while geometric ones would move the image without moving its mask, so they are left disabled here.

In [None]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomContrast(0.2),                            # Random contrast adjustment
    tf.keras.layers.RandomBrightness(0.2, value_range=(0.0, 1.0)),  # Random brightness shift; value_range matches images scaled to [0, 1]
    # Disabled: geometric layers applied here would transform the image but not its mask.
    # tf.keras.layers.RandomFlip("horizontal"),   # Random horizontal flip
    # tf.keras.layers.RandomFlip("vertical"),     # Random vertical flip
    # tf.keras.layers.RandomRotation(0.2),        # Random rotation (factor is a fraction of a full turn: +/- 72 degrees)
    # tf.keras.layers.RandomZoom(0.2),            # Random zoom (up to 20% in or out)
    # tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2),  # Random translation
    # Disabled: these also change the tensor size, which breaks the skip-connection concatenations.
    # tf.keras.layers.RandomHeight(0.2),          # Random height change
    # tf.keras.layers.RandomWidth(0.2),           # Random width change
])

Here we create our U-Net model, which consists of an encoder, a bottleneck, and a decoder. The encoder downsamples with convolution and max-pooling layers, whereas the decoder upsamples with UpSampling2D layers followed by convolutions. The bottleneck sits between the two at the lowest resolution. Finally, we add concatenation layers (skip connections) that pass encoder feature maps directly to the decoder, so fine spatial detail is not lost. We use a loss function called dice loss, which measures the overlap between the predicted and actual masks. The model is also compiled here.

In [None]:
# Dice Loss Function
def dice_coefficient(y_true, y_pred, smooth=1e-6):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    return 1 - dice_coefficient(y_true, y_pred)

# U-Net Model
def unet(input_size=(256, 256, 3)):
    inputs = Input(input_size)
    # Keep the Input tensor in `inputs`; Model(inputs, outputs) below must start from it
    x = data_augmentation(inputs)
    # Encoder (Contracting Path)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
    p1 = MaxPooling2D(pool_size=(2, 2))(c1)

    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(p1)
    c2 = Conv2D(128, (3, 3), activation='relu', padding='same')(c2)
    p2 = MaxPooling2D(pool_size=(2, 2))(c2)

    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(p2)
    c3 = Conv2D(256, (3, 3), activation='relu', padding='same')(c3)
    p3 = MaxPooling2D(pool_size=(2, 2))(c3)

    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(p3)
    c4 = Conv2D(512, (3, 3), activation='relu', padding='same')(c4)
    p4 = MaxPooling2D(pool_size=(2, 2))(c4)

    # Bottleneck
    c5 = Conv2D(1024, (3, 3), activation='relu', padding='same')(p4)
    c5 = Conv2D(1024, (3, 3), activation='relu', padding='same')(c5)

    # Decoder (Expanding Path)
    u6 = UpSampling2D(size=(2, 2))(c5)
    u6 = concatenate([u6, c4])  # Skip connection
    c6 = Conv2D(512, (3, 3), activation='relu', padding='same')(u6)
    c6 = Conv2D(512, (3, 3), activation='relu', padding='same')(c6)

    u7 = UpSampling2D(size=(2, 2))(c6)
    u7 = concatenate([u7, c3])
    c7 = Conv2D(256, (3, 3), activation='relu', padding='same')(u7)
    c7 = Conv2D(256, (3, 3), activation='relu', padding='same')(c7)

    u8 = UpSampling2D(size=(2, 2))(c7)
    u8 = concatenate([u8, c2])
    c8 = Conv2D(128, (3, 3), activation='relu', padding='same')(u8)
    c8 = Conv2D(128, (3, 3), activation='relu', padding='same')(c8)

    u9 = UpSampling2D(size=(2, 2))(c8)
    u9 = concatenate([u9, c1])
    c9 = Conv2D(64, (3, 3), activation='relu', padding='same')(u9)
    c9 = Conv2D(64, (3, 3), activation='relu', padding='same')(c9)

    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)  # Binary Segmentation Output

    model = Model(inputs, outputs)
    return model

# Compile Model
model = unet()
# A single-output model takes a single loss; a list is read as one loss per output
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=dice_loss,
              metrics=[dice_coefficient])

model.summary()
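
The skip connections in the U-Net above only line up if each upsampled feature map has the same height and width as the encoder map it is concatenated with. A pure-Python sketch of the size bookkeeping, using hypothetical helpers `unet_spatial_sizes` and `skips_match` (not part of this notebook) and assuming four pooling stages as in the model:

```python
def unet_spatial_sizes(size, depth=4):
    """Hypothetical helper: trace the side length through encoder and decoder."""
    encoder = [size]
    for _ in range(depth):
        # MaxPooling2D with pool_size 2 and default 'valid' padding floors odd sizes.
        encoder.append(encoder[-1] // 2)
    decoder = [encoder[-1]]
    for _ in range(depth):
        # UpSampling2D(size=2) doubles the side length.
        decoder.append(decoder[-1] * 2)
    return encoder, decoder

def skips_match(size, depth=4):
    """True when every upsampled map matches the size of its skip connection."""
    encoder, decoder = unet_spatial_sizes(size, depth)
    # decoder[k] is concatenated with encoder[depth - k] for k = 1..depth.
    return all(decoder[k] == encoder[depth - k] for k in range(1, depth + 1))

print(unet_spatial_sizes(256))  # ([256, 128, 64, 32, 16], [16, 32, 64, 128, 256])
print(skips_match(256))         # True
print(skips_match(250))         # False: 250 is not divisible by 16
```

This is why the images are resized to 256x256: with four pooling stages, the input side length needs to be a multiple of 16.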

This block trains the model and adds a checkpoint callback that saves the model after every epoch. I trained this model in Google Colab on the full dataset for 15 epochs with an A100 GPU; training took almost an hour and reached a dice loss of around 0.11, which is a pretty good score.

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint

# Define the checkpoint callback
checkpoint_callback = ModelCheckpoint(
    filepath="model_checkpoint_epoch_{epoch:02d}.keras",  # Save model after each epoch
    save_weights_only=False,  # Set to True if you only want to save weights
    save_best_only=False,  # Set to True to save only the best model based on validation loss
    verbose=1
)
# Uncomment to train. batch_size is omitted because the datasets are already batched.
#model.fit(training_dataset, validation_data=testing_dataset, epochs=15, verbose=1, callbacks=[checkpoint_callback])
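
The `{epoch:02d}` placeholder in the checkpoint path is an ordinary Python format field. To my knowledge Keras fills it with the 1-based epoch number, so the first epoch is saved as `..._01.keras`. A quick sketch of the resulting file names:

```python
# The same template string that the notebook passes to ModelCheckpoint.
template = "model_checkpoint_epoch_{epoch:02d}.keras"

# Fill the placeholder for a few epoch numbers; 02d pads to two digits.
names = [template.format(epoch=e) for e in (1, 9, 13)]
print(names)
```

The last name matches the checkpoint file that is loaded later in this notebook.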

After training is over we can test with custom images, or just use some images from the testing dataset. We load the model from a saved checkpoint (epoch 13 here). Because the model was compiled with the custom dice loss and dice coefficient functions, we must define them again and pass them to load_model through custom_objects, like so.

In [None]:
from tensorflow.keras.models import load_model
def dice_coefficient(y_true, y_pred, smooth=1e-6):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    return 1 - dice_coefficient(y_true, y_pred)
model = load_model('/kaggle/input/best-checkpoint/model_checkpoint_epoch_13.keras',custom_objects={'dice_loss': dice_loss, 'dice_coefficient':dice_coefficient})
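
To make the dice coefficient defined above concrete, here is a NumPy counterpart evaluated on tiny made-up masks. The function name `dice_coefficient_np` and the example arrays are illustrative only; the formula is the same as in the notebook.

```python
import numpy as np

def dice_coefficient_np(y_true, y_pred, smooth=1e-6):
    """NumPy counterpart of the notebook's dice_coefficient (illustrative only)."""
    y_true_f = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred_f = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (y_true_f.sum() + y_pred_f.sum() + smooth)

# Made-up 2x2 masks: two true foreground pixels, one of them predicted.
y_true = np.array([[1, 1], [0, 0]])
y_pred = np.array([[1, 0], [0, 0]])

dice = dice_coefficient_np(y_true, y_pred)   # roughly 2*1 / (2 + 1) = 0.667
loss = 1.0 - dice                            # roughly 0.333

# The smooth term avoids 0/0 when both masks are empty and yields a score of 1.
empty = np.zeros((2, 2))
dice_empty = dice_coefficient_np(empty, empty)
print(dice, loss, dice_empty)
```

A dice loss of about 0.11 therefore corresponds to a dice coefficient of about 0.89.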

As we can see, the output produced by the model is a bit messy here and there, but overall the model performs well. It captures faces and larger body parts accurately, but it misses some of the finer details.

In [None]:
import matplotlib.pyplot as plt

# Named differently from the training-time preprocess() so that function is not overwritten
def preprocess_image(path):
    real_img = tf.io.read_file(path)
    real_img = tf.image.decode_jpeg(real_img, channels=3)
    real_img = tf.image.resize(real_img, (256, 256))
    real_img = tf.cast(real_img, tf.float32) / 255.0  # Normalize to [0,1]
    return real_img

test_dir = '/kaggle/input/human-segmentation/instance-level_human_parsing/instance-level_human_parsing/Testing/Images'
# Go through the first 10 images of the testing dataset
for test_img in os.listdir(test_dir)[:10]:
    res = preprocess_image(f'{test_dir}/{test_img}')
    plt.imshow(res)
    plt.show()
    mask = model.predict(tf.expand_dims(res, axis=0))
    # Multiply the mask with the actual image to segment the human beings present in the photo
    plt.imshow(tf.squeeze(mask * res))
    plt.show()
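
The multiplication above relies on broadcasting: the predicted mask has shape (1, 256, 256, 1) and the image (256, 256, 3), so the single mask channel is applied to all three colour channels. A NumPy sketch with small random stand-ins for the image and the prediction; the 0.5 cut-off for a hard mask is my own assumption, not something the notebook uses:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3)).astype(np.float32)    # stand-in for a resized RGB image
pred = rng.random((1, 4, 4, 1)).astype(np.float32)  # stand-in for model.predict output

# Soft masking, as in the notebook: (1, 4, 4, 1) * (4, 4, 3) broadcasts to (1, 4, 4, 3).
soft_cutout = np.squeeze(pred * image)

# Optional hard mask (assumed 0.5 threshold): pixels are either kept as-is or set to 0.
hard_mask = (pred > 0.5).astype(np.float32)
hard_cutout = np.squeeze(hard_mask * image, axis=0)

print(soft_cutout.shape, hard_cutout.shape)
```

Soft masking dims uncertain pixels instead of removing them, which is one reason the cutouts can look messy around the edges; a hard threshold gives cleaner borders at the cost of some detail.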