# Table of Contents

>[Table of Contents](#scrollTo=9f3FiNe19Lrb)

>[Theory](#scrollTo=ooaZBv0173IE)

>[Install Required Packages](#scrollTo=sWTizU2Tz_xG)

>[Mount Google Drive and Set Paths](#scrollTo=6fJyJYrY1Prn)

>[Configuration](#scrollTo=eevcuFP31VyS)

>[Define U-Net Model](#scrollTo=EE0gnzOC1gc4)

>[Load and Preprocess Data](#scrollTo=LrdAEPcd1n3V)

>[Train the Model](#scrollTo=KWiggjdR1xSw)

>[Test and Evaluate the Model](#scrollTo=iNxBRpTHgNAu)

>[Visualization](#scrollTo=lp19_OYYtcWB)

>[Bonus [convert to lite format]](#scrollTo=OX7TrzB9jz-a)



# Theory

Image segmentation is the process of partitioning an image into multiple segments, where each segment represents a meaningful region or object within the image. The goal is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

The task involves separating the regions of the image that contain glasses from the background or other objects.

| Architecture | Architecture Type | Suitability for Semantic Segmentation | Performance | Computational Efficiency |
|--------------|-------------------|-----------------------------------------|-------------|--------------------------|
| U-Net        | Fully Convolutional Network (FCN) | Excellent | High | Moderate |
| DeepLab      | Fully Convolutional Network (FCN) | Excellent | High | Moderate |
| Mask R-CNN   | Combination of object detection and instance segmentation | Good | Very High | Lower |
| FCN          | Fully Convolutional Network (FCN) | Good | Moderate | High |





Based on this comparison, U-Net and DeepLab both offer excellent suitability for semantic segmentation tasks and high performance, making them strong candidates for glasses image segmentation. While Mask R-CNN provides very high performance, it may be overkill for this task, considering its complexity and computational requirements. FCN, while also suitable, may require more tuning to achieve comparable performance to U-Net and DeepLab. Ultimately, the choice between U-Net and DeepLab would depend on specific requirements such as computational resources and the desired balance between accuracy and efficiency.

Conclusion: U-Net emerged as the most suitable choice among the alternatives.








# Install Required Packages

In [None]:
!pip3 install numpy



In [1]:
import os
import numpy as np
import cv2
import time
import matplotlib.pyplot as plt
from tqdm import tqdm
from glob import glob
from sklearn.metrics import jaccard_score, f1_score

In [2]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Activation, BatchNormalization, Concatenate, Conv2DTranspose, Input, MaxPool2D
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

# Mount Google Drive and Set Paths

In [3]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [4]:
drive_path = "/content/drive/MyDrive"

In [5]:
dataset_path = os.path.join(drive_path, "datasets", "eyeglasses_dataset")

files_dir = os.path.join(drive_path, "Colab Notebooks", "files", "eyeglasses")
model_file = os.path.join(files_dir, "unet-eyeglasses.h5")
log_file = os.path.join(files_dir, "log-eyeglasses.csv")

In [None]:
def create_dir(path):
  if not os.path.exists(path):
    os.makedirs(path)

In [None]:
create_dir(files_dir)

# Configuration

In [None]:
os.environ["PYTHONHASHSEED"] = str(42)
np.random.seed(42)
tf.random.set_seed(42)

In [None]:
batch_size = 4
lr = 1e-4 ## 0.0001
epochs = 16 ## 100
height = 512
width = 512

# Define U-Net Model

In [None]:
def conv_block(input, num_filters):
    x = Conv2D(num_filters, 3, padding="same")(input)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(num_filters, 3, padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    return x

In [None]:
def encoder_block(input, num_filters):
    x = conv_block(input, num_filters)
    p = MaxPool2D((2, 2))(x)
    return x, p

In [None]:
def decoder_block(input, skip_features, num_filters):
    x = Conv2DTranspose(num_filters, (2, 2), strides=2, padding="same")(input)
    x = Concatenate()([x, skip_features])
    x = conv_block(x, num_filters)
    return x

In [None]:
def build_unet(input_shape):
    inputs = Input(input_shape)

    s1, p1 = encoder_block(inputs, 64)
    s2, p2 = encoder_block(p1, 128)
    s3, p3 = encoder_block(p2, 256)
    s4, p4 = encoder_block(p3, 512)

    b1 = conv_block(p4, 1024)

    d1 = decoder_block(b1, s4, 512)
    d2 = decoder_block(d1, s3, 256)
    d3 = decoder_block(d2, s2, 128)
    d4 = decoder_block(d3, s1, 64)

    outputs = Conv2D(1, 1, padding="same", activation="sigmoid")(d4)

    model = Model(inputs, outputs, name="U-Net")
    return model

In [None]:
if __name__ == "__main__":
    input_shape = (512, 512, 3)
    model = build_unet(input_shape)
    model.summary()

Model: "U-Net"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, 512, 512, 3)]        0         []                            
                                                                                                  
 conv2d (Conv2D)             (None, 512, 512, 64)         1792      ['input_1[0][0]']             
                                                                                                  
 batch_normalization (Batch  (None, 512, 512, 64)         256       ['conv2d[0][0]']              
 Normalization)                                                                                   
                                                                                                  
 activation (Activation)     (None, 512, 512, 64)         0         ['batch_normalization[0][0

# Load and Preprocess Data

In [None]:
def load_data(path):
  train_x = sorted(glob(os.path.join(path, 'train', 'images', '*')))
  train_y = sorted(glob(os.path.join(path, 'train', 'masks', '*')))

  valid_x = sorted(glob(os.path.join(path, 'val', 'images', '*')))
  valid_y = sorted(glob(os.path.join(path, 'val', 'masks', '*')))

  return (train_x, train_y), (valid_x, valid_y)

In [None]:
def read_image(path):
  path = path.decode()
  x = cv2.imread(path, cv2.IMREAD_COLOR)
  x = x/255.0
  return x

In [None]:
def read_mask(path):
  path = path.decode()
  x = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
  x = x/255.0
  x = np.expand_dims(x, axis = -1)
  return x

In [None]:
def tf_parse(x, y):
  def _parse(x, y):
    x = read_image(x)
    y = read_mask(y)
    return x, y

  x, y = tf.numpy_function(_parse, [x, y], [tf.float64, tf.float64])

  x.set_shape([height, width, 3])
  y.set_shape([height, width, 1])

  return x, y

In [None]:
def tf_dataset(x, y, batch=8):
  dataset = tf.data.Dataset.from_tensor_slices((x, y))
  dataset = dataset.map(tf_parse, num_parallel_calls=tf.data.AUTOTUNE)
  dataset = dataset.batch(batch)
  dataset = dataset.prefetch(tf.data.AUTOTUNE)
  return dataset

In [None]:
(train_x, train_y), (valid_x, valid_y) = load_data(dataset_path)
print(f'Train: {len(train_x)} - {len(train_y)}')
print(f'val: {len(valid_x)} - {len(valid_y)}')

Train: 1992 - 1992
val: 489 - 489


In [None]:
train_dataset = tf_dataset(train_x, train_y, batch=batch_size)
valid_dataset = tf_dataset(valid_x, valid_y, batch=batch_size)

In [None]:
for x, y in valid_dataset:
  print(x.shape, y.shape)

(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 3) (4, 512, 512, 1)
(4, 512, 512, 

In [None]:
input_shape = (height, width, 3)
model = build_unet(input_shape)

In [None]:
model.summary()

Model: "U-Net"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_2 (InputLayer)        [(None, 512, 512, 3)]        0         []                            
                                                                                                  
 conv2d_19 (Conv2D)          (None, 512, 512, 64)         1792      ['input_2[0][0]']             
                                                                                                  
 batch_normalization_18 (Ba  (None, 512, 512, 64)         256       ['conv2d_19[0][0]']           
 tchNormalization)                                                                                
                                                                                                  
 activation_18 (Activation)  (None, 512, 512, 64)         0         ['batch_normalization_18[0

# Train the Model

In [None]:
opt = tf.keras.optimizers.Adam(lr)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['acc'])

In [None]:
callbacks = [
    ModelCheckpoint(model_file, verbose = 1, save_best_only=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=4),
    CSVLogger(log_file),
    EarlyStopping(monitor='val_loss', patience=12, restore_best_weights=False)
]

In [None]:
model.fit(
    train_dataset,
    validation_data=valid_dataset,
    epochs=epochs,
    callbacks=callbacks
)

Epoch 1/16
Epoch 1: val_loss improved from inf to 0.08520, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5


  saving_api.save_model(


Epoch 2/16
Epoch 2: val_loss improved from 0.08520 to 0.03836, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 3/16
Epoch 3: val_loss improved from 0.03836 to 0.02959, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 4/16
Epoch 4: val_loss improved from 0.02959 to 0.01919, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 5/16
Epoch 5: val_loss did not improve from 0.01919
Epoch 6/16
Epoch 6: val_loss improved from 0.01919 to 0.01469, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 7/16
Epoch 7: val_loss improved from 0.01469 to 0.01197, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 8/16
Epoch 8: val_loss improved from 0.01197 to 0.01123, saving model to /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/unet-eyeglasses.h5
Epoch 9/16
Epoch 9

<keras.src.callbacks.History at 0x7fcc5c260ca0>

# Test and Evaluate the Model

In [None]:
prediction_dir = os.path.join(files_dir, "prediction")
create_dir(prediction_dir)

In [None]:
model = tf.keras.models.load_model(model_file)
model.summary()

Model: "U-Net"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_2 (InputLayer)        [(None, 512, 512, 3)]        0         []                            
                                                                                                  
 conv2d_19 (Conv2D)          (None, 512, 512, 64)         1792      ['input_2[0][0]']             
                                                                                                  
 batch_normalization_18 (Ba  (None, 512, 512, 64)         256       ['conv2d_19[0][0]']           
 tchNormalization)                                                                                
                                                                                                  
 activation_18 (Activation)  (None, 512, 512, 64)         0         ['batch_normalization_18[0

In [6]:
test_x = sorted(glob(os.path.join(dataset_path, 'test', 'images', '*')))
test_y = sorted(glob(os.path.join(dataset_path, 'test', 'masks', '*')))
assert len(test_x) == len(test_y), "Mismatch in number of test images and masks."

In [None]:
print(f"Test Images: {len(test_x)}")

Test Images: 10


In [None]:
time_taken = []
dice_scores = []
iou_scores = []

for img_path, mask_path in tqdm(zip(test_x, test_y), total=len(test_x)):
    name = os.path.basename(img_path)

    # Read and preprocess the image
    img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    img = img / 255.0
    img = np.expand_dims(img, axis=0)

    # Read and preprocess the mask
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    mask = mask / 255.0
    mask = np.expand_dims(mask, axis=-1)

    # Predict the mask
    start_time = time.time()
    pred = model.predict(img)[0]
    total_time = time.time() - start_time
    time_taken.append(total_time)

    # Post-process the prediction
    pred = (pred * 255).astype(np.uint8)
    pred = np.squeeze(pred)

    # Save the predicted mask
    save_path = os.path.join(prediction_dir, name)
    cv2.imwrite(save_path, pred)

    # Debugging: Check if the file was saved
    if os.path.exists(save_path):
        print(f"Saved: {save_path}")
    else:
        print(f"Failed to save: {save_path}")

    # Compute Dice coefficient and IoU
    y_true = mask.flatten()
    y_pred = (pred.flatten() > 0.5).astype(np.uint8)
    dice = f1_score(y_true, y_pred)
    iou = jaccard_score(y_true, y_pred)

    dice_scores.append(dice)
    iou_scores.append(iou)

  0%|          | 0/10 [00:00<?, ?it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58080_3_generated_0_00001_.png


 10%|█         | 1/10 [00:00<00:03,  2.50it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58162_2_generated_0_00001_.png


 20%|██        | 2/10 [00:00<00:03,  2.53it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58196_0_generated_0_00001_.png


 30%|███       | 3/10 [00:01<00:02,  2.56it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58208_1_generated_0_00001_.png


 40%|████      | 4/10 [00:01<00:02,  2.58it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58212_0_generated_1_00001_.png


 50%|█████     | 5/10 [00:01<00:01,  2.60it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58219_1_generated_0_00001_.png


 60%|██████    | 6/10 [00:02<00:01,  2.60it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58223_0_generated_0_00001_.png


 70%|███████   | 7/10 [00:02<00:01,  2.61it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58234_0_generated_0_00001_.png


 80%|████████  | 8/10 [00:03<00:00,  2.61it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58258_0_generated_0_00001_.png


 90%|█████████ | 9/10 [00:03<00:00,  2.62it/s]

Saved: /content/drive/MyDrive/Colab Notebooks/files/eyeglasses/prediction/58277_1_generated_0_00001_.png


100%|██████████| 10/10 [00:03<00:00,  2.59it/s]


In [None]:
mean_dice = np.mean(dice_scores)
mean_iou = np.mean(iou_scores)
mean_time = np.mean(time_taken)
mean_fps = 1 / mean_time

print(f"Mean Dice Coefficient: {mean_dice:.4f}")
print(f"Mean IoU: {mean_iou:.4f}")
print(f"Mean Time: {mean_time:.5f} - Mean FPS: {mean_fps:.5f}")

Mean Dice Coefficient: 0.8276
Mean IoU: 0.7075
Mean Time: 0.16637 - Mean FPS: 6.01082


The model's **Mean Dice Coefficient** of 0.8276 shows that it's pretty good at figuring out the right parts of the pictures. It agrees well with the real answers, which is what we want.

The **Mean IoU score**, which is 0.7075, also tells us the model is good at pinpointing the important parts of the pictures. When this score is over 0.5, it means the model's guesses and the real answers overlap a lot, which is great for our task.

The **Mean Time**, which is how long the model takes to look at each picture, is 0.16637 seconds. This is important because we want the model to be fast, especially if we're working with lots of pictures or need to see results quickly.

The **Mean FPS**, which is about 6 frames per second, shows how quickly the model can go through pictures. This speed is okay for many things, but we might want it to be faster for some jobs that need results really quickly.



# Visualization

In [None]:
visual_test_dir = os.path.join(files_dir, "visual_test")
create_dir(visual_test_dir)

In [None]:
for i, img_path in enumerate(test_x[:10]):
    name = os.path.basename(img_path)
    img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    img_norm = img / 255.0
    img_input = np.expand_dims(img_norm, axis=0)

    pred = model.predict(img_input)[0]
    pred = (pred * 255).astype(np.uint8)
    pred = np.squeeze(pred)

    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    ax[0].set_title("Original Image")
    ax[0].axis('off')

    ax[1].imshow(pred, cmap='gray')
    ax[1].set_title("Predicted Mask")
    ax[1].axis('off')

    plt.suptitle(f"Prediction for {name}")
    plt.savefig(os.path.join(visual_test_dir, name))
    plt.show()

Output hidden; open in https://colab.research.google.com to view.

# Bonus [convert to lite format]

In [8]:
# Load the trained model
model = tf.keras.models.load_model(model_file)

In [7]:
# Convert the model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

In [9]:
# Path to Lite model
tflite_model_file = os.path.join(files_dir, "unet-eyeglasses.tflite")

In [8]:
# Save the TensorFlow Lite model
with open(tflite_model_file, 'wb') as f:
    f.write(tflite_model)

Change runtime type to CPU

In [10]:
# Load the TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path=tflite_model_file)
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

In [11]:
# Initialize lists to store inference time
inference_times = []

# Run inference on test images using the CPU
for img_path in tqdm(test_x):
    # Read and preprocess the image
    img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    img = img / 255.0
    img = img.astype(np.float32)
    img = np.expand_dims(img, axis=0)

    # Perform inference
    start_time = time.time()
    interpreter.set_tensor(input_details[0]['index'], img)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    inference_time = time.time() - start_time
    inference_times.append(inference_time)

100%|██████████| 10/10 [02:26<00:00, 14.66s/it]


In [12]:
# Calculate mean inference time
mean_inference_time = np.mean(inference_times)
mean_fps = 1 / mean_inference_time

print(f"Mean Time: {mean_inference_time:.5f} - Mean FPS: {mean_fps:.5f}")

Mean Time: 14.21506 - Mean FPS: 0.07035
