<h1 style = "text-align: center;">Endoscope Semantic Segmentation</h1>


  <h2>Project Scope and Overview</h2>
  <p>This project focuses on advancing semantic segmentation in medical imaging, particularly for computer-assisted surgery. The main objective is to develop neural network models that can accurately segment surgical images into distinct classes, such as various tissues, surgical instruments, blood vessels, and other critical anatomical structures. By improving segmentation accuracy, the project aims to enhance real-time surgical navigation and safety, providing essential support for clinical decision-making during operations.</p>
  


  <h2>Dataset Overview</h2>
<p>
  The CholecSeg8K dataset is organized into a clear hierarchical structure, making it easy to locate and use the data. Below is a breakdown of its organization:
</p>
<ul>
  <li>
    <strong>Top-Level Directories:</strong>
    <ul>
      <li>Folders are labeled as <em>video01</em>, <em>video02</em>, etc., where each folder represents a complete surgical video clip.</li>
    </ul>
  </li>
  <li>
    <strong>Segment Directories:</strong>
    <ul>
      <li>Within each video folder, the video is divided into several segments.</li>
      <li>Each segment directory is named with the video ID and the starting frame number (for example, <em>video01_00080</em> indicates that the segment starts at frame 80).</li>
    </ul>
  </li>
  <li>
    <strong>Frame and Image Files:</strong>
    <ul>
      <li>Each segment directory contains <strong>80 consecutive frames</strong> extracted from the video.</li>
      <li>For every frame, there are <strong>4 image files</strong>:
        <ul>
          <li>The raw image frame</li>
          <li>The annotation tool mask (the original hand-drawn annotation)</li>
          <li>The color mask (used for visualization, where classes are painted in distinct colors)</li>
          <li>The watershed mask (used for processing, where each pixel value corresponds to a class ID)</li>
        </ul>
      </li>
      <li>This results in <strong>80 frames × 4 images per frame = 320 images</strong> in each segment directory.</li>
    </ul>
  </li>
  <li>
    <strong>Annotations:</strong>
    <ul>
      <li>Each frame is annotated at the pixel level for 13 distinct classes (e.g., tissue, instruments, and blood vessels).</li>
      <li>The annotations are presented in both the color and watershed masks, ensuring clear class identification for both visualization and automated processing.</li>
    </ul>
  </li>
</ul>
<p>
  This structured, high-quality organization facilitates the development and training of advanced neural networks for precise semantic segmentation in surgical environments.
</p>
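<p>
  The naming convention described above can be checked with a few lines of Python. This is a minimal sketch: the helper <code>parse_segment_name</code> and the two constants are illustrative names, not part of the dataset or of this notebook.
</p>

```python
import re

FRAMES_PER_SEGMENT = 80   # frames per segment directory, as stated above
FILES_PER_FRAME = 4       # raw frame + three masks

def parse_segment_name(name):
    """Split a segment directory name such as 'video01_00080'
    into (video_id, start_frame)."""
    match = re.fullmatch(r"video(\d+)_(\d+)", name)
    if match is None:
        raise ValueError(f"Unexpected segment directory name: {name!r}")
    return int(match.group(1)), int(match.group(2))

video_id, start_frame = parse_segment_name("video01_00080")
print(video_id, start_frame)                  # 1 80
print(FRAMES_PER_SEGMENT * FILES_PER_FRAME)   # 320 files per segment directory
```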


<body>
  <div class="gallery">
    <img src="./Images/Fig1.png" alt="Figure 1">
    <img src="./Images/Fig2.png" alt="Figure 2">
    <img src="./Images/Fig3.png" alt="Figure 3">
  </div>
</body>


  <h2>Class Information Table</h2>
  <p>Table I lists the class name for each class number shown in Figures 1, 2, and 3, together with the RGB hex code used for that class in the watershed masks:</p>
  <table>
    <thead>
      <tr>
        <th>Class Number</th>
        <th>Class Name</th>
        <th>RGB Hexcode</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Class 0</td>
        <td>Black Background</td>
        <td>#505050</td>
      </tr>
      <tr>
        <td>Class 1</td>
        <td>Abdominal Wall</td>
        <td>#111111</td>
      </tr>
      <tr>
        <td>Class 2</td>
        <td>Liver</td>
        <td>#212121</td>
      </tr>
      <tr>
        <td>Class 3</td>
        <td>Gastrointestinal Tract</td>
        <td>#131313</td>
      </tr>
      <tr>
        <td>Class 4</td>
        <td>Fat</td>
        <td>#121212</td>
      </tr>
      <tr>
        <td>Class 5</td>
        <td>Grasper</td>
        <td>#313131</td>
      </tr>
      <tr>
        <td>Class 6</td>
        <td>Connective Tissue</td>
        <td>#232323</td>
      </tr>
      <tr>
        <td>Class 7</td>
        <td>Blood</td>
        <td>#242424</td>
      </tr>
      <tr>
        <td>Class 8</td>
        <td>Cystic Duct</td>
        <td>#252525</td>
      </tr>
      <tr>
        <td>Class 9</td>
        <td>L-hook Electrocautery</td>
        <td>#323232</td>
      </tr>
      <tr>
        <td>Class 10</td>
        <td>Gallbladder</td>
        <td>#222222</td>
      </tr>
      <tr>
        <td>Class 11</td>
        <td>Hepatic Vein</td>
        <td>#333333</td>
      </tr>
      <tr>
        <td>Class 12</td>
        <td>Liver Ligament</td>
        <td>#050505</td>
      </tr>
    </tbody>
  </table>
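<p>
  Each hex code in the table repeats one byte three times, so it corresponds to a single 8-bit gray value. The sketch below converts a code to that value. It assumes the hex codes are the literal pixel values stored in the mask files, which is worth confirming against the files themselves; <code>hex_to_gray</code> is a hypothetical helper name.
</p>

```python
def hex_to_gray(hex_code):
    """Convert a code like '#505050' to its 8-bit channel value,
    checking that all three channels are equal."""
    digits = hex_code.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    if not (r == g == b):
        raise ValueError(f"{hex_code} is not a uniform gray")
    return r

print(hex_to_gray("#505050"))  # 80  (class 0)
print(hex_to_gray("#212121"))  # 33  (class 2)
print(hex_to_gray("#050505"))  # 5   (class 12)
```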


<h2>Mask Overview</h2>
<p>
  The table below summarizes the three types of masks that accompany each image frame, along with their descriptions and corresponding images.
</p>
<table border="1" cellspacing="0" cellpadding="10">
  <thead>
    <tr>
      <th>Mask Name</th>
      <th>Description</th>
      <th>Image</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Original Image Frame</td>
      <td>This is the raw endoscopic image captured during the surgery.</td>
      <td><img src="./Images/frame_100_endo.png" alt="Original Endoscopic Image" width="200"></td>
    </tr>
    <tr>
      <td>1. Annotation Tool Mask</td>
      <td>
        <ul>
          <li>This is the original hand-drawn mask created during the annotation process.</li>
          <li>It contains detailed pixel-level annotations drawn by experts.</li>
          <li>It serves as the basis for generating the other two masks.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_mask.png" alt="Annotation Tool Mask" width="200"></td>
    </tr>
    <tr>
      <td>2. Color Mask</td>
      <td>
        <ul>
          <li>Derived from the annotation tool mask.</li>
          <li>It assigns a unique color to each class (e.g., tissue, instrument, blood vessel) based on predefined IDs.</li>
          <li>This facilitates visual inspection and interpretation of the segmentation results.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_color_mask.png" alt="Color Mask" width="200"></td>
    </tr>
    <tr>
      <td>3. Watershed Mask</td>
      <td>
        <ul>
          <li>Also generated from the annotation tool mask.</li>
          <li>It assigns a uniform pixel value (the same across all three RGB channels) to each class.</li>
          <li>These numerical values represent the class IDs, making it ideal for automated processing and further analysis.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_watershed_mask.png" alt="Watershed Mask" width="200"></td>
    </tr>
  </tbody>
</table>


In [4]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [7]:
import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

<h2>I. Data Engineering</h2>

<h3>a. Dataset Loading and Preprocessing</h3>

```
Dataset/
│
├── video01/
│   ├── video01_00000/
│   │   ├── frame00000_endo.png                  ← Raw image (input)
│   │   ├── frame00000_endo_annotation.png       ← Annotation tool mask
│   │   ├── frame00000_endo_color_mask.png       ← Color mask (visualization)
│   │   ├── frame00000_endo_watershed_mask.png   ← Watershed mask (labels)
│   │   ├── frame00001_endo.png
│   │   ├── ...
│   │   └── frame00079_endo_watershed_mask.png
│   ├── video01_00080/
│   │   ├── frame00080_endo.png
│   │   ├── ...
│
├── video02/
│   ├── video02_00000/
│   │   ├── frame00000_endo.png
│   │   └── ...
│
├── ...
│
└── video17/
    └── ...
```


In [None]:
# Automatically detect the Dataset directory relative to the notebook's location
dataset_dir = os.path.join(os.getcwd(), "Dataset")

# Check if the folder exists
if not os.path.isdir(dataset_dir):
    raise FileNotFoundError(f"Dataset folder not found at: {dataset_dir}\n"
                            f"Please ensure the 'Dataset' folder is placed in the same directory as this notebook.")

Our data loading pipeline specifically targets two types of files:

- **Original frames**: filenames ending in `_endo.png` (with no further suffix such as `_mask` or `_annotation`).  
  These are the raw RGB endoscopic images used as **inputs** to the model.

- **Watershed masks**: filenames ending in `_endo_watershed_mask.png`.  
  These masks encode pixel-wise class IDs and are used as **ground truth labels** during training.

We will gather all such image–mask pairs, ensuring that **each original image has a corresponding watershed mask** before including it in the dataset.

In [None]:
# Collect file paths for all images and their corresponding watershed masks
image_paths = []
mask_paths = []

for root, dirs, files in os.walk(dataset_dir):
    for filename in files:
        
        # Identify original image frames (filenames end with "_endo.png" and are not masks)
        if filename.endswith("_endo.png") and "mask" not in filename:
            
            img_path = os.path.join(root, filename)
            
            # Construct the corresponding watershed mask filename
            mask_filename = filename.replace("_endo.png", "_endo_watershed_mask.png")
            mask_path = os.path.join(root, mask_filename)
            
            if os.path.exists(mask_path):
                image_paths.append(img_path)
                mask_paths.append(mask_path)

In [None]:
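# Sort the pairs together so each image stays aligned with its mask
pairs = sorted(zip(image_paths, mask_paths))
image_paths = [img for img, _ in pairs]
mask_paths = [mask for _, mask in pairs]
print(f"Found {len(image_paths)} image-mask pairs.")  # Expected: 8080 pairs for full dataset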

In [None]:
# Prepare arrays for images and masks
# Note: for the full dataset (8080 frames), X alone takes about 6.4 GB as float32
num_samples = len(image_paths)
img_height, img_width = 256, 256

X = np.zeros((num_samples, img_height, img_width, 3), dtype=np.float32)
y = np.zeros((num_samples, img_height, img_width), dtype=np.uint8)

- Each pixel in a watershed mask encodes a **semantic class** using a unique **grayscale intensity**.
- The intensity is uniform across all three channels (R = G = B), making it easy to identify programmatically.
- These grayscale values are **mapped to integer class IDs** ranging from 0 to 12.
- This mapping is essential for training the model using pixel-wise classification.
- In total, there are **13 distinct classes**, including the background and various anatomical structures relevant to laparoscopic surgery.

In [None]:
# Define mapping from grayscale values to class IDs (13 classes including background)
value_to_class = {
    80: 0,   # background (#505050)
    17: 1,   # abdominal wall (#111111)
    33: 2,   # liver (#212121)
    19: 3,   # gastrointestinal tract (#131313)
    18: 4,   # fat (#121212)
    49: 5,   # grasper (instrument) (#313131)
    35: 6,   # connective tissue (#232323)
    36: 7,   # blood (#242424)
    37: 8,   # cystic duct (#252525)
    50: 9,   # L-hook electrocautery (instrument) (#323232)
    34: 10,  # gallbladder (#222222)
    51: 11,  # hepatic vein (#333333)
    5: 12    # liver ligament (#050505)
}

<p>
Each image in the dataset is originally <strong>854×480 pixels</strong>. For training purposes, both the images and their corresponding masks are resized to <strong>256×256 pixels</strong>. Resizing is common practice to reduce memory consumption and computational overhead while retaining enough detail for semantic segmentation. Because 854×480 is not square, the resize also changes the aspect ratio; the same distortion is applied to each image and its mask, so the two stay aligned.
</p>

<p>
Image pixel values are <strong>normalized to the [0, 1] range</strong> by dividing by 255. This standardization helps the neural network train faster and more reliably by keeping inputs on a consistent scale. 
</p>

<p>
Watershed masks are treated differently. Each pixel in the mask uses a uniform grayscale value across the R, G, and B channels (e.g., <code>[80, 80, 80]</code>). These grayscale values correspond to specific class labels. We convert each pixel to an integer class ID from <strong>0 to 12</strong> based on a predefined mapping. 
</p>

<p>
For example: <code>[80, 80, 80]</code> (hex <code>#505050</code>) represents class 0 (background), while <code>[33, 33, 33]</code> (hex <code>#212121</code>) represents class 2 (liver). This mapping is derived from the dataset documentation and ensures accurate labeling for training.
</p>
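<p>
  The value-to-class conversion can also be written as a single lookup-table indexing step. The following is a minimal sketch, not the notebook's own implementation: <code>build_lookup_table</code> and the sentinel <code>UNMAPPED</code> are assumptions introduced here so that gray values without a class stand out instead of silently becoming background.
</p>

```python
import numpy as np

UNMAPPED = 255  # sentinel for gray values with no class assigned (assumption)

def build_lookup_table(value_to_class):
    """Return a 256-entry array so that lut[gray_value] == class_id."""
    lut = np.full(256, UNMAPPED, dtype=np.uint8)
    for gray_value, class_id in value_to_class.items():
        lut[gray_value] = class_id
    return lut

# Two entries taken from the examples above: background and liver
lut = build_lookup_table({80: 0, 33: 2})

# Toy mask; the value 7 is deliberately not in the mapping
toy_mask = np.array([[80, 33], [33, 7]], dtype=np.uint8)
class_ids = lut[toy_mask]  # one indexing operation maps every pixel
print(class_ids)           # the unmapped pixel shows up as 255
```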

In [None]:
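# Load and preprocess each image and mask
unmapped_values = set()  # raw mask values not covered by value_to_class
for i, (img_path, mask_path) in enumerate(zip(image_paths, mask_paths)):
    
    # Load and resize the image
    img = Image.open(img_path).convert("RGB")
    img = img.resize((img_width, img_height), Image.Resampling.BILINEAR)
    img_array = np.array(img, dtype=np.float32) / 255.0  # normalize
    X[i] = img_array

    # Load and resize the watershed mask
    mask = Image.open(mask_path).convert("L")
    mask = mask.resize((img_width, img_height), Image.Resampling.NEAREST)
    mask_array = np.array(mask, dtype=np.uint8)
    
    # Map grayscale pixel values to class IDs
    mask_mapped = np.zeros_like(mask_array, dtype=np.uint8)
    for val, cls in value_to_class.items():
        mask_mapped[mask_array == val] = cls
    y[i] = mask_mapped

    # Pixels whose value is missing from the mapping stay 0 (background); record them
    unmapped_values.update(np.setdiff1d(mask_array, list(value_to_class)).tolist())

# Report silently defaulted values so a wrong mapping does not go unnoticed
if unmapped_values:
    print("Warning: mask values without a class mapping:", sorted(unmapped_values))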


During preprocessing, each image–mask pair is resized to a fixed dimension of **256×256 pixels** using different interpolation strategies for the input image and its corresponding label mask:

- **Image resizing:**
  - Each RGB image is loaded and converted to a 3-channel format (`RGB`).
  - It is then resized using **bilinear interpolation** (`Image.Resampling.BILINEAR`), which computes each output pixel by linear interpolation over the input pixels that contribute to it.
  - This method preserves smooth gradients and is suitable for natural images, making it ideal for input data to convolutional networks.
  - After resizing, the image is normalized to the **[0, 1]** range by dividing by 255, ensuring consistent input scale for the neural network.

- **Mask resizing:**
  - Each watershed mask is loaded in **grayscale** mode (`"L"`), resulting in a single-channel image.
  - It is resized using **nearest-neighbor interpolation** (`Image.Resampling.NEAREST`) to preserve **discrete class boundaries**. This avoids introducing interpolated pixel values that could distort class labels.
  - The resized mask is then converted from grayscale pixel values to **class IDs** (0 to 12) using a predefined mapping, ensuring the model receives accurate training labels.
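
The reason for treating masks differently can be seen on a toy example. The sketch below uses plain averaging of neighbouring pixels as a simplified stand-in for an interpolating filter (it is not Pillow's exact algorithm) and compares it with picking an existing pixel, which is what nearest-neighbor resizing does.

```python
import numpy as np

# Toy 1x4 mask row: one background pixel (80) followed by liver pixels (33)
mask = np.array([[80, 33, 33, 33]], dtype=np.float32)

# Halve the width by averaging each pair of neighbouring pixels
averaged = mask.reshape(1, 2, 2).mean(axis=2)

# Halve the width by keeping every second pixel (nearest-neighbor style)
nearest = mask[:, ::2]

print(averaged)  # [[56.5 33. ]]  56.5 is not a valid class value
print(nearest)   # [[80. 33.]]    only original class values survive
```

Averaging across a class boundary invents the value 56.5, which belongs to no class, while the nearest-neighbor result contains only values that were already in the mask.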


In [2]:
# Verification
print("Data loaded successfully!")
print("Shape of X (images):", X.shape)
print("Shape of y (masks):", y.shape)
print("Image pixel range [min, max]:", X.min(), X.max())
print("Unique mask labels:", np.unique(y))

Found 8080 image-mask pairs.
Data loaded successfully!
Shape of X (images): (8080, 256, 256, 3)
Shape of y (masks): (8080, 256, 256)
Image pixel range [min, max]: 0.0 1.0
Unique mask labels: [ 0  2  9 12]
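
The output above lists only four of the 13 class IDs (0, 2, 9, and 12). Since the mapping loop leaves any unrecognized gray value at 0, one possible cause is that the raw pixel values in the mask files differ from the keys of `value_to_class`; this should be verified against the files. A minimal sketch of two checks follows. The helper names are hypothetical, and the raw values in the second call are made up for illustration.

```python
def missing_class_ids(observed_labels, num_classes=13):
    """Class IDs that never occur in the loaded label array."""
    return sorted(set(range(num_classes)) - {int(v) for v in observed_labels})

def unmapped_raw_values(raw_values, value_to_class):
    """Raw gray values that have no entry in the mapping."""
    return sorted({int(v) for v in raw_values} - set(value_to_class))

# Labels printed by the verification cell above
print(missing_class_ids([0, 2, 9, 12]))
# [1, 3, 4, 5, 6, 7, 8, 10, 11]

# Illustrative raw values only; in the notebook, pass np.unique(mask_array)
print(unmapped_raw_values([80, 33, 7], {80: 0, 33: 2}))
# [7]
```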


<h3>b. Data splitting</h3>

To evaluate our model's performance on unseen data, we separate the dataset into:

- **Training set (80%)**  
  Used to train the neural network and update weights.

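- **Validation set (20%)**  
  Used to monitor model performance on held-out frames during training. Because the split is random at the frame level, frames from the same clip can land in both sets, so validation scores may be optimistic.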

In [6]:
# Split into training and validation sets (e.g., 80% train, 20% val)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training set size:", X_train.shape[0], "images")
print("Validation set size:", X_val.shape[0], "images")

Training set size: 6464 images
Validation set size: 1616 images
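
Consecutive frames within a clip look very similar, so an alternative worth considering is to split by video rather than by frame. The sketch below is one way to do that, not the method used in this notebook: `video_id_from_path`, `split_by_video`, the example paths, and the choice of validation video are all hypothetical.

```python
import re

def video_id_from_path(path):
    """Extract the numeric video ID from a path containing 'videoNN'."""
    match = re.search(r"video(\d+)", path)
    if match is None:
        raise ValueError(f"No video ID found in: {path!r}")
    return int(match.group(1))

def split_by_video(paths, val_videos):
    """Return (train_indices, val_indices) so that no video spans both sets."""
    train_idx, val_idx = [], []
    for i, path in enumerate(paths):
        target = val_idx if video_id_from_path(path) in val_videos else train_idx
        target.append(i)
    return train_idx, val_idx

# Hypothetical paths following the directory layout shown earlier
example_paths = [
    "Dataset/video01/video01_00080/frame_a_endo.png",
    "Dataset/video01/video01_00160/frame_b_endo.png",
    "Dataset/video02/video02_00000/frame_c_endo.png",
]
train_idx, val_idx = split_by_video(example_paths, val_videos={2})
print(train_idx, val_idx)  # [0, 1] [2]
```

With the arrays from this notebook, the index lists could then be used as `X[train_idx]`, `y[train_idx]`, `X[val_idx]`, and `y[val_idx]`.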


<h2>II. Model Architecture: U-Net for Semantic Segmentation</h2>

<h3>a. Defining Model</h3>

For this project, we use a **U-Net architecture** — a popular encoder–decoder convolutional neural network designed for semantic segmentation tasks in biomedical imaging.

U-Net is built to capture both **global context** and **fine-grained local details**, thanks to its **skip connections** that link the encoder and decoder paths.

---

#### Architecture Overview

- **Encoder (Contracting Path):**
  - Consists of **4 downsampling blocks**
  - Each block includes:
    - Two `3×3` convolutional layers with ReLU activation
    - A `2×2` max pooling layer for spatial downsampling
  - The number of filters doubles at each stage:  
    `64 → 128 → 256 → 512`
  - After the fourth block, there's a **bottleneck layer** with `1024` filters

- **Decoder (Expanding Path):**
  - Consists of **4 upsampling blocks**
  - Each block includes:
    - A `2×2` transposed convolution to upsample and halve the number of filters
    - A skip connection that concatenates the corresponding feature map from the encoder
    - Two `3×3` convolutional layers with ReLU activation
  - Filter counts follow the reverse order:  
    `1024 → 512 → 256 → 128 → 64`

- **Output Layer:**
  - A `1×1` convolution to reduce the channel dimension to the number of classes (**13**)
  - Followed by a **softmax activation** to produce a probability distribution per pixel

This architecture enables the model to segment objects at different scales and accurately preserve spatial information.
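
To make the overview concrete, the sketch below traces the spatial size and channel count of the feature maps through the stages listed above for a 256×256 input. It only does the arithmetic (halving and doubling); `unet_feature_shapes` is an illustrative helper and does not build a model.

```python
def unet_feature_shapes(size=256, base_filters=64, depth=4):
    """List (stage, height_and_width, channels) for the U-Net described above."""
    shapes = []
    filters = base_filters
    for level in range(1, depth + 1):
        shapes.append((f"encoder {level}", size, filters))
        size //= 2     # 2x2 max pooling halves each spatial dimension
        filters *= 2   # the filter count doubles at the next stage
    shapes.append(("bottleneck", size, filters))
    for level in range(depth, 0, -1):
        size *= 2      # 2x2 transposed convolution with stride 2
        filters //= 2  # and half as many filters
        shapes.append((f"decoder {level}", size, filters))
    return shapes

for stage, side, channels in unet_feature_shapes():
    print(f"{stage:<10} {side:>3}x{side:<3} {channels:>4} channels")
```

The bottleneck works on 16×16 feature maps with 1024 channels, and the last decoder block returns to 256×256 with 64 channels before the final `1×1` convolution.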

---

#### Model Compilation and Training Setup

- **Loss Function:**  
  `SparseCategoricalCrossentropy` is used since the target masks contain integer class labels (not one-hot encoded).

- **Optimizer:**  
  `Adam`, a widely used optimizer for deep learning, used here with Keras's default learning rate (0.001).

- **Metrics:**  
  We track **pixel-wise accuracy** during training.  
  More detailed metrics like **IoU** and **Dice coefficient** will be computed separately after training.

- **Early Stopping:**  
  To limit overfitting, we use early stopping with `patience = 5`.  
  Training stops if the validation loss does not improve for 5 consecutive epochs. Because `restore_best_weights=True` is set, the weights from the epoch with the lowest validation loss are restored when training stops early.
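
The patience rule can be illustrated with a small simulation. This is a simplified sketch of the idea, not the Keras callback itself, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-based.
    stop_epoch is None if the patience limit is never reached."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up validation losses: the best value occurs at epoch 3
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68, 0.69, 0.60]
print(early_stopping_epoch(losses))  # (8, 3)
```

Training stops after epoch 8, five epochs after the last improvement, and the weights from epoch 3 would be the ones to restore. The lower loss at epoch 9 is never reached, which is the trade-off a patience value encodes.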

---


In [8]:

def build_unet(input_size=(256, 256, 3), num_classes=13):
    """Builds a U-Net model."""
    inputs = keras.Input(shape=input_size)
    # Encoder: Downsampling through conv blocks and max pooling
    c1 = layers.Conv2D(64, kernel_size=3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(64, kernel_size=3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D(pool_size=(2, 2))(c1)
    
    c2 = layers.Conv2D(128, kernel_size=3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(128, kernel_size=3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D(pool_size=(2, 2))(c2)
    
    c3 = layers.Conv2D(256, kernel_size=3, activation='relu', padding='same')(p2)
    c3 = layers.Conv2D(256, kernel_size=3, activation='relu', padding='same')(c3)
    p3 = layers.MaxPooling2D(pool_size=(2, 2))(c3)
    
    c4 = layers.Conv2D(512, kernel_size=3, activation='relu', padding='same')(p3)
    c4 = layers.Conv2D(512, kernel_size=3, activation='relu', padding='same')(c4)
    p4 = layers.MaxPooling2D(pool_size=(2, 2))(c4)
    
    # Bottleneck
    c5 = layers.Conv2D(1024, kernel_size=3, activation='relu', padding='same')(p4)
    c5 = layers.Conv2D(1024, kernel_size=3, activation='relu', padding='same')(c5)
    
    # Decoder: Upsampling and skip connections
    u6 = layers.Conv2DTranspose(512, kernel_size=2, strides=2, padding='same')(c5)
    u6 = layers.concatenate([u6, c4])  # skip connection from encoder c4
    c6 = layers.Conv2D(512, kernel_size=3, activation='relu', padding='same')(u6)
    c6 = layers.Conv2D(512, kernel_size=3, activation='relu', padding='same')(c6)
    
    u7 = layers.Conv2DTranspose(256, kernel_size=2, strides=2, padding='same')(c6)
    u7 = layers.concatenate([u7, c3])
    c7 = layers.Conv2D(256, kernel_size=3, activation='relu', padding='same')(u7)
    c7 = layers.Conv2D(256, kernel_size=3, activation='relu', padding='same')(c7)
    
    u8 = layers.Conv2DTranspose(128, kernel_size=2, strides=2, padding='same')(c7)
    u8 = layers.concatenate([u8, c2])
    c8 = layers.Conv2D(128, kernel_size=3, activation='relu', padding='same')(u8)
    c8 = layers.Conv2D(128, kernel_size=3, activation='relu', padding='same')(c8)
    
    u9 = layers.Conv2DTranspose(64, kernel_size=2, strides=2, padding='same')(c8)
    u9 = layers.concatenate([u9, c1])
    c9 = layers.Conv2D(64, kernel_size=3, activation='relu', padding='same')(u9)
    c9 = layers.Conv2D(64, kernel_size=3, activation='relu', padding='same')(c9)
    
    # Output layer
    outputs = layers.Conv2D(num_classes, kernel_size=1, activation='softmax')(c9)
    
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model
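
As a cross-check for `model.summary()`, the number of trainable parameters can be estimated by hand from the layer list in `build_unet`. The sketch below assumes every layer has a bias term and a square kernel, as in the function above; `conv_params` and `unet_param_count` are illustrative helpers.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel (transposed) convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def unet_param_count(in_channels=3, num_classes=13,
                     filters=(64, 128, 256, 512, 1024)):
    total = 0
    prev = in_channels
    # Encoder blocks and bottleneck: two 3x3 convolutions each
    for f in filters:
        total += conv_params(3, prev, f) + conv_params(3, f, f)
        prev = f
    # Decoder blocks: 2x2 transposed conv, then two 3x3 convs
    for f in reversed(filters[:-1]):
        total += conv_params(2, prev, f)    # upsampling halves the channels
        total += conv_params(3, 2 * f, f)   # input = upsampled + skip features
        total += conv_params(3, f, f)
        prev = f
    total += conv_params(1, prev, num_classes)  # final 1x1 convolution
    return total

print(conv_params(3, 3, 64))   # 1792 parameters in the first layer
print(unet_param_count())      # 31032525, roughly 31 million
```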



In [None]:
# Build the U-Net model and compile it
num_classes = 13
model = build_unet(input_size=(256, 256, 3), num_classes=num_classes)
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Print model summary
model.summary()

<h3>b. Model Training</h3>

In [None]:
# Set up early stopping callback
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train the model
history = model.fit(X_train, y_train,
                    batch_size=8,
                    epochs=50,
                    validation_data=(X_val, y_val),
                    callbacks=[early_stop])


Epoch 1/50
498/808 ━━━━━━━━━━━━━━━━━━━━ 3:05:21 36s/step - accuracy: 0.8869 - loss: 1.2214

<h2>III. Evaluation Metrics</h2>

In [None]:
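# Predict on the validation set in chunks and keep only the argmax class map;
# the full softmax output for every validation image would take several GB.
chunk_size = 64
y_pred = np.concatenate([
    np.argmax(model.predict(X_val[start:start + chunk_size], verbose=0), axis=-1).astype(np.uint8)
    for start in range(0, len(X_val), chunk_size)
])  # shape: (num_val, 256, 256)

# The true labels are already integer class IDs with shape (num_val, 256, 256)
y_true = y_val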

<h3>a. Pixel Accuracy</h3>

<p>Pixel Accuracy: the percentage of pixels (over the whole image set) whose predicted class matches the ground truth class. This is a global measure and was also tracked during training as the 'accuracy' metric.</p>

In [None]:
total_pixels = y_true.size
correct_pixels = np.sum(y_pred == y_true)
pixel_accuracy = correct_pixels / total_pixels

<h3>b. Intersection over Union (IoU)</h3>

<p>Intersection over Union (IoU): for each class, IoU = (True Positive) / (True Positive + False Positive + False Negative), i.e., the area of overlap between the predicted mask and true mask divided by the area of their union. We will compute the IoU for each class and then take the average (Mean IoU) across all classes.
</p>

In [None]:
num_classes = 13
iou_per_class = []
for cls in range(num_classes):
    # Compute intersection and union
    pred_mask = (y_pred == cls)
    true_mask = (y_true == cls)
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        # The class appears in neither the prediction nor the ground truth, so skip it
        continue
    iou = intersection / union
    iou_per_class.append(iou)
mean_iou = np.mean(iou_per_class)
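
<p>The loop above scans the full arrays once per class. An alternative is to build a confusion matrix in a single pass and read the per-class IoU from it. The following is a minimal sketch on toy arrays; <code>confusion_matrix</code> and <code>iou_from_confusion</code> are hypothetical helper names, not part of the notebook.</p>

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    index = (y_true.astype(np.int64).ravel() * num_classes
             + y_pred.astype(np.int64).ravel())
    counts = np.bincount(index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def iou_from_confusion(cm):
    """Per-class IoU for every class that occurs in prediction or ground truth."""
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    present = union > 0
    return intersection[present] / union[present]

y_true_toy = np.array([[0, 0, 1], [1, 2, 2]])
y_pred_toy = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(y_true_toy, y_pred_toy, num_classes=3)
ious = iou_from_confusion(cm)
print(cm)
print(ious, ious.mean())
```

<p>For the full validation set, the matrix could be accumulated chunk by chunk to avoid holding large temporary index arrays in memory.</p>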

<h3>c. Dice Coefficient</h3>

<p>Dice Coefficient: also known as F1 score for segmentation, Dice = 2 * (Precision * Recall) / (Precision + Recall) for each class, which can be computed as 2 * |Prediction ∩ Ground Truth| / (|Prediction| + |Ground Truth|). We will compute Dice per class and then average. Dice is closely related to IoU (Dice = 2*IoU/(IoU+1)) and emphasizes overlap.</p>
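
<p>The relationship between Dice and IoU stated above can be verified on a tiny binary example. The arrays below are made up for illustration.</p>

```python
import numpy as np

# Toy binary masks for a single class (six pixels)
pred = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
true = np.array([0, 1, 1, 1, 0, 0], dtype=bool)

intersection = np.logical_and(pred, true).sum()   # 2 overlapping pixels
union = np.logical_or(pred, true).sum()           # 4 pixels in the union

iou = intersection / union                            # 2 / 4
dice = 2 * intersection / (pred.sum() + true.sum())   # 4 / 6

print(iou, dice)
print(np.isclose(dice, 2 * iou / (1 + iou)))  # True
```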

In [None]:
dice_per_class = []
for cls in range(num_classes):
    pred_mask = (y_pred == cls)
    true_mask = (y_true == cls)
    intersection = np.logical_and(pred_mask, true_mask).sum()
    pred_area = pred_mask.sum()
    true_area = true_mask.sum()
    if true_area == 0 and pred_area == 0:
        continue  # skip classes not present in either
    # If only one is present and the other is not, intersection=0 will yield dice=0, which is fine.
    dice = (2 * intersection) / (pred_area + true_area + 1e-8)
    dice_per_class.append(dice)
mean_dice = np.mean(dice_per_class)

print(f"Pixel Accuracy: {pixel_accuracy*100:.2f}%")
print(f"Mean IoU: {mean_iou*100:.2f}%")
print(f"Mean Dice Coefficient: {mean_dice*100:.2f}%")

<h2>IV. Results Visualization</h2>

<p>To get a better intuition of the model's performance, let's visualize some segmentation results. We'll take a few examples from the validation set and display:</p>
<ul>
  <li>The original image.</li>
  <li>The ground truth mask (using a color overlay for different classes).</li>
  <li>The predicted mask from our model (using the same color scheme as the ground truth for easy comparison).</li>
</ul>

<p>We'll use a consistent color palette to color-code the 13 classes. For clarity, let's define a set of distinct colors for the classes. (These are arbitrarily chosen colors for visualization; they may not match the exact colors used in the dataset's color mask, but serve to differentiate classes.)</p>

In [None]:
import matplotlib.pyplot as plt

# Define a color for each class (in RGB)
class_colors = [
    (0, 0, 0),        # 0: background - black
    (128, 64, 128),   # 1: e.g. abdominal wall - purple
    (128, 128, 64),   # 2: liver - olive
    (60, 180, 75),    # 3: GI tract - green
    (255, 225, 25),   # 4: fat - yellow
    (0, 130, 200),    # 5: grasper (instrument) - blue
    (245, 130, 48),   # 6: connective tissue - orange
    (220, 20, 60),    # 7: blood - crimson
    (230, 190, 255),  # 8: cystic duct - lavender
    (170, 110, 40),   # 9: electrocautery instrument - brown
    (0, 0, 255),      # 10: gallbladder - bright blue
    (128, 0, 0),      # 11: hepatic vein - maroon
    (170, 255, 195)   # 12: liver ligament - mint
]

# Choose some sample indices from the validation set to visualize
sample_indices = [0, 1, 2]  # (You can also choose random indices or specific ones)

for idx in sample_indices:
    image = X_val[idx]
    true_mask = y_val[idx]
    pred_mask = y_pred[idx]
    
    # Construct color images for true mask and predicted mask
    H, W = true_mask.shape
    true_mask_color = np.zeros((H, W, 3), dtype=np.uint8)
    pred_mask_color = np.zeros((H, W, 3), dtype=np.uint8)
    for cls, color in enumerate(class_colors):
        true_mask_color[true_mask == cls] = color
        pred_mask_color[pred_mask == cls] = color
    
    # Plot the results
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].imshow(image)  # image is already normalized [0,1]
    axes[0].set_title("Original Image")
    axes[0].axis('off')
    axes[1].imshow(true_mask_color)
    axes[1].set_title("Ground Truth Mask")
    axes[1].axis('off')
    axes[2].imshow(pred_mask_color)
    axes[2].set_title("Predicted Mask")
    axes[2].axis('off')
    plt.show()
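
<p>The per-class coloring loop can also be expressed as a single palette lookup, since NumPy indexing with an integer array returns one palette row per pixel. The following is a minimal sketch with a three-color toy palette; <code>colorize</code> is a hypothetical helper name.</p>

```python
import numpy as np

def colorize(mask, class_colors):
    """Map an (H, W) array of class IDs to an (H, W, 3) uint8 RGB image."""
    palette = np.asarray(class_colors, dtype=np.uint8)  # shape (num_classes, 3)
    return palette[mask]

# First three colors of the palette defined above
toy_colors = [(0, 0, 0), (128, 64, 128), (128, 128, 64)]
toy_mask = np.array([[0, 1], [2, 2]])

rgb = colorize(toy_mask, toy_colors)
print(rgb.shape)   # (2, 2, 3)
print(rgb[0, 1])   # [128  64 128]
```

<p>With the variables from the cell above, <code>colorize(true_mask, class_colors)</code> would produce the same image as the loop, provided every value in the mask is a valid index into the palette.</p>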


<h2>V. Deploy</h2>

<p>Finally, we save the trained model to disk so that it can be reloaded later for inference or further training without having to retrain from scratch. We'll save the model in Keras's HDF5 format:</p>

In [None]:
model.save("cholecseg_unet_model.h5")
print("Model saved to disk.")