<h1 style = "text-align: center;">Endoscope Semantic Segmentation</h1>


  <h2>Project Scope and Overview</h2>
  <p>This project focuses on advancing semantic segmentation in medical imaging, particularly for computer-assisted surgery. The main objective is to develop neural network models that can accurately segment surgical images into distinct classes, such as various tissues, surgical instruments, blood vessels, and other critical anatomical structures. By improving segmentation accuracy, the project aims to enhance real-time surgical navigation and safety, providing essential support for clinical decision-making during operations.</p>
  


  <h2>Dataset Overview</h2>
<p>
  The CholecSeg8K dataset is organized into a clear hierarchical structure, making it easy to locate and use the data. Below is a breakdown of its organization:
</p>
<ul>
  <li>
    <strong>Top-Level Directories:</strong>
    <ul>
      <li>Folders are labeled <em>video01</em>, <em>video02</em>, etc.; each folder contains the annotated frames from one surgical video.</li>
    </ul>
  </li>
  <li>
    <strong>Segment Directories:</strong>
    <ul>
      <li>Within each video folder, the annotated frames are grouped into one or more segment directories.</li>
      <li>Each segment directory is named with the video ID and the starting frame number (for example, <em>video01_00080</em> indicates that the segment starts at frame 80).</li>
    </ul>
  </li>
  <li>
    <strong>Frame and Image Files:</strong>
    <ul>
      <li>Each segment directory contains <strong>80 consecutive frames</strong> extracted from the video.</li>
      <li>For every frame, there are <strong>4 image files</strong>:
        <ul>
          <li>The raw image frame</li>
          <li>The annotation tool mask (the original hand-drawn annotation)</li>
          <li>The color mask (used for visualization, where classes are painted in distinct colors)</li>
          <li>The watershed mask (used for processing, where each pixel value corresponds to a class ID)</li>
        </ul>
      </li>
      <li>This results in <strong>80 frames × 4 images per frame = 320 images</strong> in each segment directory.</li>
    </ul>
  </li>
  <li>
    <strong>Annotations:</strong>
    <ul>
      <li>Each frame is annotated at the pixel level with 13 classes (e.g., tissues, surgical instruments, and blood vessels).</li>
      <li>The annotations are presented in both the color and watershed masks, ensuring clear class identification for both visualization and automated processing.</li>
    </ul>
  </li>
</ul>
<p>
  This structured, high-quality organization facilitates the development and training of advanced neural networks for precise semantic segmentation in surgical environments.
</p>
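<p>The layout above can be indexed with a few lines of plain Python. This is a minimal sketch. It assumes file names follow the pattern of the example images used later in this notebook (<code>frame_100_endo.png</code>, <code>frame_100_endo_watershed_mask.png</code>). It also assumes segment directories are named like <code>video01_00080</code>. The function name <code>index_cholecseg8k</code> is hypothetical. The function pairs each raw frame with its watershed mask and parses the segment's starting frame from the directory name.</p>

```python
import re
from pathlib import Path

# Assumed naming patterns (hypothetical; check them against your copy of the data).
SEGMENT_RE = re.compile(r"^(video\d+)_(\d+)$")     # e.g. video01_00080
FRAME_RE = re.compile(r"^frame_(\d+)_endo\.png$")  # e.g. frame_100_endo.png


def index_cholecseg8k(root):
    """Return sorted (video_id, segment_start, frame_no, image_path, mask_path) tuples."""
    records = []
    for video_dir in sorted(Path(root).glob("video*")):
        if not video_dir.is_dir():
            continue
        for segment_dir in sorted(video_dir.iterdir()):
            seg_match = SEGMENT_RE.match(segment_dir.name)
            if seg_match is None or not segment_dir.is_dir():
                continue
            video_id, start = seg_match.group(1), int(seg_match.group(2))
            for image_path in segment_dir.iterdir():
                frame_match = FRAME_RE.match(image_path.name)
                if frame_match is None:
                    continue  # skip mask files and anything else that is not a raw frame
                # frame_100_endo.png -> frame_100_endo_watershed_mask.png
                mask_path = image_path.with_name(image_path.stem + "_watershed_mask.png")
                if mask_path.exists():
                    records.append((video_id, start, int(frame_match.group(1)),
                                    image_path, mask_path))
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return records
```

<p>For example, <code>index_cholecseg8k(os.path.join(base_dir, "CholecSeg8K"))</code> would list every frame/mask pair. This assumes the dataset was extracted to a folder named <code>CholecSeg8K</code>, which is a hypothetical location. A complete segment should yield 80 records.</p>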


  <div class="gallery">
    <img src="./Images/Fig1.png" alt="Figure 1">
    <img src="./Images/Fig2.png" alt="Figure 2">
    <img src="./Images/Fig3.png" alt="Figure 3">
  </div>


  <h2>Class Information Table</h2>
  <p>Table I maps each class number shown in Figures 1, 2, and 3 to its class name and to the RGB hex code that represents the class in the watershed masks:</p>
  <table>
    <thead>
      <tr>
        <th>Class Number</th>
        <th>Class Name</th>
        <th>RGB Hex Code</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Class 0</td>
        <td>Black Background</td>
        <td>#505050</td>
      </tr>
      <tr>
        <td>Class 1</td>
        <td>Abdominal Wall</td>
        <td>#111111</td>
      </tr>
      <tr>
        <td>Class 2</td>
        <td>Liver</td>
        <td>#212121</td>
      </tr>
      <tr>
        <td>Class 3</td>
        <td>Gastrointestinal Tract</td>
        <td>#131313</td>
      </tr>
      <tr>
        <td>Class 4</td>
        <td>Fat</td>
        <td>#121212</td>
      </tr>
      <tr>
        <td>Class 5</td>
        <td>Grasper</td>
        <td>#313131</td>
      </tr>
      <tr>
        <td>Class 6</td>
        <td>Connective Tissue</td>
        <td>#232323</td>
      </tr>
      <tr>
        <td>Class 7</td>
        <td>Blood</td>
        <td>#242424</td>
      </tr>
      <tr>
        <td>Class 8</td>
        <td>Cystic Duct</td>
        <td>#252525</td>
      </tr>
      <tr>
        <td>Class 9</td>
        <td>L-hook Electrocautery</td>
        <td>#323232</td>
      </tr>
      <tr>
        <td>Class 10</td>
        <td>Gallbladder</td>
        <td>#222222</td>
      </tr>
      <tr>
        <td>Class 11</td>
        <td>Hepatic Vein</td>
        <td>#333333</td>
      </tr>
      <tr>
        <td>Class 12</td>
        <td>Liver Ligament</td>
        <td>#050505</td>
      </tr>
    </tbody>
  </table>
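<p>The table can be turned into a lookup that converts a watershed mask into a 2-D array of class IDs 0 to 12. Most segmentation losses expect targets in this format. This NumPy sketch reads each hex code as a base-16 byte repeated in all three channels (e.g. <code>#505050</code> becomes gray level 0x50 = 80). That interpretation is an assumption, so compare it with <code>np.unique(mask)</code> on an actual watershed mask before training. The names <code>watershed_to_class_ids</code> and <code>ignore_index</code> are hypothetical.</p>

```python
import numpy as np

# Table I: watershed hex code -> class ID.
WATERSHED_HEX_TO_CLASS = {
    "#505050": 0, "#111111": 1, "#212121": 2, "#131313": 3, "#121212": 4,
    "#313131": 5, "#232323": 6, "#242424": 7, "#252525": 8, "#323232": 9,
    "#222222": 10, "#333333": 11, "#050505": 12,
}
# Assumption: R == G == B, so the first byte of each code is the stored gray level.
VALUE_TO_CLASS = {int(code[1:3], 16): cls for code, cls in WATERSHED_HEX_TO_CLASS.items()}


def watershed_to_class_ids(gray_mask, ignore_index=255):
    """Map an H x W uint8 watershed mask to class IDs; unknown values become ignore_index."""
    lut = np.full(256, ignore_index, dtype=np.uint8)  # 256-entry lookup table
    for value, class_id in VALUE_TO_CLASS.items():
        lut[value] = class_id
    return lut[np.asarray(gray_mask, dtype=np.uint8)]
```

<p>A lookup table keeps the conversion to a single vectorized indexing step. Mapping unrecognized values to a separate <code>ignore_index</code> makes any mismatch between the table and the data easy to spot.</p>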


<h2>Mask Overview</h2>
<p>
  The table below summarizes the three masks that accompany each image frame, with a description and an example image for each. The original frame is shown in the first row for reference.
</p>
<table border="1" cellspacing="0" cellpadding="10">
  <thead>
    <tr>
      <th>Mask Name</th>
      <th>Description</th>
      <th>Image</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Original Image Frame</td>
      <td>This is the raw endoscopic image captured during the surgery.</td>
      <td><img src="./Images/frame_100_endo.png" alt="Original Endoscopic Image" width="200"></td>
    </tr>
    <tr>
      <td>1. Annotation Tool Mask</td>
      <td>
        <ul>
          <li>This is the original hand-drawn mask created during the annotation process.</li>
          <li>It contains detailed pixel-level annotations drawn by experts.</li>
          <li>It serves as the basis for generating the other two masks.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_mask.png" alt="Annotation Tool Mask" width="200"></td>
    </tr>
    <tr>
      <td>2. Color Mask</td>
      <td>
        <ul>
          <li>Derived from the annotation tool mask.</li>
          <li>It assigns a unique color to each class (e.g., tissue, instrument, blood vessel) based on predefined IDs.</li>
          <li>This facilitates visual inspection and interpretation of the segmentation results.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_color_mask.png" alt="Color Mask" width="200"></td>
    </tr>
    <tr>
      <td>3. Watershed Mask</td>
      <td>
        <ul>
          <li>Also generated from the annotation tool mask.</li>
          <li>It assigns a uniform pixel value (the same across all three RGB channels) to each class.</li>
          <li>These values encode the class IDs (Table I lists the value for each class), which makes the mask well suited to automated processing and further analysis.</li>
        </ul>
      </td>
      <td><img src="./Images/frame_100_endo_watershed_mask.png" alt="Watershed Mask" width="200"></td>
    </tr>
  </tbody>
</table>
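<p>The watershed mask stores the same value in all three RGB channels, so it can be reduced to a single channel before values are mapped to class IDs. This NumPy-only sketch checks that assumption instead of silently taking one channel. It also ignores an alpha channel if the PNG has one. The function name is hypothetical.</p>

```python
import numpy as np


def collapse_watershed_rgb(mask):
    """Return the H x W gray levels of an H x W x 3 (or x 4) watershed mask."""
    mask = np.asarray(mask)
    if mask.ndim != 3 or mask.shape[2] < 3:
        raise ValueError(f"expected an H x W x 3 array, got shape {mask.shape}")
    r, g, b = mask[..., 0], mask[..., 1], mask[..., 2]  # any alpha channel is ignored
    # The watershed mask should hold identical values in R, G and B.
    if not (np.array_equal(r, g) and np.array_equal(r, b)):
        raise ValueError("RGB channels differ; this does not look like a watershed mask")
    return r.copy()
```

<p>With Pillow installed, a mask could be loaded as <code>np.array(Image.open(path).convert("RGB"))</code>. Calling <code>np.unique</code> on the collapsed array then lists the gray levels actually present, which can be compared with Table I.</p>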


%load_ext autoreload
%autoreload 2
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pandas.plotting import scatter_matrix
from seaborn import scatterplot, heatmap

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import RobustScaler

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import KFold

from sklearn.dummy import DummyClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import SGDClassifier

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_score

from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

from sklearn.base import BaseEstimator, TransformerMixin

from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

from sklearn.metrics import mean_absolute_error

from joblib import dump

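import os

# In Colab, mount Google Drive and read data from the Colab Notebooks folder;
# otherwise, read data relative to the notebook's working directory.
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/content/drive')
    base_dir = "./drive/My Drive/Colab Notebooks/"  # relative to /content, Colab's default working directory
else:
    base_dir = "."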

<h2>I. Data Engineering</h2>

<h3>a. Data Cleaning</h3>

<h3>b. Dataset Exploration</h3>

<h3>c. Impute/Replace Values</h3>

<h3>d. Feature Engineering</h3>

<h3>e. Data Preprocessing</h3>

<h2>II. Modeling</h2>

<h4>Choosing a Backbone</h4>
<p>
  <strong>U-Net:</strong> A popular choice for medical image segmentation due to its encoder–decoder architecture with skip connections that preserve spatial context.
</p>
<p>
  <strong>Alternative Models:</strong> If U-Net falls short on accuracy or speed, consider DeepLabV3 (atrous convolutions for multi-scale context), FCN, or attention-based segmentation networks.
</p>
<h4>Customization</h4>
<ul>
  <li>Configure the input layer to accept RGB images of shape 3 × H × W.</li>
  <li>Set the output layer to produce 13 channels, one per segmentation class (an output of shape 13 × H × W).</li>
  <li>Experiment with deeper or shallower networks depending on available GPU memory and desired resolution.</li>
</ul>
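<p>Depth and input resolution interact. In a U-Net with <em>d</em> 2 × 2 pooling steps and padded convolutions, each decoder stage lines up with its encoder skip connection only if H and W are divisible by 2<sup>d</sup>. This minimal NumPy sketch pads a frame or mask at the bottom and right edges to the next valid size. The helper names are hypothetical.</p>

```python
import numpy as np


def unet_pad_amounts(height, width, depth=4):
    """Rows and columns to add so both sides are divisible by 2**depth."""
    factor = 2 ** depth  # each 2x2 pooling halves the spatial size
    return (-height) % factor, (-width) % factor


def pad_for_unet(array, depth=4, fill=0):
    """Pad an H x W or H x W x C array at the bottom/right edges with `fill`."""
    pad_h, pad_w = unet_pad_amounts(array.shape[0], array.shape[1], depth)
    pad_spec = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (array.ndim - 2)
    return np.pad(array, pad_spec, mode="constant", constant_values=fill)
```

<p>For instance, a 480 × 854 frame with <code>depth=4</code> needs 10 extra columns: 854 is not a multiple of 16, but 864 is. If the loss supports an ignore index, masks can be padded with that value (e.g. 255) so the padded border does not count toward the loss. Predictions can then be cropped back with <code>[:height, :width]</code>.</p>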
<h4>U-Net Specific Customizations</h4>
<ul>
  <li>
    <strong>Activation Functions:</strong>
    <ul>
      <li>Use ReLU (or variants like Leaky ReLU or ELU) in the encoder and decoder layers.</li>
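      <li>Apply a softmax in the final layer to obtain per-pixel class probabilities for multi-class segmentation. If the loss function applies softmax internally (as many cross-entropy implementations do), output raw logits during training and apply softmax only at inference.</li>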
    </ul>
  </li>
  <li>
    <strong>Loss Functions:</strong>
    <ul>
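      <li>Consider categorical cross-entropy (optionally with class weights), Dice loss, or a weighted combination of the two. Class weights and Dice loss help counter class imbalance, since small structures such as the cystic duct typically occupy far fewer pixels than the background or the liver.</li>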
    </ul>
  </li>
  <li>
    <strong>Regularization:</strong>
    <ul>
      <li>Incorporate dropout layers to reduce overfitting.</li>
      <li>Add batch normalization to stabilize and accelerate training.</li>
    </ul>
  </li>
  <li>
    <strong>Optimization:</strong>
    <ul>
      <li>Use optimizers such as Adam or SGD with momentum.</li>
      <li>Implement learning rate scheduling to adjust the learning rate during training for improved convergence.</li>
    </ul>
  </li>
  <li>
    <strong>Network Architecture:</strong>
    <ul>
      <li>Adjust the number of layers and filters per layer to trade off model capacity against computational cost.</li>
      <li>Consider using residual connections or attention mechanisms if additional performance improvements are needed.</li>
    </ul>
  </li>
  <li>
    <strong>Data Augmentation:</strong>
    <ul>
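      <li>Apply geometric transforms such as rotation, flipping, and scaling to increase the diversity of the training data and improve robustness. Apply each transform identically to an image and its mask, and resample masks with nearest-neighbor interpolation so that class IDs are not blended.</li>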
    </ul>
  </li>
</ul>
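<p>This NumPy sketch makes the softmax and loss choices above concrete with a combined cross-entropy and soft Dice loss for a single image. It is illustrative only: in training you would use the differentiable equivalents in your deep-learning framework. The 50/50 weighting (<code>dice_weight=0.5</code>) and the function names are assumptions, not tuned values.</p>

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ce_dice_loss(logits, target, num_classes=13, dice_weight=0.5, eps=1e-6):
    """Weighted sum of cross-entropy and soft Dice loss.

    logits: float array (C, H, W) of raw network outputs.
    target: int array (H, W) of class IDs in [0, num_classes).
    """
    probs = softmax(logits, axis=0)
    # One-hot encode the target: (H, W) -> (H, W, C) -> (C, H, W).
    one_hot = np.eye(num_classes)[target].transpose(2, 0, 1)
    # Cross-entropy: mean negative log-probability of the true class per pixel.
    ce = -np.mean(np.sum(one_hot * np.log(probs + eps), axis=0))
    # Soft Dice per class, then averaged over classes so small structures count equally.
    intersection = np.sum(probs * one_hot, axis=(1, 2))
    denominator = np.sum(probs, axis=(1, 2)) + np.sum(one_hot, axis=(1, 2))
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return (1.0 - dice_weight) * ce + dice_weight * (1.0 - dice.mean())
```

<p>Averaging Dice over classes (macro Dice) gives rare classes the same weight as large ones, which is why it complements pixel-averaged cross-entropy. In this simple version, classes absent from an image still enter the average. Some implementations skip them instead.</p>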


<h4 style="color:red; font-style:italic;">Method 1: <u>  </u></h4>

<h5>a. Error Estimation & Model Selection</h5>

<h5>b. Training</h5>

<h5>c. Evaluation</h5>

<h4 style="color:red; font-style:italic;">Method 2: <u></u></h4>

<h5>a. Error Estimation & Model Selection</h5>

<h5>b. Training</h5>

<h5>c. Evaluation</h5>

<h4 style="color:red; font-style:italic;">Method 3: <u></u></h4>

<h4>Confidence Intervals </h4>

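<p style="color:blue; font-style:italic;">Source: <a href="https://github.com/rasbt/machine-learning-notes/blob/main/evaluation/ci-for-ml/confidence-intervals-for-ml.ipynb">https://github.com/rasbt/machine-learning-notes/blob/main/evaluation/ci-for-ml/confidence-intervals-for-ml.ipynb</a></p>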
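<p>The percentile bootstrap is one common way to put a confidence interval on a segmentation metric. It resamples the test images with replacement, recomputes the mean score each round, and takes the empirical quantiles. This NumPy sketch applies it to hypothetical per-image scores, such as per-image mean IoU or Dice. It is not necessarily the exact procedure used in the linked source. Frames from the same video are strongly correlated, so resampling whole videos rather than individual frames is likely to give a more honest interval.</p>

```python
import numpy as np


def bootstrap_ci(scores, n_rounds=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-image scores.

    Returns (mean, lower, upper).
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of image indices (drawn with replacement).
    idx = rng.integers(0, len(scores), size=(n_rounds, len(scores)))
    boot_means = scores[idx].mean(axis=1)
    alpha = (1.0 - confidence) / 2.0
    lower, upper = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return scores.mean(), lower, upper
```

<p>For example, <code>bootstrap_ci(per_image_dice)</code> returns the mean score with the lower and upper bounds of a 95% interval. Here <code>per_image_dice</code> is a hypothetical array of scores computed on the test set.</p>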

<h2>III. Deploy</h2>