# Evaluation of Segmentation

---
## Learning Objectives
By the end of this module, learners will be able to:
- Understand the importance of evaluating segmentation results.
- Visually compare predicted masks with ground truth.
- Understand basic segmentation evaluation metrics: Dice Score and IoU (Intersection over Union).
- Implement simple visual and quantitative evaluations in Python.
- Practice evaluation on real bioimages and interpret results.

---
## Visual Inspection of Segmentation
Segmentation models often make small errors at cell boundaries or miss whole regions. Before drawing scientific conclusions from these masks, we must inspect their quality. Overlaying the predicted mask on the ground truth (a manually labeled image) quickly highlights errors such as:
- Missed objects
- Extra (false positive) detections
- Boundary misalignment

This is a first, essential sanity check before using more complex metrics.
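The three error types listed above correspond to per-pixel categories: true positives (both masks agree on an object pixel), false positives (extra detections), and false negatives (missed pixels). The sketch below separates them into color channels and counts them. The function name `error_map`, the color assignment, and the synthetic masks are assumptions made for illustration, not part of the course dataset.

```python
import numpy as np

def error_map(gt, pred):
    """Return an RGB error image and pixel counts for two binary masks.

    Colors (an arbitrary choice for this sketch):
    green = true positive, red = false positive, blue = false negative.
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)

    tp = gt & pred      # object pixels found by the prediction
    fp = ~gt & pred     # predicted pixels with no ground truth (extra detections)
    fn = gt & ~pred     # ground-truth pixels the prediction missed

    rgb = np.zeros((*gt.shape, 3), dtype=float)
    rgb[..., 0] = fp
    rgb[..., 1] = tp
    rgb[..., 2] = fn
    counts = {"tp": int(tp.sum()), "fp": int(fp.sum()), "fn": int(fn.sum())}
    return rgb, counts

# Synthetic masks (hypothetical data): a 4x4 object and a copy shifted by one pixel
gt_demo = np.zeros((8, 8), dtype=bool)
gt_demo[2:6, 2:6] = True
pred_demo = np.zeros((8, 8), dtype=bool)
pred_demo[3:7, 3:7] = True

rgb_demo, counts_demo = error_map(gt_demo, pred_demo)
print(counts_demo)  # {'tp': 9, 'fp': 7, 'fn': 7}
```

The returned `rgb_demo` array can be passed to `plt.imshow` to view the error map alongside the original image.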

### Hands-on Coding
We will overlay prediction and ground truth masks in different colors.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread

# Load grayscale image and masks
image = imread("cell_image.tif")
gt_mask = imread("ground_truth_mask.tif") > 0
pred_mask = imread("predicted_mask.tif") > 0

# Overlay visualization (assumes a 2D grayscale image and masks of the same shape)
def show_overlay(image, gt, pred):
    overlay = np.zeros((*image.shape, 3))
    overlay[..., 0] = np.where(pred, 1.0, 0.0)   # Red: prediction
    overlay[..., 1] = np.where(gt, 1.0, 0.0)     # Green: ground truth
    # Blue: background image scaled to [0, 1]; an all-zero image stays black
    peak = image.max()
    overlay[..., 2] = image / peak if peak > 0 else 0.0

    plt.figure(figsize=(6, 6))
    plt.imshow(overlay)
    plt.title("Overlay: Green=GT, Red=Prediction, Yellow=Overlap")
    plt.axis('off')
    plt.show()

show_overlay(image, gt_mask, pred_mask)


### Exercise
Use a sample image and:
- Overlay your segmentation result.
- Identify 3 major areas of mismatch (e.g., false positives or false negatives).
- Take a screenshot and write down your observations.

---
## Quantitative Metrics – Dice Score and IoU
### Dice Coefficient (Dice Score)

The Dice coefficient is a measure of overlap between two binary masks:

$$
\text{Dice}(A, B) = \frac{2 \times |A \cap B|}{|A| + |B|}
$$

Where:
- $A$: Ground truth mask (binary)
- $B$: Predicted mask (binary)
- $|A \cap B|$: Number of overlapping pixels between $A$ and $B$ (true positives)
- $|A|$: Number of pixels in ground truth
- $|B|$: Number of pixels in prediction
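
As a worked example with made-up numbers, suppose the ground truth covers 16 pixels, the prediction covers 16 pixels, and 9 pixels are shared by both masks. Substituting into the definition gives:

$$
\text{Dice}(A, B) = \frac{2 \times 9}{16 + 16} = \frac{18}{32} = 0.5625
$$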

---

### Intersection over Union (IoU or Jaccard Index)

IoU is the ratio of the intersection area to the union area of the two masks:

$$
\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

Where:
- $|A \cup B| = |A| + |B| - |A \cap B|$
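
For instance, with hypothetical counts $|A| = 16$, $|B| = 16$ and $|A \cap B| = 9$, the union and the resulting IoU are:

$$
|A \cup B| = 16 + 16 - 9 = 23, \qquad \text{IoU}(A, B) = \frac{9}{23} \approx 0.391
$$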

---

### Relationship Between Dice and IoU

$$
\text{Dice} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}}
$$

This shows that Dice is a strictly increasing function of IoU: for any single pair of masks, a higher IoU always means a higher Dice, and only the numerical values differ.
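
The identity can be checked from the two definitions. A short derivation, using only $|A| + |B| = |A \cup B| + |A \cap B|$:

$$
\text{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,|A \cap B|}{|A \cup B| + |A \cap B|} = \frac{2 \cdot \frac{|A \cap B|}{|A \cup B|}}{1 + \frac{|A \cap B|}{|A \cup B|}} = \frac{2 \cdot \text{IoU}}{1 + \text{IoU}}
$$

The third step divides the numerator and denominator by $|A \cup B|$, which assumes the union is not empty.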

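Both metrics range from 0 (no overlap) to 1 (perfect agreement); higher values mean better segmentation.

For the same pair of masks, IoU is never larger than Dice, so IoU is the stricter of the two: partial overlap is penalized more heavily. For example, an IoU of 0.5 corresponds to a Dice score of about 0.67.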
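
To make the gap between the two metrics concrete, the sketch below converts between them over a range of values. The helper names `dice_from_iou` and `iou_from_dice` are hypothetical; the second is simply the first formula solved for IoU.

```python
def dice_from_iou(iou):
    """Convert IoU to Dice for the same pair of masks."""
    return 2 * iou / (1 + iou)

def iou_from_dice(dice):
    """Inverse conversion: IoU = Dice / (2 - Dice)."""
    return dice / (2 - dice)

iou_values = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]
dice_values = [dice_from_iou(v) for v in iou_values]

for iou, dice in zip(iou_values, dice_values):
    # The gap is zero at 0 and 1 and largest for intermediate overlap
    print(f"IoU = {iou:.2f}  ->  Dice = {dice:.3f}  (gap {dice - iou:.3f})")
```

The two scores coincide only at 0 and 1, so a threshold such as "0.8 is acceptable" means different things depending on which metric is reported.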

### Hands-on Coding

In [None]:
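def dice_score(gt, pred):
    """Dice coefficient of two boolean masks of the same shape."""
    intersection = np.logical_and(gt, pred).sum()
    total = gt.sum() + pred.sum()
    # Both masks empty: treated here as perfect agreement (a convention)
    return 2 * intersection / total if total > 0 else 1.0

def iou_score(gt, pred):
    """Intersection over Union (Jaccard index) of two boolean masks."""
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    # Empty union means both masks are empty: same convention as above
    return intersection / union if union > 0 else 1.0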

# Compute scores
dice = dice_score(gt_mask, pred_mask)
iou = iou_score(gt_mask, pred_mask)

print(f"Dice Score: {dice:.3f}")
print(f"IoU Score: {iou:.3f}")

### Exercise
- Load any two masks (ground truth and prediction).
- Compute Dice and IoU scores.
- Compare the two values and note which one is lower.
- Change one object in the predicted mask and observe how the scores change.
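
One way to approach the last step without real data is sketched below: a synthetic ground truth with two square objects, and a prediction that misses one of them entirely. The masks are hypothetical, and the two score functions are repeated only so the block runs on its own.

```python
import numpy as np

def dice_score(gt, pred):
    intersection = np.logical_and(gt, pred).sum()
    return 2 * intersection / (gt.sum() + pred.sum())

def iou_score(gt, pred):
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return intersection / union

# Synthetic ground truth with two 6x6 objects (hypothetical data)
gt = np.zeros((20, 20), dtype=bool)
gt[2:8, 2:8] = True
gt[12:18, 12:18] = True

pred_full = gt.copy()                 # perfect prediction
pred_missing = gt.copy()
pred_missing[12:18, 12:18] = False    # prediction misses the second object

for name, pred in [("full", pred_full), ("one object missing", pred_missing)]:
    print(f"{name}: Dice = {dice_score(gt, pred):.3f}, IoU = {iou_score(gt, pred):.3f}")
```

Missing one of two equally sized objects lowers Dice to about 0.667 and IoU to 0.5, which shows how a single missed object affects the two scores differently.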

---
## Mini Project: Evaluate Segmentation on a Dataset
**Goal:**
Evaluate segmentation quality across a whole dataset, given a folder with:
- Original images
- Ground truth masks
- Predicted masks

**Tasks:**
- Visual inspection for 3 samples (with overlays)
- Compute Dice and IoU scores for all samples
- Save results to a CSV file

**Summarize:** Which images had the worst performance and why?

In [None]:
import os
import pandas as pd

image_dir = "dataset/images/"
gt_dir = "dataset/ground_truth/"
pred_dir = "dataset/predictions/"

results = []

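# Assumes each prediction has the same filename as its ground truth mask
for fname in sorted(os.listdir(gt_dir)):
    pred_path = os.path.join(pred_dir, fname)
    if not os.path.isfile(pred_path):
        continue  # skip entries without a matching prediction
    gt = imread(os.path.join(gt_dir, fname)) > 0
    pred = imread(pred_path) > 0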
    
    dice = dice_score(gt, pred)
    iou = iou_score(gt, pred)
    
    results.append({"filename": fname, "dice": dice, "iou": iou})

df = pd.DataFrame(results)
df.to_csv("segmentation_evaluation.csv", index=False)
print(df.sort_values(by="dice"))
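
For the summary question, it can help to pull out the lowest-scoring files and the dataset averages from the results table. The sketch below assumes a table with the same `filename`, `dice` and `iou` columns; the helper `summarize` and the numbers in `demo` are made up purely to show the calls.

```python
import pandas as pd

def summarize(df, n_worst=3):
    """Return mean scores and the n rows with the lowest Dice."""
    means = df[["dice", "iou"]].mean()
    worst = df.nsmallest(n_worst, "dice")
    return means, worst

# Made-up scores for illustration only (not real results)
demo = pd.DataFrame({
    "filename": ["a.tif", "b.tif", "c.tif", "d.tif"],
    "dice": [0.90, 0.50, 0.80, 0.70],
})
demo["iou"] = demo["dice"] / (2 - demo["dice"])  # IoU implied by each Dice value

means, worst = summarize(demo, n_worst=2)
print(means)
print(worst)
```

The files listed in `worst` are the natural candidates for a closer look with the overlay visualization.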

---
## Module Summary
| Concept             | What You Learned                                                           |
| ------------------- | -------------------------------------------------------------------------- |
| Visual Inspection   | Overlay GT and predicted masks to identify mismatches visually             |
| Dice Score          | Overlap relative to the summed size of both masks; ignores background      |
| IoU (Jaccard)       | Overlap relative to the union; never higher than Dice for the same masks   |
| Hands-on Evaluation | Use Python to load masks, compute scores, and generate overlays            |
| Mini Project        | Combine visual + quantitative evaluations to identify segmentation quality |
