In [4]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# **Image Classification**

## 1. Introduction

**Image Classification** is a fundamental task in computer vision where the goal is to assign a label (or category) to an input image.
For example:

* Classifying an image as “cat” or “dog”.
* Detecting the presence of diseases in medical scans.

Formally, given an image $x$, the model predicts the class label $y \in \{1, 2, \dots, K\}$, where $K$ is the number of categories.


## 2. Core Idea

An image classification model learns to map pixel patterns to class labels:
$$
f_\theta(x) \rightarrow y
$$
where $f_\theta$ is a neural network parameterized by weights $\theta$.

The model minimizes a **loss function** (e.g., cross-entropy) to make predictions as close as possible to the true labels.


## 3. Workflow Overview

1. **Data Preparation**

   * Collect a labeled dataset (e.g., CIFAR-10, MNIST, ImageNet).
   * Split into training, validation, and test sets.
   * Apply preprocessing (resize, normalization, augmentation).

2. **Model Design**

   * Choose a neural network architecture (e.g., CNN, ResNet, EfficientNet).
   * Define input shape (e.g., 224×224×3 for RGB images).

3. **Training**

   * Feed batches of images to the network.
   * Compute the **loss** between predictions and true labels.
   * Update weights via **backpropagation** using an optimizer (SGD, Adam).

4. **Evaluation**

   * Use accuracy, precision, recall, and F1-score on test data.

5. **Deployment**

   * Save the trained model and use it for inference on new images.
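
The data preparation step of the workflow can be sketched with NumPy alone. This is a minimal illustration, not a prescribed pipeline: the helper name `split_indices`, the 80/10/10 split, and the random stand-in images are all assumptions made for the example.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and split them into train/val/test (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx

# Stand-in for a real dataset: 100 random 32x32 RGB images with uint8 pixels.
images = np.random.default_rng(1).integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

train_idx, val_idx, test_idx = split_indices(len(images))

# Normalization: scale pixel values from [0, 255] to [0, 1].
x_train = images[train_idx].astype("float32") / 255.0
```

Shuffling before splitting avoids accidental ordering effects, such as all images of one class ending up in the test set.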


## 4. Common Architectures

| Model            | Key Features                                     | Notable Use                         |
| ---------------- | ------------------------------------------------ | ----------------------------------- |
| **LeNet-5**      | Early CNN; simple architecture for MNIST         | Digit recognition                   |
| **AlexNet**      | Deep CNN with ReLU and Dropout                   | ImageNet 2012                       |
| **VGGNet**       | Uniform 3×3 conv layers                          | Transfer learning                   |
| **ResNet**       | Residual connections to mitigate vanishing gradients | General purpose                     |
| **EfficientNet** | Compound scaling of depth, width, resolution     | High accuracy with fewer parameters |


In [5]:
import warnings
warnings.filterwarnings('ignore')

In [6]:
## 5. Example: Basic CNN in Keras

import tensorflow as tf
from tensorflow.keras import layers, models

# Load dataset (CIFAR-10)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model (training runs in a later cell)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


In [7]:
model.summary()

In [8]:
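# Note: the test set doubles as validation data here for brevity; hold out a separate validation split when tuning.
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))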

Epoch 1/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 42s 25ms/step - accuracy: 0.3440 - loss: 1.7669 - val_accuracy: 0.5330 - val_loss: 1.2978
Epoch 2/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 36s 23ms/step - accuracy: 0.5765 - loss: 1.1915 - val_accuracy: 0.6250 - val_loss: 1.0535
Epoch 3/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 36s 23ms/step - accuracy: 0.6374 - loss: 1.0255 - val_accuracy: 0.6444 - val_loss: 1.0270
Epoch 4/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 35s 23ms/step - accuracy: 0.6842 - loss: 0.9052 - val_accuracy: 0.6807 - val_loss: 0.9001
Epoch 5/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 35s 22ms/step - accuracy: 0.7092 - loss: 0.8303 - val_accuracy: 0.6868 - val_loss: 0.8916
Epoch 6/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 34s 22ms/step - accuracy: 0.7318 - loss: 0.7718 - val_accuracy: 0.6865 - val_loss: 0.8982
Epoc

<keras.src.callbacks.history.History at 0x7a0d49efe710>

## 6. Key Loss Function

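For multi-class classification, we use **Categorical Cross-Entropy**:
$$
L = -\sum_{i=1}^{K} y_i \log(\hat{y}_i)
$$
where

* $y_i$ = true label (one-hot encoded)
* $\hat{y}_i$ = predicted probability from softmax

The Keras example in Section 5 uses `sparse_categorical_crossentropy`, which computes the same quantity from integer class labels instead of one-hot vectors.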
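
As a worked example of the formula, the sketch below computes softmax probabilities and the cross-entropy for a single sample in NumPy. The logits and the helper names `softmax` and `categorical_cross_entropy` are illustrative assumptions; the small `eps` term is added only to avoid `log(0)`.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_cross_entropy(y_true_one_hot, y_pred, eps=1e-12):
    # L = -sum_i y_i * log(y_hat_i); eps guards against log(0).
    return -np.sum(y_true_one_hot * np.log(y_pred + eps))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for K = 3 classes
y_true = np.array([1.0, 0.0, 0.0])   # one-hot label: the true class is index 0

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
print(probs, loss)
```

Because the label is one-hot, only the term for the true class survives the sum, so the loss reduces to the negative log of the probability assigned to the correct class.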


## 7. Performance Metrics

* **Accuracy**: $\frac{\text{Correct Predictions}}{\text{Total Predictions}}$
* **Precision & Recall**: Useful for imbalanced datasets.
* **Confusion Matrix**: Visualizes prediction vs true class.
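
The metrics above can all be derived from a confusion matrix. The following is a minimal NumPy sketch with made-up labels; the function names are illustrative, and rows are assumed to index the true class while columns index the predicted class.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # cm[t, p] counts samples whose true class is t and predicted class is p.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_precision_recall(cm):
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Hypothetical labels for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
precision, recall = per_class_precision_recall(cm)
```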


## 8. Techniques to Improve Accuracy

* **Data Augmentation**: Flip, rotate, crop, color jitter.
* **Transfer Learning**: Use pre-trained models (e.g., ResNet, MobileNet).
* **Regularization**: Dropout, L2 weight decay.
* **Batch Normalization**: Stabilizes and speeds up training.
* **Learning Rate Scheduling**: Gradually reduce learning rate.
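
To make the data augmentation item concrete, here is a small NumPy sketch of a random horizontal flip followed by a pad-and-crop. The function name `augment`, the padding of 4 pixels, and the 50% flip probability are assumptions chosen for illustration; frameworks such as Keras provide their own augmentation utilities.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Randomly flip and crop an (H, W, C) image; output shape equals input shape."""
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop: reflect-pad the borders, then cut a window of the original size.
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
augmented = augment(image, rng)
```

Augmentation is applied only to training images, so the model sees a slightly different version of each image every epoch while validation and test data stay unchanged.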

## 9. Real-World Applications

* Facial recognition
* Object detection
* Medical imaging
* Quality inspection in manufacturing
* Satellite image analysis


# **Image Classification with Localization**

## 1. Introduction

**Image Classification with Localization** extends simple classification by not only identifying *what* is in an image but also *where* it is.
The goal is to predict both:

1. The **class label** of the object.
2. The **bounding box coordinates** that locate the object in the image.

Formally, given an input image $x$, the model predicts:
$$
\hat{y} = [p_1, p_2, \dots, p_K, b_x, b_y, b_w, b_h]
$$
where:

* $p_1, p_2, \dots, p_K$ are class probabilities (via softmax).
* $b_x, b_y, b_w, b_h$ define the bounding box (center coordinates, width, height).
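
A short sketch of how such an output vector could be unpacked, assuming $K = 3$ classes and hypothetical prediction values with box coordinates normalized to $[0, 1]$:

```python
import numpy as np

K = 3  # assumed number of classes for this example

# Hypothetical model output: K class probabilities followed by 4 box values.
y_hat = np.array([0.1, 0.7, 0.2, 0.5, 0.4, 0.3, 0.6])

class_probs, box = y_hat[:K], y_hat[K:]
predicted_class = int(np.argmax(class_probs))  # index of the most likely class
b_x, b_y, b_w, b_h = box                       # center x, center y, width, height
```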


## 2. Difference from Pure Classification

| Task                                 | Output               | Example Output                      |
| ------------------------------------ | -------------------- | ----------------------------------- |
| **Image Classification**             | Label only           | “Dog”                               |
| **Classification with Localization** | Label + Bounding Box | (“Dog”, [x=75, y=40, w=120, h=150]) |

This means the model learns both *semantic* (what) and *spatial* (where) information.


## 3. Problem Setup

1. **Input:** Image (e.g., 224×224×3)
2. **Output:**

   * **Class scores:** vector of length $K$ (for $K$ classes)
   * **Bounding box:** vector of 4 numbers $[x, y, w, h]$
3. **Model Type:** CNN backbone with multiple output heads.


## 4. Model Architecture

A typical model has:

* A **feature extractor** (like ResNet, VGG).
* Two **output heads**:

  1. **Classification head:** predicts object class.
  2. **Regression head:** predicts bounding box coordinates.

Example structure:

```
Input Image
     ↓
Convolutional Layers (feature extraction)
     ↓
Flatten / Global Pooling
     ↓
 ┌──────────────┬──────────────┐
 │ Classification │ Localization │
 │ (Softmax)      │ (Linear)     │
 └──────────────┴──────────────┘
```


## 5. Combined Loss Function

The model optimizes **both classification and localization losses** together.

$$
L = L_{cls} + \lambda L_{loc}
$$

Where:

* $L_{cls}$ = categorical cross-entropy (for class prediction)
* $L_{loc}$ = mean squared error or smooth L1 loss (for bounding box regression)
* $\lambda$ = balancing weight between the two losses

Example (mean squared error over $N$ training examples):
$$
L_{loc} = \frac{1}{N} \sum_{i=1}^{N} \| b_i - \hat{b}_i \|^2
$$
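
The combined loss can be written out directly in NumPy. This is a sketch under stated assumptions: the function name `combined_loss` and the sample values are invented for illustration, the classification term is averaged over the batch, and the localization term follows the squared-norm formula above (Keras' built-in `'mse'` additionally averages over the four coordinates, so its value differs by a constant factor).

```python
import numpy as np

def combined_loss(cls_true, cls_pred, box_true, box_pred, lam=1.0, eps=1e-12):
    """Return (L, L_cls, L_loc) for batches of shape (N, K) and (N, 4)."""
    # L_cls: categorical cross-entropy, averaged over the N examples.
    l_cls = np.mean(-np.sum(cls_true * np.log(cls_pred + eps), axis=1))
    # L_loc: (1/N) * sum_i ||b_i - b_hat_i||^2.
    l_loc = np.mean(np.sum((box_true - box_pred) ** 2, axis=1))
    return l_cls + lam * l_loc, l_cls, l_loc

# One hypothetical example with K = 2 classes.
cls_true = np.array([[1.0, 0.0]])
cls_pred = np.array([[0.5, 0.5]])
box_true = np.array([[0.5, 0.5, 0.2, 0.2]])
box_pred = np.array([[0.5, 0.5, 0.2, 0.4]])

total, l_cls, l_loc = combined_loss(cls_true, cls_pred, box_true, box_pred, lam=2.0)
```

Raising `lam` makes the optimizer prioritize box accuracy over class accuracy; a suitable value depends on the relative scale of the two terms.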

In [9]:

## 6. Example Implementation (TensorFlow/Keras)

import tensorflow as tf
from tensorflow.keras import layers, models, losses

# Base feature extractor
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, input_shape=(224,224,3))
x = layers.GlobalAveragePooling2D()(base_model.output)

# Classification head
class_output = layers.Dense(10, activation='softmax', name='class_output')(x)

# Localization head
bbox_output = layers.Dense(4, activation='linear', name='bbox_output')(x)

# Combine into one model
model = models.Model(inputs=base_model.input, outputs=[class_output, bbox_output])

# Compile with multiple losses
model.compile(
    optimizer='adam',
    loss={
        'class_output': 'categorical_crossentropy',
        'bbox_output': 'mse'
    },
    loss_weights={
        'class_output': 1.0,
        'bbox_output': 1.0
    },
    metrics={'class_output': 'accuracy'}
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
[1m94765736/94765736[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 0us/step


In [None]:

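# X_train, y_classes (one-hot labels), and y_boxes (normalized [x, y, w, h]) are placeholders for your own arrays
model.fit(X_train, {'class_output': y_classes, 'bbox_output': y_boxes}, epochs=10)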



## 7. Bounding Box Formats

There are two common formats:

1. **(x_min, y_min, x_max, y_max)** — coordinates of the box corners.
2. **(x_center, y_center, width, height)** — center and size representation.

Always normalize coordinates to [0,1] relative to image dimensions during training.
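
Converting between the two formats and normalizing by the image size takes only a few lines. The helper names below are illustrative and not part of any library.

```python
def corners_to_center(x_min, y_min, x_max, y_max):
    """(x_min, y_min, x_max, y_max) -> (x_center, y_center, width, height)."""
    w = x_max - x_min
    h = y_max - y_min
    return (x_min + w / 2, y_min + h / 2, w, h)

def center_to_corners(x_center, y_center, w, h):
    """(x_center, y_center, width, height) -> (x_min, y_min, x_max, y_max)."""
    return (x_center - w / 2, y_center - h / 2, x_center + w / 2, y_center + h / 2)

def normalize_box(box, img_w, img_h):
    """Scale a center-format box in pixels to [0, 1] relative to the image size."""
    x, y, w, h = box
    return (x / img_w, y / img_h, w / img_w, h / img_h)

center_box = corners_to_center(10, 20, 50, 100)            # pixel coordinates
normalized = normalize_box(center_box, img_w=200, img_h=200)
```

Mixing up the two formats is a common source of silent bugs, so it helps to record the format next to every box array.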


## 8. Dataset Examples

Datasets that support localization (labels + bounding boxes):

* **PASCAL VOC**
* **MS COCO**
* **Open Images Dataset**

Each provides annotations in XML or JSON formats containing class names and box coordinates.


## 9. Evaluation Metrics

* **Classification:** Accuracy, Precision, Recall, F1-score
* **Localization:** IoU (Intersection over Union)

$$
\mathrm{IoU} = \frac{\mathrm{Area}(Box_{pred} \cap Box_{true})}{\mathrm{Area}(Box_{pred} \cup Box_{true})}
$$

An IoU > 0.5 is often considered a correct localization.

## 10. Extensions

Image classification with localization forms the basis for more advanced tasks:

* **Object Detection:** multiple objects per image (e.g., YOLO, SSD, Faster R-CNN)
* **Instance Segmentation:** pixel-level object boundaries (e.g., Mask R-CNN)


## 11. Summary

| Aspect           | Description                                           |
| ---------------- | ----------------------------------------------------- |
| **Goal**         | Predict class + bounding box                          |
| **Input**        | Image                                                 |
| **Output**       | Class probabilities + 4 box coordinates               |
| **Loss**         | Classification + Localization (MSE or Smooth L1)      |
| **Metrics**      | Accuracy, IoU                                         |
| **Applications** | Object tracking, medical imaging, autonomous vehicles |



# **Object Detection**

## 1. Introduction

**Object Detection** is a computer vision task that identifies **what objects** are present in an image and **where** they are located.
Unlike simple classification or single-object localization, object detection can handle **multiple objects** of different classes in the same image.

**Goal:**
For a given image $I$, predict a set of bounding boxes $B_i$ and class labels $C_i$ for all objects:
$$
\hat{y} = \{ (B_1, C_1), (B_2, C_2), \dots, (B_n, C_n) \}
$$


## 2. Comparison with Related Tasks

| Task                     | Output                           | Example                      |
| ------------------------ | -------------------------------- | ---------------------------- |
| **Image Classification** | One label per image              | “Cat”                        |
| **Localization**         | One label + one bounding box     | (“Cat”, [x, y, w, h])        |
| **Object Detection**     | Multiple labels + bounding boxes | (“Cat”, Box1), (“Dog”, Box2) |


## 3. Core Concepts

Object detection combines **classification** and **localization**:

1. **Classification** — what the object is.
2. **Bounding Box Regression** — where the object is.

Each detected object is represented by:
$$
[b_x, b_y, b_w, b_h, p_1, p_2, \dots, p_K]
$$
where:

* $b_x, b_y, b_w, b_h$: bounding box coordinates.
* $p_i$: class probabilities.


## 4. Types of Object Detection Models

Object detection models are typically categorized into two main families:

### 4.1. **Two-Stage Detectors**

These methods first generate *region proposals* and then classify each one.

| Model                   | Description                                                            |
| ----------------------- | ---------------------------------------------------------------------- |
| **R-CNN (2014)**        | Extracts region proposals, runs CNN on each region. Slow but accurate. |
| **Fast R-CNN (2015)**   | Runs CNN once per image, uses RoI pooling to extract features per proposal. |
| **Faster R-CNN (2015)** | Adds a Region Proposal Network (RPN) for efficiency.                   |

Two-stage detectors tend to be more accurate but are computationally more expensive than single-stage detectors.


### 4.2. **Single-Stage Detectors**

These models predict bounding boxes and classes in a **single forward pass**.

| Model                                   | Description                                                          |
| --------------------------------------- | -------------------------------------------------------------------- |
| **YOLO (You Only Look Once)**           | Divides image into grid cells and predicts boxes + classes directly. |
| **SSD (Single Shot Multibox Detector)** | Predicts multiple boxes at different scales from feature maps.       |
| **RetinaNet**                           | Introduces focal loss to handle class imbalance.                     |

Single-stage models are faster and suitable for real-time detection.


## 5. Model Architecture (Example: YOLO)

**YOLO Architecture Overview:**

```
Input Image (e.g., 416x416x3)
      ↓
Convolutional Backbone (e.g., Darknet53)
      ↓
Feature Maps
      ↓
Detection Head
   ├── Bounding Box (x, y, w, h)
   ├── Objectness Score
   └── Class Probabilities
```

The image is divided into an $S \times S$ grid.
Each grid cell predicts:

* $B$ bounding boxes
* A confidence score (objectness) for each box
* Class probabilities
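
The grid layout can be illustrated with a sketch of the output tensor shape. It assumes the YOLOv1-style layout, in which each cell holds $B$ boxes of five values (x, y, w, h, confidence) plus one shared set of $K$ class probabilities; later YOLO versions attach class scores to each box instead. The values $S = 7$, $B = 2$, $K = 20$ and the helper `responsible_cell` are illustrative.

```python
import numpy as np

S, B, K = 7, 2, 20  # grid size, boxes per cell, number of classes (assumed values)

# Each cell stores B * (x, y, w, h, confidence) followed by K class probabilities.
output = np.zeros((S, S, B * 5 + K))

def responsible_cell(x_center, y_center, S):
    """Return (row, col) of the grid cell containing a normalized box center."""
    col = min(int(x_center * S), S - 1)  # clamp so that x = 1.0 stays in the grid
    row = min(int(y_center * S), S - 1)
    return row, col

row, col = responsible_cell(0.5, 0.5, S)  # an object centered in the image
```

During training, only the cell that contains an object's center is made responsible for predicting that object.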



## 6. Loss Function

The overall loss combines **localization**, **confidence**, and **classification** losses.

$$
L = \lambda_{coord} L_{coord} + L_{conf} + L_{class}
$$

* $L_{coord}$: bounding box regression loss (Smooth L1 / MSE)
* $L_{conf}$: objectness loss
* $L_{class}$: categorical cross-entropy
* $\lambda_{coord}$: weight for coordinate loss



## 7. Evaluation Metrics

### **Intersection over Union (IoU)**

Measures how much the predicted box overlaps with the ground truth.

$$
\mathrm{IoU} = \frac{\mathrm{Area}(B_{pred} \cap B_{true})}{\mathrm{Area}(B_{pred} \cup B_{true})}
$$

A prediction is considered correct if IoU > 0.5 (common threshold).
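
For axis-aligned boxes the IoU is straightforward to compute. The sketch below assumes boxes in corner format `(x_min, y_min, x_max, y_max)`; the function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
is_correct = overlap > 0.5                 # threshold from the text
```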



### **Mean Average Precision (mAP)**

The standard metric for object detection:

1. Compute precision-recall curve for each class.
2. Compute **Average Precision (AP)** for each class.
3. Take the mean across all classes → **mAP**.

$$
mAP = \frac{1}{K} \sum_{i=1}^{K} AP_i
$$
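
The three steps can be sketched as follows. This is a simplified, non-interpolated AP (precision summed over each increase in recall); benchmarks such as PASCAL VOC and COCO use interpolated variants, so their numbers will differ. The detection scores and true-positive flags are invented, and each detection is assumed to have already been matched against the ground truth with an IoU threshold.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated AP for one class from scored, pre-matched detections."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / num_ground_truth
    # AP = sum over detections of (increase in recall) * (precision at that point).
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * precision))

# Hypothetical detections for two classes.
ap_cat = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_ground_truth=2)
ap_dog = average_precision([0.6], [1], num_ground_truth=1)

mean_ap = float(np.mean([ap_cat, ap_dog]))
```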

In [None]:

## 8. Example Implementation (Using YOLOv8 and Ultralytics)

# Install ultralytics if not already installed
!pip install ultralytics

from ultralytics import YOLO

# Load a pretrained YOLOv8 model
model = YOLO('yolov8s.pt')

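# Perform object detection (calling the model returns a list of Results, one per image)
results = model('image.jpg')

# Visualize the detections for the first (and only) image
results[0].show()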


# You can fine-tune YOLO on a custom dataset by:


model.train(data='data.yaml', epochs=50)


## 9. Popular Datasets

| Dataset         | Description                              |
| --------------- | ---------------------------------------- |
| **COCO**        | 80 object categories, 330K images        |
| **PASCAL VOC**  | 20 categories, classic benchmark         |
| **Open Images** | 9M images with bounding boxes and labels |
| **KITTI**       | Commonly used for autonomous driving     |



## 10. Practical Applications

* Autonomous Vehicles (detecting pedestrians, vehicles, signs)
* Security Systems (person or object recognition)
* Medical Imaging (detecting tumors or anomalies)
* Retail Analytics (product detection on shelves)
* Robotics (object tracking and manipulation)


## 11. Summary

| Aspect           | Description                                      |
| ---------------- | ------------------------------------------------ |
| **Goal**         | Detect and localize multiple objects in an image |
| **Output**       | Multiple bounding boxes + class labels           |
| **Models**       | Faster R-CNN, YOLO, SSD, RetinaNet               |
| **Metrics**      | IoU, mAP                                         |
| **Approach**     | Combines classification + localization           |
| **Applications** | Real-time systems, autonomous tech, security     |


## 12. Next Step

Once object detection is clear, the natural progression is:

* **Instance Segmentation:** pixel-level detection (Mask R-CNN)
* **Semantic Segmentation:** classifying every pixel in the image


# **Image Segmentation**

## 1. Introduction

**Image Segmentation** is the process of partitioning an image into multiple regions or segments to simplify its representation and make it more meaningful.
The goal is to assign a **label to every pixel** in the image so that pixels with the same label share similar characteristics (such as color, texture, or object identity).

Mathematically, for an image $I$ of size $H \times W$, segmentation produces an output mask $M$ where:
$$
M_{ij} \in \{1, 2, \dots, K\}
$$
Here, $K$ is the number of classes, and each pixel $(i, j)$ gets a class label.
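
In practice a segmentation network outputs per-pixel class probabilities, and the mask is obtained by taking the most likely class at each pixel. The sketch below uses random probabilities as a stand-in for a model output; note that NumPy labels run from 0 to K-1 rather than 1 to K.

```python
import numpy as np

H, W, K = 4, 4, 3  # assumed image size and number of classes

# Stand-in for a network output: random per-pixel scores normalized to sum to 1.
rng = np.random.default_rng(0)
probs = rng.random((H, W, K))
probs /= probs.sum(axis=-1, keepdims=True)

# The mask assigns each pixel the index of its highest-probability class.
mask = np.argmax(probs, axis=-1)
```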


## 2. Why Segmentation?

Unlike **object detection**, which predicts bounding boxes around objects, **segmentation** gives a **pixel-accurate** understanding of the image.

| Task           | Output Type         | Example                                      |
| -------------- | ------------------- | -------------------------------------------- |
| Classification | One label per image | “Dog”                                        |
| Detection      | Bounding boxes      | (“Dog”, Box1)                                |
| Segmentation   | Pixel-level masks   | Every pixel labeled as dog, background, etc. |



## 3. Types of Image Segmentation

### 3.1. **Semantic Segmentation**

* Groups pixels belonging to the same class.
* All objects of the same type share one label.
* Example: all cars → “car” class, regardless of instance.

**Use case:** Satellite imagery, medical scans, scene understanding.


### 3.2. **Instance Segmentation**

* Distinguishes between **different instances** of the same class.
* Example: two people → “person 1” and “person 2”.

**Use case:** Autonomous driving, robotics, video analysis.


### 3.3. **Panoptic Segmentation**

* Combines both **semantic** and **instance** segmentation.
* Produces a complete scene understanding.


## 4. Architecture Overview

### 4.1. **Encoder–Decoder Structure**

Most segmentation networks follow an **encoder–decoder** architecture:

* **Encoder:** extracts spatial and semantic features (similar to a CNN classifier).
* **Decoder:** upsamples features to recover spatial resolution and produce a dense pixel map.

```
Input Image
     ↓
Encoder (e.g., ResNet, VGG)
     ↓
Bottleneck (feature representation)
     ↓
Decoder (upsampling + skip connections)
     ↓
Segmentation Map (pixel labels)
```



## 5. Popular Segmentation Architectures

| Model                                 | Description                                                                          |
| ------------------------------------- | ------------------------------------------------------------------------------------ |
| **FCN (Fully Convolutional Network)** | Replaces fully connected layers with convolutional layers for pixel-wise prediction. |
| **U-Net**                             | Encoder–decoder with skip connections; popular in medical imaging.                   |
| **SegNet**                            | Uses pooling indices for efficient upsampling.                                       |
| **DeepLab (v3, v3+)**                 | Uses atrous (dilated) convolutions and atrous spatial pyramid pooling (ASPP); earlier versions (v1, v2) also used CRF post-processing. |
| **Mask R-CNN**                        | Extends Faster R-CNN to perform instance segmentation.                               |



## 6. Example: U-Net Architecture

U-Net is one of the most widely used models for semantic segmentation.

**Key features:**

* Contracting path (encoder) captures context.
* Expanding path (decoder) recovers spatial details.
* Skip connections combine encoder and decoder features.

In [12]:
from tensorflow.keras import layers, models

def unet_model(input_shape=(128,128,3), num_classes=2):
    inputs = layers.Input(shape=input_shape)

    # Encoder
    c1 = layers.Conv2D(64, (3,3), activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(64, (3,3), activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D((2,2))(c1)

    c2 = layers.Conv2D(128, (3,3), activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(128, (3,3), activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D((2,2))(c2)

    # Bottleneck
    b = layers.Conv2D(256, (3,3), activation='relu', padding='same')(p2)
    b = layers.Conv2D(256, (3,3), activation='relu', padding='same')(b)

    # Decoder
    u1 = layers.Conv2DTranspose(128, (2,2), strides=(2,2), padding='same')(b)
    u1 = layers.concatenate([u1, c2])
    c3 = layers.Conv2D(128, (3,3), activation='relu', padding='same')(u1)
    c3 = layers.Conv2D(128, (3,3), activation='relu', padding='same')(c3)

    u2 = layers.Conv2DTranspose(64, (2,2), strides=(2,2), padding='same')(c3)
    u2 = layers.concatenate([u2, c1])
    c4 = layers.Conv2D(64, (3,3), activation='relu', padding='same')(u2)
    c4 = layers.Conv2D(64, (3,3), activation='relu', padding='same')(c4)

    outputs = layers.Conv2D(num_classes, (1,1), activation='softmax')(c4)

    model = models.Model(inputs, outputs)
    return model

model = unet_model()
model.summary()

## 7. Loss Functions for Segmentation

The choice of loss depends on the nature of the segmentation problem.

| Loss                   | Use Case                            | Formula / Description                              |
| ---------------------- | ----------------------------------- | -------------------------------------------------- |
| **Cross-Entropy Loss** | Multi-class segmentation            | Penalizes incorrect pixel classifications.         |
| **Dice Loss**          | Medical imaging, imbalanced classes | Measures overlap between predicted and true masks. |
| **IoU Loss (Jaccard)** | General segmentation                | Based on Intersection over Union metric.           |
| **Focal Loss**         | Imbalanced datasets                 | Down-weights easy examples.                        |

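Example (Dice coefficient, where $A$ is the set of predicted foreground pixels and $B$ is the set of true foreground pixels):
$$
\mathrm{Dice} = \frac{2 |A \cap B|}{|A| + |B|}
$$
$$
\mathrm{Loss}_{Dice} = 1 - \mathrm{Dice}
$$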
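
For binary masks the Dice coefficient can be computed as below. This is a sketch on hard (0/1) masks with an invented 2×2 example; a training loss would normally use the soft probabilities so that it stays differentiable. The `eps` term is an assumption added to avoid division by zero when both masks are empty.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

pred_mask = np.array([[1, 1], [0, 0]])  # 2 predicted foreground pixels
true_mask = np.array([[1, 0], [0, 0]])  # 1 true foreground pixel

dice = dice_coefficient(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
dice_loss = 1.0 - dice
```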


## 8. Evaluation Metrics

* **Pixel Accuracy**
* **IoU (Intersection over Union)**
* **Dice Coefficient (F1 Score)**
* **Mean IoU (mIoU)** — average IoU over all classes.

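$$
mIoU = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FP_i + FN_i}
$$
where $TP_i$, $FP_i$, and $FN_i$ are the true positive, false positive, and false negative pixel counts for class $i$.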
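
A direct NumPy translation of the mIoU formula, using small invented masks. One design choice here is an assumption rather than a standard: classes that appear in neither mask are skipped instead of counted as zero.

```python
import numpy as np

def mean_iou(pred_mask, true_mask, num_classes):
    """Average per-class IoU = TP / (TP + FP + FN) over classes present in either mask."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred_mask == c) & (true_mask == c))
        fp = np.sum((pred_mask == c) & (true_mask != c))
        fn = np.sum((pred_mask != c) & (true_mask == c))
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both masks
            ious.append(tp / denom)
    return float(np.mean(ious))

true_mask = np.array([[0, 0], [1, 1]])
pred_mask = np.array([[0, 1], [1, 1]])

miou = mean_iou(pred_mask, true_mask, num_classes=2)  # (1/2 + 2/3) / 2
```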


## 9. Example Inference Visualization

After training, predictions can be visualized side by side with the input image and the ground-truth mask.

In [13]:
import matplotlib.pyplot as plt
import numpy as np

def visualize_segmentation(image, mask, pred_mask):
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 3, 1)
    plt.imshow(image)
    plt.title("Original Image")
    plt.axis("off")

    plt.subplot(1, 3, 2)
    plt.imshow(mask, cmap='gray')
    plt.title("Ground Truth Mask")
    plt.axis("off")

    plt.subplot(1, 3, 3)
    plt.imshow(pred_mask, cmap='gray')
    plt.title("Predicted Mask")
    plt.axis("off")

    plt.show()

## 10. Common Datasets

| Dataset                 | Description                              |
| ----------------------- | ---------------------------------------- |
| **PASCAL VOC**          | 20 classes, benchmark dataset            |
| **MS COCO**             | Complex real-world scenes                |
| **Cityscapes**          | Urban street scenes (autonomous driving) |
| **CamVid**              | Road scene understanding                 |
| **ISIC / LUNA / DRIVE** | Medical segmentation datasets            |

---

## 11. Applications

* **Medical Imaging:** Tumor or organ segmentation.
* **Autonomous Driving:** Lane, vehicle, and pedestrian segmentation.
* **Satellite Imagery:** Land cover classification.
* **Agriculture:** Crop disease detection, field segmentation.
* **AR/VR:** Background removal and object isolation.

---

## 12. Summary

| Aspect            | Description                            |
| ----------------- | -------------------------------------- |
| **Goal**          | Assign a label to every pixel          |
| **Input**         | Image                                  |
| **Output**        | Pixel-level class mask                 |
| **Architectures** | FCN, U-Net, SegNet, DeepLab            |
| **Loss**          | Cross-Entropy, Dice, IoU               |
| **Metrics**       | IoU, Dice, mIoU                        |
| **Applications**  | Medical, automotive, satellite imaging |
