### What is image segmentation?
In an image classification task, the network assigns a single label (or class) to each input image. However, suppose you want to know the shape of an object, or which pixels belong to which object. In this case, you need to assign a class to each pixel of the image; this task is known as segmentation. A segmentation model therefore returns much more detailed information about the image than a classifier does. Image segmentation has many applications, including medical imaging, self-driving cars and satellite imaging.

![images.jpg](attachment:images.jpg)
ref: https://www.tensorflow.org/tutorials/images/segmentation, https://images.app.goo.gl/SHAFFbpg72JaqKfu6
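To make the difference between one label per image and one label per pixel concrete, here is a minimal sketch using NumPy. The random `pixel_logits` array stands in for the raw output of a segmentation network, and averaging the scores to get an image-level label is only a toy stand-in for a real classifier; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, height, width = 3, 4, 5

# Hypothetical per-pixel class scores: one score per class for every pixel.
pixel_logits = rng.normal(size=(height, width, num_classes))

# Classification: a single label for the whole image
# (toy rule: the class with the highest average score).
image_label = int(pixel_logits.mean(axis=(0, 1)).argmax())

# Segmentation: one label per pixel (the highest-scoring class at each pixel).
segmentation_mask = pixel_logits.argmax(axis=-1)

print("image label:", image_label)
print("mask shape:", segmentation_mask.shape)
```

The classifier output is a single integer, while the segmentation mask has the same height and width as the input.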

### Types of image segmentation
1. Semantic segmentation: Semantic segmentation is the simplest type of image segmentation. A semantic segmentation model assigns a semantic class to every pixel, but does not distinguish between separate objects of the same class.

2. Instance segmentation: Instance segmentation inverts the priorities of semantic segmentation: whereas semantic segmentation algorithms predict only semantic classification of each pixel (with no regard for individual instances), instance segmentation delineates the exact shape of each separate object instance.

3. Panoptic segmentation: Panoptic segmentation models both determine semantic classification of all pixels and differentiate each object instance in an image, combining the benefits of both semantic and instance segmentation.

![Screenshot%202024-05-12%20at%2019-58-47%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png](attachment:Screenshot%202024-05-12%20at%2019-58-47%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png)

ref: https://www.ibm.com/topics/image-segmentation
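The three segmentation types above can be compared by looking at the label maps they produce. The sketch below uses hand-written toy arrays; the class names and the `class_id * 1000 + instance_id` panoptic encoding are assumptions chosen for illustration, not a format taken from the referenced material.

```python
import numpy as np

# Semantic map: one class id per pixel.
# Hypothetical classes: 0 = ground, 1 = cat, 2 = sky.
semantic = np.array([
    [2, 2, 2, 2, 2, 2],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: one id per object, 0 where there is no countable object.
# The two cats share a semantic class but get different instance ids.
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic map: every pixel carries both its class and its instance id
# (assumed encoding: class_id * 1000 + instance_id).
panoptic = semantic * 1000 + instance

print(np.unique(semantic))
print(np.unique(instance[instance > 0]))
print(np.unique(panoptic))
```

In the semantic map both cats are indistinguishable, the instance map separates them but says nothing about the sky or ground, and the panoptic map keeps both kinds of information.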

### Traditional image segmentation techniques

1. Thresholding: Thresholding methods create binary images, classifying pixels based on whether their intensity is above or below a given “threshold value”.

![8e2f2a41-5461-410c-8e7a-6fcbefb4a821_6360d967720e3f797e4e79d3_fig1.avif](attachment:8e2f2a41-5461-410c-8e7a-6fcbefb4a821_6360d967720e3f797e4e79d3_fig1.avif)
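As a rough illustration of thresholding, the sketch below builds a binary mask with a hand-picked threshold and then derives a threshold from the intensity histogram. The second function uses Otsu's method, which is one common choice and is not named in the text above; the function names and the toy image are assumptions.

```python
import numpy as np

def threshold_image(gray, threshold):
    """Return a binary mask that is True where intensity exceeds the threshold."""
    return gray > threshold

def otsu_threshold(gray):
    """Pick the 8-bit threshold that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # fraction of pixels with value <= t
    w1 = 1.0 - w0                      # fraction of pixels with value > t
    mu_cum = np.cumsum(prob * levels)  # cumulative mean up to t
    mu_total = mu_cum[-1]
    denom = w0 * w1
    between = np.zeros(256)
    valid = denom > 0                  # skip thresholds that leave a class empty
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / denom[valid]
    return int(between.argmax())

gray = np.array([
    [10,  20, 200],
    [30, 220, 210],
    [15,  25,  35],
], dtype=np.uint8)

manual_mask = threshold_image(gray, threshold=128)
auto_threshold = otsu_threshold(gray)
auto_mask = threshold_image(gray, auto_threshold)
print(manual_mask.astype(int))
print("Otsu threshold:", auto_threshold)
```

On this toy image both masks select the same three bright pixels, because any threshold between the dark and bright groups separates them.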

2. Histograms: Histograms, which plot the frequency of each pixel value in an image, are often used to define thresholds. For example, a histogram can reveal the range of values taken by background pixels, which helps isolate object pixels.

3. Edge detection: Edge detection methods identify the boundaries of objects or classes by detecting discontinuities in brightness or contrast.

![image5-2.png](attachment:image5-2.png)
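One simple way to detect discontinuities in brightness is to estimate the image gradient. The sketch below uses the Sobel operator, a standard gradient filter that the text above does not name, so treat it as one possible choice; the helper name and the toy image are assumptions.

```python
import numpy as np

def sobel_edge_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel filters ('valid' region only)."""
    gray = gray.astype(float)
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # responds to horizontal changes
    ky = kx.T                                  # responds to vertical changes
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Slide the kernels over the image (cross-correlation; the sign flip
    # relative to true convolution does not affect the magnitude).
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

# Toy image: dark left half, bright right half, so one vertical edge.
gray = np.zeros((5, 6), dtype=np.uint8)
gray[:, 3:] = 100

magnitude = sobel_edge_magnitude(gray)
edges = magnitude > 200  # hand-picked cutoff for this toy image
print(magnitude)
```

The response is zero inside the flat regions and large only in the columns that straddle the dark-to-bright boundary.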

4. Watersheds: Watershed algorithms transform images into grayscale, then generate a topographical map in which each pixel’s “elevation” is determined by its brightness. Regions, boundaries and objects can be inferred from where “valleys”, “ridges” and “catchment basins” form.

5. Region-based segmentation: Starting with one or more “seed pixels”, region-growing algorithms group together neighboring pixels with similar characteristics. Algorithms can be agglomerative or divisive.

![acb1201a-e14e-4488-94d7-8ab21680d98b_graph_based_Segmentation.avif](attachment:acb1201a-e14e-4488-94d7-8ab21680d98b_graph_based_Segmentation.avif)
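Region growing from a seed pixel can be sketched as a breadth-first search over neighbouring pixels. The similarity rule used here (absolute intensity difference from the seed value, 4-connected neighbours) and the `tolerance` parameter are assumptions; real implementations use a variety of criteria.

```python
from collections import deque

import numpy as np

def region_grow(gray, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    intensity is within `tolerance` of the seed pixel's intensity."""
    h, w = gray.shape
    seed_value = float(gray[seed])
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not region[nr, nc]:
                if abs(float(gray[nr, nc]) - seed_value) <= tolerance:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region

gray = np.array([
    [10, 12, 90, 11],
    [11, 13, 95, 10],
    [12, 10, 92, 12],
], dtype=np.uint8)

region = region_grow(gray, seed=(0, 0), tolerance=5)
print(region.astype(int))
```

The right-hand column has intensities similar to the seed but is not included, because the bright column in between disconnects it from the growing region. This is the main difference from plain thresholding.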

6. Clustering-based segmentation: Clustering algorithms, an unsupervised learning method, divide visual data into clusters of pixels with similar values. A common variant is K-means clustering, in which k is the number of clusters: pixel values are plotted as data points, and k random points are selected as the initial cluster centers (“centroids”). Each pixel is assigned to the cluster with the nearest (that is, most similar) centroid. Each centroid is then moved to the mean of its cluster, and the assignment and update steps are repeated until the clusters have stabilized.

![0e034664-c804-4965-8b76-0b1e7840e37f_635fad6277408d43e590c1c8_3l1AWRwFQAjP7hJdONXzJjro-je1yq33rM3odPiC0Hhu4X5pAMuRrfcOIVj8GYdzY3vuL2xWQvz5t5NX7frCDWB6pMJlo7MRvOqN5oLBC7FDBDwk9kPmcIsps7nPDVfoSlQ9rKhftel2qYqYnBxrn9.avif](attachment:0e034664-c804-4965-8b76-0b1e7840e37f_635fad6277408d43e590c1c8_3l1AWRwFQAjP7hJdONXzJjro-je1yq33rM3odPiC0Hhu4X5pAMuRrfcOIVj8GYdzY3vuL2xWQvz5t5NX7frCDWB6pMJlo7MRvOqN5oLBC7FDBDwk9kPmcIsps7nPDVfoSlQ9rKhftel2qYqYnBxrn9.avif)
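The K-means loop described above can be sketched in a few lines of NumPy for a grayscale image. This is a minimal illustration rather than a production implementation: the function name and parameters are assumptions, and the initial centroids are drawn from the distinct intensity values so that two centroids never start at the same value.

```python
import numpy as np

def kmeans_segment(gray, k, num_iters=20, seed=0):
    """Cluster pixel intensities into k groups; return label map and centroids."""
    rng = np.random.default_rng(seed)
    pixels = gray.reshape(-1, 1).astype(float)
    # Initialization: k distinct intensity values chosen at random.
    unique_values = np.unique(pixels)
    centroids = rng.choice(unique_values, size=k, replace=False).reshape(k, 1)
    for _ in range(num_iters):
        # Assignment step: each pixel goes to its nearest centroid.
        distances = np.abs(pixels - centroids.T)  # shape (num_pixels, k)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = centroids.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # clusters have stabilized
        centroids = new_centroids
    return labels.reshape(gray.shape), centroids.ravel()

gray = np.array([
    [10,  11, 200],
    [12, 205, 210],
    [13,  14,  15],
], dtype=np.uint8)

labels, centroids = kmeans_segment(gray, k=2)
print(labels)
print(np.sort(centroids))
```

Which cluster receives label 0 depends on the random initialization, so downstream code should rely on the centroid values rather than on the label numbers.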

### Overview of traditional image segmentation techniques
![Screenshot%202024-05-12%20at%2020-09-01%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png](attachment:Screenshot%202024-05-12%20at%2020-09-01%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png)

### Deep learning image segmentation
![Screenshot%202024-05-12%20at%2020-41-10%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png](attachment:Screenshot%202024-05-12%20at%2020-41-10%20Image%20Compression%20%E2%80%93%20An%20Introduction%20-%20Week%206%20-%20Image%20Segmentation.pdf.png)

### Deep learning image segmentation models
Trained on an annotated dataset of images, the neural networks of deep learning image segmentation models discover underlying patterns in visual data and discern the salient features most relevant to classification, detection and segmentation.

Prominent deep learning models used in image segmentation include:

1. Fully Convolutional Networks (FCNs): FCNs, often used for semantic segmentation, are a type of convolutional neural network (CNN) with no fully connected layers. An encoder network passes visual input data through convolutional layers to extract features relevant to segmentation or classification, and compresses (or downsamples) this feature data to remove non-essential information. This compressed data is then fed into decoder layers, which upsample the extracted feature data back to the resolution of the input image and output a segmentation mask.

2. U-Nets: U-Nets modify FCN architecture to reduce data loss during downsampling with skip connections, preserving greater detail by selectively bypassing some convolutional layers as information and gradients move through the neural network. Their name is derived from the U shape of diagrams showing the arrangement of the layers.

3. DeepLab: Like U-Nets, DeepLab is a modified FCN architecture. In addition to skip connections, it uses dilated (or “atrous”) convolution, which enlarges each filter's field of view without increasing its number of parameters and so lets the network keep larger output feature maps.

4. Mask R-CNNs: Mask R-CNNs are a leading model for instance segmentation. Mask R-CNNs combine a region proposal network (RPN) that generates bounding boxes for each potential instance with an FCN-based “mask head” that generates segmentation masks within each confirmed bounding box.

5. Transformers: Inspired by the success of transformer models like GPT and BLOOM in natural language processing, newer models like Vision Transformer (ViT), which use attention mechanisms in place of convolutional layers, have matched or exceeded CNN performance on a range of computer vision tasks.
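The effect of dilated (atrous) convolution, mentioned for DeepLab above, can be seen in one dimension. The sketch below is not DeepLab code; it is a toy NumPy function (the name and signature are assumptions) showing that a 3-tap kernel covers a wider span of the input when gaps are inserted between its taps, with no extra weights.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with (dilation - 1) skipped samples
    between kernel taps. Returns the output and the input span per output."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i, weight in enumerate(kernel):
        start = i * dilation
        out += weight * signal[start:start + out_len]
    return out, span

signal = np.arange(10)
kernel = [1, 1, 1]  # the same three weights in both cases

dense, dense_span = dilated_conv1d(signal, kernel, dilation=1)
atrous, atrous_span = dilated_conv1d(signal, kernel, dilation=2)

print("dense span:", dense_span, "first output:", dense[0])
print("atrous span:", atrous_span, "first output:", atrous[0])
```

With `dilation=2` each output looks at five input positions instead of three while still using only three weights.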