## Segmentation


The goal of segmentation is to identify the boundaries between different objects in an image, and to simplify the representation of the image into meaningful regions that are easier to analyze. This topic is important because:
- it is a basic building block for extracting image features, for example with the convolutional filters in neural networks
- it is the basis for many other image processing tasks such as object detection, object tracking, and image classification

\
Image segmentation can be classified into three categories:
- Semantic segmentation
- Instance segmentation
- Panoptic segmentation

The document will cover the following topics:
- Segmentation as pixel wise classification
    - Probabilistic classification
    - Mixture of Gaussians, EM

- Segmentation as energy minimization
    - Markov Random Fields
    - Energy formulation

- Graph cuts for image segmentation
    - Basic idea
    - s-t Mincut Algorithm
    - Extension to non-binary case



\
Definition of the Problem\
The task is to identify groups of pixels in the input image that belong together. In semantic segmentation, all the pixels that belong to the same category are grouped together. Groups can be formed from semantically meaningful cues such as color similarity.




Segmentation Approaches:
- Unsupervised Clustering
    - Grouping what "looks similar"
- Semantic Segmentation
    - Learn a classifier to assign a semantic class $C_k$ to every pixel


## Segmentation as Pixel-Wise Classification

To define the grouping semantics, we use a feature space. Grayscale image pixels are classified based on intensity similarity in a 1D feature space, while color image pixels are classified based on color value similarity in a 3D feature space.

A filter bank of 24 filters allows us to operate in a 24-dimensional feature space.


\
In its simplest form, the method applies a threshold to the pixel feature (for example, intensity): pixels above the threshold are labeled foreground, and pixels below it are labeled background.
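
The thresholding rule can be sketched in a few lines of plain Python. The image values, the function name `threshold_segment`, and the threshold of 128 are illustrative assumptions, not part of the original notes.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if its intensity exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 3x4 grayscale image with intensities in [0, 255].
image = [
    [12, 40, 200, 220],
    [30, 180, 210, 25],
    [15, 20, 190, 35],
]

# The threshold value is an assumption chosen for this toy image.
mask = threshold_segment(image, threshold=128)
for row in mask:
    print(row)
```

A single global threshold only works when foreground and background intensities are well separated, which is one motivation for the probabilistic view that follows.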

\
Probabilistic Classification
- Bayesian Classification
    - Given a measurement $x$, what semantic class $C_k$ should we assign to a pixel?
    - We must recall Bayes Decision Theory in this section:
        - $P(C_k|x) = \frac{p(x|C_k)p(C_k)}{\sum_{j}p(x|C_j)p(C_j)} $
        - where: 
            - $p(x|C_k)$ is the likelihood of the measurement $x$ having been generated by class $C_k$
            -  $p(C_k)$ is the prior probability of class $C_k$
    - In order to build a classifier, we can try either:
        - Discriminative Methods: directly estimating the posterior
        - Generative Methods: estimating likelihood and prior, and then using the Bayes Decision formula
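
As a minimal sketch of the generative route, the snippet below evaluates Bayes' formula for a single grayscale measurement. It assumes Gaussian class-conditional likelihoods; the class names, priors, means, and standard deviations are made-up values for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian, used here as the likelihood p(x | C_k)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, classes):
    """classes maps a class name to (prior, mean, std); returns P(C_k | x) per class."""
    joint = {k: prior * gaussian_pdf(x, mean, std)
             for k, (prior, mean, std) in classes.items()}
    evidence = sum(joint.values())  # denominator of Bayes' formula
    return {k: value / evidence for k, value in joint.items()}


# Hypothetical class models: (prior, mean intensity, standard deviation).
classes = {
    "background": (0.7, 50.0, 20.0),
    "foreground": (0.3, 180.0, 30.0),
}

post = posteriors(170.0, classes)
label = max(post, key=post.get)  # decide for the class with the highest posterior
print(label, post)
```

A discriminative method would skip the likelihood and prior and fit the posterior directly, for example with logistic regression.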


In image recognition, a machine learning model can be taught to recognize objects. The next section explores how such models learn to classify by finding patterns in data. To explore segmentation further, we first need to look at machine learning data models to better understand the context.


### Machine Learning Data Models

A machine learning data model is a program that expresses the relationships within data and finds patterns in a dataset. When previously unseen data is passed to an ML data model, the model draws meaningful connections between the data points, and this can help us make decisions. For example, in natural language processing, machine learning models can parse and correctly recognize the context behind previously unheard sentences or combinations of words.

Machine learning data models can be classified into two types: generative and discriminative models.

- A generative model focuses on learning the underlying probability distribution of a given dataset. The fundamental idea behind generative models is to create a model that can generate new data points that are statistically similar to the original dataset by sampling them from the learned probability distribution. This model type learns the patterns in the data and can create new, realistic samples from them. This is why the model focuses on learning the probability distribution that generates the data, rather than the classification of the data.
Examples:
    - Image or Face generation with Generative adversarial networks (GANs) 
    - Text generation
    - Anomaly detection
    - Gaussian mixture model learning the parameters of the Gaussian mixture that best fits the data \
&nbsp;

- A discriminative model focuses on learning which x-value maps to which y-value. In other words, it learns the direct mapping between input variables and output labels (that is, the classification of the data) through learned boundaries, without considering the underlying probability distribution of the data. The discriminative model learns to find the decision boundary that separates different classes or categories in the input space. This model type can make predictions on previously unseen data based on conditional probability and can be used for either classification or regression problems.
For example: 
    - A convolutional neural network recognizing what an object in an image is
    - A program that predicts the price of a house based on its features
    - Logistic regression program performing sentiment analysis



<p float="left">
  <img src="Segmentation/img/generativedatamodel.jpg" width=45% />
  <img src="Segmentation/img/discriminativedatamodel.jpg" width=45% />
</p>




### Mixture Model of Gaussian Distribution (MoG)
A mixture model is a statistical model used for representing data that may arise from a mixture of different probability distributions. In simpler terms, it's a way to describe data that might come from several different sources or populations.

To estimate the parameters of a mixture model, we determine the component distributions and their corresponding weights with methods such as the expectation-maximization (EM) algorithm or Bayesian inference.

A Mixture Model of Gaussian Distribution (MoG) is a specific type of mixture model in which the component distributions are Gaussian (also known as normal) distributions. In a Gaussian mixture model, the assumption is that the observed data is generated by a mixture of several Gaussian distributions with different means and variances. It is therefore a generative data model.

The Mixture Model of Gaussian Distribution will be referred to as MoG from here on. An MoG can be expressed as:
$$ p(x) = \sum_{i=1}^{K} \phi_i \mathcal{N}(x|\mu_i, \Sigma_i) $$

where,
- $p(x)$ is the probability density function of the mixture model
- $\phi_i$ is the mixing coefficient (weight) for the $i$-th component, with $\phi_i \ge 0$ and $\sum_{i=1}^{K} \phi_i = 1$
- $\mathcal{N}(x|\mu_i, \Sigma_i)$ represents the Gaussian distribution with mean $\mu_i$ and covariance matrix $\Sigma_i$
- $K$ is the number of components in the mixture
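
To make the formula concrete, here is a small sketch that evaluates $p(x)$ for a one-dimensional MoG, where the covariance matrix reduces to a scalar variance. The two components and their parameters are assumed values, picked to resemble a dark and a bright pixel population.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mog_pdf(x, weights, means, variances):
    """p(x) = sum_i phi_i * N(x | mu_i, sigma_i^2) for a 1D mixture."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture (K = 2); all numbers are illustrative.
weights = [0.6, 0.4]        # mixing coefficients phi_i, must sum to 1
means = [50.0, 180.0]       # mu_i
variances = [400.0, 900.0]  # sigma_i^2

print(mog_pdf(50.0, weights, means, variances))
print(mog_pdf(115.0, weights, means, variances))
```

Because the weights sum to one and each component is a valid density, the mixture itself integrates to one.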

&nbsp;
&nbsp;

### Expectation-Maximization (EM) Algorithm
The Expectation-Maximization (EM) algorithm is a widely used iterative method for maximum likelihood estimation in models with latent variables, such as the MoG, where the component that generated each sample is unknown.

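 - E-Step: softly assign samples to mixture model components by computing the responsibilities:
    - $\gamma_j(x_n) = \frac{\phi_j \mathcal{N}(x_n|\mu_j, \Sigma_j)}{\sum_{k=1}^{K} \phi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)}$
 - M-Step: re-estimate the parameters of each component from the samples, weighted by their responsibilities:
    - $N_j = \sum_{n=1}^{N} \gamma_j(x_n)$ and $\phi_j = \frac{N_j}{N}$
    - $\mu_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_j(x_n) x_n$
    - $\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_j(x_n) (x_n - \mu_j)(x_n - \mu_j)^T$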
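
The following is a minimal sketch of EM for a one-dimensional MoG, not a production implementation. It alternates the E-step (computing responsibilities $\gamma_j(x_n)$) with an M-step that re-estimates weights, means, and variances from the responsibility-weighted samples. The function name `em_mog_1d`, the synthetic data, the initial means, and the variance floor are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_mog_1d(data, init_means, n_iter=50):
    """Fit a 1D mixture of Gaussians with EM, starting from the given means."""
    k, n = len(init_means), len(data)
    means = list(init_means)
    weights = [1.0 / k] * k
    # Assumption: every component starts with the overall data variance.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k

    for _ in range(n_iter):
        # E-step: responsibility of component j for sample x_n.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])

        # M-step: re-estimate parameters from the weighted samples.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, 1e-6)  # floor guards against collapse
    return weights, means, variances


# Synthetic intensities drawn from two known Gaussians (illustrative only).
random.seed(0)
data = ([random.gauss(50, 10) for _ in range(300)]
        + [random.gauss(180, 15) for _ in range(200)])

weights, means, variances = em_mog_1d(data, init_means=[0.0, 255.0])
print(weights, means, variances)
```

A more careful implementation would work with log-probabilities to avoid underflow and would stop once the log-likelihood no longer improves, rather than running a fixed number of iterations.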


User Assisted Image Segmentation
- User marks two regions for foreground and background
- Learn a MoG model for the color values in each region
- Use the models to classify all pixels by deciding for the class with the highest posterior probability
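
The three steps above can be sketched as follows. To keep the example short, each region is modeled with a single Gaussian with independent color channels, that is, an MoG with $K=1$ and a diagonal covariance; a fuller version would fit several components per region with EM. The scribble colors, the toy image, and the equal priors are assumptions.

```python
import math


def fit_diag_gaussian(samples):
    """Per-channel mean and variance of a list of (r, g, b) samples."""
    n = len(samples)
    means = [sum(s[c] for s in samples) / n for c in range(3)]
    variances = [max(sum((s[c] - means[c]) ** 2 for s in samples) / n, 1.0)
                 for c in range(3)]  # floor of 1.0 avoids zero variance
    return means, variances


def log_likelihood(pixel, model):
    """log p(pixel | class) under independent per-channel Gaussians."""
    means, variances = model
    return sum(-0.5 * math.log(2.0 * math.pi * v) - 0.5 * (p - m) ** 2 / v
               for p, m, v in zip(pixel, means, variances))


def segment(image, fg_samples, bg_samples, fg_prior=0.5):
    """Label a pixel 1 if the foreground posterior exceeds the background posterior."""
    fg_model = fit_diag_gaussian(fg_samples)
    bg_model = fit_diag_gaussian(bg_samples)
    log_fg, log_bg = math.log(fg_prior), math.log(1.0 - fg_prior)
    return [[1 if (log_fg + log_likelihood(px, fg_model)
                   > log_bg + log_likelihood(px, bg_model)) else 0
             for px in row]
            for row in image]


# Hypothetical user scribbles: reddish foreground, bluish background.
fg_scribble = [(200, 40, 30), (210, 50, 45), (190, 35, 40)]
bg_scribble = [(30, 60, 200), (20, 70, 210), (40, 65, 190)]

# Hypothetical 2x2 RGB image.
image = [[(205, 45, 35), (25, 62, 205)],
         [(35, 68, 195), (195, 38, 42)]]

mask = segment(image, fg_scribble, bg_scribble)
print(mask)
```

Comparing log prior plus log likelihood is enough to pick the class with the highest posterior, because the evidence term in Bayes' formula is the same for both classes.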

### Pros of MoG, EM
- It provides a probabilistic interpretation of the segmentation task
- It is a generative model: new values can be sampled from the learned distribution, so it can generate novel data points

### Cons of MoG, EM
- Local optima
    - EM is only guaranteed to converge to a local optimum of the likelihood; even the simpler k-means problem is NP-hard (see: computational complexity theory, nondeterministic polynomial time), even with k=2
- Initialization
    - The result depends on the starting point, so it is often a good idea to start with a few k-means iterations
- Needs to know the number of components
    - Solution: Model selection
- Needs careful implementation to avoid numerical issues

### Caveats

So far we have explored bottom-up approaches to segmentation, in which we examined individual pixels and their neighborhoods to segment an image into regions. Because such approaches struggle to find segments that are meaningful for recognition, alternative methods are explored.

In the next section, we will explore pixel neighborhood relations.

### References for this Section

---

[1] v7labs, https://www.v7labs.com/blog/panoptic-segmentation-guide

[2] fiveMinuteStats, EM, https://stephens999.github.io/fiveMinuteStats/intro_to_em.html

[3] fiveMinuteStats, Mixture Models, https://stephens999.github.io/fiveMinuteStats/intro_to_mixture_models.html

[4] neptune.ai, https://neptune.ai/blog/image-segmentation

[5] Mordatch, Igor, "Concept Learning with Energy-Based Models" OpenAI. https://openreview.net/pdf?id=H12Y1dJDG



&nbsp;

## Segmentation as Energy Minimization

In the previous section, we explored semantic clustering approaches for segmentation. In this section, we will explore image segmentation with energy minimization methods.

### Markov Random Fields Method

A Markov Random Field (MRF) models the joint distribution of a set of random variables with an undirected, connected graph, in which each node represents a random variable and each edge represents a stochastic dependency between the two nodes it connects.

As an undirected graphical model, an MRF explicitly expresses the conditional independence relationships between nodes.

Unlike directed graphical models, which model the conditional probability of a random variable given its parents, an MRF has no parent-child direction: a node is conditionally independent of all other nodes given its neighbors.

MRF Nodes as Pixels

![](Segmentation/img/mrf.png)


### Energy Formulation
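Energy Function\
Let $y$ denote the observed pixel values and $x$ the labels to infer. The MRF factorizes the joint distribution into unary potentials $\Phi(x_i,y_i)$ and pairwise potentials $\Psi(x_i,x_j)$ over neighboring pixels, so that, up to an additive normalization constant:\
$-\log p(x,y) = - \sum_{i} \log \Phi(x_i,y_i) - \sum_{i,j} \log \Psi(x_i,x_j)$ \
Writing $\phi = -\log \Phi$ and $\psi = -\log \Psi$ (here $\phi$ denotes the unary term, not the MoG mixing coefficient) gives the energy:\
$E(x,y) = \sum_{i} \phi(x_i,y_i) + \sum_{i,j}\psi (x_i,x_j)$

Maximizing the joint probability is therefore equivalent to minimizing the energy.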
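
As an illustration, the sketch below evaluates an energy made of a per-pixel (unary) term plus a term over pairs of neighboring pixels, on a 4-connected grid with binary labels. The pairwise term is assumed to be a Potts penalty (a fixed cost whenever two neighbors disagree), which is a common choice but not something the notes specify; the unary costs and the function name `mrf_energy` are also made up.

```python
def mrf_energy(labels, unary, pairwise_weight=1.0):
    """Energy of a labeling on a 4-connected grid.

    labels[r][c] is the label (0 or 1) of pixel (r, c).
    unary[r][c][k] is the cost of giving label k to pixel (r, c).
    Pairwise term (assumed Potts model): pairwise_weight for every pair
    of neighboring pixels whose labels differ.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary[r][c][labels[r][c]]
            # Count each neighbor pair once: right and down only.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += pairwise_weight
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += pairwise_weight
    return energy


# Hypothetical 2x2 unary costs as (cost of label 0, cost of label 1).
# The bottom-right pixel weakly prefers label 0, all others prefer label 1.
unary = [[(2.0, 0.5), (2.0, 0.5)],
         [(2.0, 0.5), (1.0, 1.2)]]

smooth = [[1, 1], [1, 1]]       # ignores the weak preference, no label changes
pixelwise = [[1, 1], [1, 0]]    # follows every unary preference

e_smooth = mrf_energy(smooth, unary)
e_pixelwise = mrf_energy(pixelwise, unary)
print(e_smooth, e_pixelwise)
```

In this toy case the smooth labeling has the lower energy even though the pixel-wise labeling has the lower unary cost, which shows how the pairwise term encourages neighboring pixels to share a label.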

Local Optima of Energy Function


### References for this Section

[1] Jun-DevpBlog https://medium.com/jun94-devpblog/cv-7-segmentation-as-energy-minimization-markov-random-fields-energy-formulation-graph-cut-670b9b3c82ee

[2] Statistical Techniques in Robotics, CMU https://www.cs.cmu.edu/~16831-f14/notes/F11/16831_lecture07_bneuman.pdf

[3] https://jwmi.github.io/ASM/Murphy%20chapter%2019.pdf