# Model-Based Statistical Learning

***class 3***

Model-based statistical learning techniques are a subset of methods that **assume the data were generated by a statistical model**.

Model-based statistical learning methods include:

- linear model for regression
- logistic regression
- LDA/QDA
- Probabilistic PCA
- t-SNE

## The place of model-based statistical learning in the statistics/ML field

Model-based statistical learning methods are generally older than the methods usually labelled machine learning.

Model-based methods nonetheless make up a large part of modern machine learning, in particular because many of the most popular methods are now quite old.

| Statistical Learning  | Machine Learning |
|--- |--- |
| 1936, LDA (Ronald Fisher) | 1980 (1956), Neural Networks |
| 1960s, Logistic Regression| 1990, SVM |
| 1965, K-Means | 1998, ConvNets |
| 1977, Expectation-Maximization algorithm for clustering | |
| ... | ... |

## Generative vs. Discriminative Techniques

Both terms appear frequently in the statistical learning literature.

### Generative

> There is a model that is supposed to have generated the data (<span style="color:red">model-based</span>)

### Discriminative

> The model is not important; the methods aim to model the classification boundaries directly

### Why is model-based learning interesting?

1. **Interpretability of studied phenomena**: Models are usually understandable, providing a comfortable "summary" of the dataset as a generative process -- appealing to researchers/analysts.

2. **Probability output**: Models return probabilities rather than bare labels, which gives more information about the prediction risk.

3. **Pick and choose approach**: Model-based methods make it possible to rely on model selection to pick the most appropriate model for the data.

4. **Extensibility**: Most model-based techniques can be extended to adapt to the complexity of the problem at hand.

## Clustering

$$\text{Classification: }(X, y) \overset{\text{learning}}{\rightarrow} f_{\hat{\theta}}: x^* \overset{\text{predict}}{\rightarrow} f_{\hat{\theta}}(x^*)=\hat{z}^*$$

$$\text{Clustering: }(X) \overset{\text{learning}}{\rightarrow} f_{\hat{\theta}}\rightarrow \hat{z} = f_{\hat{\theta}}(x)$$

The task of clustering consists of forming groups from the data $X$ alone, without labels.

<u>**Definition:**</u> The goal of clustering is to form $K$ homogeneous groups of data such that:
- the data of a group should be similar
- the data of different groups should be different

Even though the clustering task is simple to state, the general problem is NP-hard because it is a **combinatorial problem**.

It is usually not possible to test all possible configurations in any reasonable time when $n \gg 10$ and $K \gg 2$.
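
To get a feel for the size of the search space: the number of ways to split $n$ labelled points into exactly $K$ non-empty groups is the Stirling number of the second kind, $S(n, K)$. The sketch below is only an illustration; the function name `stirling2` is ours and not part of the course material. It uses the standard recurrence $S(n, K) = K\,S(n-1, K) + S(n-1, K-1)$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labelled points into k non-empty groups."""
    if n == k:
        return 1  # includes the empty case S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The n-th point either joins one of the k existing groups,
    # or forms a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


# Print how fast the number of candidate clusterings grows with n.
for n in (10, 20, 50):
    print(n, stirling2(n, 3))
```

Already for $n = 10$ there are 511 possible clusterings with $K = 2$ and 9330 with $K = 3$; the count grows exponentially in $n$, which is why exhaustive search is ruled out.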

### The need for a clustering algorithm

An algorithm is therefore needed to approximate the best configuration.

<u>Examples:</u> 
- **not model-based**: k-means, hierarchical clustering, spectral clustering
- **model-based**: <span style="color:red">EM algorithm for mixture models</span>

<hr>

# Mixture Models

<u>**Definition**:</u> A mixture model is a statistical model that assumes the data follow a distribution of the form:
$$\mathbb{P}(x) = \sum_{k=1}^{K}\Pi_k\mathbb{P}_k(x)$$
Where:
> $\mathbb{P}_k$ is a probability density function, and $\Pi_k\in[0,1]$ with $\sum_{k=1}^{K}\Pi_k=1$.

Then:
> $\mathbb{P}(x)$ is also a probability density function, called a **mixture distribution**.

<u>Example:</u> The following distributions belong to the mixture family:
$$\mathbb{P}(x) = 0.4\mathcal{N}(x; 1, 1) + 0.6\mathcal{N}(x; 2, 1)$$

$$\mathbb{P}(x) = 0.1\,\mathcal{N}(x; 1, 1) + 0.7\,\mathcal{U}_{[0, 1]}(x) + 0.2\,\mathrm{Gamma}(x; 1)$$
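
As an illustration, the first example above can be evaluated and sampled with a few lines of standard-library Python. This is a minimal sketch, not a reference implementation: the names `WEIGHTS`, `PARAMS`, `mixture_pdf` and `sample_mixture` are ours, and since the second parameter of $\mathcal{N}$ equals 1, it makes no difference here whether it is read as a variance or a standard deviation.

```python
import math
import random

# Mixture from the first example: 0.4 N(1, 1) + 0.6 N(2, 1).
WEIGHTS = [0.4, 0.6]
PARAMS = [(1.0, 1.0), (2.0, 1.0)]  # (mean, standard deviation) per component


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float) -> float:
    """Mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(WEIGHTS, PARAMS))


def sample_mixture(n: int, seed: int = 0) -> list:
    """Draw n points: pick a component with probability Pi_k, then sample from it."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
        mean, std = PARAMS[k]
        draws.append(rng.gauss(mean, std))
    return draws
```

The two-step sampling (first the component, then the observation) is exactly the generative story the model assumes. The hidden component index is what a model-based clustering method tries to recover.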

> <span style="color:red">**LDA is a Gaussian mixture model**</span>: each class is one Gaussian component, and all components share the same covariance matrix.
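
To make the link explicit, here is the mixture form of LDA written out under the usual assumptions. The symbols $\mu_k$ (class mean) and $\Sigma$ (shared covariance matrix) are introduced here for illustration and do not appear earlier in these notes:

$$\mathbb{P}(x) = \sum_{k=1}^{K}\Pi_k\,\mathcal{N}(x; \mu_k, \Sigma)$$

Bayes' rule then gives the probability that a point $x$ belongs to class $k$:

$$\mathbb{P}(z = k \mid x) = \frac{\Pi_k\,\mathcal{N}(x; \mu_k, \Sigma)}{\sum_{l=1}^{K}\Pi_l\,\mathcal{N}(x; \mu_l, \Sigma)}$$

In LDA the labels are observed, so $\Pi_k$, $\mu_k$ and $\Sigma$ can be estimated directly from the labelled data. In clustering the labels are hidden, which is where an algorithm such as EM comes in.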