The fundamental goal of classification is to predict the label of a target variable, y, based on some other known variables, X.

Many types of model, the simplest example being logistic regression, do this by explicitly modelling $P(y|X)$, the conditional probability distribution of y given X. An alternative approach is to model the *joint distribution* of y and X, $P(y,X)$, and use this joint distribution for a variety of tasks, one of which is to estimate $P(y|X)$ so that we can predict the value of a target given a set of features.
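To make the link between the two approaches concrete: for a discrete label, the conditional distribution follows from the joint by Bayes' rule, normalising over every possible label $y'$ (the symbol $y'$ is just a summation index introduced here for illustration):

$$P(y|X) = \frac{P(y,X)}{P(X)} = \frac{P(y,X)}{\sum_{y'} P(y',X)}$$

So once we have a model of $P(y,X)$, we get $P(y|X)$ at the cost of one extra normalisation step.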

There must be some benefit to taking the second approach, though; otherwise, why would we go to the trouble of making extra assumptions and doing extra computation to arrive at a similar answer? One of the key advantages of the second approach (called a generative approach hereafter) is that it offers a natural method for dealing with missing data. If one of the features is missing for an observation, it takes only a small amount of work to marginalise the joint distribution over that feature and compute $P(y|X^*)$, where $X^*$ is the set of features apart from the one that is missing. By contrast, the first approach, also known as a discriminative approach, gives us no direct way to compute $P(y|X^*)$, so the model breaks down if we don't have values for all of the features. Data imputation is an alternative way around the missing data problem that also works for discriminative models, but it's not straightforward to do rigorously, so we prefer generative models when we expect to encounter lots of missing data.
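As a rough illustration of why marginalisation is cheap, here is a minimal sketch that assumes each class is modelled by a multivariate normal distribution. In that case, marginalising over a missing feature amounts to dropping the matching entry of the mean and the matching row and column of the covariance. The function names, the use of `NaN` to mark a missing value, and the toy parameters are all hypothetical choices for this example, not part of the model built later in the notebook.

```python
import numpy as np

def gaussian_log_pdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at a single point x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def posterior_with_missing(x, priors, means, covs):
    """Estimate P(y | observed features), treating NaN entries of x as missing."""
    obs = ~np.isnan(x)  # boolean mask of the features we actually have
    log_joint = np.array([
        # Marginal Gaussian: keep only the observed parts of mean and covariance
        np.log(p) + gaussian_log_pdf(x[obs], m[obs], c[np.ix_(obs, obs)])
        for p, m, c in zip(priors, means, covs)
    ])
    log_joint -= log_joint.max()  # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()  # normalise over the labels

# Hypothetical two-class, two-feature model (illustrative numbers only)
priors = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
covs = [np.eye(2), np.eye(2)]

full = posterior_with_missing(np.array([2.0, 2.0]), priors, means, covs)
partial = posterior_with_missing(np.array([2.0, np.nan]), priors, means, covs)
print(full, partial)
```

With the second feature missing, the model still returns a valid posterior; it is simply less confident about the second class than when both features are observed.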

In this notebook, we're going to implement a Linear Discriminant Analysis (LDA) model. LDA is a type of generative model which assumes that the features of observations with the same label were generated by a common multivariate normal distribution. I won't describe how LDA works in this notebook because I've already done so in a [blogpost on my research group's blog](https://www.blopig.com/blog/2020/07/no-labels-no-problem-a-quick-introduction-to-gaussian-mixture-models/), so if you're interested you can find the details there!

# Data Generation

