# Linear and quadratic discriminant analysis
In today's lecture we will cover linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) and how we can use them for classification.
But before we dive in deeper, think about what we are actually trying to achieve with these approaches.

We have learned that we use so-called discriminants to represent classifiers. For a classifier with $c$ classes $\omega_1,...,\omega_c$, we have a set of discriminant functions $g_i(\mathbf{x})$ with $i \in \{1,...,c\}$.

*Task*: Give some examples of discriminants with a minimal error rate!
* Todo           

*Task*: What happens to the feature space $\mathbb{R}^d$ when we use such discriminants/decision rules?
* Todo        

*Task*: How do we decide to which class $\omega_i$ a feature vector $\mathbf{x}$ belongs?
* Todo        

Now that we have an idea about discriminants, we can dive deeper into our LDA and QDA classifiers.    
*Task*: We want to calculate the probability that a feature vector $\mathbf{x}$ belongs to a class $\omega_i$. What can we use to calculate $P(\omega_i|\mathbf{x})$?
* $P(\omega_i|\mathbf{x}) = $ Todo    

*Task*: What kind of assumption do we have to make when we use an LDA/QDA?
* Todo    

*Task*: What is the difference between an LDA and a QDA?
* Todo    

## Exercise 1
A fisherman needs your help in classifying fish. He recently caught the following fish:

| Length (m) | Species |
| ---------- | ------- |
| 1.3        | Seabass |
| 0.7        | Salmon  |
| 0.62       | Salmon  |
| 0.9        | Salmon  |
| 0.91       | Seabass |
| 0.31       | Herring |
| 0.26       | Herring |

* Calculate the prior $P(\omega_i)$ for each fish species.
* What are the formulas for estimating the parameters $\mu$ and $\sigma^2$ of the likelihoods?
* Calculate the parameters $\mu$ and $\sigma^2$ for the likelihoods $p(x|\omega_i)$.
* The fisherman catches a new fish with length $x = 0.82\,\mathrm{m}$. Calculate the posterior probability $P(\omega_i|x)$ for each class. How is the fish classified?
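
To check a hand calculation, the steps of this exercise can be sketched in a few lines of plain Python. This is only one possible way to do it, under two assumptions that are not fixed by the exercise: each likelihood is a univariate normal distribution, and the variance is the unbiased estimate (dividing by $n-1$). The names `catch`, `gaussian_pdf` and `x_new` are made up for this illustration.

```python
import math
import statistics

# Catch from the table above: (length in m, species)
catch = [
    (1.3, "Seabass"), (0.7, "Salmon"), (0.62, "Salmon"), (0.9, "Salmon"),
    (0.91, "Seabass"), (0.31, "Herring"), (0.26, "Herring"),
]

def gaussian_pdf(x, mu, var):
    """Density of the normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Group the lengths by species
lengths_by_species = {}
for length, species in catch:
    lengths_by_species.setdefault(species, []).append(length)

x_new = 0.82  # length of the newly caught fish in m
joint = {}    # prior * likelihood for each species
for species, lengths in lengths_by_species.items():
    prior = len(lengths) / len(catch)
    mu = statistics.mean(lengths)
    var = statistics.variance(lengths)  # assumption: unbiased estimate (n - 1)
    joint[species] = prior * gaussian_pdf(x_new, mu, var)
    print(f"{species}: prior={prior:.3f}, mu={mu:.3f}, var={var:.5f}")

# Bayes' theorem: normalise by the evidence p(x)
evidence = sum(joint.values())
posterior = {species: value / evidence for species, value in joint.items()}
predicted = max(posterior, key=posterior.get)
print(posterior)
print("Predicted species:", predicted)
```

If you prefer the maximum likelihood estimate of the variance (dividing by $n$), replace `statistics.variance` with `statistics.pvariance`; the posterior values change, so state which estimate you used in your solution.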

## Exercise 2
Implement a function `priors(classes)` that takes a vector of class labels and returns the prior $P(\omega_i)$ for each class.
The input should be an array of classes (e.g. `np.array(["stand", "sit", "sit", "stand"])`). The output should be a data frame with the columns `class` and `prior`.

In [None]:
import numpy as np
import pandas as pd

def priors(classes):
    'Implement me!'
    
pp = priors(np.array(["stand","sit","sit","sit","stand"]))
print(pp)
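
One possible approach, shown here only as a sketch, is to estimate each prior as the relative frequency of the class in the label vector. The use of `np.unique` with `return_counts=True` is a choice made for this illustration, not something the exercise prescribes.

```python
import numpy as np
import pandas as pd

def priors(classes):
    """Return the relative frequency of each class as a data frame
    with the columns `class` and `prior`."""
    # np.unique returns the sorted distinct labels and how often each occurs
    labels, counts = np.unique(np.asarray(classes), return_counts=True)
    return pd.DataFrame({"class": labels, "prior": counts / counts.sum()})

pp = priors(np.array(["stand", "sit", "sit", "sit", "stand"]))
print(pp)
```

Because the priors are relative frequencies, they always sum to one, which is a quick sanity check for your own implementation.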

## Exercise 3
Implement a function `likelihood(data)` that approximates the likelihood $p(x|\omega_i)$ for each class $\omega_i$ with a normal distribution. The input is a data frame consisting of a column `class` and a column `x`; the function should output a mean value and a variance for each class.
The output should therefore have the columns `class`, `mean` and `variance`.

Plot the likelihood for each class.

In [None]:
from scipy.io import arff

def likelihood(data):
    'Implement me!'
    
data = arff.loadarff('features1.arff')
df = pd.DataFrame(data[0])

dat = df.loc[:, ["AccX_mean","class"]]
dat.columns = ["x","class"]
lik = likelihood(dat)
lik
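
The parameter estimation could be sketched with a pandas `groupby`, as below. Since `features1.arff` is not reproduced here, the sketch runs on a small made-up data frame `toy`; the values in it are purely illustrative. Note the assumption that the variance is the unbiased estimate, which is what the pandas `"var"` aggregation computes by default.

```python
import numpy as np
import pandas as pd

def likelihood(data):
    """Estimate mean and variance of column `x` for each class.
    Returns a data frame with the columns `class`, `mean` and `variance`."""
    # "var" uses ddof=1, i.e. the unbiased variance estimate
    stats = data.groupby("class")["x"].agg(["mean", "var"])
    return stats.rename(columns={"var": "variance"}).reset_index()

# Hypothetical toy data standing in for features1.arff
toy = pd.DataFrame({
    "x": [1.0, 3.0, 10.0, 14.0],
    "class": ["sit", "sit", "stand", "stand"],
})
lik = likelihood(toy)
print(lik)
```

For the plot, each row of the result defines one normal density that can be evaluated on a grid of `x` values.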

## Exercise 4
Implement a function `myqda(newdat, lik, prior)` that returns the most probable class for each new observation in `newdat`, given the likelihood parameters `lik` and the priors `prior`.

Test your implementation on the dataset `features1.arff`. "Train" the QDA (i.e. calculate likelihood and prior), and then perform classification on the same data. How good is the classification?

In [None]:
from scipy.io import arff
import scipy.stats

def myqda(newdat,lik,prior):
    'Implement me!'


data = arff.loadarff('features1.arff')
df = pd.DataFrame(data[0])

dat = df.loc[:, ["AccX_mean","class"]]
dat.columns = ["x","class"]

lik = likelihood(dat)
prior = priors(dat["class"])

nc = myqda(dat["x"][:100],lik,prior)  # classify the first 100 observations
print(sum(nc == dat["class"][:100])/100) # compute fraction of correctly classified data points
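
A minimal sketch of the classification step for a single feature is given below. It assumes that `lik` has the columns `class`, `mean` and `variance` and that `prior` has the columns `class` and `prior`, as specified in Exercises 2 and 3. The parameter values are made up for illustration and do not come from `features1.arff`. The sketch compares log scores instead of posteriors; the normalising evidence $p(x)$ is the same for all classes and does not change the arg max.

```python
import numpy as np
import pandas as pd

def myqda(newdat, lik, prior):
    """Return, for each value in newdat, the class with the largest
    prior * Gaussian likelihood (one-dimensional QDA)."""
    x = np.asarray(newdat, dtype=float).reshape(-1, 1)  # shape (n, 1)
    params = lik.merge(prior, on="class")               # align rows by class
    mu = params["mean"].to_numpy()                      # shape (c,)
    var = params["variance"].to_numpy()
    log_prior = np.log(params["prior"].to_numpy())
    # log(prior * N(x; mu, var)) without the constant -0.5 * log(2 * pi),
    # which is shared by all classes; broadcasting gives shape (n, c)
    log_score = log_prior - 0.5 * np.log(var) - (x - mu) ** 2 / (2 * var)
    return params["class"].to_numpy()[np.argmax(log_score, axis=1)]

# Hypothetical parameters, for illustration only
lik = pd.DataFrame({"class": ["sit", "stand"],
                    "mean": [2.0, 12.0],
                    "variance": [2.0, 8.0]})
prior = pd.DataFrame({"class": ["sit", "stand"], "prior": [0.5, 0.5]})

pred = myqda([1.5, 11.0, 3.0], lik, prior)
print(pred)
```

Working with logarithms avoids numerical underflow when an observation lies far from a class mean. The same densities could also be evaluated with `scipy.stats.norm.pdf(x, loc=mu, scale=np.sqrt(var))`.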