###### Exercise in Photogrammetry I
## **Ex.8 : Classification**
### Hand out: 10.06.2020 
### Submission: 19.06.2020 
### Points: 32


We are given a dataset ```data/fruits_data.npy``` that contains the average grayscale value (first column) and the length in centimeters (second column) of apples and pears. The file ```data/fruits_label.npy``` contains the corresponding label $w$ {0=apple, 1=pear} of each fruit. Based on this dataset (the so-called training set), we want to estimate the labels of the fruits in ```data/test_data.npy``` (the so-called test set). Finally, we evaluate the classifier by comparing the predicted labels with the true labels specified in ```data/test_label.npy```.

## A. kNN - Classifier
**Tasks:**
1. Load the training and test sets and visualize them based on their labels. Don't forget the legend and the axis labels. (2 Points)
2. Write a function ```kNN(...)``` that takes as input the *training data*, the *training labels*, the *test data* (not the *test labels*!) and the number of neighbors *k*. For each test point, predict the label based on its *k* nearest neighbors in the training set. Return the predicted labels. (8 Points)
3. Predict the labels of the test set with the ```kNN(...)``` function based on the 10 nearest neighbors of each test point. Visualize the results. (2 Points)
4. Use the function ```plotConfusionMatrix(...)``` to compute and visualize the confusion matrix for the test set. Compute and print the precision and recall for each class (apples and pears) and the overall accuracy. (4 Points)
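
The kNN prediction described in task 2 can be sketched as follows. This is a minimal brute-force sketch, not a reference solution: the argument names, the use of the unscaled Euclidean distance, and the tie-breaking rule (the smaller label wins) are assumptions that the exercise does not prescribe. It also assumes that the labels are non-negative integers such as 0 and 1.

```python
import numpy as np

def kNN(train_data, train_labels, test_data, k):
    """Predict a label for each test point by majority vote
    among its k nearest training points (Euclidean distance)."""
    train_data = np.asarray(train_data, dtype=float)
    test_data = np.asarray(test_data, dtype=float)
    # Labels may be stored as floats; np.bincount needs non-negative integers
    train_labels = np.asarray(train_labels).astype(int).ravel()

    # Pairwise squared distances via broadcasting, shape (n_test, n_train)
    diff = test_data[:, np.newaxis, :] - train_data[np.newaxis, :, :]
    dist2 = np.sum(diff ** 2, axis=2)

    # Indices of the k closest training points for every test point
    nearest = np.argsort(dist2, axis=1)[:, :k]
    neighbor_labels = train_labels[nearest]  # shape (n_test, k)

    # Majority vote per test point; on a tie, argmax picks the smaller label
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])
```

Because the two features have different units (grayscale value and centimeters), the feature with the larger numeric range dominates the distance. Whether to rescale the features first is a design choice left open here.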

## B. MAP - Classifier
5. Compute and print the mean $\mu_i \in \mathbb{R}^{2\times 1}$, the covariance matrix $\Sigma_i \in \mathbb{R}^{2\times 2}$, and the probability of occurrence (prior) $P(w_i)$ for each class $w_i$. (4 Points)
6. Write a function ```MAP(...)``` that takes the means, covariance matrices, and prior probabilities of the classes as well as the test data and returns the predicted labels. The prediction should be based on a maximum-a-posteriori classification. Assume that the data is normally distributed. You can use the following pseudocode for your implementation (6 Points):
    - for each class $w_i:= \{i,\mu_i,\Sigma_i,P(w_i)\}$:
        - for each test point $e_n\in \mathbb{R}^{2\times 1}$:
            - compute the likelihood $P(e_n|w_i)$ as a bivariate normal distribution: $P(e_n|w_i)= N(e_n|\mu_i,\Sigma_i) = \frac{\exp\left(-\frac{1}{2}(e_n-\mu_i)^T\Sigma_i^{-1}(e_n-\mu_i)\right)}{2\pi\sqrt{|\Sigma_i|}}$, where $|\Sigma_i|$ is the determinant of $\Sigma_i$
            - compute the unnormalized posterior $P(w_i|e_n) \propto P(e_n|w_i) P(w_i)$ (the evidence $P(e_n)$ is the same for all classes and does not change the arg max)
    - return the most likely class for each point: $i_n^*=\arg\max_i P(w_i|e_n)$

7. Predict the labels of the test set with your ```MAP(...)``` function. Visualize the results. (2 Points)
8. Plot the confusion matrix for the MAP classification. Compute and print the precision and recall for each class (apples and pears) and the overall accuracy. (4 Points)
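
The pseudocode of task 6 can be sketched in vectorized form as follows. This is one possible implementation under stated assumptions, not the expected solution: the helper name `gaussian_likelihood` and the argument order of `MAP` are hypothetical, the class parameters are assumed to be passed as lists ordered by class index, and the covariance matrices are assumed to be invertible.

```python
import numpy as np

def gaussian_likelihood(points, mean, cov):
    """Bivariate normal density N(e_n | mu_i, Sigma_i) for every row of `points`."""
    diff = points - mean.reshape(1, -1)  # shape (n, 2); accepts (2,) or (2, 1) means
    cov_inv = np.linalg.inv(cov)
    # Quadratic form (e_n - mu)^T Sigma^{-1} (e_n - mu), one value per point
    maha = np.einsum('ni,ij,nj->n', diff, cov_inv, diff)
    return np.exp(-0.5 * maha) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def MAP(means, covs, priors, test_data):
    """Return, for each test point, the class index with the largest
    unnormalized posterior P(e_n | w_i) * P(w_i)."""
    test_data = np.asarray(test_data, dtype=float)
    # One row per class, one column per test point
    scores = np.array([
        gaussian_likelihood(test_data, np.asarray(m, dtype=float),
                            np.asarray(c, dtype=float)) * p
        for m, c, p in zip(means, covs, priors)
    ])
    return np.argmax(scores, axis=0)
```

The scores are not normalized by the evidence $P(e_n)$, which is sufficient for the arg max. For points far from all class means, working with log-probabilities would be numerically safer; that variant is not shown here.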

**Hint:** NumPy has a lot of useful built-in functions for linear algebra that you can use (np.mean(), np.cov(), np.argmax(), np.argsort(), ...). Make sure your matrices/vectors have the expected dimensionality.
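
As an illustration of the dimensionality remark, the sketch below estimates the per-class mean, covariance, and prior with `np.mean` and `np.cov`. The function name `class_statistics` and its return layout are hypothetical. It assumes the data array has one sample per row, which is why `rowvar=False` is passed: by default, `np.cov` treats each row as a variable. Note also that `np.cov` uses the unbiased estimator (division by $n-1$) unless told otherwise.

```python
import numpy as np

def class_statistics(data, labels):
    """Estimate mean (d x 1), covariance (d x d) and prior for every class."""
    data = np.asarray(data, dtype=float)   # shape (n_samples, d)
    labels = np.asarray(labels).ravel()    # shape (n_samples,)
    classes = np.unique(labels)
    means, covs, priors = [], [], []
    for c in classes:
        samples = data[labels == c]        # all samples of class c
        means.append(samples.mean(axis=0).reshape(-1, 1))  # column vector
        covs.append(np.cov(samples, rowvar=False))         # columns are variables
        priors.append(samples.shape[0] / data.shape[0])    # relative frequency
    return classes, means, covs, priors
```

Checking `means[i].shape` and `covs[i].shape` right after the computation is a cheap way to catch a transposed input early.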

In [None]:
# import all required modules
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

%matplotlib notebook

In [None]:
def plotConfusionMatrix(y_true, y_pred,
                        normalize=False,
                        title=None,
                        classes=None,
                        cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    Returns the axis, the figure and the confusion matrix (in that order).
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    # Only use the labels that appear in the data
    if classes is None:
        classes = np.unique(y_true)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax, fig, cm
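
The precision, recall, and accuracy asked for in tasks 4 and 8 can be derived from the confusion matrix returned above. The sketch below assumes an unnormalized count matrix with true labels along the rows and predicted labels along the columns, which is the layout produced by `sklearn.metrics.confusion_matrix`. The function name `metrics_from_confusion` is hypothetical, and the numbers in `cm_example` are made up for illustration, not results for the fruit dataset.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision and recall plus overall accuracy.
    Rows of `cm` are true classes, columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    precision = true_pos / cm.sum(axis=0)  # column sum: all points predicted as the class
    recall = true_pos / cm.sum(axis=1)     # row sum: all points that truly are the class
    accuracy = true_pos.sum() / cm.sum()
    return precision, recall, accuracy

# Made-up counts for two classes (0=apple, 1=pear)
cm_example = np.array([[8, 2],
                       [1, 9]])
precision, recall, accuracy = metrics_from_confusion(cm_example)
print("precision:", precision)
print("recall:   ", recall)
print("accuracy: ", accuracy)
```

If `plotConfusionMatrix(...)` is called with `normalize=True`, the returned matrix is row-normalized and no longer contains counts, so these formulas should be applied to the unnormalized matrix.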
