***
**Introduction to Machine Learning** <br>
__[https://slds-lmu.github.io/i2ml/](https://slds-lmu.github.io/i2ml/)__
***

# Exercise sheet 4: Supervised Classification

In [1]:
#| label: import
# Consider the following libraries for this exercise sheet:

# general
import numpy as np
import pandas as pd
from scipy.stats import norm
# plotting
import matplotlib.pyplot as plt
import seaborn as sns
# sklearn
from sklearn.naive_bayes import CategoricalNB # Naive Bayes classifier for categorical features
from sklearn.naive_bayes import GaussianNB # Naive Bayes classifier for normally distributed (numeric) features
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support

## Exercise 1: Naive Bayes

You are given the following table with the target variable `Banana`:

| ID | Color | Form | Origin | Banana |
| --- | --- | --- | --- | --- |
| 1 | yellow | oblong | imported | yes |
| 2 | yellow | round | domestic | no |
| 3 | yellow | oblong | imported | no |
| 4 | brown | oblong | imported | yes |
| 5 | brown | round | domestic | no |
| 6 | green | round | imported | yes |
| 7 | green | oblong | domestic | no |
| 8 | red | round | imported | no |


> a) We want to use a Naive Bayes classifier to predict whether a new fruit is a banana or not. Estimate the posterior probability $\hat{\pi}(\mathbf{x}_*)$ (i.e., the estimated probability that `Banana = yes`) for a new observation $\mathbf{x}_* = (\text{yellow}, \text{round}, \text{imported})$. How would you classify the object?
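
This is a minimal sketch of the hand computation. It assumes the table above is the complete training set and that no Laplace smoothing is applied. Each prior is the class frequency, and each conditional probability is the relative frequency of the feature value within the class. The naive (conditional independence) assumption then lets us multiply these terms. The variable names are illustrative.

```python
import pandas as pd

# Training data from the table above (target: Banana)
df = pd.DataFrame({
    "Color":  ["yellow", "yellow", "yellow", "brown", "brown", "green", "green", "red"],
    "Form":   ["oblong", "round", "oblong", "oblong", "round", "round", "oblong", "round"],
    "Origin": ["imported", "domestic", "imported", "imported", "domestic", "imported", "domestic", "imported"],
    "Banana": ["yes", "no", "no", "yes", "no", "yes", "no", "no"],
})

# New observation x_* = (yellow, round, imported)
x_new = {"Color": "yellow", "Form": "round", "Origin": "imported"}

scores = {}
for k, group in df.groupby("Banana"):
    prior = len(group) / len(df)               # estimated prior P(Banana = k)
    likelihood = 1.0
    for feature, value in x_new.items():
        # relative frequency of the feature value within class k (no smoothing)
        likelihood *= (group[feature] == value).mean()
    scores[k] = prior * likelihood             # unnormalized posterior

# Normalize over both classes to get pi_hat(x_*) = P(Banana = yes | x_*)
posterior_yes = scores["yes"] / sum(scores.values())
print(f"pi_hat(x_*) = {posterior_yes:.4f}")
print("Prediction:", "yes" if posterior_yes > 0.5 else "no")
```

With these counts the unnormalized scores are $\frac{3}{8}\cdot\frac{1}{3}\cdot\frac{1}{3}\cdot 1 = \frac{1}{24}$ for `yes` and $\frac{5}{8}\cdot\frac{2}{5}\cdot\frac{3}{5}\cdot\frac{2}{5} = \frac{3}{50}$ for `no`. This gives $\hat{\pi}(\mathbf{x}_*) = \frac{1/24}{1/24 + 3/50} = \frac{25}{61} \approx 0.41$. With a threshold of 0.5, the object would be classified as not a banana.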


In [2]:
# Enter your code here:

> b) Assume you have an additional feature Length that measures the length in cm. Describe in 1-2 sentences how
you would handle this numeric feature with Naive Bayes.
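
One common option is to assume a Gaussian class-conditional distribution for `Length`, the same modeling assumption `GaussianNB` makes. The density value then enters the product in place of a relative frequency. Another option is to discretize `Length` into bins and treat it as categorical. The minimal sketch below uses the per-class mean and sample standard deviation as estimates, so each class needs at least two observations. The function name and the lengths in the example are hypothetical, since the table has no `Length` column.

```python
import numpy as np
from scipy.stats import norm

def gaussian_class_likelihood(lengths, labels, x_length):
    """Return {class: p(Length = x_length | class)} under a per-class normal
    distribution whose mean and (ddof=1) standard deviation are estimated
    from the training lengths of that class."""
    lengths = np.asarray(lengths, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for k in np.unique(labels):
        values = lengths[labels == k]
        mu, sigma = values.mean(), values.std(ddof=1)
        out[k] = norm.pdf(x_length, loc=mu, scale=sigma)
    return out
```

For example, with hypothetical training lengths `[18, 20, 22]` for `yes` and `[5, 6, 7]` for `no`, `gaussian_class_likelihood([18, 20, 22, 5, 6, 7], ["yes"] * 3 + ["no"] * 3, 20)` returns a much larger density for `yes`. That density is then multiplied with the categorical terms of the same class.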

> **\# Enter your answer here:**

## Exercise 2: Discriminant Analysis

![unnamed-chunk-4-1-1.png](attachment:unnamed-chunk-4-1-1.png)

The above plot shows $\mathcal{D} = \left( (\mathbf{x}^{(1)}, y^{(1)}), \dots, (\mathbf{x}^{(n)}, y^{(n)}) \right)$, a data set with $n = 200$ observations of a continuous target variable $y$ and a continuous, 1-dimensional feature variable $\mathbf{x}$. In the following, we aim to predict $y$ with a machine learning model that takes $\mathbf{x}$ as input.


> a) To prepare the data for classification, we categorize the target variable $y$ into $3$ classes and call the transformed target variable $z$:
>
> $$
>   z^{(i)} = 
>   \begin{cases}
>     1, &  y^{(i)} \in (-\infty, 2.5] \\
>     2, &  y^{(i)} \in (2.5, 3.5] \\
>     3, &  y^{(i)} \in (3.5, \infty)
>   \end{cases}
> $$
>
> Now we can apply quadratic discriminant analysis (QDA):
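
The categorization can be written with `np.digitize`. In this minimal sketch, `right=True` makes each interval closed on the right, matching $(-\infty, 2.5]$, $(2.5, 3.5]$ and $(3.5, \infty)$. The function name is illustrative, and the raw `y` values are not provided on this sheet.

```python
import numpy as np

def categorize_target(y):
    """Map continuous y to classes z in {1, 2, 3} using the right-closed
    intervals (-inf, 2.5], (2.5, 3.5], (3.5, inf)."""
    # digitize returns 0, 1 or 2 for the three intervals; shift to 1..3
    return np.digitize(np.asarray(y, dtype=float), bins=[2.5, 3.5], right=True) + 1
```

The boundary values land in the lower class, as in the definition: `categorize_target([2.5, 3.5])` gives classes 1 and 2.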

>> i) Estimate the class means $\mu_k = E(\mathbf{x} \mid z = k)$ for each of the three classes $k \in \{1, 2, 3\}$ visually from the plot. Do not overcomplicate this; a rough estimate is sufficient. <br>

>> **\# Enter your answer here:** <br>

>> ii) Make a plot that visualizes the estimated density of each class.
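
In one dimension, QDA models each class with its own Gaussian $\mathcal{N}(\mu_k, \sigma_k^2)$, so the plot amounts to one normal density curve per class. This minimal sketch assumes you have the feature values `x` and the classes `z` as arrays (they are not provided on this sheet). It uses the per-class sample standard deviation, and the function names are hypothetical. Multiplying each curve by its prior $\hat{\pi}_k$ would show the weighted densities that QDA actually compares.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def qda_class_params(x, z):
    """Per-class (prior, mean, standard deviation with ddof=1) for 1-D QDA."""
    x, z = np.asarray(x, dtype=float), np.asarray(z)
    params = {}
    for k in np.unique(z):
        xk = x[z == k]
        params[k] = (len(xk) / len(x), xk.mean(), xk.std(ddof=1))
    return params

def plot_class_densities(params, grid):
    """Draw the estimated class-conditional density p(x | z = k) per class."""
    fig, ax = plt.subplots()
    for k, (prior, mu, sd) in params.items():
        ax.plot(grid, norm.pdf(grid, loc=mu, scale=sd), label=f"z = {k}")
    ax.set_xlabel("x")
    ax.set_ylabel("estimated density p(x | z = k)")
    ax.legend()
    return fig
```

A call such as `plot_class_densities(qda_class_params(x, z), np.linspace(x.min(), x.max(), 200))` would draw one curve per class.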

In [3]:
# Enter your code here:

>> iii) How would your plot from ii) change if we used linear discriminant analysis (LDA) instead of QDA? Explain your answer. <br>
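
Under LDA, all classes share a single variance, estimated from the within-class deviations pooled over all classes. A minimal sketch of the textbook pooled estimator follows:

$$\hat{\sigma}^2 = \frac{1}{n - g}\sum_{k=1}^{g}\sum_{i:\, z^{(i)} = k}\left(x^{(i)} - \hat{\mu}_k\right)^2$$

Here $g$ is the number of classes. The function name is hypothetical, and `sklearn`'s `LinearDiscriminantAnalysis` may use a slightly different normalization.

```python
import numpy as np

def lda_pooled_sd(x, z):
    """Pooled within-class standard deviation shared by all classes in LDA."""
    x, z = np.asarray(x, dtype=float), np.asarray(z)
    classes = np.unique(z)
    # sum of squared deviations from each class's own mean
    ss = sum(((x[z == k] - x[z == k].mean()) ** 2).sum() for k in classes)
    return np.sqrt(ss / (len(x) - len(classes)))
```

Plugging this single value in as the scale for every class gives curves of identical width that differ only in their means.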

>> **\# Enter your answer here:** <br>


>> iv) Why is QDA preferable to LDA for this data?

>> **\# Enter your answer here:**

> b) Given two new observations $\mathbf{x}_{*1} = -10$ and $\mathbf{x}_{*2} = 7$, state the QDA prediction for each and explain how you arrive at it.
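
QDA predicts the class with the largest $\hat{\pi}_k \, \phi(x \mid \hat{\mu}_k, \hat{\sigma}_k^2)$, where $\phi$ is the normal density. This minimal sketch works on the log scale for numerical stability. The priors, means and standard deviations are inputs you would read off or estimate from the plot. The function name is hypothetical, and the values in the example below are made up rather than taken from the data.

```python
import numpy as np
from scipy.stats import norm

def qda_predict_1d(x_new, priors, means, sds):
    """Return the class label (1..K) maximizing log pi_k + log N(x | mu_k, sd_k^2)."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # shape (n_new, K): one log-score per new point and class
    log_scores = np.log(np.asarray(priors, dtype=float))[None, :] + norm.logpdf(
        x_new[:, None],
        loc=np.asarray(means, dtype=float)[None, :],
        scale=np.asarray(sds, dtype=float)[None, :],
    )
    return np.argmax(log_scores, axis=1) + 1  # classes are labelled 1..K
```

Consider `qda_predict_1d([-10, 2, 4, 7], priors=[1/3] * 3, means=[0, 2, 4], sds=[3, 0.5, 1])`. It returns the labels 1, 2, 3, 1. Far from all class means, the quadratic term $-(x - \hat{\mu}_k)^2 / (2\hat{\sigma}_k^2)$ dominates, so the class with the largest variance wins on both sides.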

> **\# Enter your answer here:**

## Exercise 3: Decision Boundaries for Classification Learners

We will now visualize how well different learners classify the three-class `cassini` data set.
Import `cassini_data.csv`.
Then, perturb the `x.2` dimension with Gaussian noise (mean $0$, standard deviation $0.5$), and consider the classifiers
already introduced in the lecture:

- LDA,
- QDA, and
- Naive Bayes.

Plot the learners’ decision boundaries. Can you spot differences in their ability to separate the classes? <br>
(Note that logistic regression in its basic binary form cannot handle more than two classes and is therefore not listed here.)

In [4]:
# read in the CSV file:
cassini = pd.read_csv('https://raw.githubusercontent.com/slds-lmu/lecture_i2ml/master/exercises/data/cassini_data.csv')

# Enter your code here: