## Exercise 1: Discrete Naive Bayes Classifier

In this exercise, we want to get a basic idea of the naive Bayes classifier by analysing a small example. Suppose we want to classify fruits based on their length, sweetness and colour, and we have already spent days categorizing 1900 fruits. The results are summarized in the following table. ![](./img/figure1.png)

To keep our classifier simple, we deal with only three kinds of fruit (banana, papaya and apple). The general scenario is as follows: we are given a new fruit whose class we do not know. However, we can measure its length, taste it and name its colour. Based on these features, we then want to determine which type of fruit we have. The features are assumed to be conditionally independent given the class (the "naive" assumption of the classifier).

1. What level of measurement does each feature have, i.e. on which scale is it measured (nominal, ordinal, interval or ratio)?

>**Length: Ordinal <br />
Sweetness: Nominal <br />
Colour: Nominal**
    

2. What is the prior probability of a banana, i.e. 𝑃(𝐵𝑎𝑛𝑎𝑛𝑎)?

3. What is the prior probability of a long fruit, i.e. 𝑃(𝐿𝑜𝑛𝑔)?

4. What is the probability that you have a banana in your hand, when you already know
that the fruit is long, i.e. 𝑃(𝐵𝑎𝑛𝑎𝑛𝑎|𝐿𝑜𝑛𝑔)?

5. What is the probability that the fruit is long, when you already know that it is a banana
(the likelihood of the evidence), i.e. 𝑃(𝐿𝑜𝑛𝑔|𝐵𝑎𝑛𝑎𝑛𝑎)?
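
Questions 2 to 5 can all be answered with relative frequencies read off the table. The sketch below assumes that the table lists, for every fruit, its total count and how many of those fruits show each feature value; it computes the four quantities with exact fractions. The counts are invented placeholders for a hypothetical 20-fruit sample, not the values from the figure, and the names `totals`, `counts`, `prior`, `evidence`, `likelihood` and `posterior` are illustrative only.

```python
from fractions import Fraction

# HYPOTHETICAL placeholder counts, NOT the values from the table above.
# totals[fruit]: number of fruits of that class in the sample.
# counts[fruit][feature][value]: how many of them show that feature value.
totals = {"Banana": 8, "Papaya": 7, "Apple": 5}
counts = {
    "Banana": {"length": {"Long": 6}},
    "Papaya": {"length": {"Long": 3}},
    "Apple": {"length": {"Long": 1}},
}
N = sum(totals.values())  # total number of categorized fruits


def prior(fruit):
    """P(fruit): share of the whole sample that belongs to the class."""
    return Fraction(totals[fruit], N)


def evidence(feature, value):
    """P(value): share of all fruits (any class) showing the feature value."""
    return Fraction(sum(counts[f][feature].get(value, 0) for f in totals), N)


def likelihood(feature, value, fruit):
    """P(value | fruit): share of the class showing the feature value."""
    return Fraction(counts[fruit][feature].get(value, 0), totals[fruit])


def posterior(fruit, feature, value):
    """P(fruit | value) via Bayes' theorem: P(value | fruit) P(fruit) / P(value)."""
    return likelihood(feature, value, fruit) * prior(fruit) / evidence(feature, value)


print(prior("Banana"), evidence("length", "Long"),
      posterior("Banana", "length", "Long"), likelihood("length", "Long", "Banana"))
# 2/5 1/2 3/5 3/4
```

With these toy numbers, P(Banana|Long) = 3/5 also equals the direct count ratio 6 / (6 + 3 + 1), which makes a handy consistency check once the real counts from the table are filled in.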

6. The naive Bayes classifier iterates over every possible class and computes the posterior probability of that class given the observed features. The class with the highest posterior probability is selected:
    \begin{equation} \arg\max_{\omega \in \{\text{Banana}, \text{Papaya}, \text{Apple}\}} P(\omega \mid x) \end{equation}
    <br />
    with
    <br />
    \begin{equation} P(\omega \mid x) = \frac{P(x_1 \mid \omega) \cdots P(x_n \mid \omega) \, P(\omega)}{p(x)} \end{equation}
    <br />
    For a fruit which has medium length, tastes sweet and looks green to you (i.e. 𝑥 = (𝑀𝑒𝑑𝑖𝑢𝑚, 𝑆𝑤𝑒𝑒𝑡, 𝐺𝑟𝑒𝑒𝑛)<sup>𝑇</sup>), which class does it most likely belong to? Hint: you do not need to calculate 𝑝(𝑥).
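
The argmax rule above can be sketched in a few lines. This is a minimal illustration under the same assumption as before (the table gives, per fruit, a total count and per-value counts); the counts are again invented placeholders for a hypothetical 20-fruit sample, NOT the table's values, so the winning class printed here says nothing about the actual answer. The names `totals`, `counts`, `score` and `classify` are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL placeholder counts, NOT the values from the table above.
# totals[c]: number of fruits of class c; counts[c][feature][value]: how many
# of those fruits show the given feature value.
totals = {"Banana": 8, "Papaya": 7, "Apple": 5}
counts = {
    "Banana": {"length": {"Medium": 2}, "sweetness": {"Sweet": 6}, "colour": {"Green": 1}},
    "Papaya": {"length": {"Medium": 4}, "sweetness": {"Sweet": 5}, "colour": {"Green": 3}},
    "Apple": {"length": {"Medium": 1}, "sweetness": {"Sweet": 4}, "colour": {"Green": 4}},
}
N = sum(totals.values())


def score(c, x):
    """Numerator of P(c | x): P(x_1 | c) * ... * P(x_n | c) * P(c).

    p(x) is left out: it is the same for every class, so it cannot change
    which class attains the maximum.
    """
    s = Fraction(totals[c], N)  # prior P(c)
    for feature, value in x.items():
        # conditional independence: multiply the per-feature likelihoods
        s *= Fraction(counts[c][feature].get(value, 0), totals[c])
    return s


def classify(x):
    """Return the class with the largest score, together with all scores."""
    scores = {c: score(c, x) for c in totals}
    return max(scores, key=scores.get), scores


x = {"length": "Medium", "sweetness": "Sweet", "colour": "Green"}
best, scores = classify(x)
print(best)  # Papaya (for these toy counts only)
```

Dividing each score by the sum of all scores, which is p(x) under the naive model, turns them into the posteriors P(ω|x); the ranking, and hence the decision, stays the same.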

## Exercise 2: Misclassification Costs

With the Bayes decision rule, we distinguish between classes solely on the basis of probability distributions. This rule is fixed in the sense that we have no manual control over the decision process. However, not all classes are equally important, and sometimes we want to put a particular focus on one of them. This can be achieved with penalty terms, which are the topic of this exercise. <br /> <br />
We consider a two-class problem and use the normal distributions from the example in the lecture script (slide 27) as likelihood functions:

\begin{equation}p(x \mid \omega_1) = \frac{1}{\sqrt{\pi}}\exp\left(-x^2\right)\end{equation}
\begin{equation}p(x \mid \omega_2) = \frac{1}{\sqrt{\pi}}\exp\left(-(x-1)^2\right)\end{equation}
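
Both likelihoods are normal densities with variance σ<sup>2</sup> = 1/2, centred at 0 and 1 respectively. Substituting σ<sup>2</sup> = 1/2 into the general normal density shows where the factor 1/√π and the exponent −(x − μ)<sup>2</sup> come from:

\begin{equation}\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \overset{\sigma^2 = 1/2}{=} \frac{1}{\sqrt{\pi}} \exp\left(-(x-\mu)^2\right)\end{equation}

with μ = 0 for ω<sub>1</sub> and μ = 1 for ω<sub>2</sub>.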

We do not fix concrete values for the prior probabilities yet. Since we only consider two
classes, a single variable 𝑝 ∈ [0, 1] suffices to describe both:

\begin{equation}P(\omega_1) = p \quad \text{and} \quad P(\omega_2) = 1 - p\end{equation}


Now we have all the ingredients to define the loss of each class:
<br />
<br />
\begin{equation}l_1(x) = \lambda_{21}\, p(x \mid \omega_2)\, P(\omega_2)\end{equation}
\begin{equation}l_2(x) = \lambda_{12}\, p(x \mid \omega_1)\, P(\omega_1)\end{equation}

where correct classifications are not penalized, i.e. 𝜆<sub>𝑖𝑖</sub> = 0. The term 𝜆<sub>𝑖𝑗</sub> ∈ ℝ<sup>+</sup> weights the
probabilities and is the penalty incurred when a pattern that belongs to class 𝜔<sub>𝑖</sub> is misclassified as class 𝜔<sub>𝑗</sub>.
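
The two loss functions translate directly into code. The sketch below evaluates them for given priors and costs; the function names and the arguments `lam12`, `lam21` (standing for λ<sub>12</sub> and λ<sub>21</sub>) are illustrative choices, not part of the exercise.

```python
import math


def p_x_given_w1(x):
    """Likelihood p(x | w_1) = exp(-x^2) / sqrt(pi)."""
    return math.exp(-x**2) / math.sqrt(math.pi)


def p_x_given_w2(x):
    """Likelihood p(x | w_2) = exp(-(x - 1)^2) / sqrt(pi)."""
    return math.exp(-(x - 1) ** 2) / math.sqrt(math.pi)


def losses(x, p, lam12, lam21):
    """Return (l_1(x), l_2(x)) for priors P(w_1) = p and P(w_2) = 1 - p.

    l_1 is the loss of deciding w_1 (the pattern may really come from w_2),
    l_2 is the loss of deciding w_2 (the pattern may really come from w_1).
    """
    l1 = lam21 * p_x_given_w2(x) * (1 - p)
    l2 = lam12 * p_x_given_w1(x) * p
    return l1, l2


# At x = 0.5 both likelihoods coincide, so with equal priors the ratio
# l_1 / l_2 equals lam21 / lam12.
print(losses(0.5, 0.5, 1.0, 1.0))
print(losses(0.5, 0.5, 1.0, 3.0))
```

Raising λ<sub>21</sub> makes l<sub>1</sub> larger everywhere, so deciding ω<sub>1</sub> becomes less attractive, which is exactly the manual control that the plain Bayes rule lacks.
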
To assign patterns to classes, we use a decision rule similar to the one in the Bayes case. The only difference is that we select the class that yields the lowest loss instead of the highest probability:

\begin{equation}\omega^* = \arg\min_{\omega_i} l_i(x)\end{equation}
<br />
In the case of our two-class problem, we assign a new pattern 𝑥 to 𝜔<sub>1</sub> if 𝑙<sub>1</sub>(𝑥) < 𝑙<sub>2</sub>(𝑥) and to 𝜔<sub>2</sub> if 𝑙<sub>2</sub>(𝑥) < 𝑙<sub>1</sub>(𝑥).
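
For these two likelihoods, the minimum-loss rule reduces to a single threshold. Writing out 𝑙<sub>1</sub>(𝑥) < 𝑙<sub>2</sub>(𝑥), cancelling the common factor 1/√π and taking logarithms gives 2𝑥 − 1 < ln(λ<sub>12</sub>𝑝 / (λ<sub>21</sub>(1 − 𝑝))), so, assuming 0 < 𝑝 < 1, a pattern is assigned to ω<sub>1</sub> exactly when 𝑥 < 𝑥<sub>0</sub> with

\begin{equation}x_0 = \frac{1}{2} + \frac{1}{2}\ln\frac{\lambda_{12}\, p}{\lambda_{21}(1-p)}\end{equation}

The sketch below implements both the direct argmin rule and this closed-form threshold so the two can be checked against each other. The function names are hypothetical, and ties (which the rule above leaves open) are sent to ω<sub>2</sub> here as an arbitrary choice.

```python
import math


def loss_pair(x, p, lam12, lam21):
    """(l_1(x), l_2(x)) with the Gaussian likelihoods from the exercise."""
    c = 1 / math.sqrt(math.pi)
    l1 = lam21 * c * math.exp(-(x - 1) ** 2) * (1 - p)
    l2 = lam12 * c * math.exp(-x**2) * p
    return l1, l2


def decide(x, p, lam12, lam21):
    """Minimum-loss rule: 1 if l_1(x) < l_2(x), else 2 (ties go to w_2)."""
    l1, l2 = loss_pair(x, p, lam12, lam21)
    return 1 if l1 < l2 else 2


def threshold(p, lam12, lam21):
    """Closed-form boundary x0: decide w_1 exactly when x < x0 (needs 0 < p < 1)."""
    return 0.5 + 0.5 * math.log((lam12 * p) / (lam21 * (1 - p)))


# Equal priors and costs: boundary halfway between the means 0 and 1.
print(threshold(0.5, 1.0, 1.0))  # 0.5
# Making the w_1 -> w_2 error four times as expensive shifts the boundary right.
print(threshold(0.5, 4.0, 1.0))
```

Increasing λ<sub>12</sub> or the prior 𝑝 moves 𝑥<sub>0</sub> to the right and enlarges the region assigned to ω<sub>1</sub>; increasing λ<sub>21</sub> moves it to the left.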