**CS596 - Machine Learning**
<br>
Date: **23 September 2020**


Title: **Lecture 5 - Part A**
<br>
Speaker: **Dr. Shota Tsiskaridze**
<br>
Teaching Assistant: **Levan Sanadiradze**



<h1 align="center">Naive Bayes Algorithm</h1>

<h3 align="center">Classification Problems</h3>

- The goal in **classification** is to take an input vector $\mathbf{x}$ and to assign it to one of $K$ discrete classes $\mathcal{C}_k$ where $k = 1, ..., K$. 


- The classes are taken to be **disjoint**, so that each input is assigned to one and only one class.


- The **input space** is thereby divided into **decision regions** whose boundaries are called **decision boundaries** or **decision surfaces**.


- There are **three distinct approaches** to the classification problem:
  - **constructing a discriminant function**, i.e. a function that directly assigns each vector $\mathbf{x}$ to a specific class;
  - **directly modeling the conditional probabilities** $p(\mathcal{C}_k |\mathbf{x})$, i.e. representing them as parametric models and then optimizing the parameters using a training set;
  - **adopting a generative approach**, i.e. modeling the class-conditional densities $p(\mathbf{x} |\mathcal{C}_k)$ together with the prior probabilities $p(\mathcal{C}_k)$ of the classes, and then computing the required posterior probabilities using Bayes’ theorem:
  
  $$p(\mathcal{C}_k | \mathbf{x}) = \frac{p(\mathbf{x} |\mathcal{C}_k)p(\mathcal{C}_k)}{p( \mathbf{x})}$$
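As a minimal sketch of the generative approach, the snippet below assumes two hypothetical classes `C1` and `C2` with one-dimensional Gaussian class-conditional densities $p(x \mid \mathcal{C}_k)$. The means, standard deviations and priors are illustrative values, not taken from the lecture. The posteriors follow from Bayes' theorem, with the evidence $p(x)$ computed as the normalizing sum over the classes.

```python
from statistics import NormalDist

# Hypothetical class-conditional densities p(x | C_k): illustrative 1-D Gaussians
class_conditionals = {
    "C1": NormalDist(mu=0.0, sigma=1.0),
    "C2": NormalDist(mu=2.0, sigma=1.0),
}
# Hypothetical prior probabilities p(C_k); they must sum to 1
priors = {"C1": 0.5, "C2": 0.5}


def posteriors(x):
    """Return p(C_k | x) for every class via Bayes' theorem."""
    # Numerators p(x | C_k) p(C_k)
    joint = {k: class_conditionals[k].pdf(x) * priors[k] for k in priors}
    # Evidence p(x) = sum over k of p(x | C_k) p(C_k)
    evidence = sum(joint.values())
    return {k: v / evidence for k, v in joint.items()}


def classify(x):
    """Assign x to the class with the largest posterior probability."""
    post = posteriors(x)
    return max(post, key=post.get)


print(posteriors(1.0))  # x = 1 is equidistant from both means: both posteriors are 0.5
print(classify(-0.5), classify(2.5))
```

With these illustrative parameters (equal priors, equal variances), the decision boundary lies at $x = 1$, midway between the two means. This boundary splits the input line into two decision regions.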

<h3 align="center">Conditional probability</h3>

- Let $(\Omega, \Sigma, P)$ be a probability space and let $A, B \in \Sigma$ be two events.


- The **conditional probability** of $A$ given $B$ is defined as the probability of the joint occurrence of $A$ and $B$ (their intersection $A \cap B$) divided by the probability of $B$.

  In other words, if $P(B) \gt 0$, then the **conditional probability** of $A$ given $B$ is:

  $$P(A|B) = \frac{P(A\cap B)}{P(B)}.$$
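To make the definition concrete, the sketch below uses a hypothetical example that is not part of the lecture. Two fair dice are rolled, so $\Omega$ has 36 equally likely outcomes. Let $A$ = "the sum is 8" and $B$ = "the first die is even". The code computes $P(A \mid B)$ as $P(A \cap B)/P(B)$ using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice (36 equally likely)
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform measure on omega; event is a predicate."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def A(w):
    return w[0] + w[1] == 8  # the sum is 8


def B(w):
    return w[0] % 2 == 0  # the first die is even


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given), defined only if P(given) > 0."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(lambda w: event(w) and given(w)) / p_given


print(cond_prob(A, B))  # (3/36) / (18/36) = 1/6
```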

<h3 align="center">Bayes' Theorem</h3>

- Let $(\Omega, \Sigma, P)$ be a probability space and let $A, B \in \Sigma$ be two events with $P(A) > 0$ and $P(B) > 0$.


- **Bayes’ theorem** is stated mathematically as the following equation:

  $$P(A|B) = \frac{P(B|A)P(A)}{P(B)},$$
  
  where:
    - $P(A|B)$ is the conditional probability of $A$ occurring given that $B$ is true;
    - $P(B|A)$ is the conditional probability of $B$ occurring given that $A$ is true;
    - $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$, respectively.


- **Proof**. 

  By the definition of conditional probability, we can rewrite $P(A|B)$ and $P(B|A)$ in the following forms:

  $$P(A|B)P(B) = P(A \cap B) \text{ and } P(B|A)P(A) = P(B \cap A).$$

  Since $A \cap B = B \cap A$, equating the two expressions gives $P(A|B)P(B) = P(A \cap B) = P(B \cap A) = P(B|A)P(A)$, and dividing both sides by $P(B) > 0$ yields:

  $$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$
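A quick numerical check of the theorem uses a hypothetical two-dice setting that is not from the lecture: $A$ = "the sum is 8" and $B$ = "the first die is even". The right-hand side $P(B \mid A)P(A)/P(B)$, computed from counts, matches the value of $P(A \mid B)$ obtained directly from the definition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, 36 equally likely outcomes
omega = list(product(range(1, 7), repeat=2))
A = {w for w in omega if sum(w) == 8}      # event A: the sum is 8
B = {w for w in omega if w[0] % 2 == 0}    # event B: the first die is even


def prob(event):
    """Probability of a set of outcomes under the uniform measure."""
    return Fraction(len(event), len(omega))


p_A_given_B = prob(A & B) / prob(B)   # definition of conditional probability
p_B_given_A = prob(A & B) / prob(A)   # P(B and A) equals P(A and B)
bayes_rhs = p_B_given_A * prob(A) / prob(B)

print(p_A_given_B, bayes_rhs)  # both sides agree: 1/6 1/6
```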

<h3 align="center">Example</h3>

- **Chris Wiggins**, an associate professor of applied mathematics at **Columbia University**, posed the following problem in an article in **Scientific American** (<a href = 'https://www.scientificamerican.com/article/what-is-bayess-theorem-an/'>Link to the article in Scientific American</a>):


- **Problem**: A patient goes to see a doctor. The doctor performs a test with **99%** reliability - that is, **99%** of people who are sick test positive and **99%** of the healthy people test negative. The doctor knows that only **1%** of the people in the country are sick. Now the question is: **if the patient tests positive, what are the chances the patient is sick?**


- The intuitive answer is **99 %**, but the correct answer is **50 %**.

- **Solution**. 

  Wiggins's explanation can be summarized with the help of the following table, which illustrates the scenario in a hypothetical population of $10,000$ people:

|      | Diseased | Not Diseased |     |
|:----:|:--------:|:------------:|:---:|
|test +| 99       | 99           | 198 | 
|test -| 1        | 9801         | 9802| 
|      | 100      | 9900         |10000|
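The table can be rebuilt from the three rates stated in the problem: prevalence 1%, sensitivity 99%, and specificity 99%. The sketch below does this for the same hypothetical population of 10,000 people, using exact fractions. It then reads off the share of positive tests that belong to sick patients.

```python
from fractions import Fraction

population = 10_000
prevalence = Fraction(1, 100)     # P(sick)
sensitivity = Fraction(99, 100)   # P(test + | sick)
specificity = Fraction(99, 100)   # P(test - | healthy)

sick = population * prevalence            # 100
healthy = population - sick               # 9900
true_pos = sick * sensitivity             # sick and test +: 99
false_neg = sick - true_pos               # sick and test -: 1
false_pos = healthy * (1 - specificity)   # healthy and test +: 99
true_neg = healthy - false_pos            # healthy and test -: 9801

# Among everyone who tests positive, the fraction who are actually sick
print(true_pos / (true_pos + false_pos))  # 99 / 198 = 1/2
```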


- We want to know the probability of disease $(A)$ given that the patient has a positive test $(B)$, i.e. $P(A|B).$


- We know that the unconditional probability of disease is $1\%$, i.e. $P(A) = 0.01$;


- The unconditional probability of a positive test is $P(B) = 198/10000 = 0.0198$;


- We also know the sensitivity of the test is $99\%$, i.e. $P(B | A) = 0.99$.


- Using Bayes' theorem, we get:

  $$P(A|B)= \frac{P(B|A)P(A)}{P(B)} = \frac{0.99 \cdot 0.01}{0.0198} = \frac{1}{2} = 50\%.$$
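In the calculation above, $P(B)$ was read off the table. Without the table, it follows from the law of total probability: $P(B) = P(B \mid A)P(A) + P(B \mid \overline{A})P(\overline{A})$. Here $P(B \mid \overline{A}) = 1 - \text{specificity}$ is the false-positive rate. The following is a minimal sketch; the function name `posterior_positive` is illustrative.

```python
from fractions import Fraction


def posterior_positive(prior, sensitivity, specificity):
    """P(sick | test +) via Bayes' theorem.

    prior        -- P(A), prevalence of the disease
    sensitivity  -- P(B | A), probability of a positive test when sick
    specificity  -- P(not B | not A), probability of a negative test when healthy
    """
    false_pos_rate = 1 - specificity                                # P(B | not A)
    evidence = sensitivity * prior + false_pos_rate * (1 - prior)   # P(B)
    return sensitivity * prior / evidence


p = posterior_positive(Fraction(1, 100), Fraction(99, 100), Fraction(99, 100))
print(p)  # 1/2
```

Re-running it with a higher prevalence shows how strongly the answer depends on the prior. For example, `Fraction(1, 10)` with the same test gives `11/12`.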

 

<h3 align="center">Extended Bayes' Theorem</h3>

- Let $(\Omega, \Sigma, P)$ be a probability space.


- **Extended Bayes' theorem**:

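  Let $A, B_1, B_2, \dots, B_n \in \Sigma$ be events, and let $\overline{A} = \Omega \setminus A$ and $B = B_1 \cap B_2 \cap \dots \cap B_n$. If $B_1, \dots, B_n$ are conditionally independent given $A$ and also conditionally independent given $\overline{A}$, $0 < P(A) < 1$, and $P(B) > 0$, then:

  $$P(A | B) = 
\frac{P(B_1|A) \times \cdots \times  P(B_n|A) \times P(A)}{P(B_1|A) \times \cdots \times P(B_n|A) \times P(A) + P(B_1|\overline{A}) \times \cdots \times  P(B_n|\overline{A}) \times P(\overline{A})}.$$

  This is Bayes' theorem applied to $A$ and $B$. The denominator is $P(B) = P(B|A)P(A) + P(B|\overline{A})P(\overline{A})$ by the law of total probability. Conditional independence then factors the two terms as $P(B|A) = \prod_{i=1}^{n} P(B_i|A)$ and $P(B|\overline{A}) = \prod_{i=1}^{n} P(B_i|\overline{A})$.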
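The formula translates directly into code. The sketch below uses an illustrative function name, `extended_bayes`. The caller passes the lists of $P(B_i \mid A)$ and $P(B_i \mid \overline{A})$ together with $P(A)$; the function assumes, rather than checks, the conditional-independence hypotheses. With a single event ($n = 1$), it reduces to the ordinary Bayes' theorem with $P(B)$ expanded by the law of total probability. The example call checks this case against the medical-test numbers above.

```python
from fractions import Fraction
from math import prod


def extended_bayes(p_b_given_a, p_b_given_not_a, p_a):
    """P(A | B_1 and ... and B_n), assuming the B_i are conditionally
    independent given A and given its complement (not checked here).

    p_b_given_a     -- sequence of P(B_i | A)
    p_b_given_not_a -- sequence of P(B_i | not A)
    p_a             -- prior P(A), with 0 < P(A) < 1
    """
    numerator = prod(p_b_given_a) * p_a              # P(B | A) P(A)
    other = prod(p_b_given_not_a) * (1 - p_a)        # P(B | not A) P(not A)
    return numerator / (numerator + other)           # denominator is P(B)


# n = 1: the medical-test example (sensitivity 0.99, false-positive rate 0.01, prior 0.01)
print(extended_bayes([Fraction(99, 100)], [Fraction(1, 100)], Fraction(1, 100)))  # 1/2
```

With the four factors of the golf example below, the same function returns the exact value `125/611`, which is about 0.2046.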

<h3 align="center">Example</h3>

- Suppose we have a data set that describes the weather conditions for playing a **game of golf**:
  - Weather conditions: **Outlook, Temperature, Humidity, Wind**.
  - Each row (tuple) labels the conditions as **fit** (**Yes**) or **unfit** (**No**) for playing golf (column **Play**).

| Day |  Outlook | Temperature | Humidity | Wind   | Play |
|:---:|:--------:|:-----------:|:--------:|:------:|:----:|
|  0  |   Sunny  |     Hot     | High     | Weak   | No   |
|  1  |   Sunny  |     Hot     | High     | Strong | No   |
|  2  | Overcast |     Hot     | High     | Weak   | Yes  |
| 3   | Rain     | Mild        | High     | Weak   | Yes  |
| 4   | Rain     | Cool        | Normal   | Weak   | Yes  |
| 5   | Rain     | Cool        | Normal   | Strong | No   |
| 6   | Overcast | Cool        | Normal   | Strong | Yes  |
| 7   | Sunny    | Mild        | High     | Weak   | No   |
| 8   | Sunny    | Cool        | Normal   | Weak   | Yes  |
| 9   | Rain     | Mild        | Normal   | Weak   | Yes  |
| 10  | Sunny    | Mild        | Normal   | Strong | Yes  |
| 11  | Overcast | Mild        | High     | Strong | Yes  |
| 12  | Overcast | Hot         | Normal   | Weak   | Yes  |
| 13  | Rain     | Mild        | High     | Strong | No   |


- Now, suppose we have a **Day** with the following values:
  - **Outlook** = **Sunny**
  - **Temperature** = **Cool**
  - **Humidity** = **High**
  - **Wind** = **Strong**

- Using the data, we have to predict whether we can **Play** on that day or not.

- **What are the events in our case?**
  - $A \text{  }: \{ \text{ Play = Yes}\}$
  - $B_1 : \{ \text{Outlook = Sunny}\}$
  - $B_2 : \{ \text{Temperature = Cool}\}$
  - $B_3 : \{ \text{Humidity = High}\}$
  - $B_4 : \{ \text{Wind = Strong}\}$

- First, we calculate the **individual probabilities** for each value of each weather condition (feature), counting the days separately for each class (**Yes** / **No**):

|  Outlook | Yes | No | $P(\cdot \mid \text{Yes})$ | $P(\cdot \mid \text{No})$ |$\vert$| Temperature | Yes | No | $P(\cdot \mid \text{Yes})$ | $P(\cdot \mid \text{No})$ |
|:---------|:---:|:--:|:-------------:|:-------------:|---|:------------|:---:|:--:|:-------------:|:-------------:|
|   Sunny  |  2  |  3 | $\frac{2}{9}$ | $\frac{3}{5}$ |$\vert$|     Hot     |  2  |  2 | $\frac{2}{9}$ | $\frac{2}{5}$ |
| Overcast |  4  |  0 | $\frac{4}{9}$ | $\frac{0}{5}$ |$\vert$|     Mild    |  4  |  2 | $\frac{4}{9}$ | $\frac{2}{5}$ |
|   Rain   |  3  |  2 | $\frac{3}{9}$ | $\frac{2}{5}$ |$\vert$|     Cool    |  3  |  1 | $\frac{3}{9}$ | $\frac{1}{5}$ |
|   Total  |  9  |  5 |       1       |       1       |$\vert$|    Total    |  9  |  5 |       1       |       1       |

| Humidity | Yes | No | $P(\cdot \mid \text{Yes})$ | $P(\cdot \mid \text{No})$ |$\vert$| Wind  | Yes | No | $P(\cdot \mid \text{Yes})$ | $P(\cdot \mid \text{No})$ |
|:---------|:---:|:--:|:-------------:|:-------------:|---|:------|:---:|:--:|:-------------:|:-------------:|
|   High   |  3  |  4 | $\frac{3}{9}$ | $\frac{4}{5}$ |$\vert$|Strong |  3  |  3 | $\frac{3}{9}$ | $\frac{3}{5}$ |
|  Normal  |  6  |  1 | $\frac{6}{9}$ | $\frac{1}{5}$ |$\vert$| Weak  |  6  |  2 | $\frac{6}{9}$ | $\frac{2}{5}$ |
|   Total  |  9  |  5 |       1       |       1       |$\vert$| Total |  9  |  5 |       1       |       1       |

|       | Play (days) | $P(\text{Play})$ |
|:-----:|:----:|:--------------:|
|  Yes  |   9  | $\frac{9}{14}$ |
|   No  |   5  | $\frac{5}{14}$ |
| Total |  14  |        1       |
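The frequency tables above can be reproduced by counting. The sketch below stores the 14 rows of the golf table as tuples; the tuple layout and the helper names `prior` and `likelihood` are illustrative. Each probability is estimated as a relative frequency, e.g. $P(\text{Sunny} \mid \text{Yes}) = 2/9$.

```python
from collections import Counter
from fractions import Fraction

FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")
# (Outlook, Temperature, Humidity, Wind, Play) for days 0..13 of the golf table
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

# Class counts: how many days are labelled Yes / No
class_counts = Counter(row[-1] for row in DATA)
# Joint counts: (feature name, feature value, class label) -> number of days
joint_counts = Counter(
    (name, value, row[-1])
    for row in DATA
    for name, value in zip(FEATURES, row[:-1])
)


def prior(label):
    """P(Play = label) as a relative frequency."""
    return Fraction(class_counts[label], len(DATA))


def likelihood(name, value, label):
    """P(feature = value | Play = label) as a relative frequency (0 if never seen)."""
    return Fraction(joint_counts[(name, value, label)], class_counts[label])


for value in ("Sunny", "Overcast", "Rain"):
    print(value, likelihood("Outlook", value, "Yes"), likelihood("Outlook", value, "No"))
print("P(Yes) =", prior("Yes"), " P(No) =", prior("No"))
```

`Fraction` prints values in lowest terms, so an entry such as $\frac{3}{9}$ appears as `1/3`.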

- Second, we write down the **conditional probabilities** for our instance:

| Play = Yes ($A$) | Play = No ($\overline{A}$) |
|:-:|:-:|
|$P(B_1 \mid A) = \frac{2}{9}$| $P(B_1 \mid \overline{A}) = \frac{3}{5}$|
|$P(B_2 \mid A) = \frac{3}{9}$| $P(B_2 \mid \overline{A}) = \frac{1}{5}$|
|$P(B_3 \mid A) = \frac{3}{9}$| $P(B_3 \mid \overline{A}) = \frac{4}{5}$|
|$P(B_4 \mid A) = \frac{3}{9}$| $P(B_4 \mid \overline{A}) = \frac{3}{5}$|
|$P(A) = \frac{9}{14}$| $P(\overline{A}) = \frac{5}{14}$|

- Third, we use the **Extended Bayes' Theorem** and obtain the final probability:

$$P(A|B) = \frac{
\frac{2}{9} \frac{3}{9} \frac{3}{9} \frac{3}{9} \frac{9}{14}
}{
\frac{2}{9} \frac{3}{9} \frac{3}{9} \frac{3}{9} \frac{9}{14} + \frac{3}{5} \frac{1}{5} \frac{4}{5} \frac{3}{5} \frac{5}{14}
} \approx 0.20$$


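```python
# Posterior P(Play = Yes | Sunny, Cool, High, Strong) from the Extended Bayes' Theorem
P = (2/9 * 3/9 * 3/9 * 3/9 * 9/14)/(2/9 * 3/9 * 3/9 * 3/9 * 9/14 + 3/5 * 1/5 * 4/5 * 3/5 * 5/14)
print(P)
```

0.204582651391162

- Since $P(A|B) \approx 0.20 < 0.5$, the complementary probability is $P(\overline{A}|B) = 1 - P(A|B) \approx 0.80$. The prediction for this day is therefore **Play = No**.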
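The three steps can be folded into one small, from-scratch classifier. The sketch below is not a library API, and the function names `fit`, `posterior` and `predict` are illustrative. It estimates the frequencies from the 14 training rows and applies the Extended Bayes' Theorem with exact fractions. It then returns the label with the larger posterior probability.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# (Outlook, Temperature, Humidity, Wind, Play) for days 0..13 of the golf table
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def fit(rows):
    """Count classes and (feature position, value, class) triples."""
    class_counts = Counter(row[-1] for row in rows)
    joint_counts = Counter(
        (i, value, row[-1]) for row in rows for i, value in enumerate(row[:-1])
    )
    return class_counts, joint_counts


def posterior(model, x):
    """P(Play = label | x) for each label, assuming the features of x are
    conditionally independent given the label (the naive Bayes assumption)."""
    class_counts, joint_counts = model
    n = sum(class_counts.values())
    scores = {}
    for label, c in class_counts.items():
        # P(label) * prod_i P(x_i | label), all estimated as relative frequencies
        likelihood = prod(Fraction(joint_counts[(i, v, label)], c) for i, v in enumerate(x))
        scores[label] = Fraction(c, n) * likelihood
    total = sum(scores.values())  # P(x) by the law of total probability (must be > 0)
    return {label: s / total for label, s in scores.items()}


def predict(model, x):
    """Return the label with the largest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)


model = fit(DATA)
query = ("Sunny", "Cool", "High", "Strong")
post = posterior(model, query)
print(post["Yes"], float(post["Yes"]))
print(predict(model, query))
```

For the query day, `post["Yes"]` is the exact fraction `125/611`, about 0.2046, which matches the value above. `predict` therefore returns `"No"`.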


<h1 align="center">End of Part A</h1>