# Week 2
Required imports:

In [13]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy

## Applying Bayes' theorem – the Naïve Bayes classifier

One simple yet powerful method for solving **classification problems** is the Naïve Bayes classifier (NBC), and today we're going to figure out how it works.

Suppose that we have a set of **training data** to teach the classifier with. These are observations of the following form:

$$\textrm{Observed object}= (x_1,x_2,\dots,x_n,C)$$

where $\mathbf{x}=(x_1,x_2,...,x_n)$ are some **parameters** that we can measure, and $C$ is the value of the **class** that this observed object belongs to – say, there are $K$ classes to choose from, so 

$$C\in\{1,2,\dots,K\}$$

is a discrete variable that simply **labels** the object with the number of the class it belongs to.

Now, if we observe a new object, we have to decide the value of $C$, based on the values of $x_i$-s, its parameters – so we are solving a **classification problem.**

Using Bayes' theorem, we can express one conditional probability in terms of the other:

$$p(C|\mathbf{x}) = \frac{p(C)p(\mathbf{x}|C)}{p(\mathbf{x})}$$

or, to name it in Bayesian probability terms,

$$\textrm{posterior} = \frac{\textrm{prior}\times\textrm{likelihood}}{\textrm{evidence}}$$

_(this mnemonic formula is good to remember)_

Notice that in the above formula, the denominator (the evidence), $p(\mathbf{x})$ **does not depend on $C$**! So, since our task is to **infer** $C$ from given observed parameters, $\mathbf{x}$, varying the value of $C$ does not change the denominator – so effectively we're **only interested in the numerator**.
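To see why the evidence can be ignored, here is a small numerical sketch. The prior and likelihood values are made up purely for illustration (they do not come from any dataset in this notebook). The point is only that dividing by $p(\mathbf{x})$ rescales every class by the same factor, so the class with the largest numerator is also the class with the largest posterior.

```python
import numpy as np

# Hypothetical numbers for K = 3 classes (illustrative only)
prior = np.array([0.5, 0.3, 0.2])        # p(C)
likelihood = np.array([0.1, 0.4, 0.3])   # p(x | C) for one observed x

numerator = prior * likelihood           # prior x likelihood, one value per class
evidence = numerator.sum()               # p(x) = sum over C of p(C) p(x | C)
posterior = numerator / evidence         # p(C | x), sums to 1

# The evidence is the same for every class, so both choices agree
best_unnormalised = int(np.argmax(numerator))
best_posterior = int(np.argmax(posterior))
print(best_unnormalised, best_posterior)
```

Note that the indices here are 0-based, so index `k` corresponds to class label $k+1$ in the notation above.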

The likelihood, $p(\mathbf{x}|C)$, can be turned into a product of conditional probabilities, using the **chain rule**:

$$p(\mathbf{x}|C)=p(x_1,x_2,\dots,x_n|C)=p(x_1|x_2,\dots,x_n,C)\cdot p(x_2|x_3,\dots,x_n,C)\cdots p(x_{n-1}|x_n,C)\cdot p(x_n|C)$$

This is true in the most general case, no matter what. The key **assumption** that we're going to make is that the **parameters are conditionally independent given the class**:

$$p(x_1|x_2,x_3...,C)=p(x_1|C),\quad p(x_2|x_3,...,C)=p(x_2|C),\quad...$$

_(this is the limitation of NBC, and this is why it has "Naïve" in its name)_ – which is a very strong assumption! But from it, it follows that all the $x$-s after the condition line, $|$, can now be dropped, and the likelihood is **just a product** of the individual likelihoods of the parameters:

$$p(\mathbf{x}|C)=\prod_{i=1}^n p(x_i|C).$$

Now, to **predict** (infer) the class of the observed object, one idea is to pick the class that **maximises** the posterior:

$$\tilde{C}=\underset{C\in\{1,2,\dots,K\}}{\text{arg max }} p(C|\mathbf{x})$$

and, since the **denominator does not depend on $C$,** we should just maximise the numerator

$$\tilde{C}=\underset{C\in\{1,2,\dots,K\}}{\text{arg max }}\left( p(C)\prod_{i=1}^n p(x_i|C)\right).$$

To do that, one should also know the **prior** probability distribution of the **classes**, $p(C)$. In many practical cases, it can simply be chosen using common sense.
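The decision rule above can be sketched in a few lines, assuming the per-parameter likelihoods $p(x_i|C)$ have already been evaluated and stored in an array. The function name `nbc_predict` and the array layout are hypothetical, chosen only for this illustration. The sketch sums logarithms instead of multiplying probabilities: this gives the same arg max (the logarithm is monotonic) and avoids numerical underflow when there are many parameters.

```python
import numpy as np

def nbc_predict(prior, likelihoods):
    """Return the index of the class maximising p(C) * prod_i p(x_i | C).

    prior       : shape (K,), p(C) for each class
    likelihoods : shape (K, n), likelihoods[k, i] = p(x_i | C = k)
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    # log of the numerator: log p(C) + sum_i log p(x_i | C)
    log_score = np.log(prior) + np.log(likelihoods).sum(axis=1)
    return int(np.argmax(log_score))

# Hypothetical values: K = 2 classes, n = 3 parameters
prior = [0.5, 0.5]
likelihoods = [[0.2, 0.1, 0.3],   # class 0: product = 0.006
               [0.4, 0.2, 0.1]]   # class 1: product = 0.008
print(nbc_predict(prior, likelihoods))
```

With equal priors the class with the larger likelihood product wins; a sufficiently lopsided prior, such as `[0.9, 0.1]`, can flip the decision.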

### Example: Gaussian NBC

Here's some data on the height, weight and foot size for a group of people:

In [5]:
df = pd.DataFrame.from_dict({'person':['female','female','female','female','male','male','male','male'], 
                        'height':[5, 5.5, 5.42, 5.75, 6, 5.92, 5.58, 5.92], 
                        'weight':[100, 150, 130, 150, 180, 190, 170, 165], 
                        'foot_size':[6,8,7,9,12,11,12,10]})

In [6]:
df

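| | person | height | weight | foot_size |
|---|---|---|---|---|
| 0 | female | 5.0 | 100 | 6 |
| 1 | female | 5.5 | 150 | 8 |
| 2 | female | 5.42 | 130 | 7 |
| 3 | female | 5.75 | 150 | 9 |
| 4 | male | 6.0 | 180 | 12 |
| 5 | male | 5.92 | 190 | 11 |
| 6 | male | 5.58 | 170 | 12 |
| 7 | male | 5.92 | 165 | 10 |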


And here's a new sample that we should classify:

In [40]:
sample = {'height': 6, 'weight': 130, 'foot_size': 8}

**Task:** compute the likelihoods of the sample being male or female, and compare them!

_Hint: for that, here's how to compute the pdf of a normal distribution with $\mu = 0$ (zero mean) and $\sigma = 1$ (unit standard deviation) at $x=0.3$:_

In [15]:
from scipy.stats import norm

In [19]:
norm(0, 1).pdf(0.3)

0.38138781546052414
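As a sanity check, the same number follows from the closed-form normal density, $f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The helper name `normal_pdf` below is only for illustration:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Normal density written out explicitly."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

by_hand = normal_pdf(0.3, 0.0, 1.0)
by_scipy = norm(0, 1).pdf(0.3)

# Both routes should agree up to floating-point rounding
print(np.isclose(by_hand, by_scipy))
```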

Let's get the per-class averages and standard deviations of each parameter:

In [37]:
means = df.groupby(['person']).mean().reset_index()

means

| | person | height | weight | foot_size |
|---|---|---|---|---|
| 0 | female | 5.4175 | 132.5 | 7.5 |
| 1 | male | 5.855 | 176.25 | 11.25 |


In [38]:
stds = df.groupby(['person']).std().reset_index()

stds

| | person | height | weight | foot_size |
|---|---|---|---|---|
| 0 | female | 0.311809 | 23.629078 | 1.290994 |
| 1 | male | 0.187172 | 11.086779 | 0.957427 |
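One detail worth keeping in mind: pandas' `.std()` computes the sample standard deviation (it divides by the number of observations minus one, i.e. `ddof=1`), whereas `np.std` divides by the number of observations unless told otherwise. A short check on the four female heights from the data above:

```python
import numpy as np
import pandas as pd

female_heights = pd.Series([5, 5.5, 5.42, 5.75])

std_pandas = female_heights.std()                              # ddof=1 by default
std_numpy_default = np.std(female_heights.to_numpy())          # ddof=0 by default
std_numpy_sample = np.std(female_heights.to_numpy(), ddof=1)   # same convention as pandas

print(std_pandas, std_numpy_default, std_numpy_sample)
```

With only four observations per class the two conventions differ noticeably, so it matters which one is used when reproducing the numbers by hand.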


In [43]:
for pers in ['male', 'female']:
    # start from 1 and multiply in one Gaussian factor per parameter
    li = 1.
    
    for param in ['height', 'weight', 'foot_size']:
        # .iloc[0] extracts a scalar, so li stays a float rather than a 1-element array
        mean = means.loc[means.person == pers, param].iloc[0]
        std = stds.loc[stds.person == pers, param].iloc[0]
        x = sample[param]
        li *= norm(mean, std).pdf(x)
        
    print(f'{pers} likelihood = {li:.3e}')

male likelihood = 1.239e-08
female likelihood = 1.076e-03
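To turn these likelihoods into posterior probabilities, multiply by the priors and normalise. The sketch below assumes the priors are estimated from the class frequencies in the training data (4 out of 8 for each class, so 0.5 each); that is one reasonable choice, not the only one. It rebuilds the data so that it can run on its own.

```python
import pandas as pd
from scipy.stats import norm

df = pd.DataFrame({
    'person': ['female'] * 4 + ['male'] * 4,
    'height': [5, 5.5, 5.42, 5.75, 6, 5.92, 5.58, 5.92],
    'weight': [100, 150, 130, 150, 180, 190, 170, 165],
    'foot_size': [6, 8, 7, 9, 12, 11, 12, 10],
})
sample = {'height': 6, 'weight': 130, 'foot_size': 8}
params = list(sample)

grouped = df.groupby('person')
means = grouped[params].mean()   # one row per class, indexed by class label
stds = grouped[params].std()     # sample standard deviation (ddof=1)
priors = df['person'].value_counts(normalize=True)  # assumption: class frequencies

# Numerator of Bayes' theorem for each class: prior x product of Gaussian pdfs
scores = {}
for pers in means.index:
    li = 1.0
    for param in params:
        li *= norm(means.loc[pers, param], stds.loc[pers, param]).pdf(sample[param])
    scores[pers] = priors[pers] * li

evidence = sum(scores.values())                       # p(x), the normalising constant
posterior = {pers: s / evidence for pers, s in scores.items()}
prediction = max(posterior, key=posterior.get)
print(prediction, posterior)
```

Since the priors are equal here, the ranking is the same as for the bare likelihoods, and the sample is classified as female.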
