# Bayesian estimation

A core concept in machine learning and related fields

Probabilistic views and concepts

Bayes classifiers

## Conditional probability

$p(C \, | \, \mathbf{x})$: the probability of event $C$ given $\mathbf{x}$
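As a minimal illustration, a conditional probability can be estimated from counts by dividing the joint count by the count of the conditioning event. The weather/activity counts below are invented for this sketch and do not come from any dataset used in these notes.

```python
# Hypothetical counts of (x, C) pairs; the values are invented for illustration.
counts = {
    ("sunny", "play"): 30,
    ("sunny", "stay"): 10,
    ("rainy", "play"): 5,
    ("rainy", "stay"): 15,
}

def conditional(c, x):
    """Estimate p(C = c | x) as count(x, c) / count(x)."""
    joint = counts[(x, c)]
    # Sum over every class that co-occurs with this value of x
    marginal = sum(n for (xi, _), n in counts.items() if xi == x)
    return joint / marginal

p_play_given_sunny = conditional("play", "sunny")   # 30 / 40
p_play_given_rainy = conditional("play", "rainy")   # 5 / 20
print(p_play_given_sunny, p_play_given_rainy)
```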

## Classification

$C$ indicates the class label

From observed data $\mathbf{x}$, we want to predict the probability of some event $C$.

We take a probabilistic view because the real world is often non-deterministic.

### Iris example

$C$: type of flower

$\mathbf{x}$: flower features

<table style="width:100%" border="0">
<tr>
<td>
<img src ="https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg/220px-Kosaciec_szczecinkowaty_Iris_setosa.jpg">
</td>

<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/220px-Iris_versicolor_3.jpg">
</td>

<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/220px-Iris_virginica.jpg">
</td>
</tr>

<tr style="text-align:center">
<td>
Setosa
</td>
<td>
Versicolor
</td>
<td>
Virginica
</td>
</tr>
</table>


### Banking example

A bank decides whether to make a loan to a customer:
* $
\mathbf{x} = 
\begin{pmatrix}
x_1 \\
x_2
\end{pmatrix}
$
: customer income $x_1$ and assets $x_2$

* $C$: 1/0 if the customer is likely/unlikely to pay back the loan

Make a loan if $P(C = 1 \, | \, \mathbf{x}) > 0.5$, or above some other chosen threshold
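The decision rule can be sketched as a simple threshold on the posterior. The function name `make_loan` and the example probabilities are hypothetical; a more cautious bank might raise the threshold.

```python
def make_loan(p_c1, threshold=0.5):
    """Approve the loan when the posterior P(C = 1 | x) exceeds the threshold."""
    return p_c1 > threshold

# Hypothetical posterior values for two different risk appetites
print(make_loan(0.8))
print(make_loan(0.8, threshold=0.9))
```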

### Bayes' rule

How do we compute $p(C \, | \, \mathbf{x})$, which is unknown?

From the definition of conditional probability:
$$
\begin{align}
p\left(C \, | \, \mathbf{x}\right)
&= 
\frac{p\left(\mathbf{x} \, | \, C\right) p\left(C\right)}{p\left(\mathbf{x}\right)}
\\
&\propto
p\left(\mathbf{x} \, | \, C\right) p\left(C\right)
\end{align}
$$

* $p(C \, | \, \mathbf{x})$: posterior
<br>
the probability of $C$ after observing $\mathbf{x}$

* $p(C)$: prior
<br>
how likely $C$ is before observing $\mathbf{x}$

* $p(\mathbf{x} \, | \, C)$: likelihood
<br>
how likely $\mathbf{x}$ is if it belongs to $C$

* $p(\mathbf{x})$: marginal/evidence
<br>
constant for a given $\mathbf{x}$, so it does not change which $C$ is most probable

We can compute $p(C \, | \, \mathbf{x})$ (posterior) if we are given $p(C)$ (prior) and $p(\mathbf{x} \, | \, C)$ (likelihood), since $p(\mathbf{x}) = \sum_{C} p(\mathbf{x} \, | \, C) \, p(C)$
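For a finite set of classes, Bayes' rule reduces to an element-wise product followed by a normalization. The sketch below assumes a two-class problem with made-up prior and likelihood values; the helper name `posterior` is introduced here only for illustration.

```python
import numpy as np

def posterior(prior, likelihood):
    """Return p(C | x) for each class from p(C) and p(x | C)."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = likelihood * prior     # p(x | C) p(C), one entry per class
    evidence = joint.sum()         # p(x), summed over all classes
    return joint / evidence

# Hypothetical numbers: the prior favors class 0, the likelihood favors class 1
post = posterior(prior=[0.7, 0.3], likelihood=[0.2, 0.6])
print(post)
```

Here the likelihood outweighs the prior, so class 1 ends up with the larger posterior even though it was less probable before observing $\mathbf{x}$.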

### Rational decision making

Humans tend to focus too much on rare events

For example:

$\mathbf{x}$: being killed

$C$: cause of death
* $C_{car}$: car crash
* $C_{plane}$: airplane crash

A plane crash is much more deadly than a car crash:
$
p(\mathbf{x} \, | \, C_{plane}) \gg p(\mathbf{x} \, | \, C_{car})
$

say
$
\begin{align}
p(\mathbf{x} \, | \, C_{plane}) &= 1.0
\\
p(\mathbf{x} \, | \, C_{car}) &= 0.1
\end{align}
$

But a plane crash is much rarer than a car crash:
$
p(C_{plane}) \ll p(C_{car})
$

say
$
\begin{align}
p(C_{plane}) &= 0.001 \\
p(C_{car}) &= 0.1
\end{align}
$

Multiply the likelihood by the prior:
$
\begin{align}
p(\mathbf{x} \, | \, C_{plane}) p(C_{plane}) &= 0.001
\\
p(\mathbf{x} \, | \, C_{car}) p(C_{car}) &= 0.01
\end{align}
$

Thus plane travel is actually safer than car travel; given a death, a car crash is the more probable cause:
$
p(C_{plane} \, | \, \mathbf{x}) < p(C_{car} \, | \, \mathbf{x})
$
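The comparison can be checked numerically by normalizing the two products. This sketch reuses the illustrative numbers from the example and assumes, for simplicity, that car and plane crashes are the only two causes considered, so the normalization runs over just those two.

```python
# Illustrative values from the example above
likelihood = {"plane": 1.0, "car": 0.1}    # p(x | C)
prior = {"plane": 0.001, "car": 0.1}       # p(C)

# Unnormalized posterior: p(x | C) p(C)
joint = {c: likelihood[c] * prior[c] for c in prior}

# Assumption: only these two causes exist, so p(x) is their sum
evidence = sum(joint.values())
posterior = {c: joint[c] / evidence for c in joint}

print(posterior)
```

Under that assumption the car crash is ten times as probable a cause as the plane crash, matching the ratio of the unnormalized products.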

## Learning

From training data $\mathbf{X}$ we want to estimate model parameters $\Theta$.

$$
\begin{align}
p\left(\Theta | \mathbf{X}\right)
&= 
\frac{p\left(\mathbf{X} | \Theta\right) p\left(\Theta\right)}{p\left(\mathbf{X}\right)}
\\
&\propto
p\left(\mathbf{X} | \Theta\right) p\left(\Theta\right)
\end{align}
$$

* $p(\Theta | \mathbf{X})$: posterior
<br>
the probability of $\Theta$ after observing $\mathbf{X}$

* $p(\Theta)$: prior
<br>
how likely $\Theta$ is before observing $\mathbf{X}$

* $p(\mathbf{X} | \Theta)$: likelihood
<br>
how likely $\mathbf{X}$ is if the model parameters are $\Theta$

* $p(\mathbf{X})$: marginal/evidence
<br>
constant for a given $\mathbf{X}$, so it does not change which $\Theta$ is most probable
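A small sketch of parameter estimation under these definitions: a coin with unknown heads probability $\Theta$, evaluated on a grid of candidate values. The ten flips and the Beta(2, 2) prior are assumptions chosen for illustration. Maximizing the likelihood alone gives the maximum-likelihood estimate, while maximizing likelihood times prior gives the maximum a posteriori (MAP) estimate.

```python
import numpy as np

# Hypothetical training data: 10 coin flips, 1 = heads
X = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
heads = X.sum()
tails = len(X) - heads

# Candidate parameter values (avoid 0 and 1 so the logs stay finite)
theta = np.linspace(0.001, 0.999, 999)

# log p(X | theta) for independent Bernoulli flips
log_likelihood = heads * np.log(theta) + tails * np.log(1 - theta)

# Assumed prior: Beta(2, 2), a mild preference for a fair coin (up to a constant)
a, b = 2.0, 2.0
log_prior = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

# log posterior up to the constant log p(X)
log_posterior = log_likelihood + log_prior

theta_mle = theta[np.argmax(log_likelihood)]
theta_map = theta[np.argmax(log_posterior)]
print(theta_mle, theta_map)
```

The prior pulls the MAP estimate from the raw frequency 7/10 toward 1/2; with more data the likelihood term dominates and the two estimates converge.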

# Naive Bayes

http://scikit-learn.org/stable/modules/naive_bayes.html

In [1]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()

# use petal length and petal width (columns 2 and 3) as the two features
X = iris.data[:, [2, 3]]
y = iris.target

In [2]:
from sklearn.model_selection import train_test_split

# splitting data into 70% training and 30% test data: 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

In [3]:
from sklearn.preprocessing import StandardScaler

# fit the scaler on the training data only, then apply it to both sets
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

In [4]:
from sklearn.metrics import accuracy_score

In [5]:
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()

_ = gnb.fit(X_train_std, y_train)

In [6]:
y_pred = gnb.predict(X_test_std)

print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

Accuracy: 0.98
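
To connect the classifier back to Bayes' rule, `predict_proba` returns the estimated posterior $p(C \, | \, \mathbf{x})$ for every class and every sample. This is a minimal sketch, assuming a recent scikit-learn in which `train_test_split` lives in `sklearn.model_selection`; it repeats the steps above so that it runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]   # petal length and petal width
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize with statistics from the training set only
sc = StandardScaler().fit(X_train)
gnb = GaussianNB().fit(sc.transform(X_train), y_train)

# One row per test sample, one column per class in gnb.classes_
proba = gnb.predict_proba(sc.transform(X_test))
print(proba.shape)
print(np.round(proba[:3], 3))
```

Each row sums to 1, and `predict` returns the class with the largest posterior in that row.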
