### Math Overview
\begin{equation}
    P(y\mid x_1,x_2,\dots,x_n) = \frac{P(y)P(x_1, \dots, x_n\mid y)}{P(x_1, \dots, x_n)}
\end{equation}

Since we assume that the explanatory variables are conditionally independent given the class $y$,
\begin{equation}
    P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)
\end{equation}

for all $i$, the joint likelihood factorizes as $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^n P(x_i \mid y)$, and the relationship simplifies to:
\begin{equation}
    P(y \mid x_1,\dots,x_n) = \frac{P(y) \prod_{i=1}^n {P(x_i \mid y)}}{P(x_1, \dots, x_n)}
\end{equation}

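The naive Bayes classifier chooses the class $y$ with the highest posterior probability.  
Since the denominator $P(x_1, \dots, x_n)$ is constant for a given input, it is the same for every candidate class and can be dropped: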
\begin{equation}
    P(y \mid x_1,\dots,x_n) \propto P(y) \prod_{i=1}^n {P(x_i \mid y)} \\
    \Downarrow   \\
    \hat{y}=\arg\max_y{P(y)}\prod_{i=1}^n{P(x_i\mid y)}
\end{equation}
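
The decision rule above can be sketched in a few lines of plain Python. The class labels `A` and `B`, the priors, and the per-feature likelihoods below are hypothetical numbers chosen only to illustrate the arithmetic; they are not estimated from any dataset.

```python
from math import prod

# Hypothetical priors P(y) for two classes (illustrative values only).
priors = {"A": 0.4, "B": 0.6}

# Hypothetical likelihoods P(x_i | y) of the observed feature values,
# one entry per feature, for each class.
likelihoods = {
    "A": [0.8, 0.3],  # P(x_1 | A), P(x_2 | A)
    "B": [0.1, 0.5],  # P(x_1 | B), P(x_2 | B)
}

# Unnormalized posterior: P(y) * prod_i P(x_i | y).
scores = {label: priors[label] * prod(likelihoods[label]) for label in priors}

# The predicted class is the argmax of the unnormalized posterior.
y_hat = max(scores, key=scores.get)

# Dividing by the evidence P(x_1, ..., x_n) rescales every score by the
# same constant, so it yields proper probabilities without changing the argmax.
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}

print(y_hat, posteriors)
```

Here class `A` wins even though its prior is smaller, because its likelihood product is larger. Normalizing by the evidence changes the scale of the scores but not their order.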


In [7]:
import sklearn.metrics  # import the submodule explicitly so sklearn.metrics.* resolves
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

In [8]:
# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
X = iris.data
Y = iris.target

In [11]:
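from sklearn.model_selection import train_test_split
# Default split holds out 25% of the samples; no random_state is set, so results vary between runs
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)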

In [12]:
# Fit a Gaussian naive Bayes model and predict labels for the held-out samples
model = GaussianNB()
model.fit(X_train, Y_train)
y_pred = model.predict(X_test)
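
To connect the fitted model back to the math, here is a minimal pure-Python sketch of what a Gaussian naive Bayes classifier computes: each $P(x_i \mid y)$ is modeled as a normal density with a per-class mean and variance, and prediction takes the argmax of the log prior plus the summed log likelihoods. The helper names (`fit_gaussian_nb`, `log_gaussian`, `predict_one`) and the toy data are hypothetical and exist only for illustration. This is not scikit-learn's implementation; in particular it omits the small variance-smoothing term that `GaussianNB` adds, so it would fail on a feature with zero variance within a class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    params = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood (biased) variance of each feature within the class.
        variances = [
            sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def log_gaussian(x, mean, var):
    """Log density of a normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predict_one(params, row):
    """Return argmax_y of log P(y) + sum_i log P(x_i | y)."""
    def score(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(params, key=score)


# Hypothetical toy data: two well-separated classes with two features each.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.9], [3.2, 1.1], [2.8, 1.0]]
y_toy = [0, 0, 0, 1, 1, 1]

params = fit_gaussian_nb(X_toy, y_toy)
print(predict_one(params, [1.1, 2.0]), predict_one(params, [3.1, 1.0]))
```

Working in log space turns the product $\prod_i P(x_i \mid y)$ into a sum, which avoids numerical underflow when there are many features.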

In [13]:
sklearn.metrics.accuracy_score(Y_test, y_pred)

0.92105263157894735

In [14]:
sklearn.metrics.confusion_matrix(Y_test, y_pred)

array([[13,  0,  0],
       [ 0, 11,  1],
       [ 0,  2, 11]])
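
As a sanity check, the accuracy reported above can be recovered from the confusion matrix: correct predictions lie on the diagonal, and in scikit-learn's convention rows are true classes and columns are predicted classes. The short sketch below copies the matrix from the output above and uses only the standard library; the per-class recall is an extra illustration, not something the original notebook computed.

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
confusion = [
    [13, 0, 0],
    [0, 11, 1],
    [0, 2, 11],
]

n_classes = len(confusion)

# Correct predictions are the diagonal entries.
correct = sum(confusion[i][i] for i in range(n_classes))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Per-class recall: fraction of each true class that was predicted correctly.
recall = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]

print(correct, total, accuracy)
print(recall)
```

The 38 test samples correspond to the default 25% hold-out of the 150 iris samples, and 35 correct predictions out of 38 agrees with the accuracy score shown earlier.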

### More resources
1.) [scikit-learn Documentation on Naive Bayes](http://scikit-learn.org/stable/modules/naive_bayes.html)  
2.) [Naive Bayes with NLTK](http://slendermeans.org/ml4h-ch3.html)  
3.) [Wikipedia on Naive Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier)  
4.) [Andrew Ng's Class Notes](http://cs229.stanford.edu/notes/cs229-notes2.pdf)  
5.) [Andrew Ng's Video Lecture on Naive Bayes](https://www.youtube.com/watch?v=z5UQyCESW64)  
6.) [UC Berkeley Lecture by Pieter Abbeel](https://www.youtube.com/watch?v=DNvwfNEiKvw)