Naive Bayes

# 1 Background
Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship.  
$$P(y|x_1,\ldots,x_n)=\frac{P(y)P(x_1,\ldots,x_n | y)}{P(x_1,\ldots,x_n)}$$ 
Using the naive independence assumption that
$$P(x_i|y,x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)=P(x_i|y)$$
for all $i$, this relationship simplifies to
$$P(y|x_1,\ldots,x_n)=\frac{P(y)\prod_{i=1}^{n}P(x_i|y)}{P(x_1,\ldots,x_n)}$$
Since $P(x_1,\ldots,x_n)$ is constant given the input, we can use the following classification rule:
$$P(y|x_1,\ldots,x_n) \propto P(y)\prod_{i=1}^{n}P(x_i|y)$$
$\Rightarrow$
$$\hat{y}=\arg\max_{y} P(y)\prod_{i=1}^{n}P(x_i|y)$$
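The classification rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the priors and per-feature likelihoods below are made-up numbers for a hypothetical model with two classes and three binary features, and the product is evaluated as a sum of logarithms to avoid numerical underflow.

```python
import numpy as np

# Hypothetical toy model (values invented for illustration).
prior = np.array([0.6, 0.4])              # P(y) for y = 0, 1
# likelihood[y, i] = P(x_i = 1 | y)
likelihood = np.array([[0.8, 0.1, 0.5],
                       [0.3, 0.7, 0.5]])

def predict(x):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed in log space."""
    x = np.asarray(x)
    # P(x_i | y) is likelihood[y, i] when x_i == 1, otherwise 1 - likelihood[y, i].
    per_feature = np.where(x == 1, likelihood, 1.0 - likelihood)
    log_joint = np.log(prior) + np.log(per_feature).sum(axis=1)
    return int(np.argmax(log_joint))

print(predict([1, 0, 1]))  # class 0: 0.6*0.8*0.9*0.5 = 0.216 vs class 1: 0.018
```

Because the denominator $P(x_1,\ldots,x_n)$ is the same for every class, it never needs to be computed to find $\hat{y}$.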

# 2 Gaussian Naive Bayes
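The likelihood of each feature is assumed to be Gaussian:
$$P(x_i|y)=\frac{1}{\sqrt{2\pi\sigma_{y}^2}}\exp\left( -\frac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)$$
The parameters $\mu_y$ and $\sigma_y$ are estimated from the training data by maximum likelihood.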

In [2]:
from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb=GaussianNB()
y_pred=gnb.fit(iris.data, iris.target).predict(iris.data)

In [4]:
print('Number of mislabeled points out of a total of %d points: %d'
      % (iris.data.shape[0], (iris.target!=y_pred).sum()))

Number of mislabeled points out of a total of 150 points: 6
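To make the Gaussian likelihood concrete, the following sketch evaluates it by hand on the same iris data and compares the result with `GaussianNB`. It assumes a separate mean and variance for every (class, feature) pair, estimated by maximum likelihood; the variable names (`mu`, `var`, `manual_pred`) are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target
classes = np.unique(y)

# Maximum-likelihood estimates per class and per feature.
mu = np.array([X[y == c].mean(axis=0) for c in classes])
var = np.array([X[y == c].var(axis=0) for c in classes])
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# log P(y) + sum_i log N(x_i; mu, var), for every sample and every class.
diff = X[:, None, :] - mu[None, :, :]
log_lik = -0.5 * (np.log(2 * np.pi * var)[None, :, :]
                  + diff ** 2 / var[None, :, :]).sum(axis=2)
manual_pred = classes[np.argmax(log_prior + log_lik, axis=1)]

sk_pred = GaussianNB().fit(X, y).predict(X)
print("agreement with GaussianNB:", (manual_pred == sk_pred).mean())
```

The two predictions should agree on essentially all points. Small differences are possible because `GaussianNB` adds a small smoothing term to the variances (its `var_smoothing` parameter).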


# 3 Multinomial Naive Bayes
`MultinomialNB` implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification.

In [24]:
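import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

x_1=np.array([1,1,1,1,1,
              2,2,2,2,2,
              3,3,3,3,3])
x_2=np.array(['S','M','M','S','S',
              'S','M','M','L','L',
              'L','M','M','L','L'])
# Both features are categorical; stack them as strings so the array has a single dtype.
X_raw=np.column_stack((x_1.astype(str),x_2))
y=np.array([-1,-1,1,1,-1,
            -1,-1,1,1,1,
            1, 1, 1,1,-1])
# MultinomialNB needs non-negative numeric features, so fitting on the raw
# string array raises a TypeError. One-hot encode the categories first.
X=OneHotEncoder().fit_transform(X_raw)
clf=MultinomialNB()
clf.fit(X,y)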
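Since the multinomial variant is mostly used on word counts, a short text-classification sketch may be more representative. The corpus and labels below are invented for illustration; `CountVectorizer` turns each document into a vector of word counts, which is the kind of input `MultinomialNB` is designed for.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (not from the original notebook).
docs = ["cheap pills buy now",
        "buy cheap watches now",
        "meeting agenda for monday",
        "lunch meeting on monday"]
labels = ["spam", "spam", "ham", "ham"]

# Convert documents to word-count vectors.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

# alpha=1.0 is Laplace smoothing, so unseen words do not zero out a class.
clf_text = MultinomialNB(alpha=1.0).fit(X_counts, labels)

new_docs = ["buy cheap pills", "monday meeting"]
pred = clf_text.predict(vectorizer.transform(new_docs))
print(list(pred))
```

The same vectorizer must be used for training and prediction so that the columns refer to the same vocabulary.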

# 4 Bernoulli Naive Bayes
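`BernoulliNB` implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions. There may be multiple features, but each one is assumed to be a binary-valued variable.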
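A minimal sketch of this variant on binary data follows. The feature matrix and labels are made up for illustration; each column can be read as the presence (1) or absence (0) of some attribute.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary data: rows are samples, columns are presence/absence features.
X_bin = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [1, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 0]])
y_bin = np.array([1, 1, 1, 0, 0, 0])

# alpha=1.0 applies Laplace smoothing to the per-class feature probabilities.
bnb = BernoulliNB(alpha=1.0).fit(X_bin, y_bin)

pred_bin = bnb.predict(np.array([[1, 0, 0], [0, 0, 1]]))
print(pred_bin)
```

Unlike the multinomial variant, the Bernoulli model also uses the absence of a feature as evidence: a 0 in column $i$ contributes a factor $1-P(x_i=1|y)$ to the class score.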