## 1.9 Naive Bayes

**Naive Bayes** methods are a set of supervised learning algorithms that apply Bayes' theorem with the *naive* assumption of conditional independence between every pair of features given the value of the class variable.

**Bayes' Theorem**
Given a class variable $y$ and a dependent feature vector $x_{1}$ through $x_{n}$, Bayes' theorem states:

$P(y|x_{1},...,x_{n}) = \frac{P(y)P(x_{1},...,x_{n}|y)}{P(x_{1},...,x_{n})}$ 

Naive Bayes then applies the naive conditional independence assumption:

$P(x_{i}|y, x_{1},...,x_{i-1},x_{i+1},...,x_{n}) = P(x_{i} | y)$
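Substituting this assumption into Bayes' theorem factorizes the likelihood into one term per feature. The short derivation below is a sketch of that step; the symbol $\hat{y}$ for the predicted class is notation introduced here, not taken from the text above.

$P(y|x_{1},...,x_{n}) = \frac{P(y)\prod_{i=1}^{n}P(x_{i}|y)}{P(x_{1},...,x_{n})}$

Because the denominator $P(x_{1},...,x_{n})$ is constant for a given input, it can be dropped when comparing classes, which gives the classification rule:

$\hat{y} = \arg\max_{y} P(y)\prod_{i=1}^{n}P(x_{i}|y)$

The naive Bayes variants below differ mainly in the distribution they assume for $P(x_{i}|y)$.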

**Although naive Bayes is a decent classifier, it is known to be a poor probability estimator**, so the probability outputs from `predict_proba` should not be taken too seriously.

### 1.9.1 Gaussian Naive Bayes

`GaussianNB` implements the Gaussian NB algorithm for classification, where the likelihood of the features is assumed to be Gaussian:

$P(x_{i}|y) = \frac{1}{\sqrt{2\pi\sigma^{2}_{y}}}\exp\left(-\frac{(x_{i}-\mu_{y})^{2}}{2\sigma^{2}_{y}}\right)$

Example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris data and hold out half of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit on the training half, then predict the held-out half
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

print(f"Mislabeled points: {(y_test != y_pred).sum()} out of {X_test.shape[0]}")
```

```
Mislabeled points: 4 out of 75
```


### 1.9.2 Multinomial Naive Bayes

`MultinomialNB` implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data is typically represented as word count vectors).
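As a rough illustration of the text-classification use case, the sketch below turns a tiny, made-up corpus into word counts with `CountVectorizer` and fits `MultinomialNB` on them. The documents and the `spam`/`ham` labels are hypothetical and chosen only to keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (not real data)
docs = [
    "free money now",
    "win free prize",
    "meeting at noon",
    "project meeting schedule",
]
labels = ["spam", "spam", "ham", "ham"]

# Convert each document into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

# alpha=1.0 is Laplace smoothing, so unseen words do not zero out a class
clf = MultinomialNB(alpha=1.0).fit(X_counts, labels)

# New text must be transformed with the same fitted vocabulary
prediction = clf.predict(vectorizer.transform(["free prize money"]))[0]
print(prediction)
```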

### 1.9.3 Complement Naive Bayes

`ComplementNB` implements the complement naive Bayes (CNB) algorithm, an adaptation of the standard multinomial NB that is particularly suited to imbalanced data sets. Specifically, CNB uses statistics from the *complement* of each class (all samples that do not belong to it) to compute the model's weights.
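A minimal sketch of `ComplementNB` on imbalanced count data follows. The data is synthetic (Poisson counts generated here purely for illustration), with 95 samples in one class and 5 in the other.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.default_rng(0)

# Synthetic imbalanced count data: class 0 is heavy on feature 0,
# class 1 is heavy on feature 2
X_major = rng.poisson(lam=[5, 1, 1], size=(95, 3))
X_minor = rng.poisson(lam=[1, 1, 5], size=(5, 3))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 95 + [1] * 5)

# Each class's weights are estimated from the samples NOT in that class
cnb = ComplementNB().fit(X, y)

# One query resembling each class
pred = cnb.predict(np.array([[6, 1, 0], [0, 1, 6]]))
print(pred)
```

Because the minority class's weights are estimated from the 95 majority samples rather than from its own 5, its parameter estimates rest on far more data than they would under standard multinomial NB.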

### 1.9.4 Bernoulli Naive Bayes

`BernoulliNB` implements the NB training and classification algorithms for data that is distributed according to *multivariate Bernoulli distributions*; that is, there may be multiple features but each is assumed to be a binary-valued (boolean) variable.
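The sketch below fits `BernoulliNB` on a small, made-up binary matrix. It also shows the `binarize` parameter, which thresholds non-binary input before fitting (values above the threshold become 1); the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features (e.g. word present / absent)
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])
y = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X, y)
queries = np.array([[1, 0, 0], [0, 0, 1]])
pred = bnb.predict(queries)

# Count data is thresholded at `binarize` (default 0.0), so fitting on
# X * 3 yields the same binary features and the same predictions
bnb_counts = BernoulliNB(binarize=0.0).fit(X * 3, y)
pred_counts = bnb_counts.predict(queries * 3)

print(pred, pred_counts)
```

Unlike the multinomial model, the Bernoulli model explicitly penalizes the absence of a feature that is indicative of a class, rather than ignoring features that do not occur.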

### 1.9.5 Categorical Naive Bayes

`CategoricalNB` implements the categorical NB algorithm for categorically distributed data. It assumes that each feature, indexed by $i$, has its own categorical distribution.
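A small sketch of `CategoricalNB` follows. It assumes the categories have already been encoded as non-negative integers (for example with `OrdinalEncoder`); the weather and wind features and their labels are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical integer-encoded features:
# column 0 = weather (0 sunny, 1 rainy), column 1 = wind (0 weak, 1 strong)
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
    [0, 0],
    [1, 1],
])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = play outside, 0 = stay in

clf = CategoricalNB().fit(X, y)

# One array of per-class category counts is kept for each feature
n_feature_tables = len(clf.category_count_)

pred = clf.predict(np.array([[0, 0], [1, 1]]))
print(n_feature_tables, pred)
```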

### 1.9.6 Out-of-core naive Bayes model fitting

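NB models can be used for large-scale classification problems for which the full training set might not fit in memory. For this case, the estimators expose a `partial_fit` method that can be called incrementally on successive chunks of data.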
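A minimal sketch of incremental fitting with `partial_fit` is shown below. The iris data is split into in-memory chunks only as a stand-in for batches streamed from disk; in a real out-of-core setting each chunk would be loaded separately. The full list of classes has to be supplied on the first call, since an early chunk may not contain every label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Shuffle so each chunk mixes classes (iris is sorted by label)
rng = np.random.default_rng(0)
order = rng.permutation(len(X))
X, y = X[order], y[order]

gnb = GaussianNB()
all_classes = np.unique(y)  # required on the first partial_fit call

# Stand-in for reading batches from disk: 5 chunks of 30 samples each
for X_chunk, y_chunk in zip(np.array_split(X, 5), np.array_split(y, 5)):
    gnb.partial_fit(X_chunk, y_chunk, classes=all_classes)

accuracy = gnb.score(X, y)
print(f"Accuracy after incremental fitting: {accuracy:.2f}")
```

Larger chunks amortize the per-call overhead of `partial_fit`, so the chunk size is usually chosen as large as memory comfortably allows.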