# Naive Bayes Classification

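- Similar to linear classification models.
- Trains even faster than linear models, but its generalization performance is often somewhat worse than that of linear classifiers.
- Naive Bayes learns its parameters by looking at each feature individually.
- For each class, it collects simple per-feature statistics and then combines them to make a prediction.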
- Scikit-learn:
    - GaussianNB: Continuous Data
    - BernoulliNB: Binary Data
    - MultinomialNB: Count Data
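As a rough illustration of which variant fits which kind of data, the sketch below fits each of the three scikit-learn classes on a tiny hand-made dataset of the matching type. The data values are hypothetical and chosen only so that the classes are easy to separate.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

y = np.array([0, 0, 0, 1, 1, 1])

# Continuous features -> GaussianNB (hypothetical measurements)
X_cont = np.array([[1.0, 2.1], [0.8, 1.9], [1.2, 2.0],
                   [3.1, 0.2], [2.9, 0.1], [3.0, 0.3]])
# Binary features -> BernoulliNB (derived by thresholding X_cont)
X_bin = (X_cont > 1.5).astype(int)
# Count features, e.g. word counts per document -> MultinomialNB
X_count = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                    [0, 3, 2], [0, 4, 1], [1, 2, 3]])

models = {
    "GaussianNB": (GaussianNB(), X_cont),
    "BernoulliNB": (BernoulliNB(), X_bin),
    "MultinomialNB": (MultinomialNB(), X_count),
}

scores = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # accuracy on the training data
    print(name, "training accuracy:", scores[name])

# GaussianNB stores the per-class mean of every feature in theta_
print(models["GaussianNB"][0].theta_)
```

Training accuracy on such a tiny, well-separated set says nothing about generalization; the point is only that each class expects a different kind of input.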

## BernoulliNB
- For each class, BernoulliNB counts how often every feature is non-zero.
- Example: 
    - For each class y (0 and 1), count how many samples have a non-zero value in each feature of X.

In [25]:
import numpy as np
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])
for ls in np.unique(y):
    print(ls)

0
1


In [26]:
# count the non-zero entries per class.
counts = {}
for label in np.unique(y):
    counts[label] = X[y == label].sum(axis=0)
print("Feature counts:\n{}".format(counts))

Feature counts:
{0: array([0, 1, 0, 2]), 1: array([2, 0, 2, 1])}
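The same counts can be read off a fitted `BernoulliNB` model. The sketch below reuses the toy data from the cells above and checks that `feature_count_` matches the manual counts. It also shows, assuming the default smoothing `alpha=1.0`, how those counts become the smoothed per-class probability estimates that the model stores in log form.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Same toy data as in the cells above
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# Default binarize=0.0: every value > 0 is treated as 1
clf = BernoulliNB(alpha=1.0).fit(X, y)

# feature_count_: one row per class, in the order of clf.classes_
manual = np.array([X[y == label].sum(axis=0) for label in clf.classes_])
print(clf.feature_count_)
print(np.array_equal(clf.feature_count_, manual))  # True

# Smoothed estimate of P(feature = 1 | class) with alpha = 1:
# (count + alpha) / (samples in class + 2 * alpha), stored as a log
expected = np.log((clf.feature_count_ + 1.0) /
                  (clf.class_count_[:, None] + 2.0))
print(np.allclose(expected, clf.feature_log_prob_))  # True
```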


## MultinomialNB and GaussianNB
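- MultinomialNB: takes into account the average value of each feature for each class.
- GaussianNB: stores the average value as well as the standard deviation of each feature for each class.
- To make a prediction, a data point is compared to the statistics for each class, and the best-matching class is predicted. The prediction formula has the same form as that of linear models.
- However, coef_ is not a slope as in linear models. It holds the logarithm of the smoothed, normalized feature counts of each class, i.e. an estimate of log P(feature | class) (newer scikit-learn versions expose these values as feature_log_prob_).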
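To make the "same form as a linear model" point concrete, here is a small sketch with `MultinomialNB` on a hypothetical word-count matrix. It rebuilds `feature_log_prob_` from the raw per-class counts (add `alpha`, normalize each class row, take the log) and then reproduces `predict` as `X @ w.T + b`, where the per-class weights `w` are those log probabilities and the intercepts `b` are the log class priors.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 vocabulary words
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 3],
              [1, 0, 4]])
y = np.array([0, 0, 1, 1])

alpha = 1.0
clf = MultinomialNB(alpha=alpha).fit(X, y)

# Per-class feature counts, smoothed by alpha, normalized per class, then logged
counts = clf.feature_count_ + alpha
log_prob = np.log(counts / counts.sum(axis=1, keepdims=True))
print(np.allclose(log_prob, clf.feature_log_prob_))  # True

# Prediction is linear in X: score_c(x) = x . w_c + b_c,
# with w_c = feature_log_prob_[c] and b_c = class_log_prior_[c]
scores = X @ clf.feature_log_prob_.T + clf.class_log_prior_
manual_pred = clf.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, clf.predict(X)))  # True
```

The weights are log probabilities rather than learned slopes, which is why they should not be interpreted like the coefficients of a linear model.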


## Strengths, Weaknesses, and Parameters
- MultinomialNB, BernoulliNB: 
    - Use for sparse count data such as text.
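    - Single parameter alpha, which controls model complexity. 
    - The algorithm adds alpha virtual data points with positive values for all features, which smooths the statistics. A larger alpha means more smoothing and a less complex model.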
- GaussianNB: 
    - Mostly used on very high-dimensional data.
- Strengths and weaknesses:
    - Work very well with high-dimensional sparse data.
    - Relatively robust to the choice of parameters.
    - Often used on very large datasets, where training even a linear model might take too long.
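The smoothing effect of alpha described above can be seen directly in the per-class probability estimates. In this sketch (hypothetical count data, alpha values picked arbitrarily) class 0 never uses the third word; with a tiny alpha its estimated probability is close to zero, while a large alpha pulls all three estimates toward the uniform value 1/3.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical counts: class 0 never uses word 3, class 1 rarely uses word 1
X = np.array([[5, 0, 0],
              [4, 1, 0],
              [0, 0, 6],
              [0, 1, 5]])
y = np.array([0, 0, 1, 1])

probs_by_alpha = {}
for alpha in [0.01, 1.0, 100.0]:
    clf = MultinomialNB(alpha=alpha).fit(X, y)
    # Estimated P(word | class 0); more smoothing -> closer to uniform
    probs = np.exp(clf.feature_log_prob_[0])
    probs_by_alpha[alpha] = probs
    print(f"alpha={alpha:>6}: {np.round(probs, 3)}")
```

Flatter estimates mean each individual feature pulls the prediction less strongly, which is the sense in which a larger alpha gives a less complex model.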