# Naïve Bayes Classifier

[Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem) _describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to more accurately assess the probability that they have cancer, compared to the assessment of the probability of cancer made without knowledge of the person's age._
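
The cancer and age example can be made concrete with a few lines of arithmetic. This is a minimal sketch: the three probabilities are made-up numbers chosen only to illustrate the formula, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical, made-up numbers for illustration only.
p_cancer = 0.01            # P(cancer): prior, without knowing the age
p_old_given_cancer = 0.60  # P(age > 65 | cancer)
p_old = 0.20               # P(age > 65) in the whole population

# P(cancer | age > 65): the prior updated with the age information.
p_cancer_given_old = posterior(p_cancer, p_old_given_cancer, p_old)
print(round(p_cancer_given_old, 4))
```

With these assumed numbers, knowing that the person is over 65 raises the estimate from 1% to 3%.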

[Naïve Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) represents a family of _simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features._

>When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a [Gaussian](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_naive_Bayes) distribution.
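
To show what the Gaussian assumption means in practice, here is a rough from-scratch sketch using only `numpy`. It is not scikit-learn's implementation: the function names, the synthetic data and the small variance offset are assumptions made for this illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Small constant keeps variances strictly positive (an assumption of this sketch).
    variances = np.array([X[y == k].var(axis=0) for k in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log posterior for each row of X."""
    # Broadcast to shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # Naive independence: per-feature log likelihoods simply add up.
    log_post = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]

# Synthetic two-class data, made up for illustration.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

params = fit_gaussian_nb(X_demo, y_demo)
y_demo_pred = predict_gaussian_nb(X_demo, *params)
print("Training accuracy:", np.mean(y_demo_pred == y_demo))
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied together.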

### import
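Import `numpy`, `matplotlib.pyplot`, `GaussianNB`, `model_selection` and `visualize_classifier`. The `sklearn.cross_validation` module was removed in scikit-learn 0.20; `sklearn.model_selection` provides the same `train_test_split` and `cross_val_score` functions.

In [49]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB as gnb
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import model_selection as cv
# utilities is assumed to be a local helper module that accompanies this notebook
from utilities import visualize_classifier as vc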

### input data

In [50]:
inp = 'data_multivar_nb.txt'

### load data

In [51]:
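# Each row is one sample: comma-separated feature values followed by the class label
d = np.loadtxt(inp, delimiter = ',')
# All columns but the last are features; the last column is the label
X, y = d[:, :-1], d[:, -1]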

### Naïve Bayes classifier
Create a `sklearn.naive_bayes.GaussianNB()` classifier.

In [52]:
c = gnb()

### train classifier
Fit the classifier to the full data set with `sklearn.naive_bayes.GaussianNB().fit()`:

In [53]:
c.fit(X, y)

GaussianNB(priors=None)

### predict training data values

In [None]:
y_pred = c.predict(X)

### compute accuracy

In [None]:
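# Percentage of training samples predicted correctly.
# This is measured on the data the model was trained on, so it is an optimistic estimate.
acc = 100.0 * (y == y_pred).sum() / X.shape[0]
print("Accuracy of Naive Bayes classifier =", round(acc, 2), "%")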

### visualize classifier performance

In [None]:
vc(c, X, y)

## Cross Validation

### split data: training | testing

In [None]:
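# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = cv.train_test_split(X, y, test_size = 0.2, random_state = 3)
# Train a new classifier on the training split only
cn = gnb()
cn.fit(X_train, y_train)
# Predict labels for the held-out test samples
y_test_pred = cn.predict(X_test)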

### compute classifier accuracy

In [None]:
acc = 100.0 * (y_test == y_test_pred).sum() / X_test.shape[0]
print("Accuracy of the new classifier =", round(acc, 2), "%")

### visualize classifier performance

In [None]:
vc(cn, X_test, y_test)

## Scoring Functions
Metrics estimated with cross-validation (precision, recall and F1 use the class-weighted variants):
- accuracy
- precision
- recall
- F1
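
For reference, the last three metrics can be computed by hand from true positive, false positive and false negative counts. The sketch below handles a single positive class on made-up labels; the function name and the data are hypothetical, and it does not reproduce the `_weighted` averaging, which computes these values per class and averages them weighted by class support.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 2 true positives, 1 false positive, 2 false negatives.
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Precision is the share of predicted positives that are correct, recall is the share of actual positives that were found, and F1 is their harmonic mean.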

In [None]:
num_folds = 3
# cross_val_score fits a fresh clone of the classifier c on each fold
# accuracy values
av = cv.cross_val_score(c, X, y, scoring = 'accuracy', cv = num_folds)
print("Accuracy: " + str(round(100 * av.mean(), 2)) + "%")
# precision values
pv = cv.cross_val_score(c, X, y, scoring = 'precision_weighted', cv = num_folds)
print("Precision: " + str(round(100 * pv.mean(), 2)) + "%")
# recall values
rv = cv.cross_val_score(c, X, y, scoring = 'recall_weighted', cv = num_folds)
print("Recall: " + str(round(100 * rv.mean(), 2)) + "%")
# F1 values
fv = cv.cross_val_score(c, X, y, scoring = 'f1_weighted', cv = num_folds)
print("F1: " + str(round(100 * fv.mean(), 2)) + "%")
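
As a possible alternative, `sklearn.model_selection.cross_validate` accepts a list of scorers, so all four metrics come from a single cross-validation run instead of four. The sketch below is self-contained and uses synthetic data generated for illustration, not `data_multivar_nb.txt`; the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data, made up for illustration.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(4.0, 1.0, (60, 2))])
y_demo = np.array([0] * 60 + [1] * 60)

metrics = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
# One run evaluates every scorer on the same 3 folds.
results = cross_validate(GaussianNB(), X_demo, y_demo, scoring=metrics, cv=3)

for name in metrics:
    # Per-fold scores are stored under the key "test_<scorer name>".
    mean_score = results["test_" + name].mean()
    print(f"{name}: {100 * mean_score:.2f}%")
```

Because every metric is computed on the same folds, the reported values are directly comparable with each other.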