## The Naive Bayes Classifier

This notebook compares three Naive Bayes variants from scikit-learn (Gaussian, Multinomial and Bernoulli) on a simple binary classification problem.

The dataset follows a very simple rule:
* Given an N×3 binary input matrix `X` and an N×1 output vector `y`, `y[m] = 0` if `sum(X[m, :]) < 2` and `y[m] = 1` otherwise; that is, a row is labeled 1 when at least two of its three features are 1.
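To make the rule concrete, the short sketch below (not part of the original notebook) enumerates all 2³ = 8 possible input rows with `itertools.product` and applies the same `np.where` labeling used in the cells that follow. Exactly four patterns fall into each class, so with uniformly random rows the two classes should be roughly balanced.

```python
import itertools

import numpy as np

# all 2**3 = 8 possible binary input rows, one pattern per row
patterns = np.array(list(itertools.product([0, 1], repeat=3)))

# same labeling rule as the notebook: 0 if fewer than two features are 1, else 1
labels = np.where(np.sum(patterns, 1) < 2, 0, 1)

for row, label in zip(patterns, labels):
    print(row, '->', label)
```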

See [this link](http://machinelearningmastery.com/naive-bayes-tutorial-for-machine-learning/) for a simple introduction to Naive Bayes for machine learning.
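For reference, all three variants below share the standard Naive Bayes decision rule: predict the class that maximizes the prior times the product of per-feature likelihoods, which assumes the three features are conditionally independent given the class. The variants differ only in how they model $P(x_j \mid y = c)$: a Gaussian density, a multinomial over feature counts (up to a factor that does not depend on the class), or a Bernoulli distribution.

$$\hat{y} = \arg\max_{c \in \{0, 1\}} \; P(y = c) \prod_{j=1}^{3} P(x_j \mid y = c)$$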

In [1]:
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

### Gaussian Naive Bayes

In [2]:
N=1000 #number of training (and test) examples
X_train = np.random.randint(0,2,(N,3))
y_train = np.where(np.sum(X_train,1)<2, 0, 1)
X_test = np.random.randint(0,2,(N,3))
y_test = np.where(np.sum(X_test,1)<2, 0, 1)

# use Gaussian Naive Bayes
classifier = GaussianNB()
classifier.fit(X_train, y_train)
yp = classifier.predict(X_test)

# test-set accuracy (in %) and the class priors learned from the training labels
print('Test Set Score:', classifier.score(X_test, y_test)*100, '%')
print('Training Set Class Priors:', classifier.class_prior_)

Test Set Score: 100.0 %
Training Set Class Priors: [ 0.457  0.543]
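The priors printed by this cell come from the training labels, not the test labels: with the default `priors=None`, `GaussianNB` stores the fraction of training rows in each class in `class_prior_`. That is why they differ from the 487/513 class split of the test set that shows up in the confusion counts of the next cell. A quick check, sketched with a seeded `np.random.default_rng` generator (an assumption; the notebook uses the unseeded `np.random.randint`) and separate variable names so the notebook's own state is not overwritten:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# assumption: a seeded generator replaces the notebook's unseeded np.random.randint
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, (1000, 3))
y_demo = np.where(np.sum(X_demo, 1) < 2, 0, 1)

clf_demo = GaussianNB().fit(X_demo, y_demo)

# with the default priors=None, class_prior_ is the relative frequency of each training label
empirical = np.bincount(y_demo) / len(y_demo)
print('class_prior_:', clf_demo.class_prior_)
print('training label frequencies:', empirical)
```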


In [3]:
# compute the four confusion-matrix entries directly from the label arrays

fp = np.sum((y_test==0) & (yp==1))
fn = np.sum((y_test==1) & (yp==0))
tp = np.sum((y_test==1) & (yp==1))
tn = np.sum((y_test==0) & (yp==0))

print('False Positives: ', fp)
print('False Negatives: ', fn)
print('True Positives: ', tp)
print('True Negatives: ', tn)

False Positives:  0
False Negatives:  0
True Positives:  513
True Negatives:  487
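The same four counts are available from `sklearn.metrics.confusion_matrix`, which for labels `[0, 1]` returns `[[tn, fp], [fn, tp]]` (rows are true classes, columns are predictions). The sketch below uses a small hand-made pair of label arrays, purely for illustration, and checks the result against the boolean-mask approach used in the cell above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# small hand-made example: true labels and predictions for 6 samples
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0])

# ravel() flattens [[tn, fp], [fn, tp]] into tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# the same counts computed with boolean masks, as in the notebook
assert fp == np.sum((y_true == 0) & (y_pred == 1))
assert fn == np.sum((y_true == 1) & (y_pred == 0))
print(tn, fp, fn, tp)
```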


### Multinomial Naive Bayes

In [4]:
N=1000 #number of training (and test) examples
X_train = np.random.randint(0,2,(N,3))
y_train = np.where(np.sum(X_train,1)<2, 0, 1)
X_test = np.random.randint(0,2,(N,3))
y_test = np.where(np.sum(X_test,1)<2, 0, 1)

# use Multinomial Naive Bayes
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
yp = classifier.predict(X_test)

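# test-set accuracy (in %) and the class priors learned from the training labels;
# `class_prior` is only the constructor argument (None by default), while the
# fitted priors are stored as log-probabilities in `class_log_prior_`
print('Test Set Score:', classifier.score(X_test, y_test)*100, '%')
print('Training Set Class Priors:', np.exp(classifier.class_log_prior_))

Test Set Score: 63.5 %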


In [5]:
# compute the four confusion-matrix entries directly from the label arrays

fp = np.sum((y_test==0) & (yp==1))
fn = np.sum((y_test==1) & (yp==0))
tp = np.sum((y_test==1) & (yp==1))
tn = np.sum((y_test==0) & (yp==0))

print('False Positives: ', fp)
print('False Negatives: ', fn)
print('True Positives: ', tp)
print('True Negatives: ', tn)

False Positives:  126
False Negatives:  239
True Positives:  244
True Negatives:  391
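One way to see why the Multinomial variant does so much worse is to compare the fitted per-feature probabilities. The sketch below makes a simplifying assumption: an idealized training set containing each of the 8 input patterns exactly once (the random training sets above approximate this). `MultinomialNB` models each feature's share of a class's total count, which comes out to 1/3 for every feature in both classes, so its likelihood term cannot separate the classes and its predictions are driven mostly by the class priors and sampling noise. `BernoulliNB` models the probability that each feature equals 1, which is 1/4 for class 0 and 3/4 for class 1 before smoothing (1/3 and 2/3 with the default `alpha=1.0` on this 8-row set), so it tracks the labeling rule directly.

```python
import itertools

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# assumption: an idealized training set with each of the 8 binary patterns exactly once
X = np.array(list(itertools.product([0, 1], repeat=3)))
y = np.where(np.sum(X, 1) < 2, 0, 1)

multi = MultinomialNB().fit(X, y)
bern = BernoulliNB().fit(X, y)

# rows are classes [0, 1], columns are the three features
# MultinomialNB: smoothed share of each feature in the class's total count
print(np.exp(multi.feature_log_prob_))  # identical rows: every entry is 1/3
# BernoulliNB: smoothed probability that each feature equals 1
print(np.exp(bern.feature_log_prob_))   # 1/3 for class 0, 2/3 for class 1
```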


### Bernoulli Naive Bayes

In [6]:
N=1000 #number of training (and test) examples
X_train = np.random.randint(0,2,(N,3))
y_train = np.where(np.sum(X_train,1)<2, 0, 1)
X_test = np.random.randint(0,2,(N,3))
y_test = np.where(np.sum(X_test,1)<2, 0, 1)

# use Bernoulli Naive Bayes
classifier = BernoulliNB(fit_prior=True)
classifier.fit(X_train, y_train)
yp = classifier.predict(X_test)

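# test-set accuracy (in %) and the class priors learned from the training labels;
# `class_prior` is only the constructor argument (None by default), while the
# fitted priors are stored as log-probabilities in `class_log_prior_`
print('Test Set Score:', classifier.score(X_test, y_test)*100, '%')
print('Training Set Class Priors:', np.exp(classifier.class_log_prior_))

Test Set Score: 100.0 %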
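To show roughly what `BernoulliNB` computes, here is a minimal from-scratch sketch: Laplace-smoothed estimates of P(x_j = 1 | c) (assuming scikit-learn's default `alpha=1.0`), log class priors from the training label frequencies, and the log-space decision rule that adds log P(x_j = 1 | c) for features equal to 1 and log P(x_j = 0 | c) for features equal to 0. The helpers `fit_bernoulli_nb` and `predict_bernoulli_nb` are hypothetical names, and the data uses a seeded generator and separate variable names rather than the notebook's arrays.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Hypothetical helper: log class priors and smoothed P(x_j = 1 | c), one row per class."""
    classes = np.unique(y)
    counts = np.array([np.sum(y == c) for c in classes])
    log_prior = np.log(counts / counts.sum())
    # Laplace smoothing: (number of rows with x_j = 1 + alpha) / (class size + 2 * alpha)
    p = np.array([(X[y == c].sum(axis=0) + alpha) / (np.sum(y == c) + 2 * alpha)
                  for c in classes])
    return classes, log_prior, np.log(p), np.log(1 - p)


def predict_bernoulli_nb(model, X):
    """Hypothetical helper: argmax over classes of log P(c) + sum_j log P(x_j | c)."""
    classes, log_prior, log_p, log_not_p = model
    # X @ log_p.T covers features equal to 1; (1 - X) @ log_not_p.T covers features equal to 0
    scores = X @ log_p.T + (1 - X) @ log_not_p.T + log_prior
    return classes[np.argmax(scores, axis=1)]


# assumption: seeded synthetic data generated with the notebook's labeling rule
rng = np.random.default_rng(1)
X_tr = rng.integers(0, 2, (1000, 3))
y_tr = np.where(np.sum(X_tr, 1) < 2, 0, 1)
X_te = rng.integers(0, 2, (1000, 3))

ours = predict_bernoulli_nb(fit_bernoulli_nb(X_tr, y_tr), X_te)
theirs = BernoulliNB().fit(X_tr, y_tr).predict(X_te)
print('agreement with BernoulliNB:', np.mean(ours == theirs))
```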


In [7]:
# compute the four confusion-matrix entries directly from the label arrays

fp = np.sum((y_test==0) & (yp==1))
fn = np.sum((y_test==1) & (yp==0))
tp = np.sum((y_test==1) & (yp==1))
tn = np.sum((y_test==0) & (yp==0))

print('False Positives: ', fp)
print('False Negatives: ', fn)
print('True Positives: ', tp)
print('True Negatives: ', tn)

False Positives:  0
False Negatives:  0
True Positives:  482
True Negatives:  518
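Each of the three sections above draws its own unseeded random data, so their scores come from different training and test sets. A like-for-like comparison could be sketched as follows, assuming a seeded `np.random.default_rng` generator (the seed value is arbitrary) and a single split shared by all three classifiers:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only
N = 1000  # number of training (and test) examples
X_train = rng.integers(0, 2, (N, 3))
y_train = np.where(np.sum(X_train, 1) < 2, 0, 1)
X_test = rng.integers(0, 2, (N, 3))
y_test = np.where(np.sum(X_test, 1) < 2, 0, 1)

# fit each variant on the same training set and score it on the same test set
scores = {}
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    name = type(clf).__name__
    scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)
    print(name, 'test accuracy:', scores[name] * 100, '%')
```

On this rule the Gaussian and Bernoulli variants should again classify every test row correctly, while the exact Multinomial score depends on the random draw.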
