### Naive Bayes in Sklearn

We will again use the iris data.

In [2]:
# This tells matplotlib not to try opening a new window for each plot.
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB

In [3]:
# Load the data, which is included in sklearn.
iris = load_iris()
print('Iris target names: %s' % iris.target_names)
print('Iris feature names: %s' % iris.feature_names)
X, Y = iris.data, iris.target

# Shuffle the data, but make sure that the features and accompanying labels stay in sync.
np.random.seed(0)
shuffle = np.random.permutation(np.arange(X.shape[0]))
X, Y = X[shuffle], Y[shuffle]

# Split into train and test.
train_data, train_labels = X[:100], Y[:100]
test_data, test_labels = X[100:], Y[100:]

Iris target names: ['setosa' 'versicolor' 'virginica']
Iris feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


Sklearn provides several types of Naive Bayes. We look at three of them here: Gaussian, Bernoulli, and multinomial.

http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html

http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html

http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html

What is the difference between them? Each one assumes a different distributional form for the components of P(X|Y), that is, for the distribution of each individual feature given the class.

Try using each of these on the iris data. You will have to prepare the data for multinomial and Bernoulli.
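
To make the Gaussian case concrete, the sketch below computes the per-feature Gaussian log-likelihoods by hand and compares the resulting predictions with `GaussianNB`. It is an illustration, not sklearn's implementation: the helper name `gaussian_nb_predict` is hypothetical, and sklearn additionally applies a tiny variance-smoothing term that is omitted here, so the scores may differ by negligible amounts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Same shuffle and split as in the notebook cell above.
iris = load_iris()
X, Y = iris.data, iris.target
np.random.seed(0)
shuffle = np.random.permutation(np.arange(X.shape[0]))
X, Y = X[shuffle], Y[shuffle]
train_data, train_labels = X[:100], Y[:100]
test_data, test_labels = X[100:], Y[100:]

def gaussian_nb_predict(train_x, train_y, test_x):
    """Hypothetical helper: Naive Bayes with one Gaussian per (class, feature)."""
    classes = np.unique(train_y)
    scores = []
    for c in classes:
        rows = train_x[train_y == c]
        mean, var = rows.mean(axis=0), rows.var(axis=0)
        log_prior = np.log(rows.shape[0] / float(train_x.shape[0]))
        # Sum of independent per-feature Gaussian log-densities: log P(X | Y=c).
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (test_x - mean) ** 2 / var, axis=1)
        scores.append(log_prior + log_lik)
    # Pick the class with the highest log-posterior (up to a constant).
    return classes[np.argmax(np.array(scores), axis=0)]

manual_pred = gaussian_nb_predict(train_data, train_labels, test_data)
sklearn_pred = GaussianNB().fit(train_data, train_labels).predict(test_data)
agreement = np.mean(manual_pred == sklearn_pred)
print('manual accuracy: %3.2f' % np.mean(manual_pred == test_labels))
print('agreement with GaussianNB: %3.2f' % agreement)
```

The Bernoulli and multinomial variants follow the same recipe, with the Gaussian density swapped for a Bernoulli or multinomial likelihood on each feature.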

In [4]:
gau = GaussianNB()
gau.fit(train_data, train_labels)

bern = BernoulliNB(binarize=0.9)
bern.fit(train_data, train_labels)

mult = MultinomialNB()
# MultinomialNB expects count-like features; floor turns each measurement into a non-negative integer.
mult.fit(np.floor(train_data), train_labels)

print('gaussian accuracy: %3.2f' % gau.score(test_data, test_labels))
print('bernoulli accuracy: %3.2f' % bern.score(test_data, test_labels))
print('multinomial accuracy: %3.2f' % mult.score(np.floor(test_data), test_labels))

gaussian accuracy: 0.96
bernoulli accuracy: 0.66
multinomial accuracy: 0.82




What choices did you make when manipulating the features above? Try tuning these choices. Can you improve the accuracy?

In [5]:
# Let's play around with the binarize threshold in BernoulliNB.
# The loop variable is named threshold to avoid shadowing the built-in bin().

for threshold in [float(x) for x in range(1, 10)]:
    bern = BernoulliNB(binarize=threshold)
    bern.fit(train_data, train_labels)
    print('binarize: %s bernoulli accuracy: %3.2f' % (threshold, bern.score(test_data, test_labels)))

binarize: 1.0 bernoulli accuracy: 0.66
binarize: 2.0 bernoulli accuracy: 0.88
binarize: 3.0 bernoulli accuracy: 0.80
binarize: 4.0 bernoulli accuracy: 0.66
binarize: 5.0 bernoulli accuracy: 0.74
binarize: 6.0 bernoulli accuracy: 0.62
binarize: 7.0 bernoulli accuracy: 0.46
binarize: 8.0 bernoulli accuracy: 0.28
binarize: 9.0 bernoulli accuracy: 0.28
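
The sweep above applies one threshold to all four features, even though they live on different scales. As a hedged sketch of another option, the code below binarizes each feature at its own threshold before fitting. Using the training-set median as the threshold is an assumption made for illustration, not a recommendation from the original notebook, and no particular accuracy is claimed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import BernoulliNB

# Same shuffle and split as in the notebook cell above.
iris = load_iris()
X, Y = iris.data, iris.target
np.random.seed(0)
shuffle = np.random.permutation(np.arange(X.shape[0]))
X, Y = X[shuffle], Y[shuffle]
train_data, train_labels = X[:100], Y[:100]
test_data, test_labels = X[100:], Y[100:]

# Assumption: use each feature's training-set median as its own threshold.
thresholds = np.median(train_data, axis=0)
train_bin = (train_data > thresholds).astype(int)
test_bin = (test_data > thresholds).astype(int)

# binarize=None tells BernoulliNB that the input is already binary.
bern = BernoulliNB(binarize=None)
bern.fit(train_bin, train_labels)
per_feature_acc = bern.score(test_bin, test_labels)
print('per-feature thresholds: %s' % thresholds)
print('bernoulli accuracy: %3.2f' % per_feature_acc)
```

The thresholds are computed from the training data only, so the test set does not leak into the feature preparation.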


Investigate what effect alpha has on the Bernoulli and multinomial classifiers. What happens when alpha is very high? Is there an optimal value?

Does increasing alpha add bias or variance to our estimator?

In [8]:
# alpha is the additive smoothing parameter. Larger values add bias by
# pulling the estimated feature probabilities away from the observed frequencies.

for aa in [0, 0.001, 0.01, 0.1, 1, 5, 10, 50, 100, 1000]:
    bern = BernoulliNB(binarize=0.1, alpha=aa)
    bern.fit(train_data, train_labels)

    print('alpha: %s bernoulli accuracy: %3.2f;' % (aa, bern.score(test_data, test_labels)))


alpha: 0 bernoulli accuracy: 0.38;
alpha: 0.001 bernoulli accuracy: 0.34;
alpha: 0.01 bernoulli accuracy: 0.34;
alpha: 0.1 bernoulli accuracy: 0.34;
alpha: 1 bernoulli accuracy: 0.34;
alpha: 5 bernoulli accuracy: 0.34;
alpha: 10 bernoulli accuracy: 0.34;
alpha: 50 bernoulli accuracy: 0.28;
alpha: 100 bernoulli accuracy: 0.28;
alpha: 1000 bernoulli accuracy: 0.28;
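
The cell above only covers the Bernoulli classifier. A comparable sketch for the multinomial classifier is given below, reusing the floor-based preparation from earlier. The alpha grid is an arbitrary choice, and `alpha=0` is left out to avoid taking the log of zero. The last lines illustrate why a very high alpha adds bias: the smoothed feature probabilities approach the uniform value 1 / n_features regardless of the data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import MultinomialNB

# Same shuffle and split as in the notebook cell above.
iris = load_iris()
X, Y = iris.data, iris.target
np.random.seed(0)
shuffle = np.random.permutation(np.arange(X.shape[0]))
X, Y = X[shuffle], Y[shuffle]
train_data, train_labels = X[:100], Y[:100]
test_data, test_labels = X[100:], Y[100:]

# Floor the measurements so they look like counts, as in the earlier cell.
train_counts, test_counts = np.floor(train_data), np.floor(test_data)

for aa in [0.001, 0.01, 0.1, 1, 10, 100, 1000]:
    mult = MultinomialNB(alpha=aa)
    mult.fit(train_counts, train_labels)
    print('alpha: %s multinomial accuracy: %3.2f' % (aa, mult.score(test_counts, test_labels)))

# With an extremely large alpha, every class ends up with nearly the same
# (uniform) feature probabilities, so the features stop being informative.
heavy = MultinomialNB(alpha=1e6).fit(train_counts, train_labels)
heavy_probs = np.exp(heavy.feature_log_prob_)
print(heavy_probs.round(3))
```

Once the feature probabilities are flat, predictions are driven almost entirely by the class priors, which is the high-bias, low-variance end of the trade-off.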


