### NAIVE BAYES

Naïve Bayes methods are a set of supervised learning algorithms that apply Bayes' theorem under the strong assumption that all predictors are independent of one another given the class, i.e. the presence of a feature in a class is independent of the presence of any other feature in the same class. This assumption is naïve, which is why these methods are called Naïve Bayes methods.
Bayes' theorem gives the following relationship for the posterior probability of a class, i.e. the probability of a label given some observed features, $P(Y \mid \text{features})$:

$$P(Y \mid \text{features}) = \frac{P(Y)\,P(\text{features} \mid Y)}{P(\text{features})}$$

Here, $P(Y \mid \text{features})$ is the posterior probability of the class.
$P(Y)$ is the prior probability of the class.
$P(\text{features} \mid Y)$ is the likelihood, i.e. the probability of the predictors given the class.
$P(\text{features})$ is the evidence, i.e. the marginal probability of the predictors.
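
To make the relationship concrete, here is a minimal sketch that evaluates Bayes' theorem by hand under the naïve independence assumption. The labels, feature names and probability values are hypothetical toy numbers chosen for illustration; they do not come from any data set.

```python
# Hypothetical priors P(Y) for two classes.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-feature likelihoods P(feature | Y).
likelihoods = {
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham": {"offer": 0.2, "meeting": 0.5},
}

observed = ["offer", "meeting"]

# Naive assumption: P(features | Y) is the product of the per-feature likelihoods.
joint = {}
for label, prior in priors.items():
    p = prior
    for feature in observed:
        p *= likelihoods[label][feature]
    joint[label] = p  # P(Y) * P(features | Y)

# P(features) is the sum of the joint terms over all classes.
evidence = sum(joint.values())

# Posterior P(Y | features) for each class.
posterior = {label: joint[label] / evidence for label in joint}
print(posterior)
```

The class with the largest posterior is the predicted label; because the evidence is the same for every class, it only rescales the scores and does not change which class wins.
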
Scikit-learn provides several naïve Bayes classifier models, namely Gaussian, Multinomial, Complement and Bernoulli. They differ mainly in the assumption they make about the distribution of $P(\text{features} \mid Y)$, i.e. the probability of the predictors given the class.

#### Gaussian Naïve Bayes

As the name suggests, the Gaussian Naïve Bayes classifier assumes that the data for each label is drawn from a simple Gaussian distribution. Scikit-learn provides sklearn.naive_bayes.GaussianNB to implement the Gaussian Naïve Bayes algorithm for classification.

In [None]:
import numpy as np
X = np.array([[-1, -1], [-2, -4], [-4, -6], [1, 2]])
Y = np.array([1, 1, 2, 2])
from sklearn.naive_bayes import GaussianNB
GNBclf = GaussianNB()
GNBclf.fit(X, Y)

In [None]:
print(GNBclf.predict([[-0.5, 2]]))

[2]
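
Beyond the predicted label, it can help to look at what the fitted model learned. The sketch below refits the same toy data and inspects the per-class feature means (`theta_`) and the class probabilities returned by `predict_proba`; the variable name `clf` is just an illustrative choice.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Same toy data as in the cell above.
X = np.array([[-1, -1], [-2, -4], [-4, -6], [1, 2]])
Y = np.array([1, 1, 2, 2])

clf = GaussianNB().fit(X, Y)

print(clf.classes_)  # class labels, in the order used by the arrays below
print(clf.theta_)    # mean of each feature per class, shape (n_classes, n_features)

# Posterior probability of each class for a new sample.
proba = clf.predict_proba([[-0.5, 2]])
print(proba)
```

Each row of `proba` sums to 1, and `predict` returns the class whose column holds the largest value.
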


#### Multinomial Naïve Bayes

Multinomial Naïve Bayes is another useful Naïve Bayes classifier. It assumes that the features are drawn from a simple multinomial distribution, which makes it a natural fit for count data such as word counts. Scikit-learn provides sklearn.naive_bayes.MultinomialNB to implement the Multinomial Naïve Bayes algorithm for classification.

The following table lists the parameters used by the sklearn.naive_bayes.MultinomialNB class:

**Implementation Example**

The Python script below uses sklearn.naive_bayes.MultinomialNB to construct a Multinomial Naïve Bayes classifier from our data set:

In [1]:
import numpy as np
X = np.random.randint(8, size=(8, 100))
y = np.array([1, 2, 3, 4, 5, 6, 7, 8])
from sklearn.naive_bayes import MultinomialNB
MNBclf = MultinomialNB()
MNBclf.fit(X, y)
print(MNBclf.predict(X[4:5]))

[5]
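
Random integers show the API, but the multinomial model is usually applied to counts. The following is a minimal sketch on word counts, assuming a tiny made-up corpus and made-up labels (`docs`, `labels`); `CountVectorizer` turns each document into a vector of word counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "free prize money",
    "win money now",
    "meeting schedule today",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Convert text to a document-term matrix of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

text_clf = MultinomialNB()  # default Laplace smoothing (alpha=1.0)
text_clf.fit(X_counts, labels)

# New text must be transformed with the same fitted vectorizer.
pred = text_clf.predict(vectorizer.transform(["free money"]))
print(pred)
```
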


#### Bernoulli Naïve Bayes

Bernoulli Naïve Bayes is another useful naïve Bayes model. It assumes that the features are binary (0s and 1s). One application of Bernoulli Naïve Bayes classification is text classification with the ‘bag of words’ model. Scikit-learn provides sklearn.naive_bayes.BernoulliNB to implement the Bernoulli Naïve Bayes algorithm for classification.

In [2]:
# The Python script below uses sklearn.naive_bayes.BernoulliNB to construct a Bernoulli Naïve Bayes classifier from our data set.

import numpy as np
X = np.random.randint(10, size=(10, 1000))
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
from sklearn.naive_bayes import BernoulliNB
BNBclf = BernoulliNB()
BNBclf.fit(X, y)

In [3]:
print(BNBclf.predict(X[0:5]))

[1 2 3 4 5]
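
Note that the data in the cell above is not binary. `BernoulliNB` has a `binarize` parameter (default `0.0`) that thresholds the input, so any value greater than 0 is treated as 1. The sketch below, on hypothetical random counts, suggests that fitting on raw counts with the default threshold matches fitting on data binarized by hand with `binarize=None`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical count data: 6 samples, 20 features, values in {0, 1, 2}.
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 3, size=(6, 20))
y_toy = np.array([1, 2, 3, 1, 2, 3])

# Default behaviour: values > 0.0 are mapped to 1 internally.
clf_auto = BernoulliNB(binarize=0.0).fit(X_raw, y_toy)

# Manual binarization; binarize=None tells the model the input is already binary.
X_bin = (X_raw > 0).astype(int)
clf_manual = BernoulliNB(binarize=None).fit(X_bin, y_toy)

# Both models learn the same per-class feature log-probabilities.
print(np.allclose(clf_auto.feature_log_prob_, clf_manual.feature_log_prob_))
```
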


#### Complement Naïve Bayes

Complement Naïve Bayes is another useful naïve Bayes model, designed to correct the severe assumptions made by the Multinomial Naïve Bayes classifier. This kind of NB classifier is suitable for imbalanced data sets. Scikit-learn provides sklearn.naive_bayes.ComplementNB to implement the Complement Naïve Bayes algorithm for classification.

In [4]:
import numpy as np
X = np.random.randint(15, size=(15, 1000))
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
from sklearn.naive_bayes import ComplementNB
CNBclf = ComplementNB()
CNBclf.fit(X, y)

In [5]:
print(CNBclf.predict(X[10:15]))

[11 12 13 14 15]
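
Since Complement Naïve Bayes targets imbalanced data, a side-by-side run with `MultinomialNB` can be a useful starting point. The following is a rough sketch on hypothetical Poisson-distributed counts with a 90/10 class split; the rates and sizes are arbitrary assumptions, and the scores are training accuracies only, so they say nothing about generalization.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB, MultinomialNB

rng = np.random.default_rng(0)

# Hypothetical imbalanced count data: 90 majority samples, 10 minority samples.
X_major = rng.poisson(lam=[5, 1, 1, 1], size=(90, 4))
X_minor = rng.poisson(lam=[1, 1, 1, 5], size=(10, 4))
X_imb = np.vstack([X_major, X_minor])
y_imb = np.array([0] * 90 + [1] * 10)

cnb = ComplementNB().fit(X_imb, y_imb)
mnb = MultinomialNB().fit(X_imb, y_imb)

# Training accuracy of each model on the same data.
cnb_score = cnb.score(X_imb, y_imb)
mnb_score = mnb.score(X_imb, y_imb)
print("ComplementNB:", cnb_score)
print("MultinomialNB:", mnb_score)
```
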


#### Building Naïve Bayes Classifier

We can also apply a Naïve Bayes classifier to one of the data sets that ship with Scikit-learn.

In the example below, we fit GaussianNB on the breast_cancer data set from Scikit-learn and predict labels for a held-out test split.

In [6]:
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])

train, test, train_labels, test_labels = train_test_split(features,labels,test_size = 0.40, random_state = 42)
from sklearn.naive_bayes import GaussianNB
GNBclf = GaussianNB()
model = GNBclf.fit(train, train_labels)
preds = GNBclf.predict(test)
print(preds)

['malignant' 'benign']
0
mean radius
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01]
[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0
 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0
 0 1 1 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1
 0 0 1 1 0 1]
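
The raw predictions are hard to judge by eye. One way to summarize them is to compare against the held-out labels with `accuracy_score` from `sklearn.metrics`. The sketch below repeats the same split (`test_size=0.40`, `random_state=42`) so that it runs on its own; the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()

# Same split parameters as in the cell above.
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"], test_size=0.40, random_state=42
)

gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Fraction of test samples whose predicted label matches the true label.
acc = accuracy_score(y_test, y_pred)
print(acc)
```
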
