### Naive Bayes Classifiers:

• Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem.

• It is not a single algorithm but a family of algorithms that share a common principle: every pair of features is assumed to be independent of each other, given the class.

• To start with, consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies them as fit (“Yes”) or unfit (“No”) for playing golf.

• The dataset is divided into two parts: the feature matrix and the response vector.

    • The feature matrix contains all the vectors (rows) of the dataset, where each vector holds the values of the features (the predictors). In this dataset, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
    
    • The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix. In this dataset, the class variable is ‘Play golf’.
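As a rough sketch of this split, the snippet below separates a few rows into a feature matrix `X` and a response vector `y`. The three rows are made up purely for illustration (they are not the dataset the text refers to), and the tuple layout is an assumption of this example.

```python
# Hypothetical rows for illustration only; not the dataset referred to in the text.
# Column order: Outlook, Temperature, Humidity, Windy, Play golf
rows = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Overcast", "Mild", "Normal", True, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
]

feature_names = ["Outlook", "Temperature", "Humidity", "Windy"]

# Feature matrix: one feature vector per row, without the class label
X = [list(row[:-1]) for row in rows]

# Response vector: the class label ("Play golf") for each row
y = [row[-1] for row in rows]

for features, label in zip(X, y):
    print(dict(zip(feature_names, features)), "->", label)
```

Row `i` of `X` and entry `i` of `y` always describe the same observation, which is why the two are kept in the same order.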

#### Assumption:

• The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

• With relation to our dataset, this concept can be understood as:

    • We assume that no pair of features is dependent. For example, the temperature being ‘Hot’ has nothing to do with the humidity, and the outlook being ‘Rainy’ has no effect on the wind. Hence, the features are assumed to be independent.

    • Secondly, each feature is given the same weight (or importance). For example, knowing only the temperature and humidity is not enough to predict the outcome accurately. No attribute is considered irrelevant, and all are assumed to contribute equally to the outcome.
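One way to picture the two assumptions: for a fixed class, the likelihood of a whole feature vector is taken to be the plain product of per-feature probabilities, with no interaction terms and no per-feature weights. The sketch below assumes some conditional probabilities for the class “Yes”; the numbers are invented for illustration and are not estimated from any dataset.

```python
import math

# Hypothetical conditional probabilities P(feature value | Play golf = "Yes"),
# invented for illustration; they are not estimated from any dataset in the text.
p_given_yes = {
    "Outlook=Sunny": 0.2,
    "Temperature=Hot": 0.25,
    "Humidity=High": 0.4,
    "Windy=False": 0.5,
}

# Independence: the joint likelihood factorizes into one term per feature.
# Equal contribution: every term enters the product with the same weight.
joint_likelihood = math.prod(p_given_yes.values())
print(joint_likelihood)
```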

#### Note:
    
• The assumptions made by Naive Bayes are generally not correct in real-world situations. In fact, the independence assumption rarely holds exactly, yet the method often works well in practice.

• Now, before moving to the formula for Naive Bayes, it is important to know about Bayes’ theorem.

#### Bayes’ Theorem

• Bayes’ Theorem gives the probability of an event occurring, given that another event has already occurred.

• Bayes’ theorem is stated mathematically as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) ≠ 0.

    • Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
    
    • P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).

    • P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
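To make the roles of prior, evidence and posterior concrete, here is a minimal sketch that evaluates Bayes’ theorem numerically. The function name `posterior` and the input probabilities are assumptions chosen for illustration; P(B) is expanded with the law of total probability so that only three inputs are needed.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) via Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    """
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2
p_a_given_b = posterior(prior_a=0.3,
                        likelihood_b_given_a=0.8,
                        likelihood_b_given_not_a=0.2)
print(round(p_a_given_b, 4))
```

With these inputs, seeing the evidence B raises the probability of A from the prior of 0.3 to a posterior of roughly 0.63.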

• Now, we look at an implementation of the Gaussian Naive Bayes classifier using scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris dataset
iris = load_iris()

# Store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# Split X and y into training and testing sets (60% / 40%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Train the model on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions on the testing set
y_predict = gnb.predict(X_test)

# Compare the actual response values (y_test) with the predicted ones (y_predict)
print("Gaussian Naive Bayes accuracy(in %):", accuracy_score(y_test, y_predict) * 100)
```

```
Gaussian Naive Bayes accuracy(in %): 95.0
```
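For intuition about what a Gaussian Naive Bayes model does, the following is a simplified from-scratch sketch in plain Python. It does not reproduce scikit-learn's implementation: the function names, the tiny toy dataset and the `1e-9` variance floor are all assumptions of this example. Each class gets a prior plus a mean and variance per feature, and prediction picks the class with the largest log prior plus summed per-feature Gaussian log-likelihoods.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for every class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive
        # (an assumption of this sketch, chosen for numerical safety)
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, features):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(features, means, variances)
        )
    return max(model, key=score)


# Tiny made-up dataset with two features and two classes
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y_toy = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```

Working with log probabilities rather than multiplying raw densities avoids numerical underflow when there are many features.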


### Other popular Naive Bayes classifiers are:

• Multinomial Naive Bayes:
    
    • Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution.
    
    • This is the event model typically used for document classification.

• Bernoulli Naive Bayes:
    
    • In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs.
    
    • Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).
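The difference between the two event models comes down to how a document is turned into a feature vector. The sketch below builds both representations for a few made-up documents; the helper names `term_counts` and `term_occurrence` are hypothetical. In scikit-learn, the corresponding classifiers are `MultinomialNB` and `BernoulliNB` in `sklearn.naive_bayes`.

```python
from collections import Counter

# Made-up toy documents, for illustration only
docs = ["free prize free entry", "meeting at noon", "free meeting"]

# Shared vocabulary, sorted so that feature positions are stable
vocabulary = sorted({word for doc in docs for word in doc.split()})


def term_counts(doc):
    """Multinomial-style features: how often each vocabulary word occurs."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


def term_occurrence(doc):
    """Bernoulli-style features: 1 if the vocabulary word occurs, else 0."""
    present = set(doc.split())
    return [int(word in present) for word in vocabulary]


print(vocabulary)
for doc in docs:
    print(term_counts(doc), term_occurrence(doc))
```

For the first document, the word “free” contributes a count of 2 to the multinomial-style vector but only a 1 to the Bernoulli-style vector.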