# **Naive Bayes Classifier from scratch using Iris Dataset**
Understanding the concepts behind each statistical model is an important part of studying machine learning, yet libraries such as scikit-learn can feel like a mystical black box to the user. The Naive Bayes method is easy to learn and fun to use. Here it is implemented from scratch in Python and applied to the Iris dataset.

![Naive Bayes Classifier](Assets/NBC.jfif)

## What is Bayes Theorem?

Bayes' theorem is a mathematical formula for calculating conditional probability, named after the 18th-century British mathematician Thomas Bayes. Conditional probability is the likelihood of an outcome occurring given that another outcome has already occurred. Given new or additional evidence, Bayes' theorem can be used to revise earlier predictions or theories (that is, to update probabilities). In finance, for example, Bayes' theorem is used to assess the risk of lending money to prospective borrowers.


Bayes' theorem, also known as Bayes' Rule or Bayes' Law, is the cornerstone of Bayesian statistics.

It expresses the probability of an event based on prior knowledge of conditions that may be related to the event. Given that \\( x \\) has occurred, Bayes' theorem lets us calculate the probability of \\( y \\) occurring. Here \\( x \\) is the evidence, \\( P(y) \\) is the prior (what we believed about \\( y \\) before seeing \\( x \\)), \\( P(x|y) \\) is the likelihood, and \\( P(y|x) \\) is the posterior. In Naive Bayes, the predictor variables are additionally assumed to be independent of one another given the class.

Its equation is as follows:

\\( P(y|x)= \dfrac{P(x|y)P(y)}{P(x)} \\)
where, 
* \\( y,x \\) = Events
* \\( P(y|x) \\) = Probability of \\( y \\) given \\( x \\)
* \\( P(x|y) \\) = Probability of \\( x \\) given \\( y \\)
* \\( P(y), P(x) \\) = Marginal probabilities of \\( y \\) and \\( x \\), each considered on its own
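
As a quick numeric check of the formula, the sketch below plugs made-up numbers into Bayes' rule. The function name `bayes_posterior` and all of the probabilities are illustrative assumptions and are not part of the Iris walkthrough; the evidence \\( P(x) \\) is obtained from the law of total probability.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(y|x) = P(x|y) * P(y) / P(x)."""
    if evidence <= 0:
        raise ValueError("P(x) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers chosen only for illustration
p_x_given_y = 0.9      # P(x|y)
p_y = 0.2              # P(y)
p_x_given_not_y = 0.1  # P(x|not y)

# Law of total probability: P(x) = P(x|y)P(y) + P(x|not y)P(not y)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

posterior = bayes_posterior(p_x_given_y, p_y, p_x)
print(round(posterior, 4))
```

Even though the prior for \\( y \\) is only 0.2, observing \\( x \\) raises the posterior to roughly 0.69, because \\( x \\) is far more likely under \\( y \\) than under its complement.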

Naive Bayes methods are a family of supervised learning algorithms based on Bayes' theorem and the "naive" assumption of conditional independence between every pair of features given the value of the class variable.

In our example, \\( y \\) is the class variable (the species of the flower), and \\( X \\) represents the features. \\( X \\) is written as \\( X = (x_{1}, x_{2}, ..., x_{n}) \\), where the components map to sepal length, sepal width, petal length, and petal width. Substituting \\( X \\) into Bayes' Rule, expanding with the chain rule, and applying the independence assumption gives:

\\( P(y|x_{1}, x_{2}, ..., x_{n})= \dfrac{P(x_{1}|y)P(x_{2}|y)...P(x_{n}|y)P(y)}{P(x_{1}, x_{2}, ..., x_{n})} \\)

By examining the dataset, you can compute each of these likelihoods and plug them into the equation. The class variable \\( y \\) has three possible outcomes in our case, one per species. So, for each flower, the posterior probability of every species must be computed. Because the denominator is the same for every class, only the numerators need to be compared. The predicted class is the one with the highest posterior probability.
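
The decision rule can be sketched in a few lines: multiply the prior by the per-feature likelihoods for each class, then pick the class with the largest product. The class names, priors, and likelihood values below are made-up placeholders, and `naive_bayes_scores` is a hypothetical helper, not a function used later in this notebook.

```python
from math import prod


def naive_bayes_scores(priors, likelihoods):
    """Return the unnormalised posterior P(y) * prod_i P(x_i | y) for each class."""
    return {label: priors[label] * prod(likelihoods[label]) for label in priors}


# Hypothetical priors and per-feature likelihoods for two classes
priors = {"class_a": 0.5, "class_b": 0.5}
likelihoods = {
    "class_a": [0.8, 0.6, 0.9],
    "class_b": [0.3, 0.5, 0.2],
}

scores = naive_bayes_scores(priors, likelihoods)

# Dividing by the sum plays the role of the shared denominator
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

# The prediction is the class with the highest score
best = max(scores, key=scores.get)
print(best, posteriors)
```

Normalising is optional for prediction, since dividing every score by the same total never changes which class is largest.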


Let's import the libraries we need

In [1]:
import numpy as np
import pandas as pd
from csv import reader  # used by load_csv below

Here, instead of built-in library functions, we will use custom functions to load and pre-process the data

In [2]:
# Load a CSV file
def load_csv(filename):
	dataset = list()
	with open(filename, 'r') as file:
		csv_reader = reader(file)
		for row in csv_reader:
			if not row:
				continue
			dataset.append(row)
	return dataset
 
# Convert a string column to float
def str_column_to_float(dataset, column):
	for row in dataset:
		row[column] = float(row[column].strip())

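# Convert a string column to integer class labels, returns the lookup table
def str_column_to_int(dataset, column):
	class_values = [row[column] for row in dataset]
	unique = sorted(set(class_values))  # sorted so the mapping is the same on every run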
	lookup = dict()
	for i, value in enumerate(unique):
		lookup[value] = i
	for row in dataset:
		row[column] = lookup[row[column]]
	return lookup
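
To see what this pre-processing produces, here is a minimal, self-contained sketch that runs the same steps on a three-row sample held in memory instead of a file. The sample text and the use of `StringIO` are assumptions made for illustration; sorting the labels before numbering them is a choice made here so that the mapping is reproducible.

```python
from csv import reader
from io import StringIO

# Hypothetical sample in the same layout as iris.csv, including one blank line
sample_csv = (
    "5.1,3.5,1.4,0.2,Setosa\n"
    "\n"
    "6.7,3.0,5.2,2.3,Virginica\n"
    "5.9,3.0,4.2,1.5,Versicolor\n"
)

# csv.reader yields an empty list for a blank line, so those rows are skipped
rows = [row for row in reader(StringIO(sample_csv)) if row]

# Convert every feature column (all but the last) from string to float
for row in rows:
    for col in range(len(row) - 1):
        row[col] = float(row[col].strip())

# Map each class name to an integer, in alphabetical order
labels = sorted({row[-1] for row in rows})
lookup = {label: index for index, label in enumerate(labels)}
for row in rows:
    row[-1] = lookup[row[-1]]

print(lookup)
print(rows)
```

After this step every row is a list of four floats followed by an integer class label, which is the shape the rest of the code expects.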

Here, we create a function to split the data into k folds for training and testing, and a function to calculate the accuracy as a percentage

In [3]:
# Split a dataset into k folds
from random import randrange
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for _ in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split
 
# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0
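
One detail of this kind of split is worth seeing in action: because the fold size is the row count divided by the number of folds and rounded down, any leftover rows are not placed in a fold. The sketch below is a variant that shuffles once and then slices, rather than popping random indices; `k_fold_split` and its `seed` parameter are hypothetical names introduced for this illustration.

```python
from random import Random


def k_fold_split(rows, n_folds, seed=1):
    """Shuffle a copy of rows and cut it into n_folds equally sized folds."""
    rng = Random(seed)          # private generator, so global state is untouched
    remaining = list(rows)
    rng.shuffle(remaining)
    fold_size = len(remaining) // n_folds
    return [remaining[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


# 23 dummy rows split into 5 folds: 4 rows per fold, 3 rows left over
folds = k_fold_split(list(range(23)), n_folds=5)
fold_sizes = [len(fold) for fold in folds]
used = sum(fold_sizes)
print(fold_sizes, "rows used:", used)
```

With 150 Iris rows and 5 folds nothing is dropped, since 150 divides evenly into folds of 30.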

In [4]:
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			test_set.append(row_copy)
			row_copy[-1] = None
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores
 
# Split the dataset by class values, returns a dictionary
def separate_by_class(dataset):
	separated = dict()
	for i in range(len(dataset)):
		vector = dataset[i]
		class_value = vector[-1]
		if (class_value not in separated):
			separated[class_value] = list()
		separated[class_value].append(vector)
	return separated
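
Grouping rows by their class label is the step that lets the statistics be computed per class. A compact way to express the same idea, assuming the label is always the last element of each row, is sketched below with `collections.defaultdict`; `group_by_label` and the three sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_label(rows):
    """Return a dict mapping each class label (last column) to its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[-1]].append(row)
    return dict(groups)


# Hypothetical rows: two features followed by an integer class label
rows = [[5.1, 3.5, 0], [6.7, 3.0, 1], [4.9, 3.0, 0]]
groups = group_by_label(rows)
print(groups)
```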

Next, we calculate summary statistics such as the mean and standard deviation of each feature, which the Gaussian probability density function needs

In [5]:
# Calculate the mean of a list of numbers
def mean(numbers):
	return sum(numbers)/float(len(numbers))
 
# Calculate the standard deviation of a list of numbers
def stdev(numbers):
	avg = mean(numbers)
	variance = sum([(x-avg)**2 for x in numbers]) / float(len(numbers)-1)
	return np.sqrt(variance)
 
# Calculate the mean, stdev and count for each column in a dataset
def summarize_dataset(dataset):
	summaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]
	del(summaries[-1])
	return summaries
 
# Split dataset by class then calculate statistics for each column
def summarize_by_class(dataset):
	separated = separate_by_class(dataset)
	summaries = dict()
	for class_value, rows in separated.items():
		summaries[class_value] = summarize_dataset(rows)
	return summaries
 
# Calculate the Gaussian probability density function for x
def calculate_probability(x, mean, stdev):
	exponent = np.exp(-((x-mean)**2 / (2 * stdev**2 )))
	return (1 / (np.sqrt(2 * np.pi) * stdev)) * exponent
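
A small check of the statistics and the Gaussian formula may help here. The sketch below uses only the standard library (`statistics.stdev` divides by n - 1, like the `stdev` function above) and five sepal lengths from the first Setosa rows of the Iris data. The names `gaussian_pdf`, `mu`, and `sigma` are chosen for this illustration.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    exponent = exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (sqrt(2 * pi) * sigma)


sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0]
mu = mean(sepal_lengths)
sigma = stdev(sepal_lengths)  # sample standard deviation (n - 1 in the denominator)

# The density is highest at the mean and falls off on either side
density_at_mean = gaussian_pdf(mu, mu, sigma)
density_far_away = gaussian_pdf(6.5, mu, sigma)
print(mu, sigma, density_at_mean, density_far_away)
```

Note that `density_at_mean` is greater than 1 here. The function returns a density, not a probability, so values above 1 are expected whenever the standard deviation is small; the classifier only compares these values across classes.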

In [6]:
# Calculate the probabilities of predicting each class for a given row
def calculate_class_probabilities(summaries, row):
	total_rows = sum([summaries[label][0][2] for label in summaries])
	probabilities = dict()
	for class_value, class_summaries in summaries.items():
		probabilities[class_value] = summaries[class_value][0][2]/float(total_rows)
		for i in range(len(class_summaries)):
			mean, stdev, _ = class_summaries[i]
			probabilities[class_value] *= calculate_probability(row[i], mean, stdev)
	return probabilities
 
# Predict the class for a given row
def predict(summaries, row):
	probabilities = calculate_class_probabilities(summaries, row)
	best_label, best_prob = None, -1
	for class_value, probability in probabilities.items():
		if best_label is None or probability > best_prob:
			best_prob = probability
			best_label = class_value
	return best_label
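
Multiplying many small densities can underflow to zero when there are many features. A common variant, sketched below as an assumption rather than as part of this notebook, adds log-densities instead of multiplying densities; the predicted class is unchanged because the logarithm is monotonic. The summaries use the same `(mean, stdev, count)` layout as above, but the numbers are made up.

```python
from math import log, pi


def gaussian_log_pdf(x, mean, stdev):
    """Natural log of the Gaussian density at x."""
    return -0.5 * log(2 * pi * stdev ** 2) - ((x - mean) ** 2) / (2 * stdev ** 2)


def predict_log(summaries, row):
    """Pick the class with the highest log prior plus summed log-likelihoods."""
    total_rows = sum(stats[0][2] for stats in summaries.values())
    scores = {}
    for label, stats in summaries.items():
        score = log(stats[0][2] / total_rows)          # log prior
        for value, (mean, stdev, _) in zip(row, stats):
            score += gaussian_log_pdf(value, mean, stdev)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical per-class summaries for two features: (mean, stdev, count)
summaries = {
    0: [(5.0, 0.35, 50), (1.5, 0.17, 50)],
    1: [(6.6, 0.64, 50), (5.6, 0.55, 50)],
}

print(predict_log(summaries, [5.1, 1.4]))
print(predict_log(summaries, [6.3, 5.0]))
```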

Defining the Naive Bayes function, which summarizes the training data and predicts a class for every test row

In [7]:
# Naive Bayes Algorithm
def naive_bayes(train, test):
	summarize = summarize_by_class(train)
	predictions = list()
	for row in test:
		output = predict(summarize, row)
		predictions.append(output)
	return predictions

![Species](Assets/iris-machinelearning.png)

In [8]:
# Loading the dataset
from csv import reader
from random import seed
seed(1)  # cross_validation_split uses the random module, so seed that rather than NumPy
dataset = load_csv("iris.csv")

In [9]:
# Viewing the dataset
dataset_view = pd.DataFrame(dataset, columns =["sepal_length", "sepal_width", "petal_length", "petal_width","species"])
dataset_view

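| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5 | 3.6 | 1.4 | 0.2 | Setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3 | 5.2 | 2.3 | Virginica |
| 146 | 6.3 | 2.5 | 5 | 1.9 | Virginica |
| 147 | 6.5 | 3 | 5.2 | 2 | Virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Virginica |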


As we can see in the dataset above, each row has four measurements: **sepal_length** (the flower's sepal length), **sepal_width** (its sepal width), **petal_length** (its petal length), and **petal_width** (its petal width). Based on these four features, we use the Naive Bayes classifier to predict the class **species**, which is one of 'Setosa', 'Virginica', or 'Versicolor'.

In [10]:
# Test Naive Bayes on Iris Dataset
for i in range(len(dataset[0])-1):
	str_column_to_float(dataset, i)
# convert class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# evaluate algorithm
n_folds = 5
scores = evaluate_algorithm(dataset, naive_bayes, n_folds)
print('Mean Accuracy of Naive Bayes classifier is : %.3f%%' % (sum(scores)/float(len(scores))))

Mean Accuracy of Naive Bayes classifier is : 95.333%


### Advantages
- It is a straightforward approach that is also fast and often accurate.
- The computational cost of Naive Bayes is quite low.
- It can handle a large dataset with ease.
- It works well with discrete (categorical) variables; continuous variables require an additional distributional assumption, such as the Gaussian used here.
- It can be used to solve multi-class prediction problems.
- It also performs well on text analytics problems.
- When the independence assumption holds, a Naive Bayes classifier can outperform other models such as logistic regression.

### Disadvantages
- The assumption of independent features. In practice, it is nearly impossible to find a set of predictors that are completely independent.

- When a feature value never appears together with a particular class in the training data, its estimated likelihood is zero, which makes the whole posterior probability for that class zero. The model is then unable to make a sensible prediction. This is known as the zero-frequency (or zero-probability) problem.
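
The zero-probability issue mainly affects categorical features, where likelihoods are estimated from counts. A standard remedy is Laplace (additive) smoothing, which adds a small constant to every count. The sketch below is an illustration under that assumption and is not used by the Gaussian classifier in this notebook; `smoothed_likelihood`, `alpha`, and the colour data are hypothetical.

```python
from collections import Counter


def smoothed_likelihood(value, observed_values, n_categories, alpha=1.0):
    """Estimate P(value | class) with additive smoothing.

    observed_values: feature values seen for one class in the training data
    n_categories:    number of distinct values the feature can take
    alpha:           smoothing strength (1.0 gives Laplace smoothing)
    """
    counts = Counter(observed_values)
    return (counts[value] + alpha) / (len(observed_values) + alpha * n_categories)


# "blue" never occurs for this class, yet its estimate stays above zero
observed = ["red", "red", "green"]
p_blue = smoothed_likelihood("blue", observed, n_categories=3)
p_red = smoothed_likelihood("red", observed, n_categories=3)
p_green = smoothed_likelihood("green", observed, n_categories=3)
print(p_blue, p_red, p_green)
```

The smoothed estimates still sum to 1 over the three categories, so they remain a valid probability distribution.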

## Conclusion

Congratulations, we have successfully implemented Naive Bayes without using any fancy libraries!

Here, we learned about the Naive Bayes algorithm, how it works, the Naive Bayes assumption, and its implementation, advantages, and disadvantages. Along the way, you have also learned how to build and evaluate a model for multiple classes without using scikit-learn.

Naive Bayes is one of the most straightforward yet effective algorithms. Despite the significant advances in machine learning over the last few years, it has continued to prove its worth, and it has been successfully deployed in many applications, from text analytics to recommendation engines.