
# Naive Bayes Algorithm

The Naive Bayes algorithm is a family of simple yet powerful probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. It's particularly popular for text classification tasks such as spam detection, sentiment analysis, and document categorization.
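
As a sketch of what the independence assumption buys, write $y$ for the class and $x_1, \dots, x_n$ for the features (symbols introduced here for illustration, not taken from the notebook). Bayes' theorem combined with conditional independence of the features given the class gives:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The predicted class is then the one that maximizes the right-hand side:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator can be dropped because it is the same for every class.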

There are several types of Naive Bayes classifiers, depending on the nature of the feature data:

- Gaussian Naive Bayes: Assumes that the features follow a normal (Gaussian) distribution. It's used for continuous data.
- Multinomial Naive Bayes: Used for discrete data, particularly in text classification where features represent word frequencies.
- Bernoulli Naive Bayes: Assumes binary features (0s and 1s), such as in text classification tasks where only the presence or absence of a word is considered.

## Steps of the Naive Bayes Algorithm

Training phase:

- Calculate the prior probability for each class.
- Calculate the likelihood for each feature given each class.
- If using Gaussian Naive Bayes, calculate the mean and variance of each feature for each class; these parameters define the likelihood.
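
The training steps above can be sketched from scratch with NumPy. This is a minimal illustration on hypothetical toy data, not scikit-learn's implementation; the function name `fit_gaussian_nb` and the arrays `X_toy` and `y_toy` are assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: two continuous features, binary class labels.
X_toy = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0],
                  [6.0, 7.0], [7.0, 8.0], [8.0, 9.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    # Prior: fraction of training samples that belong to each class.
    priors = np.array([np.mean(y == c) for c in classes])
    # Gaussian likelihood parameters: one mean and variance per class and feature.
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


classes, priors, means, variances = fit_gaussian_nb(X_toy, y_toy)
print("priors:", priors)
print("means:", means)
print("variances:", variances)
```

Each row of `means` and `variances` corresponds to one class and each column to one feature, which is all the model needs to store after training.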

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv("/content/Social_Network_Ads.csv", usecols=['Age', 'EstimatedSalary', 'Purchased'])

In [None]:
df.head()

   Age  EstimatedSalary  Purchased
0   19            19000          0
1   35            20000          0
2   26            43000          0
3   27            57000          0
4   19            76000          0


In [None]:
x = df.drop(columns=['Purchased'])
y = df['Purchased']

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=23)

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
sc = StandardScaler()

In [None]:
x_train_new = sc.fit_transform(x_train)

In [None]:
x_test_new = sc.transform(x_test)
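
The two cells above fit the scaler on the training split and only apply it to the test split. A small sketch of why that matters, using hypothetical age and salary values (the arrays `train` and `test` are made up for illustration): the mean and standard deviation come from the training rows alone, so test rows are scaled with statistics they did not contribute to.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values in the same spirit as Age / EstimatedSalary.
train = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
test = np.array([[30.0, 40000.0], [50.0, 80000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

print("training means:", scaler.mean_)
print("scaled test rows:", test_scaled)
```

A test row equal to the training mean maps to zero, while a row outside the training range maps to a value larger than any scaled training row.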

In [None]:
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
classifier = GaussianNB()

In [None]:
classifier.fit(x_train_new, y_train)

GaussianNB()

In [None]:
y_pred = classifier.predict(x_test_new)
y_pred

array([0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0])
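
`predict` returns hard class labels. As a sketch on hypothetical toy data (not the Social_Network_Ads set), `predict_proba` exposes the class probabilities behind those labels; the names `X_toy`, `y_toy`, and `X_new` are assumptions made for the example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two well-separated classes.
X_toy = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0],
                  [6.0, 7.0], [7.0, 8.0], [8.0, 9.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB().fit(X_toy, y_toy)

X_new = np.array([[2.0, 2.5], [7.5, 8.0]])
proba = model.predict_proba(X_new)  # one row per sample, one column per class
labels = model.predict(X_new)       # the class with the highest probability

print("probabilities:", proba)
print("labels:", labels)
```

Each row of `proba` sums to 1, and the predicted label is the class in `model.classes_` at the position of the row's largest entry.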

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
cn = confusion_matrix(y_test, y_pred)
cn

array([[48,  2],
       [ 5, 25]])

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy_score(y_test, y_pred)

0.9125

In [None]:
from sklearn.metrics import precision_score

# Precision: of the samples predicted positive, the fraction that are truly positive, TP / (TP + FP)

# y_test holds the true labels and y_pred the predicted labels

precision = precision_score(y_test, y_pred)

In [None]:
precision

0.9259259259259259

In [None]:
from sklearn.metrics import recall_score

# Recall (true positive rate): of the truly positive samples, the fraction predicted positive, TP / (TP + FN)
recall = recall_score(y_test, y_pred)

In [None]:
recall

0.8333333333333334

In [None]:
from sklearn.metrics import f1_score

# F1 score: the harmonic mean of precision and recall

f1 = f1_score(y_test, y_pred)

In [None]:
f1

0.8771929824561403
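
All four scores can be recomputed by hand from the confusion matrix printed earlier. The sketch below hard-codes that matrix (rows are true labels and columns are predicted labels, as in scikit-learn's `confusion_matrix`); the variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix from the output above: [[TN, FP], [FN, TP]].
cm = np.array([[48, 2],
               [5, 25]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()       # correct predictions over all predictions
precision = tp / (tp + fp)            # how many predicted positives are correct
recall = tp / (tp + fn)               # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

These values agree with the `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` outputs shown above.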

1. Bernoulli Naive Bayes (BernoulliNB): Use BernoulliNB when your data consists of binary features, that is, features that take a value of 0 or 1. Example use cases:
   - Text classification: When the presence or absence of specific words matters more than their frequency. For example, spam detection, where the presence of certain words (like "free" or "offer") is a strong signal.
   - Binary data: Any scenario where features are binary, such as a survey where responses are "Yes" (1) or "No" (0).

2. Multinomial Naive Bayes (MultinomialNB): MultinomialNB is suitable when your features are counts or frequencies of events. Example use cases:
   - Text classification: When the frequency of words is important. For example, classifying news articles based on word frequency.
   - Document classification: When the document-term matrix (which counts the occurrences of words) is the primary feature set, such as in sentiment analysis or topic categorization.

3. Gaussian Naive Bayes (GaussianNB): Use GaussianNB when your features are continuous numerical values that are expected to follow a Gaussian (normal) distribution within each class. Example use cases:
   - Iris flower classification: Classifying iris species based on features like sepal length, sepal width, petal length, and petal width, which are continuous and can be modeled by a Gaussian distribution.
   - Medical diagnosis: When features like patient age, blood pressure, or cholesterol levels are continuous and approximately normally distributed, GaussianNB can be used to estimate the likelihood of a disease.
   - Weather prediction: Features such as temperature, humidity, or wind speed, which are continuous and may follow a normal distribution, can be used to predict weather conditions.
   - Finance: Classifying market trends, such as whether a stock price will rise or fall, using continuous financial metrics that may follow a Gaussian distribution.
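
The three scenarios above can be sketched side by side. The toy corpus, its labels, and the variable names below are hypothetical and exist only for illustration; the iris data is the copy bundled with scikit-learn. Treat this as a minimal sketch of matching each variant to its feature type, not a benchmark.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam.
docs = [
    "free offer free prize",
    "claim your free offer now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = [1, 1, 0, 0]
new_docs = ["free prize offer", "monday project meeting"]

# BernoulliNB: binary=True records only whether each word is present.
binary_vec = CountVectorizer(binary=True)
bernoulli = BernoulliNB().fit(binary_vec.fit_transform(docs), labels)
bernoulli_pred = bernoulli.predict(binary_vec.transform(new_docs))

# MultinomialNB: the default CountVectorizer keeps per-document word counts.
count_vec = CountVectorizer()
multinomial = MultinomialNB().fit(count_vec.fit_transform(docs), labels)
multinomial_pred = multinomial.predict(count_vec.transform(new_docs))

# GaussianNB: continuous measurements from the bundled iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
gaussian = GaussianNB().fit(X_train, y_train)
gaussian_accuracy = gaussian.score(X_test, y_test)

print("BernoulliNB predictions:", bernoulli_pred)
print("MultinomialNB predictions:", multinomial_pred)
print("GaussianNB iris accuracy:", gaussian_accuracy)
```

The only difference between the two text models here is the feature representation: `binary=True` discards how often a word occurs, which is the information MultinomialNB relies on.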