# **Naive Bayes**
The Naive Bayes classifier is a fast, simple, probabilistic machine learning algorithm for classification, known for its efficiency in tasks like spam filtering and sentiment analysis, despite its ***naive*** assumption that features are independent of each other given the class. It uses ***Bayes' Theorem*** to calculate the probability of a new data point belonging to a specific class, assigning it the class with the highest probability.

## Bayes' theorem (for context)
$$ P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)} $$
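
To make the formula concrete, here is a small worked example. All the numbers are made up for illustration: $A$ is "the email is spam" and $B$ is "the email contains the word *offer*".

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_spam = 0.2               # P(A): prior probability that an email is spam
p_word_given_spam = 0.6    # P(B|A): "offer" appears in a spam email
p_word_given_ham = 0.05    # P(B|not A): "offer" appears in a non-spam email

# P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))
```

Seeing the word raises the probability of spam from the prior of 0.2 to the posterior printed above.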

## Naive Bayes

$$ P(y|x_{1},\dots ,x_{n})=\frac{P(y)\cdot P(x_{1}|y)\cdot P(x_{2}|y)\cdots P(x_{n}|y)}{P(x_{1},\dots ,x_{n})} $$
$$ P(y'|x_{1},\dots ,x_{n})=\frac{P(y')\cdot P(x_{1}|y')\cdot P(x_{2}|y')\cdots P(x_{n}|y')}{P(x_{1},\dots ,x_{n})} $$

Both expressions share the same denominator, $ P(x_{1},\dots ,x_{n}) $, so it can be dropped when comparing classes. What remains is no longer a probability but a score proportional to the posterior:

$$ P(y|x_{1},\dots ,x_{n}) \propto P(y)\cdot P(x_{1}|y)\cdot P(x_{2}|y)\cdots P(x_{n}|y) $$
$$ P(y'|x_{1},\dots ,x_{n}) \propto P(y')\cdot P(x_{1}|y')\cdot P(x_{2}|y')\cdots P(x_{n}|y') $$

Any further classes are scored the same way.

The class with the highest score is taken as the final prediction.
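
The scoring rule can be sketched from scratch in a few lines. This is a minimal illustration on a made-up categorical dataset, not how a library implements it: the data, the feature meanings (outlook, windy), and the helper names `score` and `predict` are all assumptions for the example.

```python
from collections import Counter, defaultdict

# Toy, made-up training data: (outlook, windy) -> play
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("overcast", "no"), "yes"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
]

class_counts = Counter(label for _, label in data)

# feature_counts[(i, label)][value] = how often feature i took `value` in class `label`
feature_counts = defaultdict(Counter)
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def score(features, label):
    """Unnormalized posterior: P(y) * P(x_1|y) * ... * P(x_n|y)."""
    result = class_counts[label] / len(data)  # prior P(y)
    for i, value in enumerate(features):
        # plain frequency estimate of P(x_i|y), no smoothing
        result *= feature_counts[(i, label)][value] / class_counts[label]
    return result

def predict(features):
    scores = {label: score(features, label) for label in class_counts}
    return max(scores, key=scores.get), scores

label, scores = predict(("sunny", "no"))
print(label, scores)
```

Note that a single feature value never seen with a class drives that class's whole product to zero. Practical implementations usually smooth the counts and sum log-probabilities instead of multiplying raw probabilities.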


## Types of Naive Bayes

### **1. Bernoulli Naive Bayes**
- *Data Type*\
Suited for binary/boolean features (values of 0 or 1).
- *Assumption*\
Assumes each feature is a binary-valued variable, modeling the presence or absence of a feature rather than its frequency.
- *Use Cases*\
Commonly used in text classification where the model only cares about whether a specific word is present in a document or not, not how many times it appears.
- *Key Feature*\
Explicitly penalizes the non-occurrence of a feature that is considered an indicator for a class. 
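
A minimal sketch with scikit-learn's `BernoulliNB`, assuming a tiny made-up dataset where each column records whether a message contains a given word. The words and labels are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up binary features: does the message contain "free", "win", "meeting"?
X_bin = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
])
y_bin = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = not spam (hypothetical labels)

bnb = BernoulliNB()  # default alpha=1.0 applies Laplace smoothing
bnb.fit(X_bin, y_bin)

# The absent words (the 0 entries) also contribute to each class score
pred_bin = bnb.predict(np.array([[1, 1, 0], [0, 0, 1]]))
print(pred_bin)
```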

### **2. Multinomial Naive Bayes**
- *Data Type*\
Designed for discrete count data.
- *Assumption*\
Assumes that features are multinomially distributed, typically representing counts or frequencies of events (e.g., word counts).
- *Use Cases*\
This is the most common variant for text classification tasks, such as spam filtering and document categorization, where the frequency of words is important. 
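
A minimal sketch of the word-count workflow, assuming scikit-learn's `CountVectorizer` to turn text into counts and `MultinomialNB` as the classifier. The four documents and their labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training documents and labels
docs = [
    "win money now",
    "free money win win",
    "meeting schedule today",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Each column is a vocabulary word, each cell is how often it occurs in the document
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

mnb = MultinomialNB()  # default alpha=1.0 applies Laplace smoothing
mnb.fit(X_counts, labels)

# Words not seen during training ("for", "the") are ignored by transform()
new_docs = ["win free money", "notes for the meeting"]
pred_text = mnb.predict(vectorizer.transform(new_docs))
print(pred_text)
```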

### **3. Gaussian Naive Bayes**
- *Data Type*\
Used when predictor features have continuous values.
- *Assumption*\
Assumes that the continuous feature values follow a Gaussian (normal) distribution (a bell-shaped curve) within each class.
- *Use Cases*\
Applied in tasks with numerical data such as medical diagnosis (e.g., height, weight, blood pressure) or weather prediction, where data points cluster around a mean value. The model estimates the mean and variance of each feature for each class to calculate probabilities using the Gaussian probability density function. 
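
The per-class mean and variance estimate can be sketched by hand for a single feature. The measurements below are made up, both classes are assumed to have equal priors so the likelihood alone decides, and `gaussian_pdf` and `mean_and_var` are hypothetical helper names.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def mean_and_var(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)  # population variance
    return m, v

# Made-up measurements of one continuous feature for two hypothetical classes
samples = {
    "class_a": [1.4, 1.5, 1.3, 1.6],
    "class_b": [4.5, 4.7, 4.4, 4.8],
}

# "Training" is just estimating (mean, variance) per class
params = {label: mean_and_var(values) for label, values in samples.items()}

# Score a new point by its likelihood under each class's Gaussian
x_new = 1.7
likelihoods = {label: gaussian_pdf(x_new, m, v) for label, (m, v) in params.items()}
best = max(likelihoods, key=likelihoods.get)
print(best, likelihoods)
```

With several features, the per-feature densities are multiplied together (along with the class prior), exactly as in the Naive Bayes formula.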





## Implementation of the Naive Bayes algorithm

In [None]:
# get a data set
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X, y

(array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        [5

> The feature values are continuous, so Gaussian Naive Bayes is the appropriate variant.

In [12]:
# split into train test data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [14]:
# train it on gaussian naive bayes model
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X=X_train, y=y_train)

In [17]:
# get the predicted points
y_pred = gnb.predict(X_test)
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0])
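
The predicted labels are the classes with the highest posterior probability. As a quick check, the sketch below repeats the same split and model so that it runs on its own, then compares `predict_proba` with `predict` for the first three test rows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the cells above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
gnb = GaussianNB().fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = gnb.predict_proba(X_test[:3])
print(proba.round(3))

# The column with the largest probability is the predicted class
print(gnb.classes_[proba.argmax(axis=1)])
print(gnb.predict(X_test[:3]))
```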

In [18]:
# check the accuracy
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
print(accuracy_score(y_true=y_test, y_pred=y_pred))
print(confusion_matrix(y_true=y_test, y_pred=y_pred))
print(classification_report(y_true=y_test, y_pred=y_pred))

1.0
[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
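
A perfect score on a 38-sample test set may partly reflect a favourable split. One way to get a steadier estimate is cross-validation. The sketch below assumes 5 folds, which is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Fit and score the model on 5 different train/test partitions of the full data
cv_scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(cv_scores)
print(cv_scores.mean())
```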

