# Table of Contents
1. [Introduction to Naive Bayes](#1)
   * [1.1. Bayes' Theorem](#1.1)
   * [1.2. Naive Bayes Assumption](#1.2)
2. [Types of Naive Bayes](#2)
3. [Steps to Implement Naive Bayes](#3)
   * [3.1. Data Preprocessing](#3.1)
   * [3.2. Training](#3.2)
   * [3.3. Prediction](#3.3)
4. [Performance Evaluation](#4)
5. [Advantages of Naive Bayes](#5)
6. [Limitations of Naive Bayes](#6)
7. [Example: Text Classification](#7)

<a id = "1"></a>
# 1. Introduction to Naive Bayes
Naive Bayes is a simple yet powerful classification algorithm based on Bayes' Theorem. It is used in areas such as text classification, spam filtering, and medical diagnosis. Despite its simplicity, it often performs remarkably well, especially on large, high-dimensional datasets.

<a id = "1.1"></a>
### 1.1. Bayes' Theorem
Bayes' Theorem is the foundation of Naive Bayes. It gives the probability of a hypothesis A given observed evidence B:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where:

- P(A|B): Probability of event A given event B (the posterior).
- P(B|A): Probability of event B given event A (the likelihood).
- P(A) and P(B): Probabilities of events A and B respectively (the prior and the evidence).
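
As a worked example of the theorem, the sketch below computes the probability that an email is spam given that it contains a particular word. The numbers are made up for illustration, and the function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b


# Illustrative (made-up) numbers, not taken from any dataset.
p_spam = 0.2              # P(A): prior probability that an email is spam
p_word_given_spam = 0.5   # P(B|A): the word appears in a spam email
p_word_given_ham = 0.05   # P(B|not A): the word appears in a non-spam email

# The law of total probability gives the evidence P(B).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = bayes_posterior(p_spam, p_word_given_spam, p_word)
print(round(p_spam_given_word, 3))  # 0.714
```

Even though only 20% of emails are spam in this made-up setting, seeing the word raises the spam probability to about 71%, because the word is ten times more likely in spam than in non-spam.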


<a id = "1.2"></a>
### 1.2. Naive Bayes Assumption
Naive Bayes makes a strong assumption of feature independence: given the class, the value of any one feature is assumed to be unrelated to the value of any other feature. This rarely holds exactly in real data, which is why the method is called "naive".
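
Written out, the assumption lets the joint likelihood factorize into per-feature terms. The notation here is introduced for illustration: $y$ is a class and $x_1, \dots, x_n$ are the feature values of one instance.

```latex
\[
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
```

The equality is Bayes' Theorem. The proportionality step applies the independence assumption and drops the denominator, which is the same for every class and so does not affect which class scores highest.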


<a id = "2"></a>
# 2. Types of Naive Bayes
1. Gaussian Naive Bayes (Assumes that continuous features follow a Gaussian distribution within each class)
2. Multinomial Naive Bayes (Used for discrete counts. It's suitable for text classification where the features are word counts)
3. Bernoulli Naive Bayes (Assumes that features are binary-valued. It's useful when dealing with binary or Boolean features)
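
The three variants differ mainly in how they model the likelihood of a feature given a class. The sketch below writes out each likelihood in plain Python to make the difference concrete. The function names and example numbers are hypothetical, and the code is not how any particular library implements these models.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Gaussian NB: density of a continuous value x under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_likelihood(counts, word_probs):
    """Multinomial NB: product of P(word|class) ** count.

    The multinomial coefficient is omitted because it is identical
    for every class and does not change which class scores highest.
    """
    result = 1.0
    for word, count in counts.items():
        result *= word_probs[word] ** count
    return result


def bernoulli_likelihood(present, p):
    """Bernoulli NB: p if the binary feature is present, 1 - p if absent."""
    return p if present else 1.0 - p


# Made-up parameters, for illustration only.
g = gaussian_likelihood(0.0, mean=0.0, var=1.0)
m = multinomial_likelihood({"good": 2, "movie": 1}, {"good": 0.5, "movie": 0.25})
b = bernoulli_likelihood(False, p=0.3)
print(g, m, b)
```

Note that the Bernoulli model also uses the absence of a feature as evidence (the `1 - p` branch), while the multinomial model only accounts for words that actually occur.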


<a id = "3"></a>
# 3. Steps to Implement Naive Bayes

<a id = "3.1"></a>
### 3.1. Data Preprocessing
- Clean the data.
- Split the data into training and testing sets.
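
A minimal sketch of these two steps for text data, using only the standard library. The cleaning rule (lowercase, keep alphabetic runs) is one simple choice among many, and the helper names `clean_text` and `split_train_test` are hypothetical.

```python
import random
import re


def clean_text(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of samples and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


raw = [
    ("This is a positive review", "positive"),
    ("Very good movie", "positive"),
    ("Terrible acting", "negative"),
    ("Not worth watching", "negative"),
    ("Enjoyed the plot", "positive"),
]
cleaned = [(clean_text(text), label) for text, label in raw]
train, test = split_train_test(cleaned, test_fraction=0.25)
print(len(train), "training samples,", len(test), "test sample(s)")
```

With so few documents the split is only illustrative. On real data, a stratified split that preserves class proportions is usually preferable.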

<a id = "3.2"></a>
### 3.2. Training
- Calculate prior probabilities for each class.
- Calculate likelihood probabilities for each feature given each class.
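
The two training steps above can be sketched from scratch for the multinomial case. This assumes documents are already tokenized and uses additive smoothing with a hypothetical `alpha` parameter. The function name `train_multinomial_nb` and the toy data are illustrative.

```python
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate class priors and smoothed word likelihoods.

    docs is a list of token lists; labels is a parallel list of class names.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Prior: fraction of training documents in each class.
    priors = {c: n / len(labels) for c, n in class_counts.items()}

    # Likelihood: smoothed relative frequency of each word within a class.
    likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return priors, likelihoods


docs = [["good", "movie"], ["good", "plot"], ["terrible", "acting"]]
labels = ["positive", "positive", "negative"]
priors, likelihoods = train_multinomial_nb(docs, labels)
print(priors)
print(likelihoods["positive"]["good"])
```

Within each class the smoothed likelihoods sum to 1 over the vocabulary, which is a quick sanity check on the estimates.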

<a id = "3.3"></a>
### 3.3. Prediction
- For a given instance, calculate the posterior probability for each class.
- Assign the instance to the class with the highest posterior probability.
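
A sketch of the prediction step, assuming priors and per-word likelihoods have already been estimated. The numbers below are made up, and `predict` is a hypothetical helper. Scores are summed in log space because multiplying many small probabilities can underflow.

```python
import math


def predict(tokens, priors, likelihoods):
    """Return (best_class, log_scores) for a tokenized instance.

    Tokens missing from the model's vocabulary are skipped.
    The scores are unnormalized log posteriors.
    """
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for tok in tokens:
            if tok in likelihoods[c]:
                score += math.log(likelihoods[c][tok])
        scores[c] = score
    return max(scores, key=scores.get), scores


# Made-up model parameters, for illustration only.
priors = {"positive": 0.6, "negative": 0.4}
likelihoods = {
    "positive": {"good": 0.30, "acting": 0.05},
    "negative": {"good": 0.05, "acting": 0.20},
}

label, scores = predict(["good", "good", "acting"], priors, likelihoods)
print(label)  # positive
```

The evidence term P(B) is never computed here: it is the same for every class, so comparing the unnormalized scores is enough to pick the most probable class.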

<a id = "4"></a>
# 4. Performance Evaluation
1. Accuracy (Overall correctness of the classifier)
2. Precision (Proportion of true positive predictions out of all positive predictions)
3. Recall (Proportion of true positive predictions out of all actual positives)
4. F1 Score (Harmonic mean of precision and recall)
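
The four metrics can be computed directly from true and predicted labels. The sketch below does so for a binary problem. The helper name `binary_metrics` and the example labels are hypothetical, and returning 0.0 when a denominator is zero is one common convention rather than the only one.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```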

<a id = "5"></a>
# 5. Advantages of Naive Bayes
- Simple and easy to implement.
- Works well with high-dimensional datasets.
- Requires a small amount of training data to estimate parameters.
- Performs well in the presence of irrelevant features.


<a id = "6"></a>
# 6. Limitations of Naive Bayes
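- Strong assumption of feature independence, which rarely holds exactly in real data.
- Zero-frequency problem: if a category (or word) appears in the test data but was never observed with a class in the training data, its estimated likelihood is zero and the whole posterior for that class becomes zero, unless smoothing (for example, Laplace smoothing) is applied.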
- Requires careful preprocessing of textual data.
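
The effect of an unseen category can be shown with a small calculation. The sketch below compares an unsmoothed estimate with additive (Laplace) smoothing. The counts are made up, and `word_likelihood` is a hypothetical helper.

```python
def word_likelihood(count, class_total, vocab_size, alpha):
    """P(word|class) with additive smoothing; alpha=0 means no smoothing."""
    return (count + alpha) / (class_total + alpha * vocab_size)


# Made-up counts: a class with 10 tokens in total and a 15-word vocabulary.
seen = word_likelihood(3, 10, 15, alpha=0.0)        # word seen 3 times
unsmoothed = word_likelihood(0, 10, 15, alpha=0.0)  # word never seen
smoothed = word_likelihood(0, 10, 15, alpha=1.0)    # same word, alpha = 1

# A single zero wipes out the product, whatever the other features say.
print(seen * unsmoothed)  # 0.0
print(smoothed)           # 0.04
```

With `alpha=1.0` the unseen word gets a small but non-zero probability, so the other features can still influence the prediction.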


<a id = "7"></a>
# 7. Example: Text Classification
Let's consider an example of text classification using Naive Bayes. The code defines five sample documents, each labeled **positive** or **negative**, trains a Multinomial Naive Bayes classifier on their word counts, and predicts labels for two new instances. Finally, it prints the accuracy and a classification report. Note that these metrics are computed on the same five documents used for training, so the perfect scores do not show how well the model generalizes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Sample data: (document text, label) pairs.
documents = [
    ("This is a positive review", "positive"),
    ("Very good movie", "positive"),
    ("Terrible acting", "negative"),
    ("Not worth watching", "negative"),
    ("Enjoyed the plot", "positive")
]

X = [doc[0] for doc in documents]
y = [doc[1] for doc in documents]

# Convert each document into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(X)

# Train the classifier on the word counts.
clf = MultinomialNB()
clf.fit(X_counts, y)

# Predict labels for unseen text, reusing the fitted vocabulary.
new_instances = ["Great acting", "Awful experience"]
new_instances_counts = vectorizer.transform(new_instances)
predictions = clf.predict(new_instances_counts)

for instance, prediction in zip(new_instances, predictions):
    print(f"Instance: {instance} --> Prediction: {prediction}")

# Evaluate on the training documents themselves (there is no held-out
# test set here), so these scores are optimistic.
y_pred = clf.predict(X_counts)
print("\nAccuracy:", accuracy_score(y, y_pred))
print("Classification Report:")
print(classification_report(y, y_pred))
```

```
Instance: Great acting --> Prediction: negative
Instance: Awful experience --> Prediction: positive

Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

    negative       1.00      1.00      1.00         2
    positive       1.00      1.00      1.00         3

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
```
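
Both predictions may look surprising, since "Great acting" reads as positive and "Awful experience" as negative. A hand check, sketched below, suggests why. It assumes `CountVectorizer` defaults (lowercasing, and only tokens of two or more characters, so "a" is dropped) and `MultinomialNB` with its default `alpha=1.0`. Under those assumptions, "acting" is the only word of the first instance in the vocabulary, and it occurs only in a negative document. "Awful experience" shares no words with the training set, so its prediction falls back to the class priors, which favor **positive**.

```python
from collections import Counter

# Training tokens as the CountVectorizer defaults are assumed to
# produce them (lowercased; the one-letter token "a" is dropped).
train = [
    (["this", "is", "positive", "review"], "positive"),
    (["very", "good", "movie"], "positive"),
    (["terrible", "acting"], "negative"),
    (["not", "worth", "watching"], "negative"),
    (["enjoyed", "the", "plot"], "positive"),
]
alpha = 1.0  # assumed MultinomialNB default smoothing

vocab = {tok for tokens, _ in train for tok in tokens}
labels = [label for _, label in train]

# Unnormalized posterior for "Great acting": prior * P("acting" | class).
scores = {}
for c in sorted(set(labels)):
    counts = Counter(tok for tokens, label in train if label == c
                     for tok in tokens)
    prior = labels.count(c) / len(labels)
    p_acting = (counts["acting"] + alpha) / (sum(counts.values())
                                             + alpha * len(vocab))
    scores[c] = prior * p_acting

print(scores)  # negative is about 0.040, positive is about 0.024
```

The negative score is higher, which matches the prediction above. This illustrates how little a five-document training set can tell the model: "Great" and "Awful" carry no weight because they were never seen.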

