
### Naive Bayes

**What is Naive Bayes?**

Naive Bayes is a family of probabilistic algorithms based on Bayes' Theorem, used for classification tasks. It is called "naive" because it assumes that the features used for classification are independent of each other, given the class label. Despite this strong assumption, Naive Bayes can perform surprisingly well in practice, especially for text classification problems like spam detection and sentiment analysis.
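
The independence assumption can be written out explicitly. As a sketch, let \( y \) denote the class label and \( x_1, \dots, x_n \) the features (symbols introduced here for illustration only). The "naive" assumption is that the joint likelihood of the features factorizes into per-feature terms:

\[ P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y) \]

This means each \( P(x_i \mid y) \) can be estimated separately, instead of estimating one joint distribution over all feature combinations.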

---

### Bayes Theorem

Bayes' Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. It can be mathematically expressed as:

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

Where:
- \( P(A|B) \): Probability of event A occurring given that B is true (posterior probability).
- \( P(B|A) \): Probability of event B occurring given that A is true (likelihood).
- \( P(A) \): Probability of event A occurring (prior probability).
- \( P(B) \): Probability of event B occurring (evidence).
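
A small numeric example may make the formula concrete. The sketch below uses made-up numbers (20% of emails are spam; the word "offer" appears in 60% of spam and 5% of non-spam) and a hypothetical helper named `posterior`. The evidence \( P(B) \) is expanded with the law of total probability.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) using Bayes' Theorem.

    The evidence P(B) is computed as P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers, chosen only for illustration:
# A = "email is spam", B = "email contains the word 'offer'"
p_spam_given_offer = posterior(prior_a=0.2, p_b_given_a=0.6, p_b_given_not_a=0.05)
print(round(p_spam_given_offer, 3))
```

With these numbers, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of about 0.75.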

---

### Types of Naive Bayes

There are several variations of the Naive Bayes algorithm, depending on the type of data being processed:

1. **Gaussian Naive Bayes**:
   - Assumes that the features follow a Gaussian (normal) distribution.
   - Suitable for continuous data.

2. **Multinomial Naive Bayes**:
   - Used for discrete count data.
   - Commonly used in text classification (e.g., word counts).

3. **Bernoulli Naive Bayes**:
   - Similar to the multinomial variant but assumes binary features (presence/absence).
   - Often used in text classification where features are binary (e.g., whether a word exists in a document).
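
The three variants differ mainly in how they model the likelihood of a feature given a class. The following is a minimal sketch of those per-class likelihoods in plain Python, not the `scikit-learn` implementation; the function names and the example parameter values are assumptions made for illustration.

```python
import math
from typing import Sequence


def gaussian_likelihood(x: float, mean: float, var: float) -> float:
    """Density of a continuous feature value x under a normal distribution N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(counts: Sequence[int], word_probs: Sequence[float]) -> float:
    """Log-likelihood of word counts, omitting the multinomial coefficient.

    The coefficient depends only on the counts, so it is the same for every
    class and does not change which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, word_probs))


def bernoulli_log_likelihood(present: Sequence[int], presence_probs: Sequence[float]) -> float:
    """Log-likelihood of binary features; an absent feature contributes log(1 - p)."""
    return sum(
        math.log(p) if b else math.log(1 - p)
        for b, p in zip(present, presence_probs)
    )


# Hypothetical per-class parameters, for illustration only
print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_log_likelihood([3, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_log_likelihood([1, 0, 1], [0.9, 0.4, 0.2]))
```

Note that the Bernoulli version penalizes a class when a word it expects is absent, whereas the multinomial version ignores words with a count of zero.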

---

### Naive Bayes Classifier

The Naive Bayes classifier uses Bayes' Theorem to predict the class label of a given instance. Here’s a basic overview of how it works:

1. **Training**: 
   - Calculate prior probabilities for each class based on the training data.
   - For each feature, calculate the likelihood of that feature given each class.

2. **Prediction**:
   - For a new instance, compute the posterior probability for each class using the calculated priors and likelihoods.
   - Classify the instance to the class with the highest posterior probability.
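
The training and prediction steps above can be sketched from scratch in a few lines. This is a simplified illustration on a made-up toy corpus, not how `scikit-learn` implements it. The function names `train` and `predict`, the example documents, and the use of add-one (Laplace) smoothing to avoid zero probabilities are all assumptions of this sketch. Log-probabilities are summed rather than multiplying probabilities, which avoids numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Estimate class priors and per-class word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict(tokens, priors, word_counts, vocab):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for t in tokens:
            if t not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing: an assumption of this sketch
            score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Toy corpus, invented for illustration
docs = [
    ["win", "money", "now"],
    ["cheap", "money", "offer"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]
labels = ["spam", "spam", "ham", "ham"]

model = train(docs, labels)
print(predict(["money", "offer"], *model))
print(predict(["meeting", "noon"], *model))
```

The evidence term \( P(B) \) is never computed here: it is the same for every class, so it does not affect which class has the highest posterior.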

### Implementation Example

A simple implementation of a Naive Bayes classifier using Python’s `scikit-learn` library, focusing on text classification with the Multinomial Naive Bayes variant, is given in the first code cell after the list of applications.

### Conclusion

Naive Bayes is a powerful and straightforward algorithm for classification tasks, particularly suited for high-dimensional data like text. Understanding the underlying principles of Bayes' Theorem and the types of Naive Bayes classifiers allows for effective application in various domains.

Naive Bayes is a supervised learning algorithm used for classification.

### Applications of Naive Bayes

- Real-time prediction
- Multi-class prediction
- Text classification, spam filtering, and sentiment analysis
- Recommendation systems

```python
# Import necessary libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
data = fetch_20newsgroups(subset='all')
X = data.data  # Text data
y = data.target  # Labels

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert text to feature vectors (fit the vocabulary on the training set only)
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create a Multinomial Naive Bayes classifier
nb_classifier = MultinomialNB()

# Train the model
nb_classifier.fit(X_train_counts, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test_counts)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print("Classification Report:\n", classification_report(y_test, y_pred))
```


```
Accuracy: 0.85
Classification Report:
               precision    recall  f1-score   support

           0       0.85      0.89      0.87       236
           1       0.61      0.90      0.73       287
           2       0.94      0.23      0.37       290
           3       0.58      0.85      0.69       285
           4       0.94      0.81      0.87       312
           5       0.88      0.82      0.85       308
           6       0.92      0.69      0.79       276
           7       0.90      0.91      0.91       304
           8       0.97      0.94      0.95       279
           9       0.97      0.94      0.96       308
          10       0.96      0.96      0.96       309
          11       0.83      0.97      0.89       290
          12       0.87      0.82      0.84       304
          13       0.95      0.91      0.93       300
          14       0.90      0.98      0.94       297
          15       0.76      0.99      0.86       292
          16       0.88      0.92      0.9
```

Here’s how to implement the three types of Naive Bayes classifiers (Gaussian, Multinomial, and Bernoulli) using Python’s `scikit-learn` library. The Gaussian example uses the Iris dataset; the Multinomial and Bernoulli examples use the 20 Newsgroups dataset.


### 1. Gaussian Naive Bayes

Gaussian Naive Bayes is used for continuous data and assumes that the features follow a Gaussian distribution.

```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Gaussian Naive Bayes classifier
gnb_classifier = GaussianNB()

# Train the model
gnb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = gnb_classifier.predict(X_test)

# Evaluate the model
print("Gaussian Naive Bayes:")
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
print("Classification Report:\n", classification_report(y_test, y_pred))
```

### 2. Multinomial Naive Bayes

Multinomial Naive Bayes is used for discrete count data, such as word counts in text classification.

```python
# Import necessary libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
data = fetch_20newsgroups(subset='all')
X = data.data  # Text data
y = data.target  # Labels

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert text to feature vectors (fit the vocabulary on the training set only)
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create a Multinomial Naive Bayes classifier
mnb_classifier = MultinomialNB()

# Train the model
mnb_classifier.fit(X_train_counts, y_train)

# Make predictions
y_pred = mnb_classifier.predict(X_test_counts)

# Evaluate the model
print("\nMultinomial Naive Bayes:")
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
print("Classification Report:\n", classification_report(y_test, y_pred))
```


### 3. Bernoulli Naive Bayes

Bernoulli Naive Bayes is used for binary/boolean features, such as whether a word appears in a document.

```python
# Import necessary libraries
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score, classification_report

# Uses the same dataset as Multinomial Naive Bayes, with binary features.
# This reuses X_train_counts, X_test_counts, y_train, and y_test from the
# Multinomial snippet, so that snippet must be run first.
# Convert counts to binary features (0 or 1)
X_train_binary = (X_train_counts > 0).astype(int)
X_test_binary = (X_test_counts > 0).astype(int)

# Create a Bernoulli Naive Bayes classifier
bnb_classifier = BernoulliNB()

# Train the model
bnb_classifier.fit(X_train_binary, y_train)

# Make predictions
y_pred = bnb_classifier.predict(X_test_binary)

# Evaluate the model
print("\nBernoulli Naive Bayes:")
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
print("Classification Report:\n", classification_report(y_test, y_pred))
```


### Summary

- **Gaussian Naive Bayes** is ideal for continuous data.
- **Multinomial Naive Bayes** is suited for discrete count data (e.g., text classification).
- **Bernoulli Naive Bayes** is suitable for binary/boolean features.

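You can run these code snippets to see how each classifier performs on its dataset. Note that the Bernoulli snippet reuses the vectorized features and labels from the Multinomial snippet, so run the Multinomial snippet first. Adjust the datasets and parameters as needed to explore different scenarios!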