1. **Gaussian Naive Bayes**: Gaussian Naive Bayes assumes that, within each class, the features follow a Gaussian (normal) distribution. This variant is typically used for continuous features.
 - **Assumption**: Given a class, each feature value is drawn from a Gaussian distribution whose mean and variance are estimated from the training data for that class.
 - **When to Use**: When your features are continuous and can be reasonably approximated by a normal distribution. Examples include height, weight, temperature, or pixel intensity values in image classification.
2. **Multinomial Naive Bayes**: Multinomial Naive Bayes is designed for discrete features that represent counts or frequencies. This is the most common variant used for text classification.
 - **Assumption**: Given a class, the vector of feature counts (for example, the word counts of a document) follows a multinomial distribution. This distribution is suitable for modeling counts of events (like word occurrences) in a fixed number of trials (like the total number of words in a document).
 - **When to Use**: This is the go-to variant for text classification tasks where features represent word counts or term frequencies. Fractional weights such as TF-IDF scores often work in practice as well, although the multinomial model is strictly defined for integer counts.
3. **Bernoulli Naive Bayes**: Bernoulli Naive Bayes is also for discrete features, but it models the presence or absence of a feature, rather than its frequency. It assumes that each feature is a binary variable.
 - **Assumption**: Each feature (word) is either present or absent in a document, and the model estimates the per-class probability of presence. Unlike the multinomial variant, the absence of a word also contributes to the likelihood.
 - **When to Use**: When your features are binary, indicating presence or absence. This can be useful for text if you are only considering whether a word appears in a document, not how many times.
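
To make the difference between the three variants concrete, here is a minimal pure-Python sketch of the per-class log-likelihood each one would compute. The function names and the toy parameter values are illustrative assumptions, not part of any library.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log-density of a continuous value x under a per-class Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of feature counts. The multinomial coefficient is
    dropped because it is identical for every class."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def bernoulli_log_likelihood(present, probs):
    """Log-likelihood of binary presence/absence indicators."""
    return sum(math.log(p) if b else math.log(1 - p)
               for b, p in zip(present, probs))

# Toy (hypothetical) per-class parameters
height_ll = gaussian_log_likelihood(x=170.0, mean=172.0, var=49.0)
counts_ll = multinomial_log_likelihood(counts=[2, 0, 1], probs=[0.5, 0.3, 0.2])
binary_ll = bernoulli_log_likelihood(present=[1, 0, 1], probs=[0.5, 0.3, 0.2])
print(height_ll, counts_ll, binary_ll)
```

Note that the absent second feature contributes nothing to the multinomial score, while the Bernoulli score includes a `log(1 - p)` term for it.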

## Why Multinomial Naive Bayes Reigns Supreme for Text Classification
Multinomial Naive Bayes (MNB) is one of the most effective and widely used algorithms for text classification tasks in Natural Language Processing (NLP).

### 1. Specifically Designed for Text Data
Text data is naturally represented using word counts or frequencies (Bag-of-Words, Term Frequency, TF-IDF).
Multinomial Naive Bayes directly models these word frequency distributions, making it ideal for text-based problems.
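
As a rough illustration of the Bag-of-Words representation, the sketch below turns two toy documents into count vectors using only the standard library. The documents and the helper name `to_count_vector` are made up for this example.

```python
from collections import Counter

# Toy documents (hypothetical)
docs = ["machine learning is fun", "data science and machine learning"]

# The vocabulary is the sorted set of all words seen in the corpus
vocab = sorted({word for doc in docs for word in doc.split()})

def to_count_vector(doc):
    """Map a document to a list of word counts, ordered like vocab."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

vectors = [to_count_vector(doc) for doc in docs]
print(vocab)
for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one vocabulary word, which is the input that Multinomial Naive Bayes models.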

### 2. Handles High-Dimensional and Sparse Data Well
Text datasets typically contain:
- A very large vocabulary (thousands of features)
- Sparse feature vectors (mostly zeros)

MNB performs efficiently in such high-dimensional sparse spaces.
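
One reason sparse text vectors are cheap to handle is that only the non-zero entries need to be stored. The following sketch shows the idea with a plain dictionary; the vocabulary, the document, and the helper `to_sparse` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical vocabulary; real ones often contain thousands of words
vocabulary = ["ai", "data", "forecast", "learning", "machine", "market", "rain", "stock"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def to_sparse(doc):
    """Return {feature_index: count} for the in-vocabulary words only."""
    counts = Counter(w for w in doc.split() if w in word_to_index)
    return {word_to_index[w]: c for w, c in counts.items()}

sparse_vec = to_sparse("machine learning beats manual learning")
density = len(sparse_vec) / len(vocabulary)
print(sparse_vec, f"non-zero fraction: {density:.2f}")
```

Words outside the vocabulary are simply skipped, and the stored size grows with the document length rather than the vocabulary size.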

### 3. Fast Training and Prediction
- Training involves simple counting of word occurrences
- Prediction only requires summing the log-probabilities of the words in a document for each class

This makes MNB extremely fast and suitable for large-scale datasets.
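
The two steps above can be sketched from scratch in a few lines. This is a simplified illustration rather than how any particular library implements it; the names `train_mnb` and `predict_mnb` and the toy spam/ham data are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Training is counting: documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_lik": {w: math.log((word_counts[label][w] + alpha) / denom)
                        for w in vocab},
        }
    return model

def predict_mnb(model, doc):
    """Prediction sums log-probabilities and picks the best class."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in doc.split():
            if w in params["log_lik"]:  # skip out-of-vocabulary words
                score += params["log_lik"][w]
        scores[label] = score
    return max(scores, key=scores.get)

toy_docs = ["cheap pills buy now", "buy cheap watches",
            "meeting schedule today", "project meeting notes"]
toy_labels = ["spam", "spam", "ham", "ham"]
model = train_mnb(toy_docs, toy_labels)
print(predict_mnb(model, "buy cheap pills"))  # spam
print(predict_mnb(model, "meeting notes"))    # ham
```

Training makes a single pass over the data, and prediction is a handful of additions per class, which is where the speed comes from.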

### 4. Effective Despite the Naive Independence Assumption
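MNB assumes that, given the class, words occur independently of each other, which is not true in real language.
However, classification only requires the correct class to receive the highest score, not accurate probability estimates, so this assumption often works well in practice and still produces strong results.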
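
Under the independence assumption, the class score factorizes over words. As a sketch, with $n_w$ denoting the count of word $w$ in the document and $V$ the vocabulary (both symbols are notation introduced here for illustration), the decision rule can be written as:

```latex
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{w \in V} P(w \mid c)^{\,n_w}
\;=\; \arg\max_{c} \Big[ \log P(c) + \sum_{w \in V} n_w \log P(w \mid c) \Big]
```

Working with the sum of logarithms avoids numerical underflow when many small probabilities are multiplied.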

### 5. Performs Well with Small Datasets
Unlike deep learning models, MNB does not require large amounts of training data.
It generalizes well even with limited labeled text data.

### 6. Robust to Rare and Unseen Words (Using Smoothing)
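Laplace (additive) smoothing adds a small constant (typically 1) to every word count before the per-class probabilities are estimated:
- Prevents zero probabilities for words that never occur with a given class in the training data
- Allows the model to handle such unseen words during testing without the whole class score collapsing to zero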

This is essential for real-world text applications.
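
A small numeric sketch of additive smoothing follows. The helper name `smoothed_prob` and the counts are hypothetical; `alpha=1.0` corresponds to Laplace smoothing and `alpha=0.0` to no smoothing.

```python
def smoothed_prob(word_count, total_count, vocab_size, alpha=1.0):
    """P(word | class) with additive smoothing."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)

# A word never seen with this class; the class has 100 tokens, vocabulary of 50 words
unsmoothed = smoothed_prob(0, 100, 50, alpha=0.0)
laplace = smoothed_prob(0, 100, 50, alpha=1.0)
print(unsmoothed, laplace)

# Smoothed estimates over the whole vocabulary still sum to 1
toy_counts = [3, 0, 1]
probs = [smoothed_prob(c, sum(toy_counts), len(toy_counts)) for c in toy_counts]
print(probs, sum(probs))
```

Without smoothing, the unseen word gets probability 0 and would zero out the product for that class; with smoothing it gets a small positive probability instead.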

In [31]:
import numpy as np

In [32]:
# Define prior probabilities
p_purchase = 0.1
p_no_purchase = 0.9

# Likelihoods
p_click_given_purchase = 0.8
p_click_given_no_purchase = 0.2

# Calculate the probability of the evidence (clicking)
# P(Click) = P(Click | Purchase) * P(Purchase) + P(Click | No Purchase) * P(No Purchase)
p_click = (p_click_given_purchase * p_purchase) + (p_click_given_no_purchase * p_no_purchase)
print(f"Probability of clicking (P(Click)): {p_click:.4f}")

Probability of clicking (P(Click)): 0.2600


In [33]:
# Calculate the posterior probability P(Purchase | Click) using Bayes' Theorem
# P(Purchase | Click) = (P(Click | Purchase) * P(Purchase)) / P(Click)
p_purchase_given_click = (p_click_given_purchase * p_purchase) / p_click

print(f"Probability of Purchase given Click (P(Purchase | Click)): {p_purchase_given_click:.4f}")

Probability of Purchase given Click (P(Purchase | Click)): 0.3077


In [34]:
# Posterior of the complementary event, P(No Purchase | Click), by the same formula
p_no_purchase_given_click = (p_click_given_no_purchase * p_no_purchase) / p_click
print(f"Probability of No Purchase given Click (P(No Purchase | Click)): {p_no_purchase_given_click:.4f}")

# Verify that the posterior probabilities sum to 1
print(f"Sum of posterior probabilities: {p_purchase_given_click + p_no_purchase_given_click:.4f}")

Probability of No Purchase given Click (P(No Purchase | Click)): 0.6923
Sum of posterior probabilities: 1.0000
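
The same calculation generalizes to any number of hypotheses. The sketch below wraps it in a small helper; the function name `posteriors` and the dictionary keys are assumptions chosen for this example, while the numbers are the ones used above.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' theorem to every hypothesis at once.

    priors:      {hypothesis: P(hypothesis)}
    likelihoods: {hypothesis: P(evidence | hypothesis)}
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(evidence), by the law of total probability
    return {h: j / evidence for h, j in joint.items()}

result = posteriors(
    priors={"purchase": 0.1, "no_purchase": 0.9},
    likelihoods={"purchase": 0.8, "no_purchase": 0.2},
)
print({h: round(p, 4) for h, p in result.items()})
# {'purchase': 0.3077, 'no_purchase': 0.6923}
```

Dividing by the summed joint probabilities is what guarantees that the posteriors add up to 1.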


## Implementing and Training Multinomial Naive Bayes for Text Classification

In [35]:
import pandas as pd
import re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

In [36]:
# Sample dataset
data = {
    'text': [
        'This is the first document. It is about machine learning and data science.',
        'This document is the second document. It discusses artificial intelligence.',
        'And this is the third one. Machine learning is fun!',
        'Is this the first document again? Data science is important.',
        'Artificial intelligence and machine learning are related fields.',
        'The stock market experienced a significant downturn today.',
        'New advancements in AI are revolutionizing healthcare.',
        'Learning Python for data analysis is highly recommended.',
        'The weather forecast predicts rain for tomorrow.',
        'Natural Language Processing is a subfield of AI.'
    ],
    'label': ['ML', 'AI', 'ML', 'ML', 'AI', 'Finance', 'AI', 'ML', 'Weather', 'AI']
}
df = pd.DataFrame(data)
print(df.head())

                                                text label
0  This is the first document. It is about machin...    ML
1  This document is the second document. It discu...    AI
2  And this is the third one. Machine learning is...    ML
3  Is this the first document again? Data science...    ML
4  Artificial intelligence and machine learning a...    AI


In [37]:
# Get English stop words (requires a one-time nltk.download('stopwords'))
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    text = text.lower()                  # Lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation, keeping word characters and whitespace
    tokens = text.split()                # Tokenize on whitespace
    filtered_tokens = [word for word in tokens if word not in stop_words]  # Remove stop words
    return ' '.join(filtered_tokens)

df['processed_text'] = df['text'].apply(preprocess_text)

print("DataFrame after preprocessing:")
print(df.head())


In [38]:
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['processed_text'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model on the preprocessed, vectorized training data
mnb_model = MultinomialNB()
mnb_model.fit(X_train, y_train)
y_pred = mnb_model.predict(X_test)
print(y_pred)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
print("Classification Report:", classification_report(y_test, y_pred))

['ML' 'ML' 'ML']
Model Accuracy: 0.00
Classification Report:               precision    recall  f1-score   support

          AI       0.00      0.00      0.00       1.0
     Finance       0.00      0.00      0.00       1.0
          ML       0.00      0.00      0.00       0.0
     Weather       0.00      0.00      0.00       1.0

    accuracy                           0.00       3.0
   macro avg       0.00      0.00      0.00       3.0
weighted avg       0.00      0.00      0.00       3.0
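
The support column in the report shows that the three test documents belong to AI, Finance, and Weather. Since the dataset contains only one Finance and one Weather example, those classes cannot also appear in the training set, so the model has no way to predict them. A minimal sketch of how one might check for this, assuming test indices that are consistent with the report (the actual indices depend on the split):

```python
from collections import Counter

labels = ['ML', 'AI', 'ML', 'ML', 'AI', 'Finance', 'AI', 'ML', 'Weather', 'AI']

# Assumed test indices: one AI, one Finance, and one Weather document,
# matching the support column of the report above.
test_idx = {1, 5, 8}
y_train_labels = [label for i, label in enumerate(labels) if i not in test_idx]
y_test_labels = [labels[i] for i in sorted(test_idx)]

unseen = sorted(set(y_test_labels) - set(y_train_labels))
print("Training class counts:", dict(Counter(y_train_labels)))
print("Test classes never seen in training:", unseen)
```

With single-example classes, a stratified split is not an option either, so a meaningful evaluation would need several examples per class.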



