## Choosing the Right Naïve Bayes Variant: Multinomial, Bernoulli, Gaussian

We will focus on understanding **when to use which Naïve Bayes model**.

**What we will do in this exercise**

We will cover the following topics in this notebook:
- Multinomial NB – word counts / frequency features
- Bernoulli NB – binary features
- Gaussian NB – continuous features
- Choosing the correct model for different datasets (text, binary indicators, continuous attributes)

**Learning Objectives:**
- We will learn to distinguish between the three main Naïve Bayes variants and identify appropriate use-cases for each based on data type.
- We will learn to select the correct Naïve Bayes variant for a given problem, understanding the underlying distribution assumption for each variant.
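
As a compact reference for the "underlying distribution assumption" mentioned above, the three variants can be written side by side. The notation is an assumption made for this summary and not taken from the notebook: $c$ is a class, $x_j$ is the $j$-th of $d$ features, and $\mu_{cj}$, $\sigma_{cj}^2$, $\theta_{cj}$, $p_{cj}$ are per-class, per-feature parameters estimated from the training data. All three share the same decision rule and differ only in the form of the class-conditional likelihood.

```latex
\hat{y} = \arg\max_{c} \; P(c) \prod_{j=1}^{d} P(x_j \mid c)

\text{Gaussian:}\quad P(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{cj}^{2}}} \exp\!\left(-\frac{(x_j - \mu_{cj})^{2}}{2\sigma_{cj}^{2}}\right), \quad x_j \in \mathbb{R}

\text{Multinomial:}\quad P(\mathbf{x} \mid c) \propto \prod_{j=1}^{d} \theta_{cj}^{\,x_j}, \quad x_j \in \{0, 1, 2, \dots\}

\text{Bernoulli:}\quad P(x_j \mid c) = p_{cj}^{\,x_j} \,(1 - p_{cj})^{1 - x_j}, \quad x_j \in \{0, 1\}
```

Note that the Bernoulli likelihood includes a factor $(1 - p_{cj})$ for every absent feature, so absence counts as evidence, whereas in the multinomial form a zero count contributes a factor of 1.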

**Let's get started now**

### Setup and Imports

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

### Load Email Spam Detection dataset

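We will reuse the Spambase dataset (loaded from `Spam.csv`) for all three models. Gaussian NB is fitted on the raw continuous features, while Multinomial and Bernoulli NB are fitted on count-like and binarised versions of the same features.

**Note:** The Multinomial and Bernoulli examples below do not work on raw text. They use transformed versions of the Spambase word- and character-frequency features to stand in for count and presence/absence features.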

In [2]:
df = pd.read_csv("/content/Spam.csv")
df.head()

| | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_hash | capital_run_length_average | capital_run_length_longest | capital_run_length_total | spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |


### Gaussian Naïve Bayes — Continuous Features

The Spambase features are engineered as **continuous** numerical values: word and character frequencies plus capital-run-length statistics. This representation makes the data a natural fit for **GaussianNB**, which models each feature within a class as normally distributed.

Next, let us prepare the data for the `GaussianNB` model.

In [3]:
# Split features from the target, then hold out 30% as a stratified test set.
X = df.drop(columns=['spam']).values
y = df['spam'].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

In [4]:
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)
print('GaussianNB accuracy:', accuracy_score(y_test, y_pred_gnb))
print(classification_report(y_test, y_pred_gnb))

GaussianNB accuracy: 0.8240405503258509
              precision    recall  f1-score   support

           0       0.96      0.74      0.84       837
           1       0.70      0.95      0.81       544

    accuracy                           0.82      1381
   macro avg       0.83      0.85      0.82      1381
weighted avg       0.86      0.82      0.83      1381



### Multinomial Naïve Bayes — Word Counts / Frequency Features

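**MultinomialNB** is ideal for discrete count-based features (e.g., term frequencies in text). Since the Spambase features are continuous, we clip them at zero and round them to the nearest integer to obtain non-negative, "count-like" values. This is a crude approximation: any frequency below 0.5 rounds to 0, so some of the original signal is discarded.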

In [5]:
# Clip negatives to 0 and round to integers so the features resemble counts.
X_train_m = np.round(np.maximum(X_train, 0)).astype(int)
X_test_m = np.round(np.maximum(X_test, 0)).astype(int)
mnb = MultinomialNB()
mnb.fit(X_train_m, y_train)
y_pred_m = mnb.predict(X_test_m)
print("MultinomialNB Accuracy:", accuracy_score(y_test, y_pred_m))
print(classification_report(y_test, y_pred_m))

MultinomialNB Accuracy: 0.7661115133960897
              precision    recall  f1-score   support

           0       0.80      0.82      0.81       837
           1       0.71      0.69      0.70       544

    accuracy                           0.77      1381
   macro avg       0.76      0.75      0.75      1381
weighted avg       0.76      0.77      0.77      1381



### Bernoulli Naïve Bayes — Binary Features

The **BernoulliNB** model is best for binary indicator features (e.g., word presence/absence or yes/no attributes). We'll reuse the same Spambase features, but binarise them so that each continuous value becomes a presence/absence indicator (1 if the value is greater than 0, otherwise 0).

In [6]:
# Map every feature to 1 if its value is positive, otherwise 0.
X_train_b = (X_train > 0).astype(int)
X_test_b = (X_test > 0).astype(int)

In [7]:
bnb = BernoulliNB()
bnb.fit(X_train_b, y_train)
y_pred_b = bnb.predict(X_test_b)
print("BernoulliNB Accuracy:", accuracy_score(y_test, y_pred_b))
print(classification_report(y_test, y_pred_b))

BernoulliNB Accuracy: 0.8870383779869659
              precision    recall  f1-score   support

           0       0.89      0.93      0.91       837
           1       0.89      0.82      0.85       544

    accuracy                           0.89      1381
   macro avg       0.89      0.87      0.88      1381
weighted avg       0.89      0.89      0.89      1381
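
As a side note on the manual binarisation step, scikit-learn's `BernoulliNB` has a `binarize` parameter that defaults to `0.0`, so it thresholds its input at zero by itself. The sketch below, on hypothetical synthetic data (the `_toy` names are not part of the notebook), checks that binarising by hand and relying on the default lead to the same model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical non-negative features in which roughly half the entries are zero.
rng = np.random.default_rng(0)
mask = rng.random((200, 5)) > 0.5
X_cont_toy = rng.random((200, 5)) * mask
y_bin_toy = (X_cont_toy[:, 0] > 0).astype(int)  # label depends on presence of feature 0

# Option A: binarise by hand and tell the model the input is already binary.
X_manual = (X_cont_toy > 0).astype(int)
bnb_manual = BernoulliNB(binarize=None).fit(X_manual, y_bin_toy)

# Option B: pass the continuous values and rely on the default binarize=0.0.
bnb_builtin = BernoulliNB().fit(X_cont_toy, y_bin_toy)

same_predictions = np.array_equal(
    bnb_manual.predict(X_manual), bnb_builtin.predict(X_cont_toy)
)
print("Same predictions:", same_predictions)
```

Binarising explicitly, as done in this notebook, keeps the transformation visible, which can be easier to follow when comparing variants side by side.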



### Summary Table Comparing the Three Variants

| NB Variant | Suitable For | Example Dataset | Feature Type | Key Distribution |
|-------------|---------------|------------------|----------------|------------------|
| GaussianNB | Continuous numeric features | Spambase | Continuous | Normal (Gaussian) |
| MultinomialNB | Word counts / frequencies | Text corpus | Discrete counts | Multinomial |
| BernoulliNB | Binary features | Binarized text | 0/1 indicators | Bernoulli |

In [8]:
display(pd.DataFrame({
    'Model': ['GaussianNB', 'MultinomialNB', 'BernoulliNB'],
    'Accuracy': [
        accuracy_score(y_test, y_pred_gnb),
        accuracy_score(y_test, y_pred_m),
        accuracy_score(y_test, y_pred_b)
    ]
}))

| | Model | Accuracy |
|---|---|---|
| 0 | GaussianNB | 0.824041 |
| 1 | MultinomialNB | 0.766112 |
| 2 | BernoulliNB | 0.887038 |


These results show that the “best” variant depends on how well the feature distributions match the model's assumptions, not on one algorithm being inherently “better.” BernoulliNB scored highest here (0.887 accuracy, versus 0.824 for GaussianNB and 0.766 for MultinomialNB), but that does not make it the best choice in general. It means:
- In this dataset, presence/absence signals happen to correlate strongly with the class label.
- In datasets where magnitude matters (e.g., how many times a keyword appears), MultinomialNB would likely outperform.

Keep in mind that MultinomialNB was fitted on rounded frequencies rather than true counts, which works against it in this comparison. For genuinely continuous numeric data (like sensor readings), GaussianNB remains the natural starting point.
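
The point about magnitude can be illustrated with a small synthetic experiment. In the hypothetical data below, both features are always present (every count is at least 1) and only their relative size differs between classes. The helper `make_counts` and all variable names are assumptions made for this sketch.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score

def make_counts(rng, n_per_class):
    """Hypothetical generator: two count features whose magnitudes flip by class."""
    class0 = rng.poisson(lam=[8, 2], size=(n_per_class, 2)) + 1  # feature 0 dominant
    class1 = rng.poisson(lam=[2, 8], size=(n_per_class, 2)) + 1  # feature 1 dominant
    X = np.vstack([class0, class1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

rng = np.random.default_rng(0)
X_tr, y_tr = make_counts(rng, 200)
X_te, y_te = make_counts(rng, 200)

# MultinomialNB sees the counts themselves.
acc_mnb = accuracy_score(y_te, MultinomialNB().fit(X_tr, y_tr).predict(X_te))

# BernoulliNB binarises at 0 by default, so every row becomes [1, 1].
acc_bnb = accuracy_score(y_te, BernoulliNB().fit(X_tr, y_tr).predict(X_te))

print("MultinomialNB accuracy:", acc_mnb)
print("BernoulliNB accuracy:", acc_bnb)
```

After binarisation every sample looks identical, so BernoulliNB can only predict a single class and lands at chance level on this balanced data, while MultinomialNB can use the counts. This is a deliberately extreme construction; real datasets usually fall somewhere in between.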

Let us now summarise our findings:

- **GaussianNB** → Continuous numeric data (e.g., email metadata, sensor data)
- **MultinomialNB** → Count/frequency-based features (e.g., bag-of-words, text classification)
- **BernoulliNB** → Binary indicators (e.g., feature present/absent)


Choose the Naïve Bayes variant based on **the type of feature distribution** and not the task type.
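
The rule of thumb above can be sketched as a small helper that inspects a feature matrix and suggests a starting variant. The function `suggest_nb_variant` is hypothetical and not part of scikit-learn. It checks only the value types in the matrix, so treat its answer as a first guess to validate, not as a decision.

```python
import numpy as np

def suggest_nb_variant(X):
    """Hypothetical heuristic: suggest a Naive Bayes variant from feature values."""
    X = np.asarray(X)
    # Only 0/1 values: presence/absence indicators.
    if np.isin(X, [0, 1]).all():
        return "BernoulliNB"
    # Non-negative whole numbers: count-like features.
    if (X >= 0).all() and np.allclose(X, np.round(X)):
        return "MultinomialNB"
    # Anything else is treated as continuous.
    return "GaussianNB"

print(suggest_nb_variant([[0, 1], [1, 0]]))            # binary indicators
print(suggest_nb_variant([[3, 0], [1, 7]]))            # word-count-like
print(suggest_nb_variant([[0.5, -1.2], [3.3, 2.0]]))   # continuous
```

A heuristic like this cannot tell whether magnitude or mere presence carries the signal, which is why comparing candidates on held-out data, as done in this notebook, is still worthwhile.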

### Conclusion

In this exercise, we explored all three main Naïve Bayes variants side by side. We learned to connect the **data type and feature distribution** with the corresponding model: Gaussian for continuous, Multinomial for counts, and Bernoulli for binary indicators. Understanding these distinctions helps us choose a model whose assumptions fit the data, which generally leads to results that are easier to interpret and to better performance when applying Naïve Bayes to diverse datasets.