
# Naïve Bayes: Code Walk Through

This notebook provides a step-by-step computational walkthrough of **Naïve Bayes** classification using two variants:
- **Gaussian Naïve Bayes** for continuous features
- **Multinomial Naïve Bayes** for discrete count data (text classification)

## What You'll Learn

- How **Bayes' Theorem** forms the foundation of Naïve Bayes classification
- How the **naïve assumption** of conditional independence simplifies computation
- How to compute **prior** and **conditional probabilities** from training data
- How **Gaussian Naïve Bayes** models continuous features using mean and variance
- How **Multinomial Naïve Bayes** handles text classification with word counts
- How **Laplace smoothing** prevents zero probabilities
- How to visualize decision boundaries and probability distributions

---
# Part 1: Gaussian Naïve Bayes

Gaussian Naïve Bayes assumes that features follow a **normal (Gaussian) distribution** for each class.

## 1. Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Set random seed for reproducibility
np.random.seed(42)

## 2. Generate Synthetic Binary Classification Data

We'll create a 2D dataset with two classes, following the pattern from the lecture slides.

In [None]:
# Generate two-class data
N = 50  # samples per class
D = 2   # features

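# Re-seed so this cell produces the same data when run on its own.
# Note that this overrides the seed set in the import cell.
np.random.seed(0)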

# Class 0: centered around (1.5, 2.0)
class_0 = np.hstack((
    1.5 + 1.25 * np.random.randn(N, 1),
    2.0 + 1.25 * np.random.randn(N, 1)
))

# Class 1: centered around (4.0, 4.0)
class_1 = np.hstack((
    4.0 + 1.0 * np.random.randn(N, 1),
    4.0 + 1.0 * np.random.randn(N, 1)
))

# Combine into training set
X_train = np.vstack((class_0, class_1))  # shape (2N, 2)
y_train = np.concatenate([np.zeros(N), np.ones(N)])  # shape (2N,)

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"Class distribution: {np.bincount(y_train.astype(int))}")

## 3. Visualize the Data

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1],
            c='skyblue', label='Class 0', edgecolors='k', s=50)
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1],
            c='orange', label='Class 1', edgecolors='k', s=50)
plt.xlabel('$x_1$', fontsize=12)
plt.ylabel('$x_2$', fontsize=12)
plt.title('Binary Classification Data', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## 4. Understanding Bayes' Theorem

**Bayes' Theorem** allows us to compute the probability of a class given the features:

$$P(y|\vec{x}) = \frac{P(\vec{x}|y) \cdot P(y)}{P(\vec{x})}$$

Where:
- $P(y|\vec{x})$ is the **posterior probability** (probability of class $y$ given features $\vec{x}$)
- $P(\vec{x}|y)$ is the **likelihood** (probability of observing $\vec{x}$ given class $y$)
- $P(y)$ is the **prior probability** (probability of class $y$)
- $P(\vec{x})$ is the **evidence** (probability of observing $\vec{x}$)

### The Naïve Assumption

Naïve Bayes assumes that **features are conditionally independent** given the class:

$$P(\vec{x}|y) = P(x_1, x_2, \ldots, x_D|y) \approx P(x_1|y) \cdot P(x_2|y) \cdots P(x_D|y)$$

Substituting this into Bayes' theorem gives:

$$P(y|\vec{x}) \propto P(y) \prod_{i=1}^{D} P(x_i|y)$$

The evidence $P(\vec{x})$ is the same for every class, so it can be dropped when comparing classes; that is why the expression above uses $\propto$ rather than $=$.
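
To make the proportionality concrete, here is a minimal sketch with made-up numbers. The priors and per-feature likelihoods are hypothetical values chosen for illustration, not quantities estimated from the dataset above, and the `toy_` names are used only so the sketch does not overwrite variables defined later in the notebook.

```python
# Hypothetical priors and per-feature likelihoods for two classes.
toy_priors = {0: 0.5, 1: 0.5}
toy_likelihoods = {
    0: [0.05, 0.20],  # assumed P(x_1|y=0), P(x_2|y=0)
    1: [0.40, 0.25],  # assumed P(x_1|y=1), P(x_2|y=1)
}

# Unnormalized posterior: P(y) * prod_i P(x_i|y)
toy_scores = {}
for y, prior in toy_priors.items():
    score = prior
    for p in toy_likelihoods[y]:
        score *= p
    toy_scores[y] = score

# The evidence P(x) is the sum of the unnormalized scores over all classes.
toy_evidence = sum(toy_scores.values())
toy_posteriors = {y: s / toy_evidence for y, s in toy_scores.items()}

print("Unnormalized:", toy_scores)
print("Normalized:  ", toy_posteriors)
```

Dividing by the evidence rescales both scores by the same factor, so the class with the larger unnormalized score is also the class with the larger posterior.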

## 5. Calculate Prior Probabilities

The **prior probability** $P(y)$ is simply the fraction of training samples in each class.

In [None]:
# Identify unique classes
classes = np.unique(y_train)
print(f"Classes: {classes}")

# Calculate prior probabilities
priors = {}
for c in classes:
    priors[c] = np.sum(y_train == c) / len(y_train)
    print(f"P(y={int(c)}) = {priors[c]:.3f}")
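
The same class frequencies can be computed without an explicit loop. The sketch below uses a small made-up label array (`toy_labels` is illustrative and is not the notebook's `y_train`) to show how unequal class sizes translate into unequal priors.

```python
import numpy as np

# Made-up labels: three samples of class 0, one of class 1.
toy_labels = np.array([0, 0, 0, 1])

# np.unique with return_counts=True gives the sorted classes and their counts.
toy_classes, toy_counts = np.unique(toy_labels, return_counts=True)
toy_class_priors = toy_counts / toy_counts.sum()

print(toy_classes)       # [0 1]
print(toy_class_priors)  # [0.75 0.25]
```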

## 6. Fit Gaussian Distributions for Each Feature

For **Gaussian Naïve Bayes**, we assume each feature $x_i$ follows a normal distribution:

$$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma^2_{y,i}}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma^2_{y,i}}\right)$$

We need to calculate:
- **Mean** $\mu_{y,i}$: average value of feature $i$ for class $y$
- **Variance** $\sigma^2_{y,i}$: variance of feature $i$ for class $y$

In [None]:
# Calculate means and variances for each class and feature
stats = {}

for c in classes:
    # Filter data for class c
    X_c = X_train[y_train == c]

    # Calculate mean and variance for each feature.
    # np.var defaults to ddof=0 (divide by N), the maximum-likelihood estimate.
    means = np.mean(X_c, axis=0)
    variances = np.var(X_c, axis=0)

    # Store in dictionary
    stats[c] = {'means': means, 'variances': variances}

    print(f"\nClass {int(c)}:")
    print(f"  Means:     {means}")
    print(f"  Variances: {variances}")
    print(f"  Std devs:  {np.sqrt(variances)}")
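
One detail worth checking is which variance estimator is in use. `np.var` defaults to `ddof=0`, which divides by $N$; passing `ddof=1` divides by $N-1$ instead. A small sketch on a made-up sample (not the notebook's data) shows the difference:

```python
import numpy as np

# Made-up sample, for illustration only.
toy_sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

var_mle = np.var(toy_sample)               # ddof=0: divide by N
var_unbiased = np.var(toy_sample, ddof=1)  # ddof=1: divide by N - 1

print(var_mle)       # 2.0
print(var_unbiased)  # 2.5
```

With 50 samples per class the two estimates differ by about 2%, so the choice rarely changes a prediction, but it does matter when comparing fitted parameters against another implementation.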

## 7. Visualize the Gaussian Distributions

Let's visualize the fitted Gaussian distributions for each feature.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for feature_idx in range(2):
    ax = axes[feature_idx]

    # Plot histogram for each class
    for c in classes:
        X_c = X_train[y_train == c]
        color = 'skyblue' if c == 0 else 'orange'
        label = f'Class {int(c)}'

        # Histogram
        ax.hist(X_c[:, feature_idx], bins=15, alpha=0.5, color=color,
                edgecolor='black', density=True, label=f'{label} (data)')

        # Fitted Gaussian
        mu = stats[c]['means'][feature_idx]
        sigma = np.sqrt(stats[c]['variances'][feature_idx])
        x_range = np.linspace(X_train[:, feature_idx].min() - 1,
                              X_train[:, feature_idx].max() + 1, 200)
        pdf = norm.pdf(x_range, mu, sigma)
        ax.plot(x_range, pdf, linewidth=2, color=color,
                linestyle='--', label=f'{label} fit (μ={mu:.2f}, σ={sigma:.2f})')

    ax.set_xlabel(f'$x_{{{feature_idx+1}}}$', fontsize=12)
    ax.set_ylabel('Probability Density', fontsize=12)
    ax.set_title(f'Feature {feature_idx+1}: Gaussian Distributions', fontsize=13)
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 8. Calculate Likelihood for a Test Point

Take a test point $\vec{x}_{\text{test}} = [4.0, 3.0]^T$ and evaluate the likelihood of each of its features under both classes.

In [None]:
# Test point
x_test = np.array([4.0, 3.0])
print(f"Test point: {x_test}")

# Calculate likelihood for each class
def gaussian_pdf(x, mean, variance):
    """Calculate Gaussian probability density."""
    return (1 / np.sqrt(2 * np.pi * variance)) * \
           np.exp(-((x - mean) ** 2) / (2 * variance))

print("\n" + "="*60)
print("CALCULATING LIKELIHOODS")
print("="*60)

for c in classes:
    print(f"\nClass {int(c)}:")

    # Calculate P(x_i|y) for each feature
    for i in range(len(x_test)):
        mu = stats[c]['means'][i]
        var = stats[c]['variances'][i]
        likelihood = gaussian_pdf(x_test[i], mu, var)
        print(f"  P(x_{i+1}={x_test[i]:.1f}|y={int(c)}) = {likelihood:.6f}")
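
A quick way to sanity-check a hand-written density is to compare it with a library implementation. The sketch below re-implements the formula with the standard library and compares it with `statistics.NormalDist`; the parameter values are hypothetical, not the fitted values from this notebook. Note that `NormalDist` (like `scipy.stats.norm.pdf`) expects the standard deviation, whereas the function above takes the variance.

```python
import math
from statistics import NormalDist

def gaussian_pdf_check(x, mean, variance):
    """Gaussian density parameterized by variance, mirroring gaussian_pdf."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Hypothetical parameters, chosen only for illustration.
check_x, check_mean, check_var = 4.0, 3.0, 2.25

ours = gaussian_pdf_check(check_x, check_mean, check_var)
# NormalDist takes sigma (standard deviation), so pass sqrt(variance).
reference = NormalDist(mu=check_mean, sigma=math.sqrt(check_var)).pdf(check_x)
print(ours, reference)

# A density is not a probability: with a small variance it can exceed 1.
peak = gaussian_pdf_check(0.0, 0.0, 0.01)
print(peak)
```

Because these values are densities, only their relative sizes across classes matter for classification.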

## 9. Calculate Posterior Probabilities

Using the naïve assumption:

$$P(y|\vec{x}) \propto P(y) \prod_{i=1}^{D} P(x_i|y)$$

In [None]:
print("="*60)
print("CALCULATING POSTERIOR PROBABILITIES")
print("="*60)

posteriors = {}

for c in classes:
    # Start with prior
    posterior = priors[c]
    print(f"\nClass {int(c)}:")
    print(f"  Prior: P(y={int(c)}) = {posterior:.6f}")

    # Multiply by likelihood of each feature
    for i in range(len(x_test)):
        mu = stats[c]['means'][i]
        var = stats[c]['variances'][i]
        likelihood = gaussian_pdf(x_test[i], mu, var)
        posterior *= likelihood
        print(f"  After feature {i+1}: {posterior:.10f}")

    posteriors[c] = posterior
    print(f"  " + "-" * 50)
    print(f"  Final: P(y={int(c)}|x) ∝ {posterior:.10f}")

## 10. Make Prediction

The predicted class is the one with the highest posterior probability:

$$\hat{y} = \arg\max_y P(y|\vec{x})$$

In [None]:
# Find class with maximum posterior
predicted_class = max(posteriors, key=posteriors.get)

print("\n" + "="*60)
print("PREDICTION")
print("="*60)
print(f"\nTest point: {x_test}")
print(f"\nPosterior probabilities (unnormalized):")
for c in classes:
    print(f"  P(y={int(c)}|x) ∝ {posteriors[c]:.10f}")
print(f"\nPredicted class: {int(predicted_class)}")

# Visualize prediction
plt.figure(figsize=(8, 6))
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1],
            c='skyblue', label='Class 0', edgecolors='k', s=50)
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1],
            c='orange', label='Class 1', edgecolors='k', s=50)
plt.scatter(x_test[0], x_test[1], c='red', marker='X', s=200,
            edgecolors='black', linewidths=2, label='Test Point', zorder=5)
plt.xlabel('$x_1$', fontsize=12)
plt.ylabel('$x_2$', fontsize=12)
plt.title(f'Prediction: Class {int(predicted_class)}', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## 11. Visualize Decision Boundary

Let's visualize the decision boundary created by our Gaussian Naïve Bayes classifier.

In [None]:
# Create a mesh for visualization
x1_min, x1_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
x2_min, x2_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max, 200),
                        np.linspace(x2_min, x2_max, 200))

# Calculate posterior probability for each point in the mesh
def predict_proba_class_1(X):
    """Predict probability of class 1 for array of points."""
    if X.ndim == 1:
        X = X.reshape(1, -1)

    probs_class_1 = []
    for x in X:
        posteriors_temp = {}
        for c in classes:
            posterior = priors[c]
            for i in range(len(x)):
                mu = stats[c]['means'][i]
                var = stats[c]['variances'][i]
                posterior *= gaussian_pdf(x[i], mu, var)
            posteriors_temp[c] = posterior

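        # Normalize posteriors. If both products underflow to 0 (points far
        # from the data), fall back to 0.5 instead of dividing by zero.
        total = sum(posteriors_temp.values())
        prob_class_1 = posteriors_temp[1.0] / total if total > 0 else 0.5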
        probs_class_1.append(prob_class_1)

    return np.array(probs_class_1)

# Predict for mesh
mesh_points = np.c_[xx1.ravel(), xx2.ravel()]
Z = predict_proba_class_1(mesh_points)
Z = Z.reshape(xx1.shape)

# Plot
plt.figure(figsize=(12, 8))
plt.contourf(xx1, xx2, Z, levels=20, cmap='RdBu_r', alpha=0.6)
plt.colorbar(label='P(y=1|x)')
plt.contour(xx1, xx2, Z, levels=[0.5], colors='black', linewidths=2, linestyles='dashed')

plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1],
            c='skyblue', label='Class 0', edgecolors='k', s=50)
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1],
            c='orange', label='Class 1', edgecolors='k', s=50)

plt.xlabel('$x_1$', fontsize=12)
plt.ylabel('$x_2$', fontsize=12)
plt.title('Gaussian Naïve Bayes Decision Boundary', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
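
The point-by-point loop above is easy to follow but slow on a 200 × 200 mesh, and the plain product can underflow far from the data. As an alternative, here is a sketch of a vectorized version that works in log space. The function name `gnb_predict_proba` and the `demo_` parameters are hypothetical; the parameter values only roughly resemble the synthetic data and are not the fitted statistics.

```python
import numpy as np

def gnb_predict_proba(X, priors, means, variances):
    """Posterior class probabilities for Gaussian naive Bayes.

    X: (n, D) points; priors: (C,); means and variances: (C, D).
    """
    X = np.atleast_2d(X)
    # Broadcast to (n, C, D): log density of every feature under every class.
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # Log joint: log P(y) + sum_i log P(x_i|y), shape (n, C).
    log_joint = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    # Subtract the row maximum before exponentiating so at least one
    # entry per row is exp(0) = 1 and the row sum cannot be zero.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    proba = np.exp(log_joint)
    return proba / proba.sum(axis=1, keepdims=True)

# Hypothetical parameters (assumed, not the notebook's fitted values).
demo_priors = np.array([0.5, 0.5])
demo_means = np.array([[1.5, 2.0], [4.0, 4.0]])
demo_vars = np.array([[1.5, 1.5], [1.0, 1.0]])

demo_points = np.array([[4.0, 3.0], [0.0, 0.0], [100.0, 100.0]])
demo_proba = gnb_predict_proba(demo_points, demo_priors, demo_means, demo_vars)
print(demo_proba)
```

The last point, (100, 100), is far enough away that both plain products would round to zero; in log space the comparison still resolves, here in favor of the class with the larger variance.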

## 12. Comparison with scikit-learn

In [None]:
from sklearn.naive_bayes import GaussianNB

# Train sklearn model
sklearn_model = GaussianNB()
sklearn_model.fit(X_train, y_train)

# Predict on test point
sklearn_pred = sklearn_model.predict(x_test.reshape(1, -1))[0]
sklearn_proba = sklearn_model.predict_proba(x_test.reshape(1, -1))[0]

print("="*60)
print("COMPARISON WITH SCIKIT-LEARN")
print("="*60)
print(f"\nOur prediction:      Class {int(predicted_class)}")
print(f"sklearn prediction:  Class {int(sklearn_pred)}")
print(f"\nsklearn probabilities:")
print(f"  P(y=0|x) = {sklearn_proba[0]:.6f}")
print(f"  P(y=1|x) = {sklearn_proba[1]:.6f}")

# Compare parameters
print(f"\nParameter comparison:")
print(f"\nOur means:")
for c in classes:
    print(f"  Class {int(c)}: {stats[c]['means']}")
print(f"\nsklearn means (theta_):")
print(sklearn_model.theta_)

print(f"\nOur variances:")
for c in classes:
    print(f"  Class {int(c)}: {stats[c]['variances']}")
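# sklearn adds a small var_smoothing term (default 1e-9 times the largest
# feature variance) to var_, so expect agreement to many decimal places
# rather than exact equality.
print(f"\nsklearn variances (var_):")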
print(sklearn_model.var_)

---
# Part 2: Multinomial Naïve Bayes

Multinomial Naïve Bayes is used for **discrete count data**, most commonly for **text classification**.

## 13. Create Text Dataset

Following the lecture slides example, we'll create a simple spam/ham email dataset.

In [None]:
# Training emails
spam_emails = [
    "free cash now",
    "limited offer",
    "cash prize waiting"
]

ham_emails = [
    "we meet tomorrow can we",
    "have you seen my book",
    "i will bring cash later"
]

# Combine
emails = spam_emails + ham_emails
labels_text = ['spam'] * len(spam_emails) + ['ham'] * len(ham_emails)
labels_numeric = np.array([1] * len(spam_emails) + [0] * len(ham_emails))

print("Training Emails:")
for i, (email, label) in enumerate(zip(emails, labels_text)):
    print(f"{i+1}. [{label.upper():>4}] {email}")

## 14. Build Vocabulary and Bag-of-Words Representation

We'll represent each email as a **bag-of-words** vector.

In [None]:
# Build vocabulary
vocabulary = set()
for email in emails:
    words = email.split()
    vocabulary.update(words)

vocabulary = sorted(list(vocabulary))
word_to_idx = {word: idx for idx, word in enumerate(vocabulary)}

print(f"Vocabulary (|V| = {len(vocabulary)}):")
print(vocabulary)
print(f"\nWord to index mapping:")
for word, idx in word_to_idx.items():
    print(f"  {idx:2d}: {word}")

In [None]:
# Convert emails to bag-of-words representation
def email_to_bow(email, word_to_idx):
    """Convert email to bag-of-words vector."""
    bow = np.zeros(len(word_to_idx))
    for word in email.split():
        if word in word_to_idx:
            bow[word_to_idx[word]] += 1
    return bow

# Create feature matrix
X_text = np.array([email_to_bow(email, word_to_idx) for email in emails])

print("Bag-of-Words Representation:")
print(f"Shape: {X_text.shape}\n")
print("Features (words):")
print(" ".join(f"{word:>8}" for word in vocabulary))
print("-" * (9 * len(vocabulary)))
for i, (email, label, bow) in enumerate(zip(emails, labels_text, X_text)):
    bow_str = " ".join(f"{int(count):>8}" for count in bow)
    print(f"{bow_str}  <- [{label.upper():>4}] {email}")
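
The counting step can also be expressed with `collections.Counter`. The sketch below uses a hypothetical four-word vocabulary and a made-up email rather than the training set above:

```python
from collections import Counter

# Hypothetical mini-vocabulary and email, for illustration only.
demo_vocab = ["cash", "free", "now", "offer"]
demo_email = "free cash now cash prize"

word_counts = Counter(demo_email.split())
# Words outside the vocabulary ("prize" here) are dropped, as in email_to_bow.
# Counter returns 0 for words that never occurred ("offer" here).
demo_bow = [word_counts[word] for word in demo_vocab]

print(demo_bow)  # [2, 1, 1, 0]
```

Word order is discarded: any permutation of the same words produces the same vector.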

## 15. Calculate Prior Probabilities for Text

In [None]:
# Calculate priors
n_spam = np.sum(labels_numeric == 1)
n_ham = np.sum(labels_numeric == 0)
n_total = len(labels_numeric)

prior_spam = n_spam / n_total
prior_ham = n_ham / n_total

print("Prior Probabilities:")
print(f"  P(spam) = {n_spam}/{n_total} = {prior_spam:.3f}")
print(f"  P(ham)  = {n_ham}/{n_total} = {prior_ham:.3f}")

## 16. Calculate Word Likelihoods with Laplace Smoothing

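For Multinomial Naïve Bayes, the likelihood of a word given a class is estimated from smoothed counts:

$$P(\text{word} \mid \text{class}) = \frac{\text{count}(\text{word}, \text{class}) + \alpha}{\text{total words in class} + \alpha \cdot |V|}$$

Where:
- $\text{count}(\text{word}, \text{class})$ is the number of times the word appears in the training emails of that class
- $\alpha$ is the additive smoothing parameter ($\alpha = 1$ gives **Laplace smoothing**)
- $|V|$ is the vocabulary size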

In [None]:
# Laplace smoothing parameter
alpha = 1.0

# Calculate word counts for each class
word_counts_spam = X_text[labels_numeric == 1].sum(axis=0)
word_counts_ham = X_text[labels_numeric == 0].sum(axis=0)

# Total word counts
total_words_spam = word_counts_spam.sum()
total_words_ham = word_counts_ham.sum()

print(f"Word counts per class:\n")
print(f"Spam: {word_counts_spam} (total = {total_words_spam})")
print(f"Ham:  {word_counts_ham} (total = {total_words_ham})")

# Calculate likelihoods with Laplace smoothing
vocab_size = len(vocabulary)
likelihood_spam = (word_counts_spam + alpha) / (total_words_spam + alpha * vocab_size)
likelihood_ham = (word_counts_ham + alpha) / (total_words_ham + alpha * vocab_size)

print(f"\nWord Likelihoods (with α={alpha}):")
print(f"\n{'Word':<10} P(word|spam)  P(word|ham)")
print("-" * 40)
for i, word in enumerate(vocabulary):
    print(f"{word:<10} {likelihood_spam[i]:.6f}    {likelihood_ham[i]:.6f}")
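
To see why smoothing matters, compare the estimates with and without it. The counts below are hypothetical (a three-word vocabulary in a single class), and `smoothed_likelihoods` is an illustrative helper, not part of the notebook's code.

```python
def smoothed_likelihoods(counts, alpha):
    """Additive smoothing: (count + alpha) / (total + alpha * |V|)."""
    total = sum(counts)
    vocab_size = len(counts)
    return [(c + alpha) / (total + alpha * vocab_size) for c in counts]

# Hypothetical counts; the third word never appears in this class.
demo_counts = [3, 1, 0]

unsmoothed = smoothed_likelihoods(demo_counts, alpha=0.0)
laplace = smoothed_likelihoods(demo_counts, alpha=1.0)

print(unsmoothed)  # [0.75, 0.25, 0.0]
print(laplace)     # 4/7, 2/7, 1/7
```

Without smoothing, a single unseen word multiplies the whole product by zero (or adds $\log 0 = -\infty$ in log space), which rules the class out regardless of the other words. With $\alpha = 1$ every word keeps a small nonzero probability and the estimates still sum to 1.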

## 17. Make Prediction on New Email

Let's predict whether "**limited cash offer now free cash**" is spam or ham.

In [None]:
# New email to classify
new_email = "limited cash offer now free cash"
print(f"New email: \"{new_email}\"\n")

# Convert to bag-of-words
new_bow = email_to_bow(new_email, word_to_idx)
print("Bag-of-words representation:")
for word, count in zip(vocabulary, new_bow):
    if count > 0:
        print(f"  {word}: {int(count)}")

# Calculate log posterior for spam
log_posterior_spam = np.log(prior_spam)
print(f"\n{'='*60}")
print("CALCULATING LOG POSTERIOR FOR SPAM")
print(f"{'='*60}")
print(f"log P(spam) = {log_posterior_spam:.6f}")

for i, word in enumerate(vocabulary):
    if new_bow[i] > 0:
        contribution = new_bow[i] * np.log(likelihood_spam[i])
        log_posterior_spam += contribution
        print(f"+ {int(new_bow[i])} × log P({word}|spam) = {int(new_bow[i])} × {np.log(likelihood_spam[i]):.6f} = {contribution:.6f}")

print(f"{'-'*60}")
print(f"Total (unnormalized) log P(spam|email) = {log_posterior_spam:.6f}")

# Calculate log posterior for ham
log_posterior_ham = np.log(prior_ham)
print(f"\n{'='*60}")
print("CALCULATING LOG POSTERIOR FOR HAM")
print(f"{'='*60}")
print(f"log P(ham) = {log_posterior_ham:.6f}")

for i, word in enumerate(vocabulary):
    if new_bow[i] > 0:
        contribution = new_bow[i] * np.log(likelihood_ham[i])
        log_posterior_ham += contribution
        print(f"+ {int(new_bow[i])} × log P({word}|ham) = {int(new_bow[i])} × {np.log(likelihood_ham[i]):.6f} = {contribution:.6f}")

print(f"{'-'*60}")
print(f"Total (unnormalized) log P(ham|email) = {log_posterior_ham:.6f}")

# Make prediction
print(f"\n{'='*60}")
print("PREDICTION")
print(f"{'='*60}")
prediction = "spam" if log_posterior_spam > log_posterior_ham else "ham"
print(f"\nUnnormalized log P(spam|email) = {log_posterior_spam:.6f}")
print(f"Unnormalized log P(ham|email)  = {log_posterior_ham:.6f}")
print(f"\nPredicted class: {prediction.upper()}")
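
The two loops above each compute a weighted sum of log-likelihoods, which is a dot product between the word counts and the log-likelihood vector. As a sketch, both classes can be scored in one matrix-vector product. The likelihoods and counts below are hypothetical values for a three-word vocabulary, not the ones estimated above.

```python
import numpy as np

# Hypothetical smoothed likelihoods (rows: ham, spam; columns: words).
toy_word_likelihoods = np.array([[0.5, 0.3, 0.2],
                                 [0.2, 0.2, 0.6]])
toy_log_priors = np.log(np.array([0.5, 0.5]))
toy_bow = np.array([1, 0, 2])  # word counts of the document to classify

# log P(y) + sum_i count_i * log P(word_i|y), for both classes at once.
toy_log_scores = toy_log_priors + np.log(toy_word_likelihoods) @ toy_bow
toy_prediction = ["ham", "spam"][int(np.argmax(toy_log_scores))]

print(toy_log_scores)
print(toy_prediction)  # spam
```

Words with a count of zero contribute nothing to the sum, which matches the `if new_bow[i] > 0` check in the loop version.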

## 18. Comparison with scikit-learn for Multinomial NB

In [None]:
from sklearn.naive_bayes import MultinomialNB

# Train sklearn model
sklearn_mnb = MultinomialNB(alpha=alpha)
sklearn_mnb.fit(X_text, labels_numeric)

# Predict on new email
sklearn_pred = sklearn_mnb.predict(new_bow.reshape(1, -1))[0]
sklearn_proba = sklearn_mnb.predict_proba(new_bow.reshape(1, -1))[0]
sklearn_log_proba = sklearn_mnb.predict_log_proba(new_bow.reshape(1, -1))[0]

print("="*60)
print("COMPARISON WITH SCIKIT-LEARN")
print("="*60)
print(f"\nOur prediction:      {prediction.upper()}")
print(f"sklearn prediction:  {'SPAM' if sklearn_pred == 1 else 'HAM'}")

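# Our scores omit the evidence term, so normalize them before comparing
# with sklearn's predict_log_proba, which returns normalized log posteriors.
log_evidence = np.logaddexp(log_posterior_ham, log_posterior_spam)

print(f"\nOur log probabilities (normalized):")
print(f"  log P(ham|email)  = {log_posterior_ham - log_evidence:.6f}")
print(f"  log P(spam|email) = {log_posterior_spam - log_evidence:.6f}")

print(f"\nsklearn log probabilities:")
print(f"  log P(ham|email)  = {sklearn_log_proba[0]:.6f}")
print(f"  log P(spam|email) = {sklearn_log_proba[1]:.6f}")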

print(f"\nsklearn probabilities (normalized):")
print(f"  P(ham|email)  = {sklearn_proba[0]:.6f}")
print(f"  P(spam|email) = {sklearn_proba[1]:.6f}")

## 19. Understanding Numerical Underflow

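Multiplying many small probabilities risks **numerical underflow**: the product becomes too small for floating point to represent and rounds to zero.

**Solution**: Work with **log probabilities** and add instead of multiplying:

$$\log P(y|\vec{x}) = \log P(y) + \sum_{i=1}^{D} \log P(x_i|y) - \log P(\vec{x})$$

The last term is the same for every class, so it does not affect which class scores highest.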

In [None]:
# Demonstrate numerical underflow
print("Demonstrating Numerical Underflow:")
print(f"\n0.1 ** 10   = {0.1 ** 10:.2e}")
print(f"0.1 ** 100  = {0.1 ** 100:.2e}")
print(f"0.1 ** 1000 = {0.1 ** 1000:.2e}  <- Underflow!")

print(f"\nUsing logs:")
print(f"10   × log(0.1) = {10 * np.log(0.1):.6f}")
print(f"100  × log(0.1) = {100 * np.log(0.1):.6f}")
print(f"1000 × log(0.1) = {1000 * np.log(0.1):.6f}  <- No underflow!")
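
Log scores solve the comparison problem, but turning them back into probabilities needs care, because exponentiating a very negative number underflows again. A common remedy is the log-sum-exp trick: subtract the largest log score before exponentiating. The sketch below uses hypothetical log scores; `log_sum_exp` is an illustrative helper written for this example.

```python
import math

def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) obtained by factoring out the maximum."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Hypothetical log scores, far too negative to exponentiate directly.
demo_log_scores = {"spam": -1000.0, "ham": -1002.0}

naive = math.exp(demo_log_scores["spam"])  # underflows to 0.0

log_norm = log_sum_exp(list(demo_log_scores.values()))
demo_probs = {k: math.exp(v - log_norm) for k, v in demo_log_scores.items()}

print(naive)
print(demo_probs)
```

Only the difference between the log scores matters: a gap of 2 corresponds to a posterior of $1 / (1 + e^{-2}) \approx 0.88$ for the higher-scoring class.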

## Summary

In this walkthrough, we explored:

### Gaussian Naïve Bayes
1. **Built from Bayes' Theorem** with the naïve conditional independence assumption
2. **Computed prior probabilities** from class frequencies in training data
3. **Fitted Gaussian distributions** (mean and variance) for each feature per class
4. **Calculated likelihoods** using the Gaussian probability density function
5. **Made predictions** by selecting the class with highest posterior probability
6. **Visualized decision boundaries** showing how the classifier separates classes

### Multinomial Naïve Bayes
1. **Represented text as bag-of-words** vectors (word count features)
2. **Calculated word likelihoods** with Laplace smoothing to prevent zero probabilities
3. **Used log probabilities** to avoid numerical underflow
4. **Classified new documents** by comparing log posterior probabilities

### Key Concepts

| Concept | Description |
|---------|-------------|
| **Bayes' Theorem** | $P(y|\vec{x}) = \frac{P(\vec{x}|y) \cdot P(y)}{P(\vec{x})}$ |
| **Naïve Assumption** | Features are conditionally independent: $P(\vec{x}|y) \approx \prod_{i=1}^{D} P(x_i|y)$ |
| **Prior** | $P(y)$ - probability of a class before seeing features |
| **Likelihood** | $P(\vec{x}|y)$ - probability of features given a class |
| **Posterior** | $P(y|\vec{x})$ - probability of class given features |
| **Laplace Smoothing** | Add $\alpha$ to counts to avoid zero probabilities |
| **Log Probabilities** | Use $\log$ to prevent numerical underflow |

### When to Use Naïve Bayes

**Advantages:**
- Fast training and prediction
- Works well with high-dimensional data
- Performs well with small training sets
- Provides probabilistic predictions
- Simple and interpretable

**Best suited for:**
- Text classification (spam detection, sentiment analysis)
- Document categorization
- Real-time prediction
- When features are relatively independent

**Limitations:**
- Assumes feature independence (often violated in practice)
- Gaussian NB is sensitive to outliers
- Can be outperformed by more complex models on large datasets

### Comparison with Other Algorithms

| Algorithm | Training Speed | Prediction Speed | Handles High Dimensions | Probabilistic | Feature Independence |
|-----------|---------------|------------------|------------------------|---------------|---------------------|
| **Naïve Bayes** | Very Fast | Very Fast | Excellent | Yes | Assumes |
| **Logistic Regression** | Fast | Fast | Good | Yes | Not required |
| **KNN** | None (lazy learner) | Slow | Poor | No | Not required |
| **Decision Trees** | Medium | Fast | Good | No | Not required |