# Naive Bayes

## 1. Bayes' Theorem

Bayes' Theorem states:

$$
P(y | X) = \frac{P(X | y) P(y)}{P(X)}
$$

where:
- \( P(y | X) \) is the **posterior probability** (probability of class \( y \) given features \( X \)).
- \( P(X | y) \) is the **likelihood** (probability of features \( X \) given class \( y \)).
- \( P(y) \) is the **prior probability** of class \( y \).
- \( P(X) \) is the **evidence** (total probability of \( X \) across all classes).

Since \( P(X) \) does not depend on \( y \), it is the same for every class and can be dropped when comparing classes:

$$
P(y | X) \propto P(X | y) P(y)
$$
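As a small numeric illustration, the sketch below normalizes by \( P(X) \) and shows that this rescales every class score by the same factor, so the ranking of classes does not change. The priors and likelihoods are made-up values, not taken from any dataset.

```python
import numpy as np

# Hypothetical two-class problem; all numbers are illustrative assumptions.
prior = np.array([0.6, 0.4])        # P(y) for classes 0 and 1
likelihood = np.array([0.2, 0.5])   # P(X | y) for one observed X

unnormalized = likelihood * prior    # P(X | y) P(y)
evidence = unnormalized.sum()        # P(X) = sum over y of P(X | y) P(y)
posterior = unnormalized / evidence  # P(y | X), sums to 1

# The winning class is the same with or without the normalization.
best_unnormalized = int(np.argmax(unnormalized))
best_posterior = int(np.argmax(posterior))
print(posterior, best_unnormalized, best_posterior)
```

With these numbers the unnormalized scores are 0.12 and 0.20, the posteriors are 0.375 and 0.625, and class 1 wins in both cases.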

---

## 2. Naïve Assumption (Feature Independence)

If \( X \) has \( n \) features \( x_1, x_2, \dots, x_n \), we assume they are **conditionally independent** given the class \( y \):

$$
P(X | y) = P(x_1 | y) \cdot P(x_2 | y) \cdots P(x_n | y)
$$

Thus, Bayes’ Theorem simplifies to:

$$
P(y | X) \propto P(y) \prod_{i=1}^{n} P(x_i | y)
$$

This is why the method is called **"Naïve"**: it assumes that all features are **conditionally independent given the class**, which is often not true in practice.
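The factorization can be sketched in a few lines. The table `cond` below holds hypothetical values of \( P(x_i | y) \) for one instance with three features; the numbers and the uniform prior are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical per-feature conditionals P(x_i | y) for a single instance;
# rows are classes, columns are features (made-up values).
cond = np.array([
    [0.9, 0.2, 0.5],   # class 0
    [0.3, 0.7, 0.6],   # class 1
])
prior = np.array([0.5, 0.5])  # assumed uniform P(y)

# Naive assumption: P(X | y) is the product of the per-feature terms.
joint_likelihood = cond.prod(axis=1)

# Scores proportional to the posterior P(y | X).
scores = prior * joint_likelihood
print(joint_likelihood, scores)
```

Here the joint likelihoods are 0.09 and 0.126, so class 1 gets the higher score even though class 0 has the single largest per-feature term.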

---

## 3. Classification Rule

To classify a new instance \( X \), we compute the unnormalized posterior \( P(y) \prod_{i} P(x_i | y) \) for each class \( y \) and **choose the class with the highest value**:

$$
\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i | y)
$$

where:
- \( P(y) \) is estimated as the **proportion of samples** in each class.
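- \( P(x_i | y) \) is estimated from the feature-value counts within class \( y \), using either **Maximum Likelihood Estimation (MLE)**, i.e. relative frequencies, or **Laplace Smoothing**, which adds 1 to every count so that a value never observed in a class does not get zero probability.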
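The two estimators can be compared with a minimal sketch. The helper names `estimate_prior` and `estimate_likelihood`, the `alpha` parameter, and the toy data are all assumptions introduced here for illustration: `alpha = 0` gives the plain MLE and `alpha = 1` gives Laplace (add-one) smoothing over `n_values` categories.

```python
import numpy as np

def estimate_prior(y, cls):
    """P(y = cls) as the fraction of training labels equal to cls."""
    return float(np.mean(y == cls))

def estimate_likelihood(feature, y, cls, value, n_values, alpha=1.0):
    """Smoothed estimate of P(x_i = value | y = cls).

    (count + alpha) / (class_size + alpha * n_values), where n_values is
    the number of distinct values the feature can take.
    """
    in_class = feature[y == cls]
    count = np.sum(in_class == value)
    return float((count + alpha) / (len(in_class) + alpha * n_values))

# Toy data (illustrative): one binary feature, two classes.
feature = np.array([0, 0, 1, 1])
y = np.array([1, 1, 0, 0])

prior_1 = estimate_prior(y, cls=1)
mle = estimate_likelihood(feature, y, cls=1, value=1, n_values=2, alpha=0.0)
laplace = estimate_likelihood(feature, y, cls=1, value=1, n_values=2, alpha=1.0)
print(prior_1, mle, laplace)
```

The value 1 never occurs in class 1, so the MLE is 0 and would zero out the whole product, while the smoothed estimate is (0 + 1) / (2 + 2) = 0.25.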

---

## 4. Log Probability for Numerical Stability

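Since the product of many small probabilities can underflow in floating-point arithmetic, we work with **log** probabilities instead:

$$
\log P(y | X) = \log P(y) + \sum_{i=1}^{n} \log P(x_i | y) - \log P(X)
$$

This converts **multiplications into additions**, making computations more stable. The term \( \log P(X) \) is the same for every class, so it can be dropped when taking the \( \arg\max \); because the logarithm is monotonic, the predicted class is unchanged.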
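A quick way to see the underflow problem is to multiply many small factors directly and compare with the sum of their logs. The 1000 identical likelihoods of 0.01 below are an assumed, deliberately extreme example.

```python
import numpy as np

# 1000 hypothetical per-feature likelihoods, each equal to 0.01.
probs = np.full(1000, 0.01)

# The true product is 1e-2000, far below the smallest positive float64,
# so the direct product underflows to zero.
direct_product = np.prod(probs)

# The sum of logs is 1000 * log(0.01), which is easily representable.
log_sum = np.sum(np.log(probs))

print(direct_product, log_sum)
```

The direct product comes out as exactly 0.0, which makes every class indistinguishable, whereas the log-sum stays a finite negative number that can still be compared across classes.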

In [8]:
import numpy as np

class NaiveBayes:
    def __init__(self):
        self.class_priors = {}  # P(y) - Prior probabilities
        self.feature_likelihoods = {}  # P(x|y) - Conditional probabilities
        self.classes = None  # Unique class labels

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)

        # Compute prior probabilities P(y)
        self.class_priors = {c: np.mean(y == c) for c in self.classes}

        # Compute likelihoods P(x|y) using Laplace smoothing
        self.feature_likelihoods = {}

        for c in self.classes:
            X_c = X[y == c]  # Subset of X where y == c
            self.feature_likelihoods[c] = {}

            for feature_idx in range(n_features):
                # All values this feature takes anywhere in the training set,
                # so values absent from class c still receive smoothed mass
                all_values = np.unique(X[:, feature_idx])
                total_count = X_c.shape[0]

                # Apply Laplace Smoothing: (count + 1) / (total + num_values)
                likelihoods = {
                    val: (np.sum(X_c[:, feature_idx] == val) + 1) / (total_count + len(all_values))
                    for val in all_values
                }

                # Store likelihoods; each feature's probabilities now sum to 1 per class
                self.feature_likelihoods[c][feature_idx] = likelihoods

    def predict(self, X):
        predictions = []

        for x in X:
            posteriors = {}

            for c in self.classes:
                # Start with the log prior P(y)
                posterior = np.log(self.class_priors[c])

                for feature_idx, feature_value in enumerate(x):
                    # Add the log likelihood P(x|y), handling unseen values
                    likelihoods = self.feature_likelihoods[c][feature_idx]
                    posterior += np.log(likelihoods.get(feature_value, 1e-6))  # Small value for unseen cases

                posteriors[c] = posterior

            # Choose class with highest posterior probability
            predictions.append(max(posteriors, key=posteriors.get))

        return np.array(predictions)

In [9]:
if __name__ == "__main__":
    # Example dataset (categorical features: Weather (0=sunny, 1=rainy), Temperature (0=cold, 1=hot))
    X = np.array([
        [0, 1],  # sunny, hot
        [0, 0],  # sunny, cold
        [1, 1],  # rainy, hot
        [1, 0],  # rainy, cold
    ])
    y = np.array([1, 1, 0, 0])  # Play: yes=1, no=0

    nb = NaiveBayes()
    nb.fit(X, y)

    X_test = np.array([
        [0, 1],  # sunny, hot
        [1, 0],  # rainy, cold
    ])
    predictions = nb.predict(X_test)
    print("Predictions:", predictions)

Predictions: [1 0]
