# Lesson 6A: Naive Bayes Theory

Probabilistic classification using Bayes' Theorem - perfect for text and medical diagnosis!

## Introduction

Imagine a doctor diagnosing flu. They consider:

- P(flu | fever) = How likely is flu given you have fever?
- P(fever | flu) = How likely is fever if you have flu?
- P(flu) = Base rate of flu in population

Naive Bayes formalizes this reasoning using probability theory.

## Table of Contents

1. Bayes' Theorem
2. The "naive" assumption
3. Types of Naive Bayes
4. Math derivation
5. Implementation from scratch
6. Text classification example

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer, fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from collections import defaultdict

np.random.seed(42)
print('✅ Libraries loaded')

## Bayes' Theorem

**The foundation:**

$P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}$

Where:

- P(y|X) = Posterior: Probability of class y given features X
- P(X|y) = Likelihood: Probability of features X given class y
- P(y) = Prior: Base probability of class y
- P(X) = Evidence: Probability of the features (the same for every class, so it can be ignored when comparing classes)

**For classification:** Pick the class with the highest P(y|X)
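As a quick illustration of the formula, the doctor's reasoning from the introduction can be worked through numerically. This is a minimal sketch: the three probabilities below are hypothetical values chosen only for illustration, not real medical statistics, and `posterior` is a helper name introduced here.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(y | x) for a binary class, computed with Bayes' theorem."""
    # Evidence P(x) via the law of total probability over both classes
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers, for illustration only
p_flu = 0.05                  # P(flu): prior
p_fever_given_flu = 0.90      # P(fever | flu): likelihood
p_fever_given_healthy = 0.10  # P(fever | no flu)

p_flu_given_fever = posterior(p_flu, p_fever_given_flu, p_fever_given_healthy)
print(f"P(flu | fever) = {p_flu_given_fever:.3f}")
```

Even with a high likelihood P(fever | flu), the posterior stays moderate under these assumed numbers because the prior P(flu) is small.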

## The "Naive" Assumption

Assume features are **conditionally independent** given the class:

$P(X|y) = P(x_1|y) \cdot P(x_2|y) \cdots P(x_n|y) = \prod_{i=1}^{n} P(x_i|y)$

This is "naive" because features are usually correlated!

**Example:** Spam detection

- "Free" and "money" often appear together
- But we assume independence
- Surprisingly, it still works well!
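The factorization above can be sketched in a few lines for the spam example. The word likelihoods and class priors below are hypothetical values made up for illustration, and `log_score` is a helper name introduced here. The sketch sums logarithms rather than multiplying probabilities, which gives the same ranking of classes while avoiding underflow when there are many features.

```python
import math

# Hypothetical per-word likelihoods P(word | class), for illustration only
likelihoods = {
    "spam": {"free": 0.30, "money": 0.20, "meeting": 0.01},
    "ham":  {"free": 0.03, "money": 0.02, "meeting": 0.15},
}
priors = {"spam": 0.4, "ham": 0.6}  # hypothetical class priors P(y)

def log_score(words, label):
    """log P(y) + sum_i log P(x_i | y): the log posterior up to a constant."""
    score = math.log(priors[label])
    for w in words:
        # Independence assumption: each word contributes its own factor
        score += math.log(likelihoods[label][w])
    return score

message = ["free", "money"]
scores = {label: log_score(message, label) for label in priors}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```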

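## Types of Naive Bayes

**1. Gaussian NB:** Features are continuous and assumed normally distributed within each class, with a separate mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$ for each feature $i$ and class $y$

$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)$

**2. Multinomial NB:** Features are counts (text classification)

$P(x_i|y) = \frac{count(x_i, y) + \alpha}{\sum_j count(x_j, y) + \alpha n}$

Here $count(x_i, y)$ is how often feature $x_i$ occurs in class $y$, $\alpha$ is the smoothing parameter, and $n$ is the number of features (the vocabulary size for text).

**3. Bernoulli NB:** Binary features (word present/absent)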
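The Multinomial NB estimate with its smoothing term $\alpha$ can be sketched as follows. This is a minimal illustration on a made-up toy corpus; `smoothed_word_probs` and its parameters are names introduced here, not part of any library.

```python
from collections import Counter

def smoothed_word_probs(docs, vocab, alpha=1.0):
    """Estimate P(word | class) from the tokenized docs of one class,
    using additive (Laplace) smoothing with parameter alpha."""
    counts = Counter(w for doc in docs for w in doc if w in vocab)
    total = sum(counts.values())  # sum_j count(x_j, y)
    n = len(vocab)                # number of features (vocabulary size)
    return {w: (counts[w] + alpha) / (total + alpha * n) for w in vocab}

vocab = ["free", "money", "meeting"]
spam_docs = [["free", "money"], ["free", "free", "money"]]  # toy corpus
probs = smoothed_word_probs(spam_docs, vocab)
print(probs)
```

With `alpha=0`, the unseen word "meeting" would get probability 0 and wipe out the whole product of likelihoods; smoothing keeps every estimate strictly positive while the probabilities still sum to 1.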

In [None]:
class GaussianNB:
    """Gaussian Naive Bayes: one independent normal distribution per feature and class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = {}
        self.var = {}
        self.priors = {}
        for c in self.classes:
            X_c = X[y == c]                      # Samples belonging to class c
            self.mean[c] = X_c.mean(axis=0)      # Per-feature mean
            self.var[c] = X_c.var(axis=0)        # Per-feature variance
            self.priors[c] = len(X_c) / len(X)   # Class frequency as prior
        return self

    def _log_gaussian_prob(self, x, mean, var):
        # Compute the log density directly: taking np.log of the density
        # gives -inf whenever the exponential underflows to 0.
        eps = 1e-4  # Added to the variance to avoid division by zero
        var = var + eps
        return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

    def predict(self, X):
        preds = []
        for x in X:
            posts = []
            for c in self.classes:
                prior = np.log(self.priors[c])
                # Sum of per-feature log likelihoods (naive independence)
                likelihood = np.sum(self._log_gaussian_prob(x, self.mean[c], self.var[c]))
                posts.append(prior + likelihood)
            preds.append(self.classes[np.argmax(posts)])
        return np.array(preds)

print('✅ Gaussian Naive Bayes implemented!')
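The score that `predict` accumulates for each class is the log of the unnormalized posterior. A short derivation, writing $\mu_{y,i}$ and $\sigma_{y,i}^2$ for the mean and variance of feature $i$ in class $y$ (and ignoring the small constant added to the variance for numerical safety):

```latex
\begin{aligned}
\log P(y \mid X) &= \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) - \log P(X) \\
\log P(x_i \mid y) &= -\tfrac{1}{2}\log\left(2\pi\sigma_{y,i}^2\right) - \frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2} \\
\hat{y} &= \arg\max_y \left[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \right]
\end{aligned}
```

The term $\log P(X)$ is dropped in the last line because it is the same for every class and does not change which class scores highest.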

In [None]:
# Test on breast cancer
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

print(f'Accuracy: {accuracy_score(y_test, y_pred):.3f}')
print('\n✅ Naive Bayes from scratch works!')
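The `predict` method above returns only the highest-scoring class. If class probabilities are wanted, the per-class log scores can be normalized with the log-sum-exp trick. This is a minimal sketch: `normalize_log_scores` is a helper name introduced here, and the two log scores are hypothetical values, not output from the breast cancer run.

```python
import math

def normalize_log_scores(log_scores):
    """Turn unnormalized log posteriors into probabilities that sum to 1."""
    m = max(log_scores)
    # Subtracting the maximum before exponentiating avoids underflow;
    # the shift cancels out in the ratio below.
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

log_scores = [-1050.0, -1052.0]  # hypothetical log P(y) + sum_i log P(x_i | y)
probs = normalize_log_scores(log_scores)
print(probs)
```

Exponentiating scores this negative directly would give 0.0 for both classes, so the shift by the maximum is what makes the normalization usable with many features.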

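## Why Naive Bayes Works

Classification only needs the correct class to receive the highest score, not accurate probability values, so the model can still pick the right class when the independence assumption is violated. Despite the naive assumption, it is:

- Fast: training only estimates simple per-class statistics (means and variances for Gaussian NB, counts for Multinomial NB)
- Effective with small datasets, because there are few parameters to estimate
- Excellent for text classification
- Able to handle high dimensions, since each feature is modeled separately
- Probabilistic: it outputs class probabilities, though correlated features can make them overconfident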

## Conclusion

**Pros:**

- ✅ Fast training and prediction
- ✅ Works with small datasets
- ✅ Handles high dimensions
- ✅ Probabilistic predictions
- ✅ Excellent for text

**Cons:**

- ❌ Naive independence assumption
- ❌ Can't learn feature interactions
- ❌ Sensitive to zero probabilities

**Best for:** Spam detection, sentiment analysis, document classification

**Next:** Lesson 6B - Text classification with scikit-learn!