Suppose we want to build a simple classifier that determines whether an email is spam or legitimate.
We define two possible classes:
\begin{align*}
\omega_1 &= \text{spam emails} \\ 
\omega_2 &= \text{legitimate emails}
\end{align*}
We also have a single feature $x$, the count of promotional words in an email (e.g., "free," "sale," "discount"). From prior data or expert knowledge, we know the prior probability of each class:
\begin{align}
P(\omega_1) = 0.4, \qquad P(\omega_2) = 0.6
\end{align}
Similarly, we assume that the class-conditional likelihood of $x$ follows a Gaussian (normal) distribution for both spam and legitimate emails, so

\begin{align*}
p(x \mid \omega_1) &= \mathcal{N}(x; \mu_1, \sigma_1^2) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) \tag{2}\\
p(x \mid \omega_2) &= \mathcal{N}(x; \mu_2, \sigma_2^2) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) \tag{3}
\end{align*}
Here, each distribution characterizes how the feature $x$ behaves in that class (e.g., spam emails might have a higher mean count of promotional words).
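
As a quick sanity check on the density formula in (2) and (3), the closed form can be evaluated directly and compared with `scipy.stats.norm.pdf`. This is only a minimal sketch: the helper name `gaussian_pdf` and the values of `mu` and `sigma` are illustrative assumptions, not something the text has fixed at this point.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Closed-form normal density with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

# Illustrative values only (assumed for this check)
mu, sigma = 8.0, 2.0
x = np.linspace(0.0, 16.0, 9)

manual = gaussian_pdf(x, mu, sigma)
# scipy parameterizes the normal by loc (mean) and scale (standard deviation, not variance)
library = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(manual, library))  # True
```

Note that `scale` is the standard deviation $\sigma$, whereas the $\mathcal{N}(\cdot, \sigma^2)$ notation is written in terms of the variance.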

So given this information, how would we determine the optimal decision rule that minimizes classification error?

Intuitively, we should choose whichever class is more probable given our observed feature. In mathematical terms:
$$
 \text{Decide } \omega_1 \text{ if } P(\omega_1\mid x) > P(\omega_2\mid x), \text{ otherwise decide } \omega_2
$$
But how do we calculate $P(\omega_j \mid x)$? Bayes' rule tells us:
$$
P(A \mid B) = \frac{P(B\mid A)P(A)}{P(B)} \implies P(\omega_j \mid x) = \frac{p(x \mid \omega_j)P(\omega_j)}{p(x)} \tag{4}
$$
We already know $P(\omega_j)$: these are the prior probabilities defined in (1). The likelihoods $p(x \mid \omega_j)$ are given by (2) and (3). That leaves the evidence $p(x)$. Since any email must belong to one of our two classes, the law of total probability gives:
$$
p(x) = \sum_{j=1}^2 p(x \mid \omega_j)P(\omega_j) = p(x \mid \omega_1)P(\omega_1) + p(x\mid \omega_2)P(\omega_2)
$$
The same expression for $p(x)$ also follows from (4) itself. Multiplying both sides by $p(x)$ and summing over the classes gives:
\begin{align*}
P(\omega_j \mid x)\,p(x) &= p(x \mid \omega_j)P(\omega_j) \\
p(x)\sum_{j=1}^2 P(\omega_j \mid x) &= \sum_{j=1}^2 p(x \mid \omega_j)P(\omega_j) \\
p(x) &= \sum_{j=1}^2 p(x \mid \omega_j)P(\omega_j) \qquad \qquad \left( \text{since }\sum_j P(\omega_j \mid x) = 1\right)
\end{align*}
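
One consequence is worth spelling out. Because $p(x)$ is the same for both classes, it cancels when the two posteriors are compared. The decision rule can therefore be written without the evidence term, either as a comparison of prior-weighted likelihoods or as a likelihood-ratio test (a sketch that assumes both priors are nonzero and $p(x \mid \omega_2) > 0$):
$$
P(\omega_1 \mid x) > P(\omega_2 \mid x) \iff p(x \mid \omega_1)P(\omega_1) > p(x \mid \omega_2)P(\omega_2) \iff \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}
$$
With the priors in (1), the threshold on the right-hand side is $0.6 / 0.4 = 1.5$. The evidence is still needed whenever the posterior values themselves, and not just the decision, are of interest.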

So now we can implement this decision-making process in four steps: (1) define the likelihood function for each class, (2) compute the evidence term as a prior-weighted sum of the likelihoods, (3) calculate the posterior probabilities using Bayes' rule, and (4) classify based on whichever posterior is larger.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(123)

#initial parameters

# Class 1: Spam emails (higher mean promotional word count)
mu1 = 8.0      # Average promotional words in spam
sigma1 = 2.0   # Standard deviation for spam
P_w1 = 0.4     # Prior probability of spam

# Class 2: Legitimate emails (lower mean promotional word count)
mu2 = 3.0      # Average promotional words in legitimate emails
sigma2 = 1.5   # Standard deviation for legitimate
P_w2 = 0.6     # Prior probability of legitimate

#Generate data
n_spam = 200
n_legit = 300

spam_emails = np.random.normal(mu1, sigma1, n_spam)
legit_emails = np.random.normal(mu2, sigma2, n_legit)

#(1) Define likelihood functions
def likelihood_spam(x):
    """p(x | omega_1): Likelihood (density value) of x promotional words given spam"""
    return norm.pdf(x, mu1, sigma1)

def likelihood_legit(x):
    """p(x | omega_2): Likelihood (density value) of x promotional words given legitimate"""
    return norm.pdf(x, mu2, sigma2)

#(2) Calculate evidence p(x)
def evidence(x):
    """p(x): Marginal density of x, the prior-weighted sum of the class likelihoods"""
    return likelihood_spam(x) * P_w1 + likelihood_legit(x) * P_w2

#(3) Calculate posterior probabilities using Bayes' Rule
def posterior_spam(x):
    """P(omega_1 | x): Probability that email is spam given x promotional words"""
    return (likelihood_spam(x) * P_w1) / evidence(x)

def posterior_legit(x):
    """P(omega_2 | x): Probability that email is legitimate given x promotional words"""
    return (likelihood_legit(x) * P_w2) / evidence(x)

#(4) Decision rule
def classify_email(x):
    """Classify email based on maximum posterior probability"""
    if posterior_spam(x) > posterior_legit(x):
        return "Spam"
    else:
        return "Legitimate"



In [2]:
# Test the classifier
print("=" * 70)
print("SPAM EMAIL CLASSIFIER - BAYESIAN DECISION THEORY")
print("=" * 70)
print("\nModel Parameters:")
print(f"  Spam emails:       μ₁ = {mu1}, σ₁ = {sigma1}, P(ω₁) = {P_w1}")
print(f"  Legitimate emails: μ₂ = {mu2}, σ₂ = {sigma2}, P(ω₂) = {P_w2}")
print("\n" + "=" * 70)

test_cases = [1, 3, 5, 7, 9, 11]
print("\nClassification Results:")
print("-" * 70)
print(f"{'Words':<8} {'p(x|ω₁)':<12} {'p(x|ω₂)':<12} {'P(ω₁|x)':<12} {'P(ω₂|x)':<12} {'Decision':<12}")
print("-" * 70)

for x in test_cases:
    p_x_given_spam = likelihood_spam(x)
    p_x_given_legit = likelihood_legit(x)
    p_spam_given_x = posterior_spam(x)
    p_legit_given_x = posterior_legit(x)
    decision = classify_email(x)
    
    print(f"{x:<8} {p_x_given_spam:<12.4f} {p_x_given_legit:<12.4f} {p_spam_given_x:<12.4f} {p_legit_given_x:<12.4f} {decision:<12}")

print("-" * 70)

SPAM EMAIL CLASSIFIER - BAYESIAN DECISION THEORY

Model Parameters:
  Spam emails:       μ₁ = 8.0, σ₁ = 2.0, P(ω₁) = 0.4
  Legitimate emails: μ₂ = 3.0, σ₂ = 1.5, P(ω₂) = 0.6


Classification Results:
----------------------------------------------------------------------
Words    p(x|ω₁)      p(x|ω₂)      P(ω₁|x)      P(ω₂|x)      Decision    
----------------------------------------------------------------------
1        0.0004       0.1093       0.0027       0.9973       Legitimate  
3        0.0088       0.2660       0.0215       0.9785       Legitimate  
5        0.0648       0.1093       0.2831       0.7169       Legitimate  
7        0.1760       0.0076       0.9392       0.0608       Spam        
9        0.1760       0.0001       0.9992       0.0008       Spam        
11       0.0648       0.0000       1.0000       0.0000       Spam        
----------------------------------------------------------------------
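
The table shows the decision flipping from "Legitimate" to "Spam" somewhere between 5 and 7 promotional words. A minimal sketch of how that crossover could be located numerically is given below. It restates the model parameters so that it runs on its own, and it works with log-densities for numerical stability. The function name `log_score_gap` and the use of `scipy.optimize.brentq` are choices made for this illustration, not part of the classifier above.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Same model parameters as in the cells above
mu1, sigma1, P_w1 = 8.0, 2.0, 0.4   # spam
mu2, sigma2, P_w2 = 3.0, 1.5, 0.6   # legitimate

def log_score_gap(x):
    """log[p(x|w1) P(w1)] - log[p(x|w2) P(w2)]; positive values favor 'Spam'."""
    spam_score = norm.logpdf(x, mu1, sigma1) + np.log(P_w1)
    legit_score = norm.logpdf(x, mu2, sigma2) + np.log(P_w2)
    return spam_score - legit_score

# The bracket [5, 7] comes from the table: 'Legitimate' at x = 5, 'Spam' at x = 7
boundary = brentq(log_score_gap, 5.0, 7.0)
print(f"Posteriors are equal near x = {boundary:.3f}")
```

Because the two classes have different variances, the gap is quadratic in $x$ and has a second root as well. Under these parameters that root falls at a negative $x$, so it does not matter for a word count.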


In [3]:
spam_predictions = [classify_email(x) for x in spam_emails]
legit_predictions = [classify_email(x) for x in legit_emails]

spam_correct = sum(1 for pred in spam_predictions if pred == "Spam")
legit_correct = sum(1 for pred in legit_predictions if pred == "Legitimate")

spam_accuracy = spam_correct / n_spam * 100
legit_accuracy = legit_correct / n_legit * 100
overall_accuracy = (spam_correct + legit_correct) / (n_spam + n_legit) * 100

print("\n" + "=" * 70)
print("CLASSIFIER PERFORMANCE")
print("=" * 70)
print(f"Spam emails correctly classified:       {spam_correct}/{n_spam} ({spam_accuracy:.1f}%)")
print(f"Legitimate emails correctly classified: {legit_correct}/{n_legit} ({legit_accuracy:.1f}%)")
print(f"Overall accuracy:                       {overall_accuracy:.1f}%")
print("=" * 70)


CLASSIFIER PERFORMANCE
Spam emails correctly classified:       178/200 (89.0%)
Legitimate emails correctly classified: 289/300 (96.3%)
Overall accuracy:                       93.4%
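
The figures above are estimates from one finite random sample. Under the assumed Gaussian model, the error rate of the maximum-posterior rule can also be approximated directly, as the integral of the smaller of the two prior-weighted likelihoods. The sketch below does this with a simple Riemann sum. The grid range and resolution are assumptions chosen for this illustration, and the result should be close to, but not identical to, the sampled accuracy.

```python
import numpy as np
from scipy.stats import norm

# Same model parameters as in the cells above
mu1, sigma1, P_w1 = 8.0, 2.0, 0.4   # spam
mu2, sigma2, P_w2 = 3.0, 1.5, 0.6   # legitimate

# Grid wide enough that both densities are negligible at the edges (assumed range)
x = np.linspace(-20.0, 30.0, 500_001)
dx = x[1] - x[0]

joint_spam = norm.pdf(x, mu1, sigma1) * P_w1
joint_legit = norm.pdf(x, mu2, sigma2) * P_w2

# At each x the rule picks the larger term, so the smaller one is the error mass there
bayes_error = np.sum(np.minimum(joint_spam, joint_legit)) * dx
model_accuracy = 1.0 - bayes_error

print(f"Model-based error:    {bayes_error:.4f}")
print(f"Model-based accuracy: {model_accuracy * 100:.1f}%")
```

If the data really follow the assumed distributions, no decision rule based on $x$ alone can achieve a lower expected error than this value.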
