
# Introduction to Bayesian Inference

### Part 1: The Mathematical Foundation
Bayes' Rule allows us to update the probability of a hypothesis ($H$) based on new evidence ($E$).

$$
\begin{align}
P(H|E) & = \frac{P(E|H) \cdot P(H)}{P(E)}\\
& = \frac{P(E|H) \cdot P(H)}{P(E \cap H) + P(E \cap \neg H)}\\
& = \frac{P(E|H) \cdot P(H)}{P(E|H) \cdot P(H) + P(E|\neg H) \cdot P(\neg H)}
\end{align}
$$

* **P(H|E)**: Posterior - What we believe after seeing evidence.
* **P(E|H)**: Likelihood - How likely the evidence is if the hypothesis is true.
* **P(H)**: Prior - Our initial belief before seeing evidence.
* **P(E)**: Evidence - Total probability of the evidence occurring.
* **P(E|¬H)**: False positive rate - How likely the evidence is if the hypothesis is false (¬H, "not H").
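
As a minimal sketch, the last line of the derivation can be written as a small function. The name `bayes_posterior`, its parameter names, and the input numbers are illustrative assumptions, not part of the original notebook.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not_h):
    """Return P(H|E) using the expanded form of Bayes' Rule.

    prior                  -- P(H)
    likelihood             -- P(E|H)
    likelihood_given_not_h -- P(E|not H)
    """
    # Numerator: P(E|H) * P(H)
    joint_h = likelihood * prior
    # Second term of the denominator: P(E|not H) * P(not H)
    joint_not_h = likelihood_given_not_h * (1 - prior)
    # Denominator: P(E), by the law of total probability
    p_evidence = joint_h + joint_not_h
    return joint_h / p_evidence


# Hypothetical inputs, chosen only for illustration
example = bayes_posterior(prior=0.5, likelihood=0.8, likelihood_given_not_h=0.2)
print(f"P(H|E) = {example:.2f}")
```

When the evidence is equally likely under both hypotheses (`likelihood == likelihood_given_not_h`), the function returns the prior unchanged, which is a quick sanity check on the formula.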

### Part 2: The Rare Disease Paradox
We test for a disease that affects 1% of people. The test has 99% sensitivity (it catches 99% of true cases) but also a 5% false positive rate.

**Sensitivity** (also known as the true positive rate) is the probability that a test will yield a **positive** result given that the person or item actually has the condition or attribute being tested for.

For example, if a disease test has a sensitivity of 99%, it means that 99% of people who *actually have the disease* will test positive.

The **false positive rate** is the probability that the test incorrectly reports a positive result for someone who actually does *not* have the condition.

It is directly related to the **specificity** of a test. Specificity is the probability that a test will yield a *negative* result given that the person does NOT have the disease. In simpler terms:

**False Positive Rate = 1 - Specificity**

For example, if a test has a specificity of 90%, it means that 90% of healthy individuals will correctly test negative. The remaining 10% of healthy individuals will incorrectly test positive, which is the false positive rate.

Both sensitivity and specificity are properties of the test itself: they describe how it behaves on people who have the disease and on people who do not, regardless of how common the disease is.
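
The definitions above can be sketched from raw counts, assuming we had a validation study in which each person's true status is known. The function name `rates_from_counts` and the counts themselves are hypothetical, chosen to match the 90% specificity example.

```python
def rates_from_counts(true_pos, false_neg, true_neg, false_pos):
    """Compute test characteristics from the four outcome counts."""
    # Sensitivity: share of people WITH the condition who test positive
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of people WITHOUT the condition who test negative
    specificity = true_neg / (true_neg + false_pos)
    # False positive rate: share of people WITHOUT the condition who test positive
    false_pos_rate = false_pos / (false_pos + true_neg)
    return sensitivity, specificity, false_pos_rate


# Hypothetical study: 100 people with the condition, 1,000 without
sens, spec, fpr = rates_from_counts(true_pos=99, false_neg=1,
                                    true_neg=900, false_pos=100)
print(f"Sensitivity:         {sens:.0%}")
print(f"Specificity:         {spec:.0%}")
print(f"False positive rate: {fpr:.0%}")
```

Note that each rate is computed within a single group (sick or healthy), which is why none of them depends on how large one group is relative to the other.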

In [None]:
prior_disease = 0.01
sensitivity = 0.99
false_pos_rate = 0.05

p_evidence = (sensitivity * prior_disease) + (false_pos_rate * (1 - prior_disease))
posterior = (sensitivity * prior_disease) / p_evidence

print(f"Probability of disease given positive test: {posterior:.2%}")

**Annotation:**

This code block demonstrates how to calculate the posterior probability of having a rare disease using Bayes' Theorem. Let's break it down:

1.  **`prior_disease = 0.01`**: This variable sets the *prior probability* of a randomly selected person having the disease to 1%.
2.  **`sensitivity = 0.99`**: This is the test's *sensitivity*, representing the probability of a positive test result given that the person actually has the disease (P(E|H)).
3.  **`false_pos_rate = 0.05`**: This is the *false positive rate*, which is the probability of a positive test result given that the person does NOT have the disease (P(E|not H)).
4.  **`p_evidence = (sensitivity * prior_disease) + (false_pos_rate * (1 - prior_disease))`**: This calculates the *total probability of evidence* (P(E)), which is the probability of getting a positive test result overall. It accounts for both true positives and false positives.
5.  **`posterior = (sensitivity * prior_disease) / p_evidence`**: This is the application of Bayes' Theorem. It calculates the *posterior probability* (P(H|E)), which is the probability of actually having the disease given that the test result is positive.
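
One way to build confidence in this result is to check it by simulation. The sketch below is an assumption-laden illustration rather than part of the original notebook: `simulate_posterior`, the population size, and the seed are all hypothetical choices. It draws a random population, tests everyone, and measures what fraction of positive results belong to people who are actually sick.

```python
import random


def simulate_posterior(prior, sensitivity, false_pos_rate,
                       n_people=200_000, seed=0):
    """Estimate P(disease | positive test) by simulating n_people tests."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_positives = 0
    all_positives = 0
    for _ in range(n_people):
        # Each person has the disease with probability `prior`
        has_disease = rng.random() < prior
        # The chance of a positive result depends on the true status
        p_positive = sensitivity if has_disease else false_pos_rate
        if rng.random() < p_positive:
            all_positives += 1
            if has_disease:
                true_positives += 1
    return true_positives / all_positives


estimate = simulate_posterior(prior=0.01, sensitivity=0.99, false_pos_rate=0.05)
print(f"Simulated P(disease | positive): {estimate:.2%}")
```

The estimate should land close to the exact value from the formula (0.0099 / 0.0594, about 16.67%), with small differences due to sampling noise.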


### Part 3: Simple Spam Filtering
We use a **Naive Bayes** approach to combine the evidence from several suspicious words in one email.

In [None]:
def classify_spam(words, p_spam, word_probs_spam, word_probs_ham):
    p_ham = 1 - p_spam
    likelihood_spam = p_spam
    likelihood_ham = p_ham

    for word in words:
        # Using .get() with 0.1 as a default 'smoothing' value for unknown words
        likelihood_spam *= word_probs_spam.get(word, 0.1)
        likelihood_ham *= word_probs_ham.get(word, 0.1)

    return likelihood_spam / (likelihood_spam + likelihood_ham)

# Data
p_s = 0.3
w_spam = {'winner': 0.6, 'money': 0.5, 'urgent': 0.7}
w_ham = {'winner': 0.01, 'money': 0.05, 'urgent': 0.02}

test_email = ['winner', 'money', 'urgent']
result = classify_spam(test_email, p_s, w_spam, w_ham)
print(f"Spam Probability for {test_email}: {result:.2%}")

**Annotation:**

This code block implements a simplified Naive Bayes approach for classifying an email as spam or not spam, based on the presence of certain words. Let's break it down:

1.  **`def classify_spam(words, p_spam, word_probs_spam, word_probs_ham):`**
    *   This defines a function that takes the following arguments:
        *   `words`: A list of words from the email to be classified.
        *   `p_spam`: The prior probability that *any* email is spam.
        *   `word_probs_spam`: A dictionary where keys are words and values are the probabilities of those words appearing in a spam email (P(word|Spam)).
        *   `word_probs_ham`: A dictionary similar to `word_probs_spam`, but for ham (non-spam) emails (P(word|Ham)).

2.  **`p_ham = 1 - p_spam`**: Calculates the prior probability of an email being ham.

3.  **`likelihood_spam = p_spam`** and **`likelihood_ham = p_ham`**: Each running product starts at the prior probability of its class. Despite the names, these variables do not hold pure likelihoods: after the loop, each one equals P(class) multiplied by P(words|class), which is the unnormalized posterior (the numerator of Bayes' rule) for that class.

4.  **`for word in words:`**
    *   The loop visits each word in `words` (in the example call, the contents of `test_email`).
    *   **`likelihood_spam *= word_probs_spam.get(word, 0.1)`**: For each word, it multiplies the current `likelihood_spam` by the probability of that word appearing in a spam email. The `.get(word, 0.1)` call is a crude stand-in for smoothing: a word missing from `word_probs_spam` gets a default probability of 0.1 instead of 0, so a single unknown word cannot zero out the whole product. Because the same default is used for ham, a word unknown to both dictionaries leaves the final result unchanged.
    *   **`likelihood_ham *= word_probs_ham.get(word, 0.1)`**: Does the same multiplication for ham emails.

5.  **`return likelihood_spam / (likelihood_spam + likelihood_ham)`**: After processing all words, this normalizes the two products to give the posterior probability that the email is spam. This is Bayes' rule, P(Spam|Words) = P(Words|Spam) * P(Spam) / P(Words), with P(Words) expanded by the law of total probability as P(Words|Spam)P(Spam) + P(Words|Ham)P(Ham). The 'naive' part is that each P(Words|class) is computed as a product of per-word probabilities, which assumes the words are independent given the class.
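
Multiplying many small probabilities can underflow to zero on long emails. A common workaround, sketched here under the same assumptions as `classify_spam`, is to add logarithms instead and work with the log odds of spam versus ham. The function `classify_spam_log` and its `default` parameter are hypothetical names introduced for this illustration; the sketch also assumes every probability is strictly between 0 and 1.

```python
import math


def classify_spam_log(words, p_spam, word_probs_spam, word_probs_ham, default=0.1):
    """Same posterior as classify_spam, computed in log space."""
    # Start from the log prior odds: log P(Spam) - log P(Ham)
    log_odds = math.log(p_spam) - math.log(1 - p_spam)
    for word in words:
        # Each word shifts the log odds by its log likelihood ratio
        log_odds += math.log(word_probs_spam.get(word, default))
        log_odds -= math.log(word_probs_ham.get(word, default))
    # Convert log odds to a probability, avoiding overflow in exp()
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Same data as the cell above
p_s = 0.3
w_spam = {'winner': 0.6, 'money': 0.5, 'urgent': 0.7}
w_ham = {'winner': 0.01, 'money': 0.05, 'urgent': 0.02}

result_log = classify_spam_log(['winner', 'money', 'urgent'], p_s, w_spam, w_ham)
print(f"Spam Probability (log space): {result_log:.2%}")
```

For the three-word test email this agrees with `classify_spam` up to floating-point rounding; the difference only matters once the word list grows long enough for the plain products to underflow.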

### Part 4: Student Exercise
**Scenario:** A self-driving car sensor detects an obstacle.
* Prior Obstacle Probability: 2%
* Sensor Sensitivity: 95%
* False Positive Rate (Shadows): 10%

**Task:** Fill in the logic below.

In [1]:
def should_brake(p_obs, sense, fp_rate):
    # 1. Total Evidence: P(Detection)
    p_detection =

    # 2. Posterior: P(Obstacle | Detection)
    posterior =
    return posterior

prob_obstacle = should_brake(0.02, 0.95, 0.10)
print(f"Actual probability of obstacle: {prob_obstacle:.2%}")
# expected: Actual probability of obstacle: 16.24%

Actual probability of obstacle: 16.24%


### Part 5: Review & Answer Key

**1. The Base Rate Fallacy**
* **Question:** Why is the probability of disease after a positive result so low (about 16.7%) in Part 2, even though the test has 99% sensitivity?
* **Answer:** Because the disease is rare (1%). The 'noise' from the 99% of healthy people (even at a 5% error rate) outweighs the 'signal' from the 1% who are actually sick.
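
The signal-versus-noise argument is easier to see with head counts than with probabilities. The sketch below applies the Part 2 numbers to a hypothetical population of 10,000 people; the population size is an assumption made purely for illustration.

```python
# Same numbers as Part 2, applied to a hypothetical population of 10,000 people
population = 10_000
prior_disease = 0.01
sensitivity = 0.99
false_pos_rate = 0.05

sick = population * prior_disease               # people who have the disease
healthy = population - sick                     # people who do not
true_positives = sick * sensitivity             # sick people who test positive
false_positives = healthy * false_pos_rate      # healthy people who test positive

# Among everyone who tests positive, what share is actually sick?
posterior = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"Share of positives who are sick: {posterior:.2%}")
```

With these inputs there are about 99 true positives against about 495 false positives, so only 99 of 594 positive results (one in six) belong to someone who is sick.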

**2. The 'Naive' Assumption**
* **Question:** Why is it 'Naive' to assume words like 'Winner' and 'Money' are independent?
* **Answer:** In human language, words appear in patterns. If you see 'Winner', you are contextually more likely to see 'Prize' or 'Money'. Naive Bayes ignores these links to stay computationally efficient.

**3. Impact of Priors**
* **Question:** If we knew the patient was in a high-risk group (Prior = 50%), how much more should we trust a positive test?
* **Answer:** The posterior jumps from about 16.7% to about 95.2%. The test itself has not changed; only the prior has. Bayes' rule shows that our initial context (prior) is just as important as the new data (test result).
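
To see the effect of the prior directly, the Part 2 calculation can be repeated for several priors while holding the test fixed. This is a minimal sketch; the helper `posterior_given_positive` and the intermediate 10% prior are assumptions added for illustration.

```python
def posterior_given_positive(prior, sensitivity=0.99, false_pos_rate=0.05):
    """P(disease | positive test) for the Part 2 test at a given prior."""
    # Total probability of a positive result (true positives + false positives)
    p_positive = sensitivity * prior + false_pos_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Same test, three different starting beliefs
for prior in (0.01, 0.10, 0.50):
    post = posterior_given_positive(prior)
    print(f"prior = {prior:.0%} -> posterior = {post:.2%}")
```

The three posteriors come out at roughly 16.67%, 68.75%, and 95.19%, so the same positive result supports very different conclusions depending on where we started.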