# Bayes' Theorem

* Frequentists and Bayesians
* Hume and Bayes

**Difference between conditional probability and Bayes' Theorem**<br />
https://www.vedantu.com/question-answer/difference-between-conditional-probability-class-12-maths-cbse-60c1f417caeb64334bec837d

$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$<br />
$P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$

Recall dependent intersection and multiplication rule:<br />
$P(A \cap B) = P(A) * P(B|A)$
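The two formulas above can be sketched as a pair of small Python functions. This is only an illustration: the function names `total_probability` and `bayes_posterior` and the input values are assumptions made for the example, not part of the original notes.

```python
def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) from the law of total probability over A and not-A."""
    return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a


def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    p_b = total_probability(p_a, p_b_given_a, p_b_given_not_a)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return p_a * p_b_given_a / p_b


# Hypothetical inputs chosen only for illustration:
# P(A) = 0.5, P(B|A) = 0.8, P(B|not A) = 0.2
p_b = total_probability(0.5, 0.8, 0.2)
posterior = bayes_posterior(0.5, 0.8, 0.2)
print(p_b, posterior)
```

The numerator `p_a * p_b_given_a` is the multiplication rule for $P(A \cap B)$, so the posterior is simply the share of $P(B)$ that comes from $A$.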

### Sample Problem

We have a box and a crate full of apples and oranges, 100 pieces of fruit in all: 40 apples (40% of our fruit) and 60 oranges (60% of our fruit). 70% of the apples are in the box, and 25% of the oranges are also in the box. You are at the market selling apples and oranges, and a customer picks a piece of fruit at random from the box. What's the probability that it is an apple?

**Bayes' Approach**

Using Bayes' Theorem we want to know the probability of picking an apple given that the fruit comes from the box. Let $A$ be the event that a fruit is an apple and $\overline{A}$ the event that it is an orange. Let $B$ be the event that a fruit is in the box and $\overline{B}$ the event that it is in the crate. Here's our equation once again:

$P(A|B) = \large{\frac{P(A)P(B|A)}{P(B)}}$

$P(A|B)$ reads "the probability of A given B", or the probability of an apple given the box.
* $P(A)$ = 0.40 (the number of apples / the number of fruit)
* $P(B|A)$ = 0.70 (the decimal representation of the percentage of apples in the box)

All we need now is $P(B)$:<br />
$P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$

We know that $(P(A) * P(B|A))$ = .40 * .70 = .28

What is $(P(\bar{A}) * P(B|\bar{A}))$?<br />
$P(\bar{A})$ = 1 - .40 = .60, the proportion of the fruit that are oranges<br />

What fraction of the oranges are in the box? $P(B|\bar{A})$ = .25<br />
So $(P(\bar{A}) * P(B|\bar{A}))$ = .60 * .25 = .15

$P(B)$ = .28 + .15 = .43

What is $P(A|B)$?

In [None]:
# answer
(.4 * .7) / .43

0.6511627906976744

### Tree Diagrams (Probability Trees)
https://www.mathsisfun.com/data/probability-events-conditional.html<br />
https://en.wikipedia.org/wiki/Tree_diagram_(probability_theory)

**Tree Approach**<br /><br />


<pre>
                  0.75  0.6 * 0.75 = 0.45
            crate/
                /
            0.6
           /    \
 oranges  /   box\
         /        0.25  0.6 * 0.25 = 0.15
        /
        \
         \        0.30  0.4 * 0.30 = 0.12
 apples   \ crate/
           \    /
            0.4
                \
              box\
                  0.70  0.4 * 0.70 = 0.28

</pre>

Our root node is our fruit, which splits into apples and oranges. Each apple and orange node then splits according to how many are in the crate and how many are in the box. 75% of the oranges are in the crate and 25% are in the box. 30% of the apples are in the crate and 70% are in the box (notice that the branches leaving each node add up to 100%). Multiplying along each path gives the share of all fruit at each leaf, so 57% of the fruit is in the crate (0.45 + 0.12) and 43% is in the box (0.15 + 0.28). Because there are 100 pieces of fruit, these shares are also counts: 45 of the 60 oranges are in the crate and 28 of the 40 apples are in the box. The box contains 15 oranges and 28 apples (43 items in all). The probability of getting an apple from the box is 28 / 43, or roughly 65%.
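The tree can also be walked in code: multiply along each path to get a leaf, then add up the leaves that end in the box. The sketch below uses the probabilities from the diagram; the dictionary layout and variable names are just one assumed way to represent the tree.

```python
# First split: fruit type. Second split: container, conditional on fruit type.
p_fruit = {"apple": 0.4, "orange": 0.6}
p_container_given_fruit = {
    "apple": {"box": 0.70, "crate": 0.30},
    "orange": {"box": 0.25, "crate": 0.75},
}

# Each leaf is P(fruit) * P(container | fruit), the product along one path.
leaves = {
    (fruit, container): p_fruit[fruit] * p_cond
    for fruit, branches in p_container_given_fruit.items()
    for container, p_cond in branches.items()
}

# The four leaves cover every outcome, so they should sum to 1.
total = sum(leaves.values())

# P(box) is the sum of the leaves that end in "box".
p_box = sum(p for (_, container), p in leaves.items() if container == "box")
p_apple_given_box = leaves[("apple", "box")] / p_box

print(round(total, 10), round(p_box, 2), round(p_apple_given_box, 4))
```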

### Contingency Table Using Conditions (A given B, not A and B)

In our apples and oranges discussion there are four numbers (that we will convert to a count rather than a percentage) that can be put into a contingency table as shown below:

<pre>
                 crate |   box   | total
                 -----------------------
      oranges     45   |   15    |  60
                 -----------------------
      apples      12   |   28    |  40
                 -----------------------
      total       57   |   43    | 100
</pre>

What percentage of fruit are apples? 40 / 100<br />
What percentage of fruit are in a box? 43 / 100<br />
What percentage of apples are in the box? 28 / 40<br />
What's the probability of picking an apple out of a box?

In [None]:
# What's the probability of picking an apple out of a box?
28 / 43

0.6511627906976745
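As a rough sketch, the same table can be held as nested dictionaries of counts, with the margins and the conditional probabilities read straight off it. The variable names here are assumptions for illustration.

```python
# Counts from the contingency table: rows are fruit, columns are containers.
counts = {
    "oranges": {"crate": 45, "box": 15},
    "apples": {"crate": 12, "box": 28},
}

# Row and column margins, plus the grand total.
row_totals = {fruit: sum(row.values()) for fruit, row in counts.items()}
col_totals = {c: sum(row[c] for row in counts.values()) for c in ("crate", "box")}
grand_total = sum(row_totals.values())

p_apple = row_totals["apples"] / grand_total                      # 40 / 100
p_box = col_totals["box"] / grand_total                           # 43 / 100
p_box_given_apple = counts["apples"]["box"] / row_totals["apples"]  # 28 / 40
p_apple_given_box = counts["apples"]["box"] / col_totals["box"]     # 28 / 43

print(p_apple, p_box, p_box_given_apple, p_apple_given_box)
```

Conditioning on the box means dividing by the box column total (43) rather than by the grand total (100), which is the difference between "A given B" and "A and B" in the heading above.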

## Sunrises

David Hume used the rising of the sun as his most famous example to expose the fundamental flaw in **inductive reasoning** and question its rational basis. He argued that our belief that the sun will rise tomorrow is based purely on **habit**, not logic.

---

### Hume's Two Types of Reasoning

Hume separated human knowledge into two categories:

1.  **Relations of Ideas (Deductive Reasoning):** These are logical truths, like mathematics or geometry, where the conclusion is absolutely certain and contained within the premises. Denying a deductive truth results in a **contradiction**.
2.  **Matters of Fact (Inductive Reasoning):** These are claims about the world based on experience. Their truth is **contingent** (it could be otherwise).

---

### The Problem of Induction and the Sunrise

Hume focused on how we predict future events, such as the sun rising:

* **The Inductive Claim:** Every day in the past, the sun has risen. Therefore, the sun will rise tomorrow.
* **The Flaw:** This argument relies on the hidden, unjustified assumption that **the future must resemble the past**. This assumption is often called the **Principle of the Uniformity of Nature**.

#### Why Deduction Fails

Hume pointed out that the opposite of the inductive claim—"The sun will **not** rise tomorrow"—is perfectly conceivable and **does not imply a logical contradiction**. Because the denial of the claim is possible, the claim itself cannot be proven by deductive reasoning.

#### Why Induction Fails

If we try to justify the belief that the future will resemble the past by appealing to experience, the argument becomes circular:

* **Premise:** The future will resemble the past because it has **always** done so in the past.
* **The Circularity:** This argument assumes the very principle it is trying to prove: that what happened in the past (the uniformity of nature) will continue to happen in the future.

---

### Hume's Conclusion

Hume concluded that our expectation that the sun will rise isn't a rational, reasoned certainty. Instead, it is a matter of **custom** and **habit**. Because we have constantly observed a sequence of events (sunrise following dawn), our mind is simply conditioned to anticipate that conjunction in the future. We *must* make these assumptions to live our lives and practice science, but we have no logical justification for them.

## Bayes' Response

**Thomas Bayes did not publish a direct response to David Hume's problem of induction.**

Bayes' most significant work, which contained **Bayes' Theorem**, was published posthumously in 1763 and did not mention Hume. However, his work was immediately recognized by his editor and friend, **Richard Price**, as providing a mathematical framework to address Hume's skepticism.

---

### The "Bayesian" Solution to Induction

While Hume questioned the *rational certainty* of induction, the later Bayesian school of thought provides a framework for its *probabilistic justification*:

* **Hume's Demand:** Hume asked for a **deductive** proof that the future must resemble the past, a demand that he showed cannot be met.
* **The Bayesian Framework:** Bayesianism shifts the focus from certainty to the coherent updating of **degrees of belief** (probabilities). Bayes' Theorem formalizes how a rational agent should update their belief in a hypothesis given new evidence.

$$P(\text{Hypothesis}|\text{Data}) \propto P(\text{Data}|\text{Hypothesis}) \cdot P(\text{Hypothesis})$$

### How Bayes' Theorem Addresses the Circularity

The theorem provides a clear, non-circular rule for learning:

* **Initial Belief (Prior):** You start with a **prior probability** for a hypothesis, such as $P(\text{Sun rises tomorrow})$. Even if this prior is very small (but not zero), it gives the hypothesis a starting chance.
* **Sequential Updating:** Every time the sun rises, that event serves as new **Data** that is fed back into Bayes' Theorem.
* **Coherent Learning:** The theorem dictates that a long string of consistent observations (like the sun rising millions of times) will cause the posterior probability of the hypothesis (the sun-rising mechanism is stable) to converge toward one.

Therefore, the belief that the sun will rise is justified not by a deductive proof of necessity, but by a mathematically sound and rigorous **rule for updating probability** based on accumulating evidence. The process of induction becomes the formal act of applying Bayes' Theorem repeatedly.
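A toy version of this repeated updating can be sketched as follows. Everything numeric here is an assumption made up for the illustration: the model compares only two hypotheses, a "stable" one under which the sun rises with probability 0.999 each day and an "unstable" one under which it rises with probability 0.5, and it starts from a small prior of 0.001 for "stable".

```python
def update(prior, p_data_given_h, p_data_given_not_h):
    """One application of Bayes' Theorem for a hypothesis H and its complement."""
    evidence = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    return prior * p_data_given_h / evidence


# Hypothetical numbers, chosen only to illustrate the mechanics.
P_SUNRISE_IF_STABLE = 0.999
P_SUNRISE_IF_UNSTABLE = 0.5
belief = 0.001  # small but non-zero prior that the mechanism is stable

history = [belief]
for day in range(30):  # observe 30 sunrises in a row
    belief = update(belief, P_SUNRISE_IF_STABLE, P_SUNRISE_IF_UNSTABLE)
    history.append(belief)

print(history[0], history[10], history[-1])
```

Each sunrise is roughly twice as likely under "stable" as under "unstable", so the belief rises with every observation and approaches one without ever reaching it. The result depends on the two hypotheses chosen, so this shows the mechanics of updating rather than settling Hume's question.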

### Bayesian Inference

* $P(A)$ a.k.a. the **prior**: A prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. https://en.m.wikipedia.org/wiki/Prior_probability
* $P(B)$ a.k.a. the **marginal likelihood**: A marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence. https://en.m.wikipedia.org/wiki/Marginal_likelihood
* $P(A|B)$ a.k.a. the **posterior**: This is what we want to know: the probability of a hypothesis given the observed evidence. https://en.m.wikipedia.org/wiki/Bayesian_inference
* $P(B|A)$ a.k.a. the **likelihood**: Indicates the compatibility of the evidence with the given hypothesis. https://en.m.wikipedia.org/wiki/Bayesian_inference

Bayesian inference. (February 8, 2022). In *Wikipedia*. https://en.m.wikipedia.org/wiki/Bayesian_inference

### Frequentist vs Bayesian

* A frequentist treats probability as the expected frequency of something occurring over a large number of experiments
* A Bayesian treats probability as a measure of belief based on prior knowledge, updated as evidence arrives
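One way to picture the two readings side by side is to simulate the apples-and-oranges setup many times and compare the observed frequency with the value from Bayes' Theorem. This is a minimal sketch: the helper `draw_fruit`, the seed, and the number of trials are assumptions for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def draw_fruit():
    """Draw one fruit: 40% apples, with 70% of apples and 25% of oranges in the box."""
    fruit = "apple" if random.random() < 0.4 else "orange"
    p_box = 0.70 if fruit == "apple" else 0.25
    container = "box" if random.random() < p_box else "crate"
    return fruit, container


# Frequency reading: repeat the experiment and count apples among box draws.
trials = 200_000
draws = [draw_fruit() for _ in range(trials)]
from_box = [fruit for fruit, container in draws if container == "box"]
freq_estimate = from_box.count("apple") / len(from_box)

# Bayes' Theorem reading: compute P(apple | box) directly.
bayes_value = (0.4 * 0.7) / (0.4 * 0.7 + 0.6 * 0.25)

print(round(freq_estimate, 3), round(bayes_value, 3))
```

With enough trials the long-run frequency settles close to the computed posterior, so in this simple setting the two views agree on the number even though they interpret it differently.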

The Master Algorithm: https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine-ebook/dp/B012271YB2

### Updating Our Prior

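The next day, the same customer comes to you wanting fruit. What's the probability that this same customer will want an apple from a box? This time yesterday's posterior, $P(A|B)$ = .65, becomes today's prior $P(A)$; the likelihoods $P(B|A)$ and $P(B|\overline{A})$ stay the same.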

https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/

$P(A|B) = \large{\frac{P(A)P(B|A)}{P(B)}}$

**First Pick**
* $A$ = apple
* $\overline{A}$ = orange
* $B$ = box
* $\overline{B}$ = crate
* $P(A)$ = .4
* $P(\overline{A})$ = .6
* $P(B|A)$ = .7
* $P(B|\overline{A})$ = .25
* $P(A)$ * $P(B|A)$ = .4 * .7 = .28
* $P(\overline{A})$ * $P(B|\overline{A})$ = .6 * .25 = .15

$P(B)$ = ($P(A)$ * $P(B|A)$) + ($P(\overline{A})$ * $P(B|\overline{A})$)<br />
$P(B)$ = (.4 * .7) + (.6 * .25) = .28 + .15 = .43<br />
$P(A|B)$ = .28 / .43 = .65 (rounded)

There is a 65% chance of picking an apple out of a box.

**Second Pick**
* $A$ = apple
* $\overline{A}$ = orange
* $B$ = box
* $\overline{B}$ = crate
* $P(A)$ = .65
* $P(\overline{A})$ = .35
* $P(B|A)$ = .7
* $P(B|\overline{A})$ = .25
* $P(A)$ * $P(B|A)$ = .65 * .7 = .455
* $P(\overline{A})$ * $P(B|\overline{A})$ = .35 * .25 = .0875

$P(B)$ = ($P(A)$ * $P(B|A)$) + ($P(\overline{A})$ * $P(B|\overline{A})$)<br />
$P(B)$ = (.65 * .7) + ((1 - .65) * .25) = (.65 * .7) + (.35 * .25) = .455 + .0875 = .5425<br />
$P(A|B)$ = .455 / .5425 = .84 (rounded)<br />
There is an 84% chance of the same customer wanting an apple out of a box.
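The two picks can be chained in a few lines by feeding the first posterior back in as the second prior. This is a sketch under the same numbers as above; the function name `posterior_apple` and its default arguments are assumptions for the example.

```python
def posterior_apple(prior_apple, p_box_given_apple=0.7, p_box_given_orange=0.25):
    """P(apple | box) for a given prior P(apple)."""
    p_box = prior_apple * p_box_given_apple + (1 - prior_apple) * p_box_given_orange
    return prior_apple * p_box_given_apple / p_box


first_pick = posterior_apple(0.4)           # prior is the share of apples
second_pick = posterior_apple(first_pick)   # yesterday's posterior is today's prior

print(round(first_pick, 2), round(second_pick, 2))
```

The code carries the unrounded first posterior (about .6512) into the second step, whereas the hand calculation uses the rounded .65; both give .84 after rounding to two places.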

## Examples Using Bayes Theorem

#### Example 1

It has started raining. It rains 10% of the time. We know that flooding can happen when it rains, but it is rare: flooding happens 1% of the time. 90% of the time that it floods, we see rain. What's the probability of a flood given rain?<br />

Review the equation:
* P(A|B) = (P(A)P(B|A)) / P(B)
* P(Flood|Rain) = (P(Flood)P(Rain|Flood)) / P(Rain)

What we know:
* P(A) = P(Flood) = .01
* P(B) = P(Rain) = .1
* P(B|A) = P(Rain|Flood) = .9

In [None]:
# solve
.01*.9/.1

0.09000000000000001

Expect flooding 9% of the time if it starts raining.

#### Example 2

A company makes clocks. 1% of the clocks are broken. The company tests for broken clocks, and the test flags a broken clock as broken 90% of the time. Unfortunately, the test also flags a working clock as broken 9.6% of the time. If a clock tests as broken, what's the probability that the clock is actually broken?

What we know:
* $P(A)$ = P(defective clock) = .01
* $P(\bar{A})$ = P(not defective clock) = .99
* $P(B|A)$ = P(test identifies defect| defective clock) = .9
* $P(B|\bar{A})$ = P(test identifies defect| not defective clock) = .096

Review the equation:
* P(A|B) = (P(A)P(B|A)) / P(B)
* P(defective clock| test identifies defect) = (P(defective clock) * P(test identifies defect| defective clock)) / P(B)

Solve for P(B):
* $P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$
* P(B) = (P(defective clock) * P(test identifies defect| defective clock)) + (P(not defective clock) * P(test identifies defect| not defective clock))
* = (.01 * .9) + (.99 * .096)

P(A|B) = (.01 * .9)/((.01 * .9) + (.99 * .096)) = .009/(.009+.09504) = 0.08651

In [None]:
# solve
(.01 * .9)/((.01 * .9) + (.99 * .096))

0.0865051903114187
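The low answer is easier to believe when the same numbers are turned into counts. The sketch below assumes a hypothetical batch of 100,000 clocks (the batch size is not from the problem) and counts how many flagged clocks are really broken.

```python
clocks = 100_000  # hypothetical batch size, chosen so the counts come out whole

defective = round(clocks * 0.01)          # clocks that are actually broken
working = clocks - defective              # clocks that are fine

true_positives = round(defective * 0.9)   # broken clocks the test flags
false_positives = round(working * 0.096)  # working clocks the test flags anyway

flagged = true_positives + false_positives
p_broken_given_flagged = true_positives / flagged

print(true_positives, false_positives, flagged, p_broken_given_flagged)
```

Because working clocks vastly outnumber broken ones, the false positives swamp the true positives, which is why fewer than 9% of flagged clocks are actually broken.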

#### Example 3

One hundred students enrolled in a course and the administration had to divide them into two classes. 60 students went into class 1 and 40 students went into class 2. We know that 2% of the students in class 1 are failing and 1.5% of the students in class 2 are failing. All one hundred students went on a field trip during midterm. One student was randomly chosen and asked how things were going. It turns out this student is failing. What's the probability that this student is from class 1?

What we know:
* $P(A)$ = P(student is in class 1) = .6
* $P(\bar{A})$ = P(student is in class 2) = .4
* $P(B)$ = P(failing)
* $P(B|A)$ = P(failing|class1) = .02
* $P(B|\bar{A})$ = P(failing|class2) = .015

Equation review for P(B):
* $P(B) = (P(A) * P(B|A)) + (P(\bar{A}) * P(B|\bar{A}))$
* P(B) = (0.6 * 0.02) + (0.4 * 0.015) = 0.012 + 0.006 = 0.018<br />

P(A|B) = (0.6 * 0.02) / ((0.6 * 0.02) + (0.4 * 0.015)) = 0.012 / 0.018 = 0.667 (rounded)

In [None]:
# solve
(.02 * 0.6) / ((.02 * 0.6) + (0.015 * 0.4))

0.6666666666666666
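To confirm that the answer is exactly two thirds and not a floating-point coincidence, the same calculation can be repeated with Python's `fractions.Fraction`. The variable names are assumptions for the example; the numbers are the ones given in the problem.

```python
from fractions import Fraction

# Priors: share of students in each class.
p_class1 = Fraction(60, 100)
p_class2 = Fraction(40, 100)

# Likelihoods: share of each class that is failing (2% and 1.5%).
p_fail_given_class1 = Fraction(2, 100)
p_fail_given_class2 = Fraction(15, 1000)

# Law of total probability, then Bayes' Theorem, in exact arithmetic.
p_fail = p_class1 * p_fail_given_class1 + p_class2 * p_fail_given_class2
p_class1_given_fail = p_class1 * p_fail_given_class1 / p_fail

print(p_fail, p_class1_given_fail)
```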