# Bayesian Theory

![Bayes Theorem](https://upload.wikimedia.org/wikipedia/commons/1/18/Bayes%27_Theorem_MMB_01.jpg)

_Bayes Theorem (Source [commons.wikimedia.org](https://commons.wikimedia.org/wiki/File:Bayes%27_Theorem_MMB_01.jpg))_

## Introduction

**Bayes' theorem** is a fundamental concept in probability theory and statistics, playing an important role in various domains such as:

- 🔬 Scientific discovery
- 💻 Machine learning and AI
- ⛏️ Treasure hunting, as demonstrated by [Tommy Thompson's](https://projectnile.in/2021/04/04/bayes-theorem-in-the-face-of-reality/) success using Bayesian search tactics in the 1980s

Bayes' Theorem helps update our beliefs based on new evidence. This is important because it allows us to make more accurate predictions and judgments by incorporating prior knowledge.

The mathematical representation for Bayes' Theorem is:

$$P(H|E) = \frac{P(H)P(E|H)}{P(E)}$$

- $P(H)$: Probability that a **hypothesis** is true before any evidence is available.
- $P(E|H)$: Probability of observing the evidence given that the **hypothesis** is true.
- $P(E)$: Probability of observing the evidence.
- $P(H|E)$: Probability that the **hypothesis** is true given some evidence.

Let's use Budi, a **shy and meticulous** individual, as an example. We're contemplating whether Budi is more likely to be a **👨‍🔬 librarian** or a **👨‍🌾 farmer**, since people in both occupations can exhibit these characteristics. People often lean on stereotypes and, in this case, assume Budi is a **librarian**.

- 👨‍🔬 90% [📚📚📚📚📚📚📚📚📚🌾] 10%

However, this disregards the fact that farmers outnumber librarians in Indonesia by a ratio of **20:1**. We should consider the significantly larger population of farmers. Thus, even if "shy and meticulous" seems to lean more towards a librarian, the number of farmers who could also be "shy and meticulous" is high, keeping the probability of Budi being a farmer rather high.

**1** [📚🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾🌾] **20**

Therefore, even if only 10% of farmers fit Budi's description compared to 40% of librarians, there would still be more farmers than librarians who do.

The table below represents a survey of a sample of 🌾 **200** farmers and 📚 **10** librarians, and the number of each who are "shy and meticulous".

| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |

If 40% of librarians (4 librarians) and 10% of farmers (20 farmers) fit the description, the probability that a random individual fitting the description is a librarian is:

| - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| 📚  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| 📚  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| 📚  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  | -  |
| 📚  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  | 🌾  |

$$P(\text{Librarian} \mid \text{description}) = \frac{4}{4 + 20} \approx 16.7\%$$

Therefore, even if you believe a librarian is four times as likely as a farmer to fit this description, it's still more probable that Budi is a farmer, because farmers are so much more numerous. This example demonstrates how Bayes' theorem can help make more rational judgments by taking into account all relevant probabilities.
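The counting argument above can be reproduced in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the numbers are the survey figures from the example (200 farmers, 10 librarians, 10% and 40% fitting the description).

```python
# Survey counts from the example above.
farmers = 200
librarians = 10

# Percentage of each group that is "shy and meticulous" (the example's estimates).
pct_farmers_fit = 10
pct_librarians_fit = 40

# Number of people in each group who fit the description.
farmers_fit = farmers * pct_farmers_fit // 100           # 20 farmers
librarians_fit = librarians * pct_librarians_fit // 100  # 4 librarians

# Among everyone who fits the description, what fraction are librarians?
p_librarian = librarians_fit / (librarians_fit + farmers_fit)
print(f"P(librarian | description) = {p_librarian:.1%}")  # 16.7%
```

Restricting attention to the 24 people who fit the description is what "conditioning on the evidence" means here.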

**Key Takeaway**: The essence of Bayes' theorem is that it doesn't form beliefs based on new evidence alone; instead, it updates pre-existing beliefs with that new evidence. This approach to evidence effectively narrows down the possibilities and highlights the importance of considering all relevant ratios.

![](https://storage.googleapis.com/rg-ai-bootcamp/toolkits/heart-of-bayes-theorem-min.png)

(Interactive created by Reddit user Thoggalluth: [Link](https://www.skobelevs.ie/BayesTheorem/))

If you understand this concept, congratulations! You've grasped the heart of Bayes' theorem. The specific numbers you estimate may vary, but the crucial part is how you use these numbers to update a belief based on evidence. Now, consider generalizing this process and expressing it as a formula.

## Formulating a General Rule

Bayes' theorem applies when we have:

- A hypothesis (e.g., Budi is a librarian)
- Some evidence (e.g., Budi is a shy, meticulous individual)
- The goal of determining the probability that the hypothesis is true given the evidence. This is represented as $P(H|E)$, where "H" is the hypothesis and "E" is the evidence.

![](https://storage.googleapis.com/rg-ai-bootcamp/toolkits/bayesian-rule-min.png)

Let's break down the components of Bayes' theorem, which describes how to update the probability of a hypothesis when given evidence, using the Budi example:

1. **Goal**: The goal is to calculate the probability of the hypothesis being true given the evidence, denoted as $P(H∣E)$.
2. **Prior**: The Prior is the initial probability of the hypothesis before we have any evidence, denoted as $P(H)$. In the example, this is 1/21.
3. **Likelihood**: The Likelihood is the probability of the evidence given the hypothesis is true, denoted as $P(E∣H)$.
4. **Negation**: $P(¬H)$ is the probability of the hypothesis not being true. In this case, it's the probability of Budi not being a librarian, which is 20/21.
5. **Calculation**: The calculation is made using Bayes' theorem to find the probability of our hypothesis given the evidence, denoted as $P(H∣E)$. This is known as the Posterior probability.
6. **Posterior**: The Posterior is the updated probability of the hypothesis being true after considering the evidence. It is calculated using the formula:

   $$P(H|E) = \frac{P(H)P(E|H)}{P(E)}$$

   Here, $P(E)$ is calculated as $P(H)P(E∣H)+P(¬H)P(E∣¬H)$. It is the total probability of the evidence.

7. **Belief Modification**: The process of updating our belief about the hypothesis after observing the evidence is called belief modification. Bayes' theorem provides a systematic way to do this.

In conclusion, Bayes' theorem is a powerful tool used in various fields, including science and artificial intelligence, to update probabilities given new data or evidence.

Imagine we're at a large social gathering, and we're playing a game of guessing occupations. Budi, a person whom we've just met, is our current subject.

Our Hypothesis (H) is that Budi is a librarian. Since there are 21 people at the party and only one is a librarian, our initial belief or Prior, $P(H)$, is 1 in 21. Before learning anything else, our best guess is therefore that Budi is not a librarian.

As we talk to Budi, we notice that he is both shy and meticulous, traits we associate more with librarians. This is our Evidence (E). The likelihood, $P(E∣H)$, is the probability that Budi is shy and meticulous given that he is a librarian, which we might estimate as 0.4 based on our experiences.

But we also need to consider the Negation, the probability that Budi is not a librarian, $P(¬H)$, which is 20 in 21. Some non-librarians are shy and meticulous too, so let's say that the probability that a non-librarian is shy and meticulous, $P(E∣¬H)$, is 0.1.

Now, armed with our observations and initial assumptions, we want to calculate the Posterior probability, $P(H∣E)$, the probability that Budi is a librarian given that he is shy and meticulous. We use Bayes' theorem for this:

$$P(H|E) = \frac{P(H)P(E|H)}{P(H)P(E|H) + P(\neg H)P(E|\neg H)}$$

Substitute the values:

$$P(H|E) = \frac{(1/21) \times 0.4}{(1/21) \times 0.4 + (20/21) \times 0.1} = \frac{4}{4+20} = \frac{4}{24} \approx 16.7\%$$

So, given that Budi is shy and meticulous, there's a 16.7% chance that he is a librarian. This demonstrates the process of updating our belief about the hypothesis (Budi being a librarian) after observing the evidence (Budi is shy and meticulous). The game becomes much more interesting with Bayes' theorem in our toolkit!
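The same substitution can be sketched as a small Python function. The function name `posterior` and its parameter names are illustrative choices, not part of any library. It computes $P(E)$ with the law of total probability and uses exact fractions so that the result matches the hand calculation.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_if_not):
    """Return P(H|E) given P(H), P(E|H) and P(E|not H)."""
    # Total probability of the evidence: P(H)P(E|H) + P(not H)P(E|not H).
    evidence = prior * likelihood + (1 - prior) * likelihood_if_not
    return prior * likelihood / evidence


# Values from the party example: 1 librarian among 21 guests,
# P(E|H) = 0.4 and P(E|not H) = 0.1.
p = posterior(Fraction(1, 21), Fraction(4, 10), Fraction(1, 10))
print(p)                  # 1/6
print(f"{float(p):.1%}")  # 16.7%
```

Passing plain floats instead of `Fraction` objects also works, at the cost of ordinary floating-point rounding.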

Bayes' theorem is also commonly written with generic event names:

$$P(A∣B)= \frac{P(B∣A)P(A)}{P(B)}$$

This is the standard form of Bayes' theorem, just with different variable names. It describes the probability of an event $A$ given that another event $B$ has happened (expressed as $P(A∣B)$), in terms of the probability of event $B$ given that event $A$ has happened (expressed as $P(B∣A)$), the marginal probability of event $A$ (expressed as $P(A)$), and the marginal probability of event $B$ (expressed as $P(B)$).

Here's what each term represents:

- $P(A∣B)$ is the posterior probability of $A$ occurring given that $B$ has occurred.
- $P(B∣A)$ is the likelihood, which is the probability of $B$ occurring given that $A$ has occurred.
- $P(A)$ is the prior probability of $A$ occurring.
- $P(B)$ is the evidence, which is the total probability of $B$ occurring.

This formula is used across a variety of fields, including statistics, computer science, machine learning, and artificial intelligence for reasoning under uncertainty. It provides a way to update the probability of a hypothesis (in this case, event $A$) based on the evidence (in this case, event $B$).

## What Does It Mean When $P(A)+P(B)>1$?

Remember, the probabilities of all mutually exclusive outcomes in a scenario must sum to 1. So what does it mean if $P(A)+P(B)>1$? It implies that the events A and B are not mutually exclusive: there is some overlap. In other words, some outcomes are being counted in both $P(A)$ and $P(B)$.

Consider the case of our friend, Budi:

- **Event A**: Budi is a librarian
- **Event B**: Budi enjoys reading

These two events are not mutually exclusive. Budi could be both a librarian and someone who enjoys reading. If the probability of Budi being a librarian ($P(A)$) is 0.7 or **70%** and the probability of him enjoying reading ($P(B)$) is 0.9 or **90%**, their **sum is 1.6, which is greater than 1**. This doesn't mean our probabilities are incorrect; rather, it reflects the overlap between A and B.

![Simple Venn diagram](https://i.pinimg.com/564x/98/be/fd/98befd90cab1bb6e1a3324f5be906182.jpg)

_Simple Venn diagram (Source: [creately.com](https://creately.com/diagram/example/gat6r5b23/simple-venn-diagram))_
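To make the overlap concrete, the sketch below computes how large the intersection $P(A \cap B)$ can and must be for the numbers above. It relies only on two standard facts: $P(A \cup B) = P(A) + P(B) - P(A \cap B) \le 1$, and the overlap cannot exceed either event's probability. The helper name `overlap_bounds` is hypothetical.

```python
from fractions import Fraction


def overlap_bounds(p_a, p_b):
    """Return the smallest and largest possible values of P(A and B)."""
    # P(A or B) = P(A) + P(B) - P(A and B) cannot exceed 1.
    lowest = max(Fraction(0), p_a + p_b - 1)
    # The overlap cannot be larger than either event on its own.
    highest = min(p_a, p_b)
    return lowest, highest


low, high = overlap_bounds(Fraction(7, 10), Fraction(9, 10))
print(f"P(A and B) lies between {float(low)} and {float(high)}")
```

With $P(A)=0.7$ and $P(B)=0.9$, at least 60% of the probability must be shared by both events, which is why the sum can exceed 1 without any contradiction.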

**How Bayes' Theorem Resolves the Issue**

The beauty of Bayes' theorem is that it allows us to update our beliefs based on new information. In this case, we might be interested in the probability that Budi is a librarian given that he enjoys reading, represented as $P(A∣B)$.

Let's plug some numbers into the formula. Suppose we know the following probabilities based on historical data or some other source of information:

- Probability that Budi is a librarian, $P(A)=0.7$
- Probability that Budi likes to read, $P(B)=0.9$
- Probability that Budi likes to read given that he is a librarian, $P(B∣A)=0.95$

We want to know the probability that Budi is a librarian given that he likes to read, or $P(A∣B)$. According to Bayes' theorem, this is calculated as:

$$P(A∣B)= \frac{P(B∣A)P(A)}{P(B)}$$

Substituting in the known values:

$$P(A∣B)=\frac{(0.95)(0.7)}{0.9} \approx 0.74$$

Therefore, if we know that Budi likes to read, the probability that he is a librarian is about **0.74**, or **74%**.
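As a quick check of this arithmetic, here is a minimal Python sketch of the $P(A∣B)$ form of the theorem. The function name `bayes` and the variable names are illustrative, and the inputs are the three probabilities assumed above.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_a = 0.7           # P(A): Budi is a librarian
p_b = 0.9           # P(B): Budi likes to read
p_b_given_a = 0.95  # P(B|A): Budi likes to read, given he is a librarian

# Joint probability that Budi is a librarian AND likes to read.
p_a_and_b = p_b_given_a * p_a
print(f"P(A and B) = {p_a_and_b:.3f}")  # 0.665

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(f"P(A|B) = {p_a_given_b:.2f}")    # 0.74
```

The intermediate value $P(A \cap B) = 0.665$ is the overlap between the two events, so the update from 0.70 to roughly 0.74 comes entirely from how much of the "likes to read" region is occupied by librarians.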

Bayesian statistics helps us refine our predictions about Budi, such as suggesting tailored career resources. For example, with the updated probability that Budi is a librarian, we could recommend librarian-specific material instead of general reading. This way, Bayesian statistics enables us to update our beliefs systematically with new evidence, enhancing practical applications like recommender systems.