<img src="./images/banner.png" width="800">

# Bayesian Inference: Understanding the Probabilistic Approach

Imagine you're a detective trying to solve a mystery. As you gather clues, your suspicions about what happened change and evolve. This process of updating your beliefs as new evidence comes to light is the essence of **Bayesian inference**.


Bayesian inference is a powerful approach to statistical reasoning that allows us to:

- **Incorporate prior knowledge** into our analyses
- **Update our beliefs** as we collect new data
- **Quantify uncertainty** in a natural and intuitive way


Named after Thomas Bayes, an 18th-century statistician, Bayesian inference has become increasingly popular in recent years, thanks to advances in computational power and its ability to handle complex, real-world problems.


In today's data-driven world, Bayesian inference is more relevant than ever:

1. **Handling Uncertainty**: It provides a framework for making decisions under uncertainty, crucial in fields like finance, healthcare, and AI.

2. **Flexibility**: Bayesian methods can handle small datasets and complex models where traditional methods might fail.

3. **Interpretability**: Results are often more intuitive, expressing probabilities of hypotheses rather than abstract p-values.

4. **Continuous Learning**: The Bayesian approach naturally incorporates new information, making it ideal for adaptive systems and online learning.


In this lecture, we'll dive deep into the world of Bayesian inference:

- The fundamental concepts and thinking behind the Bayesian approach
- How Bayes' Theorem works and why it's so powerful
- The step-by-step process of Bayesian inference
- Practical examples to illustrate these concepts
- Applications in machine learning and data science


By the end of this lecture, you'll have a solid understanding of how Bayesian inference works and why it's become an essential tool in the modern data scientist's toolkit.


So, put on your detective hat, and let's embark on this journey of probabilistic reasoning and updated beliefs!

**Table of contents**<a id='toc0_'></a>    
- [Fundamentals of Bayesian Thinking](#toc1_)    
  - [The Bayesian Framework](#toc1_1_)    
- [Bayes' Theorem: The Heart of Bayesian Inference](#toc2_)    
  - [Components: Prior, Likelihood, and Posterior](#toc2_1_)    
- [The Bayesian Inference Process](#toc3_)    
  - [Step 1: Defining the Prior](#toc3_1_)    
  - [Step 2: Specifying the Likelihood](#toc3_2_)    
  - [Step 3: Calculating the Posterior](#toc3_3_)    
  - [Step 4: Making Inferences](#toc3_4_)    
  - [Putting It All Together](#toc3_5_)    
- [Practical Examples](#toc4_)    
  - [Simplified Coin Flip Example](#toc4_1_)    
  - [Medical Diagnosis Example](#toc4_2_)    
  - [Key Takeaways from These Examples](#toc4_3_)    
- [Conclusion](#toc5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Fundamentals of Bayesian Thinking](#toc0_)

In the Bayesian world, probability isn't just about coin flips and dice rolls. It's a way to quantify our uncertainty about the world. Here's how to think about it:

- **Subjective Probability**: Unlike the frequentist view, which sees probability as long-term frequency, Bayesians view probability as a *degree of belief*.

- **Uncertainty Quantification**: Probability becomes a tool to express how sure (or unsure) we are about something.

- **Dynamic Beliefs**: These probabilities can change as we gather new information.

**Example**: You might say, "I'm 70% sure it will rain tomorrow." This isn't based on it raining on 70% of all tomorrows, but on your current belief given the information you have.


### <a id='toc1_1_'></a>[The Bayesian Framework](#toc0_)


The Bayesian framework is built on a few key ideas:

1. **Prior Beliefs**: We start with what we already know (or think we know). This is called the *prior*.

2. **New Evidence**: We collect data or make observations. The *likelihood* describes how probable that data is under each possible hypothesis.

3. **Updated Beliefs**: We combine our prior beliefs with the new evidence to form our *posterior* beliefs.

4. **Continuous Learning**: This process can be repeated, with today's posterior becoming tomorrow's prior.


This framework is often represented mathematically as:

$$ \text{Posterior} \propto \text{Prior} \times \text{Likelihood} $$


Where $\propto$ means "proportional to".


**Key Principles**:


- **Unknown parameters are treated as random variables**: Rather than reporting a single fixed estimate, we describe our uncertainty about a parameter with a probability distribution over its possible values.

- **Conditioning on known information**: We always work with probabilities that are conditional on what we know.

- **Coherence**: Our beliefs should be logically consistent with each other.


**Example in Action**:
Imagine you're guessing the skill of a basketball player:

1. *Prior*: Based on the league average, you think they might make about 50% of their shots.
2. *New Data*: You watch them make 8 out of 10 shots in practice.
3. *Posterior*: You update your belief, now thinking they're probably better than average.


This simple example encapsulates the essence of Bayesian thinking: starting with a belief, observing evidence, and updating our belief accordingly.
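To make the update concrete, here is a minimal sketch of the basketball example using a grid approximation of "posterior ∝ prior × likelihood". The text does not say how strongly we hold the 50% prior, so the Beta(5, 5)-shaped prior below is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
# Grid approximation of "posterior ∝ prior × likelihood" for the basketball example.
# Assumption (not from the text): the prior is shaped like Beta(5, 5), centred on 50%.

def grid_posterior(made, missed, prior_a=5, prior_b=5, n_points=1000):
    """Return (grid, normalised posterior weights) for a shooting percentage theta."""
    # Midpoints of n_points equal-width bins on (0, 1)
    grid = [(i + 0.5) / n_points for i in range(n_points)]
    # Unnormalised prior: theta^(a-1) * (1 - theta)^(b-1)
    prior = [t ** (prior_a - 1) * (1 - t) ** (prior_b - 1) for t in grid]
    # Likelihood of the observed makes and misses (constant factor omitted)
    likelihood = [t ** made * (1 - t) ** missed for t in grid]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]

def weighted_mean(grid, weights):
    return sum(t * w for t, w in zip(grid, weights))

grid, prior_only = grid_posterior(made=0, missed=0)  # no data: just the prior
_, posterior = grid_posterior(made=8, missed=2)      # after 8 makes in 10 shots

prior_mean = weighted_mean(grid, prior_only)
posterior_mean = weighted_mean(grid, posterior)
print(f"Prior mean:     {prior_mean:.2f}")
print(f"Posterior mean: {posterior_mean:.2f}")
```

Under this assumed prior the estimate moves from 50% towards the observed 80%, but not all the way: the prior still pulls the posterior back towards the league average. A weaker prior would move further; a stronger one would move less.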


Understanding these fundamentals sets the stage for diving deeper into how Bayesian inference works in practice. Next, we'll explore the mathematical heart of this approach: Bayes' Theorem.

## <a id='toc2_'></a>[Bayes' Theorem: The Heart of Bayesian Inference](#toc0_)

Bayes' Theorem is the mathematical foundation of Bayesian inference. It describes how to update the probability of a hypothesis in light of new evidence, using prior knowledge of conditions related to that hypothesis. It is named after the Reverend Thomas Bayes, who formulated it in the 18th century. Here's the theorem in its simplest form:

$$ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} $$


<img src="./images/tmp/bayesian.png" width="800">

Where:
- $P(A|B)$ is the probability of A given B (posterior)
- $P(B|A)$ is the probability of B given A (likelihood)
- $P(A)$ is the probability of A (prior)
- $P(B)$ is the probability of B (evidence)


**Intuitive Explanation**:
Imagine you're a doctor diagnosing a rare disease. Bayes' Theorem helps you update your belief about whether a patient has the disease based on test results.


<img src="./images/tmp/bayesian-ai-system.png" width="800">

### <a id='toc2_1_'></a>[Components: Prior, Likelihood, and Posterior](#toc0_)


Let's break down the key components of Bayes' Theorem:

1. **Prior - $P(A)$**
   - This is our initial belief before seeing new evidence.
   - It represents what we know (or assume) beforehand.
   - Example: The general prevalence of the disease in the population.

2. **Likelihood - $P(B|A)$**
   - This is the probability of seeing the evidence if our hypothesis is true.
   - It relates our hypothesis to observable data.
   - Example: The probability of a positive test result given that the patient has the disease.

3. **Posterior - $P(A|B)$**
   - This is our updated belief after considering the new evidence.
   - It's what we're ultimately interested in calculating.
   - Example: The probability that the patient has the disease, given a positive test result.

4. **Evidence - $P(B)$**
   - This is the probability of seeing the evidence, regardless of whether our hypothesis is true.
   - It acts as a normalizing constant.
   - Example: The overall probability of getting a positive test result.


**Putting It All Together**:

Let's use our medical diagnosis example:
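- Prior: 1% of the population has the disease, so $P(\text{Disease}) = 0.01$.
- Likelihood: The test is 95% accurate for both positive and negative results, so $P(\text{Positive|Disease}) = 0.95$ and $P(\text{Positive|No Disease}) = 0.05$.
- Evidence: A patient tests positive.


Applying Bayes' Theorem, with $P(\text{Positive})$ expanded over the diseased and healthy cases: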

$$ P(\text{Disease|Positive}) = \frac{P(\text{Positive|Disease}) \times P(\text{Disease})}{P(\text{Positive})} $$

$$ = \frac{0.95 \times 0.01}{(0.95 \times 0.01) + (0.05 \times 0.99)} \approx 0.16 $$


This means that even with a positive test result, there's only about a 16% chance the patient has the disease. This counterintuitive result demonstrates the power of Bayes' Theorem in handling real-world probabilities.
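The same calculation can be written as a small function. This is a minimal sketch for a yes/no hypothesis; the function and argument names are hypothetical and introduced only for illustration.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) for a yes/no hypothesis via Bayes' Theorem."""
    # Evidence term: total probability of the evidence over both cases
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / evidence

p_disease_given_positive = posterior_probability(
    prior=0.01,                # P(Disease)
    p_evidence_if_true=0.95,   # P(Positive | Disease)
    p_evidence_if_false=0.05,  # P(Positive | No Disease)
)
print(f"P(Disease | Positive) = {p_disease_given_positive:.3f}")  # 0.161
```

Changing only the prior shows how much the base rate matters: with `prior=0.10` the same test result gives a posterior of roughly 68%.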


Understanding these components and how they interact in Bayes' Theorem is crucial for applying Bayesian inference to practical problems. In the next section, we'll walk through the step-by-step process of applying this knowledge in Bayesian inference.

## <a id='toc3_'></a>[The Bayesian Inference Process](#toc0_)

The Bayesian inference process is like updating a mental model as new information comes in. Let's break it down step-by-step:


### <a id='toc3_1_'></a>[Step 1: Defining the Prior](#toc0_)


This is where we formalize our initial beliefs:

- **What**: The prior is our belief about the parameter(s) of interest before seeing the data.
- **How**: We express this as a probability distribution.
- **Example**: If we're estimating the fairness of a coin, we might start with a prior that it's probably fair, but we're not certain.


**Key Point**: The prior can be:
- *Informative*: Based on previous studies or expert knowledge.
- *Uninformative* or *Flat*: When we have little prior knowledge.
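As a rough illustration of the difference, the sketch below compares two priors for a coin's probability of heads. Using the Beta family here is an assumption (it is a common choice for a probability parameter): Beta(10, 10) is the informative prior used in the coin example later in this section, and Beta(1, 1) is the uniform distribution on [0, 1].

```python
import math

def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

priors = {
    "informative Beta(10, 10)": (10, 10),
    "flat Beta(1, 1)": (1, 1),
}
for name, (a, b) in priors.items():
    mean, sd = beta_summary(a, b)
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.2f}")
```

Both priors are centred on 0.5, but the informative prior has a much smaller spread, which means it takes more data to move it.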


### <a id='toc3_2_'></a>[Step 2: Specifying the Likelihood](#toc0_)


This step involves modeling how our data relates to the parameter(s):

- **What**: The likelihood is the probability of observing our data, given different possible values of the parameter(s).
- **How**: We choose a probability distribution that best represents our data-generating process.
- **Example**: For coin flips, we might use a Binomial distribution.


**Key Point**: The likelihood function connects our theoretical model to the observed data.
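A minimal sketch of a Binomial likelihood for coin flips, evaluated at a few candidate values of the heads probability. The data (60 heads in 100 flips) match the example used in the next step; the function name and the candidate values are chosen here for illustration.

```python
from math import comb

def binomial_likelihood(theta, heads, flips):
    """P(heads successes in flips trials | probability of heads = theta)."""
    return comb(flips, heads) * theta ** heads * (1 - theta) ** (flips - heads)

for theta in (0.4, 0.5, 0.6, 0.7):
    value = binomial_likelihood(theta, heads=60, flips=100)
    print(f"theta = {theta:.1f}: likelihood = {value:.4f}")
```

The data stay fixed while `theta` varies, so the output is a function of the parameter. Among these candidates the likelihood is largest at `theta = 0.6`, the observed proportion of heads.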


### <a id='toc3_3_'></a>[Step 3: Calculating the Posterior](#toc0_)


Now we combine our prior beliefs with the observed data:

- **What**: The posterior is our updated belief about the parameter(s) after seeing the data.
- **How**: We use Bayes' Theorem to compute this:

  $$ P(\theta|D) \propto P(D|\theta) \times P(\theta) $$

  Where $\theta$ is our parameter and $D$ is our data.

- **Example**: After seeing 60 heads in 100 flips, our belief about the coin's fairness would shift towards it being slightly biased.


**Key Point**: The posterior combines prior knowledge with observed data.
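For one common special case, the proportionality can be worked out by hand. The derivation below is a sketch that assumes a $\text{Beta}(a, b)$ prior on $\theta$ and a Binomial likelihood for $k$ heads in $n$ flips; the symbols $a$, $b$, $k$, and $n$ are introduced here for illustration, and constants that do not depend on $\theta$ are dropped.

$$
\begin{aligned}
P(\theta|D) &\propto P(D|\theta) \times P(\theta) \\
&\propto \theta^{k}(1-\theta)^{n-k} \times \theta^{a-1}(1-\theta)^{b-1} \\
&= \theta^{a+k-1}(1-\theta)^{b+n-k-1}
\end{aligned}
$$

The last line has the form of a $\text{Beta}(a+k,\ b+n-k)$ density, so under these assumptions the update amounts to adding the number of heads to $a$ and the number of tails to $b$.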


### <a id='toc3_4_'></a>[Step 4: Making Inferences](#toc0_)


Finally, we use our posterior distribution to draw conclusions:

- **Point Estimates**: We might use the mean or mode of the posterior as our best guess for the parameter value.
- **Credible Intervals**: We can find ranges that contain the parameter with a stated posterior probability (for example, 95%).
- **Predictions**: We can generate predictions for future data.
- **Decision Making**: We can use the posterior to inform decisions under uncertainty.


**Example**: With the Beta(10,10) prior and the 60-heads-in-100-flips data used in the summary below, there's a 95% posterior probability that the coin's probability of heads lies between roughly 0.49 and 0.67.


**Key Point**: Bayesian inference provides a full distribution of possible parameter values, allowing for rich and nuanced conclusions.


### <a id='toc3_5_'></a>[Putting It All Together](#toc0_)


Let's revisit our coin flip example:

1. **Prior**: We start believing the coin is probably fair (Beta(10,10) distribution).
2. **Likelihood**: We model 100 flips as a Binomial distribution.
3. **Data**: We observe 60 heads out of 100 flips.
4. **Posterior**: We update our belief, now leaning towards the coin being slightly biased (Beta(70,50) distribution).
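5. **Inference**: The posterior mean is about 0.58, and the 95% credible interval for the probability of heads is roughly (0.49, 0.67). The data suggest a bias towards heads, but because the interval still includes 0.5, the evidence is suggestive rather than conclusive.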


This process allows us to start with our prior knowledge, incorporate new evidence, and end up with a nuanced understanding of the situation, complete with a measure of our uncertainty. It's this ability to handle uncertainty and update beliefs that makes Bayesian inference so powerful in real-world applications.
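The five steps above can be sketched in plain Python. To avoid extra dependencies, this sketch approximates the posterior on a fine grid rather than computing exact Beta quantiles (a statistics library could do that directly); the function names and grid size are assumptions made for illustration.

```python
import math

def beta_log_kernel(theta, a, b):
    """Log of the unnormalised Beta(a, b) density at theta."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def summarise_beta(a, b, n_points=20000, level=0.95):
    """Grid-based mean, equal-tailed credible interval, and P(theta > 0.5)."""
    grid = [(i + 0.5) / n_points for i in range(n_points)]
    log_w = [beta_log_kernel(t, a, b) for t in grid]
    max_log = max(log_w)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lw - max_log) for lw in log_w]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(t * p for t, p in zip(grid, probs))
    p_above_half = sum(p for t, p in zip(grid, probs) if t > 0.5)

    # Walk the cumulative probability to find the interval end points
    tail = (1 - level) / 2
    lower = upper = None
    cumulative = 0.0
    for t, p in zip(grid, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1 - tail:
            upper = t
    return mean, (lower, upper), p_above_half

# Steps 1-3: Beta(10, 10) prior + 60 heads, 40 tails -> Beta(70, 50) posterior
prior_a, prior_b = 10, 10
heads, tails = 60, 40
post_a, post_b = prior_a + heads, prior_b + tails

# Step 4: summarise the posterior
mean, (lower, upper), p_above_half = summarise_beta(post_a, post_b)
mode = (post_a - 1) / (post_a + post_b - 2)
print(f"Posterior: Beta({post_a}, {post_b})")
print(f"Mean = {mean:.3f}, mode = {mode:.3f}")
print(f"95% credible interval = ({lower:.3f}, {upper:.3f})")
print(f"P(theta > 0.5) = {p_above_half:.3f}")
```

Swapping in a different prior, such as `prior_a, prior_b = 1, 1`, shows how much the conclusions depend on the starting belief when the dataset is this small.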

## <a id='toc4_'></a>[Practical Examples](#toc0_)

### <a id='toc4_1_'></a>[Simplified Coin Flip Example](#toc0_)


Let's determine if a coin is fair using a simpler Bayesian approach.


1. **Define the Prior**:
   - We start believing the coin is probably fair.
   - Let's say we're 80% sure it's fair (50% chance of heads).
   - Prior: P(Fair) = 0.8, P(Biased) = 0.2

2. **Specify the Likelihood**:
   - If the coin is fair, P(Heads|Fair) = 0.5
   - If it's biased, let's assume P(Heads|Biased) = 0.7

3. **Observe Data**:
   - We flip the coin 10 times and get 7 heads.

4. **Calculate the Posterior**:
   Using Bayes' Theorem:

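   $$ P(\text{Fair|7 Heads}) = \frac{P(\text{7 Heads|Fair}) \times P(\text{Fair})}{P(\text{7 Heads})} $$

   $$ P(\text{Biased|7 Heads}) = \frac{P(\text{7 Heads|Biased}) \times P(\text{Biased})}{P(\text{7 Heads})} $$

   The likelihoods come from the Binomial distribution for 7 heads in 10 flips:

   $$ P(\text{7 Heads|Fair}) = \binom{10}{7} \times 0.5^{10} \approx 0.117 $$

   $$ P(\text{7 Heads|Biased}) = \binom{10}{7} \times 0.7^{7} \times 0.3^{3} \approx 0.267 $$

   The evidence term sums over both hypotheses:

   $$ P(\text{7 Heads}) \approx 0.117 \times 0.8 + 0.267 \times 0.2 \approx 0.147 $$

   Substituting gives:

   $$ P(\text{Fair|7 Heads}) \approx \frac{0.117 \times 0.8}{0.147} \approx 0.637 $$

   $$ P(\text{Biased|7 Heads}) \approx \frac{0.267 \times 0.2}{0.147} \approx 0.363 $$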

5. **Make Inferences**:
   - Our belief in the coin being fair has decreased from 80% to about 64%.
   - There's now a 36% chance the coin is biased, up from our initial 20%.


This simplified example shows how our belief updates based on the observed data, without using complex distributions.
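The arithmetic in this example can be reproduced with a short script. This is a sketch using only the numbers given above; the variable and function names are chosen here for illustration.

```python
from math import comb

def binomial_pmf(heads, flips, p_heads):
    """Probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p_heads ** heads * (1 - p_heads) ** (flips - heads)

# Prior beliefs and the assumed heads probability under each hypothesis
priors = {"Fair": 0.8, "Biased": 0.2}
p_heads = {"Fair": 0.5, "Biased": 0.7}

heads, flips = 7, 10
likelihoods = {h: binomial_pmf(heads, flips, p) for h, p in p_heads.items()}

# P(7 Heads): sum over hypotheses of likelihood x prior
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

for h in priors:
    print(f"P(7 Heads|{h}) = {likelihoods[h]:.3f}, P({h}|7 Heads) = {posteriors[h]:.3f}")
```

Because there are only two hypotheses, the evidence term is a two-term sum and the two posteriors add up to 1.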


### <a id='toc4_2_'></a>[Medical Diagnosis Example](#toc0_)


Now, let's consider a more complex scenario: diagnosing a rare disease.

1. **Define the Prior**:
   - The disease affects 1% of the population.
   - Prior probability of having the disease: P(D) = 0.01

2. **Specify the Likelihood**:
   - We have a test that's 95% accurate for both positive and negative results.
   - P(Positive|Disease) = 0.95 (true positive rate)
   - P(Negative|No Disease) = 0.95 (true negative rate)

3. **Observe Data**:
   - A patient tests positive.

4. **Calculate the Posterior**:
   Using Bayes' Theorem:
   
   $P(D|+) = \frac{P(+|D) \times P(D)}{P(+)}$
   
   Where $P(+|\text{Not D}) = 1 - 0.95 = 0.05$ and $P(+) = P(+|D)P(D) + P(+|\text{Not D})P(\text{Not D}) = 0.95 \times 0.01 + 0.05 \times 0.99 = 0.059$
   
   So, $P(D|+) = \frac{0.95 \times 0.01}{0.059} \approx 0.161$

5. **Make Inferences**:
   - Despite the positive test, there's only about a 16.1% chance the patient has the disease.
   - This counterintuitive result demonstrates the importance of considering base rates (our prior) in medical diagnosis.
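One way to see why the base rate dominates is to count people instead of multiplying probabilities. The sketch below applies the same rates to a hypothetical cohort of 100,000 people; the cohort size is an assumption chosen only to give round numbers.

```python
population = 100_000  # hypothetical cohort size, chosen for round numbers
prevalence = 0.01     # P(Disease)
sensitivity = 0.95    # P(Positive | Disease)
specificity = 0.95    # P(Negative | No Disease)

diseased = population * prevalence
healthy = population - diseased

true_positives = diseased * sensitivity
false_positives = healthy * (1 - specificity)

share_diseased = true_positives / (true_positives + false_positives)
print(f"True positives:  {true_positives:,.0f}")
print(f"False positives: {false_positives:,.0f}")
print(f"Share of positives who are diseased: {share_diseased:.3f}")
```

The healthy group is so much larger than the diseased group that even a 5% false-positive rate produces several times more false positives than true positives, which is why the posterior lands at roughly 16%.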


### <a id='toc4_3_'></a>[Key Takeaways from These Examples](#toc0_)


1. **Beliefs Update with Data**: In the coin example, our confidence in the coin's fairness decreased after seeing more heads than expected.

2. **Prior Matters**: The medical example shows how the rarity of a disease affects the interpretation of a positive test.

3. **Intuition vs. Calculation**: Both examples demonstrate how Bayesian calculations can sometimes contradict our initial intuitions.

4. **Practical Application**: These examples show how Bayesian thinking applies to everyday scenarios, from games to important medical decisions.


The coin flip example, with only two competing hypotheses, makes the process of updating beliefs from new evidence especially easy to follow, and that process is the core of Bayesian inference.

## <a id='toc5_'></a>[Conclusion](#toc0_)

As we wrap up our journey through Bayesian inference, let's revisit the core ideas we've explored:

1. **Probability as Belief**: 
   - In Bayesian thinking, probability represents our degree of certainty about something.
   - This allows us to quantify and update our beliefs as we gather new information.

2. **Bayes' Theorem**: 
   - The mathematical heart of Bayesian inference.
   - It shows us how to update probabilities given new evidence.

3. **Prior, Likelihood, and Posterior**:
   - *Prior*: Our initial beliefs before seeing data.
   - *Likelihood*: How probable the data is given our hypothesis.
   - *Posterior*: Our updated beliefs after considering the data.

4. **The Inference Process**:
   - Start with a prior belief.
   - Collect data and specify how it relates to our hypothesis.
   - Use Bayes' Theorem to update our beliefs.
   - Make decisions or predictions based on the posterior.

5. **Practical Applications**:
   - From simple examples like coin flips to complex scenarios like medical diagnoses.
   - Bayesian methods shine in handling uncertainty and incorporating prior knowledge.


By mastering Bayesian inference, you're not just learning a statistical technique – you're adopting a way of thinking that will serve you well in navigating the uncertain, data-rich world of modern data science. Keep exploring, keep questioning, and keep updating your beliefs as you encounter new evidence. That's the Bayesian way!
