# **Simulation-Based Learning:**

**Simulation** in machine learning is when we create a **virtual environment** that mimics the real world. This virtual environment is used to **train** and **test** machine learning models without needing real-world data or real-world actions.

Imagine you’re building a robot to play soccer. Instead of making the robot play hundreds of real games (which takes time, energy, and money), you can create a **simulated soccer field on a computer**. The robot learns to play by practicing in the simulation.

Sometimes in machine learning, collecting real data is:
* Too **`expensive`**
* Too **`dangerous`**
* Too **`slow`**
* Too **`rare`**

So instead of relying on **`real-world data`**, we:
1. Build a **`simulated environment`**.
2. Let the model **`learn, make mistakes, and improve`** in that virtual world.
3. Later, we may test or fine-tune it with real data.
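To make steps 1 and 2 concrete, here is a minimal sketch under loudly stated assumptions. `SimulatedKickEnv` is a hypothetical toy simulator (not a real robotics library), the soccer task is reduced to choosing one of three kick strengths, and the learner uses epsilon-greedy trial and error, which is only one simple way to "learn, make mistakes, and improve".

```python
import random


class SimulatedKickEnv:
    """Hypothetical toy simulator: each action (a kick strength) scores a goal
    with a fixed probability that the learning agent does not know."""

    def __init__(self, success_probs, seed=0):
        self.success_probs = success_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward 1 for a simulated goal, 0 for a miss; nothing real is at risk.
        return 1 if self.rng.random() < self.success_probs[action] else 0


def train_in_simulation(env, n_actions, episodes=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy trial and error: estimate each action's average reward."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try something, maybe a mistake
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = env.step(action)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values


env = SimulatedKickEnv(success_probs=[0.2, 0.5, 0.8])
values = train_in_simulation(env, n_actions=3)
best_action = max(range(3), key=lambda a: values[a])
print("estimated success rates:", [round(v, 2) for v in values])
print("best kick strength (index):", best_action)
```

The `success_probs` values are invented for illustration; in practice, step 3 would check the choice learned in simulation against real games.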

### **Simulation vs Model-Based Learning:**

| Aspect      | Simulation                                                                      | Model-Based Learning                                                                                |
| ----------- | ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| What is it? | A **virtual world** where the model learns by interacting with the environment. | A method where the model **learns a mathematical model** of the environment and then plans actions. |
| Example     | A racing car game where AI learns to drive.                                     | An AI learning how cars move by building equations and predicting outcomes.                         |
| Data Need   | No need for real-world data initially; the simulator provides the experience.   | Needs real or accurate data to learn the model.                                                     |
| Flexibility | Easy to try different situations (weather, terrain, etc.).                      | Harder to test rare or dangerous events.                                                            |

### **Why Use Simulation Instead of Model-Based Learning?**

Because **real-world environments** can be:

* **`Hard to model perfectly`** (too complex or unknown physics).
* **`Risky`** (self-driving cars learning by crashing? Not safe).
* **`Slow`** (waiting years to see the effects of a new drug? Too long).

Simulation lets us **`experiment safely`, `quickly`, and `cheaply`**.


**Summary:**
* **Simulation** means teaching a model in a computer-made world that acts like the real world.
* We use it when the real world is too dangerous, expensive, or hard to collect data from.
* It's a powerful **alternative to model-based learning**, especially when it's hard to model everything mathematically.
* Simulations allow faster, safer learning and testing.

------
---
---

### **Simulation-based Statistical Machine Learning:**

In statistical modeling, simulation refers to creating artificial data or behavior by mimicking a real-world process, usually by using `random sampling` and a model that approximates reality.

**Simulation-based approaches involve:**

- Generating synthetic data from a model (forward simulation)

- Repeating this process many times

- Using the results to infer or learn something (e.g., `estimate parameters`, `test hypotheses`)

Imagine you don't know exactly how the data was generated; you only have a `black-box simulator` (like a function that takes in parameters and produces fake data). But you still want to learn which parameter values would make the simulated data match the real data.

Instead of solving a formula (which you may not have), you do this:

1. Guess a parameter by drawing it from a prior distribution (e.g., a `uniform distribution`).

2. Simulate fake data using this guess.

3. Compare the fake data to your real data.

4. Keep the parameter if the fake data is close enough to the real data (e.g., their distance is below a chosen tolerance).

5. Do this many times, and you'll get a collection of good guesses → these approximate your posterior distribution.

That's simulation-based inference; this particular recipe is called rejection ABC.
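The five steps above can be sketched with NumPy. Everything here is an assumption for illustration: `simulator` is a hypothetical black box that draws 100 normal values with unknown mean `mu`, the "real" data is itself simulated as a stand-in, the prior is uniform on [0, 10], and "close enough" means the sample means differ by less than a tolerance of 0.1.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the "real" observed data (assumption: 100 points, true mean 3.0).
observed = rng.normal(loc=3.0, scale=1.0, size=100)


def simulator(mu, rng):
    """Hypothetical black-box simulator: returns fake data for parameter mu."""
    return rng.normal(loc=mu, scale=1.0, size=100)


def rejection_abc(observed, n_draws=20_000, tolerance=0.1, rng=rng):
    accepted = []
    obs_summary = observed.mean()                 # summary statistic of the real data
    for _ in range(n_draws):
        mu = rng.uniform(0.0, 10.0)               # 1. guess a parameter from the prior
        fake = simulator(mu, rng)                 # 2. simulate fake data
        distance = abs(fake.mean() - obs_summary) # 3. compare fake vs real
        if distance < tolerance:                  # 4. keep it if close enough
            accepted.append(mu)
    return np.array(accepted)                     # 5. approximate posterior samples


posterior = rejection_abc(observed)
print(f"accepted {posterior.size} of 20000 guesses")
print(f"posterior mean ~ {posterior.mean():.2f}, sd ~ {posterior.std():.2f}")
```

Shrinking `tolerance` brings the accepted samples closer to the true posterior but accepts fewer guesses, so more draws are needed.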

#### **Why Use Statistical Simulation instead of Model-Based Solution?**

If you can use a `Model-Based Approach`, you should: it is efficient and reliable.

But in real-world cases:
1. The model may be too complex: for example, `non-linear`, `non-Gaussian`, `involving physics simulations`, something that `can't be expressed in a mathematical form`, or a relation that is `very difficult/complex to formulate`.

2. The likelihood may not be available: for example, we `can't write down a formula` for how the data depends on the parameters.

3. The data comes from a simulator: i.e., we simulate the behavior (e.g., `in biology`, `economics`, `epidemiology`) rather than model it analytically.

4. We want full uncertainty, not just point estimates: simulation gives us a distribution, not just `"the best guess."`

> **Example:**   
> Think of it like trying to find the `right combination on a safe`:     
> 
> One approach is like having the manual: we know how the lock works, so we compute the combination. This is the model-based approach.
> 
> Simulation-based inference is like `guessing the possible combinations`, trying them, and keeping the ones that open the safe — even though we don’t know how the mechanism inside works.

#### **Head-to-Head:**

| Feature                                | Traditional ML (e.g., Linear Regression) | Simulation-Based Learning                                    |
| -------------------------------------- | ---------------------------------------- | ------------------------------------------------------------ |
| Uses likelihood                        | Yes                                      | Not always (ABC avoids it; MCMC and particle filters use it) |
| Closed-form / gradient descent         | Yes                                      | No (relies on repeated random sampling)                      |
| Uses simulator                         | No                                       | Yes                                                          |
| Point estimate or distribution?        | Point estimate                           | Posterior distribution                                       |
| Suitable for complex/black-box models? | No                                       | Yes                                                          |
| Example method                         | Least squares, gradient descent (GD)     | ABC, MCMC, Particle Filters                                  |


-----
----
----

### **Flowchart:**

**The following diagram shows some of the most common simulation-based learning approaches/methods:**

```
        Simulation-Based Learning:            
│
├── (1) Approximate Bayesian Computation (ABC):
│   ├── Rejection ABC
│   ├── ABC-MCMC (Markov Chain Monte Carlo)
│   └── ABC-SMC (Sequential Monte Carlo)
│
├── (2) Markov Chain Monte Carlo (MCMC):
│   ├── Metropolis-Hastings
│   └── Gibbs Sampling
│
├── (3) Sequential Monte Carlo (SMC):
│   ├── Particle Filters
│   └── SMC for Bayesian Inference
│
├── (4) Likelihood-Free Inference:
│   ├── Synthetic Likelihood
│   └── Density Ratio Estimation
│
└── (5) Other Simulation-Based Methods:
    ├── Variational Bayesian Approaches (with simulated data)
    └── Simulation-Based Optimization (e.g., Evolutionary Algorithms)
```

1. **$ABC$ (Approximate Bayesian Computation)**: `Avoids likelihood entirely`; `compares simulated vs observed data`.

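2. **$MCMC$ (Markov Chain Monte Carlo)**: Samples from the posterior using a Markov chain; it needs the likelihood (the posterior only has to be known up to a normalizing constant), but it is still simulation-based because it relies on repeated random sampling.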

3. **$SMC$ (Sequential Monte Carlo)**: Tracks a population of particles through time; used in dynamic models and time-series.

4. **Likelihood-Free Inference**: Techniques where we can’t write down the likelihood but still estimate parameters.

5. **Other Methods**: Includes any optimization or inference technique relying on simulated data.
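As a rough illustration of item 2, here is a random-walk Metropolis sampler, the symmetric-proposal special case of Metropolis-Hastings (the proposal ratio cancels, so only the posterior ratio remains). Assumptions: a single unknown mean `mu`, a N(0, 10²) prior, a N(mu, 1) likelihood, stand-in data simulated with mean 2.0, and a hand-picked step size of 0.5. Unlike ABC, it evaluates the likelihood at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)  # stand-in observed data


def log_posterior(mu):
    # Unnormalized log posterior: N(0, 10^2) prior plus N(mu, 1) likelihood.
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik


def metropolis(n_steps=10_000, step_size=0.5, start=0.0):
    samples = np.empty(n_steps)
    current, current_lp = start, log_posterior(start)
    for i in range(n_steps):
        proposal = current + rng.normal(scale=step_size)  # symmetric random walk
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current  # a rejected move repeats the current value
    return samples


chain = metropolis()
posterior = chain[2_000:]  # discard burn-in while the chain moves away from start=0
print(f"posterior mean ~ {posterior.mean():.2f}, sd ~ {posterior.std():.2f}")
```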

---
---
---

### **Very Fundamental Concepts:**

##### **1. Traditional Approach (e.g., Regression):**

- You look at data, assume a mathematical form (like a line: $y = mx + c$), and define a loss function (e.g., `mean squared error`).

- Then you find the best parameters by `minimizing that loss function` — either using a formula (`closed-form`) or by using `optimization` (like `gradient descent`).

- You need to know or assume the relationship (linearity, etc.) between inputs and outputs.

- This is called a `model-driven approach` — you drive the model by defining its shape and logic.
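A minimal sketch of both routes for the line $y = mx + c$ with a mean squared error loss, using NumPy. The data is a synthetic stand-in generated with m = 2.5 and c = 1.0 plus noise; the learning rate and iteration count are hand-picked assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=200)  # stand-in data (m=2.5, c=1.0)

# Closed form: solve the least-squares problem for [m, c] directly.
X = np.column_stack([x, np.ones_like(x)])
m_cf, c_cf = np.linalg.lstsq(X, y, rcond=None)[0]

# Gradient descent on the same mean squared error loss.
m, c, lr = 0.0, 0.0, 0.01
for _ in range(5_000):
    error = (m * x + c) - y
    m -= lr * 2 * np.mean(error * x)  # dMSE/dm
    c -= lr * 2 * np.mean(error)      # dMSE/dc

print(f"closed form:      m={m_cf:.3f}, c={c_cf:.3f}")
print(f"gradient descent: m={m:.3f}, c={c:.3f}")
```

Both routes minimize the same loss, so they should agree; the closed form is exact, while gradient descent only gets there if the learning rate is small enough for this data.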

##### **2. Simulation-Based Approach (like ABC or MCMC):**

- You have data, but you `can’t` (or `don’t want to`) write a formula for how it was generated.

- Instead, you say: `“Let me guess parameters and use a simulator to generate fake data from them.”`

- You compare fake data to the real data — if they match closely, the guessed parameters are likely to be good.

- After many guesses, the collection of good ones approximates the posterior distribution, showing which parameters are plausible.

- This is a data-driven or simulator-driven approach — you rely on simulations instead of analytical formulas.

##### **3. Posterior Distribution:**

In simulation-based learning (especially in Bayesian methods like `ABC`), you're not looking for just one best parameter. You want to know which range of parameter values is plausible, and how probable each value is. That is called the posterior distribution.

**To make the idea of a posterior distribution concrete, let's take an example:**

Suppose I have a huge, complex dataset with millions of rows and thousands of columns. I have no idea how the different variables are related to each other, and it seems very difficult, or even impossible, to write down a mathematical relation between them. So I decide to use simulation-based learning.

Now, using a simulation approach (like ABC), I repeat a simulation experiment, say, 100,000 times. Each run starts from random parameter values $(\theta_1, \theta_2, \theta_3, \ldots)$ (think of these as random guesses) and generates a fake dataset from them. If I store the parameter values used in each run as a tuple, I end up with 100,000 such tuples. After each simulation, I also compare the simulated data to the real data.

Out of 100,000 simulations, suppose only 500 produce data that is `"close enough"` to the real data. These 500 "accepted" parameter tuples are the ones that reproduce the data well, so they are `believable` or `plausible`. The distribution of these accepted parameter values approximates the `posterior distribution`.
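A vectorized NumPy sketch of this experiment for the two-parameter case used below (θ₁ = intercept, θ₂ = slope). Assumptions: the "real" data is a synthetic stand-in, the hypothetical simulator is a straight line plus Gaussian noise, the uniform prior ranges are hand-picked, and "close enough" is implemented by keeping the 500 simulations with the smallest root-mean-squared distance. Fixing the number of accepted samples in advance, instead of choosing a tolerance, is one common way to run rejection ABC.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in "real" data (assumption): y = 1.0 + 2.5 * x + Gaussian noise, 20 points.
x = np.linspace(0.0, 1.0, 20)
y_obs = 1.0 + 2.5 * x + rng.normal(scale=0.3, size=x.size)

n_sims, n_keep = 100_000, 500

# Random guesses for (θ₁, θ₂) from a uniform prior over hand-picked ranges.
theta1 = rng.uniform(-2.0, 4.0, size=n_sims)  # intercept guesses
theta2 = rng.uniform(0.0, 5.0, size=n_sims)   # slope guesses

# Hypothetical simulator: one fake dataset per guess (one row per simulation).
noise = rng.normal(scale=0.3, size=(n_sims, x.size))
y_sim = theta1[:, None] + theta2[:, None] * x[None, :] + noise

# Root-mean-squared distance between each fake dataset and the real data.
dist = np.sqrt(np.mean((y_sim - y_obs) ** 2, axis=1))

# Accept the 500 closest simulations (0.5% of 100,000).
keep = np.argsort(dist)[:n_keep]
accepted = np.column_stack([theta1[keep], theta2[keep]])

print(accepted.shape)  # (500, 2)
print("mean of accepted (θ₁, θ₂):", accepted.mean(axis=0))
```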


##### **Let's Construct the Posterior:**

Suppose we have two parameters that we want to estimate:
* `θ₁`: say, intercept
* `θ₂`: say, slope

For a smaller illustration than the 500 above, say the 50 accepted parameter tuples (parameter sets whose simulated data closely match the observed data) look like this:

```
(1.0, 2.5)
(1.1, 2.4)
(0.9, 2.6)
(1.0, 2.5)
(1.2, 2.3)
.........
(1.1, 2.4)
```

These 50 tuples are the **`approximate posterior samples`**.

Now:  
##### 1. **Marginal Posterior Distributions:**

* If we take all the `θ₁` values from the 50 samples, we get the **`marginal posterior of θ₁`**.

* If we take all the `θ₂` values, we get the **`marginal posterior of θ₂`**.

Now, we can:

  1. Plot a `histogram` or `KDE` (smooth curve) of these values. 
  2. It will show you **`which values appear more often`**, i.e., which parameter values are **`more likely`** under the posterior.

Example:

```
θ₁ (Intercept), most frequent values among the 50 samples:
  0.9 → 3 times
  1.0 → 10 times
  1.1 → 15 times
  1.2 → 8 times
→ So 1.1 is the most plausible value of θ₁

θ₂ (Slope), most frequent values among the 50 samples:
  2.3 → 4 times
  2.4 → 12 times
  2.5 → 18 times
  2.6 → 6 times
→ So 2.5 is the most plausible value of θ₂
```
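The counting above is easy to reproduce. This sketch rebuilds samples from the frequencies listed in the example only (not the full 50 samples), then reports the most frequent value (the mode, which the example calls the most plausible value) and the mean.

```python
from collections import Counter

# Rebuild the θ₁ and θ₂ values from the frequency listing above.
theta1_counts = {0.9: 3, 1.0: 10, 1.1: 15, 1.2: 8}
theta2_counts = {2.3: 4, 2.4: 12, 2.5: 18, 2.6: 6}
theta1 = [v for v, n in theta1_counts.items() for _ in range(n)]
theta2 = [v for v, n in theta2_counts.items() for _ in range(n)]


def summarize(samples):
    counts = Counter(samples)
    mode = counts.most_common(1)[0][0]  # most frequent value
    mean = sum(samples) / len(samples)  # average of the samples
    return mode, mean


for name, samples in [("θ₁", theta1), ("θ₂", theta2)]:
    mode, mean = summarize(samples)
    print(f"{name}: mode = {mode}, mean = {mean:.3f}")
```

The two summaries can differ: for θ₁ the mode is 1.1, while the mean of the listed samples is about 1.078. A histogram or KDE of the same lists shows the full shape instead of a single number.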

##### 2. **Joint Posterior Distribution:**

We also have 50 **(θ₁, θ₂)** pairs.
If we plot them on a 2D scatter plot or use a contour heatmap, the **`density of points`** in different areas tells us:

  * What **combinations** of `(θ₁, θ₂)` are more likely
  * Whether there’s a relationship between them (e.g., higher θ₁ means lower θ₂)

This is our **joint posterior distribution** of `θ₁` and `θ₂`.
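A small NumPy sketch of these joint summaries, using only the six accepted pairs written out in the example above (the elided ones are not filled in). In these six pairs θ₁ + θ₂ = 3.5 every time, so the correlation comes out at essentially -1: the "higher θ₁ means lower θ₂" pattern in its most extreme form.

```python
import numpy as np

# The accepted (θ₁, θ₂) pairs that are listed explicitly in the example.
pairs = np.array([
    (1.0, 2.5), (1.1, 2.4), (0.9, 2.6),
    (1.0, 2.5), (1.2, 2.3), (1.1, 2.4),
])
theta1, theta2 = pairs[:, 0], pairs[:, 1]

corr = np.corrcoef(theta1, theta2)[0, 1]
print("joint posterior mean (θ₁, θ₂):", pairs.mean(axis=0))
print("correlation between θ₁ and θ₂:", round(corr, 6))

# 2D histogram: how many accepted pairs fall in each (θ₁, θ₂) cell,
# which is the raw material for a heatmap or contour plot.
counts, t1_edges, t2_edges = np.histogram2d(theta1, theta2, bins=4)
print(counts)
```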


| **`Term:`**                   | **`Meaning:`**                                                                            |
| ---------------------- | ---------------------------------------------------------------------------------- |
| **Prior**              | What you believe about parameters before seeing data (in ABC: the distribution your random guesses are drawn from) |
| **Likelihood**         | (Skipped in ABC) – probability of data given parameters                            |
| **Posterior**          | What you believe about parameters **after** seeing how well they explain the data  |
| **Posterior Samples**  | The subset of parameter guesses that produced good simulations                     |
| **Marginal Posterior** | Distribution of each parameter independently (from accepted samples)               |
| **Joint Posterior**    | Distribution over the combinations of parameters (from accepted pairs)             |


----
---
---

## **Approximate Bayesian Computation (ABC):**

#### **1. Bayes Theorem:**

#### **2. What is Bayesian Inference?**

#### **3. What is Approximate Bayesian Computation (ABC)?**

#### **4. What is Rejection ABC?**

#### **5. Why Use ABC in Machine Learning?**