# Week 2 Overview
This week, you’ll explore two key tools for understanding causal relationships in data: fixed effects models and bootstrap simulation. You’ll begin by learning how fixed effects help control for unobserved differences across groups—like companies or soil types—so you can isolate the impact of a single variable of interest. You’ll also practice building your own example that uses fixed effects. Then, you’ll turn to simulation techniques, using bootstrapping to estimate how your results might vary if you had a different dataset. By the end of the week, you’ll have hands-on experience running fixed effects regressions, performing bootstrap simulations, and reflecting on what your results say about real-world data-generating processes (DGPs).

## Learning Objectives
At the end of this week, you will be able to: 
- Run fixed effects regressions  
- Invent your own example situation that uses fixed effects  
- Perform bootstrap simulation  
- Describe an example of a data-generating process


## Topic Overview: Simulation
**Bootstrapping** is a powerful simulation technique that allows you to estimate how your results might vary if you collected a different sample from the same population. By repeatedly resampling your existing dataset with replacement, you can create many simulated datasets, each slightly different from the original. You then run your analysis (such as calculating a mean or running a regression) on each of these resampled datasets. The variation across these results helps you understand the reliability of your estimates, including how much they might fluctuate due to sampling variability. This week, you’ll explore how bootstrapping works, why it’s especially useful when you can’t access more data, and how it helps evaluate the robustness of statistical findings in causal inference.

### Learning Objectives:
- Perform bootstrap simulation
- Describe an example of a data-generating process

## 1.1 Lesson: Bootstrapping
In this short video, we will discuss how bootstrapping helps you estimate the variability of regression coefficients and why resampling your data can reveal how confident you should be in the patterns you’ve found.

### Simulation
Let’s look at another example. 

Say you want to know how well your linear regression will work. You have a sample of 1,000 observations, but what if you had a different 1,000 observations? Would you get a similar effect? A very different effect? What is the variance of the effects you’d get? 

A simple way to answer this question is a **Bootstrap Simulation**. This means that you draw 1,000 observations from your original 1,000 observations with replacement. That is, you can pick the same observation more than once. Now you can run whatever test you want to run, but on the bootstrap sample instead of the original. For example, suppose that values in our dataset are:

`values = [1,2,3,4,5]`

To do the bootstrap, we pick five of the values at random. But there are only five values, you say! What does it mean to pick “five of them?” We pick them with replacement, meaning that we can pick the same one twice. We get:

`values_bootstrap = [1,2,2,4,4]`

Now, we can estimate some statistic from these new values. For example, the `mean`:

$$\frac{1 + 2 + 2 + 4 + 4}{5} \; = \; \frac{13}{5} \; = \; 2.6$$
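As a minimal sketch of the resampling step, the snippet below uses `random.choices` from Python's standard library, which samples with replacement. The seed is an arbitrary assumption made only for repeatability, so the particular bootstrap sample it draws will generally differ from the one written out above.

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed (an assumption) so the sketch is repeatable

values = [1, 2, 3, 4, 5]

# Draw len(values) items WITH replacement: the same value may appear twice.
values_bootstrap = random.choices(values, k=len(values))

print("bootstrap sample:", sorted(values_bootstrap))
print("bootstrap mean:  ", mean(values_bootstrap))
print("original mean:   ", mean(values))
```

Changing the seed gives a different bootstrap sample, and usually a different mean.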

This is different from the original mean of 3. If we repeat this many times, we'll get a range of means. This range approximates the different means we could get by drawing different samples from the population. For example, suppose the true population looks like this:

`values_true = [0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7]`

Then the true distribution of possible means (the sampling distribution) would come from repeatedly drawing samples from that population. 

Instead, we're only resampling from the sample we already have, `[1,2,3,4,5]`.

However, it turns out that for large samples, the bootstrap distribution of a statistic is usually not so different from the distribution we would get by sampling from the population itself. So, we can use the bootstrap to get a handle on how things are working.
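To make this comparison concrete, here is a rough sketch that puts the two approaches side by side. It makes some assumptions that are not in the lesson: it treats `values_true` as a distribution to draw from with replacement, and it uses a hypothetical sample size of 200 and 2,000 repetitions.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed (an assumption) for repeatability

values_true = [0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7]
n = 200        # hypothetical "large" sample size
n_reps = 2000  # hypothetical number of simulated datasets

# What we usually cannot do: draw a fresh sample from the population each time.
true_means = [mean(random.choices(values_true, k=n)) for _ in range(n_reps)]

# What the bootstrap does: take ONE sample, then resample from it.
sample = random.choices(values_true, k=n)
boot_means = [mean(random.choices(sample, k=n)) for _ in range(n_reps)]

print("spread of means, fresh samples:", round(stdev(true_means), 3))
print("spread of means, bootstrap:    ", round(stdev(boot_means), 3))
```

The two spreads should come out close to each other, even though the bootstrap version never looks at the population again after drawing its one sample.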

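Bootstrap sampling might not work well with data from some distributions (like the Pareto distribution mentioned previously). 

That is, with the Pareto distribution, when we sample the original population 1,000 times, we will likely get some pretty weird items in our sample, such as astronomically tall trees. A handful of extreme values like these can dominate a statistic such as the mean, and resampling from one sample cannot tell us about extremes that the sample happened to miss.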
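One way to see the difficulty is the sketch below. It relies on a simple fact: a bootstrap resample can only contain values that are already in the original sample, so its maximum can never exceed the sample maximum. The shape parameter `alpha = 1.5` and the repetition counts are hypothetical choices for illustration.

```python
import random

random.seed(2)  # arbitrary seed (an assumption) for repeatability

alpha = 1.5  # hypothetical shape parameter; smaller alpha means a heavier tail
sample = [random.paretovariate(alpha) for _ in range(1000)]
sample_max = max(sample)

# Largest value in each of 500 bootstrap resamples of our one sample.
boot_maxes = [max(random.choices(sample, k=len(sample))) for _ in range(500)]

# Largest value in each of 500 fresh samples drawn from the distribution itself.
fresh_maxes = [
    max(random.paretovariate(alpha) for _ in range(1000)) for _ in range(500)
]

print("sample max:           ", round(sample_max, 1))
print("largest bootstrap max:", round(max(boot_maxes), 1))
print("largest fresh max:    ", round(max(fresh_maxes), 1))
```

Fresh samples from a heavy-tailed distribution will typically turn up values far beyond anything in the original sample, while the bootstrap maxima are capped at `sample_max`.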

Usually, with a big enough sample, bootstrapping gives a reasonably good estimate of what would happen if we sampled the original population many times. In particular, it can give us an estimate of the variance of a statistic. It can give us the mean, too, but that mean will tend to match the value of the statistic in the original sample. That is, the 2.6 value above differs from the sample mean of 3, but if we resampled many times, the bootstrap means would come out to about 3 on average.
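The following sketch repeats the resampling 10,000 times (a hypothetical count) on the five-value example and summarizes the resulting bootstrap means. It uses only Python's standard library, and the seed is an arbitrary assumption.

```python
import random
from statistics import mean, stdev

random.seed(3)  # arbitrary seed (an assumption) for repeatability

values = [1, 2, 3, 4, 5]
n_boot = 10_000  # hypothetical number of bootstrap resamples

# Mean of each bootstrap resample.
boot_means = [mean(random.choices(values, k=len(values))) for _ in range(n_boot)]

print("sample mean:            ", mean(values))
print("mean of bootstrap means:", round(mean(boot_means), 3))
print("std of bootstrap means: ", round(stdev(boot_means), 3))
```

The mean of the bootstrap means should land very close to the sample mean of 3. The standard deviation is the new information: it is the bootstrap estimate of how much the mean varies from sample to sample, and for these five values it should be near 0.63.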

We can use bootstrap samples to estimate the mean and variance of almost any statistic. For instance, we could run a linear regression on each bootstrap sample and record the coefficients. Then, we could find the mean and standard deviation of each coefficient across the bootstrap samples. This could tell us, for instance, how likely the coefficient is to be zero (or rather, above or below zero), which is an important question for power analysis. Power analysis is a method used to estimate how likely a statistical test is to detect a true effect, if one exists. In other words, it helps you assess whether your study has enough data to confidently find meaningful results. We’ll address power analysis in a later week.
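As a rough illustration of bootstrapping a regression coefficient, the sketch below fits a one-variable regression on each resample. Everything about the data is an assumption: it is synthetic, generated as `y = 0.5 * x + noise`, and `ols_slope` is a small helper written for this example rather than a library function.

```python
import random
from statistics import mean, stdev

random.seed(4)  # arbitrary seed (an assumption) for repeatability

# Synthetic data (an assumption for illustration): y = 0.5 * x + noise.
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 2) for xi in x]

def ols_slope(xs, ys):
    """Slope of a simple linear regression of ys on xs (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    return sxy / sxx

slopes = []
for _ in range(2000):
    # Resample whole (x, y) rows with replacement so each pair stays together.
    idx = random.choices(range(n), k=n)
    slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))

share_positive = sum(s > 0 for s in slopes) / len(slopes)

print("mean slope:                ", round(mean(slopes), 3))
print("std of slope (bootstrap SE):", round(stdev(slopes), 3))
print("share of slopes above zero: ", share_positive)
```

Resampling rows rather than `x` and `y` separately keeps the relationship between the two variables intact. The share of bootstrap slopes above zero gives an informal read on whether the coefficient is reliably positive.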




### Chapter 2 - Research Questions