# Lab 6: Sampling Variability and Confidence Intervals in Behavioral Health Data

**Learning Objectives**  
By the end of this lab, you will be able to:  
1. **Understand sampling variability** by simulating repeated samples and observing the distribution of sample means (standard error).  
2. **Compute confidence intervals** for a sample mean at different confidence levels (90%, 95%, 99%) and interpret them in context.  
3. **Compare groups and timepoints using CIs**: plot mean differences (with error bars) across subgroups (e.g. low vs. high risk) and between baseline and follow-up to judge statistical significance.

## Background & Variables

We will use a synthetic dataset (N ~ 11,000 adolescents) derived from the ABCD study, including measures of childhood adversity, community environment, and substance use. Key variables in this lab:
- **ACEs Sum Score** (`aces_sum_score`): Total count of Adverse Childhood Experiences (0–10 scale; higher = more childhood trauma exposure).
- **Child Opportunity Index (COI) – Overall Z-score** (`le_l_coi__addr1__coi__total__national_zscore`): A standardized score (mean ~0, SD ~1) reflecting the quality of a child's neighborhood environment (higher = more opportunity).
- **COI Overall Quintile** (`le_l_coi__addr1__coi__total__national_c5`): Categorical ranking of the overall COI into 5 levels (1 = Very Low opportunity, 5 = Very High opportunity).
- **COI Social & Economic Z-score** (`le_l_coi__addr1__se__total__national_zscore`): Z-score for the Social & Economic subdomain of the COI (higher = better socioeconomic environment).
- **Rx opioid total dose (non-medical; TLFB)** (`su_y_tlfb__rxopi_totdose`): Total non-medical prescription opioid dose over the recall window from the Timeline Follow-Back (continuous; 0 if no use).
- **Marijuana use-days (TLFB)** (`su_y_tlfb__mj_ud`): Number of days with marijuana use over the recall window from the Timeline Follow-Back (count; 0 if no use).

*Note:* The dataset contains both baseline (age ~16) and follow-up (age ~21) values for the substance use variables, allowing us to examine changes over time. Missing data are coded as `999` for certain measures (e.g., some participants did not respond to the ACEs questionnaire). We will account for this by filtering out those codes in our analyses.
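
One way to handle the `999` codes is to recode them to `NaN` before computing any statistic, so that pandas skips them automatically. The sketch below uses a small made-up DataFrame (the values are illustrative assumptions, not drawn from the lab dataset) to show how badly an uncleaned sentinel code can distort a mean.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): 999 marks a missing ACEs response
toy = pd.DataFrame({"aces_sum_score": [0, 2, 999, 1, 4, 999, 3]})

# Naive mean treats 999 as a real score and is badly inflated
naive_mean = toy["aces_sum_score"].mean()

# Recode the sentinel to NaN; pandas ignores NaN in mean(), std(), etc.
cleaned = toy["aces_sum_score"].replace(999, np.nan)
clean_mean = cleaned.mean()

print(f"Mean with 999 codes left in: {naive_mean:.2f}")
print(f"Mean after recoding to NaN:  {clean_mean:.2f}")
print(f"Valid responses: {cleaned.notna().sum()} of {len(cleaned)}")
```

Filtering with a boolean mask (`x[x != 999]`), as the demos in this lab do, gives the same result for a single column; recoding to `NaN` is simply convenient when several columns share the same missing code.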

## Setup

Run the setup cell below to import libraries and load the dataset for Labs 6/7. This dataset is a synthetic longitudinal extension of the Lab4/5 data (now including young adult follow-up). We will use pandas for data handling and matplotlib for plotting. A helper function `savefig()` is provided to save our figures in a `figures/` directory.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import math

# Configure display options
pd.set_option('display.max_columns', None)

# Load the dataset
df = pd.read_csv('data/L6L10/ABCD_synthetic.csv')

# Display shape and first few rows
print(f"Dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")
print(df.head(5))  # print explicitly: a notebook only auto-displays a cell's last expression

# Set up figure directory and helper function for saving plots
from pathlib import Path
FIG_DIR = Path("figures")
FIG_DIR.mkdir(exist_ok=True)
def savefig(name, dpi=150):
    """Save current Matplotlib figure to figures/<name>.png"""
    plt.tight_layout()
    if not name.endswith('.png'):
        name += '.png'
    plt.savefig(FIG_DIR / name, dpi=dpi)
    print(f"Figure saved as {name}")
 

### Activity 1. Sampling Variability and the Standard Error

**Demo:** To understand the concept of the **standard error of the mean**, we will simulate the process of drawing repeated samples from a population. Our "population" here will be the **Wave 21 (age ~21) data** from our study sample. We will repeatedly take random subsamples and compute their mean, observing how the sample mean varies from sample to sample. This distribution of sample means is called the **sampling distribution**.

Let's focus on a single continuous variable for this demonstration. We will use the **ACEs sum score** as an example (a count of adverse childhood experiences, 0–10). Even though ACE scores are not perfectly normally distributed (many have low scores), the Central Limit Theorem tells us that the *means* of sufficiently large samples will be approximately normally distributed. We'll demonstrate this by:
1. Filtering to **Wave 21 only** to ensure independence of observations (each subject appears once).
2. Taking many random samples from the Wave 21 data (treating these ~11k young adults as the population).
3. Computing the mean ACE score for each sample.
4. Plotting the distribution (histogram) of these sample means to visualize the variability.
5. Comparing the observed standard deviation of the sample means to the theoretical standard error formula $SE = \sigma/\sqrt{n}$.

*Why do this?* The standard error (SE) tells us how much we expect the sample mean to vary from sample to sample. A smaller SE means a more precise estimate of the population mean. Here, we'll empirically see how sample size and population variability influence the SE.

**Important:** We use only Wave 21 to avoid violating the independence assumption: each observation represents a unique individual at one timepoint.

In [None]:
# Demo 1: Sampling distribution of the mean (using ACEs sum score at Wave 21)

# 1. Filter to Wave 21 only to ensure independence
df_w21 = df[df['wave'] == 21].copy()
print(f"Wave 21 data: {df_w21.shape[0]} observations")

# 2. Define the population data (exclude missing 999 values)
aces = df_w21['aces_sum_score']
aces = aces[aces != 999]
pop_mean = aces.mean()
pop_std = aces.std(ddof=0)  # population std (treating this dataset as population)
print(f"Wave 21 ACE scores: n={len(aces)}, mean={pop_mean:.3f}, SD={pop_std:.3f}")

# 3. Choose a sample size and number of resamples
n = 50  # sample size for each draw (e.g., 50 individuals)
num_samples = 1000  # number of repeated samples

# 4. Draw samples and record their means
sample_means = []
for i in range(num_samples):
    sample = aces.sample(n, replace=False)
    sample_means.append(sample.mean())
sample_means = np.array(sample_means)

# 5. Calculate the empirical standard error from our resamples and the theoretical SE
empirical_se = sample_means.std(ddof=1)
theoretical_se = pop_std / np.sqrt(n)
print(f"\nPopulation mean ACE score (W21) = {pop_mean:.3f}")
print(f"Empirical SE (observed SD of sample means) = {empirical_se:.3f}")
print(f"Theoretical SE = sigma/sqrt(n) = {theoretical_se:.3f}")

# 6. Plot the sampling distribution (histogram of sample means)
plt.figure(figsize=(6,4))
plt.hist(sample_means, bins=30, color='skyblue', edgecolor='black')
plt.axvline(pop_mean, color='red', linestyle='--', linewidth=2, label=f"Population mean = {pop_mean:.2f}")
plt.title(f"Sampling Distribution of the Mean (ACE score, Wave 21, n={n})")
plt.xlabel("Sample Mean ACE Score")
plt.ylabel("Frequency (out of 1000 samples)")
plt.legend()
savefig("01_demo_sampling_means_w21")
plt.show()

# Notice: The histogram of sample means is centered near the true population mean. 
# The spread of the distribution (SE) is close to the theoretical value sigma/sqrt(n).

In the output above, we see the **population mean** of ACE scores from Wave 21 (age ~21) is **2.512** (about 2.5 adverse childhood experiences on average) with a population standard deviation of **1.593**. The **empirical SE** (calculated from the spread of 1000 sample means) is **0.233**, which is very close to the **theoretical SE** of **0.225** computed from the formula $\sigma/\sqrt{n} = 1.593/\sqrt{50}$. 

The histogram shows the distribution of sample mean ACE scores across 1000 resamples of size $n=50$ from Wave 21 data. As predicted by the Central Limit Theorem, this sampling distribution is approximately normal (bell-shaped) and centered around the true population mean of 2.512. Most sample means fall within a narrow range around this value: roughly between 2.0 and 3.0, or about two standard errors on either side of the mean. This spread reflects the standard error of approximately **0.23** (i.e., roughly that much variation is expected in the sample mean from sample to sample) for samples of size $n=50$.

If we increased the sample size $n$ to, say, 100 or 200, the distribution of sample means would become even tighter (smaller SE), with the SE decreasing by a factor of $\sqrt{2}$ (to about 0.16) or $\sqrt{4} = 2$ (to about 0.11) respectively. Conversely, a smaller sample size (e.g., $n=25$) would yield a wider spread of sample means, with the SE increasing to about 0.32.
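
The $\sigma/\sqrt{n}$ relationship can be tabulated directly. The sketch below plugs the Wave 21 population SD reported above (1.593) into the formula for a few sample sizes; it uses only that reported number, not the dataset itself, so the sample sizes chosen are just illustrative.

```python
import math

sigma = 1.593  # population SD of Wave 21 ACE scores, as reported in the demo output

# Theoretical SE = sigma / sqrt(n) for several sample sizes
se_by_n = {n: sigma / math.sqrt(n) for n in (25, 50, 100, 200)}

for n, se in se_by_n.items():
    ratio = se / se_by_n[50]  # relative to the n = 50 demo
    print(f"n = {n:>3}: SE = {se:.3f}  ({ratio:.2f}x the n=50 SE)")
```

Because the SE shrinks with the square root of $n$, halving the SE requires four times as many observations.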

*Key point:* The **standard error** is the standard deviation of the sampling distribution of a statistic (here, the mean). It quantifies our uncertainty in the sample mean as an estimate of the population mean. By using only Wave 21 data, we ensure that each observation is independent (no subject appears twice), which is a critical assumption for the standard error formula $\sigma/\sqrt{n}$ to be valid. The close agreement between empirical SE (0.233) and theoretical SE (0.225) confirms that our sampling procedure and the formula work as expected. In practice, we often estimate SE by $s/\sqrt{n}$ (using the sample's standard deviation $s$), since we usually have only one sample rather than the full population.
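
To make the last point concrete, here is a minimal sketch of estimating the SE from a single sample. The "population" is a simulated Poisson count variable chosen only as a stand-in for a skewed score (an assumption for illustration, not the lab data), which lets us compare the estimate $s/\sqrt{n}$ with the value $\sigma/\sqrt{n}$ that we would normally not know.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed "population" of counts (a stand-in, not the lab data)
population = rng.poisson(lam=2.5, size=10_000)
sigma = population.std(ddof=0)   # known here only because we built the population

# One sample of n = 50, as a researcher would actually observe
n = 50
sample = rng.choice(population, size=n, replace=False)
s = sample.std(ddof=1)           # sample SD (ddof=1)

se_true = sigma / np.sqrt(n)     # uses the population SD
se_est = s / np.sqrt(n)          # what we can compute in practice

print(f"sigma/sqrt(n) = {se_true:.3f}")
print(f"s/sqrt(n)     = {se_est:.3f}")
```

The two values will not match exactly, because $s$ is itself an estimate that varies from sample to sample.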

### Your Turn 1: Sampling Variability

Choose **one** variable:

- `mh_y_upps__nurg_sum`: *impulsivity sum scale*  
- `ph_y_bld__rslt__chol_qnt`: *cholesterol, mg/dL*  
- **Social media time (minutes or hours):**  
  - `nt_y_stq__socmed__min_001` **or** `nt_y_stq__socmed__mins_001`: *minutes*  
  - `nt_y_stq__socmed__hr_001`: *hours*

---

#### Task (Wave 21 only)
1. Extract your chosen variable from Wave 21 data
2. Pick a sample size `n` (start with 50) and take **1,000** simple random samples **without replacement**
3. Collect the sample means, compute the **empirical SE** (standard deviation of those means), and compare to the **theoretical SE = σ/√n** (where σ is the SD of the full Wave-21 data)
4. Plot a histogram of the sample means with the Wave-21 population mean marked
5. Save your figure in the `figures/` folder as `01_YT_sampling_means_{chosen_var}.png`

---

#### Think About
- Does the sampling distribution look approximately normal at `n = 50`?  
- How does skew (e.g., ACEs) change the shape of the distribution?  
- Does the empirical SE closely match the theoretical formula?
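
As a way to think about the skew question, the following sketch measures how the skewness of the sampling distribution changes with $n$. It uses a simulated exponential population (a hypothetical stand-in for a strongly right-skewed variable, not one of the lab variables) and a small helper, `skewness`, written here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(values):
    """Sample skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    z = (values - values.mean()) / values.std(ddof=0)
    return float(np.mean(z ** 3))

# Hypothetical right-skewed population (exponential), not the lab data
population = rng.exponential(scale=2.0, size=20_000)
print(f"Skewness of raw values: {skewness(population):.2f}")

skew_of_means = {}
for n in (5, 50, 200):
    # 3000 sample means, each from a sample of size n drawn without replacement
    means = np.array([rng.choice(population, size=n, replace=False).mean()
                      for _ in range(3000)])
    skew_of_means[n] = skewness(means)
    print(f"n = {n:>3}: skewness of sample means = {skew_of_means[n]:.2f}")
```

The skewness of the sample means should shrink toward zero as $n$ grows, which is the Central Limit Theorem at work; how quickly it shrinks depends on how skewed the raw variable is.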

---

#### Getting Started with Copilot

Try these prompts in the code cell below to build your solution step-by-step:

1. **"Filter df to Wave 21 only, extract {chosen_var}, drop NaN values, store as a numpy array"**  
   *(Replace `{chosen_var}` with your actual variable name)*

2. **"Take 1000 samples of size n=50 without replacement from this array, compute the mean of each sample, store all means in a list"**

3. **"Calculate the empirical SE (standard deviation of sample means) and theoretical SE using the population SD divided by sqrt(n)"**

4. **"Create a histogram of the sample means with 30 bins, add a vertical red dashed line at the population mean, include title and labels, save as 01_YT_sampling_means_{chosen_var}.png"**

💡 **Tip:** Start by copying the structure from Demo 1, then modify the variable name and adjust as needed!

In [None]:
# === Your Turn 1: Sampling Variability in Natural Units (Wave 21) ===

# Choose ONE variable:
#   'mh_y_upps__nurg_sum'
#   'ph_y_bld__rslt__chol_qnt'
#   'nt_y_stq__socmed__mins_001'
#   'nt_y_stq__socmed__hr_001'
chosen_var = '___'

# TODO: Filter to Wave 21, extract chosen variable, drop missing values
# TODO: Set n=50, num_samples=1000
# TODO: Take 1000 samples (without replacement), compute means
# TODO: Calculate empirical SE and theoretical SE = pop_std / sqrt(n)
# TODO: Print population mean, empirical SE, theoretical SE
# TODO: Plot histogram with population mean marked
# TODO: Save as f"01_YT_sampling_means_{chosen_var}"

# your code here

> **Optional Extension: Sampling Variability with Standardized Variables**  
> 
> **Why try z-scored variables?** So far we've worked with raw-score measures (ACEs, cholesterol, impulsivity). Now explore how the standard error behaves with **standardized (z-scored) variables** like the COI indices. This extension demonstrates that:
> - **SE applies to ANY continuous variable**, not just raw counts or physical measurements
> - **The math still works**: For a z-scored variable with population SD ≈ 1.0, theoretical SE = 1/√n
> - **Real-world relevance**: Many research variables are standardized (effect sizes, composite scores, normalized indices)
>
> **Choose ONE z-scored COI variable:**
> - `le_l_coi__addr1__coi__total__national_zscore`: **COI Overall Z-score**
> - `le_l_coi__addr1__se__total__national_zscore`: **COI Social & Economic Z-score**
>
> **Key difference from raw scores:**  
> Raw scores (like ACEs: 0–10) have natural units (number of adverse events). Z-scores are in **standard deviation units** relative to a national reference population (mean=0, SD=1). Your Wave-21 sample's SD may differ slightly from 1.0 due to sampling variability, regional variation, or missingness, but it should be close!
>
> **What to expect:**
> - If the **Wave-21 SD ≈ 1.0**, then theoretical SE ≈ 1/√n (e.g., ~0.14 for n=50)
> - Your **empirical SE** (from 1,000 resamples) should closely match this
> - The **sampling distribution** will be approximately normal (COI z-scores are already near-normal, and the Central Limit Theorem makes the distribution of sample means even more symmetric)
>
> **Instructions:** Follow the same steps as the main activity (filter Wave 21, choose n=50, take 1,000 samples, compute empirical and theoretical SE, plot histogram). Use the code cell below (the template is already set up for z-scored variables).
>
> **Reflection questions:**
> - How does the Wave-21 SD for your COI variable compare to the reference SD of 1.0? Why might they differ?
> - Is the empirical SE close to 1/√n? What does this tell you about the SE formula's generalizability?
> - Compare the shape of this sampling distribution to the ACEs distribution from the demo. Which is more symmetric? Why?

In [None]:
# TODO: Set chosen_var to one of: 'le_l_coi__addr1__coi__total__national_zscore' or 'le_l_coi__addr1__se__total__national_zscore'
chosen_var = '___'  

# 1. Filter to Wave 21 only to ensure independence
df_w21 = df[df['wave'] == 21].copy()
print(f"Wave 21 data: {df_w21.shape[0]} observations")

# 2. Extract the data for the chosen variable from Wave 21, dropping missing values
data = df_w21[chosen_var]
if chosen_var == 'aces_sum_score':
    data = data[data != 999]
else:
    data = data[~data.isna()]

print(f"Wave 21 {chosen_var}: n={len(data)}")

# 3. Set sample size n and number of samples
n = 50  # you can change this to explore effect of sample size
num_samples = 1000

# 4. Resample and collect means
means = []
for i in range(num_samples):
    sample = data.sample(n, replace=False)
    means.append(sample.mean())
means = np.array(means)

# 5. Compute empirical and theoretical SE
pop_mu = data.mean()
pop_sigma = data.std(ddof=0)
emp_se = means.std(ddof=1)
theo_se = pop_sigma / np.sqrt(n)
print(f"\nWave 21 population mean of {chosen_var} = {pop_mu:.3f}")
print(f"Wave 21 population SD = {pop_sigma:.3f}")
print(f"Empirical SE (from 1000 samples) = {emp_se:.3f}")
print(f"Theoretical SE = {theo_se:.3f}")

# 6. Plot histogram of sample means
plt.figure(figsize=(6,4))
plt.hist(means, bins=30, color='lightgreen', edgecolor='black')
plt.axvline(pop_mu, color='red', linestyle='--', linewidth=2, label=f"Pop mean = {pop_mu:.2f}")
plt.title(f"Sampling Distribution of {chosen_var} (Wave 21, n={n})")
plt.xlabel(f"Sample Mean of {chosen_var}")
plt.ylabel("Frequency (out of 1000 samples)")
plt.legend()
savefig(f"01_YT_sampling_means_{chosen_var}")
plt.show()

### Activity 2. Confidence Intervals for a Mean

**Demo:** Now that we've seen how sample means vary, let's construct **confidence intervals (CIs)** to quantify our uncertainty about a single sample mean. A confidence interval gives a range of plausible values for the population parameter (e.g., the true mean) based on our sample. The general form for a confidence interval for a mean is:

$$\text{CI} = \bar{x} \pm z^* \times \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample's standard deviation, $n$ is sample size, and $z^*$ is the critical value from the standard normal distribution for the desired confidence level (e.g., 1.96 for 95%). (Here we are assuming $n$ is moderately large, so using normal $z$; a more precise approach would use a $t$ distribution.)
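
To see how much the choice between $z$ and $t$ matters at this sample size, the sketch below compares the two critical values for $n = 100$ (the sample size used in this activity's demo). It assumes SciPy is available, as in the Setup cell; the name `df_t` is introduced here so it does not overwrite the dataset variable `df`.

```python
from scipy import stats

n = 100        # same sample size as the demo in this activity
df_t = n - 1   # degrees of freedom for the t distribution

crit = {}
for conf in (0.90, 0.95, 0.99):
    q = 1 - (1 - conf) / 2           # upper quantile, e.g. 0.975 for 95%
    z_star = stats.norm.ppf(q)       # normal critical value
    t_star = stats.t.ppf(q, df_t)    # t critical value with n - 1 df
    crit[conf] = (z_star, t_star)
    print(f"{conf:.0%}: z* = {z_star:.3f}, t* = {t_star:.3f} "
          f"(t-based interval is {100 * (t_star / z_star - 1):.1f}% wider)")
```

At $n = 100$ the two approaches give nearly the same interval; the gap grows as $n$ gets smaller, which is when the $t$ distribution matters most.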

In this demonstration, we'll:
- **Filter to Wave 21 only** to ensure independence of observations (same rationale as Activity 1).
- Draw **one random sample** from the Wave 21 data (to mimic the scenario of having just one study sample).
- Compute the sample mean and standard deviation for a continuous variable.
- Calculate **90%**, **95%**, and **99%** confidence intervals for the mean.
- Visualize these intervals to see how the width expands with higher confidence.

We'll use the **COI Overall Z-score** as our example variable in this section (roughly normally distributed). The process can be applied to any continuous outcome.

*Reminder:* A 95% confidence interval means that if we repeated the sampling process many times, about 95% of those intervals would contain the true population mean. It does **not** mean there's a 95% probability that the true mean lies in *this particular interval* (the true mean is fixed, the interval is random). We interpret it as a degree of confidence in our estimation procedure.

In [None]:
# Demo 2: Confidence intervals for a sample mean (COI Overall Z-score, Wave 21)
import math
from scipy.stats import norm

# 1. Filter to Wave 21 only to ensure independence
df_w21 = df[df['wave'] == 21].copy()
print(f"Wave 21 data: {df_w21.shape[0]} observations")

# 2. Extract COI Overall Z-score from Wave 21 data
data_all = df_w21['le_l_coi__addr1__coi__total__national_zscore']
data_all = data_all[~data_all.isna()]
print(f"Wave 21 COI Overall Z-score: n={len(data_all)}")

# 3. Take one random sample from the Wave 21 data
np.random.seed(42)  # for reproducibility
sample_size = 100
sample = data_all.sample(sample_size, replace=False)

# 4. Compute sample mean and std
x_bar = sample.mean()
s = sample.std(ddof=1)
print(f"\nSample mean (COI overall z) = {x_bar:.3f}, Sample SD = {s:.3f}, n = {sample_size}")

# 5. Calculate CIs at 90%, 95%, 99%
confidence_levels = [0.90, 0.95, 0.99]
intervals = {}
for conf in confidence_levels:
    alpha = 1 - conf
    z_crit = norm.ppf(1 - alpha/2)  # critical z-value
    margin = z_crit * (s / math.sqrt(sample_size))
    ci_lower = x_bar - margin
    ci_upper = x_bar + margin
    intervals[f"{int(conf*100)}% CI"] = (ci_lower, ci_upper)
    print(f"{int(conf*100)}% CI: [{ci_lower:.3f}, {ci_upper:.3f}] (width = {ci_upper-ci_lower:.3f})")

# 6. Visualize the CIs
plt.figure(figsize=(6,4))
conf_list = [90, 95, 99]
y_positions = [1, 2, 3]
for y, conf in zip(y_positions, conf_list):
    ci = intervals[f"{conf}% CI"]
    plt.hlines(y, ci[0], ci[1], colors='blue', linewidth=4)
    plt.plot(x_bar, y, 'o', color='black')  # mark the sample mean
plt.gca().set_yticks(y_positions)
plt.gca().set_yticklabels([f"{conf}% CI" for conf in conf_list])
plt.axvline(x_bar, color='black', linestyle='--', linewidth=1)
plt.title("90% vs 95% vs 99% Confidence Intervals (COI Z-score, Wave 21)")
plt.xlabel("COI Overall Z-score (sample mean ± margin)")
savefig("02_demo_confidence_intervals")
plt.show()

# The plot shows the 90%, 95%, 99% CIs for the mean. The dot is the sample mean, and the dashed line is also at the sample mean.

From the printed results, we can interpret each confidence interval. For example, a **95% CI** might be something like [0.12, 0.34] for the COI z-score mean. In plain language, we would say: *"We are 95% confident that the true population mean COI score lies between 0.12 and 0.34."* Note that this interval is fairly narrow because our sample size is 100 (yielding a small standard error). If we had a smaller sample or more variability, the interval would be wider.

Looking at the figure, we see that the **90% CI** is the narrowest, and the **99% CI** is the widest. This is because higher confidence requires "stretching" the interval to be more sure the true mean is captured. In our example, the 99% CI is about 57% wider than the 90% CI, because its critical value is larger ($z^* \approx 2.576$ versus $1.645$). All three intervals are centered at the sample mean (the black dot and dashed line), since that's our best estimate. The **margin of error** (half the width of the interval) increases with confidence level.

*Key points:*  
- **Interpretation of a CI:** A 95% CI means that if we repeated the study many times, 95% of the constructed intervals would contain the true mean. It does not give the probability for this interval specifically, but we treat the interval as a likely range for the parameter.  
- **CI width vs confidence:** Higher confidence -> wider interval. Lower confidence -> narrower interval. (There's a trade-off between precision and confidence.)  
- **Factors affecting CI width:** For a given confidence level, a larger sample size $n$ (smaller SE) or lower variability $s$ will produce a narrower CI. Conversely, more variability or smaller $n$ yields a wider CI.
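
These trade-offs can be checked numerically. The sketch below defines a small helper, `margin_of_error` (a name introduced here for illustration), and evaluates it for a few sample sizes and confidence levels, assuming a sample SD of 1.0, roughly what a z-scored variable would give.

```python
import math
from scipy.stats import norm

def margin_of_error(s, n, conf=0.95):
    """Half-width of the z-based CI: z* times s / sqrt(n)."""
    z_star = norm.ppf(1 - (1 - conf) / 2)
    return z_star * s / math.sqrt(n)

s = 1.0  # assumed sample SD (illustrative)

print("Margin of error by sample size and confidence level")
for n in (25, 100, 400):
    row = ", ".join(f"{int(c * 100)}%: {margin_of_error(s, n, c):.3f}"
                    for c in (0.90, 0.95, 0.99))
    print(f"n = {n:>3} -> {row}")
```

Reading across a row shows the cost of higher confidence; reading down a column shows that quadrupling $n$ halves the margin of error.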

### Your Turn 2: Computing Confidence Intervals for ACEs Sum Score

Practice constructing confidence intervals for a mean using the **ACEs sum score** variable from **Wave 21 data**.

**Instructions:**  
1. **Filter to Wave 21 only** to ensure independence (same as Activity 1 and Demo 2)
2. Extract the `aces_sum_score` variable and remove missing values (code `999`)
3. Take a **random sample** from the Wave 21 ACEs data (use `n=100`, `seed=42` for reproducibility)
4. Calculate the sample mean and standard deviation
5. Compute **90%, 95%, and 99%** confidence intervals for the mean using `scipy.stats.norm.ppf` for critical z-values
6. Print the intervals and their widths
7. **Visualize the intervals** similar to the demo (horizontal lines with the sample mean marked)
8. Save your figure as `02_YT_confidence_intervals_aces.png`

**Think about:**  
- How do the intervals change with confidence level (90% → 95% → 99%)?  
- How does the CI width for ACEs (a discrete count variable, 0–10) compare to the COI z-score from Demo 2?  
- What happens if you change your sample size (e.g., try n=30 vs n=100)? How does this affect the margin of error?
- Since ACEs are right-skewed, are the z-based CIs still appropriate? (With moderate n, the CLT helps, but the bootstrap might be more robust; we'll explore this in Activity 2b!)

---

#### Getting Started with Copilot

Use these prompts to build your confidence interval analysis:

1. **"Filter df to Wave 21, extract aces_sum_score, replace 999 with NaN, drop missing values, take a random sample of size 100 with seed 42"**

2. **"Calculate the sample mean and sample standard deviation (ddof=1). For each confidence level (90%, 95%, 99%), compute the z-critical value using scipy.stats.norm.ppf, then calculate the margin of error and CI bounds"**

3. **"Store all three CIs in a dictionary with keys '90% CI', '95% CI', '99% CI'. Print each CI with its width (upper - lower)"**

4. **"Create a figure with horizontal lines showing all three CIs at y-positions 1, 2, 3. Add black dots at the sample mean for each line, add a vertical dashed line at xbar, label axes, save as 02_YT_confidence_intervals_aces.png"**

💡 **Tip:** Reference Demo 2 for the exact plot structure. Your code will be very similar but with ACEs data!

In [None]:
# === Your Turn 2: Confidence Intervals for ACEs (Wave 21) ===

# TODO: Filter to Wave 21, extract ACEs, remove 999 codes
# TODO: Take a random sample (n=100, seed=42)
# TODO: Compute sample mean and SD
# TODO: Calculate 90%, 95%, 99% CIs using stats.norm.ppf()
# TODO: Print all three CIs with widths
# TODO: Plot horizontal lines showing all three CIs
# TODO: Save as "02_YT_confidence_intervals_aces"

# your code here

### 🧪 Mini Lesson: What is Bootstrapping (and why use it?)

<p align="center">
  <img src="bootstrapviz.png" alt="Three-panel bootstrap visual: original sample → resample with replacement (B=2000) → bootstrap distribution with 95% percentile CI." width="820">
</p>

Bootstrapping is a simple, powerful way to approximate the sampling distribution of a statistic (like a mean or median) using only your sample.

- **Intuition:** Treat your sample as a stand-in for the population. If you could sample from the population many times, you'd see how your statistic varies. Bootstrap simulates this by resampling **your sample** with replacement many times.

**Basic procedure:**
1. Choose a statistic of interest (e.g., mean of COI z-score).  
2. For *B* iterations (e.g., *B* = 2000):  
   - Draw a bootstrap sample: sample *n* observations **with replacement** from your original sample of size *n*.  
   - Compute the statistic for that bootstrap sample.  
3. The collection of bootstrap statistics approximates the sampling distribution.

**Two useful outputs:**
- **Bootstrap Standard Error (SE):** the standard deviation of the bootstrap statistics.  
- **Bootstrap Percentile CI:** the [2.5th, 97.5th] percentiles of the bootstrap statistics for a 95% CI.
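
The procedure and its two outputs can be sketched in a few lines. The example below runs on a simulated right-skewed sample of 80 counts (a hypothetical stand-in, not the lab data) and uses an explicit loop so that each step of the recipe is visible.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical right-skewed sample of n = 80 counts (not the lab data)
sample = rng.poisson(lam=2.0, size=80)
n = len(sample)

B = 2000
boot_means = np.empty(B)
for b in range(B):
    # Resample n observations WITH replacement from the original sample
    resample = rng.choice(sample, size=n, replace=True)
    boot_means[b] = resample.mean()

boot_se = boot_means.std(ddof=1)                          # bootstrap SE
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile CI

print(f"Sample mean           = {sample.mean():.3f}")
print(f"Bootstrap SE          = {boot_se:.3f}")
print(f"Formula SE (s/sqrt n) = {sample.std(ddof=1) / np.sqrt(n):.3f}")
print(f"95% percentile CI     = [{ci_low:.3f}, {ci_high:.3f}]")
```

For a mean, the bootstrap SE should land close to $s/\sqrt{n}$; the bootstrap becomes more valuable for statistics such as the median, which have no equally simple SE formula.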

**Why use bootstrapping?**
- Works when normal-theory formulas are questionable (skewed data, non-normal stats, complex estimators).  
- Minimal assumptions beyond: observations are i.i.d. and the sample is reasonably representative.

**Caveats:**
- Very small *n* can give unstable bootstrap distributions.  
- Strong dependence (e.g., repeated measures without pairing) violates assumptions; use paired/block bootstrap variants.  
- Clean missing or special codes (e.g., 999) before bootstrapping.
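
For the dependence caveat, one common variant is a paired bootstrap that resamples participants rather than individual measurements. The sketch below is an illustration under assumed data: the `baseline` and `followup` arrays are simulated, with one value of each per participant, and are not taken from the lab dataset.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical paired data: one baseline and one follow-up value per participant
n_subjects = 120
baseline = rng.poisson(lam=2.0, size=n_subjects).astype(float)
followup = baseline + rng.normal(loc=0.5, scale=1.0, size=n_subjects)

B = 2000
boot_diffs = np.empty(B)
for b in range(B):
    # Resample PARTICIPANTS (row indices) so each person's two values stay together
    idx = rng.integers(0, n_subjects, size=n_subjects)
    boot_diffs[b] = (followup[idx] - baseline[idx]).mean()

ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
print(f"Mean change = {(followup - baseline).mean():.3f}")
print(f"95% paired-bootstrap CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling the two timepoints separately would ignore the within-person correlation and would generally give an interval of the wrong width.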

**How many resamples (*B*)?**
- Common choices: 1000–5000. More resamples give smoother percentile estimates at higher compute cost (*B* = 2000 is a good default here).
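
A rough way to see what "smoother" means is to rerun the whole bootstrap several times and watch how much a CI endpoint moves between runs. The sketch below does this for two values of *B* on a simulated skewed sample (hypothetical data; the helper `upper_limit` is introduced here for illustration).

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical skewed sample (not the lab data)
sample = rng.exponential(scale=2.0, size=60)

def upper_limit(B):
    """97.5th percentile of B bootstrap means (upper end of a 95% percentile CI)."""
    boot = rng.choice(sample, size=(B, sample.size), replace=True).mean(axis=1)
    return np.percentile(boot, 97.5)

endpoint_sd = {}
for B in (200, 2000):
    # Repeat the whole bootstrap 40 times to see how much the endpoint jitters
    limits = [upper_limit(B) for _ in range(40)]
    endpoint_sd[B] = float(np.std(limits, ddof=1))
    print(f"B = {B:>4}: SD of upper CI limit across reruns = {endpoint_sd[B]:.4f}")
```

The run-to-run jitter in the endpoint should be noticeably smaller with the larger *B*. Note that this Monte Carlo noise is separate from the sampling uncertainty the CI itself describes, which more resamples cannot reduce.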

**In this lab:**
- We'll compare the normal-approximation CI, $\bar{x} \pm z^* \frac{s}{\sqrt{n}}$, to the **bootstrap percentile CI**.  
- For near-normal settings and moderate *n*, both methods agree closely. With skew (e.g., ACEs), bootstrap often provides more reliable intervals.


### Activity 2b. Confidence Intervals for a Single Mean: **Formula vs. Bootstrap** (Demo)

We'll compute a 95% confidence interval (CI) for a single mean in two ways and compare them:

1) **Formula (Normal approximation):**  
   $\text{CI} = \bar{x} \pm z^* \cdot \frac{s}{\sqrt{n}}$
   with $z^*=1.96$ for 95%.  
2) **Bootstrap Percentile CI:** Resample the *sample* with replacement $B$ times, compute the mean each time, then take the 2.5th and 97.5th percentiles of the bootstrap distribution of means.

We'll use **COI Overall (national z-score)** `le_l_coi__addr1__coi__total__national_zscore` as the example.

Note: Substance-use outcomes in later sections now use TLFB variables (e.g., `su_y_tlfb__rxopi_totdose`, `su_y_tlfb__mj_ud`).

In [None]:
# Demo: 95% CI for a single mean via formula and bootstrap (COI overall z-score)
import numpy as np, math, pandas as pd, matplotlib.pyplot as plt
from pathlib import Path

# Try to use existing df; if missing, try to load from disk.
try:
    df  # noqa: F821
except NameError:
    df = pd.read_csv("data/L6L10/ABCD_synthetic.csv")  # same file as the Setup cell

# Pick variable and clean (Wave 21 only, so each participant contributes one row)
var = "le_l_coi__addr1__coi__total__national_zscore"
x_all = df.loc[df["wave"] == 21, var].dropna().astype(float)

# Use all cleaned Wave 21 values as the sample (per course rule); report resulting n
x = x_all.to_numpy()
n = len(x)

# --- 1) Formula (Normal approx) ---
xbar = x.mean()
s = x.std(ddof=1)
z = 1.96  # 95%
moe = z * (s / math.sqrt(n))
ci_norm = (xbar - moe, xbar + moe)

# --- 2) Bootstrap Percentile CI ---
B = 2000
rng = np.random.default_rng(7)
boot_means = rng.choice(x, size=(B, n), replace=True).mean(axis=1)
ci_boot = (np.percentile(boot_means, 2.5), np.percentile(boot_means, 97.5))

print(f"Variable: {var}")
print(f"Sample mean = {xbar:.4f}, s = {s:.4f}, n = {n}")
print(f"95% Normal-approx CI: [{ci_norm[0]:.4f}, {ci_norm[1]:.4f}]  (width {ci_norm[1]-ci_norm[0]:.4f})")
print(f"95% Bootstrap CI    : [{ci_boot[0]:.4f}, {ci_boot[1]:.4f}]  (width {ci_boot[1]-ci_boot[0]:.4f})")

# Visualize bootstrap distribution with both CIs and the sample mean
FIG_DIR = Path("figures"); FIG_DIR.mkdir(exist_ok=True)
plt.figure(figsize=(7,4))
plt.hist(boot_means, bins=30, edgecolor="black")
plt.axvline(xbar, color="black", linestyle="--", linewidth=1, label=f"sample mean = {xbar:.3f}")
plt.axvline(ci_norm[0], color="C1", linestyle="--", label="Normal CI")
plt.axvline(ci_norm[1], color="C1", linestyle="--")
plt.axvline(ci_boot[0], color="C2", linestyle="--", label="Bootstrap CI")
plt.axvline(ci_boot[1], color="C2", linestyle="--")
plt.title("Bootstrap Distribution of the Mean with 95% CIs")
plt.xlabel(f"{var}: bootstrap means")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
outpath = FIG_DIR / "02b_demo_single_mean_formula_vs_bootstrap.png"
plt.savefig(outpath, dpi=150)
print(f"Figure saved to: {outpath}")
plt.show()

**Interpretation.** Both methods are centered at the sample mean. When the sampling distribution of the mean is close to Normal and $n$ is moderate/large, the **formula CI** and the **bootstrap percentile CI** are typically similar. With skew (e.g., ACE counts) or small $n$, bootstrap can be more robust.


### Your Turn 2b. Build a 95% CI: Formula **and** Bootstrap for ACEs Sum Score

Practice comparing the **formula-based CI** and **bootstrap CI** for the **ACEs sum score** using **Wave 21 data**.

**Instructions:**
- Filter to **Wave 21 only** to ensure independence (consistent with Activities 1 and 2)
- Extract `aces_sum_score` and **drop code 999** (missing values)
- Use the **full Wave 21 ACEs data as your sample** (after cleaning)
- Compute both:
  1. **95% Normal-approximation CI** using the formula $\text{CI} = \bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$
  2. **95% Bootstrap percentile CI** (use $B=2000$ resamples with `seed=42`)
- Print the variable name, sample stats (mean, SD, n), and both CIs with their widths
- Visualize the bootstrap distribution with both CIs marked (similar to Demo 2b)
- Save your figure as `02b_YT_single_mean_formula_vs_bootstrap_aces.png`

**Think about:**
- Are the two CIs similar or different? Why might they differ for ACEs (a right-skewed count variable)?
- Does the bootstrap distribution look approximately normal despite the skew in the raw ACEs data?
- How does the CLT help justify using the normal approximation here (hint: we're analyzing the distribution of *means*, not raw scores)?

---

#### Getting Started with Copilot

Break this task into manageable steps with these prompts:

1. **"Filter df to Wave 21, extract aces_sum_score, replace 999 with NaN, dropna, convert to numpy array x. Calculate n, xbar (mean), and s (std with ddof=1)"**

2. **"Calculate 95% Normal CI: z=1.96, moe = z * s / sqrt(n), ci_norm = (xbar - moe, xbar + moe). Print the CI and its width"**

3. **"Bootstrap CI: create rng with seed 42, use rng.choice to resample x with replacement B=2000 times creating a (2000, n) array, compute mean of each row (axis=1) to get boot_means. Calculate ci_boot as (2.5th percentile, 97.5th percentile) of boot_means"**

4. **"Plot histogram of boot_means with 30 bins. Add 5 vertical lines: xbar (black, dashed, label='sample mean'), ci_norm[0] and ci_norm[1] (color C1, dashed, label='Normal CI'), ci_boot[0] and ci_boot[1] (color C2, dashed, label='Bootstrap CI'). Add title, xlabel, ylabel, legend, save as 02b_YT_single_mean_formula_vs_bootstrap_aces.png"**

💡 **Tip:** The bootstrap resampling step is the trickiest. Make sure to use `replace=True` and compute means along `axis=1`!

In [None]:
# === Your Turn 2b: 95% CI via formula and bootstrap (ACEs, Wave 21) ===

import numpy as np, math, matplotlib.pyplot as plt
from pathlib import Path

# TODO: Filter to Wave 21, extract ACEs, clean (replace 999 → NaN, dropna)
# TODO: Use full Wave 21 data as sample (x = s_all.to_numpy(), n = len(x))
# TODO: Normal CI: compute xbar, s, z=1.96, moe, ci_norm = (xbar±moe)
# TODO: Bootstrap CI: B=2000, rng seed=42, resample with replacement, 
#       compute boot_means, ci_boot = (2.5th, 97.5th percentiles)
# TODO: Print variable name, sample stats, both CIs with widths
# TODO: Plot histogram of boot_means with 5 vertical lines:
#       xbar (black), ci_norm[0/1] (C1), ci_boot[0/1] (C2)
# TODO: Save as "02b_YT_single_mean_formula_vs_bootstrap_aces.png"

# your code here

### Activity 2c. **Coverage via Simulation**: Does the CI hit the true mean ~95% of the time? (Demo)

**Goal:** Empirically check that a 95% CI procedure actually covers the *true* mean about 95% of the time.

**Design:** Treat the **Wave 21 ACEs data** as the population. For a given sample size $n$:  
1) Repeat **R** times: draw a simple random sample (without replacement) from Wave 21.  
2) Build **two CIs** for each sample: (a) Normal-approx, (b) Bootstrap (percentile).  
3) Record whether each interval contains the *population* mean (computed from the full Wave 21 data).  
4) Report coverage rates (proportion of intervals that contain the truth).

**Why ACEs?** ACEs are right-skewed (many people have 0–2 ACEs, fewer have high counts). This lets us see whether the **normal approximation** still achieves 95% coverage despite the skew, and whether **bootstrap** performs better for skewed data.

*Note:* We use Wave 21 to ensure independence of observations (no repeated measures). Substance-use variables referenced later use TLFB measures (e.g., `su_y_tlfb__rxopi_totdose`).

In [None]:
# Demo 2c: Coverage by simulation for Normal vs Bootstrap 95% CI (ACEs sum score, Wave 21)
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from pathlib import Path

# Filter to Wave 21 only
df_w21 = df[df['wave'] == 21].copy()
print(f"Wave 21 data: {df_w21.shape[0]} observations")

# Extract and clean ACEs data
aces_all = df_w21['aces_sum_score'].copy()
aces_all = aces_all.replace(999, np.nan).dropna().astype(float)
print(f"Wave 21 ACEs (non-missing): n={len(aces_all)}")

# Population mean (the "truth" we're trying to cover)
true_mu = aces_all.mean()
print(f"True population mean (Wave 21 ACEs) = {true_mu:.3f}\n")

# Helper functions
def ci_normal(sample):
    xbar = sample.mean()
    s = sample.std(ddof=1)
    n = len(sample)
    z = 1.96
    moe = z * (s / np.sqrt(n))
    return (xbar - moe, xbar + moe)

def ci_bootstrap(sample, B=400, alpha=0.05, rng=None):
    X = sample.to_numpy()
    n = len(X)
    rng = np.random.default_rng() if rng is None else rng
    boots = rng.choice(X, size=(B, n), replace=True).mean(axis=1)
    lo = np.percentile(boots, 100*(alpha/2))
    hi = np.percentile(boots, 100*(1 - alpha/2))
    return (lo, hi)

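# Simulation parameters
np.random.seed(42)               # seeds pandas .sample(), which uses NumPy's global state
rng = np.random.default_rng(42)  # seeded generator so the bootstrap resamples are reproducible too
sizes = [30, 50, 100]  # sample sizes to test
R = 500  # repetitions per sample size
records = []

print("Running coverage simulation...")
for n in sizes:
    cover_norm = cover_boot = 0
    for r in range(R):
        sample = aces_all.sample(n, replace=False)
        
        # Normal CI
        L, U = ci_normal(sample)
        if L <= true_mu <= U:
            cover_norm += 1
        
        # Bootstrap CI (pass rng; otherwise ci_bootstrap creates an unseeded generator)
        Lb, Ub = ci_bootstrap(sample, B=400, rng=rng)
        if Lb <= true_mu <= Ub:
            cover_boot += 1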
    
    records.append({
        "n": n,
        "coverage_normal_95": cover_norm / R,
        "coverage_bootstrap_95": cover_boot / R
    })

# Display results
cov_df = pd.DataFrame(records)
print("\nCoverage Results:")
print(cov_df)

# Plot coverage vs sample size
FIG_DIR = Path("figures")
FIG_DIR.mkdir(exist_ok=True)
plt.figure(figsize=(7,4))
plt.plot(cov_df["n"], cov_df["coverage_normal_95"], marker="o", linewidth=2, 
         label="Normal 95% CI", color='C1')
plt.plot(cov_df["n"], cov_df["coverage_bootstrap_95"], marker="s", linewidth=2, 
         label="Bootstrap 95% CI", color='C2')
plt.axhline(0.95, color="gray", linestyle="--", linewidth=1, label="Target 0.95")
plt.ylim(0.88, 1.0)
plt.xlabel("Sample size (n)")
plt.ylabel("Coverage (proportion capturing true mean)")
plt.title("Empirical Coverage: ACEs Sum Score (Wave 21)")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
outpath = FIG_DIR / "02c_demo_coverage_vs_n_aces.png"
plt.savefig(outpath, dpi=150)
print(f"\nFigure saved to: {outpath}")
plt.show()

**Reading the coverage plot.** Each point shows the fraction of 95% intervals that contained the true mean across $R=500$ samples. Good procedures land near the 0.95 line (dashed gray).

**Key observations:**
- Both methods achieve coverage near 95% even for ACEs (a right-skewed variable), especially as $n$ increases.
- At **smaller sample sizes** (n=30), the normal approximation may slightly undercover (fall below 0.95) due to skewness. The bootstrap may or may not land closer to 0.95, so compare the two lines in your own output.
- As **sample size increases** (n=100), both methods converge toward 95% coverage, confirming that the Central Limit Theorem helps even with skewed data when $n$ is large enough.
- **Practical takeaway:** For skewed variables like ACEs with moderate samples (n=30-50), a bootstrap CI is a useful check on the normal approximation, though it is not guaranteed to give better coverage.
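
When comparing the two lines, keep in mind that each coverage value is itself an estimate from $R=500$ repetitions. A rough sketch of its Monte Carlo standard error, using the binomial formula $\sqrt{p(1-p)/R}$ (the helper name `coverage_mc_se` is hypothetical):

```python
import math

def coverage_mc_se(p_hat, R):
    """Standard error of a coverage proportion estimated from R independent repetitions."""
    return math.sqrt(p_hat * (1 - p_hat) / R)

R = 500
se_at_target = coverage_mc_se(0.95, R)
lo = 0.95 - 1.96 * se_at_target
hi = 0.95 + 1.96 * se_at_target

print(f"MC standard error at coverage 0.95, R={R}: {se_at_target:.4f}")
print(f"Approximate 95% band around 0.95: ({lo:.3f}, {hi:.3f})")
```

At this `R`, a gap of one or two percentage points between the methods is within simulation noise. Increasing `R` narrows the band at the cost of a longer run.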

### Activity 3. Group Differences and Longitudinal Change with CIs

**Demo:** In the final activity, we'll apply what we've learned to compare groups and to examine change over time, using confidence intervals to help interpret differences.

We will explore a substantive question: **How do substance use outcomes differ by risk factors (like ACEs or community opportunity) and how do they change from adolescence to young adulthood?** Specifically, we'll look at the **non-medical prescription opioid total dose**, aggregated over a Timeline Follow-Back (TLFB) recall window, at ~age 16 (baseline) and again at ~age 21 (follow-up). We'll group individuals by their community opportunity level (COI quintiles) to see whether disadvantaged and advantaged groups differ in substance use and in their trajectories over time.

Steps in this demonstration:
1. **Extract baseline (Wave 16) and follow-up (Wave 21) data** – Filter the dataset to create separate dataframes for each timepoint.
2. **Select variables for analysis** – Choose the outcome variable (TLFB total dose) and grouping variable (COI quintile).
3. **Extract baseline data** – Get subject IDs, grouping variable (COI quintile), and baseline outcome values.
4. **Extract follow-up data** – Get subject IDs and follow-up outcome values.
5. **Merge baseline and follow-up** on subject_id to create a single dataframe with both timepoints.
6. **Group by COI quintile** and compute mean, SD, and count for both baseline and follow-up.
7. **Compute standard errors and 95% CIs** for each group at each timepoint.
8. **Plot the group means** at baseline and follow-up with error bars (95% CIs) – a side-by-side bar chart showing each quintile group with two bars (baseline vs follow-up).

This visual will allow us to compare: (a) differences between groups (e.g., Quintile 1 vs Quintile 5) and (b) changes over time within groups (from 16 to 21). We'll check whether the CIs overlap as a rough guide to the statistical significance of differences.

*Note:* Because each group is a large subsample of the full dataset, the CIs may be quite narrow. Also, the same individuals are measured at baseline and follow-up, but for simplicity we'll treat the two means as independent (a more advanced analysis would account for the paired nature of the data).
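
The paired analysis mentioned in the note can be sketched as follows. This uses simulated data, not the lab dataset: the distributions, the sample size `n=200`, and all variable names are assumptions chosen only to illustrate the idea of building a CI from per-person differences.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired measurements on the same 200 people (NOT the lab data)
n = 200
baseline = rng.gamma(shape=2.0, scale=1.5, size=n)
followup = baseline + rng.normal(loc=0.5, scale=1.0, size=n)

# Paired approach: one difference per person, then a CI for the mean difference
diff = followup - baseline
d_bar = diff.mean()
se_paired = diff.std(ddof=1) / np.sqrt(n)
ci_paired = (d_bar - 1.96 * se_paired, d_bar + 1.96 * se_paired)

# Independent-samples approach (ignores the pairing), for comparison
se_indep = np.sqrt(baseline.var(ddof=1) / n + followup.var(ddof=1) / n)
ci_indep = (d_bar - 1.96 * se_indep, d_bar + 1.96 * se_indep)

print(f"mean change = {d_bar:.3f}")
print(f"paired 95% CI:      ({ci_paired[0]:.3f}, {ci_paired[1]:.3f})")
print(f"independent 95% CI: ({ci_indep[0]:.3f}, {ci_indep[1]:.3f})")
```

When baseline and follow-up values are positively correlated within person, as in this simulation, the paired standard error is smaller, so treating the means as independent tends to be conservative about change over time.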

In [None]:
# Demo 3: Group differences by COI quintile, baseline vs follow-up (Rx opioid TLFB total dose)

# 1. Extract baseline and follow-up data (filter by wave)
baseline = df[df['wave'] == 16].copy()
followup = df[df['wave'] == 21].copy()
print(f"Baseline data (Wave 16): {baseline.shape[0]} observations")
print(f"Follow-up data (Wave 21): {followup.shape[0]} observations")

# 2. Select variables for analysis
outcome_var = 'su_y_tlfb__rxopi_totdose'  # TLFB total dose (non-medical Rx opioids)
group_var = 'le_l_coi__addr1__coi__total__national_c5'  # COI quintile

# 3. Extract baseline COI quintile and baseline outcome
baseline_data = baseline[['subject_id', group_var, outcome_var]].rename(
    columns={outcome_var: 'outcome_bl'}
)

# 4. Extract follow-up outcome
followup_data = followup[['subject_id', outcome_var]].rename(
    columns={outcome_var: 'outcome_fu'}
)

# 5. Merge baseline and follow-up on subject_id
data = baseline_data.merge(followup_data, on='subject_id', how='inner')
data = data.rename(columns={group_var: 'group_var'})

# 6. Compute mean TLFB dose at baseline and follow-up for each group
# 7. Compute the standard error and 95% CI for each mean
group_stats = data.groupby('group_var').agg(
    mean_bl=('outcome_bl', 'mean'),
    sd_bl=('outcome_bl', 'std'),
    mean_fu=('outcome_fu', 'mean'),
    sd_fu=('outcome_fu', 'std'),
    count=('outcome_bl', 'count'),      # non-missing baseline values
    count_fu=('outcome_fu', 'count')    # non-missing follow-up values (may differ from baseline)
)
group_stats['se_bl'] = group_stats['sd_bl'] / np.sqrt(group_stats['count'])
group_stats['se_fu'] = group_stats['sd_fu'] / np.sqrt(group_stats['count_fu'])

# 8. Plot the group means at baseline and follow-up with 95% CI error bars
quintiles = group_stats.index.astype(int)
x = np.arange(len(quintiles))
width = 0.35

fig, ax = plt.subplots(figsize=(8,5))
# Bars for baseline
ax.bar(x - width/2, group_stats['mean_bl'], width, 
       yerr=1.96*group_stats['se_bl'], capsize=5, 
       label='Baseline (Age ~16)', color='skyblue')
# Bars for follow-up
ax.bar(x + width/2, group_stats['mean_fu'], width, 
       yerr=1.96*group_stats['se_fu'], capsize=5, 
       label='Follow-up (Age ~21)', color='orange')

# X-axis labels and legend
ax.set_xlabel('COI Overall Quintile (1=Lowest Opportunity, 5=Highest)')
ax.set_ylabel('Mean non-medical Rx opioid total dose (TLFB)')
ax.set_title('Prescription Opioid (TLFB total dose) by Opportunity Quintile: Age 16 vs 21')
ax.set_xticks(x)
ax.set_xticklabels([str(q) for q in quintiles])
ax.legend()
savefig('03_demo_group_CI')
plt.show()

# Display the summary statistics table
print("\nGroup statistics by COI quintile:")
print(group_stats)

In the chart above, each quintile group has two bars: the **blue bar** is the mean TLFB total dose at baseline (~age 16) and the **orange bar** is at follow-up (~age 21). The error bars on each bar represent the 95% confidence interval for the mean.

**Observations:**
- **Between-group differences:** Inspect whether mean TLFB dose differs across quintiles at each timepoint and whether a gradient by opportunity level appears. Clearly separated 95% CIs indicate a difference; heavily overlapping CIs give no clear evidence of one.

- **Within-group change over time:** Compare blue to orange bars within each quintile. If the follow-up CI sits entirely above the baseline CI, that suggests a likely increase over time.

- **Magnitude of change:** Consider both absolute and relative change. With skewed, zero-inflated use variables, mean differences can be small in absolute terms but meaningful in context.

**Interpreting CIs:** We use the confidence intervals to judge differences. Non-overlapping 95% CIs indicate a statistically significant difference at the α=0.05 level; substantial overlap suggests no clear difference, although overlap by itself is not a formal test. The main focus here is the **within-group temporal change** and whether it varies by opportunity level.
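
One caveat when judging differences by eye: two 95% CIs can overlap slightly even when a CI for the difference itself excludes zero. A minimal sketch with made-up summary numbers (the means and standard errors below are illustrative assumptions, not lab results, and the two groups are assumed independent):

```python
import math

# Hypothetical group summaries (illustrative numbers only)
mean_a, se_a = 10.0, 1.0
mean_b, se_b = 13.0, 1.0
z = 1.96

# Separate 95% CIs for each group mean
ci_a = (mean_a - z * se_a, mean_a + z * se_a)
ci_b = (mean_b - z * se_b, mean_b + z * se_b)
cis_overlap = ci_a[1] >= ci_b[0]

# 95% CI for the difference of two independent means
diff = mean_b - mean_a
se_diff = math.sqrt(se_a**2 + se_b**2)
ci_diff = (diff - z * se_diff, diff + z * se_diff)
diff_excludes_zero = ci_diff[0] > 0 or ci_diff[1] < 0

print(f"CI A: {ci_a}, CI B: {ci_b}, overlap: {cis_overlap}")
print(f"CI for difference: {ci_diff}, excludes zero: {diff_excludes_zero}")
```

Checking overlap is therefore a conservative screen: separated intervals point to a difference, while a small overlap calls for a CI on the difference before concluding anything.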

### Your Turn 3: Group Comparison and Change (ACE groups)

For your final task, analyze substance use differences by **ACEs (Adverse Childhood Experiences)** group and over time. Specifically, compare a high-ACE group vs a low-ACE group, and see how their substance use changes from age 16 to 21.

**Instructions:**  
1. Choose an outcome to examine (TLFB-based measure):
   - Rx opioid total dose (non-medical; TLFB): `su_y_tlfb__rxopi_totdose`
   - Marijuana use-days (TLFB): `su_y_tlfb__mj_ud`

2. Create separate baseline (wave 16) and follow-up (wave 21) dataframes

3. Build `baseline_data` with columns: `['subject_id', 'aces_sum_score', outcome_var]`, rename outcome → `'outcome_bl'`

4. Build `followup_data` with columns: `['subject_id', outcome_var]`, rename outcome → `'outcome_fu'`

5. Merge the two dataframes on `'subject_id'` using inner join

6. Filter out missing ACEs values (code 999) and apply `.copy()`

7. Create `ACE_group` using `pd.cut()` with:
   - `bins = [-1, 0, 3, 10]`
   - `labels = ['0 ACE', '1-3 ACE', '4+ ACE']`

8. Group by `ACE_group`, aggregate to get mean, SD, and count for both `outcome_bl` and `outcome_fu`

9. Calculate standard errors: `se_bl` and `se_fu` (SD / sqrt(count))

10. Extract the `'0 ACE'` and `'4+ ACE'` rows

11. Print 4 lines showing each group's baseline and follow-up mean with 95% CI bounds

12. Create a side-by-side bar plot:
    - Baseline bars (color='skyblue', label='Baseline (Age ~16)')
    - Follow-up bars (color='orange', label='Follow-up (Age ~21)')
    - Error bars using `yerr=1.96*SE`, `capsize=5`
    - X-axis labels: `['0 ACE', '4+ ACE']`
    - Appropriate title and axis labels

13. Save your figure as `03_YT_ACE_group_CI.png`
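
The binning in step 7 can be checked on toy values before applying it to the merged data. The sketch below uses a small hand-made series (an assumption for illustration, not the lab data) that hits every bin edge:

```python
import pandas as pd

# Toy ACE scores covering each bin edge (not the lab data)
scores = pd.Series([0, 1, 3, 4, 10])

# Bins are right-inclusive by default: (-1, 0], (0, 3], (3, 10]
groups = pd.cut(scores, bins=[-1, 0, 3, 10],
                labels=['0 ACE', '1-3 ACE', '4+ ACE'])

print(pd.DataFrame({'aces_sum_score': scores, 'ACE_group': groups}))
```

Starting the first bin at `-1` is what lets a score of exactly 0 fall into `'0 ACE'`. Values outside the bins, such as the missing code 999, would be assigned `NaN` by `pd.cut`, which is one reason to filter them out first.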

**Guiding questions:**  
- Does the high-ACE group have a different mean than the 0-ACE group?  
- Are the CIs overlapping or separate?  
- How much does each group increase from baseline to follow-up, and is that increase meaningful (consider the CIs)?  
- Relate your findings to the context: ACEs are risk factors, so we might expect the 4+ ACE group to have higher substance use. Do the data support this?

---

#### Getting Started with Copilot

This is the most complex task, so break it into phases:

**Phase 1: Data Preparation**

1. "Create baseline dataframe by filtering df to wave==16, select columns ['subject_id', 'aces_sum_score', outcome_var], rename outcome_var to 'outcome_bl'. Create followup dataframe by filtering to wave==21, select ['subject_id', outcome_var], rename to 'outcome_fu'. Merge them on 'subject_id' with how='inner'"

2. "Filter merged data to remove rows where aces_sum_score equals 999, apply .copy(). Then create a new column 'ACE_group' using pd.cut with bins=[-1,0,3,10] and labels=['0 ACE','1-3 ACE','4+ ACE']"

**Phase 2: Calculate Statistics**

3. "Group the data by 'ACE_group' and aggregate to compute mean_bl=(outcome_bl, mean), sd_bl=(outcome_bl, std), mean_fu=(outcome_fu, mean), sd_fu=(outcome_fu, std), count=(outcome_bl, count). Then add two new columns: se_bl = sd_bl / sqrt(count) and se_fu = sd_fu / sqrt(count)"

4. "Extract the rows for '0 ACE' and '4+ ACE' groups from ace_stats. For each group, print the baseline mean with its 95% CI (mean ± 1.96*SE) and follow-up mean with its 95% CI"

**Phase 3: Visualization**

5. "Create a bar plot with x-positions [0,1] for the two groups ['0 ACE','4+ ACE']. For each group, plot two bars side-by-side (use width=0.35, offset by ±width/2): one skyblue bar for baseline mean with yerr=1.96*se_bl, one orange bar for follow-up mean with yerr=1.96*se_fu. Set capsize=5 for error bars. Add appropriate labels, title, legend, and save as 03_YT_ACE_group_CI.png"

💡 **Tip:** Demo 3 has the exact plot structure you need; the main difference is using ACE groups instead of COI quintiles!

In [None]:
# === Your Turn 3: Group differences by ACE category (0 vs 4+), baseline vs follow-up ===

# Allowed outcomes (pick exactly one TLFB measure):
#   'su_y_tlfb__rxopi_totdose'  # non-medical Rx opioid total dose (TLFB)
#   'su_y_tlfb__mj_ud'          # marijuana use-days (TLFB)
outcome_var = '___'

# TODO: Extract baseline (wave 16) and followup (wave 21) DataFrames
# TODO: Select your outcome variable for analysis
# TODO: Extract baseline_data with ['subject_id', 'aces_sum_score', outcome_var]
#       Rename outcome_var → 'outcome_bl'
# TODO: Extract followup_data with ['subject_id', outcome_var]
#       Rename outcome_var → 'outcome_fu'
# TODO: Merge on 'subject_id' (inner join)
# TODO: Filter out ACEs==999 (.copy() after filtering!)
# TODO: Create ACE_group using pd.cut with bins=[-1,0,3,10], 
#       labels=['0 ACE', '1-3 ACE', '4+ ACE']
# TODO: Groupby 'ACE_group', aggregate mean/SD/count for both timepoints
# TODO: Calculate SE columns: se_bl, se_fu
# TODO: Extract '0 ACE' and '4+ ACE' rows
# TODO: Print 4 lines: each group's baseline & followup mean with 95% CI
# TODO: Plot side-by-side bars (baseline=skyblue, followup=orange)
#       with yerr=1.96*SE, capsize=5
# TODO: Save as '03_YT_ACE_group_CI'

# your code here