---
title: "PSTAT 5A: Confidence Intervals Deep Dive"
subtitle: "Lecture 12 - From Theory to Practice: z and t Distributions"
author: "Narjes Mathlouthi"
date: today
format:
  revealjs:
    logo: /img/logo.png
    slide-level: 2            
    theme:  default
    css: lec12-style.css
    slide-number: true
    chalkboard: true
    preview-links: auto
    footer: "Confidence Intervals – z and t Distributions © 2025"
    transition: slide
    background-transition: fade
    incremental: false
    smaller: true

jupyter: pstat5a
execute:
  echo: false
  warning: false
  message: false
  eval: true
---

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import scipy.stats as stats
from scipy.stats import norm, t
import warnings
warnings.filterwarnings('ignore')

# Enhanced color palette
colors = {
    'primary': '#3b82f6',
    'secondary': '#f59e0b', 
    'success': '#10b981',
    'danger': '#ef4444',
    'info': '#8b5cf6',
    'warning': '#f97316',
    'light': '#f8fafc',
    'dark': '#1f2937',
    'accent': '#06b6d4'
}

# Set random seed for reproducibility
np.random.seed(42)

# Welcome to Lecture 12 {.center}

**Confidence Intervals: From Theory to Practice**

*"A confidence interval is a way of expressing uncertainty in a precise, mathematical way"*

------------------------------------------------------------------------

## 📢 Important Announcements

::::: columns
::: {.column width="50%"}
### 📝 Quiz 2 Details

**When:**\
- 📅 **Date:** Friday, July 25\
- ⏰ **Window:** 7 AM – 12 AM\
- ⏳ **Duration:** 1 hour once started

**Where:** 💻 Online via Canvas

**Covers:** Material from Weeks 3-4
:::

::: {.column width="50%"}
### 📚 What to Expect

-   Discrete & continuous distributions
-   Probability calculations
-   Expected value & variance
-   Normal distribution applications
-   **Note:** Upload photos of written work for calculation problems
:::
:::::

## 📢 Today's Roadmap

::::: columns
::: column
### 🎯 Learning Objectives

-   **Know the difference** between $z$ and $t$ distributions
-   **Understand when to use** each distribution
-   **Learn to find critical values** from tables and plots
-   **Practice calculating** confidence intervals step-by-step
-   **Interpret results** correctly in context
:::

::: column
### 📋 What We'll Cover

1.  **Review:** Confidence interval basics
2.  **The t-Distribution:** When and why we use it
3.  **Critical Regions:** Finding the right values
4.  **Practical Examples:** z and t calculations
5.  **Common Mistakes:** What to avoid
6.  **Real Applications:** Making it meaningful
:::
:::::

------------------------------------------------------------------------

## Quick Review: Confidence Interval Basics 🔄

::::: columns
::: column
### 🎯 The Big Idea

A **confidence interval (CI)** takes a single sample statistic and turns it into a *range* that is likely to contain an *unknown* population parameter, most often the mean $\mu$.

**CI template**

$$
\underbrace{\text{Point estimate}}_{\color{blue}{(e.g., \bar{x})}}
\;\pm\;
\underbrace{\text{(critical value) $\times$ (standard error)}}_{\color{red}{\text{Margin of Error (ME)}}}
$$

**For the mean**

| Situation | Formula | Distribution |
|-----------------------|-------------------|-----------------------------|
| **σ known** (rare) | $\displaystyle \bar{x} \;\pm\; z^{*}\,\frac{\sigma}{\sqrt{n}}$ | *z*-distribution |
| **σ unknown** (typical) | $\displaystyle \bar{x} \;\pm\; t^{*}\,\frac{s}{\sqrt{n}}$ | *t*-distribution ($df = n-1$) |

**Key points**

-   We *never* know the true mean $\mu$ in practice; that is exactly what the CI estimates.
-   Use the population SD **σ** only when it is genuinely known (e.g., industrial process with long‑term QC).
-   Otherwise substitute the sample SD **s** and switch to the *t*‑distribution, whose heavier tails give a wider interval to reflect that extra uncertainty.
:::

::: column

In [None]:
# Quick confidence interval review visualization
np.random.seed(42)

# Create a sampling distribution
sample_mean = 75
se = 3
confidence_level = 0.95
alpha = 1 - confidence_level
z_critical = stats.norm.ppf(1 - alpha/2)

# Generate the distribution
x = np.linspace(sample_mean - 4*se, sample_mean + 4*se, 1000)
y = stats.norm.pdf(x, sample_mean, se)

# Calculate CI bounds
ci_lower = sample_mean - z_critical * se
ci_upper = sample_mean + z_critical * se

fig = go.Figure()

# Add the distribution curve
fig.add_trace(go.Scatter(
    x=x, y=y,
    mode='lines',
    line=dict(color=colors['primary'], width=3),
    name='Sampling Distribution',
    fill='tonexty'
))

# Shade the confidence interval
ci_mask = (x >= ci_lower) & (x <= ci_upper)
fig.add_trace(go.Scatter(
    x=np.concatenate([x[ci_mask], [ci_upper, ci_lower]]),
    y=np.concatenate([y[ci_mask], [0, 0]]),
    fill='toself',
    fillcolor='rgba(59, 130, 246, 0.3)',
    line=dict(color='rgba(0,0,0,0)'),
    name='95% Confidence Interval'
))

# Add critical lines
fig.add_vline(x=sample_mean, line_dash="solid", line_color=colors['danger'], 
              line_width=3, annotation_text="Sample Mean")
fig.add_vline(x=ci_lower, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Lower: {ci_lower:.1f}")
fig.add_vline(x=ci_upper, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Upper: {ci_upper:.1f}")

fig.update_layout(
    title="95% Confidence Interval: The Range of Reasonable Values",
    xaxis_title="Value",
    yaxis_title="Probability Density",
    height=400,
    showlegend=False
)

fig.show()

**Key Formula:** $\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$ (when using the z-distribution, σ known)
:::
:::::
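The CI template and the two formulas in the table can be sketched as one small helper. This is a minimal illustration, not part of the lecture code: the function name `mean_ci`, its parameters, and the input values are assumptions chosen for the example.

```python
import math
from scipy import stats


def mean_ci(x_bar, sd, n, confidence=0.95, sigma_known=False):
    """Two-sided CI for a population mean: x_bar ± (critical value) × SE.

    `sd` is sigma when sigma_known=True, otherwise the sample SD s.
    """
    alpha = 1 - confidence
    se = sd / math.sqrt(n)                        # standard error
    if sigma_known:
        crit = stats.norm.ppf(1 - alpha / 2)      # z*
    else:
        crit = stats.t.ppf(1 - alpha / 2, n - 1)  # t* with df = n - 1
    me = crit * se                                # margin of error
    return x_bar - me, x_bar + me


# Illustrative inputs (not from the lecture): x_bar = 75, SD = 15, n = 25
z_ci = mean_ci(75, 15, 25, sigma_known=True)
t_ci = mean_ci(75, 15, 25, sigma_known=False)
print(f"z interval: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t interval: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
```

With identical inputs, the t interval comes out slightly wider than the z interval, which is the extra uncertainty from estimating σ with s.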

------------------------------------------------------------------------

## Step-by-Step Example 1: Using z-Distribution 📝

:::::: columns
:::: column
### 🎯 Problem Setup

**Research Question:** What is the average SAT score of students at UCSB?

**Given Information:**

-   Sample size: $n = 50$ students
-   Sample mean: $\bar{x} = 1180$
-   **Population standard deviation:** $\sigma = 120$ (known from past data)
-   Confidence level: $95\%$

**Question:** Construct a $95\%$ confidence interval for the population mean SAT score.

::: {#my-solution .collapsible-solution}
<button class="solution-toggle">
  Show Solution
</button>
<div class="solution-content">

**Step 1: Check conditions**

-   $\sigma$ is known ✓
-   Use z-distribution ✓

**Step 2: Find critical value**

-   For $95\%$ CI: $\alpha = 0.05, \frac{\alpha}{2} = 0.025$
-   $z^* = 1.96$ (from z-table)

**Step 3: Calculate SE**

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{120}{\sqrt{50}} = \frac{120}{7.071} = 16.97$$

**Step 4: Calculate Margin of Error**

$$ME = z^* \times SE = 1.96 \times 16.97 = 33.26$$

**Step 5: Construct CI**

$$CI = \bar{x} \pm ME = 1180 \pm 33.26 = (1146.7, 1213.3)$$

**Final Answer:** We are 95% confident that the true average SAT score is between 1146.7 and 1213.3.
</div>
:::
::::

::: column

In [None]:
# Example 1: z-distribution calculation
n = 50
x_bar = 1180
sigma = 120
confidence = 0.95
z_star = 1.96

# Calculate standard error and CI
se = sigma / np.sqrt(n)
margin_error = z_star * se
ci_lower = x_bar - margin_error
ci_upper = x_bar + margin_error

# Create visualization
fig = go.Figure()

# Sampling distribution
x_range = np.linspace(ci_lower - 50, ci_upper + 50, 1000)
y_range = stats.norm.pdf(x_range, x_bar, se)

fig.add_trace(go.Scatter(
    x=x_range, y=y_range,
    mode='lines',
    line=dict(color=colors['primary'], width=3),
    name='Sampling Distribution'
))

# Shade the confidence interval
ci_mask = (x_range >= ci_lower) & (x_range <= ci_upper)
fig.add_trace(go.Scatter(
    x=np.concatenate([x_range[ci_mask], [ci_upper, ci_lower]]),
    y=np.concatenate([y_range[ci_mask], [0, 0]]),
    fill='toself',
    fillcolor='rgba(59, 130, 246, 0.3)',
    line=dict(color='rgba(0,0,0,0)'),
    name='95% Confidence Interval'
))

# Add lines
fig.add_vline(x=x_bar, line_dash="solid", line_color=colors['danger'], 
              line_width=3, annotation_text=f"x̄ = {x_bar}")
fig.add_vline(x=ci_lower, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Lower = {ci_lower:.1f}")
fig.add_vline(x=ci_upper, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Upper = {ci_upper:.1f}")

fig.update_layout(
    title="Example 1: 95% CI using z-Distribution (σ known)",
    xaxis_title="SAT Score",
    yaxis_title="Probability Density",
    height=400,
    showlegend=False
)

fig.show()

:::
::::::
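As a cross-check on Steps 2 through 5, the critical value can be computed from the confidence level instead of read from a z-table, and the whole interval can be obtained in one call. This is a sketch using SciPy's `norm.ppf` and `norm.interval`; the helper name `z_critical` is an assumption made for this example.

```python
import math
from scipy import stats


def z_critical(confidence):
    """z* for a two-sided interval: alpha/2 in each tail."""
    alpha = 1 - confidence
    return stats.norm.ppf(1 - alpha / 2)


# Larger confidence levels give larger critical values
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z* = {z_critical(level):.3f}")

# Example 1 inputs: n = 50, x_bar = 1180, sigma = 120, 95% confidence
se = 120 / math.sqrt(50)
lower, upper = stats.norm.interval(0.95, loc=1180, scale=se)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The interval should agree with the hand calculation up to rounding, since the table value 1.96 is itself a rounded $z^*$.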

------------------------------------------------------------------------

## Step-by-Step Example 2: Using t-Distribution 📝

::::::: columns
::::: column
### 🎯 Problem Setup

**Research Question:** What is the average daily coffee consumption at our office?

**Given Information:**

-   Sample size: $n = 16$ employees
-   Sample mean: $\bar{x} = 2.8$ cups
-   **Sample standard deviation:** $s = 0.9$ cups ($\sigma$ unknown)
-   Confidence level: $90\%$

**Question:** Construct a 90% confidence interval for the population mean daily coffee consumption.

:::: collapsible-solution
<button class="solution-toggle">

Show Solution

</button>

::: solution-content
**Step 1: Check conditions**

-   $\sigma$ is unknown ✓
-   $n < 30$ ✓
-   Use t-distribution ✓

**Step 2: Calculate degrees of freedom**

-   $df = n - 1 = 16 - 1 = 15$

**Step 3: Find critical value**

-   For $90\%$ CI: $\alpha = 0.10, \frac{\alpha}{2} = 0.05$
-   $t^* = 1.753$ (from t-table, $df = 15$)

**Step 4: Calculate SE**

$$SE = \frac{s}{\sqrt{n}} = \frac{0.9}{\sqrt{16}} = \frac{0.9}{4} = 0.225$$

**Step 5: Calculate Margin of Error**

$$ME = t^* \times SE = 1.753 \times 0.225 = 0.394$$

**Step 6: Construct CI**

$$CI = \bar{x} \pm ME = 2.8 \pm 0.394 = (2.406, 3.194)$$

**Final Answer:** We are 90% confident that the true average daily coffee consumption is between 2.406 and 3.194 cups.
:::
::::
:::::

::: column

In [None]:
# Example 2: t-distribution calculation
n2 = 16
x_bar2 = 2.8
s = 0.9
confidence2 = 0.90
df = n2 - 1
t_star = stats.t.ppf(0.95, df)  # 90% CI, so upper tail is 0.05

# Calculate standard error and CI
se2 = s / np.sqrt(n2)
margin_error2 = t_star * se2
ci_lower2 = x_bar2 - margin_error2
ci_upper2 = x_bar2 + margin_error2

# Create visualization
fig = go.Figure()

# Sampling distribution
x_range2 = np.linspace(ci_lower2 - 0.5, ci_upper2 + 0.5, 1000)
y_range2 = stats.t.pdf((x_range2 - x_bar2)/se2, df) / se2

fig.add_trace(go.Scatter(
    x=x_range2, y=y_range2,
    mode='lines',
    line=dict(color=colors['info'], width=3),
    name='Sampling Distribution (t)'
))

# Shade the confidence interval
ci_mask2 = (x_range2 >= ci_lower2) & (x_range2 <= ci_upper2)
fig.add_trace(go.Scatter(
    x=np.concatenate([x_range2[ci_mask2], [ci_upper2, ci_lower2]]),
    y=np.concatenate([y_range2[ci_mask2], [0, 0]]),
    fill='toself',
    fillcolor='rgba(139, 92, 246, 0.3)',
    line=dict(color='rgba(0,0,0,0)'),
    name='90% Confidence Interval'
))

# Add lines
fig.add_vline(x=x_bar2, line_dash="solid", line_color=colors['danger'], 
              line_width=3, annotation_text=f"x̄ = {x_bar2}")
fig.add_vline(x=ci_lower2, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Lower = {ci_lower2:.3f}")
fig.add_vline(x=ci_upper2, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Upper = {ci_upper2:.3f}")

fig.update_layout(
    title="Example 2: 90% CI using t-Distribution (σ unknown)",
    xaxis_title="Cups of Coffee",
    yaxis_title="Probability Density",
    height=400,
    showlegend=False
)

fig.show()

:::
:::::::
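To see what the t-distribution changes in Example 2, the sketch below recomputes the interval with SciPy's `t.interval` and compares it with the interval a z critical value would have produced from the same numbers. It also prints $t^*$ for a few degrees of freedom; the particular df values are arbitrary choices for illustration.

```python
import math
from scipy import stats

# Example 2 inputs: n = 16, x_bar = 2.8, s = 0.9, 90% confidence
n, x_bar, s = 16, 2.8, 0.9
df = n - 1
se = s / math.sqrt(n)

t_lower, t_upper = stats.t.interval(0.90, df, loc=x_bar, scale=se)
z_lower, z_upper = stats.norm.interval(0.90, loc=x_bar, scale=se)
print(f"t interval (df = {df}): ({t_lower:.3f}, {t_upper:.3f})")
print(f"z interval (for comparison): ({z_lower:.3f}, {z_upper:.3f})")

# t* shrinks toward z* as the degrees of freedom grow
z_star = stats.norm.ppf(0.95)
for d in (5, 15, 30, 100):
    print(f"df = {d:>3}: t* = {stats.t.ppf(0.95, d):.3f}  (z* = {z_star:.3f})")
```

Using z here would give an interval that is too narrow for a sample of 16, which is why the solution uses $t^*$ with $df = 15$.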

------------------------------------------------------------------------

## Practice Problem: Test Your Skills! 🧠

::::::: columns
::::: column
### 🎯 Your Turn!

**Problem:** A researcher wants to estimate the average time students spend studying per day.

**Given:**

-   Sample size: $n = 25$ students
-   Sample mean: $\bar{x} = 3.2$ hours
-   Sample standard deviation: $s = 1.1$ hours
-   Confidence level: $95\%$

**Questions:**

1.  Should you use $z$ or $t$-distribution? Why?
2.  What are the degrees of freedom?
3.  What is the critical value?
4.  Calculate the 95% confidence interval
5.  Interpret your result in context

:::: collapsible-solution
<button class="solution-toggle">

Show Solution

</button>

::: solution-content
**Step 1: Distribution Choice**

Use the t-distribution because:

-   $\sigma$ is unknown (only the sample standard deviation $s$ is given)
-   $n = 25 < 30$

**Step 2: Degrees of Freedom**

$df = n - 1 = 25 - 1 = 24$

**Step 3: Critical Value**

For a 95% CI with $df = 24$: $t^* = 2.064$

**Step 4: Calculate CI**

Standard Error: $SE = \frac{s}{\sqrt{n}} = \frac{1.1}{\sqrt{25}} = \frac{1.1}{5} = 0.220$

Margin of Error: $ME = t^* \times SE = 2.064 \times 0.220 = 0.454$

Confidence Interval: $CI = \bar{x} \pm ME = 3.2 \pm 0.454 = (2.746, 3.654)$

**Step 5: Interpretation** We are 95% confident that the true average study time for students is between 2.746 and 3.654 hours per day.
:::
::::
:::::

::: column

In [None]:
# Practice problem visualization
n_practice = 25
x_bar_practice = 3.2
s_practice = 1.1
df_practice = n_practice - 1
t_star_practice = stats.t.ppf(0.975, df_practice)  # 95% CI

se_practice = s_practice / np.sqrt(n_practice)
margin_error_practice = t_star_practice * se_practice
ci_lower_practice = x_bar_practice - margin_error_practice
ci_upper_practice = x_bar_practice + margin_error_practice

# Create visualization
fig = go.Figure()

# Sampling distribution
x_range_practice = np.linspace(ci_lower_practice - 0.5, ci_upper_practice + 0.5, 1000)
y_range_practice = stats.t.pdf((x_range_practice - x_bar_practice)/se_practice, df_practice) / se_practice

fig.add_trace(go.Scatter(
    x=x_range_practice, y=y_range_practice,
    mode='lines',
    line=dict(color=colors['success'], width=3),
    name='Sampling Distribution'
))

# Shade the confidence interval
ci_mask_practice = (x_range_practice >= ci_lower_practice) & (x_range_practice <= ci_upper_practice)
fig.add_trace(go.Scatter(
    x=np.concatenate([x_range_practice[ci_mask_practice], [ci_upper_practice, ci_lower_practice]]),
    y=np.concatenate([y_range_practice[ci_mask_practice], [0, 0]]),
    fill='toself',
    fillcolor='rgba(16, 185, 129, 0.3)',
    line=dict(color='rgba(0,0,0,0)'),
    name='95% Confidence Interval'
))

# Add lines
fig.add_vline(x=x_bar_practice, line_dash="solid", line_color=colors['danger'], 
              line_width=3, annotation_text=f"x̄ = {x_bar_practice}")
fig.add_vline(x=ci_lower_practice, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Lower = {ci_lower_practice:.3f}")
fig.add_vline(x=ci_upper_practice, line_dash="dot", line_color=colors['warning'], 
              line_width=2, annotation_text=f"Upper = {ci_upper_practice:.3f}")

fig.update_layout(
    title="Practice Problem: Study Time Confidence Interval",
    xaxis_title="Study Time (hours)",
    yaxis_title="Probability Density",
    height=400,
    showlegend=False
)

fig.show()

### 🤔 Think About It...

-   Why is the t-distribution appropriate here?
-   How would the interval change if n = 100?
-   What if we wanted 99% confidence instead?
:::
:::::::
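The "Think About It" questions can be explored numerically. The sketch below recomputes the margin of error for three scenarios, assuming (hypothetically) that $s = 1.1$ stays the same when $n$ or the confidence level changes; a real new sample would give a different $s$. The helper name `t_margin` is an assumption made for this example.

```python
import math
from scipy import stats


def t_margin(s, n, confidence):
    """Margin of error t* × s / sqrt(n) with df = n - 1."""
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, n - 1)
    return t_star * s / math.sqrt(n)


s = 1.1  # sample SD from the practice problem, held fixed for comparison
scenarios = {
    "n = 25, 95%": (25, 0.95),   # the original problem
    "n = 100, 95%": (100, 0.95),  # larger sample
    "n = 25, 99%": (25, 0.99),   # higher confidence
}
margins = {label: t_margin(s, n, conf) for label, (n, conf) in scenarios.items()}
for label, me in margins.items():
    print(f"{label}: ME = {me:.3f} hours")
```

Quadrupling the sample size roughly halves the margin of error, while raising the confidence level to 99% widens the interval.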

------------------------------------------------------------------------

## Summary: Key Takeaways 🎯

::::: columns
::: column
### 🧠 Core Concepts

**1. Distribution Choice**

-   σ known → z-distribution
-   σ unknown + n ≥ 30 → z-distribution (an approximation; *t* with df = n-1 gives nearly the same interval)
-   σ unknown + n \< 30 → t-distribution

**2. t-Distribution Properties**

-   Heavier tails than z
-   Depends on degrees of freedom (df = n-1)
-   Approaches z as df increases

**3. Critical Regions**

-   α/2 in each tail for two-sided CI
-   Critical values from tables or software
-   Larger confidence → larger critical values
:::

::: column
### 🛠️ Practical Skills

**4. Calculation Steps**

1.  Check conditions (σ known?, sample size?)
2.  Choose distribution (z or t)
3.  Find critical value
4.  Calculate standard error
5.  Compute margin of error
6.  Construct interval
7.  Interpret in context

**5. Interpretation**

-   "We are C% confident..."
-   Focus on the process, not the individual interval
-   Consider practical significance

**6. Common Pitfalls to Avoid**

-   Wrong distribution choice
-   Incorrect degrees of freedom
-   Using α instead of α/2
-   Misinterpreting the interval
:::
:::::
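The "focus on the process" interpretation can be illustrated with a small simulation: draw many samples from a population whose mean is known, build a 95% t interval from each, and count how often the interval captures the true mean. The population values ($\mu = 50$, $\sigma = 10$), the sample size, and the number of repetitions are hypothetical choices for this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical normal population and sampling plan
mu, sigma, n, reps = 50.0, 10.0, 16, 10_000

# Each row is one sample of size n
samples = rng.normal(mu, sigma, size=(reps, n))
x_bars = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)          # sample SD (divides by n - 1)

# 95% t interval for every sample
t_star = stats.t.ppf(0.975, n - 1)
me = t_star * s / np.sqrt(n)
covered = (x_bars - me <= mu) & (mu <= x_bars + me)

coverage = covered.mean()
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains $\mu$ or it does not; the 95% describes how often the procedure succeeds over many samples.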

------------------------------------------------------------------------

## 

```{=html}
<a href="https://pstat5a.com/schedule.html" class="main-page-btn">🏠 Back to Main Page</a>
```


<script>
// Direct solution toggle implementation
console.log('Solution toggle script loaded');

// Function to setup toggles
function setupToggles() {
  console.log('Setting up toggles...');
  const buttons = document.querySelectorAll('.solution-toggle');
  console.log('Found buttons:', buttons.length);
  
  buttons.forEach((btn, i) => {
    console.log('Setting up button', i + 1);
    
    // Clear any existing handlers
    btn.onclick = null;
    
    // Add click handler
    btn.onclick = function(e) {
      e.preventDefault();
      e.stopPropagation();
      console.log('Button clicked!');
      
      const parent = this.closest('.collapsible-solution');
      if (parent) {
        parent.classList.toggle('open');
        this.textContent = parent.classList.contains('open') ? 'Hide Solution' : 'Show Solution';
        console.log('Toggled! Open:', parent.classList.contains('open'));
      }
    };
  });
}

// Try multiple times
document.addEventListener('DOMContentLoaded', setupToggles);
window.addEventListener('load', setupToggles);
setTimeout(setupToggles, 500);
setTimeout(setupToggles, 1000);
setTimeout(setupToggles, 2000);

// For Reveal.js
if (typeof Reveal !== 'undefined') {
  Reveal.addEventListener('ready', setupToggles);
}
</script>
