# STA130 Tutorial 3 (with \<Your Favorite TA\>): Bootstrapping
Today we're interested in how well samples approximate parameters!
![](im/4/garfield_confint.png)

## Confidence Interval Widths (20 mins) [Click "down" next]
- *Break into groups*

Compare the following two statements:
- "We have 99% confidence that tomorrow's high temperature will be between -40 and 200 degrees!"
- "We have 70% confidence that tomorrow's high temperature will be between 10 and 20 degrees!"

Discuss:
- What does it mean when we say we have "99% confidence" or "70% confidence"?
- Is it always desirable to have higher confidence levels in predictions? 
- Which one is more informative, if you want to decide what clothing to wear tomorrow?
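To make the trade-off concrete, here is a minimal sketch. It uses a made-up sample of 30 "daily highs" drawn from a normal distribution with mean 15 and standard deviation 3, chosen purely for illustration. It computes bootstrap percentile intervals at 70%, 95%, and 99% confidence from the same set of bootstrapped means. The helper `percentile_interval` is hypothetical and not part of any library.

```python
import numpy as np

rng = np.random.default_rng(130)

# Hypothetical data: 30 made-up daily highs (degrees), for illustration only
sample = rng.normal(loc=15, scale=3, size=30)

# Bootstrap: resample with replacement 5000 times and record each resample's mean
boot_means = rng.choice(sample, size=(5000, sample.size), replace=True).mean(axis=1)

def percentile_interval(values, level):
    """Hypothetical helper: the middle `level` fraction of `values` (percentile method)."""
    tail = (1 - level) / 2
    lower, upper = np.quantile(values, (tail, 1 - tail))
    return lower, upper

widths = {}
for level in (0.70, 0.95, 0.99):
    lower, upper = percentile_interval(boot_means, level)
    widths[level] = upper - lower
    print(f"{level:.0%} interval: ({lower:.2f}, {upper:.2f}), width {upper - lower:.2f}")
```

With the same data, a higher confidence level can only widen the interval, because the interval must cover more of the bootstrap distribution. Whether that is "better" depends on whether the wider range is still useful for the decision at hand.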

#### Confidence Interval Widths (continued...) [Click "down" next]
### How Do We Get Narrower Confidence Intervals?
> At a fixed confidence level, narrower intervals are more informative and therefore more likely to be actionable.
- Discuss/explore what changing the sample size `n` (the first line of the second code cell below) does to the width of the confidence interval

```python
from scipy import stats
import plotly.express as px
import pandas as pd
import numpy as np

# Population: normal with mean 8 and standard deviation 1
normal_population = stats.norm(loc=8, scale=1)
x = normal_population.rvs(size=30)  # an example sample of size 30 (not used in the next cell)

def bootstrap_magic(bootstrapped_means):
    """Print and plot the 95% bootstrap percentile confidence interval."""
    # The 2.5th and 97.5th percentiles bound the middle 95% of the bootstrapped means
    lower, upper = np.quantile(bootstrapped_means, (0.025, 0.975))
    print("The 95% Bootstrap Confidence Interval is :", (lower, upper))
    print("The width of the Confidence Interval is:", upper - lower)

    fig = px.histogram(pd.DataFrame({"Bootstrapped Means": bootstrapped_means}),
                       x="Bootstrapped Means", histnorm='probability density',
                       title='95% Confidence Interval')
    # Dashed vertical lines mark the interval endpoints
    fig.add_vline(x=lower, line_dash='dash', line_color='firebrick')
    fig.add_vline(x=upper, line_dash='dash', line_color='firebrick')
    fig.show()
```


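```python
n = 5  # CHANGE THIS VALUE (the sample size)

# Draw one sample of size n from the population
sample = normal_population.rvs(size=n)

# Resample the sample with replacement 1000 times, recording each bootstrap mean
bootstrapped_means = []
for i in range(1000):
    bootstrap_sample = np.random.choice(sample, size=n, replace=True)
    bootstrapped_means.append(bootstrap_sample.mean())

# plotting process hidden in helper function to make code more concise
bootstrap_magic(bootstrapped_means)
```

Example output from one run (the numbers change on every run, because the sample is random):

```
The 95% Bootstrap Confidence Interval is : (7.134206367190833, 8.659952816949637)
The width of the Confidence Interval is: 1.5257464497588042
```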
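Instead of editing `n` by hand, the same experiment can be run for several sample sizes at once. Below is a minimal sketch that assumes the same Normal(8, 1) population as above. It draws directly with NumPy so the block runs on its own. It prints the 95% bootstrap interval width for each `n`, and the function name `bootstrap_ci_width` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci_width(n, n_boot=1000):
    """Hypothetical helper: width of a 95% bootstrap CI for the mean, from one sample of size n."""
    # One sample of size n from a Normal(mean 8, sd 1) population, as in the cell above
    sample = rng.normal(loc=8, scale=1, size=n)
    # Resample with replacement n_boot times (one resample per row), keeping each mean
    boot_means = rng.choice(sample, size=(n_boot, n), replace=True).mean(axis=1)
    lower, upper = np.quantile(boot_means, (0.025, 0.975))
    return upper - lower

widths = {n: bootstrap_ci_width(n) for n in (5, 20, 80, 320)}
for n, width in widths.items():
    print(f"n = {n:>3}: 95% interval width = {width:.3f}")
```

Each fourfold increase in `n` should roughly halve the width, because the standard error of the mean shrinks like $1/\sqrt{n}$. Individual runs will still vary.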


#### Confidence Interval Widths (continued...) [Click "down" next]
![](im/4/confint_width_graph.JPG)

#### Confidence Interval Widths (continued...) [Click "down" next]
- Full Class Discussion: How does the skewness of the sampling distribution change as the sample size increases? 

The Sampling Distribution of $\bar x$ vs. Skewness
![](im/4/sampling_dist_graphs.JPG)
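To check the pattern numerically, here is a minimal sketch. It assumes a right-skewed Exponential population with mean 1, chosen only for illustration. For several sample sizes, it simulates 10,000 sample means and reports the skewness of each simulated sampling distribution using `scipy.stats.skew`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

skews = {}
for n in (2, 10, 50, 250):
    # 10,000 samples of size n from an Exponential(mean 1) population; one mean per row
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    skews[n] = stats.skew(means)
    print(f"n = {n:>3}: skewness of the sampling distribution of x-bar = {skews[n]:.2f}")
```

For this population the theoretical skewness of $\bar x$ is $2/\sqrt{n}$, so the printed values should shrink toward 0 as `n` grows.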

# Tutorial Activity: Quiz (15 mins) [Click "down" next]
*Turn this in for your Tutorial Activity mark*
> - Credit is given for attempting each question; answers will not be reviewed in detail during marking

0. What's your name?
1. What is the difference between a statistic and a parameter?  
2. What is the purpose of bootstrapping?
3. When bootstrapping a sampling distribution, should we sample with replacement? <!-- Yes -->
4. Describe what the code below does. Assume `x` is a numpy array of numbers.
```python
boot_sample = np.random.choice(x, 50)
```
5. Suppose we want to simulate bootstrap samples from the original sample `orig_sample` and calculate the median of each one. Assume that `orig_sample` contains 22 values. Should each bootstrapped sample contain fewer than, more than, or exactly 22 values? <!-- 22 -->
6. Write code that returns the 80% bootstrap confidence interval for the median. Assume the bootstrapped medians are already stored in `boot_medians`. <!--np.quantile(boot_medians, (0.1, 0.9)) -->
7. Suppose the 80% confidence interval produced by your code in Q6 was (0.62, 0.93). Interpret its meaning.
8. If we want to be more confident in capturing the true median, should we use a wider or narrower confidence interval?
9. If we repeated the entire procedure 100 times (drawing a new sample from the population each time, then bootstrapping it), how many of the resulting 80% confidence intervals would we expect to capture the true median?

# Tutorial Activity: Quiz Review (10 mins)

- Review the quiz and address any open questions or concerns

1. A statistic is calculated from a sample, whereas a parameter is calculated from a population
2. To estimate parameters and their sampling distributions from a sample
3. Yes, to get variability in bootstrap samples
4. It generates a bootstrap sample of 50 from the "population" `x`, with replacement
5. Equal to 22, we want the size of bootstrap samples to match the size of the original sample
6. `np.quantile(boot_medians, (0.1, 0.9))`
7. We have 80% confidence that the true median is between 0.62 and 0.93; that is, the procedure used to construct this interval captures the true median about 80% of the time
8. Wider
9. Close to 80
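Answers 3, 5, 6, and 9 can be tied together in one simulation. Here is a minimal sketch that assumes a hypothetical Exponential population with mean 1, whose true median is $\ln 2 \approx 0.693$, and original samples of size 22. It repeats the whole procedure 100 times. Each repetition draws a fresh original sample, bootstraps the median, and builds an 80% percentile interval. The simulation then counts how many of those intervals capture the true median. The helper `bootstrap_median_ci` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
true_median = np.log(2)  # median of the hypothetical Exponential population with mean 1

def bootstrap_median_ci(orig_sample, n_boot=1000):
    """Hypothetical helper: 80% percentile bootstrap CI for the median."""
    n = len(orig_sample)  # each bootstrap sample matches the original size (answer 5)
    # n_boot resamples (one per row), drawn with replacement (answer 3)
    resamples = rng.choice(orig_sample, size=(n_boot, n), replace=True)
    boot_medians = np.median(resamples, axis=1)
    return np.quantile(boot_medians, (0.1, 0.9))  # answer 6

captured = 0
for _ in range(100):
    orig_sample = rng.exponential(scale=1.0, size=22)  # a fresh original sample each time
    lower, upper = bootstrap_median_ci(orig_sample)
    captured += int(lower <= true_median <= upper)

print(f"{captured} of 100 intervals captured the true median")
```

Because every repetition uses a new sample, the count varies around 80. With small samples, percentile intervals for the median can capture the true value somewhat less often than the nominal 80%.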

# Hedging (10 mins)
Hedging is helpful whenever you can’t say something is 100% one way or another, which is often the case.
In statistics, always hedge when discussing the limitations of the data and the strength and
generalizability of your conclusions.

Play this video for students: [https://web.microsoftstream.com/video/22f20d20-f096-4934-bfb4-86c0caf9da85](https://web.microsoftstream.com/video/22f20d20-f096-4934-bfb4-86c0caf9da85)

> We hope a **sample** is representative of a **population**, but with small sample sizes, generalizations (such as how accurately **sample statistics** estimate **population parameters**) should be viewed cautiously and not stated overconfidently

# Practice/Discussion (20-30 mins)
Possible activity (still to be finalized): explore how well samples approximate populations, although this was covered fairly well in lecture.
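One possible exploration is sketched below. It assumes a hypothetical right-skewed Exponential population with mean 1, so the true mean is 1.0 and the true median is $\ln 2$. For several sample sizes, it draws 1,000 samples and measures how far the sample mean and sample median typically land from the corresponding population parameters.

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean, true_median = 1.0, np.log(2)  # parameters of the hypothetical Exponential(mean 1) population

errors = {}
for n in (5, 25, 125):
    samples = rng.exponential(scale=1.0, size=(1_000, n))  # 1,000 samples of size n, one per row
    # Average absolute distance between each sample statistic and its population parameter
    mean_err = np.mean(np.abs(samples.mean(axis=1) - true_mean))
    median_err = np.mean(np.abs(np.median(samples, axis=1) - true_median))
    errors[n] = (mean_err, median_err)
    print(f"n = {n:>3}: typical |mean error| = {mean_err:.3f}, "
          f"typical |median error| = {median_err:.3f}")
```

Small samples produce statistics that can land far from the parameter, which is why conclusions drawn from them should be hedged.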

## Tutorial Assignment (get started...) *[next click is down not right]*

- Submit your work for the assignment through Quercus

- Don't spend more than 60 minutes on this assignment (unless really needed...)    

    - Aim for roughly 200 to 500 words
    - Grammar is *not* the main focus of the assessment, but you should still communicate clearly and professionally:
        - use full sentences (without slang or emojis)
    




## Tutorial Assignment *[next click is down not right]* 

### <div align="center"> Vocabulary:</div>

| | | |
|-|-|-|
|Parameter |Statistic |Population |        
|Sample |Sampling distribution |Random sampling |     
|Resampling |Bootstrap |Percentile (quantile) |         
|Confidence interval |Confidence level |Estimation |      
|Representative  | | | 

## Tutorial Assignment (complete at home if needed)

You are once again chatting on the phone with your friend. Your friend enjoyed your previous conversation about data visualization so much that they asked whether you had learned anything new in your STA130 course. You decide to tell them about the fancy new technique you just learned: <mark>bootstrapping</mark>! Be sure to include at least two vocabulary words from this week and explain them in simple terms for a lay audience.