1, 
Standard deviation measures the spread of individual data points. Standard error of the mean measures the spread of sample means, indicating the precision of the sample mean as an estimate of the population mean.
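The following minimal NumPy sketch makes the distinction concrete. It reuses the ten sample values from the code in answer 4, computes the sample standard deviation and the SEM (s/√n), and then uses a simulation to check that the spread of many sample means is close to σ/√n. The simulated population (normal, mean 15.5, SD 3.37) is an assumption chosen only for illustration.

```python
import numpy as np

# Sample data reused from the bootstrap code in answer 4
data = np.array([12, 15, 14, 10, 18, 16, 17, 19, 21, 13])
n = len(data)

# Standard deviation: spread of the individual data points (ddof=1 -> sample SD)
sd = np.std(data, ddof=1)

# Standard error of the mean: estimated spread of sample means, SD / sqrt(n)
sem = sd / np.sqrt(n)

print(f"SD  = {sd:.4f}")   # SD  = 3.3747
print(f"SEM = {sem:.4f}")  # SEM = 1.0672

# Simulation check under an ASSUMED population: normal, mean 15.5, SD 3.37.
# Draw 10,000 samples of size n and look at how spread out their means are.
rng = np.random.default_rng(0)
sample_means = rng.normal(15.5, 3.37, size=(10_000, n)).mean(axis=1)
print(f"SD of simulated sample means = {sample_means.std(ddof=1):.3f}")
# should be close to 3.37 / sqrt(10), about 1.066
```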

2, 
Start with the sample mean from your data.

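Compute the SEM (the sample standard deviation divided by √n, where n is the sample size), then multiply it by 1.96. For a normal distribution, about 95% of values lie within 1.96 standard deviations of the mean, and the sample mean is approximately normally distributed with a standard deviation equal to the SEM.

Add this value to the sample mean and subtract it from the sample mean to get the upper and lower bounds of the 95% confidence interval.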
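Here is one way to write the three steps above in Python. The helper name `sem_confidence_interval` is hypothetical, and the data are the same ten values used in answer 4. The sketch uses the normal approximation described above and is not the only way to build the interval.

```python
import numpy as np

def sem_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the population mean: mean +/- z * SEM (hypothetical helper)."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()                                # step 1: sample mean
    sem = sample.std(ddof=1) / np.sqrt(len(sample))     # SEM = s / sqrt(n)
    margin = z * sem                                    # step 2: 1.96 * SEM
    return mean - margin, mean + margin                 # step 3: mean -/+ margin

data = [12, 15, 14, 10, 18, 16, 17, 19, 21, 13]
low, high = sem_confidence_interval(data)
print(f"95% CI (normal approximation): ({low:.2f}, {high:.2f})")  # (13.41, 17.59)
```

With only 10 observations, you could use a t critical value (about 2.262 for 9 degrees of freedom, e.g. `scipy.stats.t.ppf(0.975, 9)`) in place of 1.96. That choice gives a slightly wider, more conservative interval.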

3,
Generate a large number of bootstrapped samples (resamples with replacement) from your original dataset, each of the same size as the original sample.

Calculate the mean for each of these bootstrapped samples to build a distribution of bootstrapped means.

Use the np.quantile(...) function to find the 2.5th percentile and the 97.5th percentile of the bootstrapped means. These percentiles give the lower and upper bounds of the 95% confidence interval.

4,
import numpy as np

# Sample data (you can replace this with your actual sample)
data = [12, 15, 14, 10, 18, 16, 17, 19, 21, 13]

# Number of bootstrap samples
n_bootstrap = 1000

# Generic bootstrap helper: pass np.mean, np.median, or any other statistic function
def bootstrap_sample_stat(data, n_bootstrap, statistic_function):
    bootstrap_stats = []
    n = len(data)
    
    for _ in range(n_bootstrap):
        # Create a bootstrap sample (sampling with replacement from the original data)
        bootstrap_sample = np.random.choice(data, size=n, replace=True)
        
        # Calculate the statistic (mean or median) for this bootstrap sample
        bootstrap_stat = statistic_function(bootstrap_sample)
        
        # Store the statistic for this sample
        bootstrap_stats.append(bootstrap_stat)
    
    return bootstrap_stats

# Get bootstrap means
bootstrap_means = bootstrap_sample_stat(data, n_bootstrap, np.mean)

# Calculate 95% confidence interval (2.5th and 97.5th percentiles of bootstrap means)
confidence_interval_mean = np.quantile(bootstrap_means, [0.025, 0.975])

# Print the confidence interval for the mean
print(f"95% Bootstrap Confidence Interval for the Mean: {confidence_interval_mean}")

# To calculate a 95% bootstrap confidence interval for the median, change np.mean to np.median
bootstrap_medians = bootstrap_sample_stat(data, n_bootstrap, np.median)
confidence_interval_median = np.quantile(bootstrap_medians, [0.025, 0.975])

# Print the confidence interval for the median
print(f"95% Bootstrap Confidence Interval for the Median: {confidence_interval_median}")


95% Bootstrap Confidence Interval for the Mean: [13.5    17.5025]
95% Bootstrap Confidence Interval for the Median: [13. 18.]


Session Summary: In this session, we explored several concepts related to bootstrapping and confidence intervals:

Difference between standard deviation and standard error: We discussed how standard deviation measures the spread of individual data points, while the standard error measures the spread of sample means (i.e., how much the sample mean is expected to vary from the true population mean).

Using the standard error for a 95% confidence interval: We explained how to use the standard error of the mean to create a 95% confidence interval by multiplying the SEM by 1.96 and adding/subtracting this from the sample mean.

Bootstrapped confidence intervals using percentiles: We discussed how to calculate a 95% bootstrapped confidence interval directly from bootstrapped sample means by using the 2.5th and 97.5th percentiles of the bootstrapped means. This gives an approximate 95% interval, and its actual coverage depends on the sample size and the statistic.

Code to produce a 95% bootstrap confidence interval: We developed Python code to calculate a 95% bootstrap confidence interval for the mean. We also explained how to adapt this code to calculate a 95% confidence interval for the median or other statistics by simply changing the function used to calculate the statistic (e.g., np.mean to np.median).

Link: https://chatgpt.com/c/66ff4e7e-9528-8000-8e34-b7db6ce9c07a

5,
We need to distinguish between the population parameter and the sample statistic because a confidence interval is used to estimate the unknown population parameter based on the sample statistic. The sample statistic is calculated from the data we have, while the population parameter represents the true, but often unknown, value for the entire population. Confidence intervals provide a range of plausible values for the population parameter, acknowledging that the sample statistic may vary due to random sampling error.

6,
What is the process of bootstrapping?

Okay, so imagine you have a small sample of data, like you asked 20 friends how many hours they sleep each night. Bootstrapping is a clever trick for learning more from this sample without collecting more data. You take that sample and create many new “fake” samples, each the same size as the original (20 values), by randomly picking data points from your original set with replacement, so repeats are allowed. For example, one resample might include the same friend’s 10 hours of sleep several times while skipping other friends entirely. It’s like making new “what-if” scenarios from the same data. Then you calculate whatever statistic you care about (like the average sleep) for each new sample to see how much your estimate varies.

What is the main purpose of bootstrapping?

The main point of bootstrapping is to help us understand the uncertainty in our estimate. It’s useful when we don’t have tons of data but still want to make guesses about the overall population. So, instead of relying on one single number from our original data, we get lots of “what-if” versions, which give us a clearer picture of how much our results might change if we could take many samples from the population.

How could you use bootstrapping to assess if your hypothesized guess about the average is plausible?

Let’s say you think the average number of hours people sleep in the whole population is 8 hours, but from your sample, you’re not quite sure. Here’s where bootstrapping helps! You’d take your sample, use the bootstrapping method to create lots of new samples (like we talked about earlier), and calculate the average sleep time for each one. This gives you a range of possible averages based on your sample data.

Now, check whether your guess of 8 hours falls inside this range. For example, you could check whether it lies inside a 95% bootstrap confidence interval built from the 2.5th and 97.5th percentiles of the bootstrapped averages. If 8 falls inside the interval, your guess is plausible given the data. If 8 falls outside it, because most bootstrapped averages are far from 8, your guess is probably off. In short, bootstrapping helps you see whether the data you have line up with your hypothesis.
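Below is a minimal sketch of this check. It assumes made-up sleep data for 20 friends; the numbers are hypothetical and do not come from any real survey. The code builds a 95% percentile bootstrap interval with NumPy's `Generator.choice` and then asks whether 8 hours lies inside it.

```python
import numpy as np

# HYPOTHETICAL hours of sleep for 20 friends (illustrative numbers, not real data)
sleep_hours = np.array([7.5, 6.0, 8.0, 7.0, 6.5, 9.0, 7.0, 5.5, 8.5, 7.0,
                        6.0, 7.5, 10.0, 6.5, 7.0, 8.0, 6.0, 7.5, 7.0, 6.5])
hypothesized_mean = 8.0
n_bootstrap = 10_000
rng = np.random.default_rng(130)  # fixed seed so the result is reproducible

# Resample with replacement: each row is one bootstrap sample of the same size (20)
resamples = rng.choice(sleep_hours, size=(n_bootstrap, len(sleep_hours)), replace=True)
bootstrap_means = resamples.mean(axis=1)

# 95% percentile interval of the bootstrapped means
ci_low, ci_high = np.quantile(bootstrap_means, [0.025, 0.975])
plausible = ci_low <= hypothesized_mean <= ci_high

print(f"Sample mean: {sleep_hours.mean():.2f}")
print(f"95% bootstrap CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Is {hypothesized_mean} plausible? {plausible}")
```

With these made-up values the sample mean is 7.2 hours and the interval should end below 8, so the 8-hour guess would not look plausible. Different data could of course lead to a different conclusion.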

7,
When we talk about confidence intervals, we’re trying to determine a range of values where we think the true population parameter (like a mean) might be. If this interval covers zero, then "zero effect" (no difference, no impact) is a plausible value within that range.

If zero is included in the confidence interval, we can’t reject the null hypothesis, because zero (the drug has no effect) is still a plausible value for the true effect.

If zero isn’t included, we reject the null hypothesis (at the 5% significance level for a 95% interval), because the data suggest the drug has a non-zero effect.
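The decision rule can be written as a tiny helper. The function `decide_from_ci` and the example intervals are hypothetical and only illustrate the logic for a 95% interval.

```python
def decide_from_ci(ci_low, ci_high, null_value=0.0):
    """Return the test decision implied by a confidence interval (hypothetical helper).

    If the null value (0 = "no effect") lies inside the interval, it is still a
    plausible value, so H0 is not rejected.
    """
    if ci_low <= null_value <= ci_high:
        return "fail to reject H0"
    return "reject H0"

print(decide_from_ci(-0.4, 1.2))  # fail to reject H0 (interval covers 0)
print(decide_from_ci(0.3, 1.8))   # reject H0 (interval lies entirely above 0)
```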

8,
1. Problem Introduction:
Null Hypothesis (H₀) Explanation: In this context, the null hypothesis could be that the vaccine has no effect on the health scores, meaning the average change in health score (FinalHealthScore - InitialHealthScore) is zero. If this is true, we would expect no significant improvement or decline in health scores due to the vaccine.

Data Visualization: To illustrate and motivate the comparison, you can create a plot (like a histogram or boxplot) showing the difference between initial and final health scores for each patient. This helps to see visually if there's a pattern of improvement or decline.

2. Quantitative Analysis:
Methodology: You can use bootstrapping to estimate the mean difference between initial and final health scores. By resampling the data with replacement multiple times, you’ll build a distribution of possible outcomes. This allows you to compute a confidence interval for the mean difference.

You would want to calculate:

The mean difference between the initial and final health scores.
A bootstrapped 95% confidence interval for that mean difference.

Supporting Visualizations: Use the bootstrapped means to show the distribution of mean differences, with the confidence interval marked on it. A histogram, a density plot, or something similar would work.
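Here is a minimal sketch of the quantitative analysis. It assumes hypothetical scores for 10 patients, since the real assignment data are not reproduced here. The code bootstraps the per-patient change (FinalHealthScore - InitialHealthScore), builds a 95% percentile interval, and applies the "does the interval contain zero?" rule.

```python
import numpy as np

# HYPOTHETICAL scores for 10 patients (illustrative only, not the assignment data)
initial = np.array([84, 77, 79, 75, 66, 70, 78, 71, 82, 65])  # InitialHealthScore
final = np.array([86, 83, 80, 80, 68, 75, 79, 72, 80, 68])    # FinalHealthScore

# Per-patient change; H0 says the population mean of this change is 0
diff = final - initial
observed_mean_diff = diff.mean()

# Bootstrap the mean difference: resample patients' differences with replacement
rng = np.random.default_rng(2024)
n_bootstrap = 10_000
boot_means = np.array([
    rng.choice(diff, size=len(diff), replace=True).mean()
    for _ in range(n_bootstrap)
])

# 95% percentile bootstrap confidence interval and the implied decision
ci_low, ci_high = np.quantile(boot_means, [0.025, 0.975])
decision = "fail to reject H0" if ci_low <= 0 <= ci_high else "reject H0"

print(f"Observed mean difference: {observed_mean_diff:.2f}")
print(f"95% bootstrap CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Decision: {decision}")
```

For the supporting visualization, you could pass `boot_means` to a histogram (for example `matplotlib.pyplot.hist`) and mark `ci_low` and `ci_high` with vertical lines (`plt.axvline`). That shows the distribution and the interval together.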

3. Findings and Discussion:
Conclusion Regarding the Null Hypothesis: Based on the bootstrapping results, you determine whether the null hypothesis of "no effect" can be rejected. If the 95% confidence interval does not include zero, you reject the null hypothesis, which indicates the vaccine likely has an effect. If the interval includes zero, you fail to reject the null hypothesis. In that case the data are consistent with no effect, although this does not prove that the vaccine has no effect.

Further Considerations: Consider other factors that could influence the results, such as patient demographics (age, gender) and whether they may have an impact on the health score changes.

9,
Yes.