1
The standard deviation (SD) tells us how much individual data points differ from the mean, that is, the spread or variability of the dataset. The standard error of the mean (SEM) tells us how much the sample mean itself would vary from one sample to the next, so it measures how precisely the sample mean estimates the true population mean. The SEM equals the SD divided by the square root of the sample size, so it decreases as the sample gets larger: the bigger the sample, the more reliable the mean. In short, the standard deviation is about the data's spread, while the standard error is about how trustworthy the mean is.
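To make the difference concrete, here is a minimal sketch that computes both quantities for one small sample, using the usual formula SEM = SD / sqrt(n). The data values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample, used only for illustration
sample = np.array([15, 20, 25, 30, 35, 40, 45])

# Sample standard deviation: spread of individual points around the mean
# (ddof=1 uses the n - 1 denominator)
sd = np.std(sample, ddof=1)

# Standard error of the mean: SD divided by the square root of n
sem = sd / np.sqrt(len(sample))

print(f"SD  = {sd:.3f}")   # about 10.801
print(f"SEM = {sem:.3f}")  # about 4.082
```

With the same SD, a larger sample would give a smaller SEM, because the denominator sqrt(n) grows.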

2
First, find the mean of the sample data.
Then, calculate the standard error of the mean (SEM), which shows how much the sample mean might change if we took different samples.
Multiply the SEM by about 2 (technically 1.96) to get the margin of error for a 95% confidence interval.
Finally, add and subtract this margin from the mean to get a range; that range is the confidence interval.
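The steps above can be sketched as follows. The data are hypothetical, and the multiplier 1.96 is the normal-distribution value; for a sample this small, a t critical value would usually give a somewhat wider interval, so treat this as an illustration of the recipe rather than a recommended analysis.

```python
import numpy as np

# Hypothetical sample, used only for illustration
sample = np.array([15, 20, 25, 30, 35, 40, 45])

# Step 1: sample mean
mean = np.mean(sample)

# Step 2: standard error of the mean (sample SD / sqrt(n))
sem = np.std(sample, ddof=1) / np.sqrt(len(sample))

# Step 3: margin of error for a 95% interval (normal approximation)
margin = 1.96 * sem

# Step 4: mean minus and plus the margin
ci_lower = mean - margin
ci_upper = mean + margin

print(f"95% CI for the mean: [{ci_lower:.2f}, {ci_upper:.2f}]")
```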


3
Resample your data: Take many bootstrap samples from your original dataset. For each sample, calculate the mean. Repeat this process a lot of times to get a large set of bootstrapped sample means.
Sort the means: Arrange the bootstrapped sample means from smallest to largest.
Find the percentiles: Look for the 2.5th percentile and the 97.5th percentile in the sorted list of means. These values will mark the lower and upper limits of your 95% confidence interval.
Report the interval: The range between these two percentiles is your 95% bootstrapped confidence interval, which captures the middle 95% of the bootstrapped means.


4
import numpy as np

# Sample data
data = [15, 20, 25, 30, 35, 40, 45]

# Number of bootstrap samples
n_bootstrap = 1000

# Store the bootstrap means
bootstrap_means = []

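for _ in range(n_bootstrap):
    # Resample with replacement, keeping the original sample size
    bootstrap_sample = np.random.choice(data, size=len(data), replace=True)
    sample_mean = np.mean(bootstrap_sample)
    bootstrap_means.append(sample_mean)

# Sort the means (np.percentile does not require this, but it mirrors the steps above)
bootstrap_means.sort()

# The 2.5th and 97.5th percentiles bound the middle 95% of the bootstrap means
lower_bound = np.percentile(bootstrap_means, 2.5)
upper_bound = np.percentile(bootstrap_means, 97.5)

print(f"95% Bootstrap Confidence Interval for the Mean: [{lower_bound:.2f}, {upper_bound:.2f}]")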

5
We distinguish between the population parameter and the sample statistic because the parameter is the true value we're estimating, while the statistic is our estimate from the sample. The confidence interval uses the sample statistic to give a range where the population parameter likely falls, accounting for uncertainty.

6
Bootstrapping means taking your sample data and randomly drawing values from it with replacement (so the same value can be picked more than once) to create many new samples of the same size as the original. You then compute a statistic such as the average or median on each of these samples to see how much it varies.
The main purpose is to estimate how precisely a statistic from your sample represents the whole population, using only the data you already have. It helps you quantify the uncertainty in your estimates.
If you have a guess for the population average (like 75), you can use bootstrapping to generate many new sample averages and build a 95% confidence interval from them. Then, check whether your guess falls inside that interval. If it does, your guess is plausible; if it falls outside, the data give evidence against it.
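As a rough sketch of that last check, the code below bootstraps the mean of a small sample and asks whether a hypothesized average of 75 falls inside the middle 95% of the bootstrapped means. The scores, the seed, and the number of resamples are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical sample of scores, used only for illustration
scores = np.array([72, 78, 69, 81, 75, 77, 70, 74, 79, 73])
hypothesized_mean = 75

# Resample with replacement and record the mean of each bootstrap sample
boot_means = np.array([
    rng.choice(scores, size=len(scores), replace=True).mean()
    for _ in range(5000)
])

# Middle 95% of the bootstrapped means
lower, upper = np.percentile(boot_means, [2.5, 97.5])

# The guess is plausible if it lies inside the interval
plausible = lower <= hypothesized_mean <= upper

print(f"95% bootstrap CI: [{lower:.2f}, {upper:.2f}]")
print(f"Is {hypothesized_mean} plausible? {plausible}")
```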


7
A confidence interval that includes zero means that zero is a plausible value for the true effect of the drug in the population. This suggests the drug might not have any effect, so we "fail to reject the null hypothesis" because the data don't give us strong enough evidence to conclude otherwise.
However, if the confidence interval doesn't include zero, the data provide stronger evidence that the true effect is different from zero. In that case, we would "reject the null hypothesis" and conclude that the drug likely has an effect.
In summary, if zero is within the confidence interval, we can’t confidently say the drug has an effect. If zero isn’t in the interval, we have enough evidence to say it does.
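The decision rule can be written as a small helper. The function name `decide_from_interval` and the two example intervals are hypothetical; they are not results from any real study.

```python
def decide_from_interval(lower, upper, null_value=0.0):
    """Return the decision implied by a confidence interval [lower, upper]."""
    if lower <= null_value <= upper:
        # The null value is a plausible value for the true effect
        return "fail to reject the null hypothesis"
    # The null value lies outside the interval
    return "reject the null hypothesis"


# Hypothetical interval that includes zero
print(decide_from_interval(-0.4, 2.1))

# Hypothetical interval that excludes zero
print(decide_from_interval(0.8, 5.6))
```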


8
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


data = pd.DataFrame({
    "PatientID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "InitialHealthScore": [84, 78, 83, 81, 81, 80, 79, 85, 76, 83],
    "FinalHealthScore": [86, 86, 80, 86, 84, 86, 86, 82, 83, 84]
})

np.random.seed(42)

observed_diff = data['FinalHealthScore'] - data['InitialHealthScore']

n_bootstrap = 1000
bootstrap_means = []

for _ in range(n_bootstrap):
    resample_diff = np.random.choice(observed_diff, size=len(observed_diff), replace=True)
    bootstrap_means.append(np.mean(resample_diff))

lower_bound = np.percentile(bootstrap_means, 2.5)
upper_bound = np.percentile(bootstrap_means, 97.5)

print(f"95% Bootstrap Confidence Interval: [{lower_bound:.2f}, {upper_bound:.2f}]")

# Histogram of the bootstrapped mean differences, with the interval bounds marked
plt.hist(bootstrap_means, bins=30, edgecolor='black')
plt.axvline(x=lower_bound, color='red', linestyle='--', label=f'Lower Bound: {lower_bound:.2f}')
plt.axvline(x=upper_bound, color='red', linestyle='--', label=f'Upper Bound: {upper_bound:.2f}')
plt.title('Bootstrap Distribution of Mean Health Score Differences')
plt.xlabel('Mean Difference in Health Scores')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Yes