<div class='alert alert-warning'>

SciPy's interactive examples with JupyterLite are experimental and may not always work as expected. Executing cells that contain imports may trigger large downloads (up to 60 MB of content for the first import from SciPy), and importing from SciPy may take roughly 10-20 seconds. If you notice any problems, feel free to open an [issue](https://github.com/scipy/scipy/issues/new/choose).

</div>

Suppose we have sampled data from an unknown distribution.


In [None]:
import numpy as np
rng = np.random.default_rng()
from scipy.stats import norm
dist = norm(loc=2, scale=4)  # our "unknown" distribution
data = dist.rvs(size=100, random_state=rng)

We are interested in the standard deviation of the distribution.


In [None]:
std_true = dist.std()      # the true value of the statistic
print(std_true)

4.0

In [None]:
std_sample = np.std(data)  # the sample statistic
print(std_sample)

3.9460644295563863

The bootstrap is used to approximate the variability we would expect if we
were to repeatedly sample from the unknown distribution and calculate the
statistic of the sample each time. It does this by repeatedly resampling
values *from the original sample* with replacement and calculating the
statistic of each resample. This results in a "bootstrap distribution" of
the statistic.


In [None]:
import matplotlib.pyplot as plt
from scipy.stats import bootstrap
data = (data,)  # samples must be in a sequence
res = bootstrap(data, np.std, confidence_level=0.9,
                random_state=rng)
fig, ax = plt.subplots()
ax.hist(res.bootstrap_distribution, bins=25)
ax.set_title('Bootstrap Distribution')
ax.set_xlabel('statistic value')
ax.set_ylabel('frequency')
plt.show()

The standard error quantifies this variability. It is calculated as the
standard deviation of the bootstrap distribution.


In [None]:
res.standard_error

0.24427002125829136

In [None]:
res.standard_error == np.std(res.bootstrap_distribution, ddof=1)

True
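
To make the resampling procedure concrete, the following is a minimal sketch that builds a bootstrap distribution by hand with NumPy. It is not how `bootstrap` is implemented internally; the seed, the stand-in `sample`, and the choice of 2000 resamples are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)  # fixed seed, chosen only for reproducibility
sample = rng.normal(loc=2, scale=4, size=100)  # stand-in for the observed data

n_resamples = 2000  # illustrative value, not the default used by `bootstrap`
# Draw indices with replacement: each row selects one resample of the data.
idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
resamples = sample[idx]                      # shape (n_resamples, 100)

# Evaluate the statistic on every resample in one vectorized call.
boot_dist = np.std(resamples, axis=-1)       # the bootstrap distribution
standard_error = np.std(boot_dist, ddof=1)   # spread of that distribution

print(boot_dist.shape)
print(standard_error)
```

Every value in `resamples` comes from `sample`; only the multiplicities change from row to row, which is what "resampling with replacement" means.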

The bootstrap distribution of the statistic is often approximately normal
with scale equal to the standard error.


In [None]:
x = np.linspace(3, 5)
pdf = norm.pdf(x, loc=std_sample, scale=res.standard_error)
fig, ax = plt.subplots()
ax.hist(res.bootstrap_distribution, bins=25, density=True)
ax.plot(x, pdf)
ax.set_title('Normal Approximation of the Bootstrap Distribution')
ax.set_xlabel('statistic value')
ax.set_ylabel('pdf')
plt.show()

This suggests that we could construct a 90% confidence interval on the
statistic based on quantiles of this normal distribution.


In [None]:
norm.interval(0.9, loc=std_sample, scale=res.standard_error)

(3.5442759991341726, 4.3478528599786)
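
As a sketch of what `norm.interval` computes here, the same interval can be written as the estimate plus or minus a normal quantile times the standard error. The values of `estimate` and `se` below are hypothetical stand-ins for `std_sample` and `res.standard_error`.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values standing in for `std_sample` and `res.standard_error`.
estimate, se = 3.95, 0.24
confidence_level = 0.9

# Two-sided interval: leave (1 - confidence_level) / 2 in each tail.
z = norm.ppf(0.5 + confidence_level / 2)   # about 1.645 for a 90% interval
manual = (estimate - z * se, estimate + z * se)

print(manual)
print(norm.interval(confidence_level, loc=estimate, scale=se))
```

Both print statements should show the same pair of endpoints, up to floating-point rounding.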

Due to the central limit theorem, this normal approximation is accurate for a
variety of statistics and distributions underlying the samples; however,
the approximation is not reliable in all cases. Because `bootstrap` is
designed to work with arbitrary underlying distributions and statistics,
it uses more advanced techniques to generate an accurate confidence
interval.


In [None]:
print(res.confidence_interval)

ConfidenceInterval(low=3.57655333533867, high=4.382043696342881)
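
The interval type is selected with the `method` argument of `bootstrap`. The sketch below compares the simpler `'percentile'` interval, which is just a pair of quantiles of the bootstrap distribution, with the `'BCa'` interval computed from the same resamples. The seed, sample, and `n_resamples=999` are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import bootstrap, norm

rng = np.random.default_rng(2024)  # fixed seed, for reproducibility only
sample = norm.rvs(loc=2, scale=4, size=100, random_state=rng)

# Percentile interval: quantiles of the bootstrap distribution itself.
res_pct = bootstrap((sample,), np.std, confidence_level=0.9,
                    n_resamples=999, method='percentile', random_state=rng)
by_hand = np.percentile(res_pct.bootstrap_distribution, [5, 95])

# Reuse the same resamples (n_resamples=0) to compute a BCa interval.
res_bca = bootstrap((sample,), np.std, confidence_level=0.9,
                    n_resamples=0, method='BCa', random_state=rng,
                    bootstrap_result=res_pct)

print(by_hand)
print(res_pct.confidence_interval)
print(res_bca.confidence_interval)
```

The first two lines of output should agree, while the BCa interval is typically shifted relative to them because it corrects for bias and skewness in the bootstrap distribution.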

If we sample from the original distribution 100 times and form a bootstrap
confidence interval for each sample, the confidence interval
contains the true value of the statistic approximately 90% of the time.


In [None]:
n_trials = 100
ci_contains_true_std = 0
for i in range(n_trials):
    data = (dist.rvs(size=100, random_state=rng),)
    res = bootstrap(data, np.std, confidence_level=0.9,
                    n_resamples=999, random_state=rng)
    ci = res.confidence_interval
    if ci[0] < std_true < ci[1]:
        ci_contains_true_std += 1
print(ci_contains_true_std)

88

Rather than writing a loop, we can also determine the confidence intervals
for all 100 samples at once.


In [None]:
data = (dist.rvs(size=(n_trials, 100), random_state=rng),)
res = bootstrap(data, np.std, axis=-1, confidence_level=0.9,
                n_resamples=999, random_state=rng)
ci_l, ci_u = res.confidence_interval

Here, `ci_l` and `ci_u` contain the lower and upper confidence limits,
respectively, for each of the ``n_trials = 100`` samples.


In [None]:
print(ci_l[:5])

[3.86401283 3.33304394 3.52474647 3.54160981 3.80569252]

In [None]:
print(ci_u[:5])

[4.80217409 4.18143252 4.39734707 4.37549713 4.72843584]

And again, approximately 90% contain the true value, ``std_true = 4``.


In [None]:
print(np.sum((ci_l < std_true) & (std_true < ci_u)))

93
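
Counts such as 88 and 93 out of 100 are both compatible with a nominal 90% coverage. As a rough check, assuming each interval covers the true value independently with probability 0.9, the number of covering intervals follows a binomial distribution with mean 90 and standard deviation 3.

```python
from scipy.stats import binom

n_trials, nominal = 100, 0.9

# Under the independence assumption, the number of intervals that contain
# the true value is Binomial(n_trials, nominal).
expected = binom.mean(n_trials, nominal)
spread = binom.std(n_trials, nominal)

# Central range that holds roughly 95% of such counts.
low, high = binom.interval(0.95, n_trials, nominal)

print(expected, spread)
print(low, high)
```

Both observed counts fall inside the printed range, so neither run gives evidence that the actual coverage differs from 90%.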

`bootstrap` can also be used to estimate confidence intervals of
multi-sample statistics. For example, to get a confidence interval
for the difference between means, we write a function that accepts
two sample arguments and returns only the statistic. The use of the
``axis`` argument ensures that all mean calculations are performed in
a single vectorized call, which is faster than looping over pairs
of resamples in Python.


In [None]:
def my_statistic(sample1, sample2, axis=-1):
    mean1 = np.mean(sample1, axis=axis)
    mean2 = np.mean(sample2, axis=axis)
    return mean1 - mean2
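
To see what the `axis` argument provides, the sketch below applies a statistic of the same form to whole batches of resamples at once and checks the result against an explicit Python loop. The function name `diff_of_means`, the seed, and the batch shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def diff_of_means(sample1, sample2, axis=-1):  # same form as `my_statistic`
    return np.mean(sample1, axis=axis) - np.mean(sample2, axis=axis)

rng = np.random.default_rng(7)  # fixed seed, for reproducibility only
# Hypothetical batches: 500 resamples of each sample, one per row.
batch1 = rng.normal(scale=1, size=(500, 100))
batch2 = rng.normal(scale=2, size=(500, 100))

# One vectorized call yields one statistic per row.
vectorized = diff_of_means(batch1, batch2, axis=-1)
# The equivalent loop over pairs of resamples.
looped = np.array([diff_of_means(r1, r2) for r1, r2 in zip(batch1, batch2)])

print(vectorized.shape)
print(np.allclose(vectorized, looped))
```

The two approaches give the same values; the vectorized call simply avoids the per-resample overhead of the Python loop.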

Here, we use the 'basic' method with the default 95% confidence level.


In [None]:
sample1 = norm.rvs(scale=1, size=100, random_state=rng)
sample2 = norm.rvs(scale=2, size=100, random_state=rng)
data = (sample1, sample2)
res = bootstrap(data, my_statistic, method='basic', random_state=rng)
print(my_statistic(sample1, sample2))

0.16661030792089523

In [None]:
print(res.confidence_interval)

ConfidenceInterval(low=-0.29087973240818693, high=0.6371338699912273)

The bootstrap estimate of the standard error is also available.


In [None]:
print(res.standard_error)

0.238323948262459

Paired-sample statistics work, too. For example, consider the Pearson
correlation coefficient.


In [None]:
from scipy.stats import pearsonr
n = 100
x = np.linspace(0, 10, n)
y = x + rng.uniform(size=n)
print(pearsonr(x, y)[0])  # element 0 is the statistic

0.9954306665125647

We wrap `pearsonr` so that it returns only the statistic, and we pass the
`axis` argument through because `pearsonr` supports it.


In [None]:
def my_statistic(x, y, axis=-1):
    return pearsonr(x, y, axis=axis)[0]

We call `bootstrap` using ``paired=True``.


In [None]:
res = bootstrap((x, y), my_statistic, paired=True, random_state=rng)
print(res.confidence_interval)

ConfidenceInterval(low=0.9941504301315878, high=0.996377412215445)
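
The effect of pairing can be illustrated by resampling by hand. In the sketch below, which is an illustration rather than the internals of `bootstrap`, paired resampling applies one set of indices to both samples, whereas independent resampling draws separate indices and destroys the relationship between `x` and `y`. The seed is an assumption, and `np.corrcoef` is used here in place of `pearsonr` to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(99)  # fixed seed, for reproducibility only
n = 100
x = np.linspace(0, 10, n)
y = x + rng.uniform(size=n)

# Paired resampling: one set of indices is applied to both samples,
# so each (x[i], y[i]) pair stays together.
idx = rng.integers(0, n, size=n)
x_paired, y_paired = x[idx], y[idx]

# Unpaired resampling: independent indices break the pairing.
x_indep = x[rng.integers(0, n, size=n)]
y_indep = y[rng.integers(0, n, size=n)]

r_paired = np.corrcoef(x_paired, y_paired)[0, 1]
r_indep = np.corrcoef(x_indep, y_indep)[0, 1]
print(r_paired)  # stays close to the correlation of the original data
print(r_indep)   # near zero, because the pairing was destroyed
```

This is why a statistic that depends on the correspondence between observations, such as a correlation coefficient, needs ``paired=True``.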

The result object can be passed back into `bootstrap` to perform additional
resampling:


In [None]:
len(res.bootstrap_distribution)

9999

In [None]:
res = bootstrap((x, y), my_statistic, paired=True,
                n_resamples=1000, random_state=rng,
                bootstrap_result=res)
len(res.bootstrap_distribution)

10999

or to change the confidence interval options:


In [None]:
res2 = bootstrap((x, y), my_statistic, paired=True,
                 n_resamples=0, random_state=rng, bootstrap_result=res,
                 method='percentile', confidence_level=0.9)
np.testing.assert_equal(res2.bootstrap_distribution,
                        res.bootstrap_distribution)
res.confidence_interval  # `res` is unchanged; the new interval is res2.confidence_interval

ConfidenceInterval(low=0.9941574828235082, high=0.9963781698210212)

without repeating computation of the original bootstrap distribution.
