This module provides a simple function for bootstrap confidence intervals on a list of data sampled from an unknown population.
The package is available on PyPI and can be installed using pip:
$ pip install pybootstrap
As a general recommendation, I suggest using virtualenv to keep your main Python environment free of auxiliary packages.
The main function provided is the bootstrap function, to be invoked as shown below:
import pybootstrap as pb
pb.bootstrap(dataset, confidence=0.95, iterations=10000, sample_size=1.0, statistic=numpy.mean)
dataset: a list of values, each a sample from an unknown population.
confidence: the confidence level (a float between 0 and 1.0). Default is 0.95.
iterations: the number of resampling iterations to perform. Default is 10000.
sample_size: the size of each resample, as a fraction of the original data size (a float between 0 and 1.0, i.e. 0 to 100%). Default is 1.0.
statistic: the statistic to compute. This must be a function that accepts a list of values and returns a single value. Default is numpy.mean.
The function returns the lower and upper bounds of the confidence interval for the given dataset, in that order.
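To make the role of each parameter concrete, here is a minimal sketch of one common approach, the percentile bootstrap, written with the standard library only. It is not taken from pybootstrap's source, and the package may use a different method internally. The name percentile_bootstrap and the seed argument are hypothetical and exist only for this illustration.

```python
import random
import statistics


def percentile_bootstrap(dataset, confidence=0.95, iterations=10000,
                         sample_size=1.0, statistic=statistics.mean,
                         seed=None):
    """Illustrative percentile bootstrap; not pybootstrap's actual code."""
    rng = random.Random(seed)  # 'seed' is a hypothetical extra for repeatability
    # Number of values drawn per resample, as a fraction of the original size.
    n = max(1, int(len(dataset) * sample_size))
    # Resample with replacement and record the statistic each time.
    estimates = sorted(
        statistic(rng.choices(dataset, k=n)) for _ in range(iterations)
    )
    # Cut off (1 - confidence) / 2 of the estimates from each tail.
    alpha = (1.0 - confidence) / 2.0
    lower_index = int(alpha * iterations)
    upper_index = min(iterations - 1, int((1.0 - alpha) * iterations))
    return estimates[lower_index], estimates[upper_index]


data = list(range(1, 101))
lower, upper = percentile_bootstrap(data, iterations=2000, seed=0)
print('lower: {:.1f}, upper: {:.1f}'.format(lower, upper))
```

In this sketch, raising iterations gives a smoother estimate of the tail cut-offs, while lowering sample_size widens the interval because each resample carries less information.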
For a quick test, you can use:
import pybootstrap as pb
pb.test()
The example function that is executed is shown below:
from random import sample

import numpy as np
from pybootstrap import bootstrap


def generate_random(n_values, min_value, max_value):
    """
    Generate a list of random values for testing.

    Args:
        n_values: The number of random values to generate
        min_value: Define the lower bound of the range to use
        max_value: Define the upper bound of the range to use

    Returns:
        A list containing 'n_values' random values in the range
        between 'min_value' and 'max_value'
    """
    # sample() draws without replacement, so the values are distinct
    return sample(range(min_value, max_value), n_values)


# Generate some random data
data = generate_random(1000, 1, 10000)

# Generate confidence intervals on the mean
confidence = 0.95
iterations = 1000
sample_size = 1.0
statistic = np.mean
lower, upper = bootstrap(data,
                         confidence=confidence,
                         iterations=iterations,
                         sample_size=sample_size,
                         statistic=statistic)

# Report the settings and the resulting interval
print('Performed {} iterations (each with {}% original sample length)'.format(
    iterations, sample_size*100))
print('{:3.1f}% confidence interval ({:s}):'.format(
    confidence*100, statistic.__name__))
print('lower: {:.1f}'.format(lower))
print('upper: {:.1f}'.format(upper))
print('observed: {:.1f}'.format(np.mean(data)))
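Since the statistic argument accepts any function that takes a list of values and returns a single value, a custom statistic can be passed in place of numpy.mean. The sketch below defines a hypothetical trimmed_mean helper (not part of pybootstrap) and passes it by keyword, assuming the signature documented above. The import is guarded so that the snippet still runs when the package is not installed.

```python
def trimmed_mean(values, proportion=0.1):
    """Mean of the values left after dropping a share from each end."""
    ordered = sorted(values)
    # Number of values to discard at each end of the sorted data.
    cut = int(len(ordered) * proportion)
    # Fall back to the full data if trimming would remove everything.
    kept = ordered[cut:len(ordered) - cut] or ordered
    return sum(kept) / len(kept)


data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 250]  # one extreme outlier
print('trimmed mean: {:.3f}'.format(trimmed_mean(data)))

try:
    import pybootstrap as pb
except ImportError:
    pb = None  # package not installed; skip the interval

if pb is not None:
    # Keyword name taken from the signature documented above.
    lower, upper = pb.bootstrap(data, statistic=trimmed_mean)
    print('lower: {:.1f}, upper: {:.1f}'.format(lower, upper))
```

A trimmed mean is used here only because it is less sensitive to the outlier than the plain mean; any other single-valued function, such as a median, would be passed the same way.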