Bootstrap confidence intervals for Python

mvanga/pybootstrap

Bootstrap Confidence Intervals

This module provides a simple function for computing bootstrap confidence intervals from a list of values sampled from an unknown population.

Installation

The package is available on PyPI and can be installed using pip.

$ pip install pybootstrap

As a general recommendation, I suggest installing into a virtualenv so that auxiliary packages stay out of your main Python environment.

Usage

The main function provided is bootstrap, which is invoked as shown below:

import pybootstrap as pb

pb.bootstrap(dataset, confidence=0.95, iterations=10000, sample_size=1.0, statistic=numpy.mean)
  • dataset : a list of values, each a sample from an unknown population
  • confidence : the confidence value (a float between 0 and 1.0). Default is 0.95.
  • iterations : the number of iterations of resampling to perform. Default is 10000.
  • sample_size : the size of each resample, as a fraction of the original data size (a float between 0 and 1.0, i.e. 0 to 100%). Default is 1.0.
  • statistic : the statistic to use. This must be a function that accepts a list of values and returns a single value. Default is numpy.mean.

The function returns the lower and upper bounds of the confidence interval for the given dataset, in that order.
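
To make the parameters above concrete, here is a minimal sketch of a percentile bootstrap written with only the standard library. It is not pybootstrap's source code, and the package may compute its interval differently. The names percentile_bootstrap and seed are hypothetical and exist only for this illustration.

```python
import random
import statistics


def percentile_bootstrap(dataset, confidence=0.95, iterations=10000,
                         sample_size=1.0, statistic=statistics.mean,
                         seed=None):
    """Return (lower, upper) percentile-bootstrap bounds for `statistic`.

    Illustrative only: an assumption about how such an interval can be
    computed, not the pybootstrap implementation.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    rng = random.Random(seed)  # `seed` is a hypothetical extra for repeatability

    # sample_size is a fraction of the original data size.
    n = max(1, int(len(dataset) * sample_size))

    # Resample with replacement and record the statistic of each resample.
    estimates = sorted(
        statistic(rng.choices(dataset, k=n)) for _ in range(iterations)
    )

    # Cut (1 - confidence) / 2 of the estimates off each tail.
    alpha = (1.0 - confidence) / 2.0
    lower_index = int(alpha * iterations)
    upper_index = min(iterations - 1, int((1.0 - alpha) * iterations))
    return estimates[lower_index], estimates[upper_index]


lower, upper = percentile_bootstrap(list(range(1, 101)), iterations=500, seed=0)
print('lower: {:.1f}, upper: {:.1f}'.format(lower, upper))
```

In this sketch, a higher confidence widens the interval because less is trimmed from each tail, and more iterations give a smoother estimate of those tail positions at the cost of run time.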

For a quick test, you can use:

import pybootstrap as pb
pb.test()

The example function that is executed is shown below:

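# Imports needed to run this excerpt on its own
from random import sample

import numpy as np
from pybootstrap import bootstrap


def generate_random(n_values, min_value, max_value):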
    """
    Generate an array of random values for testing.

    Args:
        n_values: The number of random values to generate
        min_value: Define the lower bound of the range to use
        max_value: Define the upper bound of the range to use

    Returns:
        A list containing 'n_values' distinct random integers, each at
        least 'min_value' and less than 'max_value'
    """
    return sample(range(min_value, max_value), n_values)

# Generate some random data
data = generate_random(1000, 1, 10000)
# Generate confidence intervals on the mean
confidence = 0.95
iterations = 1000
sample_size = 1.0
statistic = np.mean
lower, upper = bootstrap(data,
                         confidence=confidence,
                         iterations=iterations,
                         sample_size=sample_size,
                         statistic=statistic)

print('Performed {} iterations (each with {}% original sample length)'.format(
    iterations, sample_size*100))
print('{:3.1f}% confidence interval ({:s}):'.format(
    confidence*100, statistic.__name__))
print('lower: {:.1f}'.format(lower))
print('upper: {:.1f}'.format(upper))
print('observed: {:.1f}'.format(np.mean(data)))
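
Because statistic can be any function that accepts a list of values and returns a single value, estimators other than the mean can be plugged in. The following is a minimal sketch that assumes the bootstrap signature documented above. trimmed_mean and trimmed_mean_interval are hypothetical helpers written for this illustration and are not part of pybootstrap.

```python
def trimmed_mean(values, trim=0.1):
    """Mean of `values` after dropping a `trim` fraction from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # number of values cut per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def trimmed_mean_interval(data, confidence=0.95):
    """Bootstrap interval for the 10% trimmed mean (hypothetical wrapper)."""
    # Imported here so trimmed_mean can be used without pybootstrap installed.
    import pybootstrap as pb

    # bootstrap calls the statistic with one argument, so trim keeps its default.
    return pb.bootstrap(data,
                        confidence=confidence,
                        iterations=10000,
                        sample_size=1.0,
                        statistic=trimmed_mean)


# The statistic on its own: the outlier 100 is dropped before averaging.
print(trimmed_mean([1, 2, 3, 4, 100], trim=0.2))
```

A trimmed mean is a reasonable choice when the data contain outliers that would otherwise dominate the interval for the plain mean.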
