# Bootstrap and Jackknife comparison

In this notebook we compare the bootstrap to the jackknife. Bootstrap resampling is in general superior to jackknifing, but the jackknife has two advantages: it is deterministic, which may be helpful, and it exactly removes biases of order 1/N from an estimator. The bootstrap also reduces biases of higher orders, but it does not remove the lowest order exactly.
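
To make the jackknife bias estimate concrete, here is a minimal NumPy-only sketch that does not depend on the `resample` package used in the cells below. The helper name `jackknife_bias` is hypothetical. It applies the textbook formula, (N - 1) times the difference between the mean of the leave-one-out estimates and the full-sample estimate. For the biased variance `s`, whose bias is purely of order 1/N, the result should agree with `-s / (N - 1)`.

```python
import numpy as np


def jackknife_bias(fn, sample):
    """Textbook jackknife bias estimate (hypothetical helper, not the resample API)."""
    sample = np.asarray(sample)
    n = len(sample)
    theta = fn(sample)
    # leave-one-out replicates: evaluate fn on the sample with element i removed
    loo = np.array([fn(np.delete(sample, i)) for i in range(n)])
    return (n - 1) * (loo.mean(axis=0) - theta)


x = np.random.default_rng(1).normal(size=20)
s = np.var(x, ddof=0)  # biased variance
jk_bias = jackknife_bias(lambda d: np.var(d, ddof=0), x)
print(jk_bias, -s / (len(x) - 1))  # the two numbers should agree
```

No random numbers enter the estimate itself, so repeated calls on the same sample return the same value.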

In [1]:
from resample.jackknife import variance as j_var, resample as j_resample, bias as j_bias
from resample.bootstrap import resample as b_resample
import numpy as np
from scipy import stats

In [2]:
rng = np.random.default_rng(1)
data = rng.normal(size=20)


# get mean and variance
def fn(d):
    return np.mean(d), np.var(d, ddof=0) # we return the biased variance


# compute bias with the bootstrap
def b_bias(fn, sample):
    theta = fn(sample)
    b_rep = [fn(s) for s in b_resample(sample, random_state=rng)]
    return np.mean(b_rep, axis=0) - theta


# compute variance with bootstrap
def b_var(fn, sample):
    b_rep = [fn(s) for s in b_resample(sample, random_state=rng)]
    return np.var(b_rep, axis=0)

# exact bias for the biased variance
# - we computed: s = 1/N * sum(x ** 2 - np.mean(x) ** 2)
# - correct is:  N/(N-1) * s
# - bias is: (1 - N/(N-1)) * s = (N - 1 - N) / (N - 1) * s = - 1 / (N - 1) * s


print("theta               ", np.round(fn(data), 3))
print("std.dev. (jackknife)", np.round(j_var(fn, data) ** 0.5, 3))
print("std.dev. (bootstrap)", np.round(b_var(fn, data) ** 0.5, 3))
print("bias (jackknife)    ", np.round(j_bias(fn, data), 3))
print("bias (bootstrap)    ", np.round(b_bias(fn, data), 3))
print("bias (exact)        ", np.round((0, -1 / (len(data) - 1) * fn(data)[1]), 3))

theta                [0.037 0.333]
std.dev. (jackknife) [0.132 0.098]
std.dev. (bootstrap) [0.126 0.089]
bias (jackknife)     [-0.    -0.018]
bias (bootstrap)     [ 0.    -0.016]
bias (exact)         [ 0.    -0.018]


The standard deviations of the estimates computed by bootstrap and jackknife differ by about 5 % for the mean and about 10 % for the variance. This difference shrinks for larger data sets.
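
The shrinking difference can be checked with a small NumPy-only sketch for the mean. The helpers `jackknife_std` and `bootstrap_std` are hypothetical stand-ins, not the `resample` API, and the number of bootstrap replicates (2000) is an arbitrary choice. For the mean, the jackknife reproduces the unbiased sample variance divided by N, while the bootstrap converges to the biased variance divided by N, so the two standard deviations differ by a factor of sqrt(N / (N - 1)) plus Monte Carlo noise from the finite number of replicates.

```python
import numpy as np


def jackknife_std(fn, sample):
    """Jackknife standard deviation of fn(sample) (hypothetical helper)."""
    n = len(sample)
    loo = np.array([fn(np.delete(sample, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))


def bootstrap_std(fn, sample, rng, size=2000):
    """Bootstrap standard deviation of fn(sample) from `size` replicates."""
    n = len(sample)
    reps = [fn(rng.choice(sample, size=n, replace=True)) for _ in range(size)]
    return np.std(reps)


rng = np.random.default_rng(1)
rel_diff = {}
for n in (20, 2000):
    x = rng.normal(size=n)
    j, b = jackknife_std(np.mean, x), bootstrap_std(np.mean, x, rng)
    rel_diff[n] = abs(j - b) / b  # relative difference between the two methods
    print(n, j, b, rel_diff[n])
```

With only 2000 replicates the bootstrap value itself fluctuates at the percent level, so the printed differences are indicative rather than exact.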

Both resampling methods find no bias for the mean and a small negative bias for the (not bias-corrected) variance. The jackknife estimate is closer to the exact value, since for sufficiently large N the bias is dominated by the O(1/N) term, which the jackknife removes exactly.
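
One way to see why the bootstrap lands slightly off is to increase the number of replicates. Resampling with replacement from the data draws from a distribution whose variance is the biased variance `s`, so the expected biased variance of a replicate is `(N - 1) / N * s`, and the bootstrap bias estimate tends to `-s / N` rather than `-s / (N - 1)`. The sketch below uses only NumPy. The replicate count of 20000 is an arbitrary choice, and the sample is generated here for illustration rather than taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
n = len(x)
s = np.var(x, ddof=0)  # biased variance of the original sample

# biased variance of each bootstrap replicate (20000 is an arbitrary choice)
reps = [np.var(rng.choice(x, size=n, replace=True), ddof=0) for _ in range(20000)]
boot_bias = np.mean(reps) - s

print("bootstrap bias     ", boot_bias)
print("-s / N             ", -s / n)
print("-s / (N - 1), exact", -s / (n - 1))
```

The bootstrap value should sit near `-s / N`, up to Monte Carlo noise. The gap to the exact value shrinks with growing N.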