### All of Statistics - Chapter 8 Exercise 7

Let $X_1, \ldots, X_n \sim \text{Uniform}(0, \theta)$. Let $\hat\theta = X_{\max} = \max\{X_1, \ldots, X_n\}$. Generate a data set of size 50 with $\theta = 1$.
(a) Find the distribution of $\hat\theta$. Compare the true distribution of $\hat\theta$ to the histograms from the bootstrap.
(b) This is a case where the bootstrap does very poorly. In fact, we can prove that this is the case. Show that $P(\hat\theta = \theta) = 0$ and yet $P(\hat\theta^* = \hat\theta) \approx .632$.
Hint: show that $P(\hat\theta^* = \hat\theta) = 1 - (1 - (1/n))^n$, then take the limit as $n$ gets large.
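
Before running any simulation, it may help to write down the analytic results the exercise asks for. The derivation below is a sketch that assumes the $X_i$ are independent and that the observed values are all distinct (which holds with probability 1 for a continuous distribution); $\hat\theta^*$ denotes the maximum of one bootstrap resample.

```latex
% (a) Distribution of the sample maximum, for 0 <= x <= theta
\begin{aligned}
F_{\hat\theta}(x) &= P(\hat\theta \le x)
  = \prod_{i=1}^{n} P(X_i \le x)
  = \left(\frac{x}{\theta}\right)^{n}, \\
f_{\hat\theta}(x) &= \frac{d}{dx} F_{\hat\theta}(x)
  = \frac{n\,x^{n-1}}{\theta^{n}} .
\end{aligned}

% (b) The true estimator is continuous, so it hits theta with probability 0,
%     while a bootstrap resample contains the observed maximum unless all
%     n draws miss it.
\begin{aligned}
P(\hat\theta = \theta) &= 0, \\
P(\hat\theta^{*} = \hat\theta)
  &= 1 - P(\text{no draw equals } X_{\max})
   = 1 - \left(1 - \frac{1}{n}\right)^{n}
   \;\xrightarrow[n \to \infty]{}\; 1 - e^{-1} \approx 0.632 .
\end{aligned}
```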

In [26]:
import numpy as np
from scipy.stats import norm, t, uniform
import plotly.express as px
import plotly.graph_objects as go

In [27]:
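# Plug-in statistic: the sample maximum, used as the estimator of theta
T_of_F =  lambda sample_: np.max(sample_)
n = 50
theta = 1
# Draw n observations from Uniform(0, theta)
sample = uniform.rvs(loc = 0, scale = theta, size = n)
px.histogram(sample, nbins = 40, title = 'Sample Distribution')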

In [28]:
bootstrap_repetitions = 1000
bootstrap_estimations = list()
# Resample the data with replacement and recompute the maximum each time
for i in range(bootstrap_repetitions):
    bootstrap_sample = np.random.choice(sample, size = len(sample), replace = True)
    bootstrap_estimations.append(T_of_F(bootstrap_sample))

bootstrap_estimations = np.sort(bootstrap_estimations)
theta_hat = T_of_F(sample)
# Bootstrap standard error: standard deviation of the replications
se_hat = bootstrap_estimations.std()
print(f'theta_hat = {theta_hat}')
print(f'se_hat = {se_hat}')

# Normal method
alpha = 0.05
z = norm.ppf(1-alpha/2)
normal_upper_bound = theta_hat + se_hat * z
normal_lower_bound = theta_hat - se_hat * z
print(f'Normal method CI:({normal_lower_bound}, {normal_upper_bound})')

# Percentile method
alpha = 0.05
percentile_upper_bound = np.quantile(bootstrap_estimations, 1 - alpha/2)
percentile_lower_bound = np.quantile(bootstrap_estimations, alpha/2)
print(f'Percentile method CI: ({percentile_lower_bound}, {percentile_upper_bound})')

# Pivotal method
alpha = 0.05
pivotal_lower_bound = 2*theta_hat - np.quantile(bootstrap_estimations, 1-alpha/2)
pivotal_upper_bound = 2*theta_hat - np.quantile(bootstrap_estimations, alpha/2)
print(f'Pivotal method CI: ({pivotal_lower_bound}, {pivotal_upper_bound})')

theta_hat = 0.9977944257829237
se_hat = 0.01192555260805425
Normal method CI:(0.9744207721753996, 1.0211680793904476)
Percentile method CI: (0.985103676015227, 0.9977944257829237)
Pivotal method CI: (0.9977944257829237, 1.0104851755506203)


In [29]:
px.histogram(
    bootstrap_estimations,
    title = 'Histogram of the bootstrap replications of theta',
)
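
The histogram of bootstrap replications piles up at the observed maximum, which is what part (b) predicts. The cell below is a minimal, self-contained sketch that checks this numerically. It draws its own sample, and the seed, the name `n_boot`, and the use of `np.random.default_rng` are illustrative choices rather than part of the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)
n, theta = 50, 1.0
n_boot = 10_000                 # hypothetical number of bootstrap replications

# Data set of size n from Uniform(0, theta) and its maximum
sample = rng.uniform(0.0, theta, size=n)
theta_hat = sample.max()

# Each row of `indices` selects one bootstrap resample (drawn with replacement)
indices = rng.integers(0, n, size=(n_boot, n))
bootstrap_maxima = sample[indices].max(axis=1)

# Fraction of resamples whose maximum equals the observed maximum
observed_fraction = np.mean(bootstrap_maxima == theta_hat)
exact_probability = 1 - (1 - 1 / n) ** n   # formula from the hint
limit_probability = 1 - np.exp(-1)         # limit as n grows

print(f"observed fraction   = {observed_fraction:.3f}")
print(f"1 - (1 - 1/n)^n     = {exact_probability:.3f}")
print(f"1 - 1/e             = {limit_probability:.3f}")
```

For n = 50 the exact probability is about 0.636, already close to the limiting value 0.632, so roughly two thirds of the bootstrap replications sit exactly on $\hat\theta$, whereas the true estimator has a continuous distribution with no point mass.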

In [34]:
x_axis = np.linspace(0, 1.5, 1000)
# Exact density of the sample maximum: f(x) = n x^(n-1) / theta^n on [0, theta], 0 elsewhere
theta_real_pdf = np.where(x_axis <= theta, n * x_axis**(n - 1) / theta**n, 0.0)
fig = px.line(
    x = x_axis, y = theta_real_pdf,
    title = 'True density of theta_hat'
)
fig.show()

In [35]:
# Put the histogram on a density scale so it is comparable with the true density
bootstrap_hist, bin_edges = np.histogram(
    bootstrap_estimations, bins = 100, range = (0, 1.5), density = True
)
x_axis_bar = (bin_edges[:-1] + bin_edges[1:]) / 2  # bin centers
fig = px.bar(
    x = x_axis_bar, y = bootstrap_hist,
    title = 'Comparison of the true density and the bootstrap replications'
)
fig.add_trace(go.Scatter(
    x = x_axis, y = theta_real_pdf, mode = 'lines', name = 'True density',
))
fig.show()
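
As a cross-check on the analytic curve used in the comparison, the true sampling distribution of the maximum can also be approximated by drawing many fresh data sets instead of resampling one. The sketch below is self-contained and compares the empirical CDF of the simulated maxima with $(x/\theta)^n$. The seed, the name `n_datasets`, and the evaluation grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)
n, theta = 50, 1.0
n_datasets = 20_000             # hypothetical number of independent data sets

# One maximum per independent data set of size n from Uniform(0, theta)
maxima = rng.uniform(0.0, theta, size=(n_datasets, n)).max(axis=1)

# Empirical CDF of the maxima on a grid near theta, where almost all the mass lies
grid = np.linspace(0.85, theta, 200)
empirical_cdf = np.searchsorted(np.sort(maxima), grid, side="right") / n_datasets

# Analytic CDF of the maximum: F(x) = (x / theta)^n for 0 <= x <= theta
analytic_cdf = (grid / theta) ** n

max_gap = np.max(np.abs(empirical_cdf - analytic_cdf))
print(f"largest gap between empirical and analytic CDF = {max_gap:.4f}")
```

Unlike the bootstrap replications, these simulated maxima are all distinct and spread smoothly below $\theta$, which is the behavior the bootstrap fails to reproduce.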