### All of Statistics - Chapter 8 Exercise 1
Consider the data in Example 8.6. Find the plug-in estimate of the correlation coefficient. Estimate the standard error using the bootstrap. Find a 95 percent confidence interval using the Normal, pivotal, and percentile methods.

In [13]:
import numpy as np
from scipy.stats import norm
import plotly.express as px
import plotly.graph_objects as go

lsat_sample = np.array([
    576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594
])
gpa_sample = np.array([
    3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 3.96
])

Consider the data in Example 8.6.
## Requirements:
- Find the plug-in estimate of the correlation coefficient
- Estimate the standard error using the bootstrap
- Find a 95 percent confidence interval using the following methods:
    - Normal
    - Pivotal
    - Percentile

### Task: Find the plug-in estimate of the correlation coefficient

In [14]:
def pearson_correlation(X, Y):
    x_mean, y_mean = np.mean(X), np.mean(Y)
    x_var, y_var = np.sum((X - x_mean)**2), np.sum((Y - y_mean)**2)
    covariance = np.sum((X - x_mean) * (Y - y_mean))
    normalization_factor = np.sqrt(x_var * y_var)
    return covariance / normalization_factor

r_hat = pearson_correlation(gpa_sample, lsat_sample)
print(f'Plug-in estimate for r: r_hat = {r_hat}')

Plug-in estimate for r: r_hat = 0.5459189161795885


### Task: Estimate the standard error using the bootstrap

In [15]:
bootstrap_repetitions = 9999
bootstrap_estimations = list()

for i in range(bootstrap_repetitions):
    lsat_bs_sample = np.random.choice(lsat_sample, len(lsat_sample), replace=True)
    gpa_bs_sample = np.random.choice(gpa_sample, len(gpa_sample), replace=True)
    bootstrap_estimations.append(pearson_correlation(lsat_bs_sample, gpa_bs_sample))

se_hat = np.array(bootstrap_estimations).std()
print(f'Bootstrap std. error estimate = {se_hat}')

Bootstrap std. error estimate = 0.26162850994014897


### Task: Find a 95 percent confidence interval using the Normal method

In [16]:
alpha = 0.05
z = norm.ppf(1-alpha/2)
normal_upper_bound = r_hat + se_hat * z
normal_lower_bound = r_hat - se_hat * z
print(f'Normal method CI:({normal_lower_bound}, {normal_upper_bound})')

Normal method CI:(0.03313645936801701, 1.05870137299116)


### Task: Find a 95 percent confidence interval using the Pivotal method

In [17]:
alpha = 0.05
bootstrap_estimations = np.sort(bootstrap_estimations)
percentile_upper_bound = np.quantile(bootstrap_estimations, 1 - alpha/2)
percentile_lower_bound = np.quantile(bootstrap_estimations, alpha/2)
print(f'Percentile method CI: ({percentile_lower_bound}, {percentile_upper_bound})')

Percentile method CI: (-0.49595646920490705, 0.5062498146199815)


### Task: Find a 95 percent confidence interval using the Percentile method

In [18]:
alpha = 0.05
bootstrap_estimations = np.sort(bootstrap_estimations)
pivotal_lower_bound = 2*r_hat - np.quantile(bootstrap_estimations, 1-alpha/2)
pivotal_upper_bound = 2*r_hat - np.quantile(bootstrap_estimations, alpha/2)
print(f'Pivotal method CI: ({pivotal_lower_bound}, {pivotal_upper_bound})')

Pivotal method CI: (0.5855880177391956, 1.587794301564084)


In [20]:
print(f'Normal method CI:({normal_lower_bound}, {normal_upper_bound})')
print(f'Percentile method CI: ({percentile_lower_bound}, {percentile_upper_bound})')
print(f'Pivotal method CI: ({pivotal_lower_bound}, {pivotal_upper_bound})')

Normal method CI:(0.03313645936801701, 1.05870137299116)
Percentile method CI: (-0.49595646920490705, 0.5062498146199815)
Pivotal method CI: (0.5855880177391956, 1.587794301564084)
