# Data Description
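- Simulated dataset of 100 paired returns on two financial assets, X and Y.
- Used to estimate the fraction $\alpha$ of funds to invest in X (with the remaining $1-\alpha$ in Y) that minimizes the variance, i.e. the risk, of the portfolio.
- The bootstrap is then used to estimate the standard error of that estimate.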
- There are 2 variables:
    - X: Returns for Asset X
    - Y: Returns for Asset Y

# Load Packages and Data

In [2]:
import numpy as np, statsmodels.api as sm
from ISLP import load_data
from ISLP.models import (ModelSpec as MS,
                         summarize, poly, sklearn_sm)
from functools import partial
from sklearn.model_selection import \
     (train_test_split, cross_validate, KFold, ShuffleSplit)
from sklearn.base import clone

In [3]:
Portfolio = load_data('Portfolio')
Portfolio.head()

|   | X | Y |
|---|---|---|
| 0 | -0.895251 | -0.234924 |
| 1 | -1.562454 | -0.885176 |
| 2 | -0.417090 | 0.271888 |
| 3 | 1.044356 | -0.734198 |
| 4 | -0.315568 | 0.841983 |


In [4]:
Portfolio.describe().round(1)

|       | X | Y |
|-------|------|------|
| count | 100.0 | 100.0 |
| mean  | -0.1 | -0.1 |
| std   | 1.1 | 1.1 |
| min   | -2.4 | -2.7 |
| 25%   | -0.9 | -0.9 |
| 50%   | -0.3 | -0.2 |
| 75%   | 0.6 | 0.8 |
| max   | 2.5 | 2.6 |


# The Bootstrap
## Estimating the Accuracy of a Statistic of Interest
- The bootstrap is widely applicable and requires no complicated mathematical derivations. It is straightforward to implement in Python to estimate a standard error, including when the data are stored in a data frame.

- We illustrate this with a simple example: using the `Portfolio` data set to estimate the sampling variability of $\hat{\alpha}$. We first create `alpha_func()`, which takes a data frame `D` with columns `X` and `Y`, together with a vector `idx` of row indices, and returns the estimate of $\alpha$ computed from those rows only.
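For reference, here is a short derivation of the minimum-variance formula that `alpha_func()` computes. Investing a fraction $\alpha$ in X and $1-\alpha$ in Y gives the return $\alpha X + (1-\alpha)Y$, whose variance is

$$
\mathrm{Var}\big(\alpha X + (1-\alpha)Y\big) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}.
$$

Setting the derivative with respect to $\alpha$ to zero,

$$
2\alpha\sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + 2(1-2\alpha)\sigma_{XY} = 0
\quad\Longrightarrow\quad
\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}.
$$

The second derivative, $2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,\mathrm{Var}(X - Y) \ge 0$, confirms this is a minimum. Replacing $\sigma_X^2$, $\sigma_Y^2$ and $\sigma_{XY}$ with their sample estimates gives $\hat{\alpha}$.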

In [5]:
def alpha_func(D, idx):
    # Covariance matrix of columns 'X' and 'Y' of DataFrame D,
    # using only the rows specified by idx.
    # rowvar=False tells np.cov that each column is a variable and each
    # row an observation (the default, rowvar=True, assumes the reverse).
    cov_ = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    
    # Compute the alpha value using the elements of the covariance matrix.
    # The formula is: (variance of Y - covariance of X and Y) /
    #                 (variance of X + variance of Y - 2 * covariance of X and Y)
    return ((cov_[1, 1] - cov_[0, 1]) /
            (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1]))

- The function applies the minimum-variance formula to the observations indexed by `idx`. For example, the following call estimates $\alpha$ using all 100 observations.

In [9]:
alpha_func(Portfolio, range(100))

0.57583207459283
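As a sanity check on the closed form, one can minimize the sample variance of $\alpha X + (1-\alpha)Y$ by brute force over a fine grid and compare. The sketch below assumes pandas is available and uses hypothetical synthetic returns in place of `Portfolio`, so its numbers will not match the output above.

```python
import numpy as np
import pandas as pd

def alpha_func(D, idx):
    # Same minimum-variance estimator as above
    cov_ = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    return ((cov_[1, 1] - cov_[0, 1]) /
            (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1]))

# Hypothetical synthetic returns standing in for Portfolio
rng = np.random.default_rng(1)
D = pd.DataFrame({'X': rng.normal(0, 1.0, 100),
                  'Y': rng.normal(0, 1.2, 100)})

alpha_closed = alpha_func(D, D.index)

# Brute force: sample variance of alpha*X + (1 - alpha)*Y on a grid (step 1e-4)
grid = np.linspace(-1, 2, 30001)
X, Y = D['X'].to_numpy(), D['Y'].to_numpy()
P = grid[:, None] * X + (1 - grid[:, None]) * Y   # one row per candidate alpha
alpha_grid = grid[np.argmin(P.var(axis=1, ddof=1))]

print(round(alpha_closed, 4), round(alpha_grid, 4))
```

Both use the $n-1$ divisor (`np.cov` by default, `ddof=1` here), so the grid minimizer agrees with the formula up to the grid spacing.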

- Next, we randomly select 100 observations from `range(100)` with replacement. This is equivalent to constructing a new bootstrap data set and recomputing $\hat{\alpha}$ from it.

In [18]:
rng = np.random.default_rng(0)
alpha_func(Portfolio,
           rng.choice(100, 100, replace=True))

0.6074452469619002
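Because sampling is with replacement, a bootstrap data set repeats some rows and omits others. The sketch below (using only NumPy, with an arbitrary seed) estimates the average fraction of distinct original rows that appear; each row is drawn at least once with probability $1 - (1 - 1/n)^n$, which is about $0.634$ for $n = 100$ and approaches $1 - 1/e \approx 0.632$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 2000

# Fraction of distinct original rows appearing in each bootstrap sample
fractions = [len(np.unique(rng.choice(n, n, replace=True))) / n
             for _ in range(reps)]

observed = np.mean(fractions)
expected = 1 - (1 - 1 / n) ** n   # P(a given row is drawn at least once)
print(round(observed, 3), round(expected, 3))
```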

- This process can be generalized into a function `boot_SE()` that computes the bootstrap standard error of any statistic `func(D, idx)` that takes a data frame and a vector of row indices as its arguments.

In [18]:
def boot_SE(func, D, n=None, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize variables to accumulate the first and second moments
    first_, second_ = 0, 0
    # If n is not specified, use the number of rows in DataFrame D
    n = n or D.shape[0]
    # Perform B bootstrap iterations
    for _ in range(B):
        # Randomly sample n indices from D with replacement
        idx = rng.choice(D.index, n, replace=True)
        
        # Apply the provided function to the sampled data
        value = func(D, idx)
        
        # Accumulate the first and second moments
        first_ += value
        second_ += value**2
    
    # Compute and return the bootstrap standard error
    return np.sqrt(second_ / B - (first_ / B)**2)

- The variable `_` in `for _ in range(B)` is a Python convention for a loop counter whose value is never used; the loop simply runs `B` times.
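The running sums `first_` and `second_` accumulate the mean and mean square of the $B$ replicates, so the returned value is their standard deviation with divisor $B$. A minimal sketch of an equivalent version that keeps all replicates in an array is shown below; `boot_replicates()` is a hypothetical helper (not part of ISLP), and the data are synthetic stand-ins for `Portfolio`.

```python
import numpy as np
import pandas as pd

def alpha_func(D, idx):
    cov_ = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    return ((cov_[1, 1] - cov_[0, 1]) /
            (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1]))

def boot_SE(func, D, n=None, B=1000, seed=0):
    # Running-moment version, as defined above
    rng = np.random.default_rng(seed)
    first_, second_ = 0, 0
    n = n or D.shape[0]
    for _ in range(B):
        idx = rng.choice(D.index, n, replace=True)
        value = func(D, idx)
        first_ += value
        second_ += value**2
    return np.sqrt(second_ / B - (first_ / B)**2)

def boot_replicates(func, D, n=None, B=1000, seed=0):
    # Hypothetical variant: return all B bootstrap replicates of func
    rng = np.random.default_rng(seed)
    n = n or D.shape[0]
    return np.array([func(D, rng.choice(D.index, n, replace=True))
                     for _ in range(B)])

rng = np.random.default_rng(1)
D = pd.DataFrame({'X': rng.normal(size=100), 'Y': rng.normal(size=100)})

se_loop = boot_SE(alpha_func, D, B=200, seed=0)
reps = boot_replicates(alpha_func, D, B=200, seed=0)
se_array = reps.std(ddof=0)   # divisor B, matching boot_SE
print(se_loop, se_array)
```

With the same seed, both functions draw identical index vectors, so the two estimates agree up to floating-point rounding. Keeping the replicates costs $O(B)$ memory but makes it easy to plot their histogram or to switch to `ddof=1`, which differs from `boot_SE()` only by a factor of $\sqrt{B/(B-1)}$.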

- We now use `boot_SE()` to evaluate the accuracy of $\hat{\alpha}$ with $B=1{,}000$ bootstrap replications.

In [19]:
alpha_SE = boot_SE(alpha_func, Portfolio, B=1000, seed=0)
alpha_SE

0.09118176521277699

- The final output shows that the bootstrap estimate for ${\rm SE}(\hat{\alpha})$ is $0.0912$.
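One way to build confidence in `boot_SE()` is to apply it to a statistic whose standard error has a textbook formula. The sketch below, on hypothetical synthetic data, bootstraps the sample mean of `X` and compares the result with the plug-in formula $\hat{\sigma}/\sqrt{n}$, where $\hat{\sigma}$ uses divisor $n$; the two should agree up to Monte Carlo error of a few percent at $B = 1{,}000$.

```python
import numpy as np
import pandas as pd

def boot_SE(func, D, n=None, B=1000, seed=0):
    # Same running-moment bootstrap SE as above
    rng = np.random.default_rng(seed)
    first_, second_ = 0, 0
    n = n or D.shape[0]
    for _ in range(B):
        idx = rng.choice(D.index, n, replace=True)
        value = func(D, idx)
        first_ += value
        second_ += value**2
    return np.sqrt(second_ / B - (first_ / B)**2)

def mean_X(D, idx):
    # Statistic with a known SE; .loc keeps repeated indices as repeated rows
    return D['X'].loc[idx].mean()

rng = np.random.default_rng(2)
D = pd.DataFrame({'X': rng.normal(0, 1, 100)})   # hypothetical data

se_boot = boot_SE(mean_X, D, B=1000, seed=0)
se_formula = D['X'].std(ddof=0) / np.sqrt(len(D))
print(round(se_boot, 4), round(se_formula, 4))
```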