Credit to Prof. Nancy Xu, PhD from Boston College: Data Analytics in Finance Course

Researchers use bootstrapping methodologies to understand how certain we are about an estimate $\hat{\theta}_N$. <b>You will bootstrap <font color='blue'>6</font> asset returns, one-by-one, to infer <font color='blue'>how certain we are about each asset's volatility</font></b> (return volatility = return standard deviation).

<span style="background-color: #ffff66">Asset returns: QUESTION1DATA.csv</span>

  1. S&P GSCI Commodity Returns<br>
  2. S&P GSCI Gold Returns<br>
  3. MSCI USA Stock Market Returns <br>
  4. MSCI Emerging Market Returns <br>
  5. MSCI Euro Area Market Returns <br>
  6. US 10-Year Government Bond Returns<br>

All asset returns are expressed in current US dollars (source: DataStream; MSCI; Bloomberg). These are monthly returns stored as decimals (not in percentage points); for instance, "0.026561915" in Cell B2 means that the change in the world commodity price from December 1987 to January 1988, $\frac{P_{1988/01}-P_{1987/12}}{P_{1987/12}}\times 100\%$, is 2.66\%. The data are balanced from January 1988 to December 2020 (N=396 months).
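
To make the unit convention concrete, the short sketch below computes a simple return as a decimal and converts it to percent. The two prices are hypothetical round numbers chosen to give roughly the 2.66% of the example; they are not taken from QUESTION1DATA.csv, and `simple_return` is an illustrative helper name.

```python
def simple_return(p_prev, p_curr):
    """Simple (arithmetic) return between two prices, as a decimal."""
    return (p_curr - p_prev) / p_prev


# Hypothetical prices, for illustration only
r = simple_return(100.0, 102.66)
print("decimal return: {:.4f}  ->  {:.2f}%".format(r, r * 100))
```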

<span style="background-color: #ffff66">Parameters:</span>
 * Each bootstrapping subsample has <font color='blue'>300</font> observations;
 * We bootstrap for <font color='blue'>10000</font> rounds. 
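
As a rough illustration of how these two parameters enter a resampling loop, here is a minimal sketch. It assumes a plain list of returns; the function name `bootstrap_volatility`, the `replace` switch, and the simulated `fake_returns` series are hypothetical and not part of the assignment. The textbook bootstrap draws with replacement (`random.choices`), whereas drawing without replacement (`random.sample`) is usually called subsampling; the sketch supports both so the two can be compared.

```python
import random


def bootstrap_volatility(returns, obs, rounds, replace=True, seed=None):
    """Return `rounds` resampled standard deviations, each computed from `obs` draws."""
    rng = random.Random(seed)
    vols = []
    for _ in range(rounds):
        if replace:
            draw = rng.choices(returns, k=obs)  # with replacement
        else:
            draw = rng.sample(returns, obs)     # without replacement
        mean = sum(draw) / obs
        var = sum((x - mean) ** 2 for x in draw) / (obs - 1)
        vols.append(var ** 0.5)
    return vols


# Simulated stand-in for one return column (NOT the real data)
_rng = random.Random(0)
fake_returns = [_rng.gauss(0.005, 0.05) for _ in range(396)]

vols = bootstrap_volatility(fake_returns, obs=300, rounds=1000, seed=1)
print(len(vols), sum(vols) / len(vols))
```

With `replace=False` and `obs` equal to the full sample length, every round returns the same value, which is one way to see why drawing without replacement understates the uncertainty as `obs` approaches N.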

<span style="background-color: #ffff66">Discussion suggestions:</span>

Here are some questions to think about and eventually help you organize your discussions: 
* What is the sample standard deviation? (You should not use numpy or other canned functions)
* What is the mean of all bootstrapped standard deviations? And how does it compare to the sample standard deviation? 
* Can you compare the (un)certainty across the 6 asset volatilities?
* When you change some parameters in the bootstrapping procedure, do you see different results? Why?
* Your own topic
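
For the first question, a standard deviation can be computed with nothing but built-in Python. The sketch below is one possible way to do it; the helper name `sample_std` and its `ddof` argument are illustrative assumptions (`ddof=1` divides by N-1, `ddof=0` divides by N). The five values are the first COMMODITY rows shown by `data.head()` in this notebook.

```python
def sample_std(values, ddof=1):
    """Standard deviation without numpy; ddof=1 gives the N-1 (sample) version."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return (squared_dev / (n - ddof)) ** 0.5


# First five COMMODITY returns as printed by data.head()
returns = [0.026562, -0.011024, 0.050392, 0.031047, 0.035301]
print("N-1 version: {:.4f}%".format(sample_std(returns) * 100))
print("N version:   {:.4f}%".format(sample_std(returns, ddof=0) * 100))
```

With only five observations the two versions differ noticeably; with N=396 the gap shrinks to a factor of about $\sqrt{396/395}$.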


<span style="background-color: #ffff66">Editorial suggestion:</span>
* Please make your answers and procedures easy to read and follow
* print+format, markdowns, plots... these are all helpful tools. 

-------------

In [5]:
import pandas
import math
import random

data = pandas.read_csv('QUESTION1DATA.csv', header=0,parse_dates=True)
print(data.head())
print(len(data['DATE']))

        DATE  COMMODITY      GOLD  STOCK_US  STOCK_EM  STOCK_EUROPE  \
0  1/29/1988   0.026562 -0.069215  0.042834  0.098380     -0.040626   
1  2/29/1988  -0.011024 -0.053320  0.041869  0.003396      0.059266   
2  3/31/1988   0.050392  0.054769 -0.033456  0.107385      0.027948   
3  4/29/1988   0.031047 -0.006874  0.009184  0.053382      0.019014   
4  5/31/1988   0.035301  0.011578  0.008603  0.029962     -0.018278   

   GOVBOND_US  
0    0.047578  
1    0.009004  
2   -0.019963  
3   -0.013340  
4   -0.013437  
396


In [6]:
portlist = ['COMMODITY', 'GOLD', 'STOCK_US', 'STOCK_EM', 'STOCK_EUROPE', 'GOVBOND_US']

In [7]:
# Step 1: Conduct bootstrapping procedure [10pts]
def bootstrap(q1data, plist, obs, rounds):
    """Return one list of bootstrapped volatilities (in %) per asset in plist."""
    sd_total = []
    for asset in plist:
        returns = list(q1data[asset])
        sd = []  # volatilities of this asset, one per round
        for i in range(rounds):
            # NOTE: random.sample draws WITHOUT replacement, so each subsample
            # holds obs distinct observations; random.choices(returns, k=obs)
            # would draw with replacement instead
            sampling = random.sample(returns, obs)
            asset_mean = sum(sampling)/len(sampling)
            asset_var = sum([(x - asset_mean)**2 for x in sampling])/(len(sampling)-1)  # sample variance (N-1)
            asset_sd = asset_var**0.5 * 100  # convert to percent
            sd.append(asset_sd)
        sd_total.append(sd)  # one inner list per asset
    return sd_total
result = bootstrap(data,portlist,300,10000)

In [8]:
boot_mean = []
boot_sd = [] #sd of bootstrap sd list 
for volatilities in result: 
    mean = sum(volatilities)/len(volatilities) 
    variance = sum([(x-mean)**2 for x in volatilities])/len(volatilities) 
    std = variance**0.5 
    boot_mean.append(mean)
    boot_sd.append(std)
for i in range(len(boot_sd)): #for loop for bootstrap results 
    print('The average and standard deviation of the bootstrapped volatilities for {} are {:3.2f}% and {:3.2f}%'.format(portlist[i],boot_mean[i],boot_sd[i]))

The average and standard deviation of the bootstrapped volatilities for COMMODITY are 6.15% and 0.19%
The average and standard deviation of the bootstrapped volatilities for GOLD are 4.42% and 0.11%
The average and standard deviation of the bootstrapped volatilities for STOCK_US are 4.22% and 0.11%
The average and standard deviation of the bootstrapped volatilities for STOCK_EM are 6.48% and 0.18%
The average and standard deviation of the bootstrapped volatilities for STOCK_EUROPE are 4.99% and 0.13%
The average and standard deviation of the bootstrapped volatilities for GOVBOND_US are 2.05% and 0.05%
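
Besides the standard deviation, the spread of the bootstrapped volatilities can be summarized with a percentile interval. The sketch below is one simple way to do that, using the nearest-rank rule and no external libraries; `percentile_interval` is a hypothetical helper that is not part of the original notebook, and the 2.5%/97.5% cut-offs are an assumed choice.

```python
import math


def percentile_interval(values, lower=0.025, upper=0.975):
    """Nearest-rank percentile interval of a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        k = max(1, math.ceil(p * n))  # 1-based rank
        return ordered[k - 1]

    return nearest_rank(lower), nearest_rank(upper)


# Small deterministic demo: the integers 1..100
demo = list(range(1, 101))
print(percentile_interval(demo))
```

Applied to this notebook, `percentile_interval(result[0])` would give an interval for the COMMODITY volatility, assuming `result` is the output of `bootstrap` above.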


In [9]:
for name in portlist:  # full-sample mean and volatility of each asset, in percent
    l_data = list(data[name])
    sample_mean = sum(l_data)/len(l_data) * 100
    sample_var = sum([(x*100 - sample_mean)**2 for x in l_data])/len(l_data)  # divides by N; the bootstrap step divides by N-1 (small difference at N=396)
    sample_std = sample_var**0.5
    print('The asset {} has a mean monthly return of {:3.2f}% and a monthly volatility of {:3.2f}%'.format(name,sample_mean,sample_std))

The asset COMMODITY has a mean monthly return of 0.35% and a monthly volatility of 6.15%
The asset GOLD has a mean monthly return of 0.43% and a monthly volatility of 4.41%
The asset STOCK_US has a mean monthly return of 0.98% and a monthly volatility of 4.21%
The asset STOCK_EM has a mean monthly return of 1.08% and a monthly volatility of 6.47%
The asset STOCK_EUROPE has a mean monthly return of 0.79% and a monthly volatility of 4.99%
The asset GOVBOND_US has a mean monthly return of 0.53% and a monthly volatility of 2.04%


In [10]:
result = bootstrap(data,portlist,100,100) #changing parameters to 100 observations and 100 iterations for discussion 
boot_mean = []
boot_sd = [] 
for volatilities in result: 
    mean = sum(volatilities)/len(volatilities) 
    variance = sum([(x-mean)**2 for x in volatilities])/len(volatilities) 
    std = variance**0.5 
    boot_mean.append(mean)
    boot_sd.append(std)
for i in range(len(boot_sd)): #for loop for bootstrap results 
    print('The average and standard deviation of the bootstrapped volatilities for {} are {:3.2f}% and {:3.2f}%'.format(portlist[i],boot_mean[i],boot_sd[i]))

The average and standard deviation of the bootstrapped volatilities for COMMODITY are 6.18% and 0.51%
The average and standard deviation of the bootstrapped volatilities for GOLD are 4.42% and 0.37%
The average and standard deviation of the bootstrapped volatilities for STOCK_US are 4.20% and 0.29%
The average and standard deviation of the bootstrapped volatilities for STOCK_EM are 6.47% and 0.50%
The average and standard deviation of the bootstrapped volatilities for STOCK_EUROPE are 4.96% and 0.43%
The average and standard deviation of the bootstrapped volatilities for GOVBOND_US are 2.00% and 0.16%


Discussion:
+ It is no surprise that the average bootstrapped volatility of each asset closely matches the monthly volatility of the full sample (for example, 6.15% in both cases for COMMODITY). Every subsample volatility estimates the same quantity, and with 10,000 rounds the Law of Large Numbers ensures that their average settles very close to the volatility we would have obtained without bootstrapping.
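+ The standard deviation of the bootstrapped volatilities rises roughly in proportion to the average bootstrapped volatility: the more volatile the asset, the wider the spread of its bootstrapped volatilities. GOVBOND_US has the lowest pair (2.05% and 0.05%), while COMMODITY and STOCK_EM have the highest (6.15% and 0.19%; 6.48% and 0.18%). In relative terms the uncertainty is therefore similar across the six assets, at about 2.4% to 3.1% of the volatility level.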
+ When the parameters are reduced from 300 to 100 observations and from 10,000 to 100 rounds, one would expect noisier and less reliable results than before. The average bootstrapped volatilities still approximate the sample volatility but deviate a little more; some are slightly higher and some slightly lower than in the run with 300 observations and 10,000 rounds. The standard deviation of the bootstrapped volatilities rises for every asset, for example from 0.19% to 0.51% for COMMODITY.
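+ Interestingly, the noisier bootstrapped standard deviations under the smaller parameters also shift the average bootstrapped volatilities. The shift is small and not one-directional: COMMODITY rises from 6.15% to 6.18%, whereas GOVBOND_US falls from 2.05% to 2.00%. The averages therefore approximate the sample volatility a little less precisely, rather than systematically overstating it.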
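
The two runs can also be compared numerically. The sketch below uses only the figures printed in the outputs above, plus a rough approximation that is not part of the original analysis: it assumes the spread of a subsample volatility scales like the standard error of a mean of n draws taken without replacement from N = 396 months, that is, with $\sqrt{\frac{1}{n}\cdot\frac{N-n}{N-1}}$. The helper name `spread_scale` is hypothetical.

```python
# Figures copied from the printed outputs above (all in percent)
assets = ['COMMODITY', 'GOLD', 'STOCK_US', 'STOCK_EM', 'STOCK_EUROPE', 'GOVBOND_US']
mean_300 = [6.15, 4.42, 4.22, 6.48, 4.99, 2.05]  # 300 obs, 10000 rounds
sd_300 = [0.19, 0.11, 0.11, 0.18, 0.13, 0.05]    # 300 obs, 10000 rounds
sd_100 = [0.51, 0.37, 0.29, 0.50, 0.43, 0.16]    # 100 obs, 100 rounds

N = 396  # months in the full sample


def spread_scale(n, population=N):
    """Assumed scale of the spread for n draws WITHOUT replacement (rough approximation)."""
    return ((1.0 / n) * (population - n) / (population - 1)) ** 0.5


predicted_ratio = spread_scale(100) / spread_scale(300)
with_replacement_ratio = (300 / 100) ** 0.5  # what plain 1/sqrt(n) scaling would give

relative_300 = [s / m for s, m in zip(sd_300, mean_300)]
observed_ratio = [s1 / s3 for s1, s3 in zip(sd_100, sd_300)]

for name, rel, ratio in zip(assets, relative_300, observed_ratio):
    print('{:<13} relative spread {:.1%}, spread ratio (100 vs 300 obs) {:.2f}'.format(name, rel, ratio))
print('predicted ratio without replacement: {:.2f}'.format(predicted_ratio))
print('predicted ratio with replacement:    {:.2f}'.format(with_replacement_ratio))
```

Under this assumption the predicted ratio is about 3.0, against about 1.7 for draws with replacement, and the observed ratios fall between roughly 2.6 and 3.4. This suggests that most of the extra spread comes from the smaller subsample size rather than from the smaller number of rounds, although the 100-round figures are themselves noisy.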