# Bootstrapping a Linear Regression

## Loading Packages

In [1]:
using Compat, Missings        #in Julia 0.6 
#using Dates, DelimitedFiles, Random  #in Julia 0.7

include("jlFiles/printmat.jl")

include("jlFiles/OlsFn.jl")
include("jlFiles/NWFn.jl")

NWFn

## Loading Data

The regressions used below are of the type

$
y_t = x_t'b + u_t
$

where $y_t$ are monthly data on 1-year excess returns on a bond and $x_t$ are lagged (12 months) forward rates. 
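The 12-month lag means that the return dated $t$ is paired with forward rates dated $t-12$. A minimal sketch of that alignment on toy data follows; the series, the name `T0` and the single forward rate are made up for illustration, and only the indexing pattern matches the cell below.

```julia
# Toy monthly series (hypothetical numbers, for illustration only)
T0  = 15
rx  = collect(1.0:T0)                 # stand-in for the excess returns
f   = collect(101.0:100.0+T0)         # stand-in for a single forward rate
lag = 12

x = [ones(T0-lag) f[1:end-lag]]       # constant and the forward rate dated t-12
y = rx[lag+1:end]                     # dependent variable dated t

# Row i of x holds the forward rate from month i,
# paired with the return from month i+12 in y[i]
println(size(x), " ", size(y))
```

The first 12 returns and the last 12 forward rates are dropped, so the regression sample is 12 observations shorter than the raw data.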

In [2]:
xx  = readdlm("Data/BondPremiaPs.csv",',',skipstart=1)   
rx  = xx[:,5]                     #bond excess returns
f   = xx[:,6:end]                 #forward rates

x = [ones(size(f,1)-12) f[1:end-12,:]]   #regressors
y = rx[13:end]                           #dependent variable


(T,n) = (size(y,1),size(y,2))            #no. obs and no. test assets
K     = size(x,2)

println("T = $T, n = $n, K = $K")

T = 580, n = 1, K = 6


## Point Estimates

In [3]:
(bLS,u,yhat,Covb,) = OlsFn(y,x)              #OLS estimate and classical std errors
StdbLS = sqrt.(diag(Covb)/T)                 #Covb is Cov(sqrt(T)b) 

println("\nLS coeffs      std")
printmat([bLS  StdbLS])


LS coeffs      std
    -3.306     0.943
    -4.209     0.583
    10.627     4.378
   -14.397    13.989
     7.096    18.094
     1.284     8.058
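
OlsFn is loaded from jlFiles/OlsFn.jl and its body is not shown in this notebook. As a rough sketch of what such a function might compute, the code below assumes iid errors and no degrees-of-freedom correction. Both are assumptions, as are the name `ols_sketch` and the toy data. It is written for Julia 0.7 or later, where `diag` and `inv` are in LinearAlgebra.

```julia
using LinearAlgebra                   # diag, inv (Julia 0.7 or later)

# Hypothetical stand-in for OlsFn; the actual function may differ
function ols_sketch(y, x)
    T    = size(x, 1)
    b    = x \ y                      # LS coefficients
    u    = y - x*b                    # residuals
    yhat = y - u                      # fitted values
    s2   = sum(abs2, u)/T             # residual variance, no d.f. correction (assumption)
    Covb = s2 * inv(x'*x/T)           # Cov(sqrt(T)*b) under iid errors
    return b, u, yhat, Covb
end

# Toy data (made up): a constant and one regressor
x = [ones(5) collect(1.0:5.0)]
y = [5.1, 7.9, 11.2, 13.8, 17.0]

b, u, yhat, Covb = ols_sketch(y, x)
Stdb = sqrt.(diag(Covb)/size(x, 1))   # same scaling as StdbLS above
```

Dividing `Covb` by the number of observations before taking square roots converts the covariance of $\sqrt{T}b$ into standard errors for $b$ itself.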



## Bootstrap


The bootstrap resamples blocks of the fitted residuals $u_t$ from the LS regression above, keeping the regressors $x_t$ fixed.

The code runs NSim simulations.

In each simulation, we first draw a random starting point (row number) for each block, using the rand() function, and create a vector of all rows that belong to a block. For instance, suppose there are only two blocks per simulation, that they are drawn to start on rows $27$ and $35$, and that each block contains $10$ rows. The artificial sample then picks out rows $27$ to $36$ and $35$ to $44$. Clearly, some rows can be in several blocks. Row numbers beyond $T$ wrap around to the start of the sample. Once we have $T$ rows, we define a new series of residuals, $\tilde{u}_{t}$.

Then, new values of the dependent variable are created as $\tilde{y}_{t}=x_{t}^{\prime}\hat{b}+\tilde{u}_{t}$, where $\hat{b}$ is the LS estimate (bLS in the code), and we redo the estimation on ($\tilde{y}_{t},x_{t}$).
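
The row selection can be checked on the two-block example from the text. The sketch below fixes the starting rows at 27 and 35 instead of drawing them at random, and uses `mod1` for the wrap-around, which is an alternative to the subtraction used in the bootstrap loop. The names `starts`, `rows` and `idx` are chosen here for illustration.

```julia
# Deterministic version of the example in the text
T         = 580                       # sample length, as in the data above
BlockSize = 10
starts    = [27, 35]                  # fixed here; random in the actual bootstrap

# Each row of `rows` is one block: start, start+1, ..., start+BlockSize-1
rows = starts .+ collect(0:BlockSize-1)'      # 2 x BlockSize matrix

# Wrap row numbers beyond T back to the start of the sample
rows = mod1.(rows, T)

# Stack the blocks one after another into a single index vector
idx = vec(rows')

println(idx)
```

Rows 35 and 36 appear twice in `idx`, once in each block, which is the overlap mentioned above. Indexing the residuals with such a vector, as in `u[idx,:]`, gives the resampled series.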

In [4]:
BlockSize = 10                  #size of blocks
NSim      = 2000                 #no. of simulations
srand(123)

nBlocks = round(Int,ceil(T/BlockSize))             #number of blocks, rounded up
bBoot   = fill(NaN,(NSim,K*n))                       #vec(b), [beq1 beq2..beqn]
for i = 1:NSim                                       #loop over simulations
  local t_i, vv_i, utilde, ytilde, b_i  
  t_i        = rand(1:T,nBlocks,1)                   #nBlocks x 1, random starting row of blocks
  t_i        = t_i .+ collect(0:BlockSize-1)'        #nBlocks x BlockSize, each row is a block
  vv_i       = t_i .> T
  t_i[vv_i]  = t_i[vv_i] .- T                        #wrap around if index > T
  #println(t_i)                                      #uncomment to see which rows are picked out
  t_i        = vec(t_i')                             #column vector of the blocks
  utilde     = u[t_i,:]
  ytilde     = x*bLS + utilde[1:T,:]
  b_i,       = OlsFn(ytilde,x)                       #,skips the remaining outputs
  bBoot[i,:] = b_i
end

println("\nAverage bootstrap estimates and bootstrapped std")
printmat([Compat.mean(bBoot,dims=1)' Compat.std(bBoot,dims=1)'])           #0.7 syntax

println("\nbootstrapped std/OLS std")
printmat(Compat.std(bBoot,dims=1)'./StdbLS)                                #0.7 syntax


Average bootstrap estimates and bootstrapped std
    -3.318     2.072
    -4.164     1.391
    10.636     8.024
   -14.643    23.175
     7.371    29.076
     1.206    12.906


bootstrapped std/OLS std
     2.198
     2.387
     1.833
     1.657
     1.607
     1.602

