# Week 5: High-Dimensional Methods and Confidence Intervals

The purpose of this week's problem set is to get familiar with inference based on high-dimensional methods. Our focus is now on testing the convergence hypothesis in cross-country growth using the `growth.csv` dataset that accompanies *2025-10_Project_2.pdf*. The file combines Barro-style measures of GDP growth with a rich set of covariates drawn from geography, institutions, and demographics.

We start by loading the data and cleaning it so the subsequent high-dimensional procedures run smoothly.

In [1]:
# Load packages
import numpy as np
import numpy.linalg as la
import pandas as pd
from sklearn.linear_model import Lasso
from scipy.stats import norm
from sklearn.preprocessing import PolynomialFeatures

# Read and clean growth data
growth = pd.read_csv('growth.csv')

# Harmonise missing values that are stored as empty strings
growth = growth.replace(r'^\s*$', np.nan, regex=True)

# Drop the textual country code and coerce all remaining columns to numeric
growth = growth.drop(columns=['code'], errors='ignore')
growth = growth.apply(pd.to_numeric, errors='coerce')

# Keep observations with the key outcome and treatment, then impute remaining missings
growth = growth.dropna(subset=['gdp_growth', 'lgdp_initial'])
growth = growth.fillna(growth.mean(numeric_only=True))

print(f'The cleaned dataset has shape {growth.shape}.')

The cleaned dataset has shape (102, 84).


In [2]:
print('Column names are\n{}'.format(growth.columns.tolist()))

Column names are
['marketref', 'dem', 'demCGV', 'demBMR', 'demreg', 'lp_bl', 'ls_bl', 'lh_bl', 'tropicar', 'distcr', 'distc', 'distr', 'ginv', 'polity', 'polity2', 'landlock', 'yellow', 'oilres', 'goldm', 'iron', 'silv', 'zinc', 'mortality', 'imputedmort', 'logem4', 'excolony', 'lt100km', 'democ1', 'democ00a', 'cons00a', 'currentinst', 'imr95', 'leb95', 'malfal', 'uvdamage', 'suitavg', 'pdiv', 'pdiv_aa', 'pdivhmi', 'pdivhmi_aa', 'pd1', 'pd1000', 'pd1500', 'pd1500.1', 'pop1', 'pop1000', 'pop1500', 'legor_uk', 'legor_fr', 'temp', 'elevavg', 'elevstd', 'kgatr', 'precip', 'suitgini', 'area', 'abslat', 'cenlong', 'area_ar', 'rough', 'ln_yst', 'ln_yst_aa', 'africa', 'europe', 'asia', 'oceania', 'americas', 'pprotest', 'pcatholic', 'pmuslim', 'pother', 'population_initial', 'population_now', 'gdp_pc_initial', 'gdp_pc_now', 'capital_growth_pct_gdp_initial', 'capital_growth_pct_gdp_now', 'gdp_initial', 'gdp_now', 'investment_rate', 'gdp_growth', 'pop_growth', 'lgdp_initial', 'lpop_initial']


In [3]:
print(growth.head())  # first observations

    marketref  dem  demCGV  demBMR    demreg      lp_bl      ls_bl      lh_bl  \
4   34.144062  0.0     0.0     0.0  0.071429  72.400000  15.300000   4.000000   
7   29.444778  1.0     1.0     1.0  0.954545  29.300000  48.300000  21.500000   
8   38.210518  1.0     1.0     1.0  0.954545  57.263283  35.800000   2.600000   
10  26.193008  0.0     0.0     0.0  0.032258  11.365340   0.912563   0.141201   
11  53.843560  1.0     1.0     1.0  0.954545  65.299510  28.700000   5.200000   

    tropicar     distcr  ...    gdp_pc_now  capital_growth_pct_gdp_initial  \
4   0.027089   271.6420  ...   8692.708046                       24.440095   
7   0.381887   354.8870  ...  56307.280685                       33.022170   
8   0.000000    79.4891  ...  47008.710433                       31.038141   
10  1.000000  1009.6600  ...    202.372052                        4.530795   
11  0.000000    38.3926  ...  44361.248365                       29.656689   

    capital_growth_pct_gdp_now   gdp_initial

In [4]:
print(growth.tail())  # last observations

     marketref       dem    demCGV    demBMR    demreg      lp_bl      ls_bl  \
178  54.545450  0.333333  0.311111  0.333333  0.888889  38.099504  11.618525   
185   1.785714  0.000000  0.000000  0.000000  0.032258  26.700000  27.400002   
187  22.524733  0.000000  0.000000  0.000000  0.032258  54.600000   1.900000   
188   0.000000  0.000000  0.000000  0.000000  0.032258  51.920998   5.440012   
208  26.193008  0.333333  0.311111  0.333333  0.350646  38.099504  11.618525   

        lh_bl  tropicar      distcr  ...    gdp_pc_now  \
178  2.740676  0.586425  293.471971  ...   6654.144614   
185  3.700000  0.037754  306.891000  ...   6748.234355   
187  0.600000  1.000000  992.196000  ...   1558.323007   
188  2.309395  1.000000  489.782000  ...   1058.845827   
208  2.740676  0.586425  293.471971  ...  26822.444082   

     capital_growth_pct_gdp_initial  capital_growth_pct_gdp_now   gdp_initial  \
178                       21.854891                   22.287625  1.654725e+08   
185     

In [5]:
print(growth.dtypes)  # data types

marketref          float64
dem                float64
demCGV             float64
demBMR             float64
demreg             float64
                    ...   
investment_rate    float64
gdp_growth         float64
pop_growth         float64
lgdp_initial       float64
lpop_initial       float64
Length: 84, dtype: object


We model average annual GDP-per-capita growth (`gdp_growth`) using a linear (in the parameters) specification where the regressor of interest is the log of initial GDP per capita (`lgdp_initial`):

$$
\underbrace{g_{i}}_{=Y}= \alpha \times \underbrace{\log(\text{GDP}_{i,1970})}_{=D} + Z_i'\gamma + \varepsilon_i,\quad \mathbb{E}[\varepsilon_i\mid D_i,Z_i]=0.
$$

The vector $Z_i$ collects a set of potential controls drawn from the rich list of geography, institutions, and demographics that come with the growth dataset. The `sklearn` implementation of the Lasso handles the intercept for us, so we can work with demeaned regressors.

We also estimate an auxiliary relationship for the treatment variable,

$$
\log(\text{GDP}_{i,1970}) = Z_i'\psi + \nu_i,\quad \mathbb{E}[\nu_i\mid Z_i]=0,
$$

which is required for the post-double-selection procedure. The convergence hypothesis suggests that $\alpha$ should be negative, but we keep the discussion focused on the mechanics of high-dimensional inference.

# Exercises

## Part 1: Prepare data
Following the project brief, we treat log initial GDP as the regressor of interest and allow for a comparatively rich, but still manageable, set of controls. We will work with the following baseline variables captured in `growth.csv`: `investment_rate`, `abslat`, `temp`, `uvdamage`, `suitavg`, `kgatr`, `pdiv`, `pdiv_aa`, `pop_growth`, `lpop_initial`, `landlock`, `polity2`, `area`, `rough`, `precip`, and `dem`. Treat these as $Z_1,\dots,Z_p$ and expand them with polynomial terms (squares, cubes) and interactions to generate a high-dimensional design matrix.

Hints: Use `sklearn.preprocessing.PolynomialFeatures` for the transformation. If the optimiser struggles to converge, increase the maximum number of iterations via the `max_iter` argument in `Lasso`.

### Question 1.1
Set up the data and augment the baseline controls with quadratic, cubic, and interaction terms (up to total degree three). Do not include an explicit constant—the Lasso implementation handles it for us. How many regressors do you obtain after the expansion?

In [6]:
# Setup data
y = growth.gdp_growth
d = growth.lgdp_initial

control_candidates = [
    'investment_rate', 'abslat', 'temp', 'uvdamage', 'suitavg', 'kgatr',
    'pdiv', 'pdiv_aa', 'pop_growth', 'lpop_initial', 'landlock', 'polity2',
    'area', 'rough', 'precip', 'dem'
]
available_controls = [c for c in control_candidates if c in growth.columns]
missing_controls = sorted(set(control_candidates) - set(available_controls))
if missing_controls:
    print(f"Dropped controls not present in the data: {missing_controls}")

Z_basic = growth[available_controls]

# Add polynomial features (no constant, handled by Lasso)
Z = PolynomialFeatures(3, include_bias=False).fit_transform(Z_basic)

print(f'The number of baseline controls is {len(available_controls)}')
print(f'The number of regressors in Z is {Z.shape[1]}')

The number of baseline controls is 16
The number of regressors in Z is 968


Your numeric output will depend on the available countries and the controls you keep. Make sure the code runs without errors and inspect the printed values from the preceding cell.
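
The size of the expanded design can also be checked by hand. With $p$ baseline variables and all monomials up to total degree $d$, there are $\binom{p+d}{d}$ terms including the constant, so dropping the constant leaves $\binom{p+d}{d}-1$. The helper name `n_poly_terms` below is purely illustrative and not part of the problem set.

```python
from math import comb

def n_poly_terms(p, degree):
    """Number of monomials of total degree 1..degree in p variables (no constant)."""
    return comb(p + degree, degree) - 1

# 16 baseline controls expanded up to total degree three
print(n_poly_terms(16, 3))  # 968
```

This agrees with the 968 columns reported for `Z` when all 16 baseline controls are present.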

In [7]:
# Construct X 
X = np.column_stack((d,Z))

# Find N
N = X.shape[0]

### Question 1.2
Standardize variables before running the Lasso.

*Note:* Make sure to apply a degrees-of-freedom correction when computing the standard deviations. Pandas does this by default, but if you use NumPy, you should set the argument `ddof=1` in `np.std()`.

In [8]:
# Create a function for standardizing
def standardize(X):

    X_stan = (X - np.mean(X, axis=0))/np.std(X, axis=0, ddof=1)
    return X_stan

# Standardize data
X_stan = standardize(X)
Z_stan = standardize(Z)
d_stan = standardize(d)
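
As a small illustration of the degrees-of-freedom note, the sketch below compares NumPy's default (`ddof=0`) with the corrected version on a toy vector. The vector and variable names are made up for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

sd_population = np.std(x)        # NumPy default, ddof=0: divides by N
sd_sample = np.std(x, ddof=1)    # divides by N - 1, which is the pandas default

# Standardize with the corrected standard deviation
x_stan = (x - x.mean()) / sd_sample

print(sd_population, sd_sample)
print(np.std(x_stan, ddof=1))    # sample standard deviation of the standardized data
```

After standardizing with `ddof=1`, the sample standard deviation of `x_stan` is one and its mean is zero (up to floating-point error).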

## Part 2: OLS

Answers may differ slightly across Python versions. Your results for Exercise 3 should be correct to 3 significant figures.

### Question 2.1
Estimate $\alpha$ using Ordinary Least Squares (OLS). Remember to add a constant to the regressors for this part.

In [9]:
# Add a constant to X
xx = np.column_stack((np.ones(N),X))

# Reshape y
yy = np.array(y).reshape(-1,1)

# Calculate OLS estimate
coefs_OLS = la.inv(xx.T@xx)@xx.T@yy
alpha_OLS = coefs_OLS[1][0]

# Calculate residuals
res_OLS = yy - xx@coefs_OLS

# Display alpha
print("alpha_OLS = ",alpha_OLS.round(2))

alpha_OLS =  10.41


#### Hint: We are doing OLS not Lasso
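
A word of caution on reading this estimate: the design has far more columns than observations (102 rows against a constant, `d`, and 968 controls), so $X'X$ cannot have full rank and OLS has no unique solution. `la.inv` may still return numbers, but they are not meaningful, and the homoscedastic variance formula divides by $N-K<0$. The sketch below reproduces the problem on synthetic data; the sizes `N = 10` and `K = 15` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 15                        # more regressors than observations
X = rng.normal(size=(N, K))

gram = X.T @ X                       # K x K matrix, but its rank is at most N
rank = np.linalg.matrix_rank(gram)
dof = N - K                          # denominator used in SSR / (N - K)

print(f"rank of X'X: {rank} (a unique OLS solution needs {K})")
print(f"degrees of freedom N - K: {dof}")
```

A negative degrees-of-freedom term makes the estimated error variance negative, so taking its square root yields `nan`.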

### Question 2.2

Estimate the variance of the OLS estimator and calculate the standard error of $\hat{\alpha}$. For this exercise we will assume homoscedasticity.

In [10]:
# Estimate variance
SSR = res_OLS.T@res_OLS
sigma2_OLS = SSR/(N-xx.shape[1])
var = sigma2_OLS*la.inv(xx.T@xx)

# Calculate standard errors
se = np.sqrt(np.diagonal(var)).reshape(-1,1)

# Get standard error of alpha
se_OLS = se[1][0]

# Display standard error
print("se_OLS = ",se_OLS.round(2))


se_OLS =  nan


### Question 2.3 

Calculate the 95% confidence interval for $\hat{\alpha}$.

*Hint:* Use scipy.stats.norm.ppf to find quantiles of the normal distribution.
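
For reference, the same quantile is available in the standard library through `statistics.NormalDist`. The helper `normal_ci` below is a hypothetical name used only for this sketch, and the inputs are made-up numbers.

```python
from statistics import NormalDist

def normal_ci(estimate, se, level=0.95):
    """Two-sided confidence interval based on the normal approximation."""
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for level=0.95
    return estimate - q * se, estimate + q * se

low, high = normal_ci(estimate=1.0, se=0.5)
print(round(low, 2), round(high, 2))  # 0.02 1.98
```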

In [11]:
# Calculate the quantile of the standard normal distribution that corresponds to the 95% confidence interval of a two-sided test
q = norm.ppf(1-0.025)

# Calculate confidence interval
CI_low_OLS  = alpha_OLS-q*se_OLS
CI_high_OLS = alpha_OLS+q*se_OLS

# Display confidence interval
CI_OLS =  (((alpha_OLS-q*se_OLS).round(2),(alpha_OLS+q*se_OLS).round(2)))
print("CI_OLS = ",(CI_low_OLS.round(2),CI_high_OLS.round(2)))

CI_OLS =  (nan, nan)


## Part 3: Post-Single Lasso

### Question 3.1
Estimate $\alpha$ using Post-Single Lasso (PSL).

Step 0: Calculate BRT

In [12]:
# Make a function that calculates BRT. Hint: You implemented a version of this last week
def BRT(X_tilde,y):
    (N,p) = X_tilde.shape
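    sigma = np.std(y, ddof=1)  # sample standard deviation of the outcome
    c=1.1                      # constant slightly above one
    alpha=0.05                 # significance level (not the coefficient on D)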

    penalty_BRT= (sigma*c)/np.sqrt(N)*norm.ppf(1-alpha/(2*p))

    return penalty_BRT
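
In formula form, the function above computes

$$
\lambda^{BRT} = \frac{c\,\hat{\sigma}}{\sqrt{N}}\,\Phi^{-1}\!\left(1-\frac{a}{2p}\right),
$$

where $\hat{\sigma}$ is the sample standard deviation of the outcome passed as `y`, $p$ is the number of columns of `X_tilde`, $c = 1.1$, and $a = 0.05$ is the significance level (called `alpha` in the code, and written $a$ here to avoid confusion with the coefficient $\alpha$ on $D$). The resulting value is passed directly as the penalty argument of `Lasso` in the cells that follow.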

In [13]:
# Calculate BRT
penalty_BRTyx = BRT(X_stan, y)
print("lambda_BRT =",penalty_BRTyx.round(2))

lambda_BRT = 0.01


Step 1: Lasso Y using D and Z. Collect variables in Z with non-zero coefficients in a set called Z_J.

*Hint:* Set max_iter=10_000 to make the Lasso converge.

In [14]:
# Run Lasso 
fit_BRTyx = Lasso(penalty_BRTyx, max_iter=10000).fit(X_stan,y)
coefs=fit_BRTyx.coef_

# Save variables where coefficients are not zero
Z_J = Z[:,coefs[1:]!=0] # Note: We use Z and not Z_stan

# Display number of variables in Z_J
print("The number of variables in Z_J is {}".format(Z_J.shape[1]))

The number of variables in Z_J is 0
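
The selection step relies on a boolean mask over the Lasso coefficients, skipping the first entry because it belongs to `d`. A minimal sketch with made-up numbers (the `_demo` names are hypothetical):

```python
import numpy as np

Z_demo = np.arange(12.0).reshape(3, 4)              # 3 observations, 4 controls
coefs_demo = np.array([0.7, 0.0, -0.2, 0.0, 0.0])   # first entry belongs to D

selected = coefs_demo[1:] != 0       # mask over the columns of Z only
Z_J_demo = Z_demo[:, selected]       # keep the controls with non-zero coefficients

print(selected)
print(Z_J_demo.shape)
```

If the mask is all `False`, the selected matrix has zero columns, and the second step reduces to a regression of `y` on a constant and `d`.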


Step 2: Regress Y on D and Z_J

In [15]:
# Add a constant to X
xx = np.column_stack((np.ones(N),d,Z_J))
yy = np.array(y).reshape(-1,1)

# Calculate OLS estimate
coefs_PSL = la.inv(xx.T@xx)@xx.T@yy
alpha_PSL = coefs_PSL[1][0]

# Calculate residuals
res_PSL = yy - xx@coefs_PSL

# Display alpha
print("alpha_PSL = ",alpha_PSL.round(2))

alpha_PSL =  -0.0


### Question 3.2

Estimate the variance of the second-step OLS estimator and calculate the standard error of $\tilde{\alpha}$.

In [16]:
# Estimate variance
SSR = res_PSL.T@res_PSL
sigma2_PSL = SSR/(N-xx.shape[1])
var = sigma2_PSL*la.inv(xx.T@xx)

# Calculate standard errors
se = np.sqrt(np.diagonal(var)).reshape(-1, 1)
se_PSL=se[1][0]

# Display standard error
print("se_PSL = ",se_PSL.round(2))


se_PSL =  0.0


### Question 3.3 

Calculate the 95% confidence interval for $\tilde{\alpha}$.

In [17]:
# Calculate the z statistic that corresponds to the 95% confidence interval of a two-sided test
q = norm.ppf(1-0.025)

# Calculate confidence interval
CI_low_PSL  = alpha_PSL-q*se_PSL
CI_high_PSL = alpha_PSL+q*se_PSL

# Display confidence interval
CI_PSL =  (((alpha_PSL-q*se_PSL).round(2),(alpha_PSL+q*se_PSL).round(2)))
print("CI_PSL = ",(CI_low_PSL.round(2),CI_high_PSL.round(2)))

CI_PSL =  (-0.0, 0.0)


## Part 4: Post Double Lasso

### Question 4.1
Estimate $\alpha$ using Post Double Lasso (PDL).

Step 0: Calculate BRT

*Note:* In this exercise we will use the penalty suggested by BRT. BRT relies on homoscedasticity which is a strong assumption.

In [18]:
# Calculate BRT
penalty_BRTyx = BRT(X_stan,y)
print("lambda_BRT =",penalty_BRTyx.round(2))

lambda_BRT = 0.01


Step 1: Lasso Y using D and Z

*Hint:* To calculate the residuals from the LASSO-regression you can use the predict method from the Lasso object. The predict method returns the predicted values from the LASSO regression. You can then calculate the residuals by subtracting the predicted values from the actual values. 

In [19]:
# Run Lasso 
fit_BRTyx = Lasso(penalty_BRTyx, max_iter=10000).fit(X_stan, y)
coefs=fit_BRTyx.coef_

# Calculate residuals
resyx = y-fit_BRTyx.predict(X_stan)

# Calculate Y - Z@gamma (epsilon + alpha*d)
# Hint: You only need the variables given to you in this cell, in addition
# to a standardized data set you made previously.
resyxz = resyx + d_stan*coefs[0]

# Display first coefficient
print("First coefficient =",coefs[0].round(2))

First coefficient = -0.0
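
The construction of `resyxz` rests on a simple identity: for any linear fit, adding the fitted contribution of the treatment back to the residuals gives the outcome net of the intercept and the fitted controls. The sketch below checks this on synthetic data, using OLS as a stand-in for the Lasso; all names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
d = rng.normal(size=N)
Z = rng.normal(size=(N, 3))
y = 0.5 * d + Z @ np.array([1.0, -1.0, 0.0]) + rng.normal(size=N)

# Any linear fit y ~ b0 + a*d + Z@g satisfies the identity; OLS stands in for the Lasso
X = np.column_stack((np.ones(N), d, Z))
b = np.linalg.lstsq(X, y, rcond=None)[0]
b0, a, g = b[0], b[1], b[2:]

res = y - X @ b            # residuals from the full fit
res_plus = res + a * d     # add back the fitted contribution of d
direct = y - b0 - Z @ g    # outcome minus intercept and fitted controls

print(np.allclose(res_plus, direct))  # True
```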


Step 2: Lasso D using Z

In [20]:
# Calculate BRT
penalty_BRTdz = BRT(Z_stan, d)

In [21]:
# Run Lasso
fit_BRTdz = Lasso(penalty_BRTdz, max_iter=10000).fit(Z_stan, d)
coefs=fit_BRTdz.coef_

# Calculate residuals
resdz=d-fit_BRTdz.predict(Z_stan)

# Display first coefficient
print("First coefficient =",coefs[0].round(2))

First coefficient = 0.0


Step 3: Estimate alpha
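
Written out, the cell below computes

$$
\check{\alpha} = \frac{\sum_i \hat{\nu}_i\,(Y_i - Z_i'\hat{\gamma})}{\sum_i \hat{\nu}_i D_i},
$$

where $\hat{\nu}_i$ are the residuals from the Lasso of $D$ on $Z$ (`resdz`) and $Y_i - Z_i'\hat{\gamma}$ is the outcome net of the fitted controls from Step 1 (`resyxz`, which also nets out the Lasso intercept).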

In [22]:
# Calculate alpha
num = resdz@resyxz
denom = resdz@d
alpha_PDL = num/denom

# Display alpha
print("alpha_PDL = ",alpha_PDL.round(2))

alpha_PDL =  -0.0


### Question 4.2
Calculate the implied variance estimate, $\check{\sigma}^2$, and the standard error of $\check{\alpha}$.

In [23]:
print(resdz)
print(resyxz)

4      0.632770
7      2.336072
8      1.404136
10    -1.696054
11     1.303333
         ...   
178   -0.432426
185    1.132530
187    0.081042
188   -0.239723
208    1.468071
Name: lgdp_initial, Length: 102, dtype: float64
4     -0.011579
7     -0.000199
8      0.001998
10    -0.020955
11     0.000577
         ...   
178    0.010493
185   -0.014125
187   -0.015966
188   -0.019853
208    0.003040
Length: 102, dtype: float64


In [24]:
# Calculate variance    
num = resdz**2@resyx**2/N
denom = (resdz.T@resdz/N)**2
sigma2_PDL = num/denom

# Display variance
print("sigma2_PDL = ",sigma2_PDL.round(2))

sigma2_PDL =  0.0


In [25]:
# Calculate standard error
se_PDL = np.sqrt(sigma2_PDL/N)

# Display standard error
print("se_PDL = ",se_PDL.round(2))

se_PDL =  0.0
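
The two cells above can be wrapped into one small function. This is a sketch of the same calculation, $\check{\sigma}^2 = N^{-1}\sum_i \hat{\varepsilon}_i^2\hat{\nu}_i^2 \big/ \big(N^{-1}\sum_i \hat{\nu}_i^2\big)^2$ followed by $\sqrt{\check{\sigma}^2/N}$; the name `sandwich_se` and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def sandwich_se(res_y, res_d):
    """Standard error from sigma^2 = mean(res_y^2 * res_d^2) / mean(res_d^2)^2."""
    res_y = np.asarray(res_y, dtype=float)
    res_d = np.asarray(res_d, dtype=float)
    n = res_y.shape[0]
    sigma2 = np.mean(res_y**2 * res_d**2) / np.mean(res_d**2) ** 2
    return np.sqrt(sigma2 / n)

# Toy check: mean(4 * 1) / mean(1)^2 = 4, so the standard error is sqrt(4 / 4) = 1
print(sandwich_se([2.0, 2.0, 2.0, 2.0], [1.0, -1.0, 1.0, -1.0]))  # 1.0
```

Note that rounding to two decimals can make a small but non-zero standard error print as `0.0`; printing more digits avoids that.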


### Question 4.3
Calculate the 95% confidence interval for $\check{\alpha}$.

In [26]:
# Calculate the quantile of the standard normal distribution that corresponds to the 95% confidence interval of a two-sided test
q = norm.ppf(1-0.025)

# Calculate confidence interval
CI_low_PDL  = alpha_PDL - q * se_PDL
CI_high_PDL = alpha_PDL + q * se_PDL

# Display confidence interval
print("CI_PDL = ",(CI_low_PDL.round(2),CI_high_PDL.round(2)))

CI_PDL =  (-0.01, 0.0)


### Question 4.4
Compare OLS, PSL and PDL. 
- Which estimator do you believe the most? 
- Does the dimensionality of the problem affect your answer?

In [27]:
# Create a dictionary with the results
results = {'OLS': [alpha_OLS, se_OLS, CI_low_OLS, CI_high_OLS], 
           'PSL': [alpha_PSL, se_PSL, CI_low_PSL, CI_high_PSL],
           'PDL': [alpha_PDL, se_PDL, CI_low_PDL, CI_high_PDL]}

# Create a dataframe from the dictionary
df_results = pd.DataFrame.from_dict(results, orient='index', columns=['Estimate of alpha', 'Standard error', 'Low bound of CI', 'High bound of CI'])

# Round the dataframe to two decimal places
df_results = df_results.round(2)

# Display the dataframe
df_results


| | Estimate of alpha | Standard error | Low bound of CI | High bound of CI |
|:--|--:|--:|--:|--:|
| OLS | 10.41 | NaN | NaN | NaN |
| PSL | -0.0 | 0.0 | -0.0 | 0.0 |
| PDL | -0.0 | 0.0 | -0.01 | 0.0 |


## Part 5: Post Partialling Out Lasso

An alternative to Post Double Lasso is Post Partialling Out Lasso (PPOL). PPOL is based on another orthogonalized moment condition, which is asymptotically first order equivalent to the one used in Post Double Lasso,

$$
\mathbb{E}\big[(D - Z'\psi_0)\,([Y - Z'\delta_0] - \alpha_0[D - Z'\psi_0])\big] = 0.
$$

The PPOL estimator of $\alpha_0$ can be found by applying the following 3 steps:
1. Lasso Y using Z to get residuals $\hat{\zeta} = Y - Z' \hat{\delta}$
2. Lasso D using Z to get residuals $\hat{\nu} = D - Z' \hat{\psi}$
3. OLS of $\hat{\zeta}$ on $\hat{\nu}$ to get $\breve{\alpha} = \frac{\sum_i \hat{\nu}_i \hat{\zeta}_i}{\sum_i \hat{\nu}_i^2}$
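
To build intuition for the three steps, it helps to look at the low-dimensional analogue in which OLS replaces the Lasso in steps 1 and 2. In that case the residual-on-residual regression reproduces the coefficient on $D$ from the full regression exactly (the Frisch-Waugh-Lovell theorem). The sketch below verifies this on synthetic data; the data-generating process and the helper `residualize` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200
Z = np.column_stack((np.ones(N), rng.normal(size=(N, 3))))  # constant + 3 controls
d = Z @ np.array([0.0, 1.0, 0.5, 0.0]) + rng.normal(size=N)
y = -0.3 * d + Z @ np.array([1.0, 0.0, 2.0, -1.0]) + rng.normal(size=N)

def residualize(v, Z):
    """Residuals from an OLS regression of v on Z."""
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

zeta = residualize(y, Z)             # step 1: partial Z out of Y
nu = residualize(d, Z)               # step 2: partial Z out of D
alpha_po = (nu @ zeta) / (nu @ nu)   # step 3: regress residuals on residuals

# Coefficient on d from the full regression of y on (d, Z)
alpha_full = np.linalg.lstsq(np.column_stack((d, Z)), y, rcond=None)[0][0]

print(np.isclose(alpha_po, alpha_full))  # True
```

With the Lasso in place of OLS the equality no longer holds exactly, which is why the exercise treats PPOL as a separate estimator.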


### Question 5.1
Estimate $\alpha$ using Post Partialling Out Lasso (PPOL).

Step 1: Lasso Y using Z

In [28]:
penalty_BRTyz = BRT(Z_stan, y)
print("lambda_BRT =",penalty_BRTyz.round(2))

lambda_BRT = 0.01


In [29]:
# Run Lasso
fit_BRTyz = Lasso(penalty_BRTyz, max_iter=10000).fit(Z_stan,y)
coefs=fit_BRTyz.coef_

# Calculate residuals
resyz = y-fit_BRTyz.predict(Z_stan)

# Display first coefficient
print("First coefficient =",coefs[0].round(2))

First coefficient = 0.0


Step 2: Lasso D using Z

In [30]:
penalty_BRTdz = BRT(Z_stan, d)
print("lambda_BRT =",penalty_BRTdz.round(2))

lambda_BRT = 0.6


In [31]:
# Run Lasso
fit_BRTdz = Lasso(penalty_BRTdz, max_iter=10000).fit(Z_stan,d)
coefs=fit_BRTdz.coef_

# Calculate residuals
resdz = d-fit_BRTdz.predict(Z_stan)

# Display first coefficient
print("First coefficient =",coefs[0].round(2))

First coefficient = 0.0


Step 3: Estimate alpha

In [32]:
# Calculate alpha
num = resdz.T@resyz
denom = resdz.T@resdz
alpha_PPOL = num/denom

# Display alpha
print("alpha_PPOL = ",alpha_PPOL.round(2))

alpha_PPOL =  -0.0


### Question 5.2

The variance of the PPOL estimator is given by

$$
\breve{\sigma}^2 = \frac{N^{-1}\sum_i \hat{\zeta}_i^2 \hat{\nu}_i^2}{(N^{-1}\sum_i \hat{\nu}_i^2)^2}
$$

where it can be shown that 
$$
\sqrt{N} (\breve{\alpha} - \alpha_0)/\breve{\sigma} \xrightarrow{d} N(0,1)
$$

Calculate the implied variance estimate, $\breve{\sigma}^2$, and the standard error of $\breve{\alpha}$.

In [33]:
# Calculate variance    
num = resyz**2 @ resdz**2 / N
denom = (resdz.T@resdz / N)**2
sigma2_PPOL = num/denom

# Display variance
print("sigma2_PPOL = ",sigma2_PPOL.round(2))

sigma2_PPOL =  0.0


In [34]:
# Calculate standard error
se_PPOL = np.sqrt(sigma2_PPOL/N)

# Display standard error
print("se_PPOL = ",se_PPOL.round(2))

se_PPOL =  0.0


### Question 5.3
Calculate the confidence interval for $\breve{\alpha}$.

In [35]:
# Calculate the quantile of the standard normal distribution that corresponds to the 95% confidence interval of a two-sided test
q = norm.ppf(1-0.025)

# Calculate confidence interval
CI_low_PPOL  = alpha_PPOL - q * se_PPOL
CI_high_PPOL = alpha_PPOL + q * se_PPOL

# Display confidence interval
print("CI_PPOL = ",(CI_low_PPOL.round(2),CI_high_PPOL.round(2)))

CI_PPOL =  (-0.01, 0.0)


### Question 5.4
Compare OLS, PSL, PDL, and PPOL.

In [36]:
# Create a dictionary with the results
results = {'OLS'   : [alpha_OLS,    se_OLS,    CI_low_OLS,    CI_high_OLS], 
           'PSL'   : [alpha_PSL,    se_PSL,    CI_low_PSL,    CI_high_PSL],
           'PDL'   : [alpha_PDL,    se_PDL,    CI_low_PDL,    CI_high_PDL],
           'PPOL'  : [alpha_PPOL,   se_PPOL,   CI_low_PPOL,   CI_high_PPOL]}

# Create a dataframe from the dictionary
df_results = pd.DataFrame.from_dict(results, orient='index', columns=['Estimate of alpha', 'Standard error', 'Low bound of CI', 'High bound of CI'])

# Round the dataframe to two decimal places
df_results = df_results.round(2)

# Display the dataframe
df_results


| | Estimate of alpha | Standard error | Low bound of CI | High bound of CI |
|:--|--:|--:|--:|--:|
| OLS | 10.41 | NaN | NaN | NaN |
| PSL | -0.0 | 0.0 | -0.0 | 0.0 |
| PDL | -0.0 | 0.0 | -0.01 | 0.0 |
| PPOL | -0.0 | 0.0 | -0.01 | 0.0 |


## (Optional) Part 6: Repeat with BCCH and CV

* Repeat the exercises above using the Belloni-Chen-Chernozhukov-Hansen (BCCH) penalty level for each Lasso (which may be justified without any independence/homoscedasticity assumptions).
* Repeat the exercises above using cross-validation (CV) to choose the penalty for each Lasso.
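
As a possible starting point for the BCCH variant, the sketch below computes a pilot penalty in which the homoscedastic scale $\hat{\sigma}$ from BRT is replaced by $\max_j \sqrt{N^{-1}\sum_i (y_i-\bar{y})^2 x_{ij}^2}$. This is one common way of writing the pilot step and is an assumption here, not something fixed by this problem set; the full BCCH rule then refits the penalty using residuals from a pilot Lasso, which is not shown. The function name `bcch_pilot_penalty` and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def bcch_pilot_penalty(X_tilde, y, c=1.1, alpha=0.05):
    """Pilot penalty: BRT-style rule with a heteroscedasticity-robust scale (assumed form)."""
    X_tilde = np.asarray(X_tilde, dtype=float)
    y = np.asarray(y, dtype=float)
    N, p = X_tilde.shape
    y_dm = y - y.mean()
    # max over columns j of sqrt( mean_i( (y_i - ybar)^2 * x_ij^2 ) )
    scale = np.sqrt(np.max(np.mean((y_dm[:, None] ** 2) * X_tilde**2, axis=0)))
    return c / np.sqrt(N) * NormalDist().inv_cdf(1 - alpha / (2 * p)) * scale

# Synthetic example with standardized regressors
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(100, 20))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=1)
y_demo = rng.normal(size=100)

lam_demo = bcch_pilot_penalty(X_demo, y_demo)
print(lam_demo > 0)  # True
```

For the cross-validation variant, `sklearn.linear_model.LassoCV` selects the penalty by K-fold cross-validation and can replace the manual penalty calculation.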