In this project you are asked to estimate the announcement effect of the Volcker Rule (a new banking regulation in the US) on US banks. More specifically, analyze at least the following:

- Did the banks decrease their trading assets after the announcement of the new regulation?
- If they responded to the regulation, which banks responded most and which banks least? Why?
- Are the results robust, and how should banks or regulators use them?


## Variable Description

- <strong>rssd9001</strong>: *Bank holding company (BHC) code*
- <strong>rssd9999</strong>: *Time*, quarter-end reporting date in YYYYMMDD format
- <strong>bhc_avgtradingratio</strong>: *Trading asset ratio*, ratio of trading assets to total assets
- <strong>treat_3_b_avg</strong>: *Affected BHC*, dummy variable that takes a value of one if the average trading asset ratio during the pre-DFA (Dodd-Frank Act) period (Q3 2004 - Q2 2009) was equal to or larger than 3%, and zero otherwise.
- <strong>after_DFA_1</strong>: *After DFA dummy*, dummy variable that equals one for all quarters between the third quarter of 2010 and the second quarter of 2015, and zero for all quarters from the third quarter of 2004 to the second quarter of 2009
- <strong>dep_roa1</strong>: *Return on assets*, net operating income divided by average total assets
- <strong>dep_leverage</strong>: *Leverage ratio*, average equity divided by average total assets
- <strong>dep_lnassets</strong>: *Total assets*, natural logarithm of total assets 
- <strong>dep_creditrisk_total3</strong>: *Non-performing loan ratio*, past due and non-accrual loans divided by total loans
- <strong>dep_cir</strong>: *Cost-income ratio*, operating expenses divided by total income
- <strong>dep_depositratio</strong>: *Deposit ratio*, average deposits divided by average total assets
- <strong>dep_loans_REratio</strong>: *Real estate loan ratio*, loans secured by real estate divided by total loans
- <strong>dep_liquidity</strong>: *Liquidity ratio*, cash and balances at other depository institutions divided by total assets
- <strong>dep_cpp_bankquarter</strong>: *CPP recipient indicator*, Capital Purchase Program indicator variable that takes a value of one if the bank is a current recipient of CPP funds in a given quarter, and zero otherwise.
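
The two dummies `treat_3_b_avg` and `after_DFA_1` are the building blocks of a difference-in-differences (DiD) comparison. As a minimal sketch on made-up numbers (not the project data), the snippet below computes the DiD estimate from the four group-by-period means and checks that it matches the interaction coefficient of a plain OLS regression. The `toy` DataFrame and its values are illustrative assumptions; only the column names come from the variable list.

```python
import numpy as np
import pandas as pd

# Synthetic observations (assumption, NOT the project data).
toy = pd.DataFrame({
    "treat_3_b_avg":       [1, 1, 1, 1, 0, 0, 0, 0],
    "after_DFA_1":         [0, 0, 1, 1, 0, 0, 1, 1],
    "bhc_avgtradingratio": [0.10, 0.12, 0.07, 0.09, 0.01, 0.01, 0.01, 0.01],
})

# Mean outcome in each of the four (group, period) cells.
cell_means = toy.groupby(["treat_3_b_avg", "after_DFA_1"])["bhc_avgtradingratio"].mean()

# DiD = (treated after - treated before) - (control after - control before).
did_means = (
    (cell_means.loc[(1, 1)] - cell_means.loc[(1, 0)])
    - (cell_means.loc[(0, 1)] - cell_means.loc[(0, 0)])
)

# Equivalent regression: outcome ~ const + treat + after + treat * after.
# The coefficient on the interaction term is the DiD estimate.
X = np.column_stack([
    np.ones(len(toy)),
    toy["treat_3_b_avg"],
    toy["after_DFA_1"],
    toy["treat_3_b_avg"] * toy["after_DFA_1"],
])
beta, *_ = np.linalg.lstsq(X, toy["bhc_avgtradingratio"].to_numpy(), rcond=None)
did_ols = beta[3]

print(did_means, did_ols)
```

In this toy panel the treated banks' mean ratio falls by 0.03 while the control banks' mean is unchanged, so both calculations give a DiD of about -0.03. On real data the regression form is usually preferred because it extends to control variables and clustered standard errors.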


In [3]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Define a function to select the closest controls for each treated unit
def select_closest_controls(treated_group, control_group, num_controls):
    closest_controls = []
    for _, treated_row in treated_group.iterrows():
        control_candidates = control_group.copy()
        control_candidates['propensity_scores_diff'] = np.abs(control_candidates['propensity_scores'] - treated_row['propensity_scores'])
        closest_control_indices = control_candidates.sort_values(by=['propensity_scores_diff']).index[:num_controls]
        closest_controls.extend(closest_control_indices)
    return closest_controls

# Compute per-variable means for the control and treatment groups, their difference
# (treatment minus control), and the average of those differences across variables
def control_treat_diff(df):
    # Calculate the group means
    group_mean = df.groupby("treatment").mean(numeric_only=True)

    # Extract Control result
    control = group_mean[group_mean.index == 0]
    control = control.reset_index().drop(columns="treatment")

    # Extract Treatment result
    treatment = group_mean[group_mean.index == 1]
    treatment = treatment.reset_index().drop(columns="treatment")

    # Extract Difference result
    diff = group_mean.diff()
    diff = diff[diff.index == 1]
    diff = diff.reset_index().drop(columns="treatment")
    # Average the differences, skipping the two identifier columns (rssd9001, rssd9999)
    diff_avg = diff.iloc[0, 2:].mean()

    # Create a new DataFrame by concatenating control, treatment, and diff
    mean_df = pd.concat([control, treatment, diff], axis=0).reset_index().drop(columns="index").T
    mean_df.columns = ['Control', 'Treatment', 'Difference']
    
    return diff_avg, mean_df
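
To make the behaviour of `select_closest_controls` concrete, here is a small sketch on invented propensity scores. The function is restated in condensed form so the example runs on its own; the `treated` and `controls` frames and their scores are hypothetical. One point worth noticing is that the matching is done with replacement: the same control can be returned for more than one treated unit.

```python
import pandas as pd

def select_closest_controls(treated_group, control_group, num_controls):
    # Condensed restatement of the nearest-neighbour logic, for illustration only.
    closest_controls = []
    for _, treated_row in treated_group.iterrows():
        score_gap = (control_group["propensity_scores"]
                     - treated_row["propensity_scores"]).abs()
        closest_controls.extend(score_gap.sort_values().index[:num_controls])
    return closest_controls

# Hypothetical propensity scores (assumption, not estimated from the data).
treated = pd.DataFrame({"propensity_scores": [0.80, 0.30, 0.78]},
                       index=[100, 101, 102])
controls = pd.DataFrame({"propensity_scores": [0.10, 0.35, 0.75, 0.90]},
                        index=[0, 1, 2, 3])

matched = select_closest_controls(treated, controls, num_controls=1)

# Control 2 (score 0.75) is the nearest neighbour of both 0.80 and 0.78,
# so it appears twice; de-duplicate before using the indices to subset rows.
unique_controls = sorted(set(matched))
print(matched, unique_controls)
```

If each control should be used at most once, the candidate pool would need to shrink after every match, which this function does not do.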


In [4]:
df = pd.read_csv("DiD_data.csv")

In [5]:
# process time
df['time'] = pd.to_datetime(df['rssd9999'], format='%Y%m%d')

# assign treatment group: banks flagged by treat_3_b_avg (pre-DFA average trading ratio >= 3%) in the first quarter of the sample
df['treatment'] = 0 
mask = (df['treat_3_b_avg'] == 1) & (df['time'] == '2004-09-30')
banks_with_treatment = df.loc[mask, 'rssd9001'].unique()
df.loc[df['rssd9001'].isin(banks_with_treatment), 'treatment'] = 1

# drop NA values
initial_row_count = df.shape[0]

df.dropna(inplace=True)

final_row_count = df.shape[0]

print(f'Initial No. of rows: {initial_row_count}')
print(f'Final No. of rows: {final_row_count}')
print(f'No. of rows dropped: {initial_row_count - final_row_count}')

df

Initial No. of rows: 81560
Final No. of rows: 40026
No. of rows dropped: 41534


Unnamed: 0,rssd9001,rssd9999,bhc_avgtradingratio,treat_3_b_avg,after_DFA_1,dep_roa1,dep_leverage,dep_lnassets,dep_creditrisk_total3,dep_cir,dep_depositratio,dep_loans_REratio,dep_liquidity,dep_cpp_bankquarter,time,treatment
0,1020180,20040930,0.0,0,0,0.002772,0.081957,15.601202,0.013304,0.463811,0.561805,0.593738,0.024337,0,2004-09-30,0
1,1020180,20041231,0.0,0,0,0.003045,0.082480,15.630583,0.009732,0.456392,0.557617,0.601763,0.025446,0,2004-12-31,0
2,1020180,20050331,0.0,0,0,0.002616,0.082074,15.644925,0.011830,0.444011,0.556980,0.600700,0.025153,0,2005-03-31,0
3,1020180,20050630,0.0,0,0,0.002647,0.081712,15.679702,0.013654,0.433771,0.571642,0.601042,0.023670,0,2005-06-30,0
4,1020180,20050930,0.0,0,0,0.002867,0.082944,15.661868,0.012456,0.400985,0.577408,0.581438,0.029793,0,2005-09-30,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40021,3832583,20140331,0.0,0,1,0.006362,0.225532,13.525286,0.008318,0.528542,0.720636,0.001986,0.081983,0,2014-03-31,0
40022,3832583,20140630,0.0,0,1,0.006616,0.224154,13.519756,0.007821,0.490698,0.724049,0.002048,0.084540,0,2014-06-30,0
40023,3832583,20140930,0.0,0,1,0.006579,0.226952,13.523643,0.008012,0.469434,0.716915,0.001870,0.075759,0,2014-09-30,0
40024,3832583,20141231,0.0,0,1,0.006423,0.227009,13.552240,0.008406,0.473358,0.709474,0.001760,0.074139,0,2014-12-31,0


---

## Propensity Score Matching

We start by comparing the variable means of the treatment and control groups in the original (unmatched) dataset.

In [6]:
og_diff_avg, og_mean_df = control_treat_diff(df)

og_mean_df

Unnamed: 0,Control,Treatment,Difference
rssd9001,1760253.0,1620711.0,-139541.968232
rssd9999,20086390.0,20085180.0,-1212.022394
bhc_avgtradingratio,0.001244335,0.09672398,0.09548
treat_3_b_avg,0.002506456,1.0,0.997494
after_DFA_1,0.3899944,0.3674242,-0.02257
dep_roa1,0.001812045,0.002007881,0.000196
dep_leverage,0.09192024,0.09111156,-0.000809
dep_lnassets,13.83313,18.1265,4.293375
dep_creditrisk_total3,0.02833271,0.02803137,-0.000301
dep_cir,0.528432,0.5787095,0.050277


In [7]:
# t-tests to assess pre-DFA differences between the treated and control groups
import scipy.stats as stats

# Restrict to the pre-DFA period, then split into treated ('treat') and control ('test') groups
df_pre = df[df['after_DFA_1'] == 0]
treat_group = df_pre[df_pre['treat_3_b_avg'] == 1]
test_group = df_pre[df_pre['treat_3_b_avg'] == 0]

In [8]:
# Perform t-tests for each covariate
covariates = ['bhc_avgtradingratio', 'dep_roa1', 'dep_leverage','dep_lnassets','dep_creditrisk_total3','dep_cir','dep_depositratio','dep_loans_REratio','dep_liquidity','dep_cpp_bankquarter']
results = {}

In [9]:
for covariate in covariates:
    t_stat, p_value = stats.ttest_ind(treat_group[covariate], test_group[covariate])
    results[covariate] = {'t-statistic': t_stat, 'p-value': p_value}

# Display the t-test results
results_df_pre = pd.DataFrame(results)
print(results_df_pre)

             bhc_avgtradingratio  dep_roa1  dep_leverage  dep_lnassets  \
t-statistic           178.923967  0.395793     -0.858699     64.439366   
p-value                 0.000000  0.692261      0.390515      0.000000   

             dep_creditrisk_total3   dep_cir  dep_depositratio  \
t-statistic              -0.201895  0.753519        -58.839206   
p-value                   0.840001  0.451145          0.000000   

             dep_loans_REratio  dep_liquidity  dep_cpp_bankquarter  
t-statistic      -3.425047e+01   1.309243e+01             3.319195  
p-value          3.721842e-251   4.924378e-39             0.000904  


In [10]:
# Pre-DFA, bhc_avgtradingratio (by construction), dep_lnassets, dep_depositratio, dep_loans_REratio, dep_liquidity and dep_cpp_bankquarter are statistically different between the two groups (p < 0.01)
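
With tens of thousands of bank-quarters, a t-test can flag even economically small gaps as significant. A complementary balance check that does not depend on sample size is the standardized mean difference (SMD). The sketch below is an addition to the analysis above, shown on invented numbers; the helper `standardized_mean_diff` and the `toy` data are assumptions, and the 0.1 threshold in the comment is a common rule of thumb rather than a fixed standard.

```python
import numpy as np
import pandas as pd

def standardized_mean_diff(treated, control):
    """(mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Synthetic pre-period observations (assumption, NOT the project data).
toy = pd.DataFrame({
    "treat_3_b_avg": [1, 1, 1, 0, 0, 0],
    "dep_lnassets":  [17.0, 18.0, 19.0, 13.0, 14.0, 15.0],
    "dep_roa1":      [0.001, 0.002, 0.003, 0.001, 0.002, 0.003],
})

is_treated = toy["treat_3_b_avg"] == 1
smd = {
    col: standardized_mean_diff(toy.loc[is_treated, col], toy.loc[~is_treated, col])
    for col in ["dep_lnassets", "dep_roa1"]
}

# |SMD| above roughly 0.1 is often read as a sign of imbalance.
for col, value in smd.items():
    print(f"{col}: SMD = {value:.2f}")
```

Here `dep_lnassets` has an SMD of 4 (group means four pooled standard deviations apart) while `dep_roa1` is perfectly balanced. Recomputing the same statistic after matching gives a direct measure of how much balance improved.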