### Hypothesis 2: Firms with higher systematic risk (market beta) experience higher default probabilities during market downturns.

# DELETE LATER

Basic idea:
- estimate the beta of each firm against the market (using SPY returns as the market proxy)
- define two market downturn periods:
    - Global Financial Crisis: September 1, 2008 to March 31, 2009
    - COVID: February 20, 2020 to April 30, 2020
- calculate each firm's mean PD over the crisis periods
- look at the correlation between beta and mean crisis PD \
Expected result: positive and statistically significant correlation
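
Stated a little more formally, the beta of firm $i$ is the slope of its returns on the market's returns, and the hypothesis is a claim about the sign of a cross-sectional slope. The notation below ($r_{i,t}$, $r_{m,t}$, $\gamma_0$, $\gamma_1$) is introduced here for illustration only and does not come from the notebook:

```latex
\begin{aligned}
\beta_i &= \frac{\operatorname{Cov}(r_{i,t},\, r_{m,t})}{\operatorname{Var}(r_{m,t})}, \\
\overline{PD}_i^{\,\text{crisis}} &= \gamma_0 + \gamma_1 \beta_i + \varepsilon_i,
\qquad H_0:\ \gamma_1 = 0 \quad \text{vs.} \quad H_1:\ \gamma_1 > 0 .
\end{aligned}
```

With a single regressor, the two-sided t-test on $\gamma_1$ is equivalent to testing whether the Pearson correlation between beta and mean crisis PD is zero.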


In [2]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [3]:
# Market proxy: daily log returns of SPY
SPY = pd.read_csv('SPY.csv')
SPY['log_market_return'] = np.log(SPY['PRC'] / SPY['PRC'].shift(1))
SPY = SPY.rename(columns={'PRC': 'SPY PRC'})
SPY = SPY[['date', 'SPY PRC', 'log_market_return']]

# Merton model output, with the market series attached by date
df = pd.read_csv('../model/merton_model_output.csv')
df = pd.merge(df, SPY, how="left", on="date")
df['log_E'] = np.log(df['market_cap'])

In [4]:
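beta_list = []
for tic, group in df.groupby('tic'):
    group = group.sort_values('date')
    try:
        # Regress the firm's log market cap on the market log return.
        # Note that log_E is a level, not a firm return series.
        X = sm.add_constant(group['log_market_return'])
        y = group['log_E']
        model = sm.OLS(y, X).fit()
        beta = model.params['log_market_return']
        beta_list.append({'tic': tic, 'beta': beta})
    except Exception:
        # Skip firms whose regression fails (e.g. missing market returns)
        continue

beta_df = pd.DataFrame(beta_list)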

In [5]:
# Rows falling inside either crisis window (GFC or COVID)
crisis_mask = (
    ((df['date'] >= '2008-09-01') & (df['date'] <= '2009-03-31')) |
    ((df['date'] >= '2020-02-20') & (df['date'] <= '2020-04-30'))
)
# Mean Merton PD per firm over the crisis windows
crisis_avg_pd = df[crisis_mask].groupby('tic')['merton_pd'].mean().reset_index()
crisis_avg_pd = crisis_avg_pd.rename(columns={'merton_pd': 'avg_crisis_pd'})

merged = pd.merge(beta_df, crisis_avg_pd, on='tic')
print("Correlation:", merged['beta'].corr(merged['avg_crisis_pd']))

# Cross-sectional regression of mean crisis PD on beta
X = sm.add_constant(merged['beta'])
y = merged['avg_crisis_pd']
model = sm.OLS(y, X).fit()
print(model.summary())


Correlation: -0.03631053257002913
                            OLS Regression Results                            
Dep. Variable:          avg_crisis_pd   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.010
Method:                 Least Squares   F-statistic:                    0.1215
Date:                Wed, 09 Apr 2025   Prob (F-statistic):              0.728
Time:                        15:55:36   Log-Likelihood:                 212.15
No. Observations:                  94   AIC:                            -420.3
Df Residuals:                      92   BIC:                            -415.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0