# Difference-in-Differences: CECL Adoption & Opacity Effects

This notebook implements a difference-in-differences (DiD) framework to separate CECL adoption effects from COVID-19 confounds.

## Research Question
Did CECL adoption in 2020 differentially affect opaque vs. transparent banks' stock returns?

## Treatment Definition
- **Treated group**: Banks that adopted CECL in 2020Q1 (SEC filers)
- **Control group**: Late adopters (2023) or matched non-banks
- **Event period**: 2020 Q1-Q4 (adoption year)
- **Pre-period**: 2019 (pre-adoption)
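
The group and period definitions above map onto the canonical 2x2 comparison. As a minimal sketch, the DiD estimate is the change in the treated group's mean outcome minus the change in the control group's mean outcome. The cell means below are made-up numbers chosen purely for illustration; they are not results from this study.

```python
import pandas as pd

# Illustrative cell means only (NOT real data): mean quarterly return
# for each combination of group (treated) and period (post_cecl).
cells = pd.DataFrame(
    {
        "treated": [1, 1, 0, 0],
        "post_cecl": [0, 1, 0, 1],
        "mean_ret": [0.020, 0.000, 0.020, 0.030],
    }
)

# Rows: group, columns: period.
means = cells.pivot(index="treated", columns="post_cecl", values="mean_ret")

change_treated = means.loc[1, 1] - means.loc[1, 0]  # post minus pre, adopters
change_control = means.loc[0, 1] - means.loc[0, 0]  # post minus pre, controls
did_estimate = change_treated - change_control

print(f"Treated change: {change_treated:+.3f}")
print(f"Control change: {change_control:+.3f}")
print(f"DiD estimate:   {did_estimate:+.3f}")
```

The control group's change stands in for what would have happened to adopters without CECL, which is why the comparison is only as credible as the parallel trends assumption.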

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from src.analysis.causal_inference.difference_in_differences import (
    prepare_did_data,
    run_did_regression,
    did_summary_table,
    dynamic_did,
)
from src.analysis.causal_inference.parallel_trends import (
    test_parallel_trends,
    plot_parallel_trends,
    placebo_test,
)

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Load Data

Load panel data with:
- `cik`: Bank identifier
- `quarter`: Time period
- `treated`: 1 if CECL adopter, 0 otherwise
- `post_cecl`: 1 if period >= 2020Q1
- `ret_next_quarter`: Stock return over the following quarter (outcome variable)
- `CNOI`: Opacity measure
- Controls: `log_mcap`, `leverage`, `ROA`
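
Before estimation it can help to confirm that the panel actually has this shape. The sketch below builds the indicators on a tiny toy panel and runs a few sanity checks. `check_panel` and `REQUIRED_COLUMNS` are hypothetical names introduced here for illustration; they are not part of the `src.analysis` package.

```python
import pandas as pd

REQUIRED_COLUMNS = ["cik", "quarter", "treated", "post_cecl", "ret_next_quarter"]


def check_panel(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on a firm-quarter panel."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Each (cik, quarter) pair should appear at most once.
    n_duplicates = int(df.duplicated(subset=["cik", "quarter"]).sum())
    # Treatment status should be constant within a firm.
    treated_is_fixed = bool((df.groupby("cik")["treated"].nunique() == 1).all())
    # A balanced panel observes every firm in every quarter.
    obs_per_firm = df.groupby("cik")["quarter"].nunique()
    is_balanced = bool((obs_per_firm == df["quarter"].nunique()).all())
    return {
        "n_duplicates": n_duplicates,
        "treated_is_fixed": treated_is_fixed,
        "is_balanced": is_balanced,
    }


# Toy panel: two firms, four quarters around the adoption date.
quarters = pd.period_range("2019Q3", "2020Q2", freq="Q")
toy = pd.DataFrame(
    {
        "cik": [1] * len(quarters) + [2] * len(quarters),
        "quarter": quarters.append(quarters),
    }
)
toy["treated"] = (toy["cik"] == 1).astype(int)
toy["post_cecl"] = (toy["quarter"] >= pd.Period("2020Q1", freq="Q")).astype(int)
toy["treat_post"] = toy["treated"] * toy["post_cecl"]  # DiD interaction term
toy["ret_next_quarter"] = 0.0  # placeholder outcome, not real returns

checks = check_panel(toy)
print(checks)
```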

In [None]:
# TODO: Load actual panel data
# panel_df = pd.read_csv('data/panel_quarterly.csv')

# For demo, create synthetic data with a built-in treatment effect of -0.03
# (this demo panel omits CNOI and ROA)
np.random.seed(42)
n_firms = 100
quarters = pd.period_range('2018Q1', '2022Q4', freq='Q')

panel_data = []
for firm_id in range(n_firms):
    treated = 1 if firm_id < 50 else 0
    alpha_i = np.random.normal(0.02, 0.01)
    
    for quarter in quarters:
        post = 1 if quarter >= pd.Period('2020Q1', freq='Q') else 0
        treatment_effect = -0.03 if (treated == 1 and post == 1) else 0.0
        
        ret = alpha_i + 0.01 * post + treatment_effect + np.random.normal(0, 0.02)
        
        panel_data.append({
            'cik': firm_id,
            'quarter': quarter.to_timestamp(),
            'treated': treated,
            'post_cecl': post,
            'ret_next_quarter': ret,
            'log_mcap': np.random.normal(8, 1),
            'leverage': np.random.normal(0.5, 0.1),
        })

panel_df = pd.DataFrame(panel_data)
print(f"Panel: {len(panel_df)} observations, {panel_df['cik'].nunique()} firms, {panel_df['quarter'].nunique()} quarters")

## 2. Parallel Trends Test

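Before running DiD, check whether treated and control groups followed parallel trends in the pre-treatment period. Parallel pre-trends support the identifying assumption but cannot prove it, because the assumption concerns what would have happened after adoption in the absence of treatment.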
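
One simple way to make this check concrete is to compute the treated-minus-control gap in mean outcomes for each pre-period quarter and fit a linear trend to it. The sketch below does this on synthetic data with identical trends by construction. It is an illustration of the idea only and is not necessarily what `test_parallel_trends` does internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic pre-period panel (assumption: 40 firms, 8 pre-adoption quarters,
# both groups share the same trend by construction).
n_firms, n_quarters = 40, 8
df = pd.DataFrame(
    {
        "cik": np.repeat(np.arange(n_firms), n_quarters),
        "t": np.tile(np.arange(n_quarters), n_firms),
    }
)
df["treated"] = (df["cik"] < n_firms // 2).astype(int)
df["ret_next_quarter"] = 0.02 + 0.001 * df["t"] + rng.normal(0, 0.02, len(df))

# Mean outcome per quarter and group, then the treated-minus-control gap.
group_means = (
    df.groupby(["t", "treated"])["ret_next_quarter"].mean().unstack("treated")
)
gap = group_means[1] - group_means[0]

# Slope of the gap over time; close to zero when pre-trends are parallel.
slope, intercept = np.polyfit(gap.index.to_numpy(dtype=float), gap.to_numpy(), deg=1)
print(f"Pre-period gap slope per quarter: {slope:.5f}")
```

A slope that is large relative to its sampling noise would suggest the groups were already diverging before adoption.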

In [None]:
# Visual test
fig = plot_parallel_trends(
    panel_df,
    outcome='ret_next_quarter',
    treatment_col='treated',
    time_col='quarter',
    treatment_date='2020Q1'
)
plt.show()

In [None]:
# Statistical test
pt_result = test_parallel_trends(
    panel_df,
    outcome='ret_next_quarter',
    treatment_col='treated',
    time_col='quarter',
    entity_col='cik',
    pre_period_end='2019Q4'
)

print(f"F-statistic: {pt_result['f_stat']:.2f}")
print(f"p-value: {pt_result['f_pvalue']:.4f}")
print(f"Parallel trends violated: {pt_result['violated']}")

if not pt_result['violated']:
    # Failing to reject is not proof that the assumption holds
    print("✓ Cannot reject parallel pre-trends (p > 0.05)")
else:
    print("✗ Warning: Parallel trends may be violated")

## 3. DiD Estimation

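Model: $Y_{it} = \delta \cdot (Treat_i \times Post_t) + \gamma \cdot X_{it} + \mu_i + \lambda_t + \varepsilon_{it}$

Where $\delta$ is the DiD estimator (treatment effect). The main effects $Treat_i$ and $Post_t$ and the intercept $\alpha$ do not appear separately: $Treat_i$ is collinear with the entity fixed effects $\mu_i$, and $Post_t$ is collinear with the time fixed effects $\lambda_t$.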
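
To see what the fixed effects do in this model, the following sketch estimates $\delta$ by hand on a synthetic balanced panel: subtract firm means and quarter means from the outcome and from the interaction term, then run a one-variable OLS. All names and parameter values here are assumptions for illustration (the true effect is set to -0.03), and controls are left out for brevity. This is not the implementation behind `run_did_regression`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic balanced panel (assumption): 100 firms, 20 quarters,
# treatment starts at t = 8, true effect delta = -0.03.
n_firms, n_quarters, first_post = 100, 20, 8
df = pd.DataFrame(
    {
        "cik": np.repeat(np.arange(n_firms), n_quarters),
        "t": np.tile(np.arange(n_quarters), n_firms),
    }
)
df["treated"] = (df["cik"] < n_firms // 2).astype(int)
df["post"] = (df["t"] >= first_post).astype(int)
df["treat_post"] = df["treated"] * df["post"]
firm_effect = rng.normal(0.02, 0.01, n_firms)[df["cik"].to_numpy()]
df["y"] = (
    firm_effect
    + 0.01 * df["post"]
    - 0.03 * df["treat_post"]
    + rng.normal(0, 0.02, len(df))
)


def two_way_demean(s: pd.Series, frame: pd.DataFrame) -> pd.Series:
    """Remove firm and quarter means (exact only for a balanced panel)."""
    return (
        s
        - s.groupby(frame["cik"]).transform("mean")
        - s.groupby(frame["t"]).transform("mean")
        + s.mean()
    )


y_dm = two_way_demean(df["y"], df)
d_dm = two_way_demean(df["treat_post"].astype(float), df)

# OLS slope of the demeaned outcome on the demeaned interaction.
delta_hat = float((d_dm * y_dm).sum() / (d_dm**2).sum())
print(f"Estimated delta: {delta_hat:.4f}")

# The main effects carry no variation once both sets of means are removed.
treated_dm = two_way_demean(df["treated"].astype(float), df)
post_dm = two_way_demean(df["post"].astype(float), df)
print("treated absorbed:", np.allclose(treated_dm, 0.0))
print("post absorbed:   ", np.allclose(post_dm, 0.0))
```

The double-demeaning shortcut is exact only when every firm is observed in every quarter; an unbalanced panel needs a proper fixed-effects estimator.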

In [None]:
# Prepare DiD data
did_df = prepare_did_data(panel_df)

# Run DiD regression with two-way clustering
result = run_did_regression(
    did_df,
    outcome='ret_next_quarter',
    entity_col='cik',
    time_col='quarter',
    controls=['log_mcap', 'leverage'],
    cluster_entity=True,
    cluster_time=True,
    entity_fe=True,
    time_fe=True
)

# Summary table
summary = did_summary_table(result)
print("\nDiD Estimation Results:")
print("=" * 80)
print(summary.to_string())

In [None]:
# Extract key finding
did_coef = summary.loc['treat_post', 'coef']
did_se = summary.loc['treat_post', 'se']
did_p = summary.loc['treat_post', 'p_value']

print("\n" + "=" * 80)
print("KEY FINDING")
print("=" * 80)
print(f"DiD Coefficient (treat_post): {did_coef:.4f}")
print(f"Standard Error (2-way clustered): {did_se:.4f}")
print(f"t-statistic: {did_coef/did_se:.2f}")
print(f"p-value: {did_p:.4f}")
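# Use the absolute value so a negative coefficient does not print as "-3.00 ... lower"
direction = 'lower' if did_coef < 0 else 'higher'
print(f"\nInterpretation: CECL adopters had {abs(did_coef) * 100:.2f} percentage points {direction}")
print("quarterly returns relative to the control group after adoption.")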
print(f"This effect is {'statistically significant' if did_p < 0.05 else 'not significant'} at the 5% level.")

## 4. Dynamic DiD (Event Study)

Examine leads and lags to verify:
1. No pre-trends (leads should be ≈ 0)
2. Treatment effect timing (lags show when effect appears)
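
The leads and lags are interactions between the treatment indicator and event-time dummies. The sketch below builds them on a toy panel under stated assumptions: a window from -4 to +3 quarters, with -1 omitted as the reference period. The column names (`ev_m4`, `ev_p0`, and so on) are hypothetical, and the window convention used by `dynamic_did` may differ.

```python
import pandas as pd

treatment_quarter = pd.Period("2020Q1", freq="Q")
quarters = pd.period_range("2019Q1", "2020Q4", freq="Q")  # 8 quarters

# Toy panel: firm 1 is treated, firm 2 is a control.
toy = pd.DataFrame(
    {
        "cik": [1] * len(quarters) + [2] * len(quarters),
        "quarter": quarters.append(quarters),
    }
)
toy["treated"] = (toy["cik"] == 1).astype(int)


def quarter_index(year, quarter):
    """Map (year, quarter) to a running integer so differences count quarters."""
    return year * 4 + quarter


# Event time: quarters relative to adoption (0 = adoption quarter).
toy["event_time"] = quarter_index(
    toy["quarter"].dt.year, toy["quarter"].dt.quarter
) - quarter_index(treatment_quarter.year, treatment_quarter.quarter)

# One treated-by-event-time dummy per period, omitting the reference period.
reference = -1
dummy_cols = []
for k in range(-4, 4):
    if k == reference:
        continue
    name = f"ev_m{-k}" if k < 0 else f"ev_p{k}"
    toy[name] = ((toy["event_time"] == k) & (toy["treated"] == 1)).astype(int)
    dummy_cols.append(name)

print(toy[["cik", "quarter", "event_time"] + dummy_cols])
```

Each coefficient on these dummies is then read relative to the omitted quarter, which is why the event-study plot is anchored at zero just before adoption.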

In [None]:
# Dynamic DiD with 4 leads and 4 lags
coef_df, dyn_result = dynamic_did(
    panel_df,
    outcome='ret_next_quarter',
    treatment_col='treated',
    entity_col='cik',
    time_col='quarter',
    treatment_date='2020Q1',
    n_leads=4,
    n_lags=4,
    controls=['log_mcap', 'leverage']
)

print("\nDynamic DiD Coefficients:")
print(coef_df[['event_time', 'coef', 'se', 'p_value']])

In [None]:
# Plot event study
fig, ax = plt.subplots(figsize=(12, 6))

# Plot coefficients with 95% CI
ax.errorbar(
    coef_df['event_time'],
    coef_df['coef'],
    yerr=1.96 * coef_df['se'],
    fmt='o-',
    capsize=5,
    linewidth=2,
    markersize=8
)

# Add reference line at y=0
ax.axhline(0, color='red', linestyle='--', alpha=0.7)

# Add vertical line at treatment
ax.axvline(-0.5, color='gray', linestyle='--', alpha=0.5)

ax.set_xlabel('Quarters Relative to CECL Adoption', fontsize=12)
ax.set_ylabel('Treatment Effect on Returns', fontsize=12)
ax.set_title('Dynamic DiD: Event Study Plot', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("- Pre-treatment leads should be near zero (parallel trends)")
print("- Post-treatment lags show evolution of treatment effect")

## 5. Robustness: Placebo Test

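Run the DiD with a fake treatment date inside the pre-period, using pre-period data only. Because no treatment occurred at that date, the estimated effect should be statistically indistinguishable from zero.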
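
As a rough sketch of the logic, the example below restricts a synthetic panel to eight pre-adoption quarters, assigns a fake post-period starting at the seventh quarter (standing in for 2019Q3), and computes the 2x2 DiD from cell means. The data contain no treatment effect by construction. This illustrates the idea and is not the code behind `placebo_test`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic pre-period-only panel (assumption): 100 firms, quarters t = 0..7
# standing in for 2018Q1..2019Q4, and no treatment effect anywhere.
n_firms, n_quarters, fake_start = 100, 8, 6  # t = 6 stands in for 2019Q3
df = pd.DataFrame(
    {
        "cik": np.repeat(np.arange(n_firms), n_quarters),
        "t": np.tile(np.arange(n_quarters), n_firms),
    }
)
df["treated"] = (df["cik"] < n_firms // 2).astype(int)
df["fake_post"] = (df["t"] >= fake_start).astype(int)
df["y"] = 0.02 + rng.normal(0, 0.02, len(df))

# 2x2 DiD from cell means, using the fake post-period.
cell = df.groupby(["treated", "fake_post"])["y"].mean()
placebo_did = (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (
    cell.loc[(0, 1)] - cell.loc[(0, 0)]
)
print(f"Placebo DiD estimate: {placebo_did:.4f}")
```

Dropping all observations from the true adoption date onward matters: otherwise the real effect would contaminate the fake post-period.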

In [None]:
placebo_result = placebo_test(
    panel_df,
    outcome='ret_next_quarter',
    treatment_col='treated',
    entity_col='cik',
    time_col='quarter',
    true_treatment_date='2020Q1',
    fake_treatment_date='2019Q3',
    controls=['log_mcap', 'leverage']
)

print("\nPlacebo Test Results:")
print("=" * 80)
print(f"Fake DiD coefficient: {placebo_result['did_coef']:.4f}")
print(f"Standard error: {placebo_result['se']:.4f}")
print(f"t-statistic: {placebo_result['t_stat']:.2f}")
print(f"p-value: {placebo_result['p_value']:.4f}")
print(f"\nTest passed (no fake effect): {placebo_result['passed']}")

if placebo_result['passed']:
    print("✓ Placebo test passed: No spurious pre-treatment effect detected")
else:
    print("✗ Warning: Placebo test failed, may indicate pre-trends")

## Conclusion

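This analysis uses difference-in-differences to estimate the effect of CECL adoption on bank stock returns. The estimate has a causal interpretation only if the parallel trends assumption holds. The specification controls for:
- Time-invariant firm characteristics (entity fixed effects)
- Time shocks common to all firms in a quarter, including the average COVID-19 shock (time fixed effects); shocks that hit treated and control banks differently are not absorbed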
- Firm-level controls (size, leverage)

**Key Findings:**
1. Parallel trends test [PASSED/FAILED]
2. DiD estimate: X.XX% (p = Y.YY)
3. Dynamic DiD confirms timing
4. Placebo test [PASSED/FAILED]

**Next Steps:**
- Heterogeneity analysis: Does effect vary by opacity (CNOI)?
- Alternative specifications: Different control groups
- Sensitivity: Varying event windows
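
A first pass at the heterogeneity question could split banks at the median opacity score and compare the DiD estimate across the two subsamples. The sketch below does this on synthetic data. The `CNOI` values are random draws, and the subgroup effects (-0.01 for low opacity, -0.05 for high opacity) are assumptions chosen only to make the mechanics visible; they are not findings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic firms (assumption): 200 firms, half treated, random opacity score.
n_firms, n_quarters, first_post = 200, 20, 8
firms = pd.DataFrame({"cik": np.arange(n_firms)})
firms["treated"] = (firms["cik"] % 2 == 0).astype(int)
firms["CNOI"] = rng.normal(0, 1, n_firms)  # synthetic opacity, not real data
firms["high_opacity"] = (firms["CNOI"] > firms["CNOI"].median()).astype(int)

# Expand to a balanced firm-quarter panel.
df = firms.loc[firms.index.repeat(n_quarters)].reset_index(drop=True)
df["t"] = np.tile(np.arange(n_quarters), n_firms)
df["post"] = (df["t"] >= first_post).astype(int)

# Assumed effects for illustration only: larger drop for opaque banks.
effect = np.where(df["high_opacity"] == 1, -0.05, -0.01)
df["y"] = 0.02 + effect * df["treated"] * df["post"] + rng.normal(0, 0.02, len(df))


def did_from_means(frame: pd.DataFrame) -> float:
    """2x2 DiD from cell means within one subsample."""
    cell = frame.groupby(["treated", "post"])["y"].mean()
    return float(
        (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])
    )


by_opacity = {int(label): did_from_means(g) for label, g in df.groupby("high_opacity")}
print(f"Low-opacity DiD:  {by_opacity[0]:.4f}")
print(f"High-opacity DiD: {by_opacity[1]:.4f}")
```

A single regression with a triple interaction between treatment, post-period, and opacity would additionally give a standard error for the difference between the two subgroup estimates.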