### 2 Analyzing GMO
NOTE: The analysis for GMWAX and GMGEX is done side by side throughout parts 1, 2, and 3, so part 4 repeats no code and only discusses the key differences between the two strategies.

This section utilizes data in the file gmo_data.xlsx. Convert total returns to excess returns using the risk‑free rate.

1. Performance (GMWAX & GMGEX). Compute mean, volatility, and Sharpe ratio for GMWAX over three samples:
- inception → 2011
- 2012 → present
- inception → present
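
The annualization behind these statistics can be sketched on a toy series. This is a minimal illustration rather than the notebook's data: the four monthly returns and the 1.2% annualized risk-free rate are made-up values, and `ddof=1` is an assumption chosen to match the pandas default for `std`.

```python
import numpy as np

# Hypothetical monthly total returns and a flat annualized risk-free rate
total_ret = np.array([0.01, 0.03, -0.01, 0.02])
rf_annual = 0.012

# Monthly excess return: subtract one twelfth of the annualized rate
excess = total_ret - rf_annual / 12

mean_ann = excess.mean() * 12               # annualized mean
vol_ann = excess.std(ddof=1) * np.sqrt(12)  # annualized volatility (sample std)
sharpe = mean_ann / vol_ann                 # annualized Sharpe ratio

print(round(mean_ann, 4), round(vol_ann, 4), round(sharpe, 4))
```

Because the mean scales with 12 and the volatility with the square root of 12, the annualized Sharpe ratio is the monthly one multiplied by the square root of 12.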

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.read_excel('gmo_analysis_data.xlsx', sheet_name='total returns').set_index('date')
rf = pd.read_excel('gmo_analysis_data.xlsx', sheet_name='risk-free rate').set_index('date')
xr = df.sub(rf['TBill 3M']/12, axis=0) # Convert annual risk-free rate to monthly

In [10]:
def performance(sample):
    mu  = sample.mean() * 12
    vol = sample.std() * np.sqrt(12)
    return pd.DataFrame({'Mean': mu, 'Vol': vol, 'Sharpe': mu / vol})

res = {k: performance(v) for k, v in {'1996-2011': xr.loc[:'2011-12-31'], '2012-2025': xr.loc['2012-01-01':], '1996-2025': xr}.items()}
pd.concat(res, names=['Sample', 'Asset']).round(4)

Sample,Asset,Mean,Vol,Sharpe
1996-2011,SPY,0.0358,0.1642,0.2182
1996-2011,GMWAX,0.0464,0.1105,0.4201
1996-2011,GMGEX,-0.0038,0.1473,-0.026
2012-2025,SPY,0.135,0.1397,0.9659
2012-2025,GMWAX,0.0492,0.0927,0.5305
2012-2025,GMGEX,0.0132,0.2281,0.0578
1996-2025,SPY,0.0833,0.1534,0.5427
1996-2025,GMWAX,0.0477,0.1022,0.467
1996-2025,GMGEX,0.0043,0.19,0.0227


Have the mean, vol, and Sharpe changed much since the case?
1. GMWAX:   
GMWAX’s mean excess return is nearly unchanged (0.0464 in 1996-2011 vs. 0.0492 in 2012-2025), but its volatility fell from 0.1105 to 0.0927, so its Sharpe ratio improved from 0.4201 to 0.5305.

2. GMGEX:   
GMGEX’s mean and Sharpe ratio improved only modestly (mean from -0.0038 to 0.0132, Sharpe from -0.026 to 0.0578), while its volatility rose sharply from 0.1473 to 0.2281, a substantial change in its risk characteristics.




2. Tail risk (GMWAX & GMGEX). For all three samples, analyze extreme scenarios:
- minimum return
- 5th percentile (VaR‑5th)
- maximum drawdown (compute on total returns, not excess returns)
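
As a small worked example of the drawdown definition, the sketch below tracks cumulative wealth against its running peak on three made-up returns. The helper name `max_drawdown` is hypothetical, and unlike a `cumprod().cummax()` calculation it starts the running peak at an initial wealth of 1, so a loss in the very first month counts as a drawdown.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative wealth, as a negative fraction."""
    wealth = 1.0
    peak = 1.0    # running peak, starting from an initial wealth of 1
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst

# Hypothetical returns: wealth goes 1.10 -> 0.88 -> 0.924, so the trough is 20% below the peak
example = [0.10, -0.20, 0.05]
mdd = max_drawdown(example)
print(round(mdd, 4))
```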

In [7]:
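def tailrisk(sample):
    min_ret = sample.min()         # worst single-month return (name avoids shadowing the built-in min)
    var_5 = sample.quantile(0.05)  # historical 5th-percentile return
    # Drawdown: cumulative wealth relative to its running peak, minus 1
    dd = sample.apply(lambda x: (1+x).cumprod().div((1+x).cumprod().cummax()).sub(1))
    return pd.DataFrame({'Min Return': min_ret, 'VaR (5%)': var_5, 'Max Drawdown': dd.min()})
# Using df here for total returns (as opposed to excess)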
res2 = {k: tailrisk(v) for k, v in {'1996-2011': df.loc[:'2011-12-31'], '2012-2025': df.loc['2012-01-01':], '1996-2025': df}.items()}
pd.concat(res2, names=['Sample', 'Asset']).round(4)

Sample,Asset,Min Return,VaR (5%),Max Drawdown
1996-2011,SPY,-0.1652,-0.0795,-0.508
1996-2011,GMWAX,-0.1451,-0.044,-0.2936
1996-2011,GMGEX,-0.1512,-0.0797,-0.5556
2012-2025,SPY,-0.1246,-0.0607,-0.2393
2012-2025,GMWAX,-0.115,-0.0369,-0.2168
2012-2025,GMGEX,-0.6587,-0.0653,-0.7374
1996-2025,SPY,-0.1652,-0.0744,-0.508
1996-2025,GMWAX,-0.1451,-0.0404,-0.2936
1996-2025,GMGEX,-0.6587,-0.0752,-0.7618


(a) Does GMWAX/GMGEX have high or low tail‑risk as seen by these stats?
1. GMWAX:   
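Based on the tail-risk statistics, GMWAX has low tail-risk: over the full sample, its 5% VaR (-0.0404) and maximum drawdown (-0.2936) are both milder than SPY’s (-0.0744 and -0.508).
2. GMGEX:   
Based on the tail-risk statistics, GMGEX has high tail-risk: its full-sample minimum return (-0.6587) and maximum drawdown (-0.7618) are far worse than SPY’s (-0.1652 and -0.508).

(b) Does that vary much across the two subsamples?
1. GMWAX:   
The tail-risk profile varies across subsamples, with all three statistics milder in 2012-2025 than in 1996-2011: minimum return -0.115 vs. -0.1451, 5% VaR -0.0369 vs. -0.044, and maximum drawdown -0.2168 vs. -0.2936.
2. GMGEX:   
The tail-risk profile varies sharply across subsamples: the minimum return falls from -0.1512 to -0.6587 and the maximum drawdown deepens from -0.5556 to -0.7374, although the 5% VaR is slightly milder in 2012-2025 (-0.0653 vs. -0.0797).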

3. Market exposure (GMWAX & GMGEX). For all three samples, regress excess returns of GMWAX/GMGEX on excess returns of SPY:
- report estimated alpha, beta, and R²
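
For a single regressor, the three reported quantities have closed forms: beta is the covariance with the market over the market's variance, alpha is the mean of the fund minus beta times the mean of the market, and R² is the share of variance explained. The sketch below illustrates this on simulated data; the helper name `market_regression` and the simulated fund (alpha 0.001, beta 0.6) are assumptions for illustration only.

```python
import numpy as np

def market_regression(y, x):
    """OLS of y on a constant and x; returns (alpha, beta, r_squared)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    alpha = y.mean() - beta * x.mean()
    resid = y - (alpha + beta * x)
    r_squared = 1.0 - resid.var() / y.var()
    return alpha, beta, r_squared

# Hypothetical monthly excess returns: fund = 0.001 + 0.6 * market + noise
rng = np.random.default_rng(0)
mkt = rng.normal(0.005, 0.04, size=240)
fund = 0.001 + 0.6 * mkt + rng.normal(0.0, 0.01, size=240)

alpha, beta, r2 = market_regression(fund, mkt)
print(f"alpha={alpha:.4f}, beta={beta:.4f}, R2={r2:.4f}")
```

The alpha returned here is per period (monthly if the inputs are monthly); multiply by 12 to annualize it.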

In [8]:
def reg(sample):
    rows = []
    for asset in ['GMWAX', 'GMGEX']:
        y = sample[asset]
        X = sm.add_constant(sample['SPY'])
        model = sm.OLS(y, X, missing='drop').fit()
        rows.append([asset, model.params['const'], model.params['SPY'], model.rsquared])
        
    return pd.DataFrame(rows, columns=['Asset', 'alpha', 'beta', 'R2']).set_index('Asset')

res3 = {k: reg(v) for k, v in {'1996-2011': xr.loc[:'2011-12-31'], '2012-2025': xr.loc['2012-01-01':], '1996-2025': xr}.items()}
pd.concat(res3, names=['Sample', 'Asset']).round(4)

Sample,Asset,alpha,beta,R2
1996-2011,GMWAX,-0.0079,0.6195,0.6878
1996-2011,GMGEX,-0.0076,0.8147,0.7645
2012-2025,GMWAX,-0.0083,0.6294,0.7693
2012-2025,GMGEX,-0.0108,0.8042,0.2811
1996-2025,GMWAX,-0.0081,0.6224,0.7271
1996-2025,GMGEX,-0.0092,0.804,0.4588


- is GMWAX/GMGEX a low‑beta strategy? has that changed since the case?
1. GMWAX:   
GMWAX’s beta is 0.6195 (1996-2011), 0.6294 (2012-2025), and 0.6224 (1996-2025), all well below 1. Thus, it is a low-beta strategy, and its beta has not changed significantly across subsamples.


2. GMGEX:   
GMGEX’s beta is 0.8147 (1996-2011), 0.8042 (2012-2025), and 0.804 (1996-2025), all below 1. So, it is a low-beta strategy, and its beta has not changed significantly across subsamples either.


- does GMWAX/GMGEX provide alpha? has that changed across subsamples?
1. GMWAX:   
GMWAX’s alpha (a monthly figure, since the regression uses monthly returns and is not annualized) is -0.0079 (1996-2011), -0.0083 (2012-2025), and -0.0081 (1996-2025), all negative. Thus, it does not provide positive alpha, and its alpha has not changed significantly across subsamples.


2. GMGEX:   
GMGEX’s alpha is -0.0076 (1996-2011), -0.0108 (2012-2025), and -0.0092 (1996-2025), all negative, so it does not provide positive alpha either. Its alpha became more negative in 2012-2025, so it has changed across subsamples to some extent.

4. Compare GMWAX and GMGEX. What are key differences between the two strategies?

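- Risk & Return: GMWAX has lower tail risk and a better Sharpe ratio (0.467 vs. 0.0227 over the full sample); GMGEX has a much deeper maximum drawdown (-0.7618 vs. -0.2936).

- Market Exposure: GMWAX has the lower beta (about 0.62 vs. about 0.80) and a stable R² (0.6878 to 0.7693); GMGEX’s R² is higher in 1996-2011 (0.7645) but drops to 0.2811 in 2012-2025.

- Alpha: Neither has positive alpha; GMGEX’s alpha is less stable (-0.0076 to -0.0108, vs. -0.0079 to -0.0083 for GMWAX).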

### 3 Forecast Regressions

_This section utilizes data in `gmo_data.xlsx`._

1. **Lagged regression.** Consider the regression with predictors lagged one period:

$$
r^{SPY}_{t} \;=\; \alpha^{SPY,X} \;+\; \big(\beta^{SPY,X}\big)^\prime X_{t-1} \;+\; \epsilon^{SPY,X}_{t}
\tag{1}
$$

Estimate (1) and report the **$R^2$**, as well as the OLS estimates for $\alpha$ and $\beta$. Do this for:
   - $X$ as a single regressor, the **dividend–price** ratio ($DP$)
   - $X$ as a single regressor, the **earnings–price** ratio ($EP$)
   - $X$ with **three** regressors: $DP$, $EP$, and the **10‑year yield**  
   For each, report the **$R^2$**.
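
The lag in (1) comes down to index alignment: the signal observed at $t-1$ is paired with the return realized at $t$. A minimal sketch on simulated data follows; the helper name `lagged_ols` and the data-generating process are assumptions, not the notebook's data.

```python
import numpy as np

def lagged_ols(signal, ret):
    """Regress ret[t] on a constant and signal[t-1]; returns (alpha, beta, r_squared)."""
    x = np.asarray(signal, dtype=float)[:-1]  # X_{t-1}: drop the last signal
    y = np.asarray(ret, dtype=float)[1:]      # r_t: drop the first return
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    demeaned = y - y.mean()
    r_squared = 1.0 - (resid @ resid) / (demeaned @ demeaned)
    return coef[0], coef[1], r_squared

# Hypothetical data: next month's return depends weakly on this month's signal
rng = np.random.default_rng(1)
dp = 0.02 + 0.005 * rng.standard_normal(300)
ret = np.zeros(300)
ret[1:] = -0.005 + 0.8 * dp[:-1] + 0.04 * rng.standard_normal(299)

alpha, beta, r2 = lagged_ols(dp, ret)
print(f"alpha={alpha:.4f}, beta={beta:.4f}, R2={r2:.4f}")
```

With a noisy monthly return and a slow-moving signal, a small $R^2$ is the expected outcome of this kind of regression.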

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

signals_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="signals")
rf_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="risk-free rate")
returns_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="total returns")

for df in [signals_df, rf_df, returns_df]:
    df["date"] = pd.to_datetime(df["date"])
    df.set_index("date", inplace=True)

data = pd.merge(signals_df, returns_df, on="date", how="inner")
data = pd.merge(data, rf_df, on="date", how="inner")

data = data.rename(
    columns={"SPX D/P": "DP", "SPX E/P": "EP", "T-Note 10YR": "TNote", "TBill 3M": "RF"}
)
data['RF'] = data['RF'] / 12

data = data.dropna()

# Create lagged predictors
data['DP_lag'] = data['DP'].shift(1)
data['EP_lag'] = data['EP'].shift(1)
data['TNote_lag'] = data['TNote'].shift(1)

data_reg = data.dropna(subset=['DP_lag', 'EP_lag', 'TNote_lag', 'SPY'])

Y = data_reg['SPY']

# --- Regression 1: DP only ---
X1 = sm.add_constant(data_reg['DP_lag'])
model1 = sm.OLS(Y, X1).fit()

print("--- Regression 1: SPY vs. Lagged DP ---")
print(f"R-squared: {model1.rsquared:.4f}")
print(f"  Alpha (const): {model1.params['const']:.4f}")
print(f"  Beta (DP_lag): {model1.params['DP_lag']:.4f}\n")

# --- Regression 2: EP only ---
X2 = sm.add_constant(data_reg['EP_lag'])
model2 = sm.OLS(Y, X2).fit()

print("--- Regression 2: SPY vs. Lagged EP ---")
print(f"R-squared: {model2.rsquared:.4f}")
print(f"  Alpha (const): {model2.params['const']:.4f}")
print(f"  Beta (EP_lag): {model2.params['EP_lag']:.4f}\n")

# --- Regression 3: DP, EP, and 10-year Yield ---
X3 = sm.add_constant(data_reg[['DP_lag', 'EP_lag', 'TNote_lag']])
model3 = sm.OLS(Y, X3).fit()

print("--- Regression 3: SPY vs. Lagged DP, EP, T-Note ---")
print(f"R-squared: {model3.rsquared:.4f}")
print(f"  Alpha (const):     {model3.params['const']:.4f}")
print(f"  Beta (DP_lag):     {model3.params['DP_lag']:.4f}")
print(f"  Beta (EP_lag):     {model3.params['EP_lag']:.4f}")
print(f"  Beta (TNote_lag):  {model3.params['TNote_lag']:.4f}\n")

regression_models = {
    'DP': model1,
    'EP': model2,
    'All': model3
}

--- Regression 1: SPY vs. Lagged DP ---
R-squared: 0.0073
  Alpha (const): -0.0078
  Beta (DP_lag): 0.9286

--- Regression 2: SPY vs. Lagged EP ---
R-squared: 0.0048
  Alpha (const): -0.0041
  Beta (EP_lag): 0.2404

--- Regression 3: SPY vs. Lagged DP, EP, T-Note ---
R-squared: 0.0086
  Alpha (const):     -0.0027
  Beta (DP_lag):     0.4455
  Beta (EP_lag):     0.1428
  Beta (TNote_lag):  -0.1173



2. **Trading strategy from forecasts.** For each of the three regressions:
   - Build the forecasted SPY return: $\hat r^{SPY}_{t+1}$ (forecast made using $X_t$ to predict $r^{SPY}_{t+1}$).
   - Set the scale (portfolio weight) to $w_t = 100 \,\hat r^{SPY}_{t+1}$.
   - Strategy return: $r^x_{t+1} = w_t\, r^{SPY}_{t+1}$.  
   For each strategy, compute:
   - mean, volatility, Sharpe
   - max drawdown
   - market **alpha**
   - market **beta**
   - market **information ratio**
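
The timing of the strategy and the information ratio can be sketched as follows. This is an illustration under stated assumptions: the helper names `strategy_returns` and `information_ratio` are hypothetical, the inputs are made-up numbers, and the residual volatility uses `ddof=1` to mirror the pandas default.

```python
import numpy as np

def strategy_returns(forecast, ret):
    """r^x_{t+1} = w_t * r_{t+1}, with w_t = 100 * forecast_t (forecast made at t for t+1)."""
    forecast = np.asarray(forecast, dtype=float)
    ret = np.asarray(ret, dtype=float)
    weights = 100.0 * forecast[:-1]  # w_t, known at the end of month t
    return weights * ret[1:]         # earned over month t+1

def information_ratio(strat, mkt, periods=12):
    """Annualized alpha over residual volatility from an OLS of strat on a constant and mkt."""
    strat = np.asarray(strat, dtype=float)
    mkt = np.asarray(mkt, dtype=float)
    A = np.column_stack([np.ones_like(mkt), mkt])
    coef, *_ = np.linalg.lstsq(A, strat, rcond=None)
    resid = strat - A @ coef
    return coef[0] / resid.std(ddof=1) * np.sqrt(periods)

# Hypothetical forecasts made at t = 0..3 and realized returns for t = 0..3
forecast = [0.010, -0.005, 0.020, 0.000]
ret = [0.000, 0.030, -0.020, 0.010]
# Weights 1.0, -0.5, 2.0 are applied to the following month's return
print(strategy_returns(forecast, ret))
```

The last forecast has no realized return to trade against, so the strategy series is one observation shorter than the inputs.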

In [2]:
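def calculate_metrics_corrected(strategy_returns, market_returns, rf_returns, name="Strategy"):
    """
    Calculates a set of performance metrics for a given strategy.

    Mean, volatility, Sharpe, and max drawdown use `strategy_returns` as passed in;
    alpha, beta, and the information ratio come from regressing the strategy's
    excess return on the market's excess return. `name` is a label only and does
    not affect the calculations.
    """
    
    strat_excess_ret = strategy_returns - rf_returns
    market_excess_ret = market_returns - rf_returns
    
    # Annualize the monthly mean (x 12) and volatility (x sqrt(12))
    mean_ret_ann = strategy_returns.mean() * 12
    vol_ann = strategy_returns.std() * np.sqrt(12)
    
    if vol_ann == 0:
        sharpe_ratio_ann = np.nan
    else:
        sharpe_ratio_ann = (mean_ret_ann / vol_ann)

    # Max drawdown: largest drop of cumulative wealth below its running peak
    cumulative_ret = (1 + strategy_returns).cumprod()
    peak = cumulative_ret.cummax()
    drawdown = (cumulative_ret - peak) / peak
    max_dd = drawdown.min()

    Y = strat_excess_ret
    X = pd.DataFrame({'SPY_excess': market_excess_ret})
    X = sm.add_constant(X, prepend=False)
    
    # HAC covariance changes the standard errors only; the point estimates are plain OLS
    capm_model = sm.OLS(Y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 12})
    
    # Annualize Alpha
    alpha_ann = capm_model.params['const'] * 12
    beta = capm_model.params['SPY_excess']
    
    # Information ratio: alpha over residual volatility, annualized.
    # When the market is regressed on itself, the residuals are floating-point noise
    # rather than exactly 0, so this guard does not trigger and that IR is not meaningful.
    if capm_model.resid.std() == 0:
        ir_ann = np.nan
    else:
        ir_ann = (capm_model.params['const'] / capm_model.resid.std()) * np.sqrt(12)


    metrics = {
        'Mean (ann.)': mean_ret_ann,
        'Volatility (ann.)': vol_ann,
        'Sharpe Ratio (ann.)': sharpe_ratio_ann,
        'Max Drawdown': max_dd,
        'Market Alpha (ann.)': alpha_ann,
        'Market Beta': beta,
        'Info Ratio (ann.)': ir_ann
    }
    
    return metrics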

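# .copy() so the forecast and weight columns added below do not trigger SettingWithCopyWarning
data_strat = data.dropna(subset=['DP', 'EP', 'TNote', 'SPY', 'RF', 'GMWAX', 'GMGEX']).copy()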

# 1. Generate Forecasts
X_pred_1 = sm.add_constant(data_strat['DP'].rename('DP_lag'))
forecast_1 = regression_models['DP'].predict(X_pred_1)

X_pred_2 = sm.add_constant(data_strat['EP'].rename('EP_lag'))
forecast_2 = regression_models['EP'].predict(X_pred_2)

X_pred_3 = data_strat[['DP', 'EP', 'TNote']].rename(
    columns={'DP': 'DP_lag', 'EP': 'EP_lag', 'TNote': 'TNote_lag'}  # Rename to match model
)
X_pred_3 = sm.add_constant(X_pred_3)
forecast_3 = regression_models['All'].predict(X_pred_3)

data_strat['forecast_1'] = forecast_1
data_strat['forecast_2'] = forecast_2
data_strat['forecast_3'] = forecast_3

# 2. Calculate Weights
data_strat['w_1'] = 100 * forecast_1
data_strat['w_2'] = 100 * forecast_2
data_strat['w_3'] = 100 * forecast_3

# 3. Calculate Strategy Returns
r_spy_tplus1 = data_strat['SPY'].shift(-1)
strat_ret_1_tplus1 = data_strat['w_1'] * r_spy_tplus1
strat_ret_2_tplus1 = data_strat['w_2'] * r_spy_tplus1
strat_ret_3_tplus1 = data_strat['w_3'] * r_spy_tplus1

# 4. Align Data for Analysis
analysis_df = pd.DataFrame(index=data_strat.index)
analysis_df['strat_1'] = strat_ret_1_tplus1.shift(1)
analysis_df['strat_2'] = strat_ret_2_tplus1.shift(1)
analysis_df['strat_3'] = strat_ret_3_tplus1.shift(1)

analysis_df['SPY'] = data_strat['SPY']
analysis_df['RF'] = data_strat['RF'] 
analysis_df['GMWAX'] = data_strat['GMWAX']
analysis_df['GMGEX'] = data_strat['GMGEX']

analysis_df = analysis_df.dropna()
print(f"Analysis period: {analysis_df.index.min().date()} to {analysis_df.index.max().date()}")

# --- Run Analysis ---
results = {}
market_returns = analysis_df['SPY']
rf_returns = analysis_df['RF'] # Monthly RF

results['Strategy 1 (DP)'] = calculate_metrics_corrected(analysis_df['strat_1'], market_returns, rf_returns, "Strat 1 (DP)")
results['Strategy 2 (EP)'] = calculate_metrics_corrected(analysis_df['strat_2'], market_returns, rf_returns, "Strat 2 (EP)")
results['Strategy 3 (All)'] = calculate_metrics_corrected(analysis_df['strat_3'], market_returns, rf_returns, "Strat 3 (All)")
results['Market (SPY)'] = calculate_metrics_corrected(market_returns, market_returns, rf_returns, "Market (SPY)")

results_df = pd.DataFrame(results).T
print("--- Strategy Performance Metrics ---")
print(results_df.to_string(float_format="%.4f"))
print("\n")

Analysis period: 1997-01-31 to 2025-10-31
--- Strategy Performance Metrics ---
                  Mean (ann.)  Volatility (ann.)  Sharpe Ratio (ann.)  Max Drawdown  Market Alpha (ann.)  Market Beta  Info Ratio (ann.)
Strategy 1 (DP)        0.1118             0.1682               0.6648       -0.6986               0.0077       0.9706             0.0975
Strategy 2 (EP)        0.1061             0.1556               0.6822       -0.6168               0.0038       0.9483             0.0700
Strategy 3 (All)       0.1151             0.1671               0.6888       -0.6661               0.0115       0.9640             0.1463
Market (SPY)           0.1067             0.1535               0.6949       -0.5080              -0.0000       1.0000            -5.6911





3. **Risk characteristics.**
   - For both strategies, the market, and GMO, compute monthly **VaR** at $\pi = 0.05$ (use the historical quantile).
   - The case mentions stocks under‑performed short‑term bonds from 2000–2011. Does the dynamic portfolio above under‑perform the risk‑free rate over this time?
   - Based on the regression estimates, in how many periods do we estimate a **negative risk premium**?
   - Do you believe the dynamic strategy takes on **extra risk**?
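
Two of these calculations are simple enough to show on toy inputs: the historical VaR is the empirical 5% quantile of the return sample, and a negative risk premium is a period where the forecasted return is below the risk-free rate. The helper names and numbers below are hypothetical and only illustrate the mechanics.

```python
import numpy as np

def historical_var(returns, pi=0.05):
    """Empirical pi-quantile of the return sample (a negative number for a loss)."""
    return float(np.quantile(np.asarray(returns, dtype=float), pi))

def count_negative_premium(forecast, rf_monthly):
    """Number of periods in which the forecasted return is below the risk-free rate."""
    forecast = np.asarray(forecast, dtype=float)
    rf_monthly = np.asarray(rf_monthly, dtype=float)
    return int((forecast - rf_monthly < 0).sum())

# Hypothetical sample: 21 evenly spaced monthly returns from -10% to +10%
sample = np.linspace(-0.10, 0.10, 21)
print(historical_var(sample))

# Hypothetical forecasts against a flat 0.2% monthly risk-free rate
n_negative = count_negative_premium([0.004, 0.001, -0.002, 0.003], [0.002] * 4)
print(n_negative)
```

Both the forecast and the risk-free rate need to be in the same units (monthly here) for the comparison to be meaningful.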

In [3]:
# 1. Monthly VaR at 5%
print("--- 3.1: Monthly 5% VaR (Historical Quantile) ---")
var_cols = ['strat_1', 'strat_2', 'strat_3', 'SPY', 'GMWAX', 'GMGEX']
var_results = analysis_df[var_cols].quantile(0.05)
var_results.name = "5% VaR"
print(var_results.to_string(float_format="%.4f"))
print("\n")

# 2. Performance from 2000-2011
print("--- 3.2: Performance from 2000-01-01 to 2011-12-31 ---")
sub_period = analysis_df.loc['2000-01-01':'2011-12-31']


def total_cum_ret(returns):
    return (1 + returns).prod() - 1

cum_ret = {
    'Strategy 1 (DP)': total_cum_ret(sub_period['strat_1']),
    'Strategy 2 (EP)': total_cum_ret(sub_period['strat_2']),
    'Strategy 3 (All)': total_cum_ret(sub_period['strat_3']),
    'Risk-Free (TBill 3M)': total_cum_ret(sub_period['RF'])
}

cum_ret_df = pd.Series(cum_ret, name="Total Return")
print(cum_ret_df.to_string(float_format="%.4f"))

underperforms = []
rf_perf = cum_ret['Risk-Free (TBill 3M)']
for i in range(1, 4):
    strat_name = f'Strategy {i} ({"DP" if i==1 else "EP" if i==2 else "All"})'
    if cum_ret[strat_name] < rf_perf:
        underperforms.append(f"Strategy {i}")

if not underperforms:
    print("\nAnswer: No, all dynamic portfolios outperformed the risk-free rate over this period.")
else:
    print(f"\nAnswer: Yes, {', '.join(underperforms)} under-performed the risk-free rate over this time.")
print("\n")

# 3. Periods with Negative Estimated Risk Premium
print("--- 3.3: Periods with Estimated Negative Risk Premium ---")

erp = pd.DataFrame(index=data_strat.index)
erp['erp_1 (DP)'] = data_strat['forecast_1'] - data_strat['RF']
erp['erp_2 (EP)'] = data_strat['forecast_2'] - data_strat['RF']
erp['erp_3 (All)'] = data_strat['forecast_3'] - data_strat['RF']

neg_erp_counts = (erp < 0).sum()
neg_erp_counts.name = "Negative ERP Count"
print(neg_erp_counts.to_string())
print(f"Total periods analyzed: {len(erp)}")
print("\n")

# 4. Does the dynamic strategy take on extra risk?
print("--- 3.4: Does the dynamic strategy take on extra risk? ---")
print("Comparison of Volatility and Max Drawdown (from Part 2):")
print(results_df[['Volatility (ann.)', 'Max Drawdown']].to_string(float_format="%.4f"))
print("\nAnswer: Yes, the dynamic strategies exhibit higher annualized volatility and")
print("significantly larger maximum drawdowns compared to the 'Market (SPY)' benchmark.")

--- 3.1: Monthly 5% VaR (Historical Quantile) ---
strat_1   -0.0615
strat_2   -0.0641
strat_3   -0.0652
SPY       -0.0744
GMWAX     -0.0404
GMGEX     -0.0753


--- 3.2: Performance from 2000-01-01 to 2011-12-31 ---
Strategy 1 (DP)        0.3988
Strategy 2 (EP)        0.3210
Strategy 3 (All)       0.5140
Risk-Free (TBill 3M)   0.3153

Answer: No, all dynamic portfolios outperformed the risk-free rate over this period.


--- 3.3: Periods with Estimated Negative Risk Premium ---
erp_1 (DP)     29
erp_2 (EP)      1
erp_3 (All)    48
Total periods analyzed: 347


--- 3.4: Does the dynamic strategy take on extra risk? ---
Comparison of Volatility and Max Drawdown (from Part 2):
                  Volatility (ann.)  Max Drawdown
Strategy 1 (DP)              0.1682       -0.6986
Strategy 2 (EP)              0.1556       -0.6168
Strategy 3 (All)             0.1671       -0.6661
Market (SPY)                 0.1535       -0.5080

Answer: Yes, the dynamic strategies exhibit higher annualized volatility and
significantly larger maximum drawdowns compared to the 'Market (SPY)' benchmark.

### 4. Out-of-Sample Forecasting

_This section utilizes data in `gmo_data.xlsx`._ Focus on using **both** DP and EP as signals in (1). Compute **out-of-sample** (OOS) statistics.

#### 4.1

In [10]:
from sklearn.linear_model import LinearRegression

In [6]:
signals_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="signals")
rf_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="risk-free rate")
returns_df = pd.read_excel("gmo_analysis_data.xlsx", sheet_name="total returns")

for df in [signals_df, rf_df, returns_df]:
    df["date"] = pd.to_datetime(df["date"])
    df.set_index("date", inplace=True)

data = pd.merge(signals_df, returns_df, on="date", how="inner")
data = pd.merge(data, rf_df, on="date", how="inner")

data = data.rename(
    columns={"SPX D/P": "DP", "SPX E/P": "EP", "T-Note 10YR": "TNote", "TBill 3M": "RF"}
)
# RF is left as the annualized T-bill rate in this cell (Section 3 divides it by 12)

data

date,DP,EP,TNote,SPY,GMWAX,GMGEX,RF
1996-12-31,0.019651,0.051592,0.064180,-0.023292,-0.022094,-0.013000,0.051710
1997-01-31,0.018455,0.048704,0.064940,0.061786,0.014735,0.034448,0.051470
1997-02-28,0.018502,0.048434,0.065520,0.009565,0.022265,0.012733,0.052200
1997-03-31,0.019427,0.055559,0.069030,-0.045721,-0.015152,-0.016441,0.053220
1997-04-30,0.018430,0.052318,0.067180,0.064368,-0.006731,0.000000,0.052330
...,...,...,...,...,...,...,...
2025-06-30,0.012426,0.038005,0.042280,0.051394,0.032878,0.041144,0.042911
2025-07-31,0.012199,0.037199,0.043740,0.023031,0.004145,0.004056,0.043372
2025-08-29,0.012035,0.037317,0.042284,0.020520,0.036833,0.047553,0.041391
2025-09-30,0.011683,0.035954,0.041503,0.035606,0.018630,0.022587,0.039323


In [12]:
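errors = []
null_forecast = []

# Expanding window: the first fit uses 60 rows, and each step adds one more
for i in range(60, len(data)-1):
    train_data = data.iloc[:i]
    test_data = data.iloc[i:i+1]
    
    # Prepare training data
    # (signals and SPY return come from the same row, i.e. the same date)
    X_train = train_data[['DP', 'EP']]
    y_train = train_data['SPY']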
    
    # Fit the model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Prepare test data
    X_test = test_data[['DP', 'EP']]
    y_test = test_data['SPY'].values[0]
    
    # Make prediction
    y_pred = model.predict(X_test)[0]
    
    # Calculate error
    error = y_test - y_pred
    errors.append(error)

    # Calculate null forecast
    null_forecast.append(y_test - train_data['SPY'].mean())

out_of_sample_R_squared = 1 - (np.sum(np.square(errors)) / np.sum(np.square(null_forecast)))

result_table = pd.DataFrame({
    'Metric': ['Out-of-Sample R-squared'],
    'Value': [out_of_sample_R_squared]
})
result_table  

,Metric,Value
0,Out-of-Sample R-squared,-0.006451


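This forecasting strategy does not produce a positive out-of-sample $R^2$: at -0.0065, its squared forecast errors are slightly larger than those of the expanding-window historical mean.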
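
The loop above pairs each month's signals with the same month's return. A variant that follows regression (1) more literally, pairing signals at $t$ with the return at $t+1$ in both the training window and the forecast, might look like the sketch below. It is an illustration on simulated data: the function name `oos_r_squared`, the 60-observation default, and the data-generating process are assumptions, and it does not reproduce the numbers reported here.

```python
import numpy as np

def oos_r_squared(signals, ret, min_obs=60):
    """Expanding-window OOS R^2 for forecasts of ret[t+1] made from signals[t]."""
    X = np.asarray(signals, dtype=float)
    r = np.asarray(ret, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    err_model, err_null = [], []
    for t in range(min_obs, len(r) - 1):
        # Training pairs (signals[s], ret[s+1]) for s < t use only data known at t
        A = np.column_stack([np.ones(t), X[:t]])
        coef, *_ = np.linalg.lstsq(A, r[1:t + 1], rcond=None)
        forecast = coef[0] + X[t] @ coef[1:]
        err_model.append(r[t + 1] - forecast)
        # Null forecast: mean of the returns observed through t
        err_null.append(r[t + 1] - r[:t + 1].mean())
    err_model, err_null = np.array(err_model), np.array(err_null)
    return 1.0 - (err_model @ err_model) / (err_null @ err_null)

# Hypothetical signals and returns with a weak lagged relationship
rng = np.random.default_rng(2)
T = 200
sig = rng.normal(size=(T, 2))
ret = np.zeros(T)
ret[1:] = 0.005 + sig[:-1] @ np.array([0.01, -0.02]) + 0.04 * rng.standard_normal(T - 1)
print(round(oos_r_squared(sig, ret), 4))
```

A negative value means the model's forecasts have larger squared errors than the running historical mean, which is common when the signal is weak relative to the noise.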

#### 4.2

In [47]:
def trading_strategy_with_OOS(df):

    strategy_return = []

    for i in range(60, len(df)-1):
        train_data = df.iloc[:i]
        test_data = df.iloc[i:i+1]
        
        # Prepare training data
        X_train = train_data[['DP', 'EP']]
        y_train = train_data['SPY']
        
        # Fit the model
        model = LinearRegression()
        model.fit(X_train, y_train)
        
        # Prepare test data
        X_test = test_data[['DP', 'EP']]
        y_test = test_data['SPY'].values[0]
        
        # Make prediction
        y_pred = model.predict(X_test)[0]

        # Determine weight
        weight = 100 * y_pred

        # Calculate strategy return at t+1
        r_spy_tplus1 = df['SPY'].iloc[i+1]
        strat_ret_tplus1 = weight * r_spy_tplus1
        strategy_return.append(strat_ret_tplus1)

    # Calculate mean, vol, sharpe, max drawdown, market alpha, market beta, market information ratio from strategy
    strategy_return_series = pd.Series(strategy_return[:-1], index=df.index[61:len(df)-1])
    market_returns = df['SPY'].iloc[61:len(df)-1]
    rf_returns = df['RF'].iloc[61:len(df)-1]
    metrics = calculate_metrics_corrected(strategy_return_series, market_returns, rf_returns, "OOS Strategy")

    for value in metrics:
        metrics[value] = float(metrics[value])

    # Create DataFrame with strategy returns and align with original data
    data_frame = pd.DataFrame({
        'OOS Strategy': strategy_return
    }, index=df.index[61:len(df)])
    
    # Add SPY, GMWAX, and GMGEX from the original df
    data_frame['SPY'] = df['SPY'].iloc[61:len(df)].values
    data_frame['GMWAX'] = df['GMWAX'].iloc[61:len(df)].values
    data_frame['GMGEX'] = df['GMGEX'].iloc[61:len(df)].values
    data_frame['RF'] = df['RF'].iloc[61:len(df)].values

    return metrics, data_frame

In [49]:
result, table = trading_strategy_with_OOS(data)
result

{'Mean (ann.)': 0.028745340276624187,
 'Volatility (ann.)': 0.2595539595160702,
 'Sharpe Ratio (ann.)': 0.11074899543131196,
 'Max Drawdown': -0.9253022952291977,
 'Market Alpha (ann.)': -0.16584578086839874,
 'Market Beta': 0.052132874674055744,
 'Info Ratio (ann.)': -0.6329698876048265}

- Compared to the in-sample strategies in Section 3, part 2, the OOS strategy performs much worse in mean (0.0287 vs. 0.1061 to 0.1151) and Sharpe ratio (0.1107 vs. 0.6648 to 0.6888).

In [44]:
# 1. Monthly VaR at 5%
print("--- 3.1: Monthly 5% VaR (Historical Quantile) ---")
var_results = table.quantile(0.05)
var_results.name = "5% VaR"
print(var_results.to_string(float_format="%.4f"))
print("\n")

--- 3.1: Monthly 5% VaR (Historical Quantile) ---
OOS Strategy   -0.0608
SPY            -0.0727
GMWAX          -0.0401
GMGEX          -0.0753




In [52]:
# 2. Performance from 2000-2011
print("--- 3.2: Performance from 2000-01-01 to 2011-12-31 ---")
sub_period = table.loc['2000-01-01':'2011-12-31']

def total_cum_ret(returns):
    return (1 + returns).prod() - 1

cum_ret = {
    'OOS Strategy': total_cum_ret(sub_period['OOS Strategy']),
    'Risk-Free (TBill 3M)': total_cum_ret(sub_period['RF'])
}

cum_ret_df = pd.Series(cum_ret, name="Total Return")
cum_ret_df

--- 3.2: Performance from 2000-01-01 to 2011-12-31 ---


OOS Strategy           -0.812643
Risk-Free (TBill 3M)    7.488899
Name: Total Return, dtype: float64

In [56]:

table['OOS'] = table['OOS Strategy'] - table['RF']

neg_table_counts = (table < 0).sum()
neg_table_counts.name = "Negative table Count"
print(neg_table_counts.to_string())
print(f"Total periods analyzed: {len(table)}")
print("\n")

OOS Strategy    110
SPY              97
GMWAX           103
GMGEX           117
RF                2
OOS             180
Total periods analyzed: 286




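- From the analysis above, the OOS strategy looks riskier than the in-sample strategies, though not on every measure. Its monthly 5% VaR (-0.0608) is about the same as theirs (-0.0615 to -0.0652), but its annualized volatility is higher (0.2596 vs. 0.1556 to 0.1682), its maximum drawdown is deeper (-0.9253 vs. -0.6168 to -0.6986), and it loses 81% over the 2000-2011 window (which, after the 60-month burn-in, starts in January 2002 for the OOS series) while the in-sample strategies gain 32% to 51%. Its return is also below `RF` in 180 of 286 months; because `RF` is not divided by 12 in this section, that count and the 7.49 risk-free total return are measured against the annualized rate rather than a monthly one.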