# Timing backtest with learning

In [None]:
# hide
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
import subprocess
from pathlib import Path

import numpy as np
import pandas as pd
from IPython.display import Image, display
from skfin.plot import bar, line

In previous sections, we studied the predictability of industry and stock returns in a long-short "cash-neutral" setting. In this section, we shift to the predictability of a single asset (i.e., the "market", proxied by the S\&P 500 US index).

## Timing the market

To evaluate the out-of-sample predictability of a variable, Welch-Goyal (2008) compare two regressions:

- conditional regression (based on the predictor)
- unconditional regression (based on a rolling mean)

The comparison between the two regressions provides a test of whether the predictor has any value.
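
The comparison above can be sketched on synthetic data. This is a minimal illustration, not the Welch-Goyal procedure itself: the helper `oos_r2` is hypothetical, the benchmark is assumed to be an expanding historical mean, and the predictor `x[t]` is assumed to be known before `y[t]` is realized.

```python
import numpy as np


def oos_r2(y, x, min_obs=20):
    """Out-of-sample R^2 of a predictive regression versus a mean benchmark.

    Hypothetical helper: at each date t, both models only use data before t.
    """
    err_cond, err_uncond = [], []
    for t in range(min_obs, len(y)):
        # Conditional forecast: OLS of y on x, estimated on past data only.
        slope, intercept = np.polyfit(x[:t], y[:t], deg=1)
        err_cond.append(y[t] - (intercept + slope * x[t]))
        # Unconditional forecast: mean of past returns (assumed expanding window).
        err_uncond.append(y[t] - y[:t].mean())
    sse_cond = np.sum(np.square(err_cond))
    sse_uncond = np.sum(np.square(err_uncond))
    return 1 - sse_cond / sse_uncond


rng = np.random.default_rng(0)
n = 600
x = rng.normal(size=n)
y_signal = 0.5 * x + rng.normal(size=n)  # returns that load on the predictor
y_noise = rng.normal(size=n)  # returns unrelated to the predictor

r2_signal = oos_r2(y_signal, x)
r2_noise = oos_r2(y_noise, x)
print(f"OOS R2 with signal: {r2_signal:.3f}, without signal: {r2_noise:.3f}")
```

A positive out-of-sample R² means the conditional forecast beats the mean benchmark; a value near zero or negative means the predictor adds nothing once estimation error is accounted for.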

The main intuitions for why some variables might predict S\&P 500 returns are related to valuations:

- "low" prices relative to dividends forecast higher subsequent returns;
- other ratios (earnings, book value, or a moving average of past prices instead of dividends) should also work;
- expected returns vary over the business cycle, and a higher risk premium is required to get people to hold stocks at the bottom of a recession: dividend-price ratios can be interpreted as a state variable capturing business-cycle risk.

The main critical question that Welch-Goyal (2008) ask is whether in-sample results also hold out-of-sample. 

Data for the following graphs: 

- dividend-price ratio ("d/p"): difference between the log of dividends and the log of prices
- dividend yield ("d/y"): difference between the log of dividends and the log of lagged prices
- percent equity issuing ("eqis"): ratio of equity issuing activity to total issuing activity
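
As a small worked example of the first two definitions, the sketch below computes both ratios from toy numbers. The `price` and `dividend` series are hypothetical values chosen for illustration, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values: index level and trailing dividends.
price = pd.Series([100.0, 110.0, 121.0])
dividend = pd.Series([2.0, 2.2, 2.42])

# Log dividend-price ratio: log(D_t) - log(P_t).
dp = np.log(dividend) - np.log(price)

# Log dividend yield: log(D_t) - log(P_{t-1}); the first value is undefined.
dy = np.log(dividend) - np.log(price.shift(1))

print(pd.DataFrame({"d/p": dp, "d/y": dy}))
```

The only difference between the two is the lag on prices, so "d/y" is missing for the first observation.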

In [None]:
# hide
display(Image("images/gw_1.png", width=500))

In [None]:
# hide
display(Image("images/gw_2.png", width=700))

Welch-Goyal summary: there is very little predictability, and the oil shock of 1974 is important in explaining the results in the literature.

In [None]:
# hide
display(Image("images/gw_3.png", width=500))

In response to Welch-Goyal (2008), Campbell-Thompson (2008) propose imposing "sign restrictions":

>  "in practice, an investor would not use a perverse coefficient but would likely conclude that the coefficient is zero, in effect imposing prior knowledge on the output of the regression" (p. 1516)

Sign restrictions

- set the regression coefficient to zero whenever it has the "wrong" sign (different from the theoretically expected sign estimated over the sample)
- set the forecast equity premium to zero whenever it is negative
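
The two restrictions above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' code: `restricted_forecast` and its arguments are hypothetical names, and falling back to the historical mean when the slope is zeroed out is an assumption of this sketch.

```python
def restricted_forecast(intercept, slope, x, hist_mean, expected_sign=1):
    """Apply Campbell-Thompson style sign restrictions to one forecast."""
    if slope * expected_sign > 0:
        # The coefficient has the expected sign: use the regression forecast.
        forecast = intercept + slope * x
    else:
        # "Wrong" sign: treat the coefficient as zero (assumed fallback: historical mean).
        forecast = hist_mean
    # A negative forecast equity premium is floored at zero.
    return max(forecast, 0.0)


f_ok = restricted_forecast(0.01, 0.5, 0.02, hist_mean=0.005)
f_wrong_sign = restricted_forecast(0.01, -0.5, 0.02, hist_mean=0.005)
f_negative = restricted_forecast(-0.03, 0.5, 0.02, hist_mean=0.005)
print(f_ok, f_wrong_sign, f_negative)
```

The three calls cover the unrestricted case, a slope with the wrong sign, and a negative forecast that is floored at zero.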

Summary: does dividend yield predict returns?

- Yes: dividend yield is a strong predictor in the 1970s and 1980s (in-sample!)
- No: the relationship became weaker in the 1990s
- No: the statistical evidence is much weaker when adjusting for the fact that the regressors are highly persistent
- No: dividend yield is also a weak predictor out-of-sample, and rarely better than a moving average.

## Data

The S\&P 500 data provided by Amit Goyal are essentially identical to those provided by Ken French.

In [None]:
from skfin.datasets import load_ag_features, load_kf_returns
df = load_ag_features()[:"1999"]

In [None]:
ret = load_kf_returns(filename="F-F_Research_Data_Factors")["Monthly"][:"1999"]

In [None]:
corr_ = df[["CRSP_SPvw"]].corrwith(
    ret.assign(Mkt=lambda x: x["Mkt-RF"] + x["RF"])["Mkt"]
)["CRSP_SPvw"]
print(f"Correlation data Ken French/Amit Goyal: {corr_:.2f}")

In [None]:
df["CRSP_SPvw"].std() * np.sqrt(12)

In [None]:
line(
    {
        "Amit Goyal": df["CRSP_SPvw"],
        "Ken French": ret.assign(Mkt=lambda x: x["Mkt-RF"] + x["RF"])["Mkt"] / 100,
    },
    cumsum=True,
)

## Timing backtest

In [None]:
from skfin.estimators import Ridge, RidgeCV
from skfin.mv_estimators import TimingMeanVariance
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
start_date = "1945-01-01"
test_size = 1
params = dict(max_train_size=36, test_size=test_size, gap=0)
params["n_splits"] = 1 + len(ret[:"1999"].loc[start_date:]) // test_size

cv = TimeSeriesSplit(**params)

In [None]:
cols = [
    "D12",
    "E12",
    "b/m",
    "tbl",
    "AAA",
    "BAA",
    "lty",
    "ntis",
    "Rfree",
    "infl",
    "ltr",
    "corpr",
    "svar",
    "csp",
]
ret_ = ret["Mkt-RF"]
target = ret_
features = df.loc[ret.index, cols].fillna(0)

In [None]:
m = make_pipeline(
    StandardScaler(), Ridge(), TimingMeanVariance(a_min=-0.25, a_max=0.25)
)

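# Walk forward through the splits: fit on each training window, predict the holding for the test month.
_h = []
for train, test in cv.split(ret):
    m.fit(features.iloc[train], target.iloc[train])
    _h += [m.predict(features.iloc[test])]

# Recover the dates of the test observations and align the holdings on them.
idx = ret.index[np.concatenate([test for _, test in cv.split(ret)])]
h = pd.Series(np.concatenate(_h), index=idx)
# Holdings are lagged by one period before being applied to returns.
pnl = h.shift(1).mul(ret_).dropna()
line(pnl, cumsum=True)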

We can plot the holdings. In this case, the positions vary significantly and there is a significant positive `tilt` (defined as the exponentially weighted average of the positions with a 12-month halflife).

In [None]:
line({"holding": h, "tilt": h.ewm(halflife=12).mean()})

Decomposing the pnl into the part attributed to the `tilt` and the part attributed to the `timing` (defined as the difference between the positions and the `tilt`), we see that both contribute, although the `timing` pnl has a lower Sharpe ratio.
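
Because `timing` is defined as holdings minus `tilt`, the two pnl streams add up to the total pnl by construction. The sketch below checks this on synthetic holdings and returns. The simulated series are hypothetical and only the `halflife=12` setting is taken from this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical monthly holdings and returns.
h_toy = pd.Series(0.1 + 0.1 * rng.normal(size=120))
r_toy = pd.Series(rng.normal(0.005, 0.04, size=120))

tilt = h_toy.ewm(halflife=12).mean()  # slow-moving average position
timing = h_toy - tilt  # deviation of the position from the tilt

# Each component is lagged by one period before being applied to returns.
pnl_all = h_toy.shift(1).mul(r_toy).dropna()
pnl_tilt = tilt.shift(1).mul(r_toy).dropna()
pnl_timing = timing.shift(1).mul(r_toy).dropna()

print(np.allclose(pnl_all, pnl_tilt + pnl_timing))
```

The decomposition is therefore an accounting identity: it says how the pnl splits between a slow average exposure and deviations around it, not whether either part is statistically significant.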

In [None]:
line(
    {
        "ALL": pnl,
        "tilt": h.ewm(halflife=12).mean().shift(1).mul(ret_).dropna(),
        "timing": h.sub(h.ewm(halflife=12).mean()).shift(1).mul(ret_).dropna(),
    },
    cumsum=True, title='Tilt/timing decomposition'
)

In what follows, we use the `Backtester` class with the timing pipeline.

In [None]:
from skfin.backtesting import Backtester

estimator = make_pipeline(
    StandardScaler(), Ridge(), TimingMeanVariance(a_min=-0.25, a_max=0.25)
)

m = Backtester(estimator=estimator)
m.compute_holdings(features, target).compute_pnl(ret_)

np.allclose(h, m.h_), np.allclose(pnl, m.pnl_)

## Other timing backtest statistics

In [None]:
coef = pd.DataFrame(
    [m_.steps[1][1].coef_ for m_ in m.estimators_], columns=cols, index=m.h_.index
)
line(coef, title="Ridge coefficient")

In [None]:
bar(coef.mean(), horizontal=True)

In [None]:
from skfin.metrics import sharpe_ratio

In [None]:
sr = {i: m.h_.shift(1 + i).mul(ret_).pipe(sharpe_ratio) for i in range(-10, 12)}
bar(sr, baseline=0, sort=False, title="Lead-lag sharpe ratio")

In [None]:
pnls_ = {}
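for c in cols + ["ALL"]:
    # Restrict to the feature set `cols` so that "ALL" matches the baseline features;
    # dropping "ALL" is a no-op because of errors="ignore".
    features_ = df.loc[ret.index, cols].drop(c, axis=1, errors="ignore").fillna(0)
    pnls_[c] = Backtester(estimator=estimator).train(features_, target, ret=ret_)
line(pnls_, cumsum=True, title="Feature off the top")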

In [None]:
pnls_ = {}
for alpha in [0.1, 1, 10, 100, 1000]:
    estimator_ = make_pipeline(
        StandardScaler(),
        Ridge(alpha=alpha),
        TimingMeanVariance(a_min=-0.25, a_max=0.25),
    )
    pnls_[alpha] = Backtester(estimator=estimator_).train(features, target, ret=ret_)
line(pnls_, cumsum=True, title="Robustness: ridge alpha")

In [None]:
estimator_ = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=[1, 10, 100, 1000]),
    TimingMeanVariance(a_min=-0.25, a_max=0.25),
)

m_ = Backtester(estimator=estimator_)
m_.compute_holdings(features, target).compute_pnl(ret_)
line({"ridge": m.pnl_, "ridgeCV": m_.pnl_}, cumsum=True, title="Robustness: estimator")

The following graph shows the regularization parameter `alpha` selected by cross-validation in the `RidgeCV` estimator.

In [None]:
alpha = pd.Series(
    [est.steps[1][1].alpha_ for est in m_.estimators_], index=m_.h_.index
)
line(alpha, legend=False, title="RidgeCV alpha")
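
To illustrate how a cross-validated ridge picks its penalty from a grid, here is a minimal sketch on synthetic data. It uses `sklearn.linear_model.RidgeCV`; whether `skfin.estimators.RidgeCV` behaves identically is an assumption, and the data shapes (36 observations, 14 features) are only chosen to mimic the training windows used above.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
# Hypothetical training window: 36 months, 14 standardized features.
X = rng.normal(size=(36, 14))
y = 0.1 * X[:, 0] + rng.normal(size=36)  # weak signal in the first feature

alphas = [1, 10, 100, 1000]
model = RidgeCV(alphas=alphas).fit(X, y)

# `alpha_` is the penalty selected from the grid by cross-validation.
print(model.alpha_)
```

With short windows and weak signals, the selected penalty can jump between grid values from one refit to the next, which is the kind of variation the graph above displays.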