# Risk

In [None]:
# hide
%load_ext autoreload
%autoreload 2
%matplotlib inline

import logging
from pathlib import Path

import numpy as np
import pandas as pd
from IPython.display import Image, display
from matplotlib import pyplot as plt
from tqdm.auto import tqdm

logging.basicConfig(level=logging.CRITICAL)

from lightgbm.sklearn import LGBMRegressor
from skfin.backtesting import Backtester
from skfin.datasets import load_kf_returns
from skfin.estimators import MLPRegressor, MultiOutputRegressor, RidgeCV
from skfin.metrics import sharpe_ratio
from skfin.mv_estimators import MeanVariance
from skfin.plot import *
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

returns_data = load_kf_returns(cache_dir="data")
ret = returns_data["Monthly"]["Average_Value_Weighted_Returns"][:"1999"]

transform_X = lambda x: x.rolling(12).mean().fillna(0)
transform_y = lambda x: x.shift(-1)
features = transform_X(ret)
target = transform_y(ret)

A key ingredient of portfolio construction is the ability to predict portfolio risk (in particular, with a risk model) in order to size positions properly. 

In this section, we discuss different ways to estimate risk. More precisely, starting from the empirical covariance matrix $V$, there may be transformations $\Phi: V \mapsto V_{\Phi}$ that improve the forward-looking risk estimates (and the portfolio construction). For a portfolio $h_{\Phi}$ built with the covariance $V_{\Phi}$, the metric that we use is the `risk-bias`, given by 

$$ \text{RiskBias}_{\Phi} = Std \left[\frac{h_{\Phi}^T r}{\sqrt{h_{\Phi}^T V_{\Phi} h_{\Phi}}} \right] - 1, $$
where the standard deviation is computed over realized returns $r$, so that a well-calibrated risk model has a risk bias close to 0. 
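As a minimal sketch of how this metric can be computed, assume that the holdings, the ex-ante covariances and the realized returns are stored as NumPy arrays of shapes `(T, N)`, `(T, N, N)` and `(T, N)`; the function name `risk_bias` and the simulated data are hypothetical. With holdings independent of returns and the true covariance as risk model, the normalized pnl has unit variance, so the risk bias should be close to 0; dividing the covariance by 4 (i.e. underestimating volatility by a factor of 2) pushes it towards 1.

```python
import numpy as np

def risk_bias(h, V, r):
    """Std of the ex-ante-normalized pnl minus 1 (hypothetical helper).

    h: (T, N) holdings, V: (T, N, N) ex-ante covariances, r: (T, N) realized returns.
    """
    pnl = np.einsum("tn,tn->t", h, r)                  # realized pnl h_t^T r_t
    ex_ante_var = np.einsum("tn,tnm,tm->t", h, V, h)   # ex-ante variance h_t^T V_t h_t
    z = pnl / np.sqrt(ex_ante_var)
    return z.std(ddof=1) - 1.0

rng = np.random.default_rng(0)
T, N = 20_000, 3
A = rng.normal(size=(N, N))
true_V = A @ A.T + N * np.eye(N)                       # positive definite "true" covariance
r = rng.multivariate_normal(np.zeros(N), true_V, size=T)
h = rng.normal(size=(T, N))                            # arbitrary holdings, independent of r
V_good = np.broadcast_to(true_V, (T, N, N))            # risk model = true covariance
V_under = V_good / 4                                   # volatility underestimated by a factor 2

b_good = risk_bias(h, V_good, r)
b_under = risk_bias(h, V_under, r)
print(f"risk bias with the true covariance: {b_good:.3f}")
print(f"risk bias with an underestimated covariance: {b_under:.3f}")
```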

In [None]:
# hide
display(Image("images/ledoit_2004.png", width=600))

The insight of Ledoit and Wolf (2004) is to use a weighted average of two matrices to reduce estimation error:

- the empirical covariance matrix $V$ is an asymptotically unbiased estimator, but it converges slowly and is noisy in small samples;

- there are biased estimators with a faster rate of convergence (for instance, the diagonal $Diag(V)$ of $V$), and in small samples such biased estimators can be more efficient than the unbiased one;

- the covariance matrix used in the portfolio optimisation is 

$$V_{\omega} = \omega \times Diag(V) + (1-\omega) \times V.$$

How should $\omega$ be chosen? Ledoit and Wolf (2004) pick the value that minimizes the expected squared Frobenius distance between the shrunk estimator and the true covariance matrix. In what follows, we test different shrinkage values. 
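To make the Frobenius-norm criterion concrete, the sketch below uses a hypothetical *oracle*: in a simulation where the true covariance `Sigma` is known, the loss $\|V_{\omega} - \Sigma\|_F^2$ is a quadratic in $\omega$. Since $Diag(V) - V$ only has off-diagonal terms, $V_{\omega} - \Sigma = (V - \Sigma) - \omega \, \text{offdiag}(V)$, and setting the derivative to zero gives $\omega^* = \sum_{i \neq j} (V_{ij} - \Sigma_{ij}) V_{ij} / \sum_{i \neq j} V_{ij}^2$, clipped to $[0, 1]$. This is only an illustration under that (unrealistic) assumption, not the Ledoit-Wolf estimator, whose contribution is precisely to estimate the optimal intensity without knowing $\Sigma$. The names `shrink_to_diagonal`, `frobenius_loss` and `oracle_omega` are introduced here for illustration.

```python
import numpy as np

def shrink_to_diagonal(V, omega):
    """V_omega = omega * Diag(V) + (1 - omega) * V."""
    return omega * np.diag(np.diag(V)) + (1 - omega) * V

def frobenius_loss(V, Sigma):
    """Squared Frobenius distance ||V - Sigma||_F^2."""
    return np.sum((V - Sigma) ** 2)

def oracle_omega(V, Sigma):
    """omega in [0, 1] minimizing ||V_omega - Sigma||_F^2 (requires the true Sigma)."""
    off = ~np.eye(V.shape[0], dtype=bool)  # mask of off-diagonal entries
    omega = np.sum((V - Sigma)[off] * V[off]) / np.sum(V[off] ** 2)
    # the loss is a convex quadratic, so clipping gives the constrained optimum
    return float(np.clip(omega, 0.0, 1.0))

# hypothetical example: 12 assets with pairwise correlation 0.5, 60 monthly observations
rng = np.random.default_rng(0)
N, T = 12, 60
Sigma = 0.5 * np.ones((N, N)) + 0.5 * np.eye(N)
X = rng.multivariate_normal(np.zeros(N), Sigma, size=T)
V = np.cov(X, rowvar=False)

omega_star = oracle_omega(V, Sigma)
for omega in [0.0, omega_star, 1.0]:
    loss = frobenius_loss(shrink_to_diagonal(V, omega), Sigma)
    print(f"omega={omega:.2f}  loss={loss:.3f}")
```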

## Risk in the industry momentum backtest

In [None]:
from skfin.metrics import drawdown, sharpe_ratio

We first compute the industry momentum benchmark. 

In [None]:
m = Backtester(MeanVariance()).compute_holdings(features, target).compute_pnl(ret)
h0, pnl0, estimators0 = m.h_, m.pnl_, m.estimators_

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 5))
line(
    pnl0.rename("Industry momentum"),
    cumsum=True,
    loc="best",
    title="Cumulative pnl",
    ax=ax[0],
)
line(
    pnl0.rolling(36).std().mul(np.sqrt(12)),
    title="Annualized risk",
    legend=False,
    ax=ax[1],
)

The drawdown (in units of annualized risk) is defined as: 

$$ dd_t = \frac{\sum_{s=0}^{t} pnl_s - \max_{\tau \leq t}\left(\sum_{s=0}^{\tau} pnl_s \right)}{annualized\_factor \times \sqrt{Var[pnl_s \mid s \leq t]} }.$$
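A minimal pandas sketch of this formula, assuming monthly pnl (so that the annualization factor is $\sqrt{12}$) and an expanding-window standard deviation for $\sqrt{Var[pnl_s \mid s \leq t]}$. The function `drawdown_in_risk_units` is hypothetical and may differ in details from `skfin.metrics.drawdown`.

```python
import numpy as np
import pandas as pd

def drawdown_in_risk_units(pnl: pd.Series, annualization_factor: float = np.sqrt(12)) -> pd.Series:
    """Drawdown of the cumulative pnl, divided by the annualized expanding-window risk."""
    cum_pnl = pnl.cumsum()                          # sum_{s <= t} pnl_s
    running_max = cum_pnl.cummax()                  # max_{tau <= t} sum_{s <= tau} pnl_s
    risk = pnl.expanding().std() * annualization_factor  # uses only data up to t
    return (cum_pnl - running_max) / risk

pnl = pd.Series([1.0, -2.0, 0.5, 3.0])
dd = drawdown_in_risk_units(pnl)
print(dd.round(3).tolist())  # first value is NaN: one observation gives no std
```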

In [None]:
line(
    pnl0.pipe(drawdown),
    title="Drawdown in unit of annualized risk",
    legend=False,
    figsize=(8, 5),
)

The following graph shows that over the period up to 2000, large absolute returns tend to be positive. In the subsequent period, however, the pnl of Momentum becomes left-skewed, with large negative returns: February/March 2009 is a famous example of a Momentum drawdown. 

In [None]:
line(
    pnl0.rename("pnl")
    .to_frame()
    .assign(pnl_abs=lambda x: x.pnl.abs())
    .sort_values("pnl_abs")
    .reset_index(drop=True)["pnl"],
    cumsum=True,
    title="Cumulative returns sorted by absolute monthly return",
    legend_sharpe_ratio=False,
)

## Return covariance eigenvalues

The risk model is defined here as the covariance of returns $V$. To understand its impact on the backtest, it is important to remember that the mean-variance optimisation uses the inverse of the covariance matrix, $V^{-1}$. 

From the point of view of the eigendecomposition $V = U \, Diag(s) \, U^T$, we have $V^{-1} = U \, Diag(1/s) \, U^T$: the smallest eigenvalues of $V$ are not only estimated with noise, but their impact is magnified in $V^{-1}$, leading to potentially significant noise in the estimated positions. 
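The sketch below illustrates this on hypothetical simulated returns (the data and sizes are assumptions): it rebuilds $V^{-1}$ from the eigendecomposition and decomposes mean-variance-style holdings $h = V^{-1}\mu$ into mode contributions $(u_k^T \mu / s_k)\, u_k$, so that the loading on each mode is scaled by $1/s_k$, the largest scaling being $1/s_{\min}$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))          # hypothetical returns: 120 months, 12 assets
V = np.cov(X, rowvar=False)
mu = rng.normal(scale=0.01, size=12)    # hypothetical expected returns

s, U = np.linalg.eigh(V)                # eigenvalues in ascending order, eigenvectors in columns
V_inv = U @ np.diag(1.0 / s) @ U.T      # V^{-1} = U Diag(1/s) U^T

h = np.linalg.solve(V, mu)              # holdings proportional to V^{-1} mu
mode_loadings = (U.T @ mu) / s          # h = sum_k (u_k^T mu / s_k) u_k

print(f"largest / smallest eigenvalue: {s[-1] / s[0]:.1f}")
print("|mode loadings|, smallest to largest eigenvalue:", np.round(np.abs(mode_loadings), 3))
```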

In [None]:
for train, test in m.cv_.split(ret):
    break

u, s, _ = np.linalg.svd(ret.iloc[train].cov())

The graph below shows that the largest eigenvalue is two orders of magnitude larger than the smallest one. 

In [None]:
# index: rank of the eigenvalue; values: eigenvalue
df = pd.Series(s, index=np.arange(1, len(s) + 1))
scatter(
    df,
    xscale="log",
    yscale="log",
    xlabel="Rank (log scale)",
    ylabel="Eigenvalue (log scale)",
    xticks=[1, 2, 4, 8, 16],
    yticks=[0.1, 1, 10, 100],
    title="Distribution of return covariance eigenvalues",
)

In [None]:
print(f"The ratio of the largest to the smallest eigenvalue is {s[0]/s[-1]:.1f}")

In [None]:
# sign each eigenvector so that its average loading is positive
d = {
    "eigenvector of the largest eigenvalue": pd.Series(u[:, 0] / np.sign(np.mean(u[:, 0])), ret.columns),
    "eigenvector of the smallest eigenvalue": pd.Series(u[:, -1] / np.sign(np.mean(u[:, -1])), ret.columns),
}
fig, ax = plt.subplots(1, 2, figsize=(20, 6))
fig.suptitle("Eigenvectors", y=0.95)
for i, (k, v) in enumerate(d.items()):
    bar(v, title=k, ax=ax[i])

**Lemma**: the eigenvector associated with the largest eigenvalue maximizes $u^T V u$ subject to $u^T u = 1$. 

*Proof*. Introducing the Lagrange multiplier $\xi$ on the constraint, the first-order condition is 

$$ V u = \xi u, $$

so that $u$ is an eigenvector of $V$ with eigenvalue $\xi$, and the value of the objective is $u^T V u = \xi \, u^T u = \xi$. The objective is therefore maximized by the eigenvector associated with the largest eigenvalue. 


**Corollary**: the eigenvector associated with the smallest eigenvalue minimizes $u^T V u$ subject to $u^T u = 1$. 
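A quick numerical check of the lemma and the corollary, on a hypothetical random covariance matrix: the Rayleigh quotient $u^T V u / u^T u$ of any direction lies between the smallest and the largest eigenvalues, and these bounds are attained at the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(12, 12))
V = A @ A.T / 12                          # hypothetical positive semi-definite covariance
s, U = np.linalg.eigh(V)                  # ascending eigenvalues, unit-norm eigenvectors

def rayleigh(u, V):
    """u^T V u evaluated at the unit-norm version of u."""
    return u @ V @ u / (u @ u)

# Rayleigh quotients of 10,000 random directions
q = np.array([rayleigh(rng.normal(size=12), V) for _ in range(10_000)])
print(f"eigenvalue range: [{s[0]:.3f}, {s[-1]:.3f}]")
print(f"Rayleigh quotients of random directions: [{q.min():.3f}, {q.max():.3f}]")
```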

The lemma and corollary above show that the eigenvalues measure the *in-sample* variance of a mode. But how well does the in-sample variance predict the out-of-sample variance? 

To assess this point, we construct the pnl of each mode (defined as the portfolio whose positions are given by an eigenvector), normalized by its ex-ante standard deviation (the square root of the eigenvalue) and signed so that its in-sample pnl is positive.  

In [None]:
mode_pnl = []
for train, test in m.cv_.split(ret):
    V_ = ret.iloc[train].cov()
    u, s, _ = np.linalg.svd(V_)
    mu = ret.iloc[train].dot(u).mean()
    mode_pnl +=[ret.iloc[test].dot(u).mul(np.sign(mu)).div(np.sqrt(s))]
mode_pnl = pd.concat(mode_pnl)

The graph below shows the out-of-sample risk of each mode pnl, rescaled to unit ex-ante risk (so that the natural baseline is 1). This quantity is the standard deviation that enters the risk bias defined above (before subtracting 1). For the largest modes, it is close to 1, so that the ex-ante risk measures the out-of-sample risk well. For the smallest modes, however, the ex-ante measure is far off: the positions "overfit" information from the covariance matrix (in particular the correlations), and it is intuitive that the small in-sample risk estimates mean-revert to a larger out-of-sample volatility.  
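The mean reversion of the smallest eigenvalues already appears in a stylized setting. In the hypothetical simulation below, returns are i.i.d. with identity covariance, so every mode has a true variance of 1; the out-of-sample risk bias of mode $k$ is then exactly $1/\sqrt{s_k}$, which exceeds 1 for the modes whose in-sample eigenvalue is below 1. The sample sizes are assumptions, and the sign of each mode does not affect its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 60, 12                               # e.g. 5 years of monthly data, 12 industries
Sigma = np.eye(N)                           # hypothetical true covariance: i.i.d. unit-variance returns
X = rng.multivariate_normal(np.zeros(N), Sigma, size=T)

s, U = np.linalg.eigh(np.cov(X, rowvar=False))
s, U = s[::-1], U[:, ::-1]                  # sort modes from largest to smallest eigenvalue

# true (out-of-sample) std of each mode pnl, u_k^T Sigma u_k, normalized by its ex-ante std sqrt(s_k)
oos_std = np.sqrt(np.einsum("nk,nm,mk->k", U, Sigma, U))
mode_risk_bias = oos_std / np.sqrt(s)
print(np.round(mode_risk_bias, 2))
```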

In [None]:
bar(mode_pnl.std(), sort=False, title="Covariance mode risk bias")

In the next section, we discuss techniques to regularize the covariance matrix so that the risk estimates are better at forecasting out-of-sample volatility. We also test whether better risk estimates lead to higher Sharpe ratios. 

In [None]:
from sklearn.covariance import LedoitWolf, ShrunkCovariance


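The `ShrunkCovariance` estimator of scikit-learn shrinks towards a scaled identity matrix:

$$\text{ShrunkCovariance} = (1 - \text{shrinkage}) \times V + \text{shrinkage} \times \mu \times I_{n\_features},$$

where $\mu = \text{trace}(V) / n\_features$ is the average variance. Note that this target differs from $Diag(V)$: at full shrinkage, all the variances are set to their average $\mu$.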
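A short check of this formula against scikit-learn, on hypothetical random data. Note that `ShrunkCovariance.fit` takes the returns themselves (not their covariance) and starts from the maximum-likelihood covariance `empirical_covariance`, which divides by the number of observations rather than by that number minus one.

```python
import numpy as np
from sklearn.covariance import ShrunkCovariance, empirical_covariance, shrunk_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                 # hypothetical returns: 60 months, 5 assets
emp_cov = empirical_covariance(X)            # MLE covariance (divides by n, not n - 1)
n_features = emp_cov.shape[0]
mu = np.trace(emp_cov) / n_features          # average variance

manual = (1 - 0.1) * emp_cov + 0.1 * mu * np.identity(n_features)
from_function = shrunk_covariance(emp_cov, shrinkage=0.1)
from_estimator = ShrunkCovariance(shrinkage=0.1).fit(X).covariance_
full = shrunk_covariance(emp_cov, shrinkage=1.0)  # full shrinkage: mu times the identity

print(np.allclose(manual, from_function), np.allclose(manual, from_estimator))
```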

In [None]:
S = {}
U0 = {}
for shrinkage in np.arange(0, 1.1, .1): 
    V_ = shrinkage * np.diag(np.diag(ret.iloc[train].cov())) + (1-shrinkage) * ret.iloc[train].cov()
    u, s, _ = np.linalg.svd(V_)
    S[shrinkage] = s 
    U0[shrinkage] = u[:, 0] * np.sign(np.mean(u[:, 0]))
S = pd.DataFrame.from_dict(S, orient='index')
U0 = pd.DataFrame.from_dict(U0, orient='index').rename(columns = {i: c for i, c in enumerate(ret.columns)})

In [None]:
line(S, title='Eigenvalues (x=0: no shrinkage; x=1: full shrinkage)')

In [None]:
line(U0, title='Loadings of first mode (x=0: no shrinkage; x=1: full shrinkage)')

In [None]:
def simple_shrunk_covariance(x, shrinkage): 
    v = np.cov(x.T)
    return shrinkage * np.diag(np.diag(v)) + (1-shrinkage) * v

In [None]:
pnls = {}
for shrinkage in [0, 0.01, 0.05, 0.1, 1]:
    transform_V_ = lambda x: simple_shrunk_covariance(x, shrinkage=shrinkage)
    estimator = MeanVariance(transform_V=transform_V_)
    pnls[shrinkage] = Backtester(estimator).train(features, target, ret)
line(
    pnls,
    cumsum=True,
    title="Robustness for different values of the shrinkage parameter (diagonal target)",
)

In [None]:
from sklearn.covariance import shrunk_covariance
pnls = {}
for shrinkage in [0, 0.01, 0.05, 0.1, 1]:
    transform_V_ = lambda x: shrunk_covariance(np.cov(x.T), shrinkage=shrinkage)
    estimator = MeanVariance(transform_V=transform_V_)
    pnls[shrinkage] = Backtester(estimator).train(features, target, ret)
line(
    pnls,
    cumsum=True,
    title="Robustness for different values of the shrinkage parameter (scaled-identity target)",
)

In [None]:
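S = {}
for shrinkage in np.arange(0, 1.01, 0.1):
    # fit on the returns themselves: ShrunkCovariance computes the empirical covariance internally
    V_ = ShrunkCovariance(shrinkage=shrinkage).fit(ret.iloc[train]).covariance_
    s = np.linalg.svd(V_, compute_uv=False)
    S[shrinkage] = s
S = pd.DataFrame.from_dict(S, orient="index")

line(S, title="Eigenvalues with ShrunkCovariance (x=0: no shrinkage; x=1: full shrinkage)")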

## Risk model estimation

In [None]:
from sklearn.covariance import LedoitWolf, ShrunkCovariance

The default value of the `shrinkage` parameter for `ShrunkCovariance` is 0.1. When `shrinkage=0`, there is no shrinkage; when `shrinkage=1`, all the off-diagonal terms are set to zero and all the variances are replaced by their average $\mu$, so that the covariance matrix is a multiple of the identity.   

In [None]:
transform_V_ = lambda x: ShrunkCovariance(shrinkage=0.1).fit(x).covariance_
m = (
    Backtester(MeanVariance(transform_V=transform_V_))
    .compute_holdings(features, target)
    .compute_pnl(ret)
)
h, pnl, estimators = m.h_, m.pnl_, m.estimators_
line({"benchmark": pnl0, "shrunk covariance": pnl}, cumsum=True)

With the shrunk covariance, the realized risk of the backtest is much closer to the ex-ante risk (of 1). 

In [None]:
line(
    {"benchmark": pnl0.rolling(36).std(), "shrunk covariance": pnl.rolling(36).std()},
    title="Rolling risk bias (36-month)",
)

The ratio of the largest to the smallest eigenvalue is an order of magnitude smaller for the backtest with the shrunk covariance relative to the benchmark. 
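For the scaled-identity target used by `ShrunkCovariance`, this reduction can be made explicit: $(1-\delta) V + \delta \mu I$ has the same eigenvectors as $V$ and eigenvalues $(1-\delta) s_k + \delta \mu$, so the largest-to-smallest ratio falls monotonically from $s_{\max}/s_{\min}$ at $\delta = 0$ to 1 at $\delta = 1$. The sketch below checks this on hypothetical correlated returns, using scikit-learn's `shrunk_covariance` function.

```python
import numpy as np
from sklearn.covariance import shrunk_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12)) @ rng.normal(size=(12, 12))  # hypothetical correlated returns
V = np.cov(X, rowvar=False)
s = np.linalg.eigvalsh(V)            # ascending eigenvalues of V
mu = s.mean()                        # equals trace(V) / n_features

deltas = [0.0, 0.01, 0.1, 0.5, 1.0]
ratios = []
for delta in deltas:
    s_shrunk = np.linalg.eigvalsh(shrunk_covariance(V, shrinkage=delta))
    ratios.append(s_shrunk[-1] / s_shrunk[0])  # largest-to-smallest eigenvalue ratio
print(dict(zip(deltas, np.round(ratios, 1))))
```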

In [None]:
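# first fitted estimator of the benchmark backtest (avoids overwriting the backtester `m`)
estimator0 = next(iter(estimators0))

In [None]:
# eigenvalues of the fully shrunk (diagonal) covariance, i.e. the sorted variances
s = np.linalg.svd(np.diag(np.diag(estimator0.V_)), compute_uv=False)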

In [None]:
get_eigenvalues = lambda estimators: pd.DataFrame(
    [np.linalg.svd(m.V_, compute_uv=False) for m in estimators]
)

ratio_largest_smallest_eigenvalue = lambda x: x.pipe(
    lambda x: x.iloc[:, 0] / x.iloc[:, -1]
)

eigenvalues0 = get_eigenvalues(estimators0)
eigenvalues = get_eigenvalues(estimators)

line(
    {
        "benchmark": eigenvalues0.pipe(ratio_largest_smallest_eigenvalue),
        "shrunk covariance": eigenvalues.pipe(ratio_largest_smallest_eigenvalue),
    },
    yscale="log",
    title="Ratio of the largest to the smallest eigenvalue",
)

In [None]:
pnls = {}
for shrinkage in [0, 0.01, 0.1, 1]:
    transform_V_ = lambda x: ShrunkCovariance(shrinkage=shrinkage).fit(x).covariance_
    estimator = MeanVariance(transform_V=transform_V_)
    pnls[shrinkage] = Backtester(estimator).train(features, target, ret)
line(
    pnls, cumsum=True, title="Robustness for different values of the shrinkage parameter"
)

A related approach is to let the `LedoitWolf` estimator determine the shrinkage intensity from the data; it yields similar performance. 
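As far as we can tell from scikit-learn's documentation and source, `LedoitWolf` shrinks towards the same scaled-identity target as `ShrunkCovariance`, with an intensity `shrinkage_` estimated from the data. The sketch below checks this on hypothetical i.i.d. returns by rebuilding `covariance_` with `shrunk_covariance`.

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance, shrunk_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))   # hypothetical: 60 months of 12 i.i.d. returns

lw = LedoitWolf().fit(X)
print(f"estimated shrinkage intensity: {lw.shrinkage_:.2f}")

# same formula as ShrunkCovariance, with the data-driven intensity plugged in
rebuilt = shrunk_covariance(empirical_covariance(X), shrinkage=lw.shrinkage_)
print(np.allclose(lw.covariance_, rebuilt))
```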

In [None]:
transform_V_ = lambda x: LedoitWolf().fit(x).covariance_
estimator = MeanVariance(transform_V=transform_V_)
pnl_ = Backtester(estimator).train(features, target, ret)
line(
    {"benchmark": pnl0, "shrunk covariance": pnl, "ledoit-wolf": pnl_},
    cumsum=True,
    title="Ledoit-Wolf shrinkage",
)

The key empirical point is that the Sharpe ratio is maximized for a covariance matrix that involves a small amount of shrinkage.