# Mean-variance estimators

In [None]:
#hide
%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import pandas as pd
from IPython.display import display, Image

## Markowitz portfolio optimisation

First, we review the derivation of mean-variance optimisation for a universe of $N$ assets: $r$ is the vector of asset returns (of size $N$), $\alpha = E(r)$ is the return forecast, $V$ is the $N \times N$ covariance matrix of $r$, and $h$ is the vector of holdings.

Note: for a vector $v$, we denote by $v^T$ the transpose of $v$.


**Lemma** [mean-variance]: the allocation that maximizes the utility $h^T \alpha - \frac{h^T V h}{2 \lambda}$ is
$$h = \lambda V^{-1} \alpha, $$ 
where $\lambda$ is the risk-tolerance.

The ex-ante risk is $h^T V h = \lambda^2 \alpha^T V^{-1} \alpha$ and the ex-ante Sharpe ratio is
$$
S = \frac{h^T E(r)}{\sqrt{h^T V h}} = \sqrt{\alpha^T V^{-1} \alpha}. 
$$

**Corollary**: Maximising the Sharpe ratio is equivalent (up to a scaling factor) to the mean-variance optimisation.
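The lemma and its corollary can be checked numerically. The sketch below is only an illustration on synthetic inputs: the covariance matrix `V`, the forecast `alpha` and the value of `risk_tolerance` are made up for the example and are not taken from the data used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Hypothetical inputs: a random positive-definite covariance and a random forecast.
B = rng.normal(size=(N, N))
V = B @ B.T + N * np.eye(N)
alpha = rng.normal(size=N)
risk_tolerance = 0.5  # lambda in the text


def utility(h):
    """Mean-variance utility h^T alpha - h^T V h / (2 lambda)."""
    return h @ alpha - h @ V @ h / (2 * risk_tolerance)


# Closed-form optimum h = lambda V^{-1} alpha.
h_star = risk_tolerance * np.linalg.solve(V, alpha)

# The utility is strictly concave, so random perturbations can only lower it.
perturbed = [utility(h_star + 0.1 * rng.normal(size=N)) for _ in range(100)]
is_max = all(u <= utility(h_star) for u in perturbed)


def sharpe(h):
    """Ex-ante Sharpe ratio of the holdings h."""
    return (h @ alpha) / np.sqrt(h @ V @ h)


sharpe_closed_form = np.sqrt(alpha @ np.linalg.solve(V, alpha))
# Rescaling the holdings leaves the Sharpe ratio unchanged.
sharpe_rescaled = sharpe(3.0 * h_star)
```

Here `sharpe(h_star)`, `sharpe_rescaled` and `sharpe_closed_form` should agree up to floating-point error, which is the scale invariance behind the corollary.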

The mean-variance formula is extended to account for the linear constraints
$$A h = b. $$ 

To do so, we introduce the Lagrangian $\mathcal {L}$ (and Lagrange multiplier $\xi$)

$$
\mathcal {L}= h^T \alpha - \frac{h^T V h}{2\lambda} - (h^T A^T - b^T)\xi
$$

The Lagrange multiplier $\xi$ is a "tuning parameter" chosen exactly so that the constraint above holds. At the optimal value of $\xi$, the constrained problem boils down to an unconstrained problem with the adjusted return forecast $\alpha - A^T \xi$.


**Lemma**: the allocation that maximizes the utility $h^T \alpha - \frac{h^T V h}{2 \lambda}$ under the linear constraint $A h = b$ is

$$ h = V^{-1} A^T \left(A V^{-1} A^T \right)^{-1} b + \lambda V^{-1} \left[\alpha - A^T \left(A V^{-1} A^T \right)^{-1} A V^{-1} \alpha \right]$$

*Proof*: the first-order condition is

$$ \frac{\partial \mathcal {L}}{\partial h} = \alpha - \frac{V h}{\lambda} - A^T \xi =0  \Leftrightarrow  h = \lambda V^{-1}[\alpha - A^T \xi] $$

The parameter $\xi$ is chosen so that $A h = b$

$$b = Ah = \lambda A  V^{-1}[\alpha - A^T \xi]  \Rightarrow  \xi = [A V^{-1}A^T]^{-1} \left[ A  V^{-1}\alpha - \frac{b}{\lambda}  \right]
$$

The holding vector under constraint is

$$ h_{\lambda} = \underbrace {V^{-1} A^T \left(A V^{-1} A^T \right)^{-1} b}_{\text {minimum variance portfolio}} + \underbrace { \lambda V^{-1} \left[\alpha - A^T \left(A V^{-1} A^T \right)^{-1} A V^{-1} \alpha \right]}_{\text {speculative portfolio}} $$

- The first term is what minimises the risk $h^T V h$ under the constraint $Ah =b$ (in particular, it does not depend on expected returns or risk-tolerance).

- The second term is the speculative portfolio (it is sensitive to both inputs).
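The decomposition above can be verified on a small synthetic example. This is a minimal sketch under assumed inputs: `V`, `alpha`, `A`, `b` and `lam` are random placeholders, and `A` follows the convention of the text (one row per constraint), which is the transpose of the layout used for `A` in the code of the helper file.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 6, 2  # number of assets, number of constraints

# Hypothetical inputs.
B = rng.normal(size=(N, N))
V = B @ B.T + N * np.eye(N)
alpha = rng.normal(size=N)
A = rng.normal(size=(C, N))  # text convention: A is C x N
b = rng.normal(size=C)
lam = 0.7  # risk-tolerance

invV = np.linalg.inv(V)
G = np.linalg.inv(A @ invV @ A.T)  # (A V^{-1} A^T)^{-1}

# Minimum variance portfolio and speculative portfolio.
h_min = invV @ A.T @ G @ b
h_spec = lam * invV @ (alpha - A.T @ G @ A @ invV @ alpha)
h = h_min + h_spec

# Same holdings obtained from the first-order condition and the multiplier xi.
xi = G @ (A @ invV @ alpha - b / lam)
h_from_foc = lam * invV @ (alpha - A.T @ xi)

constraint_ok = np.allclose(A @ h, b)
same_holdings = np.allclose(h, h_from_foc)
```

In this example the speculative portfolio satisfies $A h = 0$ on its own, so the constraint level $b$ is carried entirely by the minimum variance portfolio.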

The efficient frontier is the relation between the expected portfolio return $h^T \alpha$ and the portfolio standard deviation $\sqrt{h^T V h}$ as the risk-tolerance $\lambda$ varies:
$$ \lambda \mapsto \left(h_{\lambda}^T \alpha, \sqrt{h_{\lambda}^T V h_{\lambda}} \right)$$

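When $b=0$, the efficient frontier between $h_{\lambda}^T \alpha$ and $\sqrt{h_{\lambda}^T V h_{\lambda}}$ is a line through $(0,0)$; otherwise, it is a hyperbola (equivalently, the variance $h_{\lambda}^T V h_{\lambda}$ is a parabolic function of the expected return).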
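To make the shape of the frontier concrete, the sketch below traces it for a single budget constraint on synthetic inputs. The covariance `V`, the forecast `alpha`, the grid `lambdas` and the helper `frontier` are all hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6

# Hypothetical inputs.
B = rng.normal(size=(N, N))
V = B @ B.T + N * np.eye(N)
alpha = rng.normal(size=N)
A = np.ones((1, N))  # single constraint on the sum of holdings

invV = np.linalg.inv(V)
G = np.linalg.inv(A @ invV @ A.T)
M = invV - invV @ A.T @ G @ A @ invV  # maps alpha to the speculative portfolio


def frontier(b, lambdas):
    """Return an array of (expected return, standard deviation) pairs."""
    h_min = invV @ A.T @ G @ np.atleast_1d(b)
    points = []
    for lam in lambdas:
        h = h_min + lam * M @ alpha
        points.append((h @ alpha, np.sqrt(h @ V @ h)))
    return np.array(points)


lambdas = np.linspace(0.1, 2.0, 20)
cash_neutral = frontier(0.0, lambdas)    # b = 0
fully_invested = frontier(1.0, lambdas)  # b = 1

# b = 0: the ratio return / standard deviation is the same for every lambda.
slopes = cash_neutral[:, 0] / cash_neutral[:, 1]

# b = 1: the variance is a quadratic function of the expected return.
c = alpha @ M @ alpha
h_min = invV @ A.T @ G @ np.array([1.0])
r0, v0 = h_min @ alpha, h_min @ V @ h_min
variance_from_return = v0 + (fully_invested[:, 0] - r0) ** 2 / c
```

With $b=0$ every entry of `slopes` equals $\sqrt{\alpha^T M \alpha}$, so the frontier is a line through the origin. With $b=1$, `variance_from_return` matches the squared standard deviations, so the curve is a parabola in return-variance space.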

We focus on pure "alpha views" -- that is, long-short "cash-neutral" portfolios where the sum of holdings is zero. In this case $b=0$ and $A = \textbf{1}$ where

$$ \textbf {1} = \left[\begin {array}{ccc} 1  & \ldots & 1  \end {array} \right].$$

## Mean-variance estimators

In the helper file below, we introduce two main tools:

- a function that computes mean-variance holdings for batches

- a `MeanVariance` class that follows the `sklearn` api

In [None]:
%%writefile ../skfin/mv_estimators.py
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression

from skfin.metrics import sharpe_ratio


def compute_batch_holdings(pred, V, A=None, past_h=None, constant_risk=False):
    """
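    Compute Markowitz holdings from the return prediction "pred" and the covariance matrix "V".

    pred: numpy array (shape N * K), or a pandas Series/DataFrame
    V: numpy array (N * N)
    A: optional constraint vector (shape N) or matrix (shape N * C); the holdings satisfy A^T h = 0
    past_h: currently unused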

    """

    N, _ = V.shape
    if isinstance(pred, (pd.Series, pd.DataFrame)):
        pred = pred.values
    if pred.shape == (N,):
        pred = pred[:, None]
    elif pred.shape[1] == N:
        pred = pred.T

    invV = np.linalg.inv(V)
    if A is None:
        M = invV
    else:
        U = invV.dot(A)
        if A.ndim == 1:
            M = invV - np.outer(U, U.T) / U.dot(A)
        else:
            M = invV - U.dot(np.linalg.inv(U.T.dot(A)).dot(U.T))
    h = M.dot(pred)
    if constant_risk:
        h = h / np.sqrt(np.diag(h.T.dot(V.dot(h))))
    return h.T


class MeanVariance(BaseEstimator):
    def __init__(self, transform_V=None, A=None, constant_risk=True):
        if transform_V is None:
            self.transform_V = lambda x: np.cov(x.T)
        else:
            self.transform_V = transform_V
        self.A = A
        self.constant_risk = constant_risk

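    def fit(self, X, y=None):
        # Estimate the covariance from the realised returns y.
        self.V_ = self.transform_V(y)
        # Return self, following the sklearn estimator convention.
        return self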

    def predict(self, X):
        if self.A is None:
            T, N = X.shape
            A = np.ones(N)
        else:
            A = self.A
        h = compute_batch_holdings(X, self.V_, A, constant_risk=self.constant_risk)
        return h

    def score(self, X, y):
        return sharpe_ratio(np.sum(X * y, axis=1))


class Mbj(TransformerMixin):
    def __init__(self, positive=False):
        self.positive = positive

    def fit(self, X, y=None):
        m = LinearRegression(fit_intercept=False, positive=self.positive)
        m.fit(X, y=np.ones(len(X)))
        self.coef_ = m.coef_ / np.sqrt(np.sum(m.coef_**2))
        return self

    def transform(self, X):
        return X.dot(self.coef_)


class TimingMeanVariance(BaseEstimator):
    def __init__(self, transform_V=None, a_min=None, a_max=None):
        if transform_V is None:
            self.transform_V = lambda x: np.var(x)
        else:
            self.transform_V = transform_V
        self.a_min = a_min
        self.a_max = a_max

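    def fit(self, X, y=None):
        # Estimate the variance from the realised returns y.
        self.V_ = self.transform_V(y)
        # Return self, following the sklearn estimator convention.
        return self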

    def predict(self, X):
        if self.a_min is None and self.a_max is None:
            h = X / self.V_
        else:
            h = np.clip(
                X / np.sqrt(self.V_), a_min=self.a_min, a_max=self.a_max
            ) / np.sqrt(self.V_)
        return h

In [None]:
from skfin.mv_estimators import compute_batch_holdings, MeanVariance
from skfin.datasets import load_kf_returns
returns_data = load_kf_returns(cache_dir="data", force_reload=True)
ret = returns_data["Monthly"]["Average_Value_Weighted_Returns"][:'1999']

In [None]:
T, N = ret.shape
A = np.ones(N)

In [None]:
h = compute_batch_holdings(ret.mean(), ret.cov(), A, past_h=None)

In [None]:
np.allclose(h.dot(A), [0.])

In [None]:
A = np.stack([np.ones(N), np.zeros(N)], axis=1)
A[0, 1] = 1

In [None]:
h = compute_batch_holdings(pred=ret.mean(), V=ret.cov(), A=A, past_h=None)

## A shortcut to compute Markowitz weights

In [None]:
#hide
display(Image('images/mbj.png',width=500))

A trick to compute Markowitz weights using only the pnls of the different assets:

- X: pnl of $K$ assets over $T$ days -- so that the shape of X is $[T \times K]$. 

- y: vector of ones of size $T$. 

**Lemma** [Mark Britten-Jones]: the Markowitz weights are proportional to the slope coefficients of a regression of the vector of ones $y$ on the pnls $X$ *with no intercept*.

Proof: the coefficient of the regression with no intercept is given by 

$$ b = (X^T X)^{-1} X^T y  $$

The mean of the pnls is given by $\mu = \frac{1}{T} X^T y$. The variance of the pnls is $V = \frac{1}{T} X^T X - \mu \mu^T$

Using the Woodbury identity (https://en.wikipedia.org/wiki/Woodbury_matrix_identity), we have: 

$$ b = (V + \mu \mu^{T})^{-1} \mu = \left[ V^{-1} -  \frac{V^{-1} \mu \mu^{T}V^{-1}}{1 + \mu^T V^{-1} \mu}  \right] \mu = \frac{V^{-1} \mu}{1 + \mu^T V^{-1} \mu} $$

The main trick is to recognise that 

$$(X^T X)^{-1} X^T y \propto \left[X^T X - \frac{1}{T} X^T y \, (X^T y)^T \right]^{-1} X^T y$$
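The lemma can be checked numerically with an ordinary least-squares solver. The sketch below uses synthetic pnls (the matrix `X` is random and purely illustrative) and the population definitions $\mu = \frac{1}{T} X^T y$ and $V = \frac{1}{T} X^T X - \mu \mu^T$ from the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 500, 4

# Hypothetical pnls: correlated noise around a small positive mean.
mixing = np.eye(K) + 0.3 * rng.normal(size=(K, K))
X = 0.1 + rng.normal(size=(T, K)) @ mixing
y = np.ones(T)

# Regression of the vector of ones on the pnls, with no intercept.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Markowitz weights from the mean and the (population) covariance of the pnls.
mu = X.T @ y / T
V = X.T @ X / T - np.outer(mu, mu)
w = np.linalg.solve(V, mu)

# Proportionality factor from the Woodbury identity.
scale = 1.0 / (1.0 + mu @ w)
```

Under these definitions `b` should equal `scale * w` up to floating-point error, so normalising either vector (for instance to unit norm, as the `Mbj` class does) yields the same weights.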

## Mean-variance from a trade perspective 

We can rewrite the mean-variance objective as a function of trades $t$ instead of holdings $h$

$$ t = h - h_0, $$ 

where $h_0$ are the previous-period holdings.

**Lemma**. If $A h_0 = b$, then, up to an additive constant that does not depend on the trades $t$, the Lagrangian is

$$
\mathcal {L}= t^T \left[\alpha - \frac{V h_0}{\lambda}\right] - \frac{t^T V t}{2\lambda} - t^T A^T \xi
$$
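One consequence is that the optimal trade solves a cash-neutral mean-variance problem with the adjusted forecast $\alpha - \frac{V h_0}{\lambda}$. The sketch below checks this on synthetic inputs: `V`, `alpha`, `h0` and `lam` are hypothetical, and the constraint is the budget constraint $A = \textbf{1}$, $b = 0$ used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6

# Hypothetical inputs.
B = rng.normal(size=(N, N))
V = B @ B.T + N * np.eye(N)
alpha = rng.normal(size=N)
A = np.ones((1, N))
b = np.zeros(1)
lam = 0.5  # risk-tolerance

invV = np.linalg.inv(V)
G = np.linalg.inv(A @ invV @ A.T)
M = invV - invV @ A.T @ G @ A @ invV

# Previous holdings satisfying A h0 = b: a demeaned random vector.
h0 = rng.normal(size=N)
h0 -= h0.mean()

# Optimal holdings from the constrained lemma.
h_star = invV @ A.T @ G @ b + lam * M @ alpha

# Optimal trade: same formula with the adjusted forecast and the constraint A t = 0.
alpha_adj = alpha - V @ h0 / lam
t_star = lam * M @ alpha_adj
```

In this example `h0 + t_star` coincides with `h_star`, and the trade itself is cash-neutral.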