## RiskEstimators class functions

This description is partially based on the __scikit-learn User Guide on Covariance estimation__ [available here](https://scikit-learn.org/stable/modules/covariance.html#robust-covariance).

## Introduction

The RiskEstimators class implements several methods for calculating and adjusting covariance matrices.

The following algorithms are now implemented:
- Minimum Covariance Determinant
- Maximum likelihood covariance estimator (Empirical covariance)
- Covariance estimator with shrinkage
  - Basic shrinkage
  - Ledoit-Wolf shrinkage
  - Oracle Approximating Shrinkage
- Semi-Covariance matrix
- Exponentially-weighted Covariance matrix
- De-noising and De-toning Covariance/Correlation Matrix
  - Constant Residual Eigenvalue De-noising Method
  - Spectral Clustering De-noising Method
  - Targeted Shrinkage De-noising Method
  - Hierarchical Clustering De-noising Method
  - De-toning
- Transforming covariance matrix to correlation matrix and back

This Notebook will describe the above algorithms as well as provide use cases and analysis of results.

## Minimum Covariance Determinant

According to the __scikit-learn User Guide on Covariance estimation__:

_"The outliers are appearing in real data sets and seriously affect the Empirical covariance estimator and the Covariance estimators with shrinkage. For this reason, a robust covariance estimator is needed in order to discard/downweight the outliers in the data"._

The robust estimator presented in the package is the Minimum Covariance Determinant estimator, introduced by P.J. Rousseeuw.

_"The basic idea of the algorithm is to find a set of observations that are not outliers and compute their empirical covariance matrix, which is then rescaled to compensate for the performed selection of observations"._

Our function is a wrapper around sklearn's MinCovDet class, which uses the FastMCD algorithm developed by Rousseeuw and Van Driessen.

A detailed description of the algorithm is available in the paper by _Mia Hubert_ and _Michiel Debruyne_ __Minimum covariance determinant__ [available here](https://wis.kuleuven.be/stat/robust/papers/2010/wire-mcd.pdf)
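To illustrate the idea independently of PortfolioLab, the following minimal sketch applies sklearn's MinCovDet directly to synthetic data. The data set, the share of outliers and the size of their shift are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

# Synthetic data (an assumption for this sketch): 500 standard normal
# observations of 2 variables, 25 of which are shifted to act as outliers.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
data[:25] += 10.0

# The empirical estimate uses every observation, outliers included
emp_cov = EmpiricalCovariance().fit(data).covariance_

# The MCD estimate is based on the subset of observations judged not to be outliers
mcd_cov = MinCovDet(random_state=0).fit(data).covariance_

print('Empirical covariance:\n', emp_cov.round(2))
print('MCD covariance:\n', mcd_cov.round(2))
```

The variances in the MCD estimate should stay close to those of the uncontaminated data, while the empirical estimate is inflated by the shifted observations.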

### Examples of use

We can calculate the Minimum Covariance Determinant estimator of covariance for a data set of stock prices and compare it to the simple covariance.

In [1]:
import portfoliolab as pl
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Getting the data
stock_prices = pd.read_csv('../Sample-Data/stock_prices.csv', parse_dates=True, index_col='Date', dayfirst=True)
stock_prices = stock_prices.dropna(axis=1)

# Leaving only 5 stocks in the dataset, so the differences between the 
# calculated covariance matrices would be easy to observe.
stock_prices = stock_prices.iloc[:, :5]
stock_prices.head()

Date,EEM,EWG,TIP,EWJ,EFA
2008-01-02,49.273335,35.389999,106.639999,52.919998,78.220001
2008-01-03,49.716667,35.290001,107.0,53.119999,78.349998
2008-01-04,48.223331,34.599998,106.970001,51.759998,76.57
2008-01-07,48.576668,34.630001,106.949997,51.439999,76.650002
2008-01-08,48.200001,34.389999,107.029999,51.32,76.220001


In [3]:
# A class that has the Minimum Covariance Determinant estimator
risk_estimators = pl.estimators.RiskEstimators()

# Finding the Minimum Covariance Determinant estimator on price data and with set random seed to 0
min_cov_det = risk_estimators.minimum_covariance_determinant(stock_prices, price_data=True, random_state=0)

# For the simple covariance, we need to transform the stock prices to returns

# A class with function to calculate returns from prices
returns_estimation = pl.estimators.ReturnsEstimators()

# Calculating the data set of returns
stock_returns = returns_estimation.calculate_returns(stock_prices)

# Finding the simple covariance matrix from a series of returns
cov_matrix = stock_returns.cov()

# Transforming Minimum Covariance Determinant estimator from np.array to pd.DataFrame
min_cov_det = pd.DataFrame(min_cov_det, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Minimum Covariance Determinant estimator is:')
min_cov_det

The Minimum Covariance Determinant estimator is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000146,0.000112,-5e-06,7.6e-05,0.000102
EWG,0.000112,0.000154,-7e-06,7.6e-05,0.000114
TIP,-5e-06,-7e-06,1.1e-05,-3e-06,-5e-06
EWJ,7.6e-05,7.6e-05,-3e-06,9.8e-05,7.7e-05
EFA,0.000102,0.000114,-5e-06,7.7e-05,0.0001


In [4]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


The absolute values in the Minimum Covariance Determinant estimate are lower than those in the simple Covariance matrix. This is consistent with the algorithm having discarded outlying observations, which makes the resulting covariance estimate more robust.

## Maximum likelihood covariance estimator (Empirical covariance)

According to the __scikit-learn User Guide on Covariance estimation__:

_"The covariance matrix of a data set can be well approximated by the maximum likelihood estimator (Empirical covariance) if the number of observations is big enough in relation to the number of features"._

_"The Maximum Likelihood Estimator of a sample is an asymptotically unbiased estimator of the corresponding population’s covariance matrix"._
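The maximum likelihood estimator divides by the number of observations $T$, while the sample covariance that pandas computes by default divides by $T - 1$. A minimal sketch on synthetic data (the sample itself is an assumption for this illustration) shows that the two differ only by the factor $(T - 1)/T$:

```python
import numpy as np

# Synthetic sample (an assumption for this sketch): 250 observations of 3 variables
rng = np.random.default_rng(0)
sample = rng.normal(size=(250, 3))
n_obs = sample.shape[0]

# Maximum likelihood estimate: divides by the number of observations
mle_cov = np.cov(sample, rowvar=False, bias=True)

# Sample covariance with the T - 1 divisor (the default in numpy and pandas)
unbiased_cov = np.cov(sample, rowvar=False)

# The two estimates differ only by the factor (T - 1) / T
print(np.allclose(mle_cov, unbiased_cov * (n_obs - 1) / n_obs))
```

For long return series this factor is very close to 1, so both estimates agree to several decimal places.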

### Examples of use

We can calculate the Empirical covariance for a data set of stock prices and compare it to the simple covariance.

In [5]:
# Finding the Empirical Covariance on price data
empirical_cov = risk_estimators.empirical_covariance(stock_prices, price_data=True)

# Transforming Empirical Covariance from np.array to pd.DataFrame
empirical_cov = pd.DataFrame(empirical_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Empirical Covariance is:')
empirical_cov

The Empirical Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


In [6]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


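To the displayed precision, the result is the same as from the standard covariance function from the pandas package. The maximum likelihood estimator divides by the number of observations $T$, whereas pandas divides by $T - 1$ by default, so any difference between the two is too small to be visible here.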

## Covariance estimator with shrinkage

According to the __scikit-learn User Guide on Covariance estimation__:

_"The Maximum Likelihood Estimator is not a good estimator of the eigenvalues of the covariance matrix and the inverted matrix is not accurate. Sometimes, ... it cannot be inverted for numerical reasons"._

_"To avoid problems with inversion, a transformation of the empirical covariance matrix has been introduced: the shrinkage"._

_"Mathematically, this shrinkage consists in reducing the ratio between the smallest and the largest eigenvalues of the empirical covariance matrix"._

### Basic shrinkage

_"This shrinkage is done by shifting every eigenvalue according to a given offset, which is equivalent to finding the l2-penalized Maximum Likelihood Estimator of the covariance matrix"._

_"Shrinkage boils down to a simple convex transformation":_

$$\Sigma_{shrunk} = (1 - \alpha)\Sigma_{unshrunk} + \alpha\frac{Tr(\Sigma_{unshrunk})}{p}Id$$

_"The amount of shrinkage $\alpha$ is setting a trade-off between bias and variance"._

In the implementation, $\alpha$ is passed to a function as the $basic\_shrinkage$ parameter.
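The convex transformation can be sketched in a few lines of numpy. The input below is the simple covariance matrix of the five assets, copied from the rounded output printed earlier in this notebook; the helper name `basic_shrinkage_sketch` is hypothetical and is not part of PortfolioLab.

```python
import numpy as np

# Simple covariance matrix of EEM, EWG, TIP, EWJ, EFA (rounded values from the output above)
simple_cov = np.array([
    [0.000466, 0.000350, -0.000017, 0.000255, 0.000324],
    [0.000350, 0.000372, -0.000015, 0.000221, 0.000303],
    [-0.000017, -0.000015, 0.000019, -0.000009, -0.000012],
    [0.000255, 0.000221, -0.000009, 0.000232, 0.000218],
    [0.000324, 0.000303, -0.000012, 0.000218, 0.000278],
])

def basic_shrinkage_sketch(cov, alpha):
    """Shrink cov towards a scaled identity matrix with the same trace."""
    p = cov.shape[0]
    target = np.trace(cov) / p * np.identity(p)
    return (1 - alpha) * cov + alpha * target

shrunk = basic_shrinkage_sketch(simple_cov, alpha=0.1)
print(shrunk.round(6))
```

Off-diagonal elements are scaled by $(1 - \alpha)$, the variances are pulled towards their average, and the trace of the matrix is preserved.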

### Ledoit-Wolf shrinkage

_"The Ledoit-Wolf shrinkage is based on computing the optimal shrinkage coefficient $\alpha$ that minimizes the Mean Squared Error between the estimated and the real covariance matrix"._

The algorithm is described in more detail in the paper by _Olivier Ledoit_ and _Michael Wolf_ __A well-conditioned estimator for large-dimensional covariance matrices__ [available here](http://perso.ens-lyon.fr/patrick.flandrin/LedoitWolf_JMA2004.pdf)

### Oracle Approximating shrinkage

_"Assuming that the data are Gaussian distributed, Chen et al. derived a formula aimed at choosing a shrinkage coefficient $\alpha$ that yields a smaller Mean Squared Error than the one given by Ledoit and Wolf’s formula"._

_"The resulting estimator is known as the Oracle Shrinkage Approximating estimator of the covariance"._

The algorithm is described in more detail in the paper by _Y. Chen_, _A. Wiesel_, _Y.C. Eldar_ and _A.O. Hero_ __Shrinkage Algorithms for MMSE Covariance Estimation__ [available here](https://webee.technion.ac.il/people/YoninaEldar/104.pdf)
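Both data-driven methods are available in sklearn as the LedoitWolf and OAS estimators. The following minimal sketch fits them to synthetic Gaussian data (the sample is an assumption for this illustration) and reads the shrinkage coefficient $\alpha$ each method selected:

```python
import numpy as np
from sklearn.covariance import OAS, LedoitWolf

# Synthetic Gaussian sample (an assumption for this sketch): few observations
# relative to the number of variables, where shrinkage matters most
rng = np.random.default_rng(0)
sample = rng.normal(size=(60, 10))

lw = LedoitWolf().fit(sample)
oas = OAS().fit(sample)

# Each fitted estimator exposes the chosen coefficient and the shrunk covariance
print('Ledoit-Wolf alpha:', lw.shrinkage_)
print('OAS alpha:', oas.shrinkage_)
print('Shrunk covariance shape:', lw.covariance_.shape)
```

Unlike basic shrinkage, neither method needs $\alpha$ to be supplied by the user.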

### Examples of use

We can calculate the Shrinked Covariances for every method and compare them.

In [7]:
# Finding the Shrinked Covariances on price data with every method
shrinked_cov = risk_estimators.shrinked_covariance(stock_prices, price_data=True,
                                                   shrinkage_type='all', basic_shrinkage=0.1)

# Separating the Shrinked covariances for every method
shrinked_cov_basic, shrinked_cov_lw, shrinked_cov_oas = shrinked_cov

# Transforming each Shrinked Covariance from np.array to pd.DataFrame
shrinked_cov_basic = pd.DataFrame(shrinked_cov_basic, index=cov_matrix.index, columns=cov_matrix.columns)
shrinked_cov_lw = pd.DataFrame(shrinked_cov_lw, index=cov_matrix.index, columns=cov_matrix.columns)
shrinked_cov_oas = pd.DataFrame(shrinked_cov_oas, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Basic Shrinked covariance with an alpha of 0.1 is:')
shrinked_cov_basic

The Basic Shrinked covariance with an alpha of 0.1 is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000446,0.000315,-1.5e-05,0.00023,0.000292
EWG,0.000315,0.000362,-1.3e-05,0.000199,0.000273
TIP,-1.5e-05,-1.3e-05,4.5e-05,-8e-06,-1.1e-05
EWJ,0.00023,0.000199,-8e-06,0.000236,0.000197
EFA,0.000292,0.000273,-1.1e-05,0.000197,0.000278


In [8]:
print('The Ledoit-Wolf Shrinked covariance is:')
shrinked_cov_lw

The Ledoit-Wolf Shrinked covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000464,0.000346,-1.6e-05,0.000252,0.000321
EWG,0.000346,0.000371,-1.5e-05,0.000219,0.0003
TIP,-1.6e-05,-1.5e-05,2.2e-05,-9e-06,-1.2e-05
EWJ,0.000252,0.000219,-9e-06,0.000233,0.000216
EFA,0.000321,0.0003,-1.2e-05,0.000216,0.000278


In [9]:
print('The Oracle Approximating Shrinked covariance is:')
shrinked_cov_oas

The Oracle Approximating Shrinked covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000465,0.000349,-1.7e-05,0.000255,0.000324
EWG,0.000349,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,2e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


In [10]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


The Shrinked Covariance matrices for the Ledoit-Wolf and Oracle Approximating algorithms are both close to the Simple Covariance matrix, with the Oracle Approximating values being the closer of the two. Basic shrinkage with $\alpha = 0.1$ changes the matrix the most: the off-diagonal covariances are scaled towards zero, while the variances on the main diagonal are pulled towards their average, so the smallest variance (TIP) increases and the largest (EEM) decreases.

## Semi-Covariance matrix

The Semi-Covariance matrix measures the volatility of negative returns, or more generally of returns below a certain threshold.

It is better suited to controlling downside volatility than the covariance matrix, which measures both positive and negative variance.

According to the __Minimum Downside Volatility Indices__ paper:

_"Each element in the Semi-Covariance matrix is calculated as:"_

$$SemiCov_{ij} = \frac{1}{T}\sum_{t=1}^{T}[Min(R_{i,t}-B,0)*Min(R_{j,t}-B,0)]$$

_where $T$ is the number of observations,_ $R_{i,t}$ _is the return of an asset $i$ at time $t$, and $B$ is the threshold return._

_If the $B$ is set to zero, the volatility of negative returns is measured._

A deeper analysis of use cases of Semi-Covariance matrix is available in the paper by _Solactive AG - German Index Engineering_ __Minimum Downside Volatility Indices__ [available here](https://www.solactive.com/wp-content/uploads/2018/04/Solactive_Minimum-Downside-Volatility-Indices.pdf)
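The formula above can be sketched directly in numpy. The function name `semi_covariance_sketch` and the toy returns are assumptions for this illustration and do not reproduce the PortfolioLab implementation.

```python
import numpy as np

def semi_covariance_sketch(returns, threshold=0.0):
    """Semi-covariance of a (T x N) array of returns for a threshold return B."""
    # Min(R - B, 0): keep only the part of each return that falls below the threshold
    lower = np.minimum(returns - threshold, 0.0)
    # Average the products of the below-threshold parts over the T observations
    return lower.T @ lower / returns.shape[0]

# Toy returns of 2 assets over 3 periods (hypothetical numbers)
toy_returns = np.array([
    [0.01, -0.02],
    [-0.03, -0.01],
    [0.02, 0.01],
])

semi_cov_toy = semi_covariance_sketch(toy_returns)
print(semi_cov_toy)
```

Every element is an average of products of two non-positive numbers, so no element of a Semi-Covariance matrix can be negative.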

### Examples of use

We can calculate the Semi-Covariance and compare it to the simple covariance.

In [11]:
# Finding the Semi-Covariance on price data
semi_cov = risk_estimators.semi_covariance(stock_prices, price_data=True, threshold_return=0)

# Transforming Semi-Covariance from np.array to pd.DataFrame
semi_cov = pd.DataFrame(semi_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Semi-Covariance matrix is:')
semi_cov

The Semi-Covariance matrix is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,4.4e-05,3.5e-05,2e-06,2.5e-05,3.2e-05
EWG,3.5e-05,3.8e-05,2e-06,2.3e-05,3.1e-05
TIP,2e-06,2e-06,2e-06,2e-06,2e-06
EWJ,2.5e-05,2.3e-05,2e-06,2.3e-05,2.2e-05
EFA,3.2e-05,3.1e-05,2e-06,2.2e-05,2.9e-05


In [12]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


Because the Semi-Covariance matrix only accumulates returns below the threshold, its absolute values are significantly lower than those of the usual covariance matrix. To make the two matrices easier to compare, let's multiply the Semi-Covariance matrix by 10.

In [13]:
print('The Semi-Covariance matrix multiplied by 10 is:')
semi_cov * 10

The Semi-Covariance matrix multiplied by 10 is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000438,0.000351,1.8e-05,0.000251,0.000322
EWG,0.000351,0.000377,1.7e-05,0.000231,0.000312
TIP,1.8e-05,1.7e-05,1.9e-05,1.6e-05,1.5e-05
EWJ,0.000251,0.000231,1.6e-05,0.00023,0.000222
EFA,0.000322,0.000312,1.5e-05,0.000222,0.000285


Now we can see that the values in the two matrices are similar, although some differences are present.

For example, the simple Covariance between the EEM and TIP is negative, but the negative returns have positive covariance. 

## Exponentially-weighted Covariance matrix

Each element in the Exponentially-weighted Covariance matrix is calculated as follows.

First, we calculate the series of covariances for every observation time $t$ between each two elements $i$ and $j$:

$$CovarSeries_{i,j}^{t} = (R_{i}^{t} - Mean(R_{i})) * (R_{j}^{t} - Mean(R_{j}))$$

Then we apply the exponential weighted moving average based on the obtained series with decay in terms of span, as $\alpha=\frac{2}{span+1}$, for $span \ge 1$

$$ExponentialCovariance_{i,j} = ExponentialWeightedMovingAverage(CovarSeries_{i,j})[T]$$

So, it's the last element from an exponentially weighted moving average series based on a series of covariances between returns of the corresponding assets. 

It is used to give greater weight to the most recent observations when computing the covariance.
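The two steps can be sketched with pandas as follows. The helper name `exponential_covariance_sketch` and the synthetic returns are assumptions for this illustration, and the sketch relies on the default settings of the pandas `ewm` method, which may differ from those used inside PortfolioLab.

```python
import numpy as np
import pandas as pd

def exponential_covariance_sketch(returns, span):
    """Last value of the EWMA of the covariance series for each pair of assets."""
    demeaned = returns - returns.mean()
    cols = returns.columns
    result = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for i in cols:
        for j in cols:
            # CovarSeries for the pair (i, j)
            covar_series = demeaned[i] * demeaned[j]
            # Exponentially weighted moving average, keeping the last element
            result.loc[i, j] = covar_series.ewm(span=span).mean().iloc[-1]
    return result

# Synthetic daily returns of 3 assets (an assumption for this sketch)
rng = np.random.default_rng(0)
toy_returns = pd.DataFrame(rng.normal(scale=0.01, size=(200, 3)), columns=['A', 'B', 'C'])

exp_cov_toy = exponential_covariance_sketch(toy_returns, span=60)
print(exp_cov_toy)
```

A smaller span puts more weight on the latest observations, so the estimate reacts faster to changes in volatility but becomes noisier.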

### Examples of use

We can calculate the Exponential Covariance and compare it to the simple covariance.

In [14]:
# Finding the Exponential Covariance on price data and span of 60
exponential_cov = risk_estimators.exponential_covariance(stock_prices, price_data=True, window_span=60)

# Transforming Exponential Covariance from np.array to pd.DataFrame
exponential_cov = pd.DataFrame(exponential_cov, index=cov_matrix.index, columns=cov_matrix.columns)

print('The Exponential Covariance matrix is:')
exponential_cov

The Exponential Covariance matrix is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000282,0.000322,-4e-06,0.00019,0.000303
EWG,0.000322,0.000459,-1.9e-05,0.000237,0.00041
TIP,-4e-06,-1.9e-05,9e-06,-1.1e-05,-1.6e-05
EWJ,0.00019,0.000237,-1.1e-05,0.000199,0.000229
EFA,0.000303,0.00041,-1.6e-05,0.000229,0.00038


In [15]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


The results show that the variance of EWG has increased in the most recent observations, whereas the variance of EEM has decreased. The covariance between EEM and EWJ has also decreased.

Comparing the covariance that puts higher weights on the most recent observations with the equally weighted (simple) covariance shows how the covariance has changed over time.

## De-noising and De-toning Covariance/Correlation Matrix

### Constant Residual Eigenvalue De-noising Method

The main idea behind de-noising the covariance matrix is to eliminate the eigenvalues of the covariance matrix that are representing noise and not useful information. 

This is done by determining the maximum theoretical value of the eigenvalue of such matrix as a threshold and then setting all the calculated eigenvalues below the threshold to the same value.

The function provided below for de-noising the covariance works as follows:
- The given covariance matrix is transformed to the correlation matrix.

- The eigenvalues and eigenvectors of the correlation matrix are calculated.

- Using the Kernel Density Estimate algorithm a kernel of the eigenvalues is estimated.

- The Marcenko-Pastur pdf is fitted to the KDE using the variance as the parameter for the optimization.

- From the obtained Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated using the formula from the **Instability caused by noise** part of [A Robust Estimator of the Efficient Frontier paper](https://papers.ssrn.com/sol3/abstract_id=3469961).

- The eigenvalues in the set that are below the theoretical value are all set to their average value. For example, we have a set of 5 eigenvalues sorted in the descending order ( $\lambda_1 ... \lambda_5$ ), 3 of which are below the maximum theoretical value, then we set

$$\lambda_3^{NEW} = \lambda_4^{NEW} = \lambda_5^{NEW} = \frac{\lambda_3^{OLD} + \lambda_4^{OLD} + \lambda_5^{OLD}}{3}$$

- Eigenvalues above the maximum theoretical value are left intact.

$$\lambda_1^{NEW} = \lambda_1^{OLD}$$

$$\lambda_2^{NEW} = \lambda_2^{OLD}$$

- The new set of eigenvalues with the set of eigenvectors is used to obtain the new de-noised correlation matrix. $\tilde{C}$ is the de-noised correlation matrix, $W$ is the eigenvectors matrix, and $\Lambda$ is the diagonal matrix with new eigenvalues.

$$\tilde{C} = W \Lambda W'$$

- To rescale $\tilde{C}$ so that the main diagonal consists of 1s the following transformation is made. This is how the final $C_{denoised}$ is obtained.

$$C_{denoised} = \tilde{C} [(diag[\tilde{C}])^\frac{1}{2}(diag[\tilde{C}])^{\frac{1}{2}'}]^{-1}$$

- The new correlation matrix is then transformed back to the new de-noised covariance matrix.

The process of de-noising the covariance matrix is described in a paper by _M. Potters_, _J.P. Bouchaud_, _L. Laloux_ __“Financial applications of random matrix theory: Old laces and new pieces.”__ [available here](https://arxiv.org/abs/physics/0507111).

**Note: Lopez de Prado suggests that this de-noising algorithm is preferable as it removes the noise while preserving the signal.**
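The eigenvalue replacement and the rescaling steps listed above can be sketched in numpy. This is a simplified illustration, not the PortfolioLab implementation: the KDE and Marcenko-Pastur fitting steps are skipped and the threshold `lambda_max` is passed in as an assumed value. The correlation matrix is the one of the five assets used in this notebook, and the name `denoise_const_resid_sketch` is hypothetical.

```python
import numpy as np

# Correlation matrix of EEM, EWG, TIP, EWJ, EFA (values from the cov_to_corr output in this notebook)
corr = np.array([
    [1.0, 0.840079, -0.175654, 0.775864, 0.90119],
    [0.840079, 1.0, -0.176822, 0.75206, 0.943192],
    [-0.175654, -0.176822, 1.0, -0.132683, -0.168585],
    [0.775864, 0.75206, -0.132683, 1.0, 0.859232],
    [0.90119, 0.943192, -0.168585, 0.859232, 1.0],
])

def denoise_const_resid_sketch(corr, lambda_max):
    """Replace eigenvalues below lambda_max by their average, then rescale."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    noise = eigvals < lambda_max
    new_eigvals = eigvals.copy()
    if noise.any():
        new_eigvals[noise] = eigvals[noise].mean()
    # C_tilde = W Lambda W'
    c_tilde = eigvecs @ np.diag(new_eigvals) @ eigvecs.T
    # Rescale so that the main diagonal consists of 1s
    scale = np.sqrt(np.diag(c_tilde))
    return c_tilde / np.outer(scale, scale)

# lambda_max = 1.1 is an assumed threshold, not a fitted one
corr_denoised = denoise_const_resid_sketch(corr, lambda_max=1.1)
print(corr_denoised.round(4))
```

Averaging the noise-related eigenvalues keeps their sum unchanged, so the trace of $\tilde{C}$ equals the trace of the original correlation matrix before the final rescaling.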

In [16]:
# Setting the required parameters for de-noising

# Relation of number of observations T to the number of variables N (T/N)
tn_relation = stock_prices.shape[0] / stock_prices.shape[1]

# The bandwidth of the KDE kernel
kde_bwidth = 0.01

# Finding the De-noised Covariance matrix using the Constant Residual Eigenvalue Method
cov_matrix_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method='const_resid_eigen', kde_bwidth=kde_bwidth)

# Transforming De-noised Covariance from np.array to pd.DataFrame
cov_matrix_denoised = pd.DataFrame(cov_matrix_denoised, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('CRE De-noised Covariance matrix')
cov_matrix_denoised

CRE De-noised Covariance matrix


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.000288,-2.8e-05,0.000224,0.000252
EWG,0.000288,0.000372,-2.5e-05,0.0002,0.000226
TIP,-2.8e-05,-2.5e-05,1.9e-05,-1.9e-05,-2.2e-05
EWJ,0.000224,0.0002,-1.9e-05,0.000232,0.000175
EFA,0.000252,0.000226,-2.2e-05,0.000175,0.000278


In [17]:
print('The Simple Covariance is:')
cov_matrix

The Simple Covariance is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.00035,-1.7e-05,0.000255,0.000324
EWG,0.00035,0.000372,-1.5e-05,0.000221,0.000303
TIP,-1.7e-05,-1.5e-05,1.9e-05,-9e-06,-1.2e-05
EWJ,0.000255,0.000221,-9e-06,0.000232,0.000218
EFA,0.000324,0.000303,-1.2e-05,0.000218,0.000278


As we can see, the main diagonal hasn't changed, but the other covariances are different. This means that the algorithm has changed the eigenvalues of the correlation matrix.

### Spectral Clustering De-noising Method

The main idea behind spectral clustering is to remove the noise-related eigenvalues from an empirical correlation matrix. The way this is achieved is similar to the Constant Residual Eigenvalue de-noising method. The only difference is that the eigenvalues below the theoretical value are set to zero instead of to their average value. This is an attempt to remove the effects of those eigenvalues that are consistent with the null hypothesis of uncorrelated random variables.

Let us consider $n$ independent random variables with finite variance and $T$ records each. Random matrix
theory allows us to prove that in the limit $T, n \to \infty$, with a fixed ratio $Q = T/n \geq 1$, the
eigenvalues of the sample correlation matrix cannot be larger than

$$ \lambda_{max} = \sigma^2(1 + \frac{1}{Q} + 2\sqrt{\frac{1}{Q}})$$

where $\sigma^2 = 1$ for correlation matrices. Once this threshold is obtained, we set any eigenvalues below it to $0$.
For example, if we have a set of 5 eigenvalues sorted in the descending order ( $\lambda_1$ ... $\lambda_5$ ),
3 of which are below the maximum theoretical value, then we set

$$ \lambda_3^{NEW} = \lambda_4^{NEW} = \lambda_5^{NEW} = 0$$

We can use this method by setting the denoise_method parameter to 'spectral'.
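As a worked example of the threshold formula, the sketch below computes $\lambda_{max}$ for a given ratio $Q$ and zeroes the eigenvalues that fall below it. The function name `max_noise_eigenvalue` and the toy eigenvalues are assumptions for this illustration.

```python
import numpy as np

def max_noise_eigenvalue(q, sigma2=1.0):
    """Maximum theoretical eigenvalue for a ratio Q = T / n."""
    return sigma2 * (1 + 1 / q + 2 * np.sqrt(1 / q))

# With Q = 4 the threshold is 1 + 0.25 + 2 * 0.5 = 2.25
lambda_max = max_noise_eigenvalue(q=4.0)
print(lambda_max)

# Toy eigenvalues sorted in descending order (hypothetical numbers)
toy_eigenvalues = np.array([3.4, 0.9, 0.4, 0.2, 0.1])

# With Q = 100 the threshold is 1.21, so only the first eigenvalue is kept
filtered = np.where(toy_eigenvalues < max_noise_eigenvalue(q=100.0), 0.0, toy_eigenvalues)
print(filtered)
```

The threshold approaches 1 as $Q$ grows, so with more observations per variable fewer eigenvalues are attributed to noise.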

In [18]:
# Finding the De-noised Covariance matrix using the Spectral Clustering De-noising Method
cov_matrix_spectral = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method='spectral')

# Transforming De-noised Covariance from np.array to pd.DataFrame
cov_matrix_spectral = pd.DataFrame(cov_matrix_spectral, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('The Spectral Clustering De-noised Covariance matrix is:')
cov_matrix_spectral

The Spectral Clustering De-noised Covariance matrix is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.000416,-9.5e-05,0.000329,0.00036
EWG,0.000416,0.000372,-8.5e-05,0.000294,0.000322
TIP,-9.5e-05,-8.5e-05,1.9e-05,-6.7e-05,-7.3e-05
EWJ,0.000329,0.000294,-6.7e-05,0.000232,0.000254
EFA,0.00036,0.000322,-7.3e-05,0.000254,0.000278


### Targeted Shrinkage De-noising Method

PortfolioLab also has the Targeted Shrinkage de-noising method available to users. The main idea behind the Targeted Shrinkage de-noising method is to
shrink the eigenvectors/eigenvalues that are noise-related. This is done by shrinking the correlation matrix calculated from noise-related eigenvectors/eigenvalues and then adding the correlation matrix composed from signal-related eigenvectors/eigenvalues.

The de-noising function works as follows:

- The given covariance matrix is transformed to the correlation matrix.

- The eigenvalues and eigenvectors of the correlation matrix are calculated and sorted in the descending order.

- Using the Kernel Density Estimate algorithm a kernel of the eigenvalues is estimated.

- The Marcenko-Pastur pdf is fitted to the KDE estimate using the variance as the parameter for the optimization.

- From the obtained Marcenko-Pastur distribution, the maximum theoretical eigenvalue is calculated using the formula
  from the **Instability caused by noise** part of [A Robust Estimator of the Efficient Frontier paper](https://papers.ssrn.com/sol3/abstract_id=3469961).

- The correlation matrix composed from eigenvectors and eigenvalues related to noise (eigenvalues below the maximum
  theoretical eigenvalue) is shrunk using the $\alpha$ variable.

$$C_n = \alpha W_n \Lambda_n W_n' + (1 - \alpha) diag[W_n \Lambda_n W_n']$$

- The shrunk noise correlation matrix is added to the information correlation matrix.

$$C_i = W_i \Lambda_i W_i'$$

$$C_{denoised} = C_n + C_i$$

- The new correlation matrix is then transformed back to the new de-noised covariance matrix.

We can use this method by setting the denoise_method parameter to 'target_shrink'.
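The shrinkage step can be sketched in numpy as follows. As before, this is a simplified illustration under stated assumptions: the threshold `lambda_max` is an assumed value rather than one fitted from the Marcenko-Pastur distribution, and the name `target_shrink_sketch` is hypothetical.

```python
import numpy as np

# Correlation matrix of EEM, EWG, TIP, EWJ, EFA (values from the cov_to_corr output in this notebook)
corr = np.array([
    [1.0, 0.840079, -0.175654, 0.775864, 0.90119],
    [0.840079, 1.0, -0.176822, 0.75206, 0.943192],
    [-0.175654, -0.176822, 1.0, -0.132683, -0.168585],
    [0.775864, 0.75206, -0.132683, 1.0, 0.859232],
    [0.90119, 0.943192, -0.168585, 0.859232, 1.0],
])

def target_shrink_sketch(corr, lambda_max, alpha):
    """Shrink only the part of corr built from noise-related eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    signal = eigvals >= lambda_max
    # C_i: correlation matrix from signal-related eigenvectors and eigenvalues
    c_i = eigvecs[:, signal] @ np.diag(eigvals[signal]) @ eigvecs[:, signal].T
    # W_n Lambda_n W_n': correlation matrix from noise-related ones
    noise_part = eigvecs[:, ~signal] @ np.diag(eigvals[~signal]) @ eigvecs[:, ~signal].T
    # C_n: shrink the noise part towards its own diagonal
    c_n = alpha * noise_part + (1 - alpha) * np.diag(np.diag(noise_part))
    return c_i + c_n

# lambda_max = 1.1 is an assumed threshold, not a fitted one
corr_target = target_shrink_sketch(corr, lambda_max=1.1, alpha=0.5)
print(corr_target.round(4))
```

Because the noise part is shrunk towards its own diagonal, the main diagonal of the result stays equal to 1 without rescaling, and $\alpha = 1$ returns the original correlation matrix.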

In [19]:
# Finding the De-noised Covariance matrix using the Targeted Shrinkage Method assuming alpha is 0.5
cov_matrix_target_denoised = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method='target_shrink', kde_bwidth=kde_bwidth, alpha=0.5)

# Transforming De-noised Covariance from np.array to pd.DataFrame
cov_matrix_target_denoised = pd.DataFrame(cov_matrix_target_denoised, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('Shrinkage De-noised Covariance matrix')
cov_matrix_target_denoised

Shrinkage De-noised Covariance matrix


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,0.000466,0.000358,-1.9e-05,0.000265,0.000328
EWG,0.000358,0.000372,-1.7e-05,0.000234,0.0003
TIP,-1.9e-05,-1.7e-05,1.9e-05,-1.2e-05,-1.5e-05
EWJ,0.000265,0.000234,-1.2e-05,0.000232,0.000221
EFA,0.000328,0.0003,-1.5e-05,0.000221,0.000278


In this particular example, the results of this de-noising method stay closer to the simple covariance matrix than the results of the two previous methods. The outcome may differ when the method is used on other datasets.

### Hierarchical Clustering De-noising Method

Hierarchical Clustering, unlike K-means Clustering, does not create multiple clusters of identical size, nor does it
require a pre-defined number of clusters. Of the two different types of hierarchical clustering - Agglomerative and
Divisive - Agglomerative, or bottom-up clustering is used here.

Agglomerative Clustering assigns each observation to its own individual cluster before iteratively joining the two
most similar clusters. This process repeats until only a singular cluster remains.

Given a positive empirical correlation matrix $C$ generated using $n$ features, the procedure given below
returns as an output a rooted tree and a filtered correlation matrix $C^<$ of elements $c^<_{ij}$.

First, set $C^< = C$.

Then, beginning with the most highly correlated features (clusters) $h$ and $k$ and the correlation
between them, $c_{hk}$, one sets the elements $c^<_{ij} = c^<_{ji} = c_{hk}$ for all $i \in h$ and $j \in k$.

The matrix $C^<$ is then redefined such that:

$$\begin{cases} c^<_{qj} = f(c^<_{hj}, c^<_{kj}) & where \ j \notin h \ and \ j \notin k \\ c^<_{ij} = c^<_{ij} & otherwise \end{cases}$$

where $q$ is the cluster obtained by merging $h$ and $k$, and $f(c^<_{hj}, c^<_{kj})$ is any distance metric. In effect, this merges the clusters $h$ and $k$.
These steps are then completed for the next two most similar clusters, and are repeated for a total
of $n-1$ iterations, until only a single cluster remains.

There are a few methods to use with Hierarchical Clustering for calculating the distance metric, here are 3 of them:

- **Single** $d(u,v) = min(dist(u[i], v[j]))$ for all points $i$ in cluster $u$ and $j$ in cluster $v$. This is also known as the Nearest Point Algorithm.

- **Complete** $d(u,v) = max(dist(u[i], v[j]))$ for all points $i$ in cluster $u$ and $j$ in cluster $v$. This is also known as the Farthest Point Algorithm or Voor Hees Algorithm.

- **Average** $d(u,v) = \displaystyle\sum_{ij} \frac{d(u[i], v[j])}{|u| * |v|}$ for all points $i$ and $j$, where $|u|$ and $|v|$ are the numbers of points in clusters $u$ and $v$, respectively. This is also called the UPGMA algorithm.

We can use this method by setting the Hierarchical Clustering method parameter to one of the available methods.
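To see how the three linkage rules differ, the following sketch runs SciPy's `linkage` function on a small toy correlation matrix. The matrix and the conversion of correlations to distances as $1 - c_{ij}$ are assumptions for this illustration. The sketch shows the linkage rules only and is not the filtering procedure implemented in PortfolioLab.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Toy positive correlation matrix of 4 features (hypothetical numbers)
toy_corr = np.array([
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])

# Assumed distance: 1 - correlation, converted to the condensed form linkage expects
condensed = squareform(1.0 - toy_corr, checks=False)

# Each linkage matrix has n - 1 rows, one per merge; column 2 holds the merge distance
linkages = {m: linkage(condensed, method=m) for m in ('single', 'complete', 'average')}

for method, merges in linkages.items():
    print(method, 'final merge distance:', round(merges[-1, 2], 4))
```

All three methods first merge the most correlated pair, but they disagree on the distance at which the remaining two clusters are joined: the minimum, the maximum and the mean of the pairwise distances, respectively.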

In [20]:
# Getting the Empirical Correlation matrix from the Covariance matrix.
# The covariances are squared element-wise so that all entries are positive.
empirical_corr = risk_estimators.cov_to_corr(empirical_cov ** 2)

# Finding the De-noised Correlation matrix using the Hierarchical Clustering De-noising Average Method
corr_matrix_hierarchical = risk_estimators.filter_corr_hierarchical(empirical_corr.to_numpy(), method='average', draw_plot=False)

# Transforming De-noised Correlation from np.array to pd.DataFrame
corr_matrix_hierarchical = pd.DataFrame(corr_matrix_hierarchical, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('The Hierarchical Clustering De-noised Correlation matrix is:')
corr_matrix_hierarchical

The Hierarchical Clustering De-noised Correlation matrix is:


Unnamed: 0,EEM,EWG,TIP,EWJ,EFA
EEM,1.0,0.446184,0.446184,0.446184,0.617114
EWG,0.446184,1.0,0.29843,0.29843,0.617114
TIP,0.446184,0.29843,1.0,0.017605,0.617114
EWJ,0.446184,0.29843,0.017605,1.0,0.617114
EFA,0.617114,0.617114,0.617114,0.617114,1.0


### De-toning

De-noised correlation matrix from the previous methods can also be de-toned by excluding a number of first
eigenvectors representing the market component.

According to Lopez de Prado:

"Financial correlation matrices usually incorporate a market component. The market component is characterized by the
first eigenvector, with loadings $W_{n,1} \approx N^{-\frac{1}{2}}, n = 1, ..., N.$
Accordingly, a market component affects every item of the covariance matrix. In the context of clustering
applications, it is useful to remove the market component, if it exists (a hypothesis that can be
tested statistically)."

"By removing the market component, we allow a greater portion of the correlation to be explained
by components that affect specific subsets of the securities. It is similar to removing a loud tone
that prevents us from hearing other sounds"

"The detoned correlation matrix is singular, as a result of eliminating (at least) one eigenvector.
This is not a problem for clustering applications, as most approaches do not require the invertibility
of the correlation matrix. Still, **a detoned correlation matrix** $C_{detoned}$ **cannot be used directly for**
**mean-variance portfolio optimization**."

The de-toning function works as follows:

- De-toning is applied on the de-noised correlation matrix.

- The correlation matrix representing the market component is calculated from market component eigenvectors and eigenvalues
  and then subtracted from the de-noised correlation matrix. This way the de-toned correlation matrix is obtained.
  
$$\hat{C} = C_{denoised} - W_m \Lambda_m W_m'$$

- De-toned correlation matrix $\hat{C}$ is then rescaled so that the main diagonal consists of 1s

$$C_{detoned} = \hat{C} [(diag[\hat{C}])^\frac{1}{2}(diag[\hat{C}])^{\frac{1}{2}'}]^{-1}$$

We can apply de-toning by setting the detone parameter to True.
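The two de-toning formulas can be sketched in numpy. For brevity the sketch is applied directly to the correlation matrix of the five assets rather than to a de-noised one, and the name `detone_sketch` is hypothetical. It is an illustration of the formulas, not the PortfolioLab implementation.

```python
import numpy as np

# Correlation matrix of EEM, EWG, TIP, EWJ, EFA (values from the cov_to_corr output in this notebook)
corr = np.array([
    [1.0, 0.840079, -0.175654, 0.775864, 0.90119],
    [0.840079, 1.0, -0.176822, 0.75206, 0.943192],
    [-0.175654, -0.176822, 1.0, -0.132683, -0.168585],
    [0.775864, 0.75206, -0.132683, 1.0, 0.859232],
    [0.90119, 0.943192, -0.168585, 0.859232, 1.0],
])

def detone_sketch(corr, market_component=1):
    """Remove the first market_component eigenvectors and rescale."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Sort eigenvalues and eigenvectors in descending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    w_m = eigvecs[:, :market_component]
    lambda_m = np.diag(eigvals[:market_component])
    # C_hat = C - W_m Lambda_m W_m'
    c_hat = corr - w_m @ lambda_m @ w_m.T
    # Rescale so that the main diagonal consists of 1s
    scale = np.sqrt(np.diag(c_hat))
    return c_hat / np.outer(scale, scale)

corr_detoned = detone_sketch(corr, market_component=1)
print(corr_detoned.round(4))
```

The smallest eigenvalue of the result is numerically zero, which illustrates why the de-toned matrix is singular and cannot be inverted.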

In [21]:
# Finding the De-toned Covariance matrix, treating the first eigenvector as the market component
cov_matrix_detoned = risk_estimators.denoise_covariance(cov_matrix, tn_relation, denoise_method='const_resid_eigen',
                                                        detone=True, market_component=1, kde_bwidth=kde_bwidth)

# Transforming De-toned Covariance from np.array to pd.DataFrame
cov_matrix_detoned = pd.DataFrame(cov_matrix_detoned, index=cov_matrix.index, columns=cov_matrix.columns)

# Outputting the result
print('The De-toned Сovariance matrix is:')
cov_matrix_detoned

The De-toned Сovariance matrix is:


|     | EEM | EWG | TIP | EWJ | EFA |
|-----|-----|-----|-----|-----|-----|
| EEM | 0.000466 | -0.000108 | 2.2e-05 | -8.9e-05 | -8.9e-05 |
| EWG | -0.000108 | 0.000372 | 2e-05 | -7.9e-05 | -7.9e-05 |
| TIP | 2.2e-05 | 2e-05 | 1.9e-05 | 1.6e-05 | 1.8e-05 |
| EWJ | -8.9e-05 | -7.9e-05 | 1.6e-05 | 0.000232 | -6.6e-05 |
| EFA | -8.9e-05 | -7.9e-05 | 1.8e-05 | -6.6e-05 | 0.000278 |


In [22]:
# plt.figure(figsize=(6, 5))
# sns.heatmap(cov_matrix_detoned)
# plt.title('Detoned Covariance', size=15)
# plt.tight_layout()
# plt.savefig('cov_matrix_detoned.png', dpi=150)
# plt.show()

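The results of de-toning differ significantly from the de-noising results. For example, the covariance between EEM and EWG is -0.000108 after de-toning, compared with 0.00035 in the original covariance matrix. This indicates that the removed market component accounted for a large part of the covariance between the assets.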

## Transforming covariance matrix to correlation matrix and back

Two simple helper functions are provided:
- `cov_to_corr` transforms a covariance matrix into a correlation matrix
- `corr_to_cov` transforms a correlation matrix back into a covariance matrix, given the standard deviations
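
Both transformations rely on the relation $corr_{ij} = cov_{ij} / (\sigma_i \sigma_j)$, where $\sigma_i$ is the square root of the $i$-th diagonal element of the covariance matrix. The sketch below illustrates this in NumPy. The function names `cov_to_corr_sketch` and `corr_to_cov_sketch` are hypothetical and do not reproduce PortfolioLab's implementation. The input uses the rounded EEM, EWG and TIP values from the covariance matrix shown in this notebook.

```python
import numpy as np


def cov_to_corr_sketch(cov):
    """Hypothetical helper: divide each covariance by the product of standard deviations."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    # Guard against floating-point values slightly outside [-1, 1]
    return np.clip(corr, -1.0, 1.0)


def corr_to_cov_sketch(corr, std):
    """Hypothetical helper: multiply each correlation by the product of standard deviations."""
    return corr * np.outer(std, std)


# Rounded EEM / EWG / TIP block of the covariance matrix displayed in this notebook
cov_example = np.array([[0.000466, 0.00035, -1.7e-05],
                        [0.00035, 0.000372, -1.5e-05],
                        [-1.7e-05, -1.5e-05, 1.9e-05]])

corr_example = cov_to_corr_sketch(cov_example)
std_example = np.sqrt(np.diag(cov_example))
cov_roundtrip = corr_to_cov_sketch(corr_example, std_example)

print('Correlation matrix:')
print(corr_example)
print('Round trip recovers the covariance matrix:', np.allclose(cov_roundtrip, cov_example))
```

Since the inputs are rounded, the correlations from this sketch can differ slightly in the later decimals from those computed on the full-precision matrix.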

In [23]:
# Transforming our covariance matrix to a correlation matrix
corr_matrix = risk_estimators.cov_to_corr(cov_matrix)

# Outputting the result
print('The correlation matrix is:')
corr_matrix

The correlation matrix is:


|     | EEM | EWG | TIP | EWJ | EFA |
|-----|-----|-----|-----|-----|-----|
| EEM | 1.0 | 0.840079 | -0.175654 | 0.775864 | 0.90119 |
| EWG | 0.840079 | 1.0 | -0.176822 | 0.75206 | 0.943192 |
| TIP | -0.175654 | -0.176822 | 1.0 | -0.132683 | -0.168585 |
| EWJ | 0.775864 | 0.75206 | -0.132683 | 1.0 | 0.859232 |
| EFA | 0.90119 | 0.943192 | -0.168585 | 0.859232 | 1.0 |


In [24]:
# The standard deviations to use when calculating the covariance matrix back
std = np.diag(cov_matrix) ** (1/2)

# And back to the covariance matrix
cov_matrix_again = risk_estimators.corr_to_cov(corr_matrix, std)

# Outputting the result
print('The covariance matrix calculated back is:')
cov_matrix_again

The covariance matrix calculated back is:


|     | EEM | EWG | TIP | EWJ | EFA |
|-----|-----|-----|-----|-----|-----|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


In [25]:
print('Exactly the same as the original one:')
cov_matrix

Exactly the same as the original one:


|     | EEM | EWG | TIP | EWJ | EFA |
|-----|-----|-----|-----|-----|-----|
| EEM | 0.000466 | 0.00035 | -1.7e-05 | 0.000255 | 0.000324 |
| EWG | 0.00035 | 0.000372 | -1.5e-05 | 0.000221 | 0.000303 |
| TIP | -1.7e-05 | -1.5e-05 | 1.9e-05 | -9e-06 | -1.2e-05 |
| EWJ | 0.000255 | 0.000221 | -9e-06 | 0.000232 | 0.000218 |
| EFA | 0.000324 | 0.000303 | -1.2e-05 | 0.000218 | 0.000278 |


## Conclusion

This notebook describes the functions implemented in the RiskEstimators class, related to different ways of calculating and adjusting the Covariance matrix. Also, it shows how the corresponding functions from the PortfolioLab library can be used and how the outputs can be analyzed.

Key takeaways from the notebook:
- A robust covariance estimator (such as the Minimum Covariance Determinant) is needed in order to discard/downweight the outliers in the data. These outliers seriously affect the Empirical covariance estimator and the Covariance estimators with shrinkage.
- The Maximum Likelihood Estimator (Empirical Covariance) of a sample is an asymptotically unbiased estimator of the corresponding population’s covariance matrix.
- Shrinkage reduces the ratio between the largest and the smallest eigenvalues of the empirical covariance matrix. A smaller ratio makes the inversion of the covariance matrix numerically more stable.
- Ledoit-Wolf and Oracle Approximating are methods to calculate the optimal shrinkage coefficient $\alpha$ used in the Basic Shrinkage.
- The semi-covariance matrix is a way to measure the volatility of negative returns or of returns below a certain threshold.
- Exponential Covariance gives greater weight to the most recent observations, treated as the most relevant ones, when computing the covariance.
- The Constant Residual Eigenvalue De-noising Method calculates the eigenvalues of the correlation matrix and replaces those below the theoretically estimated maximum with a constant value, as they are attributed to noise.
- The Spectral Clustering De-noising Method works just like the Constant Residual Eigenvalue De-noising Method, but instead of adjusting the eigenvalues below the theoretical maximum eigenvalue, it sets them to 0.
- The Targeted Shrinkage De-noising Method shrinks the eigenvectors/eigenvalues that are noise-related.
- The Hierarchical Clustering De-noising Method is used to filter empirical correlation matrices using Agglomerative Clustering.
- The De-toning algorithm removes the first few eigenvectors, which represent the market component. This allows a greater portion of the correlation to be explained by components that affect specific subsets of the securities. Note that a de-toned correlation matrix is singular and cannot be used directly for mean-variance portfolio optimization.