Successfully measuring volatility would allow for more accurate modeling of the returns and more stable investments leading to greater returns, but <font color='blue'>forecasting volatility accurately is a difficult problem.

<font color='blue'>Volatility needs to be forward-looking and predictive in order to make smart decisions.

Unfortunately, simply taking the historical standard deviation of an individual asset's returns falls short once we take into account the need for estimates that remain robust in the future.

<font color='blue'>To model how a portfolio overall changes, it is important to look not only at the volatility of each asset in the portfolio, but also at the pairwise covariances of every asset involved.

The relationship between two or more assets provides valuable insights and a path towards reduction of overall portfolio volatility.
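
As a minimal sketch of why pairwise covariances matter, the variance of a portfolio with weight vector $w$ and covariance matrix $\Sigma$ is $w^T \Sigma w$. The two-asset figures below (20% and 30% volatility, equal weights) are hypothetical and chosen only for illustration, and the helper name `portfolio_volatility` is an assumption rather than part of this lecture.

```python
import numpy as np

def portfolio_volatility(weights, cov_matrix):
    """Standard deviation of portfolio returns: sqrt(w^T * Sigma * w)."""
    weights = np.asarray(weights, dtype=float)
    cov_matrix = np.asarray(cov_matrix, dtype=float)
    return np.sqrt(weights.dot(cov_matrix).dot(weights))

# Hypothetical inputs: two assets with 20% and 30% volatility, equally weighted
vols = np.array([0.20, 0.30])
weights = np.array([0.5, 0.5])

for corr in (0.9, 0.0):
    cov = corr * vols[0] * vols[1]            # covariance implied by the correlation
    sigma = np.array([[vols[0] ** 2, cov],
                      [cov, vols[1] ** 2]])
    print('correlation {0:.1f}: portfolio volatility {1:.4f}'.format(
        corr, portfolio_volatility(weights, sigma)))
```

With the same individual volatilities, the portfolio built from the lower-covariance pair comes out less volatile, which is the diversification effect described above.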

<font color='blue'>A large number of assets with low pairwise covariances will tend to rise and fall largely independently of each other. 

In statistics and probability, <font color='blue'>the covariance is a measure of the joint variability of two random variables.

<font color='blue'>When random variables exhibit similar behavior, there tends to be a high covariance between them. 

Mathematically, we express the covariance of X with respect to Y as:
<font color='blue'>$ COV(X, Y) = E[(X - E[X])(Y - E[Y])]$

<font color='blue'>If two assets have a high covariance, they will generally behave the same way. 

<font color='blue'>Assets with particularly high covariance can essentially replace each other.

<font color='blue'>We use covariances to quantify the joint risk of assets, which shapes how we view the risk of an entire portfolio.

What is key is that investing in assets that have high pairwise covariances provides little diversification because of how closely their fluctuations are related.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from sklearn import covariance

In [None]:
# Generate random values of X, then build Y as a noisy linear function of X
X = np.random.normal(size = 1000)
epsilon = np.random.normal(0, 3, size = len(X))
Y = 5*X + epsilon

# Covariance as the mean product of deviations: E[(X - E[X])(Y - E[Y])]
product = (X - np.mean(X))*(Y - np.mean(Y))
expected_value = np.mean(product)

print('Value of the covariance between X and Y: {}'.format(expected_value))

In [None]:
np.cov([X, Y])

In [None]:
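# ddof=1 matches the n - 1 normalization that np.cov uses on its diagonal
print('{0} {1}'.format(np.var(X, ddof=1), np.var(Y, ddof=1)))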

Covariance matrices are symmetric, since <font color='blue'>$COV(X, Y) = COV(Y, X)$,</font> which is why the off-diagonals mirror each other.
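
As a rough check of the example above: with $X \sim N(0, 1)$ and $Y = 5X + \epsilon$, where $\epsilon$ has standard deviation 3 and is independent of $X$, theory gives $COV(X, Y) = 5 \cdot VAR(X) = 5$ and $VAR(Y) = 25 + 9 = 34$. The sketch below regenerates the data with a larger sample so the estimates sit close to those values; the seed and the sample size are assumptions made only for repeatability.

```python
import numpy as np

rng = np.random.RandomState(0)            # fixed seed so the check is repeatable
n = 100000                                # large sample, so estimates sit near theory
X = rng.normal(size=n)
Y = 5 * X + rng.normal(0, 3, size=n)

C = np.cov([X, Y])                        # 2x2 matrix; np.cov divides by n - 1

print(C)
print('symmetric: {}'.format(np.allclose(C, C.T)))
print('diagonal matches sample variances: {}'.format(
    np.allclose(np.diag(C), [np.var(X, ddof=1), np.var(Y, ddof=1)])))
```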

In [None]:
# Scatter plot of X and Y with a fitted regression line
from statsmodels import regression
import statsmodels.api as sm
def linreg(X,Y):
    # Running the linear regression
    X = sm.add_constant(X)
    model = regression.linear_model.OLS(Y, X).fit()
    a = model.params[0]
    b = model.params[1]
    X = X[:, 1]

    # Return summary of the regression and plot results
    X2 = np.linspace(X.min(), X.max(), 100)
    Y_hat = X2 * b + a
    plt.scatter(X, Y, alpha=0.3) # Plot the raw data
    plt.plot(X2, Y_hat, 'r', alpha=0.9);  # Add the regression line, colored in red
    plt.xlabel('X Value')
    plt.ylabel('Y Value')
    return model.summary()

linreg(X, Y)
plt.scatter(X, Y)
plt.title('Scatter plot and linear equation of Y as a function of X')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(['Linear equation', 'Scatter Plot']);

<font color='blue'>If we take the pairwise covariances of $N$ assets, we get an $N \times N$ covariance matrix.
$$ \Sigma = \left[\begin{matrix}
VAR(X_1) & COV(X_1, X_2) & \cdots & COV(X_1, X_N) \\
COV(X_2, X_1) & VAR(X_2) & \cdots & COV(X_2, X_N) \\
\vdots & \vdots & \ddots & \vdots \\
COV(X_N, X_1) & COV(X_N, X_2) & \cdots & VAR(X_N)
\end{matrix}\right] $$ 

In [None]:
# Four asset example of the covariance matrix.
start_date = '2016-01-01'
end_date = '2016-02-01'

returns = get_pricing(
    ['SBUX', 'AAPL', 'GS', 'GILD'],
    start_date=start_date,
    end_date=end_date,
    fields='price'
).pct_change()[1:]
returns.columns = [asset.symbol for asset in returns.columns]

print('Covariance matrix:')
print(returns.cov())

We measure the covariance of the assets in our portfolio to make sure we have an accurate picture of the risks involved in holding those assets together.

<font color='blue'>Estimating the covariance matrix becomes critical when using methods that rely on it, as we cannot know the true statistical relationships underlying our chosen assets.

<font color='blue'>The stability and accuracy of these estimates are essential to getting stable weights that encapsulate our risks and intentions.

Unfortunately, <font color='blue'>the most obvious way to calculate a covariance matrix estimate, the sample covariance, is notoriously unstable.

<font color='blue'>If we have fewer time observations of our assets than the number of assets ($T < N$), the estimate becomes especially unreliable.

<font color='blue'>The extreme values react more strongly to changes, and as the extreme values of the covariance jump around, our optimizers are perturbed, giving us inconsistent weights. 

<font color='blue'>Even if we have more time elements than assets that we are trading, we can run into issues, as the time component may span multiple regimes, giving us covariance matrices that are still inaccurate.
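
One concrete symptom of the $T < N$ case mentioned above is that the sample covariance matrix cannot have full rank: with $T$ observations it has rank at most $T - 1$, so it is singular. The sketch below illustrates this on synthetic data; the sizes (20 observations, 50 assets) and the normally distributed "returns" are assumptions for illustration, not market data.

```python
import numpy as np

rng = np.random.RandomState(1)
n_obs, n_assets = 20, 50                           # hypothetical sizes with T < N
fake_returns = rng.normal(0, 0.01, size=(n_obs, n_assets))

sample_cov = np.cov(fake_returns, rowvar=False)    # N x N sample covariance
rank = np.linalg.matrix_rank(sample_cov)

print('shape: {0}, rank: {1}'.format(sample_cov.shape, rank))
```

A singular covariance matrix cannot be inverted, which is one reason optimizers that rely on it can behave erratically.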

The solution in many cases is to <font color='blue'>use a robust formulation of the covariance matrix. If we can estimate a covariance matrix that still captures the relationships between assets and is simultaneously more stable, then we can have more faith in the output of our optimizers.

<font color='blue'>The concept of shrinkage stems from the need for stable covariance matrices. The basic way we "shrink" a matrix is to reduce the extreme values of the sample covariance matrix by pulling them closer to the center.

Practically, <font color='blue'>we take a linear combination of the sample covariance matrix and a constant array representing the center.

<font color='blue'>Given a sample covariance matrix, $\textbf{S}$, the mean variance, $\mu$, and the shrinkage constant $\delta$, the shrunk estimated covariance is mathematically defined as:   
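$(1 - \delta)\textbf{S} + \delta\mu\textbf{1}$<br>
Here $\textbf{1}$ is read as the identity matrix, so the target $\mu\textbf{1}$ carries the mean variance on its diagonal, which matches the shrinkage target used by `scikit-learn`.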
 
<font color='blue'>We restrict $\delta$ such that $0 \leq \delta \leq 1$ </font>making this a weighted average between the sample covariance and the mean variance matrix. 
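
The shrinkage formula above can be sketched in a few lines. This assumes that $\textbf{1}$ stands for the identity matrix and that $\mu$ is the average of the diagonal of $\textbf{S}$, which is the convention `scikit-learn`'s `shrunk_covariance` follows; the helper name `shrink` and the 3-asset matrix are hypothetical.

```python
import numpy as np
from sklearn.covariance import shrunk_covariance

def shrink(sample_cov, delta):
    """Return (1 - delta) * S + delta * mu * I, with mu the mean variance."""
    n = sample_cov.shape[0]
    mu = np.trace(sample_cov) / n                  # mean of the diagonal (variances)
    return (1 - delta) * sample_cov + delta * mu * np.identity(n)

# Hypothetical 3-asset sample covariance matrix
S = np.array([[4.0, 2.0, 0.5],
              [2.0, 9.0, -1.0],
              [0.5, -1.0, 1.0]])

shrunk = shrink(S, delta=0.25)
print(shrunk)
print('matches scikit-learn: {}'.format(
    np.allclose(shrunk, shrunk_covariance(S, shrinkage=0.25))))
```

Setting `delta=0` returns the sample covariance unchanged, while `delta=1` discards it entirely in favor of the target.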

The question of how to choose an optimal $\delta$ has been tackled several times. For our purposes, we will use the formulation by Ledoit and Wolf.

In [their paper](http://ledoit.net/honey.pdf), <font color='blue'>Ledoit and Wolf proposed an optimal $\delta$: 
$\hat\delta^* = \max\{0, \min\{\frac{\hat\kappa}{T},1\}\}$<br></font>
$\hat\kappa$ has a mathematical formulation that is beyond the scope of this lecture, but you can find its definition in the paper.

<font color='blue'>The Ledoit-Wolf Estimator is the robust covariance estimate that uses this optimal $\hat\delta^*$ to shrink the sample covariance matrix.</font> We can draw an implementation of it directly from `scikit-learn` for easy use.
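
Before applying the estimator to real prices, here is a small sketch on synthetic data showing what `covariance.ledoit_wolf` returns: the shrunk matrix and the shrinkage constant it selected. The data sizes and the normally distributed "returns" are assumptions for illustration only.

```python
import numpy as np
from sklearn import covariance

rng = np.random.RandomState(2)
n_obs, n_assets = 20, 50                           # hypothetical sizes with T < N
fake_returns = rng.normal(0, 0.01, size=(n_obs, n_assets))

# ledoit_wolf returns the shrunk covariance and the shrinkage constant it chose
lw_cov, delta = covariance.ledoit_wolf(fake_returns)
emp_cov = covariance.empirical_covariance(fake_returns)

print('estimated shrinkage constant: {0:.3f}'.format(delta))
print('smallest eigenvalue, empirical: {0:.2e}'.format(
    np.linalg.eigvalsh(emp_cov).min()))
print('smallest eigenvalue, Ledoit-Wolf: {0:.2e}'.format(
    np.linalg.eigvalsh(lw_cov).min()))
```

The empirical matrix has a smallest eigenvalue of essentially zero here, while the shrunk matrix stays strictly positive definite, so it can be inverted safely.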

In [None]:
# Getting the return data of assets. 
start = '2016-01-01'
end = '2016-02-01'

symbols = ['AAPL', 'MSFT', 'BRK-A', 'GE', 'FDX', 'SBUX']

prices = get_pricing(symbols, start_date = start, end_date = end, fields = 'price')
prices.columns = [asset.symbol for asset in prices.columns]
returns = prices.pct_change()[1:]

In [None]:
returns.head()

In [None]:
in_sample_lw = covariance.ledoit_wolf(returns)[0]
print(in_sample_lw)

<font color='blue'>We can quantify the difference between the in-sample and out-of-sample estimates by taking the absolute difference element-by-element for the two matrices and averaging. We represent this mathematically as: 
$ \frac{1}{n} \sum_{i=1}^{n} |a_i - b_i| $
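
A minimal sketch of this error measure, with $a_i$ and $b_i$ running over every element of the two matrices. The function name `mean_abs_diff` and the 2x2 inputs are hypothetical, chosen so the result can be checked by hand.

```python
import numpy as np

def mean_abs_diff(a, b):
    """Average of |a_i - b_i| over every element of two equally shaped arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean(np.abs(a - b))

# Hypothetical 2x2 matrices: element-wise differences are 0, 1, 2 and 3
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.ones((2, 2))
print(mean_abs_diff(A, B))   # (0 + 1 + 2 + 3) / 4 = 1.5
```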

In [None]:
oos_start = '2016-02-01'
oos_end = '2016-03-01'
oos_prices = get_pricing(symbols, start_date = oos_start, end_date = oos_end, fields = 'price')
oos_prices.columns = [asset.symbol for asset in oos_prices.columns]
oos_returns = oos_prices.pct_change()[1:]
out_sample_lw = covariance.ledoit_wolf(oos_returns)[0]

In [None]:
# Per-asset (column-wise) mean absolute difference between the two estimates
lw_errors = np.mean(np.abs(in_sample_lw - out_sample_lw), axis=0)
print('Average Ledoit-Wolf error: {}'.format(np.mean(lw_errors)))

In [None]:
sample_errors = np.mean(np.abs(returns.cov().values - oos_returns.cov().values), axis=0)
print('Average sample covariance error: {}'.format(np.mean(sample_errors)))

In [None]:
print('Error improvement of LW over sample: {0:.2f}%'.format((np.mean(sample_errors/lw_errors)-1)*100))

In [None]:
sns.boxplot(
    data = pd.DataFrame({
        'Sample Covariance Error': sample_errors,
        'Ledoit-Wolf Error': lw_errors
    })
)
plt.title('Box Plot of Errors')
plt.ylabel('Error');

In [None]:
start_date = '2016-01-01'
end_date = '2017-06-01'

symbols = [
    'SPY', 'XLF', 'XLE', 'XLU','XLK', 'XLI', 'XLB', 'GE', 'GS', 'BRK-A', 'JPM', 'AAPL', 'MMM', 'BA',
    'CSCO','KO', 'DIS','DD', 'XOM', 'INTC', 'IBM', 'NKE', 'MSFT', 'PG', 'UTX', 'HD', 'MCD', 'CVX', 
    'AXP','JNJ', 'MRK', 'CAT', 'PFE', 'TRV', 'UNH', 'WMT', 'VZ', 'QQQ', 'BAC', 'F', 'C', 'CMCSA',
    'MS', 'ORCL', 'PEP', 'HON', 'GILD', 'LMT', 'UPS', 'HP', 'FDX', 'GD', 'SBUX'
]

prices = get_pricing(symbols, start_date=start_date, end_date=end_date, fields='price')
prices.columns = [asset.symbol for asset in prices.columns]
returns = prices.pct_change()[1:]

In [None]:
dates = returns.resample('M').first().index

In [None]:
sample_covs = []
lw_covs = []

for i in range(1, len(dates)):
    # Returns observed between two consecutive monthly dates
    window = returns[dates[i-1]:dates[i]]
    sample_covs.append(window.cov().values)
    lw_covs.append(covariance.ledoit_wolf(window)[0])

In [None]:
# Mean absolute element-wise change between consecutive monthly estimates
lw_diffs = []
for pair in zip(lw_covs[:-1], lw_covs[1:]):
    diff = np.mean(np.abs(pair[0] - pair[1]))
    lw_diffs.append(diff)
    
sample_diffs = []
for pair in zip(sample_covs[:-1], sample_covs[1:]):
    diff = np.mean(np.abs(pair[0] - pair[1]))
    sample_diffs.append(diff)

In [None]:
plt.plot(dates[2:], lw_diffs)
plt.plot(dates[2:], sample_diffs)
plt.xlabel('Time')
plt.ylabel('Mean Error')
plt.legend(['Ledoit-Wolf Errors', 'Sample Covariance Errors']);

<font color='blue'>The Ledoit-Wolf estimator would likely perform even better as the number of assets outpaces the number of observations.