# Course 2: Advanced Portfolio Construction and Analysis With Python
## Module 3: Robust Estimates for the Covariance Matrix


**Question 1**

Load the 49-industry value-weighted returns and cap weights, and use the period 2013-2018, both years included. For the period, use the starting cap weights of the period. Limit yourself to the following 5 industry sectors: 'Hlth', 'Fin', 'Whlsl', 'Rtail', 'Food'.

You will need to compute the correlation matrix as well as the volatilities. (Hint: Remember to annualize the volatilities by multiplying the volatility you get from the monthly data by the square root of 12.)

Using the same value of delta used in the He-Litterman paper of 2.5 and using the same sigma prior methodology used in the notebook and in the paper, compute the implied returns vector.

Which industry sector has the highest capweight?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [1]:
import pandas as pd
import numpy as np

In [2]:
def get_ind_file(filetype, weighting="vw", n_inds=30):
    """
    Load and format the Ken French Industry Portfolios files
    filetype is one of "returns", "nfirms", "size"
    weighting is one of "ew", "vw" (only used for returns)
    n_inds is 30 or 49
    """
    # Compare strings with ==; `is` tests object identity and is unreliable for literals
    if filetype == "returns":
        name = f"{weighting}_rets"
        divisor = 100
    elif filetype == "nfirms":
        name = "nfirms"
        divisor = 1
    elif filetype == "size":
        name = "size"
        divisor = 1
    else:
        raise ValueError("filetype must be one of: returns, nfirms, size")
    
    ind = pd.read_csv(f"data/ind{n_inds}_m_{name}.csv", header=0, index_col=0, na_values=-99.99)/divisor
    ind.index = pd.to_datetime(ind.index, format="%Y%m").to_period('M')
    ind.columns = ind.columns.str.strip()
    return ind

def get_ind_market_caps(n_inds=30, weights=False):
    """
    Load the industry portfolio data and derive the market caps
    """
    ind_nfirms = get_ind_file('nfirms', n_inds=n_inds)
    ind_size = get_ind_file('size', n_inds=n_inds)
    ind_mktcap = ind_nfirms * ind_size
    if weights:
        total_mktcap = ind_mktcap.sum(axis=1)
        ind_capweight = ind_mktcap.divide(total_mktcap, axis="rows")
        return ind_capweight
    return ind_mktcap

def weight_cw(r, cap_weights, **kwargs):
    """
    Returns the weights of the CW portfolio based on the time series of capweights
    """
    w = cap_weights.loc[r.index[0]]
    return w/w.sum()

In [3]:
# Load the 49 value-weighted industry portfolio returns and cap weights for 2013-2018
industries = ['Hlth', 'Fin', 'Whlsl', 'Rtail', 'Food']
ind_returns = get_ind_file('returns', 'vw', n_inds=49).loc['2013':'2018', industries]
ind_mcap = get_ind_market_caps(n_inds=49, weights=True).loc['2013':'2018', industries]
# Average cap weight over the period, renormalized across the 5 sectors
# (the question asks for the starting weights, i.e. the first row: ind_mcap.iloc[0])
weights = ind_mcap.mean(axis=0)
weights = weights/weights.sum()

In [4]:
weights.idxmax()

'Rtail'
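
The cell above averages the cap weights over the whole period, while the question asks for the weights at the start of the period. As a rough sketch of the difference, the toy table below (made-up numbers and hypothetical sector names `A`, `B`, `C`, not the Ken French data) shows that the two choices can rank sectors differently:

```python
import pandas as pd

# Hypothetical monthly cap-weight table (illustrative values only)
capw = pd.DataFrame(
    {"A": [0.02, 0.05, 0.06], "B": [0.05, 0.04, 0.03], "C": [0.03, 0.02, 0.03]},
    index=pd.period_range("2013-01", periods=3, freq="M"),
)

# Weights at the first date of the period, renormalized to sum to 1
start_w = capw.iloc[0] / capw.iloc[0].sum()

# Average weights over the period, renormalized to sum to 1
mean_w = capw.mean() / capw.mean().sum()

print("largest at start:  ", start_w.idxmax())
print("largest on average:", mean_w.idxmax())
```

Whether the two approaches agree on the real data depends on how much the weights drift over 2013-2018, so it is worth checking both before submitting an answer.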

**Question 2**

Use the same data as the previous question, which industry sector has the highest implied return?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [5]:
def implied_returns(delta, sigma, w):
    """
    Obtain the implied expected returns by reverse engineering the weights
    Inputs:
    delta: Risk Aversion Coefficient (scalar)
    sigma: Variance-Covariance Matrix (N x N) as DataFrame
        w: Portfolio weights (N x 1) as Series
    Returns an N x 1 vector of Returns as Series
    """
    ir = delta * sigma.dot(w).squeeze() # to get a series from a 1-column dataframe
    ir.name = 'Implied Returns'
    return ir

In [6]:
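rho = ind_returns.corr()
vol = ind_returns.std() * np.sqrt(12)  # annualized volatilities
# Caution: vol is a Series, so vol.dot(vol.T) is the scalar sum of squared vols rather than
# the outer product; the outputs below come from this line as written
sigma_prior = vol.dot(vol.T) * rho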

impl_returns = implied_returns(2.5, sigma_prior, weights)
impl_returns

Hlth     0.153260
Fin      0.177403
Whlsl    0.202476
Rtail    0.224144
Food     0.156550
Name: Implied Returns, dtype: float64

In [7]:
impl_returns.idxmax()

'Rtail'
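
One thing to watch in the `sigma_prior` cell: `ind_returns.std()` returns a Series, and the dot product of a Series with itself is a single number, not a matrix. The sketch below uses two hypothetical assets with made-up volatilities and correlation to contrast that scalar with the outer product, which is what a covariance matrix of the form sigma_ij = vol_i * vol_j * rho_ij requires. `np.outer` is one way to build it; it is an assumption here, not necessarily what the course notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical annualized vols and correlations (illustrative values only)
vol = pd.Series([0.10, 0.20], index=["A", "B"])
rho = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=vol.index, columns=vol.index)

# Series.T is the Series itself, so this is the inner product: 0.10**2 + 0.20**2
scalar = vol.dot(vol.T)

# Outer product of the vols, then element-wise product with the correlations
outer = pd.DataFrame(np.outer(vol, vol), index=vol.index, columns=vol.index)
sigma = outer * rho

print(scalar)
print(sigma)
```

A quick sanity check on any covariance matrix built this way is that the square roots of its diagonal reproduce the volatilities.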

**Question 3**

Use the same data and assumptions as the previous question.

Which industry sector has the lowest implied return?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [8]:
impl_returns.idxmin()

'Hlth'

**Question 4**

Impose the subjective relative view that Hlth will outperform Rtail and Whlsl by 3% (Hint: Use the same logic as View 1 in the He-Litterman paper)

What is the entry you will use for the Pick Matrix P for Whlsl? (Hint: Remember to use the correct sign)

Enter the number as a positive or negative number correct to at least two decimal places (e.g. -0.23 or +0.23)

In [9]:
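q = pd.Series([.03])
p = pd.DataFrame([0] * len(industries), index = industries).T
# Rtail and Whlsl shares of their combined cap weight
w_rt = weights.loc['Rtail'] / (weights.loc['Rtail'] + weights.loc['Whlsl'])
w_wh = weights.loc['Whlsl'] / (weights.loc['Rtail'] + weights.loc['Whlsl'])

# Hlth gets +1; the underperformers get minus their cap-weight share, so the row sums to zero
p.loc[0, 'Hlth'] = 1  # .loc avoids chained assignment, which may not update p
p['Rtail'] = -w_rt
p['Whlsl'] = -w_wh
p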

   Hlth  Fin     Whlsl     Rtail  Food
0     1    0 -0.156321 -0.843679     0


In [10]:
p['Whlsl'].round(2)

0   -0.16
Name: Whlsl, dtype: float64

**Question 5**

Impose the subjective relative view that Hlth will outperform Rtail and Whlsl by 3% (Hint: Use the same logic as View 1 in the He-Litterman paper)

What is the entry you will use for the Pick Matrix P for Rtail? (Hint: Remember to use the correct sign)

Enter the number as a positive or negative number correct to at least two decimal places (e.g. -0.234 or +0.234)

In [11]:
p['Rtail'].round(2)

0   -0.84
Name: Rtail, dtype: float64
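
The pick-matrix logic used for Questions 4 and 5 can be sketched as a small helper: the outperformer gets +1 and each underperformer gets minus its share of the underperformers' combined cap weight. The function name `relative_view_row` and the weights below are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical cap weights (illustrative, not the quiz data)
w = pd.Series({"X": 0.5, "Y": 0.1, "Z": 0.4})

def relative_view_row(w, outperformer, underperformers):
    """One row of P: +1 on the outperformer, minus cap-weight shares on the rest."""
    row = pd.Series(0.0, index=w.index)
    row.loc[outperformer] = 1.0
    # Each underperformer's share of their combined cap weight
    share = w.loc[underperformers] / w.loc[underperformers].sum()
    row.loc[underperformers] = -share
    return row

row = relative_view_row(w, "X", ["Y", "Z"])
print(row)
```

Because the negative entries are shares that sum to one, the row sums to zero, which is what makes the view a relative one rather than an absolute one.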

**Question 6**

Impose the subjective relative view that Hlth will outperform Rtail and Whlsl by 3% (Hint: Use the same logic as View 1 in the He-Litterman paper)

Once you impose this view (use delta = 2.5 and tau = 0.05 as in the paper), which sector has the lowest implied return?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [12]:
from numpy.linalg import inv

def bl(w_prior, sigma_prior, p, q,
       omega=None,
       delta=2.5, tau=.02):
    """
    Computes the posterior expected returns based on
    the original Black-Litterman reference model

    w_prior must be an N x 1 vector of weights, a Series
    sigma_prior is an N x N covariance matrix, a DataFrame
    p must be a K x N matrix linking q and the assets, a DataFrame
    q must be a K x 1 vector of views, a Series
    omega must be a K x K matrix, a DataFrame, or None
    if omega is None, we assume it is
        proportional to the variance of the prior
    delta and tau are scalars
    """
    if omega is None:
        omega = proportional_prior(sigma_prior, tau, p)
    # First, reverse-engineer the weights to get pi
    pi = implied_returns(delta, sigma_prior, w_prior)
    # Adjust (scale) sigma by the uncertainty scaling factor
    sigma_prior_scaled = tau * sigma_prior
    # Posterior estimate of the mean, using the "Master Formula" in the form
    # that does not require omega to be inverted. With '@' for matrix multiplication:
    #     mu_bl = pi + sigma_prior_scaled @ p.T @ inv(p @ sigma_prior_scaled @ p.T + omega) @ (q - p @ pi)
    mu_bl = pi + sigma_prior_scaled.dot(p.T).dot(inv(p.dot(sigma_prior_scaled).dot(p.T) + omega).dot(q - p.dot(pi).values))
    # Posterior estimate of the uncertainty of mu_bl:
    #     sigma_bl = sigma_prior + sigma_prior_scaled - sigma_prior_scaled @ p.T @ inv(p @ sigma_prior_scaled @ p.T + omega) @ p @ sigma_prior_scaled
    sigma_bl = sigma_prior + sigma_prior_scaled - sigma_prior_scaled.dot(p.T).dot(inv(p.dot(sigma_prior_scaled).dot(p.T) + omega)).dot(p).dot(sigma_prior_scaled)
    return (mu_bl, sigma_bl)

def proportional_prior(sigma, tau, p):
    """
    Returns the He-Litterman simplified Omega
    Inputs:
    sigma: N x N Covariance Matrix as DataFrame
    tau: a scalar
    p: a K x N DataFrame linking Q and Assets
    returns a K x K DataFrame, a Matrix representing Prior Uncertainties
    """
    helit_omega = p.dot(tau * sigma).dot(p.T)
    # Make a diag matrix from the diag elements of Omega
    return pd.DataFrame(np.diag(np.diag(helit_omega.values)),index=p.index, columns=p.index)

In [13]:
delta = 2.5
tau = 0.05

bl_mu, bl_sigma = bl(weights, sigma_prior, p, q, delta=delta, tau=tau)
bl_mu

Hlth     0.179802
Fin      0.171057
Whlsl    0.193950
Rtail    0.199403
Food     0.143844
dtype: float64

In [14]:
bl_mu.idxmin()

'Food'
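
To see what the master formula does with a single view, here is a minimal NumPy sketch on two hypothetical assets (all numbers are made up). With the He-Litterman proportional Omega and one view, the view variance equals the prior variance of the view portfolio, so the posterior spread `P @ mu_bl` should land halfway between the prior spread `P @ pi` and the view `q`.

```python
import numpy as np

delta, tau = 2.5, 0.05
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])      # hypothetical covariance matrix
w_eq = np.array([0.6, 0.4])           # hypothetical equilibrium weights
P = np.array([[1.0, -1.0]])           # view: asset 0 outperforms asset 1
q = np.array([0.03])                  # ... by 3%

pi = delta * sigma @ w_eq             # implied equilibrium returns
ts = tau * sigma                      # scaled prior covariance
omega = np.diag(np.diag(P @ ts @ P.T))  # proportional Omega (diagonal of P tau Sigma P')

# Master formula in the form that does not invert Omega
mu_bl = pi + ts @ P.T @ np.linalg.inv(P @ ts @ P.T + omega) @ (q - P @ pi)

prior_spread = (P @ pi)[0]
post_spread = (P @ mu_bl)[0]
print(prior_spread, post_spread, q[0])
```

This halfway property is specific to one view with the proportional Omega; supplying a different `omega` changes how far the posterior moves toward the view.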

**Question 7**

Impose the subjective relative view that Hlth will outperform Rtail and Whlsl by 3% (Hint: Use the same logic as View 1 in the He-Litterman paper)

Which sector now has the highest weight in the MSR portfolio using the Black-Litterman model?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [15]:
def w_star(delta, sigma, mu):
    """
    Unconstrained optimal weights, (delta * sigma)^-1 mu; not rescaled to sum to 1
    """
    return (inverse(sigma).dot(mu))/delta

def inverse(d):
    """
    Invert the dataframe by inverting the underlying matrix
    """
    return pd.DataFrame(inv(d.values), index=d.columns, columns=d.index)

In [16]:
wstar = w_star(delta=2.5, sigma=bl_sigma, mu=bl_mu)
wstar

Hlth     0.262911
Fin      0.177747
Whlsl    0.060301
Rtail    0.325453
Food     0.125969
dtype: float64

In [17]:
wstar.idxmax()

'Rtail'

**Question 8**

Impose the subjective relative view that Hlth will outperform Rtail and Whlsl by 3% (Hint: Use the same logic as View 1 in the He-Litterman paper)

Which sector now has the lowest weight in the MSR portfolio using the Black-Litterman model?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [18]:
wstar.idxmin()

'Whlsl'
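
`w_star` is the inverse of `implied_returns`: one maps weights to returns via delta * sigma * w, the other maps returns back to weights via (delta * sigma)^-1 mu. A small round trip on hypothetical numbers illustrates this; `np.linalg.solve` is used here in place of an explicit inverse, which is a design choice of this sketch rather than what the notebook does.

```python
import numpy as np

delta = 2.5
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])   # hypothetical covariance matrix
w_eq = np.array([0.6, 0.4])        # hypothetical starting weights

pi = delta * sigma @ w_eq                     # weights -> implied returns
w_back = np.linalg.solve(delta * sigma, pi)   # implied returns -> weights

print(w_back)
```

The weights returned by `w_star` are not rescaled: the five weights printed for Question 7 add up to about 0.952, which matches 1/(1 + tau) for tau = 0.05.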

**Question 9**

Now, let’s assume you change the relative view. You still think that Hlth will outperform Rtail and Whlsl, but you think that the outperformance will be 5%, not the 3% you originally anticipated.

Which of the arrays will you need to update?

- Both P and Q
- Neither P nor Q but a different parameter
- Q and not P (correct)
- P and not Q

**Question 10**

Now, let’s assume you change the relative view. You still think that Hlth will outperform Rtail and Whlsl, but you think that the outperformance will be 5%, not the 3% you originally anticipated.

Under this new view which sector has the highest expected return?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [19]:
q = pd.Series([.05])

bl_mu, bl_sigma = bl(weights, sigma_prior, p, q, delta=delta, tau=tau)
bl_mu

Hlth     0.185247
Fin      0.169755
Whlsl    0.192201
Rtail    0.194327
Food     0.141238
dtype: float64

In [20]:
bl_mu.idxmax()

'Rtail'

**Question 11**

Now, let’s assume you change the relative view. You still think that Hlth will outperform Rtail and Whlsl, but you think that the outperformance will be 5%, not the 3% you originally anticipated.

Under this new view which sector does the Black-Litterman model assign the highest weight?

Enter your answer as text, exactly as they are named in the Data file (i.e. 'Hlth', 'Fin', 'Whlsl', 'Rtail', or 'Food')

In [21]:
wstar = w_star(delta=2.5, sigma=bl_sigma, mu=bl_mu)
wstar

Hlth     0.310142
Fin      0.177747
Whlsl    0.052918
Rtail    0.285604
Food     0.125969
dtype: float64

In [22]:
wstar.idxmax()

'Hlth'
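
Comparing the weights for the 3% and 5% views, the Fin and Food weights are identical in both outputs: only the sectors named in the view move. The sketch below reproduces that behaviour on three hypothetical assets, with a toy `bl_weights` helper (a name introduced here for illustration) that chains the same steps as `bl` and `w_star` in plain NumPy.

```python
import numpy as np

def bl_weights(w_eq, sigma, P, q, delta=2.5, tau=0.05):
    """Toy Black-Litterman pipeline with the He-Litterman proportional Omega."""
    pi = delta * sigma @ w_eq                       # implied equilibrium returns
    ts = tau * sigma                                # scaled prior covariance
    omega = np.diag(np.diag(P @ ts @ P.T))          # proportional Omega
    A = np.linalg.inv(P @ ts @ P.T + omega)
    mu = pi + ts @ P.T @ A @ (q - P @ pi)           # posterior mean
    sigma_post = sigma + ts - ts @ P.T @ A @ P @ ts  # posterior covariance
    return np.linalg.solve(delta * sigma_post, mu)  # unconstrained optimal weights

# Hypothetical inputs (illustrative values only)
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_eq = np.array([0.5, 0.3, 0.2])
P = np.array([[1.0, -1.0, 0.0]])   # asset 2 is not part of the view

w_3 = bl_weights(w_eq, sigma, P, np.array([0.03]))
w_5 = bl_weights(w_eq, sigma, P, np.array([0.05]))
print(w_3)
print(w_5)
```

In this setup the asset outside the view keeps its equilibrium weight scaled by 1/(1 + tau) regardless of `q`, while a stronger view shifts more weight toward the outperformer and away from the underperformer.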