1. Mean-variance optimization goes long the highest Sharpe-ratio assets and shorts the lowest Sharpe-ratio assets.

False. MV optimization does not simply rank assets by Sharpe ratio. The tangency weights are proportional to the inverse covariance matrix times the mean excess returns, so an asset's position depends on its mean and on its covariances with the other assets. An asset with a high Sharpe ratio can be shorted if it is highly correlated with a better asset, and a low-Sharpe asset can be held long if it diversifies the portfolio.
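To illustrate the role of covariances, here is a minimal sketch with made-up numbers. The three assets, their means, volatilities, and correlations are all hypothetical and are not taken from the course data.

```python
import numpy as np

# Hypothetical inputs (not from the course data): three assets with equal volatility
vol = np.array([0.10, 0.10, 0.10])
mu = np.array([0.10, 0.09, 0.02])            # mean excess returns
corr = np.array([[1.00, 0.95, 0.00],
                 [0.95, 1.00, 0.00],
                 [0.00, 0.00, 1.00]])
sigma = np.outer(vol, vol) * corr             # covariance matrix

sharpe = mu / vol                             # individual Sharpe ratios

# Tangency weights are proportional to inv(sigma) @ mu, rescaled to sum to 1
raw = np.linalg.solve(sigma, mu)
weights = raw / raw.sum()

print("Sharpe ratios:   ", sharpe.round(2))
print("Tangency weights:", weights.round(3))
```

In this sketch the second asset has the second-highest Sharpe ratio but is shorted because it is almost a duplicate of the first asset, while the third asset has the lowest Sharpe ratio and still gets a long weight because it is uncorrelated with the others.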

2. Investing in an LETF makes more sense for a long-term horizon than a short-term horizon.

????

3. This week ProShares launches BITO on the NYSE. The ETF holds Bitcoin futures contracts. Suppose in a year from now, 
    we want to try to replicate BITO using SPY and IEF as regressors in a LFD. Because BITO will only have a year of data, 
    we do not trust that we will have a good estimate of the mean return.
    
    Do you suggest that we (in a year) estimate the regression with an intercept or without an intercept? Why?

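I suggest including an intercept. Because we do not trust the estimate of the mean return, the intercept should absorb the mean so that the betas are fit only to how BITO co-moves with SPY and IEF. Without an intercept, the betas would be distorted by trying to match an unreliable mean level as well as the variation.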
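The effect of the intercept on the betas can be sketched with simulated data. Everything below is hypothetical: `x` stands in for a single regressor such as SPY, `y` for the target, and the means, volatilities, and `true_beta` are made-up values chosen so that the target has a large mean the regressor cannot explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_beta = 0.5

# Hypothetical data: the target has a mean (0.02) that the regressor does not explain
x = rng.normal(0.01, 0.04, n)
y = 0.02 + true_beta * x + rng.normal(0.0, 0.02, n)

# With an intercept: the constant column absorbs the mean of y
X_with = np.column_stack([np.ones(n), x])
alpha_hat, beta_with = np.linalg.lstsq(X_with, y, rcond=None)[0]

# Without an intercept: beta alone has to match both the level and the variation
beta_without = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(f"true beta: {true_beta}")
print(f"beta with intercept:    {beta_with:.3f}")
print(f"beta without intercept: {beta_without:.3f}")
```

In this setup the no-intercept beta is pushed away from the true value because it also has to account for the unexplained mean, which is the distortion the intercept is meant to avoid.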

4. Is HDG effective at tracking HFRI in-sample? And out of sample?

Yes. In HW#2 Problem 7 we found that the out-of-sample regression tracked HFRI very closely.

5. A hedge fund claims to beat the market by having a very high alpha. After regressing the hedge fund returns on the 6 Merrill-Lynch style factors, you find the alpha to be negative. Explain why this discrepancy can happen.

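Alpha is always measured relative to a set of factors. We don't know which factors the fund used to calculate its alpha (it could be SPY alone), so returns that look like alpha against its benchmark may be explained as beta by the 6 Merrill-Lynch style factors. It is also possible that the 6 Merrill-Lynch style factors are not the right factors for replicating this portfolio.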
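A small simulation shows how the same fund can have a positive alpha against one benchmark and a negative alpha against another. The factor names, means, loadings, and the true alpha of -0.002 are all hypothetical, and `ols_alpha` is a helper written only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical monthly factor returns (made-up means and volatilities)
mkt = rng.normal(0.010, 0.04, n)      # market factor, e.g. SPY
style = rng.normal(0.005, 0.03, n)    # a second style factor the fund loads on

# The fund has a true alpha of -0.002 once both factors are accounted for
fund = -0.002 + 1.0 * mkt + 1.0 * style + rng.normal(0.0, 0.01, n)

def ols_alpha(y, factors):
    """Return the intercept of an OLS regression of y on the given factor columns."""
    X = np.column_stack([np.ones(len(y))] + list(factors))
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

alpha_vs_market = ols_alpha(fund, [mkt])         # style premium is counted as alpha
alpha_vs_both = ols_alpha(fund, [mkt, style])    # style premium is counted as beta

print(f"alpha vs market only:  {alpha_vs_market:.4f}")
print(f"alpha vs both factors: {alpha_vs_both:.4f}")
```

Against the market alone, the premium earned on the omitted style factor shows up in the intercept; once that factor is included, the intercept falls to roughly the true negative alpha.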

In [3]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Read the 'merrill_factors' sheet of the Excel file
factor_data = pd.read_excel('../data/proshares_analysis_data.xlsx', 'merrill_factors')
# Set the index to Date (rather than default 0 indexing)
factor_data.rename(columns={'Unnamed: 0':'Date'}, inplace=True)
factor_data = factor_data.set_index('Date')

# Subtract the USGG3M Index from every column to get excess returns, then keep only the 5 risky assets
risky_assets_data = factor_data.subtract(factor_data['USGG3M Index'],axis=0).drop(columns=['USGG3M Index'])
risky_assets_data.head()


Date,SPY US Equity,EEM US Equity,EFA US Equity,EUO US Equity,IWM US Equity
2011-08-31,-0.054984,-0.092558,-0.087557,-0.005898,-0.088923
2011-09-30,-0.069438,-0.179083,-0.108101,0.142163,-0.111521
2011-10-31,0.10916,0.163002,0.096289,-0.069489,0.151022
2011-11-30,-0.004064,-0.019724,-0.021763,0.054627,-0.003782
2011-12-31,0.01044,-0.042657,-0.021755,0.075573,0.005135


In [8]:
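# 2. What are the weights of the tangency portfolio?
# The two functions below compute the same weights (when scale_cov=1)

# Named tangency_weights_alt so that the variable tangency_weights assigned below does not shadow it
def tangency_weights_alt(returns,dropna=True,scale_cov=1):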
    if dropna:
        returns = returns.dropna()

    covmat_full = returns.cov()
    covmat_diag = np.diag(np.diag(covmat_full))
    covmat = scale_cov * covmat_full + (1-scale_cov) * covmat_diag

    weights = np.linalg.solve(covmat,returns.mean())
    weights = weights / weights.sum()

    return pd.DataFrame(weights, index=returns.columns,columns=['tangency weights'])

def compute_tangency(excessReturnMatrix):
    # Get the covariance matrix based on excess returns
    sigma = excessReturnMatrix.cov()
    
    # Get the number of assets (5 in this example)
    n = sigma.shape[0]
    
    # Get the vector of mean excess returns
    mu = excessReturnMatrix.mean()
    
    # Get sigma inverse
    sigma_inv = np.linalg.inv(sigma)
    
    # Now we have all the pieces, do the calculation
    weights = (sigma_inv @ mu) / (np.ones(n) @ sigma_inv @ mu)
    
    # Convert back to a Series for convenience
    return pd.Series(weights, index=mu.index)

tangency_weights = compute_tangency(risky_assets_data)
tangency_weights.to_frame('Tangency Weights')

,Tangency Weights
SPY US Equity,2.185642
EEM US Equity,-0.04099
EFA US Equity,-0.993843
EUO US Equity,0.320766
IWM US Equity,-0.471576


In [9]:
# 3. What are the weights of the optimal portfolio with a target of 0.02 return?
# Compute weights for return of 0.02
def compute_weights(excessReturnData, tangency_weights, target_return):
    mu = excessReturnData.mean()
    sigma = excessReturnData.cov()
    n = sigma.shape[0]
    scalar = ((np.ones(n) @ np.linalg.inv(sigma) @ mu) / (mu @ np.linalg.inv(sigma) @ mu)) * target_return
    return scalar * tangency_weights

optimized_portfolio = compute_weights(risky_assets_data, tangency_weights, 0.02)
optimized_portfolio

SPY US Equity    2.626815
EEM US Equity   -0.049263
EFA US Equity   -1.194450
EUO US Equity    0.385513
IWM US Equity   -0.566764
dtype: float64
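As a sanity check on the scaling used in `compute_weights`, the sketch below applies the same scalar to a hypothetical two-asset example and confirms that the scaled weights hit the target mean. The values of `mu` and `sigma` are made up for illustration.

```python
import numpy as np

# Hypothetical two-asset inputs (monthly mean excess returns and covariance)
mu = np.array([0.010, 0.004])
sigma = np.array([[0.0020, 0.0006],
                  [0.0006, 0.0010]])
target_return = 0.02

sigma_inv_mu = np.linalg.solve(sigma, mu)
w_tan = sigma_inv_mu / sigma_inv_mu.sum()     # tangency weights, sum to 1

# Same scalar as in compute_weights: (1' inv(S) mu) / (mu' inv(S) mu) * target
delta = sigma_inv_mu.sum() / (mu @ sigma_inv_mu) * target_return
w_opt = delta * w_tan

achieved = mu @ w_opt                         # mean excess return of the scaled portfolio
print(f"achieved mean: {achieved:.4f}, sum of risky weights: {w_opt.sum():.4f}")
```

Because the tangency weights sum to 1, the sum of the scaled weights equals the scalar itself, which is why the total risky allocation can differ from 1.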

In [11]:
# Are we invested in the risk-free rate?
optimized_portfolio.sum()

1.2018503408544545

Yes. The risky-asset weights sum to 1.202, so the remaining 1 - 1.202 = -0.202 is the position in the risk-free rate: we are short (borrowing) 0.202 at the risk-free rate.
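The risk-free position is just one minus the sum of the risky weights. A minimal sketch, with the weights copied by hand from the printed output above:

```python
# Weights copied from the printed optimized_portfolio output
optimized_weights = {
    'SPY US Equity': 2.626815,
    'EEM US Equity': -0.049263,
    'EFA US Equity': -1.194450,
    'EUO US Equity': 0.385513,
    'IWM US Equity': -0.566764,
}

risky_total = sum(optimized_weights.values())
rf_weight = 1 - risky_total    # negative means borrowing at the risk-free rate

print(f"risky total: {risky_total:.4f}, risk-free weight: {rf_weight:.4f}")
```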

In [12]:
# Report the mean, vol and sharpe of the optimized portfolio
def portfolio_stats(excessReturnData, portfolio_weights):
    # Calculate the mean by multiplying the mean excess returns by the tangency weights and annualizing
    # TODO: double check where these formulas came from (class notes?)
    mean = excessReturnData.mean() @ portfolio_weights * 12

    # Volatility = sqrt(variance), and by class notes: variance = allocation_matrix * covariance_matrix * allocation_matrix
    # Annualize the result with sqrt(12)
    vol = np.sqrt(portfolio_weights @ excessReturnData.cov() @ portfolio_weights) * np.sqrt(12)

    # Sharpe Ratio is mean / vol
    sharpe_ratio = mean / vol

    # Format for easy reading
    return round(pd.DataFrame(data = [mean, vol, sharpe_ratio], 
        index = ['Mean', 'Volatility', 'Sharpe'], 
        columns = ['Portfolio Stats']), 4)
    
portfolio_stats(risky_assets_data, optimized_portfolio)

,Portfolio Stats
Mean,0.24
Volatility,0.1745
Sharpe,1.3757


In [23]:
# Recalculate optimal portfolio using only data through 2018
# First keep only the data through 2018
risky_assets_data_2018 = risky_assets_data.loc[:'2018']

# Compute tangency weights
tangency_weights_2018 = compute_tangency(risky_assets_data_2018)

# Get optimized portfolio
optimized_portfolio_2018 = compute_weights(risky_assets_data_2018, tangency_weights_2018, 0.02)
print(optimized_portfolio_2018)
# Now calculate the returns out of sample (2019+)
risky_assets_data_2019 = risky_assets_data.loc['2019':]
# Multiply the returns from 2019 onward by the optimal weights estimated on data through 2018
# This gives out-of-sample returns based on in-sample weights
df_optimal_port_oos = pd.DataFrame(risky_assets_data_2019 @ optimized_portfolio_2018, columns= ['Optimal Portfolio'])
# Portfolio_stats takes return data and portfolio weights
portfolio_stats(risky_assets_data_2019, optimized_portfolio_2018)

# This does the same as portfolio_stats but only takes returns (matrix multiplication is done outside of function call)
def performanceMetrics(returns, annualization=1):
    metrics = pd.DataFrame(index=returns.columns)
    metrics['Mean'] = returns.mean() * annualization
    metrics['Vol'] = returns.std() * np.sqrt(annualization)
    metrics['Sharpe'] = (returns.mean() / returns.std()) * np.sqrt(annualization)

    metrics['Min'] = returns.min()
    metrics['Max'] = returns.max()

    return metrics
performanceMetrics(df_optimal_port_oos, 12)


SPY US Equity    2.959602
EEM US Equity   -0.303268
EFA US Equity   -0.826552
EUO US Equity    0.167792
IWM US Equity   -0.744995
dtype: float64


,Mean,Vol,Sharpe,Min,Max
Optimal Portfolio,0.290792,0.264263,1.100388,-0.095761,0.204606


2.5) Suppose that instead of optimizing these 5 risky assets, we optimized 5 commodity futures: oil, coffee, cocoa, lumber, cattle, and gold. Do you think the out-of-sample fragility problem would be better or worse than what we have seen optimizing equities?

The out-of-sample fragility problem comes from inverting the covariance matrix: when the assets are highly correlated, the matrix is close to singular, so small estimation errors produce large changes in the weights. Since the commodities are likely less correlated with each other than the equities are, we would expect the fragility problem to be less severe with the commodity futures.

<u>Last Year's Answer:</u> The biggest reason the MV solution is "fragile" out-of-sample is the inversion of the covariance matrix. In HW#1 we learned that optimization over highly correlated assets leads to over-fitting (as seen in extreme long-short portfolios, etc.). Thus, we expect the optimization to be overfit particularly in cases where the assets are highly correlated.

The commodities are much less correlated with each other than our five factors (which include several equity-focused securities). We saw lower correlation in commodities in one of our demos, but just from the stated descriptions we can infer that the commodities will likely have less correlation and thus less of a problem with the inverted covariance matrix.
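The link between correlation and fragility can be sketched with two hypothetical assets. The means, the volatility of 0.10, and the two correlation levels are made-up values; `tangency` and `two_asset_cov` are helpers defined only for this illustration.

```python
import numpy as np

def tangency(mu, sigma):
    """Tangency weights: inv(sigma) @ mu, rescaled to sum to 1."""
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()

def two_asset_cov(rho, vol=0.10):
    """Covariance matrix for two assets with equal volatility and correlation rho."""
    return vol**2 * np.array([[1.0, rho], [rho, 1.0]])

mu = np.array([0.10, 0.09])           # hypothetical estimated means
mu_bumped = np.array([0.10, 0.10])    # same means with a small estimation error

results = {}
for rho in (0.1, 0.9):
    sigma = two_asset_cov(rho)
    shift = np.abs(tangency(mu_bumped, sigma) - tangency(mu, sigma)).max()
    results[rho] = {'cond': np.linalg.cond(sigma), 'shift': shift}
    print(f"rho={rho}: condition number={results[rho]['cond']:.1f}, "
          f"max weight change={shift:.3f}")
```

The same small change in one mean moves the weights far more when the correlation is 0.9 than when it is 0.1, and the covariance matrix is also much worse conditioned in the high-correlation case.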

In [38]:
# 3 Hedging and Replication
# 3. Regression based statistics against SPY
# First get SPY
regression_data = pd.read_excel('../data/proshares_analysis_data.xlsx', sheet_name = 'merrill_factors')
regression_data.rename(columns={'Unnamed: 0':'Date'}, inplace=True)
regression_data = regression_data.set_index('Date')

def regression_stats(df):
    reg_stats = pd.DataFrame(data = None, index = df.columns, columns = ['beta', 
                                                                         'Treynor Ratio', 
                                                                         'Information Ratio'])
    for col in df.columns:
        # Drop the NAs in y
        y = df[col].dropna()
        # Align X with y and include an intercept: the Information Ratio needs alpha
        X = sm.add_constant(df['SPY US Equity'].loc[y.index])
        reg = sm.OLS(y, X, missing='drop').fit()
        # Look up parameters by name rather than by position
        alpha = reg.params['const']
        beta = reg.params['SPY US Equity']
        reg_stats.loc[col, 'beta'] = beta
        # Treynor is calculated as annualized mean / beta
        reg_stats.loc[col, 'Treynor Ratio'] = (y.mean() * 12) / beta
        # Information Ratio is annualized alpha / residual volatility
        reg_stats.loc[col, 'Information Ratio'] = (alpha / reg.resid.std()) * np.sqrt(12)

    return reg_stats.astype(float).round(4)

# Do simple regression for 2 data sets:
y = regression_data['EEM US Equity']
X = regression_data['SPY US Equity']

hedge_reg = sm.OLS(y, X).fit()
print(hedge_reg.params)


SPY US Equity    0.844954
dtype: float64


### 3.1 
Since we are hedging, for every dollar invested in EEM we want to short about 84 cents of SPY (the regression beta of 0.845).

In [39]:
# 3.2
# Build the hedged position: EEM minus beta (about 0.84) times SPY
# Use .iloc[0] for positional access to the single beta
hedged_pos = (regression_data['EEM US Equity'] - hedge_reg.params.iloc[0] * regression_data['SPY US Equity']).to_frame('Market Hedged EEM')

def portfolio_stats_2(data):
    # Calculate the mean and annualize
    mean = data.mean() * 12

    # Volatility = standard deviation
    # Annualize the result with sqrt(12)
    vol = data.std() * np.sqrt(12)

    # Sharpe Ratio is mean / vol
    sharpe_ratio = mean / vol

    # Format for easy reading
    return round(pd.DataFrame(data = [mean, vol, sharpe_ratio], 
        index = ['Mean', 'Volatility', 'Sharpe']), 4)
    
portfolio_stats_2(hedged_pos)

,Market Hedged EEM
Mean,-0.0924
Volatility,0.1274
Sharpe,-0.7258


In [40]:
# 3.3 Is the mean the same as EEM?
portfolio_stats_2(regression_data['EEM US Equity'])

Unnamed: 0,0
Mean,0.01
Volatility,0.1815
Sharpe,0.0551


In [41]:
portfolio_stats_2(regression_data['SPY US Equity'])

Unnamed: 0,0
Mean,0.1213
Volatility,0.1456
Sharpe,0.8327


### 3.3
No, the mean of the hedged position is not the same as the mean of EEM, because the position is now also short SPY. The mean of the hedged portfolio can be represented by the equation:

mu_hedged = mu_EEM - beta * mu_SPY
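Plugging the reported numbers into this equation gives a quick check. The inputs below are copied from the rounded outputs above, so the result only matches the reported hedged mean of -0.0924 up to rounding.

```python
# Values copied from the outputs above (annualized means are rounded to 4 decimals)
beta = 0.844954      # no-intercept regression beta of EEM on SPY
mu_eem = 0.0100      # annualized mean of EEM
mu_spy = 0.1213      # annualized mean of SPY

mu_hedged = mu_eem - beta * mu_spy
print(round(mu_hedged, 4))
```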

### 3.4
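It would be difficult to use a regression that also contains IWM for attribution or hedging, because IWM is highly correlated with SPY. With collinear regressors the individual betas are estimated imprecisely, so it is hard to attribute returns to SPY versus IWM.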

In [42]:

print('Correlation between IWM and SPY: ' + str(round(regression_data.corr().loc['IWM US Equity', 'SPY US Equity'], 4)))

Correlation between IWM and SPY: 0.8863
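One way to quantify the problem is the variance inflation factor (VIF), which is not computed in this notebook and is shown here only as an illustration. For a regression with exactly two regressors, the VIF of each beta is 1 / (1 - rho^2), where rho is the correlation between the regressors.

```python
rho = 0.8863                       # correlation between IWM and SPY from the output above

vif = 1.0 / (1.0 - rho**2)         # factor by which the variance of each beta is inflated
se_inflation = vif ** 0.5          # corresponding inflation of the standard error

print(f"VIF: {vif:.2f}, standard error inflation: {se_inflation:.2f}")
```

Under this two-regressor assumption, the variance of each beta is several times what it would be if SPY and IWM were uncorrelated.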
