# M1 Portfolio Theory I
# JTCerda

# Q1 True / False

## 1.1
TRUE. It says that the only priced risk premium is the one earned through exposure to the market, as measured by the market beta.

## 1.2
FALSE. The tangency portfolio is the portfolio on the risky MV frontier with the maximum Sharpe ratio.
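
As a quick numerical check of this definition, the sketch below builds a tangency portfolio for three made-up assets and confirms that none of 1000 randomly drawn portfolios of the same assets has a higher Sharpe ratio. The means, volatilities and the common correlation of 0.3 are illustrative assumptions, not data from this assignment.

```python
import numpy as np

# Illustrative inputs (assumed): mean excess returns, vols, common correlation of 0.3
mu = np.array([0.05, 0.08, 0.12])
vol = np.array([0.10, 0.15, 0.25])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
Sigma = np.outer(vol, vol) * corr

# Tangency weights: Sigma^{-1} mu, rescaled to sum to one
raw = np.linalg.solve(Sigma, mu)
w_tan = raw / raw.sum()

def sharpe(w):
    """Sharpe ratio of weights w (returns are already in excess of the risk-free rate)."""
    return (w @ mu) / np.sqrt(w @ Sigma @ w)

# The Sharpe ratio is unchanged when weights are rescaled by a positive constant,
# so the random weight vectors below are left unnormalized
rng = np.random.default_rng(0)
random_sharpes = np.array([sharpe(w) for w in rng.normal(size=(1000, 3))])

print("tangency Sharpe:   ", sharpe(w_tan))
print("best random Sharpe:", random_sharpes.max())
```

The random draws include short positions, so the comparison is not limited to long-only portfolios.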

## 1.3
FALSE. As in any regression, including too many regressors artificially increases the $R^2$ (the adjusted $R^2$ is a better guide), but it does not bias the estimates. It does produce higher standard errors, which may make the model useless.
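
The effect on $R^2$ can be illustrated with simulated data. In the sketch below only `x` truly drives `y`, and 20 unrelated regressors are added. The helper names `r_squared` and `adjusted_r_squared` and all the numbers are assumptions made for this illustration.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n_obs, n_regressors):
    """Adjusted R^2, which penalizes the number of regressors (constant excluded)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

rng = np.random.default_rng(0)
n = 120
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # only x truly matters
junk = rng.normal(size=(n, 20))           # 20 regressors unrelated to y

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, junk])

r2_small, r2_big = r_squared(y, X_small), r_squared(y, X_big)
adj_small = adjusted_r_squared(r2_small, n, 1)
adj_big = adjusted_r_squared(r2_big, n, 21)
print(f"R2:     {r2_small:.3f} -> {r2_big:.3f}")
print(f"adj R2: {adj_small:.3f} -> {adj_big:.3f}")
```

Because the small model is nested in the large one, the plain $R^2$ can never fall when regressors are added, which is why it is a poor guide to model quality.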

## 1.4
DEPENDS. It depends on what we want to do. Do we want to explain the total return (including the mean) or only the variation around the mean? Also, in short samples mean returns may be estimated inaccurately, so we may want to include an intercept and focus on explaining variation.

## 1.5
TRUE. As n gets large the portfolio must have trivial exposure to security i. Marginal risk is what matters.
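
One standard way to see this, under the simplifying assumption of an equally weighted portfolio of $n$ securities, is to split the portfolio variance into an own-variance part and a covariance part:
$$\sigma_p^2 = \frac{1}{n}\,\overline{\sigma^2} + \left(1-\frac{1}{n}\right)\overline{\sigma_{ij}}$$
where $\overline{\sigma^2}$ is the average variance and $\overline{\sigma_{ij}}$ is the average pairwise covariance (notation introduced here for illustration). As $n$ grows the first term vanishes, so what security $i$ adds to portfolio risk is its covariance with the other holdings, not its own variance.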

# Q2 Short Answer

## 2.1
We use portfolios because they are more stable. Individual securities have much higher volatility, which adds noise to the statistical analysis and can leave the coefficients statistically insignificant.

## 2.2
The momentum strategy is robust to different construction methods: the FF and AQR approaches give roughly the same results and metrics, even with different degrees of extreme selection (1 or 3 deciles on each side). However, the negative correlation to the market is lost if we hold only the long side. This means that the long-only (mutual fund) version cannot be used as a diversifying hedge against the market.

## 2.3
The information ratio is the Sharpe ratio of the non-factor component of the return (the hedged position). If we regress $r$ on $z$ and take the estimated $\alpha$ and residual $\epsilon$, we can compute
$$IR=\frac{\alpha}{\sigma_{\epsilon}}$$
where $\alpha$ measures the mean return beyond what is explained by the factor (benchmark) $z$ and $\sigma_{\epsilon}$ measures the non-factor volatility.
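
A minimal sketch of this calculation on simulated monthly data follows. The function name `information_ratio`, the `periods_per_year` parameter and the simulated numbers are assumptions made for illustration.

```python
import numpy as np

def information_ratio(r, z, periods_per_year=12):
    """Annualized information ratio of r relative to a single factor z.

    Regresses r on a constant and z, then divides the annualized alpha
    by the annualized volatility of the residual.
    """
    X = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    alpha = coef[0]
    resid = r - X @ coef
    # ddof=2 because two parameters (alpha and beta) were estimated
    sigma_eps = resid.std(ddof=2)
    return (periods_per_year * alpha) / (np.sqrt(periods_per_year) * sigma_eps)

# Simulated monthly data (assumed numbers, for illustration only)
rng = np.random.default_rng(0)
z = rng.normal(0.006, 0.045, size=240)
r = 0.002 + 0.8 * z + rng.normal(0, 0.02, size=240)

ir = information_ratio(r, z)
print("information ratio:", ir)
```

Adding more exposure to the factor, for example `r + 3 * z`, changes the beta but leaves both the alpha and the residual untouched, so the information ratio is unchanged. That is the sense in which it measures only the non-factor component.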

## 2.4.a
One way to do this is to regress the target on the three ETFs:
$$\tilde{r}^H_t = \alpha + \beta_1\tilde{r}^{z1}_t + \beta_2\tilde{r}^{z2}_t + \beta_3\tilde{r}^{z3}_t+ \epsilon_t$$
OLS minimizes the sum of squared residuals $\epsilon_t$, which in our setting means minimizing the tracking error of the ETF position relative to the target. The estimated betas are the positions to hold in each ETF. We include the intercept $\alpha$ if we can invest in the risk-free rate.
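
The regression above can be sketched on simulated data as follows. The ETF returns, the `true_beta` vector and the noise level are invented for the example. In practice `etfs` and `target` would hold the observed excess returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated excess returns of three ETFs and of a target built from them (all numbers assumed)
etfs = rng.normal(0.0, 0.04, size=(n, 3))
true_beta = np.array([0.5, 0.3, -0.2])
target = 0.001 + etfs @ true_beta + rng.normal(0.0, 0.005, size=n)

# OLS with an intercept: the slopes are the positions to hold in each ETF
X = np.column_stack([np.ones(n), etfs])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# Tracking error: volatility of what the ETF position fails to replicate
resid = target - X @ coef
tracking_error = resid.std(ddof=X.shape[1])

print("hedge weights: ", beta_hat)
print("tracking error:", tracking_error)
```

By construction no other choice of intercept and weights gives a smaller in-sample sum of squared residuals, which is what makes OLS a natural tool for this replication problem.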

## 2.4.b
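Adding a fourth factor that is highly correlated with the existing ones does not bias the estimates (OLS only requires that the regressors are not perfectly collinear), but the multicollinearity inflates the standard errors and makes the individual betas, and therefore the hedge weights, imprecise and unstable. The in-sample correlation to the hedge (measured here by the $R^2$) cannot fall and will typically rise slightly; however, this would be "data mining", as we are not adding much new information to the model.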
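
To see what a highly correlated fourth regressor does in practice, the sketch below fits the same simulated target with three factors and then with a fourth that is a near-copy of the first, comparing the classical standard error of the first beta and the $R^2$. The helper `ols_with_se` and all numbers are assumptions made for illustration.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients, classical standard errors, and R^2 (X includes the constant)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, se, r2

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 3))                    # three factors
z4 = z[:, 0] + 0.05 * rng.normal(size=n)       # near-copy of the first factor
y = z @ np.array([0.5, 0.3, 0.2]) + 0.5 * rng.normal(size=n)

X3 = np.column_stack([np.ones(n), z])
X4 = np.column_stack([X3, z4])

_, se3, r2_3 = ols_with_se(y, X3)
_, se4, r2_4 = ols_with_se(y, X4)
print("s.e. of first beta:", se3[1], "->", se4[1])
print("R2:", r2_3, "->", r2_4)
```

The near-duplicate factor leaves the fit almost unchanged while making the first beta far less precisely estimated.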

# Q3 Allocation

Preliminaries

In [2]:
#Import some libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
from dataclasses import dataclass
import warnings
sns.set()

pd.set_option('display.float_format', lambda x: '%.4f' % x)

Import the data

In [3]:
#Import from excel on relative folder and merge the DFs
path = 'data/ff_data.xlsx'
factors_df = pd.read_excel(path, sheet_name='FACTORS')[['MKT', 'HML','date']]
factors_df = factors_df.set_index('date')

portf_df = pd.read_excel(path, sheet_name='PORTFOLIOS')
portf_df = portf_df.set_index('date')

df_bm = factors_df.join(portf_df)

In [4]:
#Check
df_bm.tail()

date,MKT,HML,SMALL LoBM,ME1 BM2,ME1 BM3,ME1 BM4,SMALL HiBM,ME2 BM1,ME2 BM2,ME2 BM3,...,ME4 BM1,ME4 BM2,ME4 BM3,ME4 BM4,ME4 BM5,BIG LoBM,ME5 BM2,ME5 BM3,ME5 BM4,BIG HiBM
2020-04-30,0.1365,-0.0135,0.2438,0.2089,0.1455,0.1143,0.1586,0.1929,0.1284,0.1196,...,0.1678,0.14,0.1186,0.1623,0.1778,0.1409,0.1369,0.0974,0.1046,0.1553
2020-05-31,0.0558,-0.0495,0.1355,0.065,0.0664,0.0429,0.0439,0.1014,0.0725,0.0409,...,0.1064,0.0826,0.0597,0.0333,0.0193,0.0565,0.055,0.038,0.0116,0.038
2020-06-30,0.0246,-0.0222,0.1103,0.0241,0.0681,0.0443,0.0605,0.0542,0.0294,0.0369,...,0.0216,0.0066,-0.0076,0.0265,0.0188,0.0511,-0.0103,-0.0064,-0.0187,0.0211
2020-07-31,0.0577,-0.0132,-0.0015,0.0159,0.0282,-0.0059,0.0471,0.0085,0.0365,0.0335,...,0.0596,0.0768,0.0636,0.0271,0.022,0.0772,0.0517,0.0432,0.0174,0.0294
2020-08-31,0.0762,-0.031,0.0347,0.0465,0.0467,0.0543,0.0582,0.0766,0.0704,0.048,...,0.044,0.0366,0.0457,0.0456,0.0376,0.1079,0.0644,0.0243,0.0277,0.0348


## 3.1.a

Summary Statistics for the 25 test assets.

In [5]:
#Compute the monthly stats and annualize them (mean x 12, vol and Sharpe x sqrt(12))
mu = portf_df.mean()
vol = portf_df.std()
sharpe = mu / vol
summary = pd.DataFrame({'Mean':mu * 12, 'Vol':vol * np.sqrt(12), 'Sharpe': sharpe * np.sqrt(12)})
summary

Portfolio,Mean,Vol,Sharpe
SMALL LoBM,0.1035,0.4186,0.2473
ME1 BM2,0.1167,0.3391,0.3441
ME1 BM3,0.1497,0.309,0.4844
ME1 BM4,0.1686,0.2874,0.5867
SMALL HiBM,0.1899,0.3194,0.5944
ME2 BM1,0.1104,0.2754,0.4009
ME2 BM2,0.1445,0.259,0.5578
ME2 BM3,0.1484,0.25,0.5937
ME2 BM4,0.155,0.2563,0.6049
ME2 BM5,0.1775,0.3007,0.5901


## 3.2.a

Compute tangency portfolio
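
For reference, the weights computed in the next cell follow the usual closed form, written here on the assumption that the portfolio returns are already in excess of the risk-free rate:
$$w^{tan} = \frac{\Sigma^{-1}\tilde{\mu}}{\mathbf{1}'\Sigma^{-1}\tilde{\mu}}$$
where $\tilde{\mu}$ is the vector of sample mean excess returns (`mu` in the code), $\Sigma$ is the sample covariance matrix (`Sigma`), and $\mathbf{1}$ is a vector of ones. The denominator only rescales the weights so that they sum to one.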

In [6]:
Sigma = portf_df.cov()
Sigma_inv = np.linalg.inv(Sigma)

# from the formula for the tangency weights
N = mu.shape[0]
weights = Sigma_inv @ mu / (np.ones(N) @ Sigma_inv @ mu)      

#Make a series
wts_tan = pd.Series(weights, index=summary.index)

print('Weights of the tangency portfolio')
wts_tan

Weights of the tangency portfolio


SMALL LoBM   -0.3135
ME1 BM2      -0.5536
ME1 BM3      -0.3020
ME1 BM4       0.5322
SMALL HiBM    0.7394
ME2 BM1      -0.6465
ME2 BM2       0.1100
ME2 BM3       0.3757
ME2 BM4       0.3046
ME2 BM5       0.2290
ME3 BM1      -0.4924
ME3 BM2       0.7292
ME3 BM3       0.1982
ME3 BM4       0.4471
ME3 BM5      -0.1915
ME4 BM1       0.6138
ME4 BM2      -0.7314
ME4 BM3      -0.1978
ME4 BM4       0.2559
ME4 BM5      -0.5315
BIG LoBM      0.8291
ME5 BM2      -0.1439
ME5 BM3       0.7051
ME5 BM4      -1.0271
BIG HiBM      0.0619
dtype: float64

In [7]:
#Compute the tangency series and stats
portf_df_tan = portf_df @ wts_tan
mu_tan = portf_df_tan.mean()
vol_tan = portf_df_tan.std()
sharpe_tan = mu_tan / vol_tan
print('Tangency mean: ', mu_tan * 12)
print('Tangency volatility: ', vol_tan * np.sqrt(12))
print('Tangency sharpe ratio: ', sharpe_tan * np.sqrt(12))

Tangency mean:  0.25565869505868993
Tangency volatility:  0.2176548094233915
Tangency sharpe ratio:  1.1746062296347959


## 3.3.a

In [8]:
Sigma_diag = np.diag(Sigma.values.diagonal())
Sigma_diag_inv = np.linalg.inv(Sigma_diag)

# from the formula for the tangency weights
N = mu.shape[0]
weights_diag = Sigma_diag_inv @ mu / (np.ones(N) @ Sigma_diag_inv @ mu)      

#Make a series
wts_tan_diag = pd.Series(weights_diag, index=summary.index)

print('Weights of the tangency diagonal portfolio')
wts_tan_diag

Weights of the tangency diagonal portfolio


SMALL LoBM   0.0107
ME1 BM2      0.0183
ME1 BM3      0.0283
ME1 BM4      0.0369
SMALL HiBM   0.0336
ME2 BM1      0.0263
ME2 BM2      0.0389
ME2 BM3      0.0429
ME2 BM4      0.0427
ME2 BM5      0.0355
ME3 BM1      0.0334
ME3 BM2      0.0512
ME3 BM3      0.0507
ME3 BM4      0.0486
ME3 BM5      0.0337
ME4 BM1      0.0476
ME4 BM2      0.0511
ME4 BM3      0.0497
ME4 BM4      0.0467
ME4 BM5      0.0308
BIG LoBM     0.0601
ME5 BM2      0.0588
ME5 BM3      0.0560
ME5 BM4      0.0379
BIG HiBM     0.0294
dtype: float64

In [8]:
#Compute the tangency series and stats
portf_df_tan_diag = portf_df @ wts_tan_diag
mu_tan_diag = portf_df_tan_diag.mean()
vol_tan_diag = portf_df_tan_diag.std()
sharpe_tan_diag = mu_tan_diag / vol_tan_diag
print('Diagonal Tangency  mean: ', mu_tan_diag * 12)
print('Diagonal Tangency volatility: ', vol_tan_diag * np.sqrt(12))
print('Diagonal Tangency sharpe ratio: ', sharpe_tan_diag * np.sqrt(12))

Diagonal Tangency  mean:  0.13766329412572048
Diagonal Tangency volatility:  0.2275292771147978
Diagonal Tangency sharpe ratio:  0.6050355183797456


# Q4

## 4.1.a
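
The cells in this part estimate, for each portfolio $i$, a time-series regression on the two factors. In the notation of 2.4.a (the superscripts are chosen here for illustration):
$$\tilde{r}^i_t = \alpha^i + \beta^{i,MKT}\tilde{r}^{MKT}_t + \beta^{i,HML}\tilde{r}^{HML}_t + \epsilon^i_t$$
The reported mean absolute error of the alphas is then
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{\alpha}^i\right|$$
with $n = 25$ test portfolios. If the factor model priced the portfolios perfectly, every $\alpha^i$ would be zero.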

In [9]:
#Define the class of our outputs
@dataclass
class RegressionsOutput():
    excess_ret_stats: pd.DataFrame
    params: pd.DataFrame
    residuals: pd.DataFrame
    tstats: pd.DataFrame
    other: pd.DataFrame
    df: pd.DataFrame
        

#Define a function that runs the factor regression        
def lfm_time_series_regression(df, portfolio_names, factors, annualize_factor=12):
    excess_ret_stats = pd.DataFrame(index=factors, columns=['average', 'std'], dtype=float)
    for factor in factors:
        excess_ret_stats.loc[factor, 'average'] = annualize_factor * df[factor].mean()
        excess_ret_stats.loc[factor, 'std'] = np.sqrt(annualize_factor) * df[factor].std()
        excess_ret_stats.loc[factor, 'sharpe_ratio'] = \
            excess_ret_stats.loc[factor, 'average'] / excess_ret_stats.loc[factor, 'std']
        # Here I'll just report the unscaled skewness
        excess_ret_stats.loc[factor, 'skewness'] = df[factor].skew()
        # excess_ret_stats.loc[factor, 'skewness'] = annualize_factor * df[factor].skew()

    _temp_excess_ret_stats = excess_ret_stats.copy()
    _temp_excess_ret_stats.loc['const', :] = 0

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        rhs = sm.add_constant(df[factors])
    df_params = pd.DataFrame(columns=portfolio_names)
    df_other = pd.DataFrame(columns=portfolio_names)
    df_residuals = pd.DataFrame(columns=portfolio_names)
    df_tstats = pd.DataFrame(columns=portfolio_names)
    for portfolio in portfolio_names:
        lhs = df[portfolio]
        res = sm.OLS(lhs, rhs, missing='drop').fit()
        df_params[portfolio] = res.params
        df_params.loc['const', portfolio] = annualize_factor * res.params['const']
        df_other.loc['r_squared', portfolio] = res.rsquared
        df_other.loc['model_implied_excess_ret', portfolio] = df_params[portfolio] @ _temp_excess_ret_stats['average']
        df_other.loc['ave_excess_ret', portfolio] = \
            annualize_factor * df[portfolio].mean()
        df_other.loc['std_excess_ret', portfolio] = \
            np.sqrt(annualize_factor) * df[portfolio].std()
        # Skewness is scale-free, so it is reported unscaled (as for the factors above)
        df_other.loc['skewness_excess_ret', portfolio] = df[portfolio].skew()
        df_other.loc['sharpe_ratio', portfolio] = \
            df_other.loc['ave_excess_ret', portfolio] / df_other.loc['std_excess_ret', portfolio]
        df_residuals[portfolio] = res.resid
        df_tstats[portfolio] = res.tvalues

    regression_outputs = RegressionsOutput(
        excess_ret_stats.T,
        df_params.T,
        df_residuals,
        df_tstats.T,
        df_other.T,
        df)


    return regression_outputs

In [19]:
#Names of every column in df_bm: the two factors (MKT, HML) followed by the 25 portfolios
beta_names = list(df_bm.columns)

#Run the regression
beta_regs = lfm_time_series_regression(
    df=df_bm,
    portfolio_names=beta_names,
    factors=['MKT','HML']
    )

beta_regs_mkt = lfm_time_series_regression(
    df=df_bm,
    portfolio_names=beta_names,
    factors=['MKT']
    )

#Run the regression
beta_regs_hml = lfm_time_series_regression(
    df=df_bm,
    portfolio_names=beta_names,
    factors=['HML']
    )

In [26]:
from sklearn.metrics import mean_absolute_error as mae

bm_portfolios = pd.DataFrame(index=portf_df.columns)
rhs = sm.add_constant(df_bm[['MKT','HML']])

bm_residuals = pd.DataFrame(columns=portf_df.columns)
t_p_values = pd.DataFrame()

for portf in bm_portfolios.index:
    lhs = df_bm[portf]
    res = sm.OLS(lhs, rhs, missing='drop').fit()
    bm_portfolios.loc[portf, 'alpha_hat'] = res.params['const']
    bm_portfolios.loc[portf, 'beta_MKT'] = res.params['MKT']
    bm_portfolios.loc[portf, 'beta_HML'] = res.params['HML']
    bm_portfolios.loc[portf, '$R^2$'] = res.rsquared

mae_alpha = np.mean(np.abs(bm_portfolios['alpha_hat'].values))

display(bm_portfolios)

print("MAE of alphas = {:.5f}".format(mae_alpha))

Portfolio,alpha_hat,beta_MKT,beta_HML,$R^2$
SMALL LoBM,-0.0033,1.5421,0.4685,0.5281
ME1 BM2,-0.0003,1.3551,0.289,0.5949
ME1 BM3,0.002,1.2748,0.5752,0.7165
ME1 BM4,0.0041,1.1648,0.6378,0.7318
SMALL HiBM,0.0044,1.2199,0.9726,0.7606
ME2 BM1,0.0011,1.2901,-0.1674,0.7286
ME2 BM2,0.0034,1.196,0.1807,0.7735
ME2 BM3,0.0034,1.1327,0.3972,0.8184
ME2 BM4,0.0035,1.1122,0.6066,0.8379
ME2 BM5,0.0035,1.2363,0.922,0.8524


MAE of alphas = 0.00272


## 4.1.b

In [27]:
print("\nR-squared average = {:.3f}".format(bm_portfolios['$R^2$'].mean()))


R-squared average = 0.828


## 4.2.a
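
The cross-sectional step below regresses the sample mean return of each portfolio on its estimated betas from 4.1.a. Written out (the symbols $\eta$, $\lambda$ and $\upsilon$ are notation assumed here):
$$\bar{r}^i = \eta + \lambda_{MKT}\,\hat{\beta}^{i,MKT} + \lambda_{HML}\,\hat{\beta}^{i,HML} + \upsilon^i$$
The slopes $\lambda_{MKT}$ and $\lambda_{HML}$ are the estimated factor premia, which can be compared with the sample means of the factors, and the intercept $\eta$ should be close to zero if the model prices the portfolios.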

In [30]:
cross_df = pd.DataFrame(index=portf_df.columns)
cross_df['mean_ret'] = portf_df.mean()
cross_df['alpha_hat'] = bm_portfolios['alpha_hat']
cross_df['beta_MKT'] = bm_portfolios['beta_MKT']
cross_df['beta_HML'] = bm_portfolios['beta_HML']

cross_df

Portfolio,mean_ret,alpha_hat,beta_MKT,beta_HML
SMALL LoBM,0.0086,-0.0033,1.5421,0.4685
ME1 BM2,0.0097,-0.0003,1.3551,0.289
ME1 BM3,0.0125,0.002,1.2748,0.5752
ME1 BM4,0.0141,0.0041,1.1648,0.6378
SMALL HiBM,0.0158,0.0044,1.2199,0.9726
ME2 BM1,0.0092,0.0011,1.2901,-0.1674
ME2 BM2,0.012,0.0034,1.196,0.1807
ME2 BM3,0.0124,0.0034,1.1327,0.3972
ME2 BM4,0.0129,0.0035,1.1122,0.6066
ME2 BM5,0.0148,0.0035,1.2363,0.922


In [39]:
#rhs_cross = sm.add_constant(cross_df[['alpha_hat', 'beta_MKT', 'beta_HML']])
#rhs_cross = cross_df[['alpha_hat', 'beta_MKT', 'beta_HML']]
rhs_cross = sm.add_constant(cross_df[['beta_MKT', 'beta_HML']])
lhs_cross = cross_df['mean_ret']
res_cross = sm.OLS(lhs_cross, rhs_cross, missing='drop').fit()
factors_premia = res_cross.params
cross_r_sq = res_cross.rsquared
res_cross.summary()

Dep. Variable:,mean_ret,R-squared:,0.46
Model:,OLS,Adj. R-squared:,0.411
Method:,Least Squares,F-statistic:,9.362
Date:,"Wed, 18 Nov 2020",Prob (F-statistic):,0.00114
Time:,13:49:29,Log-Likelihood:,129.11
No. Observations:,25,AIC:,-252.2
Df Residuals:,22,BIC:,-248.6
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0107,0.003,4.244,0.000,0.005,0.016
beta_MKT,-0.0004,0.002,-0.199,0.844,-0.005,0.004
beta_HML,0.0033,0.001,4.289,0.000,0.002,0.005

Omnibus:,2.512,Durbin-Watson:,0.698
Prob(Omnibus):,0.285,Jarque-Bera (JB):,1.658
Skew:,-0.631,Prob(JB):,0.436
Kurtosis:,3.021,Cond. No.,18.0


In [42]:
print('factors_premia',factors_premia[1:])
print('mean_MKT',factors_df['MKT'].mean())
print('mean_HML',factors_df['HML'].mean())

factors_premia beta_MKT   -0.0004
beta_HML    0.0033
dtype: float64
mean_MKT 0.006738053097345137
mean_HML 0.003245309734513273


In [44]:
print('Intercept',factors_premia['const'])

Intercept 0.010713366431000538


In [45]:
print('R2',cross_r_sq)

R2 0.4597817405648378


In [13]:
#Skip the first two rows (the MKT and HML factors) so that only the test portfolios remain
mkt_df = pd.DataFrame(beta_regs_mkt.other['ave_excess_ret'][2:])
mkt_df['beta_hat'] = beta_regs_mkt.params['MKT'][2:]
mkt_df

Portfolio,ave_excess_ret,beta_hat
SMALL LoBM,0.1035,1.6143
ME1 BM2,0.1167,1.3997
ME1 BM3,0.1497,1.3634
ME1 BM4,0.1686,1.2632
SMALL HiBM,0.1899,1.3699
ME2 BM1,0.1104,1.2643
ME2 BM2,0.1445,1.2239
ME2 BM3,0.1484,1.194
ME2 BM4,0.155,1.2057
ME2 BM5,0.1775,1.3785


In [16]:
Y = np.asarray(mkt_df['ave_excess_ret'])
X = np.asarray(mkt_df['beta_hat'])
res = sm.OLS(Y, X).fit()
res.params

array([0.11243330934939118], dtype=object)

In [None]:
#Create a table with the statistics
def create_table1(regs_object):
    table1 = pd.DataFrame(
        regs_object.other[['ave_excess_ret', 'std_excess_ret', 'sharpe_ratio']])
    table1['Mkt beta'] = regs_object.params['MKT']
    table1['Mkt Corr'] = regs_object.df.corr()['MKT']
    table1['Treynor_ratio'] = regs_object.other['ave_excess_ret'] / regs_object.params['MKT']
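    # Information ratio as defined in 2.3: annualized alpha over annualized residual volatility (monthly data assumed)
    table1['Information_ratio'] = regs_object.params['const'] / (np.sqrt(12) * regs_object.residuals.std())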
    return table1

table1 = create_table1(beta_regs)
table1