### 1 Short Answer
#### 1. Suppose the Fama-French 3-factor model works perfectly for pricing. Then the three Fama-French factors have the three highest Sharpe ratios of all assets.

False. If the Fama-French model prices assets perfectly, all that means is that the three factors span the tangency portfolio. The tangency portfolio has the highest Sharpe ratio, but that is a statement about a combination of the factors; it says nothing about the Sharpe ratio of any individual factor.
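A small numerical sketch of this point. The means and covariances below are made-up annualized numbers for two hypothetical factors, not estimates from any data. The tangency portfolio built from the factors is itself an asset, and its Sharpe ratio exceeds that of each factor on its own.

```python
import numpy as np

# Hypothetical annualized excess-return means and covariance for two factors
mu = np.array([0.06, 0.03])
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])

# Sharpe ratio of each factor held on its own
sharpe_individual = mu / np.sqrt(np.diag(cov))

# Tangency weights are proportional to inv(cov) @ mu, rescaled to sum to 1
w = np.linalg.solve(cov, mu)
w = w / w.sum()
sharpe_tangency = (w @ mu) / np.sqrt(w @ cov @ w)

print("Individual Sharpe ratios:", sharpe_individual)
print("Tangency Sharpe ratio:   ", sharpe_tangency)
```

Under these assumed inputs the factors rank below a portfolio that the factors price perfectly, so perfect pricing does not put the factors at the top of the Sharpe ranking.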

#### 2. The Fama-French 5-Factor model indicates that, all else equal, a stock with higher investment beta has higher expected returns.

False. The Fama-French investment factor (CMA, conservative minus aggressive) is long firms with low investment and short firms with high re-investment. Heavy re-investment is therefore associated with lower, not higher, mean returns.

#### 3. Suppose you show DFA that Size and Value have had Sharpe Ratios near 0 since the end of the case. Do you think they would give up on Size and Value factor strategies? Why?

No. Size and Value may still be useful because of their low correlation to the market. Even if their mean returns are low, they could act as a hedge in the portfolio, so DFA should not give up on these strategies.

#### 4. Suppose a stock is uncorrelated to each asset and to each Fama-French factor. Suppose this stock has a relatively high book-to-market ratio. What would the Fama-French 3-factor model predict about the mean return of this stock?

The book-to-market ratio itself tells the model nothing. Fama-French prices assets through their factor betas, not through their characteristics. Zero correlation with each factor means every beta is 0, so the stock's expected excess return in the model equals zero.
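As a minimal sketch of that logic, the model's predicted premium is just the betas times the factor premia. The factor premia and the book-to-market value below are hypothetical placeholders; the point is only that the characteristic never enters the formula.

```python
import numpy as np

# Hypothetical annualized premia for MKT, SMB, HML (illustrative values only)
factor_premia = np.array([0.07, 0.02, 0.03])

# A stock uncorrelated with every factor has a beta of zero on each one
betas = np.zeros(3)

# Assumed high book-to-market ratio; note it is never used below
book_to_market = 1.5

# Model-implied expected excess return: sum of beta times premium
predicted_premium = betas @ factor_premia
print("Predicted premium:", predicted_premium)
```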

#### 5. In constructing the momentum factor, how do we ensure that the factor does not take too much idiosyncratic risk? How do we ensure it does not have too much turnover?

We go long the biggest winners and short the biggest losers, holding many stocks in each leg rather than a handful, so that idiosyncratic risk diversifies away. We reduce turnover by ranking on rolling 12-month momentum, so that only a small slice of the portfolio is traded each month.

#### 6. Is a long-only momentum fund an attractive investment strategy? Be specific.

No. When we investigated a long-only momentum fund, we found that the portfolio was highly correlated to the market and lost essentially all the benefits of a Fama-French style momentum factor. A long-short construction was much more effective, even though it requires more trading.

#### 7. Suppose the CAPM is true, and we test n assets. For these n assets, what do we know about their:
#####       - time-series r-squared metrics?
We know nothing about their time-series r-squared metrics, other than that they are likely to be low. CAPM is a pricing model, not a tool for replicating returns, so it makes no claim about high r-squared values.

#####       - Treynor Ratios?
If CAPM is true, every asset has a Treynor Ratio equal to the market's expected excess return (the market premium).

#####       - Information Ratios?
If CAPM is true, every alpha is 0, so all the Information Ratios are 0.
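The two ratio results follow directly from the CAPM pricing equation. The sketch below uses a hypothetical market premium, hypothetical betas, and hypothetical residual volatilities; it works with population values, so sampling noise is ignored.

```python
# Hypothetical annualized market premium, asset betas, and residual volatilities
market_premium = 0.07
betas = {"A": 0.5, "B": 1.0, "C": 1.8}
residual_vols = {"A": 0.10, "B": 0.15, "C": 0.25}

treynor_ratios = {}
information_ratios = {}
for name, beta in betas.items():
    # CAPM: expected excess return is beta times the market premium, with no alpha
    expected_excess = beta * market_premium
    alpha = expected_excess - beta * market_premium

    treynor_ratios[name] = expected_excess / beta          # premium per unit of beta
    information_ratios[name] = alpha / residual_vols[name]  # alpha per unit of residual risk

print("Treynor ratios:    ", treynor_ratios)
print("Information ratios:", information_ratios)
```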

#### 8. Which of the following do you think Barnstable should be confident about, and which do you think they should reconsider. . .
Over 100 years,<br/>
• The average Market return will outperform the average risk-free rate.<br/>
• The 100-year Market Sharpe Ratio will outperform the 1-year Market Sharpe Ratio.<br/>
• The volatility of the 100-year cumulative Market return is smaller than volatility of the
1-year cumulative Market return.

Barnstable can be confident about the fact that in the long run, the market return will outperform the risk-free rate. The data supported this. 

They can also be confident that the 100-year Sharpe Ratio will outperform the 1-year Sharpe Ratio. We found that Sharpe Ratios grow with roughly the square root of the time horizon, making the 100-year Sharpe about ten times the 1-year Sharpe.

They should reconsider the third statement, which is false. The volatility of the cumulative return grows with the horizon; it is the volatility of the average return that shrinks.
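The horizon scaling behind these answers can be sketched as follows, assuming iid annual log excess returns. The annual mean and volatility are hypothetical inputs, and `horizon_stats` is a helper name introduced here for illustration.

```python
import math

def horizon_stats(mu, sigma, h):
    """Volatility and Sharpe ratio over an h-year horizon, assuming iid annual log excess returns."""
    return {
        "cumulative_vol": math.sqrt(h) * sigma,  # grows with the horizon
        "average_vol": sigma / math.sqrt(h),     # shrinks with the horizon
        "sharpe": math.sqrt(h) * mu / sigma,     # identical for cumulative and average returns
    }

mu, sigma = 0.06, 0.16  # hypothetical annual mean and volatility

one_year = horizon_stats(mu, sigma, 1)
century = horizon_stats(mu, sigma, 100)
print("1 year:   ", one_year)
print("100 years:", century)
```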

### 2 Pricing Model: Time-Series Test

In [2]:
# Import data
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
commodities_df = pd.read_excel('../data/midterm_2_data_pricing.xlsx', sheet_name='assets (excess returns)')
commodities_df = commodities_df.set_index('Date')
factor_data = pd.read_excel('../data/midterm_2_data_pricing.xlsx', sheet_name='factors (excess returns)')
factor_data = factor_data.set_index('Date')
factor_data.tail()

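def time_series_test_2(df, factor_df, factors, annualization=12):
    """Regress each asset on the factors; return annualized alphas and betas."""
    columns = ['alpha'] + [f'{factor} Beta' for factor in factors]
    ts_test = pd.DataFrame(data = None, index = df.columns, columns = columns)

    # The regressors are the same for every asset, so build them once
    X = sm.add_constant(factor_df[factors])

    for asset in ts_test.index:
        params = sm.OLS(df[asset], X).fit().params
        # Annualize the intercept only; look up coefficients by label so any number of factors works
        ts_test.loc[asset] = [params['const'] * annualization] + list(params[factors])

    return ts_test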
ts_test = time_series_test_2(commodities_df, factor_data, ['MKT', 'CL1'])
display(ts_test)


| Asset | alpha | MKT Beta | CL1 Beta |
|---|---|---|---|
| NG1 | 0.119455 | -0.037687 | 0.250161 |
| KC1 | 0.020321 | 0.299161 | 0.03211 |
| CC1 | 0.063213 | 0.113898 | 0.124338 |
| LB1 | 0.055498 | 0.779146 | 0.1874 |
| CT1 | 0.013018 | 0.529072 | 0.06292 |
| SB1 | 0.069568 | 0.057906 | 0.162752 |
| LC1 | 0.016274 | 0.106781 | 0.052885 |
| W1 | 0.055759 | 0.291154 | -0.002553 |
| S1 | 0.042099 | 0.353274 | 0.038602 |
| C1 | 0.060939 | 0.255092 | 0.065222 |


#### 2.1 For the asset NG1, report the alpha and betas of the regression.

In [3]:
ts_test.loc['NG1']

alpha       0.119455
MKT Beta   -0.037687
CL1 Beta    0.250161
Name: NG1, dtype: object

#### 2.2 Report the two factor premia implied by the time-series test. Annualize them.

In [4]:
(factor_data.mean() * 12).to_frame('Factor Premia')

| Factor | Factor Premia |
|---|---|
| MKT | 0.07067 |
| CL1 | 0.108693 |


#### 2.3 Report the Mean Absolute Pricing Error (MAE) of the model. Annualize it.

In [5]:
print('MAE: ' + str(round(ts_test['alpha'].abs().mean(), 4)))

MAE: 0.0549


#### 2.4 Report the largest predicted premium from the model, and note which asset it is.

In [6]:
ts_test = ts_test.rename(columns={'MKT Beta': 'MKT', 'CL1 Beta': 'CL1'})
# NOTE: the factor means and the beta columns are matched by name, so the column names must align
(factor_data.mean() * 12 * ts_test[['MKT','CL1']]).sum(axis = 1).to_frame('Predicted Premium').nlargest(1, 'Predicted Premium')

| Asset | Predicted Premium |
|---|---|
| LB1 | 0.075431 |


### 3 Pricing Model: Cross-Sectional Test

#### 3.1 For the cross-sectional regression, report the R-squared and Intercept. Annualize this number.

In [7]:
y = commodities_df.mean()
X = sm.add_constant(ts_test[['MKT','CL1']].astype(float))

cross_sect = sm.OLS(y, X).fit()
display(cross_sect.summary())

print('R-squared: ' + str(round(cross_sect.rsquared, 4)))
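# Look up the intercept by label; integer position lookups on a labeled Series are deprecated in pandas
print('Alpha: ' + str(round(cross_sect.params['const'] * 12, 4)))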



| Item | Value | Item | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.631 |
| Model: | OLS | Adj. R-squared: | 0.564 |
| Method: | Least Squares | F-statistic: | 9.417 |
| Date: | Mon, 07 Nov 2022 | Prob (F-statistic): | 0.00414 |
| Time: | 12:55:24 | Log-Likelihood: | 70.426 |
| No. Observations: | 14 | AIC: | -134.9 |
| Df Residuals: | 11 | BIC: | -132.9 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0038 | 0.001 | 3.632 | 0.004 | 0.001 | 0.006 |
| MKT | 0.0015 | 0.002 | 0.748 | 0.470 | -0.003 | 0.006 |
| CL1 | 0.0277 | 0.006 | 4.264 | 0.001 | 0.013 | 0.042 |

| Item | Value | Item | Value |
|---|---|---|---|
| Omnibus: | 0.783 | Durbin-Watson: | 2.09 |
| Prob(Omnibus): | 0.676 | Jarque-Bera (JB): | 0.636 |
| Skew: | -0.068 | Prob(JB): | 0.728 |
| Kurtosis: | 1.965 | Cond. No. | 14.4 |


R-squared: 0.6313
Alpha: 0.0456


#### 3.2 Are either, neither, or both of these estimated metrics evidence against the model?

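Both are evidence against the model. If the model priced these assets perfectly, the cross-sectional regression would have an R-squared of 1 and an intercept (alpha) of 0. Here the R-squared is 0.6313 and the annualized intercept is 0.0456.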

#### 3.3 Report the estimated factor premia. (i.e. the two cross-sectional regression slopes). Annualize this number.

In [8]:
display((cross_sect.params[1:] * 12).to_frame('Estimated Factor Premia'))


| Factor | Estimated Factor Premia |
|---|---|
| MKT | 0.018582 |
| CL1 | 0.331945 |


#### 3.4 Report the Mean Absolute Pricing Error (MAE) of the model. Annualize it.

In [9]:
MAE_cs = cross_sect.resid.abs().mean() * 12

print('MAE: ' + str(round(MAE_cs, 4)))

MAE: 0.0169


#### 3.5 Report the largest predicted premium from the model, and note which asset it is.

In [10]:
# To calculate cross-sectional predicted premia
print(cross_sect.params)
# Predicted premium = cross-sectional intercept + betas times estimated factor premia
predicted = cross_sect.params['const'] + (ts_test[['MKT','CL1']] * cross_sect.params[1:]).sum(axis=1)
(predicted * 12).nlargest(1).to_frame('Predicted Premium')

const    0.003799
MKT      0.001549
CL1      0.027662
dtype: float64


| Asset | Predicted Premium |
|---|---|
| NG1 | 0.127924 |


### 4 Pricing Model: Conceptual Questions

#### 1. Which is more useful in assessing the model’s fit for pricing: the r-squared of the time-series regressions, the r-squared of the cross-sectional regression, or neither?
The r-squared of the cross-sectional regression. We generally expect a poor r-squared for the time-series regressions, and care only about their alphas. In the cross-sectional regression, however, we expect an r-squared of one; anything less means the pricing model does not fully explain the premia.

#### 2. We calculated the MAE from the time-series estimation and from the cross-sectional (with intercept) estimation. Is one always bigger than the other? Why or why not?
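Not always, but we would expect the MAE from the time-series estimation to be higher than that of the cross-sectional. The cross-sectional regression has additional degrees of freedom: it is allowed to pick its own intercept and factor premia, whereas the time-series test fixes the premia at the factor sample means with no intercept. Because the cross-sectional regression minimizes squared errors rather than absolute errors, this typically, though not necessarily, leads to a lower mean absolute error.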
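The comparison can be checked on simulated data. The sketch below is built entirely on synthetic returns (the sample size, number of assets, and all distribution parameters are arbitrary assumptions), and it uses plain NumPy least squares rather than the `statsmodels` calls used elsewhere in this notebook. The time-series alphas are one feasible choice for the cross-sectional regression (zero intercept, premia equal to the factor means), so the cross-sectional sum of squared errors can never be larger; the MAE ordering is printed but not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, k = 240, 8, 2  # months, assets, factors (all hypothetical)

# Synthetic factor and asset excess returns
factors = rng.normal(0.005, 0.04, size=(T, k))
true_betas = rng.uniform(0.0, 1.2, size=(n, k))
true_alphas = rng.normal(0.0, 0.002, size=n)
assets = true_alphas + factors @ true_betas.T + rng.normal(0.0, 0.03, size=(T, n))

# Time-series step: regress every asset on a constant and the factors
X_ts = np.column_stack([np.ones(T), factors])
coefs, *_ = np.linalg.lstsq(X_ts, assets, rcond=None)
ts_alphas, betas = coefs[0], coefs[1:].T  # shapes (n,) and (n, k)

# Cross-sectional step: regress mean returns on a constant and the betas
mean_returns = assets.mean(axis=0)
X_cs = np.column_stack([np.ones(n), betas])
cs_coefs, *_ = np.linalg.lstsq(X_cs, mean_returns, rcond=None)
cs_resid = mean_returns - X_cs @ cs_coefs

sse_ts, sse_cs = np.sum(ts_alphas ** 2), np.sum(cs_resid ** 2)
mae_ts, mae_cs = np.abs(ts_alphas).mean(), np.abs(cs_resid).mean()
print("Sum of squared errors (TS, CS):", sse_ts, sse_cs)
print("Mean absolute error   (TS, CS):", mae_ts, mae_cs)
```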

#### 3. If we add another factor, will the time-series MAE decrease? And how about the cross-sectional MAE? Explain.
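It is unclear whether the time-series MAE would increase or decrease, since the alphas are not what the time-series regressions minimize. In the cross-sectional regression, the extra regressor gives the fit more room to improve, so we would expect the MAE to decrease.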

#### 4. Suppose we built a tangency portfolio using only the factors.

In [11]:
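def tangency_weights(returns,dropna=True,scale_cov=1):
    """Tangency-portfolio weights, normalized to sum to 1.

    scale_cov=1 uses the full sample covariance; values below 1 shrink it toward its diagonal.
    """
    if dropna:
        returns = returns.dropna()

    covmat_full = returns.cov()
    covmat_diag = np.diag(np.diag(covmat_full))
    covmat = scale_cov * covmat_full + (1-scale_cov) * covmat_diag

    # Weights are proportional to inv(covmat) @ mean returns
    weights = np.linalg.solve(covmat,returns.mean())
    weights = weights / weights.sum()

    return pd.DataFrame(weights, index=returns.columns,columns=['tangency weights'])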

display(tangency_weights(factor_data))

| Factor | tangency weights |
|---|---|
| MKT | 0.881089 |
| CL1 | 0.118911 |


#### 4.a Does CL1 have much weight in this factor-tangency portfolio?
It isn't huge, but CL1 does have a decent weight at roughly 12%.

#### 4.b Conceptually, does this seem like evidence that CL1 is a useful pricing factor? Why?
CL1 is not nearly as important as the MKT factor, but a weight of roughly 12% suggests it is mildly useful as a pricing factor, at least in this model. The evidence would be stronger if its tangency weight were higher.