This problem set explores momentum strategies, particularly industry momentum (so you don’t have to deal with 10,000 individual stocks) and also commodity momentum. Reading the Moskowitz and Grinblatt (1999) article will be useful. In order to proceed you need the file “Problem_Set6.xls”. This file contains four spreadsheets. The first contains returns on 30 value-weighted industries, the second contains the factor portfolios of Fama and French, the third contains 32 commodity return series (from futures contracts), and the fourth contains returns to 125 portfolios based on size and short-term, intermediate-term, and long-term past-return sorts to look at the horizon over which momentum and reversals work. In addition, the file contains monthly T-bill returns. The factor portfolios include RMRF (market), HML (book-to-market factor), SMB (size factor), and UMD (individual stock momentum factor).

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from scipy.stats import f
import matplotlib.pyplot as plt



In [2]:
#--------------------Read subsheet 1--------------------
sheet1=pd.read_excel("Problem_Set6.xls", sheet_name='30 industries', 
                     skiprows=3, index_col=None, na_values=-999)
sheet1["Date"] = pd.to_datetime(sheet1["Unnamed: 0"], format="%Y%m")
industries = sheet1.set_index("Date")
industries = industries.loc[industries.index.dropna()]
industries = industries.drop("Unnamed: 0", axis = 1)

#--------------------Read subsheet 2--------------------
sheet2=pd.read_excel("Problem_Set6.xls", sheet_name='Fama-French factors', 
                     skiprows=2, index_col=None, na_values=-999)
sheet2["Date"] = pd.to_datetime(sheet2["Unnamed: 0"], format="%Y%m")
factors = sheet2.set_index("Date")
factors = factors.loc[factors.index.dropna()]
factors = factors.drop("Unnamed: 0", axis = 1)

#--------------------Read subsheet 3--------------------
sheet3=pd.read_excel("Problem_Set6.xls", sheet_name='Commodities', 
                     index_col=None, na_values=-999)
commodities = sheet3.set_index("time")
commodities = commodities.loc[commodities.index.dropna()]

#--------------------Read subsheet 4--------------------
sheet4_p1=pd.read_excel("Problem_Set6.xls", sheet_name='125 Size-Past-return portfolios', 
                     skiprows=4, index_col=None, usecols="A:Z", na_values=-999)
sheet4_p1["Date"] = pd.to_datetime(sheet4_p1["Past 1-month return"], format="%Y%m")
p1_return = sheet4_p1.set_index("Date")
p1_return = p1_return.loc[p1_return.index.dropna()]
p1_return = p1_return.drop("Past 1-month return", axis = 1)

sheet4_p212=pd.read_excel("Problem_Set6.xls", sheet_name='125 Size-Past-return portfolios', 
                     skiprows=4, index_col=None, usecols="AA:AZ", na_values=-999)
sheet4_p212["Date"] = pd.to_datetime(sheet4_p212["Past 2-12-month return"], format="%Y%m")
p212_return = sheet4_p212.set_index("Date")
p212_return = p212_return.loc[p212_return.index.dropna()]
p212_return = p212_return.drop("Past 2-12-month return", axis = 1)

sheet4_p1360=pd.read_excel("Problem_Set6.xls", sheet_name='125 Size-Past-return portfolios', 
                     skiprows=4, index_col=None, usecols="BA:BZ", na_values=-999)
sheet4_p1360["Date"] = pd.to_datetime(sheet4_p1360["Past 13-60-month return"], format="%Y%m")
p1360_return = sheet4_p1360.set_index("Date")
p1360_return = p1360_return.loc[p1360_return.index.dropna()]
p1360_return = p1360_return.drop("Past 13-60-month return", axis = 1)

## a)	Start with the industry portfolio spreadsheet. Compute the 1-month, 1-month industry momentum portfolio by finding the three best and three worst performing industries in the previous month.  Then, take the equal weighted average return of the three best industries in the following month, and subtract from this the equal weighted average return of the three worst performing industries in the following month.  Calculate the time-series average return, t-statistic, Sharpe ratio, and standard deviation on this momentum strategy.

In [3]:
# Problem a)
# Find best and worst performing 3 industries for each period

maxdf = industries.apply(lambda x: pd.Series(x.nlargest(3).index.values), axis=1) # best three
maxdf.columns = ['Max1', 'Max2', 'Max3']
mindf = industries.apply(lambda x: pd.Series(x.nsmallest(3).index.values), axis=1) # worst three
mindf.columns = ['Min1', 'Min2', 'Min3']

#print(industries)
#print(maxdf)
#print(mindf)

In [4]:
mom_pf = []
for i in range(0,len(industries)-1):
  temp = (1/3) * (industries.iloc[i+1,:][maxdf.iloc[i,0]] + industries.iloc[i+1,:][maxdf.iloc[i,1]] + \
                  industries.iloc[i+1,:][maxdf.iloc[i,2]] - industries.iloc[i+1,:][mindf.iloc[i,0]] - \
                  industries.iloc[i+1,:][mindf.iloc[i,1]] - industries.iloc[i+1,:][mindf.iloc[i,2]])
  mom_pf.append(temp)

print("mean: ", round(np.mean(mom_pf), 2))
print("T-value: ", round(np.mean(mom_pf) * np.sqrt(len(industries)-1) / np.std(mom_pf), 2))
print("Sharpe ratio: ", round((np.mean(mom_pf) - np.mean(factors.iloc[0:-4,:]['RF']))/np.std(mom_pf), 2))
print("standard deviation: ", round(np.std(mom_pf), 2))

mean:  0.73
T-value:  4.38
Sharpe ratio:  0.08
standard deviation:  5.41
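
The loop above can also be written in vectorized form. The following is a minimal sketch, not the notebook's code: `long_short_momentum` and `summary_stats` are hypothetical helper names, and the input is assumed to be a complete `(T, N)` NumPy array (for example `industries.to_numpy()`). The Sharpe ratio is taken as mean over standard deviation, on the assumption that a zero-cost long-short portfolio needs no risk-free adjustment. The sketch uses the sample standard deviation (`ddof=1`), whereas `np.std` in the cell above defaults to `ddof=0`.

```python
import numpy as np

def long_short_momentum(returns, k=3):
    """Top-k minus bottom-k return in month t+1, ranked on month t.

    returns: (T, N) array of monthly returns with no missing values.
    """
    order = np.argsort(returns[:-1], axis=1)      # ascending rank in each formation month
    losers, winners = order[:, :k], order[:, -k:]
    nxt = returns[1:]                             # holding-month returns
    rows = np.arange(nxt.shape[0])[:, None]
    return nxt[rows, winners].mean(axis=1) - nxt[rows, losers].mean(axis=1)

def summary_stats(x):
    """Mean, sample std (ddof=1), t-statistic of the mean, and per-period Sharpe ratio."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std(ddof=1)
    t_stat = mean / (std / np.sqrt(len(x)))
    sharpe = mean / std   # assumption: zero-cost portfolio, so no risk-free rate is subtracted
    return mean, std, t_stat, sharpe

# Synthetic data only, to show the call pattern
rng = np.random.default_rng(0)
demo = rng.normal(1.0, 5.0, size=(240, 30))
mom = long_short_momentum(demo)
mean, std, t_stat, sharpe = summary_stats(mom)
```

Because the t-statistic equals the Sharpe ratio times the square root of the number of months, the two statistics carry the same information for a fixed sample length.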


## b)	For the 1-month, 1-month strategy of part a), decompose its returns into three components:
1)	The cross-sectional variance of industry sample mean returns (cross-sectional variance of the 30 industry sample means).
2)	The cross-sectional variance of market betas times the average cross-autocovariance of returns to the market portfolio (you need to calculate this yourself).
3)	The average cross-autocovariance of residual returns (with respect to the market model) for all industries.
(Hint: you will need to estimate the betas and residual returns for each industry relative to the market model or CAPM.)

To summarize the three components, momentum profits should be approximately the sum of the three, as follows:
$$
\frac{1}{N} \sum_{j=1}^{N} M o m_{j}=\sigma_{\mu}^{2}+\sigma_{\beta}^{2} \operatorname{cov}\left(\widetilde{F}_{t}, \widetilde{F}_{t-1}\right)+\frac{1}{N} \sum_{j=1}^{N} \operatorname{cov}\left(\varepsilon_{j t}, \varepsilon_{j t-1}\right)
$$
Which component is the greatest contributor to momentum profits?

In [5]:
# After doing a), we noticed that factors has 1073 rows while industries has 1069.
# We drop the last four rows of factors and call the result factors_new.
factors_new = factors.iloc[0:-4,:]

# first component
var_mu = np.var(np.mean(industries))

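# second component
beta = []
residual = []
for i in range(industries.shape[1]):
  x = sm.add_constant(factors_new['RF'])  # regressors: constant and the 'RF' column
  mod1 = sm.OLS(industries.iloc[:,i], x).fit()
  residual.append(mod1.params[0])  # intercept of the fitted regression (one scalar per industry)
  beta.append(mod1.params[1])  # slope coefficient on 'RF'

var_beta = np.var(beta)  # cross-sectional variance of the slopes
# covariance between the market excess return and its one-month lead
cov_mkt = np.cov(factors_new['Mkt-RF'][0:len(factors_new)-1], factors_new['Mkt-RF'][1:len(factors_new)])
second_comp = var_beta * cov_mkt[0,1]

# third component
cov_sum = []
for i in range(len(residual)):
  # 2x2 covariance matrix of the list of intercepts with its one-position shift
  temp = np.cov(residual[0:len(residual)-1], residual[1:len(residual)])
  cov_sum.append(temp)
third_comp = np.sum(cov_sum) / len(residual)  # sum of all matrix entries divided by the number of industries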

print("first component =", round(var_mu, 2), "second component =", round(second_comp, 2), "third component =", round(third_comp, 2))

first component = 0.01 second component = 1.37 third component = 0.11


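The second component, the cross-sectional variance of betas times the market's first-order autocovariance, is by far the largest of the three (1.37, against 0.01 for the first component and 0.11 for the third), so it is the greatest contributor to momentum profits in this decomposition.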
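
For reference, the three terms of the decomposition can be sketched directly from the formula in the problem statement. This is a minimal sketch under stated assumptions rather than the notebook's computation: it uses the market excess return as the single factor, keeps the full residual time series for each industry (one reading of the hint), and the names `lag1_autocov` and `momentum_decomposition` are hypothetical.

```python
import numpy as np

def lag1_autocov(x):
    """Sample covariance between x_t and x_{t-1}."""
    x = np.asarray(x, dtype=float)
    return np.cov(x[1:], x[:-1])[0, 1]

def momentum_decomposition(returns, market):
    """Three components of 1-month/1-month momentum profits.

    returns: (T, N) array of industry returns; market: (T,) market factor.
    """
    T, N = returns.shape
    X = np.column_stack([np.ones(T), market])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)   # row 0: intercepts, row 1: betas
    betas = coef[1]
    resid = returns - X @ coef                           # (T, N) market-model residuals
    comp1 = returns.mean(axis=0).var()                   # cross-sectional variance of mean returns
    comp2 = betas.var() * lag1_autocov(market)           # beta dispersion times market autocovariance
    comp3 = np.mean([lag1_autocov(resid[:, j]) for j in range(N)])  # average residual autocovariance
    return comp1, comp2, comp3

# Synthetic data only: returns driven entirely by the market, so the residual term is zero
rng = np.random.default_rng(0)
market = rng.normal(0.6, 4.5, size=500)
true_betas = np.linspace(0.5, 1.5, 10)
returns = 0.2 + np.outer(market, true_betas)
comp1, comp2, comp3 = momentum_decomposition(returns, market)
```

With real data, the relative size of the three terms depends on which series is used as the factor and on whether residuals are kept as time series, so the choice of regressor matters for the conclusion.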

## c)	Repeat part a) (but NOT part b)) for the 12-month, 1-month industry momentum portfolio, where the only difference is that we define winning and losing industries by their past 12-month performance (cumulative return over past 12-months) rather than past 1-month. 

In [6]:
def k_largest_index_argpartition_v1(a, k):
  # https://stackoverflow.com/questions/43386432/how-to-get-indexes-of-k-maximum-values-from-a-numpy-multidimensional-array
    idx = np.argpartition(-a.ravel(),k)[:k]
    return np.column_stack(np.unravel_index(idx, a.shape))

temp_df = np.zeros(industries.shape)
for i in range(12, industries.shape[0]):
  temp_df[i,:] = np.sum(industries.iloc[i-12:i,:])
temp_df_for_real = pd.DataFrame(temp_df)

df_min = pd.DataFrame(np.zeros((industries.shape[0],3)))
for i in range(0, industries.shape[0]-12):
  df_min.iloc[i + 12]= k_largest_index_argpartition_v1(-temp_df_for_real.iloc[i+12,:],3).squeeze()

df_max = pd.DataFrame(np.zeros((industries.shape[0],3)))
for i in range(0, industries.shape[0]-12):
  df_max.iloc[i + 12]= k_largest_index_argpartition_v1(temp_df_for_real.iloc[i+12,:],3).squeeze()

# assign first 12 rows as NA
empty_rows = pd.DataFrame(df_min[:][:12]).copy()
empty_rows.replace(0, np.nan, inplace = True)

dic = {}
for i, j in enumerate(industries.columns):
  dic[i] = j

for index, data in df_max.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_max.iloc[index] = tmp

for index, data in df_min.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_min.iloc[index] = tmp

df_max[:][:12] = empty_rows
df_min[:][:12] = empty_rows

#print(df_max)
#print(df_min)

In [7]:
df_max.index = industries.index
df_min.index = industries.index

twelve_mom_pf = []
for i in range(0,len(industries)-12):
  temp = (1/3) * (industries.iloc[i+12,:][df_max.iloc[i+12,0]] + industries.iloc[i+12,:][df_max.iloc[i+12,1]] + \
                  industries.iloc[i+12,:][df_max.iloc[i+12,2]] - industries.iloc[i+12,:][df_min.iloc[i+12,0]] - \
                  industries.iloc[i+12,:][df_min.iloc[i+12,1]] - industries.iloc[i+12,:][df_min.iloc[i+12,2]])
  twelve_mom_pf.append(temp)

print("mean: ", round(np.mean(twelve_mom_pf), 2))
print("T-value: ", round(np.mean(twelve_mom_pf) * np.sqrt(len(industries[12:])-1) / np.std(twelve_mom_pf), 2))
print("Sharpe ratio: ", round((np.mean(twelve_mom_pf) - np.mean(factors_new.iloc[12:,:]['RF']))/np.std(twelve_mom_pf), 2))
print("standard deviation: ", round(np.std(twelve_mom_pf), 2))

mean:  0.89
T-value:  4.76
Sharpe ratio:  0.1
standard deviation:  6.07


## d)	Now compute the 12-month, 1-month industry momentum portfolio but this time skipping a month between portfolio formation and returns.  That is, rank the industries by their past returns from month t-12 to t-2, then take the equal weighted average return of the top three industries based on that ranking and subtract the equal-weighted average of the bottom three industries based on that ranking using returns from month t. Thus, skipping the information in month t-1 entirely. Calculate the time-series average return, t-statistic, Sharpe ratio, and standard deviation on this momentum strategy and compare it to c) where you didn’t skip a month between portfolio formation and returns.

In [8]:
temp_df1 = np.zeros(industries.shape)
for i in range(12, industries.shape[0]):
  temp_df1[i,:] = np.sum(industries.iloc[i-12:i-1,:])
temp_df_for_real1 = pd.DataFrame(temp_df1)

df_min1 = pd.DataFrame(np.zeros((industries.shape[0],3)))
for i in range(0, industries.shape[0]-12):
  df_min1.iloc[i + 12]= k_largest_index_argpartition_v1(-temp_df_for_real1.iloc[i+12,:],3).squeeze()

df_max1 = pd.DataFrame(np.zeros((industries.shape[0],3)))
for i in range(0, industries.shape[0]-12):
  df_max1.iloc[i + 12]= k_largest_index_argpartition_v1(temp_df_for_real1.iloc[i+12,:],3).squeeze()

# assign first 12 rows as NA
empty_rows1 = pd.DataFrame(df_min1[:][:12]).copy()
empty_rows1.replace(0, np.nan, inplace = True)

dic = {}
for i, j in enumerate(industries.columns):
  dic[i] = j

for index, data in df_max1.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_max1.iloc[index] = tmp

for index, data in df_min1.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_min1.iloc[index] = tmp

df_max1[:][:12] = empty_rows1
df_min1[:][:12] = empty_rows1

#print(df_max1)
#print(df_min1)

df_max1.index = industries.index
df_min1.index = industries.index

twelve_mom_pf1 = []
for i in range(0,len(industries)-12):
  temp = (1/3) * (industries.iloc[i+12,:][df_max1.iloc[i+12,0]] + industries.iloc[i+12,:][df_max1.iloc[i+12,1]] + \
                  industries.iloc[i+12,:][df_max1.iloc[i+12,2]] - industries.iloc[i+12,:][df_min1.iloc[i+12,0]] - \
                  industries.iloc[i+12,:][df_min1.iloc[i+12,1]] - industries.iloc[i+12,:][df_min1.iloc[i+12,2]])
  twelve_mom_pf1.append(temp)

print("mean: ", round(np.mean(twelve_mom_pf1), 2))
print("T-value: ", round(np.mean(twelve_mom_pf1) * np.sqrt(len(industries[12:])-1) / np.std(twelve_mom_pf1), 2))
print("Sharpe ratio: ", round((np.mean(twelve_mom_pf1) - np.mean(factors_new.iloc[12:,:]['RF']))/np.std(twelve_mom_pf1), 2))
print("standard deviation: ", round(np.std(twelve_mom_pf1), 2))

mean:  0.95
T-value:  5.08
Sharpe ratio:  0.11
standard deviation:  6.06


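Comparing a), c), and d), both the mean return (0.73, 0.89, 0.95) and the t-statistic (4.38, 4.76, 5.08) rise from one strategy to the next, so the strategies become progressively stronger in both average return and statistical significance. Skipping a month in d) slightly improves on c): the mean rises from 0.89 to 0.95 with essentially the same standard deviation (6.07 versus 6.06).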
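
The only difference between c) and d) is the formation window. The sketch below shows one way to parameterize it. It assumes returns are quoted in percent and compounds them, whereas the cells above sum simple returns over the window. The function name `formation_signal` and its arguments are hypothetical.

```python
import numpy as np

def formation_signal(returns_pct, lookback=12, skip=0):
    """Compounded return over `lookback` months ending `skip` months before month t.

    Row t of the result is the ranking signal for holding month t;
    rows without a full window are NaN.
    """
    gross = 1.0 + np.asarray(returns_pct, dtype=float) / 100.0
    T = gross.shape[0]
    signal = np.full(gross.shape, np.nan)
    for t in range(lookback + skip, T):
        window = gross[t - skip - lookback : t - skip]   # months t-skip-lookback .. t-skip-1
        signal[t] = window.prod(axis=0) - 1.0
    return signal

# Synthetic data only
rng = np.random.default_rng(0)
demo = rng.normal(1.0, 5.0, size=(60, 30))
sig_c = formation_signal(demo, lookback=12, skip=0)   # part c): months t-12 .. t-1
sig_d = formation_signal(demo, lookback=11, skip=1)   # part d): months t-12 .. t-2
```

Both calls produce their first valid signal at row 12, so the two strategies can be compared over the same holding months.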

## e)	For all three momentum strategies from parts a), c), and d), calculate the three-factor Fama-French alpha using the factors RMRF, SMB, and HML, and determine if the Fama-French model can price the momentum strategies?

In [9]:
#factors.iloc[12:-4, :]

In [10]:
mom_3factors = factors.iloc[1:-4, :]
x = sm.add_constant(mom_3factors[['Mkt-RF', 'SMB', 'HML']])
mod2 = sm.OLS(mom_pf, x).fit()
print("For 1-month, 1-month strategy: 3-factor alpha = ", round(mod2.params[0], 2), "; 3-factor R-squared = ", round(mod2.rsquared, 2))

twelve_mom_3factors = factors.iloc[12:-4, :]
x = sm.add_constant(twelve_mom_3factors[['Mkt-RF', 'SMB', 'HML']])
mod3 = sm.OLS(twelve_mom_pf, x).fit()
print("For 12-month, 1-month strategy: 3-factor alpha = ", round(mod3.params[0], 2), "; 3-factor R-squared = ", round(mod3.rsquared, 2))

twelve_mom_3factors2 = factors.iloc[12:-4, :]
x = sm.add_constant(twelve_mom_3factors2[['Mkt-RF', 'SMB', 'HML']])
mod3 = sm.OLS(twelve_mom_pf1, x).fit()
print("For 12-month, 2-month strategy: 3-factor alpha = ", round(mod3.params[0], 2), "; 3-factor R-squared = ", round(mod3.rsquared, 2))


For 1-month, 1-month strategy: 3-factor alpha =  0.73 ; 3-factor R-squared =  0.01
For 12-month, 1-month strategy: 3-factor alpha =  1.07 ; 3-factor R-squared =  0.05
For 12-month, 2-month strategy: 3-factor alpha =  1.12 ; 3-factor R-squared =  0.05


The three-factor Fama-French model does not price the momentum strategies. The alphas (0.73, 1.07, and 1.12) are as large as or larger than the raw mean returns (0.73, 0.89, and 0.95), so RMRF, SMB, and HML leave essentially all of the momentum return unexplained. The very low R-squared values (0.01 to 0.05) confirm that the three factors account for only a small share of the variation in the strategies' returns.

Across the three strategies the R-squared rises slightly, from 0.01 to 0.05, but remains very low. The alphas rise as well, from 0.73 to 1.12, so the return left unexplained by the three factors grows from one strategy to the next.
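
Whether a model "prices" a strategy is usually judged by the statistical significance of the alpha, not only its size. The following is a minimal NumPy sketch of that check under classical OLS assumptions (homoskedastic, uncorrelated errors). The helper name `ols_alpha` is hypothetical and the data are synthetic.

```python
import numpy as np

def ols_alpha(y, factors):
    """Intercept, its t-statistic, and R-squared from regressing y on the factors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), factors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # classical coefficient covariance
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[0], t_alpha, r2

# Synthetic data only: a strategy with a true alpha of 0.5 and three factor loadings
rng = np.random.default_rng(1)
fac = rng.normal(size=(2000, 3))
strategy = 0.5 + fac @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=2000)
alpha, t_alpha, r2 = ols_alpha(strategy, fac)
```

With the fitted statsmodels results used in the notebook, the same quantities are available through the `tvalues` and `pvalues` attributes of the result object.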

## f)	For all three momentum strategies from parts a), c), and d), calculate the four-factor Fama-French alpha using the factors RMRF, SMB, HML, and UMD, and determine if this model can now price the momentum strategies?  What do you learn from this regression?

In [11]:
# Part f)
mom_4factors = mom_3factors.iloc[5:]
x = sm.add_constant(mom_4factors[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod3 = sm.OLS(mom_pf[5:], x).fit()
print("For 1-month, 1-month strategy: 4-factor alpha = ", round(mod3.params[0], 2), "; 4-factor R-squared = ", round(mod3.rsquared, 2))

twelve_mom_4factors = twelve_mom_3factors
x = sm.add_constant(twelve_mom_4factors[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod6 = sm.OLS(twelve_mom_pf, x).fit()
print("For 12-month, 1-month strategy: 4-factor alpha = ", round(mod6.params[0], 2), "; 4-factor R-squared = ", round(mod6.rsquared, 2))


twelve_mom_4factors2 = twelve_mom_3factors
x = sm.add_constant(twelve_mom_4factors2[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod6 = sm.OLS(twelve_mom_pf1, x).fit()
print("For 12-month, 2-month strategy: 4-factor alpha = ", round(mod6.params[0], 2), "; 4-factor R-squared = ", round(mod6.rsquared, 2))

For 1-month, 1-month strategy: 4-factor alpha =  0.59 ; 4-factor R-squared =  0.02
For 12-month, 1-month strategy: 4-factor alpha =  0.1 ; 4-factor R-squared =  0.49
For 12-month, 2-month strategy: 4-factor alpha =  0.15 ; 4-factor R-squared =  0.49


Like the three-factor model, the four-factor model is unable to explain the 1-month, 1-month strategy: the alpha remains large (0.59) and the R-squared is extremely low (0.02).

However, unlike the three-factor model, the four-factor model explains a good part of the returns on the 12-month, 1-month and 12-month, 2-month strategies. The alphas fall to 0.10 and 0.15, and the R-squared rises to 0.49 for both.

This exercise shows that a factor model explains industry momentum only when the formation period is at least intermediate in length (12 months) rather than short (1 month). It also shows that UMD is the factor responsible for the improvement: adding it raises the R-squared of the 12-month strategies from 0.05 to 0.49.

# g)	Using the commodity return series, 

## i.	Compute the following momentum strategy returns: 1-month, 1-month; 12-month, 1-month; and 12-month, 1-month, skipping a month between portfolio formation and returns. (These are the same three momentum strategies you computed above for the industry portfolios.)
*Start the strategy at the beginning of the sample using the commodities that have available return data at that time and over time continue to add commodities to the sample as data becomes more available. (Hence, in 1970 you are taking the top and bottom 3 commodities from among 11 commodities, whereas by the end of the sample in 2015 you are choosing the top and bottom 3 among 32 commodities.) Comment on whether or not you think this matters.


In [12]:
# 1-month 1-month
maxdf = commodities.apply(lambda x: pd.Series(x.nlargest(3).index.values), axis=1) # best three
maxdf.columns = ['Max1', 'Max2', 'Max3']
mindf = commodities.apply(lambda x: pd.Series(x.nsmallest(3).index.values), axis=1) # worst three
mindf.columns = ['Min1', 'Min2', 'Min3']

comm_mom_pf = []
for i in range(0,len(commodities)-1):
  temp = (1/3) * (commodities.iloc[i+1,:][maxdf.iloc[i,0]] + commodities.iloc[i+1,:][maxdf.iloc[i,1]] + \
                  commodities.iloc[i+1,:][maxdf.iloc[i,2]] - commodities.iloc[i+1,:][mindf.iloc[i,0]] - \
                  commodities.iloc[i+1,:][mindf.iloc[i,1]] - commodities.iloc[i+1,:][mindf.iloc[i,2]])
  comm_mom_pf.append(temp)

print("1-Month 1-Month Strategy:")
print("mean: ", round(np.mean(comm_mom_pf), 2))
print("T-value: ", round(np.mean(comm_mom_pf) * np.sqrt(len(commodities)-1) / np.std(comm_mom_pf), 2))
print("Sharpe ratio: ", round((np.mean(comm_mom_pf[0:-1]) - np.mean(factors.iloc[523:,:]['RF']))/np.std(comm_mom_pf), 2))
print("standard deviation: ", round(np.std(comm_mom_pf), 2))

1-Month 1-Month Strategy:
mean:  0.02
T-value:  4.25
Sharpe ratio:  -4.1
standard deviation:  0.09


In [13]:
temp_df = np.zeros(commodities.shape)
for i in range(12, commodities.shape[0]):
  temp_df[i,:] = np.sum(commodities.iloc[i-12:i,:])
temp_df_for_real = pd.DataFrame(temp_df)

df_min = pd.DataFrame(np.zeros((commodities.shape[0],3)))
for i in range(0, commodities.shape[0]-12):
  df_min.iloc[i + 12]= k_largest_index_argpartition_v1(-temp_df_for_real.iloc[i+12,:],3).squeeze()

df_max = pd.DataFrame(np.zeros((commodities.shape[0],3)))
for i in range(0, commodities.shape[0]-12):
  df_max.iloc[i + 12]= k_largest_index_argpartition_v1(temp_df_for_real.iloc[i+12,:],3).squeeze()

# assign first 12 rows as NA
empty_rows = pd.DataFrame(df_min[:][:12]).copy()
empty_rows.replace(0, np.nan, inplace = True)

dic = {}
for i, j in enumerate(commodities.columns):
  dic[i] = j

for index, data in df_max.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_max.iloc[index] = tmp

for index, data in df_min.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_min.iloc[index] = tmp

df_max[:][:12] = empty_rows
df_min[:][:12] = empty_rows

df_max.index = commodities.index
df_min.index = commodities.index

comm_twelve_mom_pf = []
for i in range(0,len(commodities)-12):
  temp = (1/3) * (commodities.iloc[i+12,:][df_max.iloc[i+12,0]] + commodities.iloc[i+12,:][df_max.iloc[i+12,1]] + \
                  commodities.iloc[i+12,:][df_max.iloc[i+12,2]] - commodities.iloc[i+12,:][df_min.iloc[i+12,0]] - \
                  commodities.iloc[i+12,:][df_min.iloc[i+12,1]] - commodities.iloc[i+12,:][df_min.iloc[i+12,2]])
  comm_twelve_mom_pf.append(temp)

print("12-Month 1-Month Strategy:")
print("mean: ", round(np.nanmean(comm_twelve_mom_pf), 2))
print("T-value: ", round(np.nanmean(comm_twelve_mom_pf) * np.sqrt(len(industries[12:])-1) / np.nanstd(comm_twelve_mom_pf), 2))
print("Sharpe ratio: ", round((np.nanmean(comm_twelve_mom_pf) - np.nanmean(factors.iloc[12:,:]['RF']))/np.nanstd(comm_twelve_mom_pf), 2))
print("standard deviation: ", round(np.nanstd(comm_twelve_mom_pf), 2))

12-Month 1-Month Strategy:
mean:  0.01
T-value:  3.86
Sharpe ratio:  -2.98
standard deviation:  0.09


In [14]:
# 12-month 2-month
temp_df1 = np.zeros(commodities.shape)
for i in range(12, commodities.shape[0]):
  temp_df1[i,:] = np.sum(commodities.iloc[i-12:i-1,:])
temp_df_for_real1 = pd.DataFrame(temp_df1)

df_min1 = pd.DataFrame(np.zeros((commodities.shape[0],3)))
for i in range(0, commodities.shape[0]-12):
  df_min1.iloc[i + 12]= k_largest_index_argpartition_v1(-temp_df_for_real1.iloc[i+12,:],3).squeeze()

df_max1 = pd.DataFrame(np.zeros((commodities.shape[0],3)))
for i in range(0, commodities.shape[0]-12):
  df_max1.iloc[i + 12]= k_largest_index_argpartition_v1(temp_df_for_real1.iloc[i+12,:],3).squeeze()

# assign first 12 rows as NA
empty_rows1 = pd.DataFrame(df_min1[:][:12]).copy()
empty_rows1.replace(0, np.nan, inplace = True)

dic = {}
for i, j in enumerate(commodities.columns):
  dic[i] = j

for index, data in df_max1.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_max1.iloc[index] = tmp

for index, data in df_min1.iterrows():
  tmp = []
  for i, j in enumerate(data):
    tmp.append(dic[int(j)])
  df_min1.iloc[index] = tmp

df_max1[:][:12] = empty_rows1
df_min1[:][:12] = empty_rows1

df_max1.index = commodities.index
df_min1.index = commodities.index

comm_twelve_mom_pf1 = []
for i in range(0,len(commodities)-12):
  temp = (1/3) * (commodities.iloc[i+12,:][df_max1.iloc[i+12,0]] + commodities.iloc[i+12,:][df_max1.iloc[i+12,1]] + \
                  commodities.iloc[i+12,:][df_max1.iloc[i+12,2]] - commodities.iloc[i+12,:][df_min1.iloc[i+12,0]] - \
                  commodities.iloc[i+12,:][df_min1.iloc[i+12,1]] - commodities.iloc[i+12,:][df_min1.iloc[i+12,2]])
  comm_twelve_mom_pf1.append(temp)

print("12-Month 2-Month Strategy:")
print("mean: ", round(np.nanmean(comm_twelve_mom_pf1), 2))
print("T-value: ", round(np.nanmean(comm_twelve_mom_pf1) * np.sqrt(len(commodities[12:])-1) / np.nanstd(comm_twelve_mom_pf1), 2))
print("Sharpe ratio: ", round((np.nanmean(comm_twelve_mom_pf1) - np.nanmean(factors_new.iloc[12:,:]['RF']))/np.nanstd(comm_twelve_mom_pf1), 2))
print("standard deviation: ", round(np.nanstd(comm_twelve_mom_pf1), 2))

12-Month 2-Month Strategy:
mean:  0.01
T-value:  2.82
Sharpe ratio:  -3.11
standard deviation:  0.09
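
As the cross-section grows from 11 to 32 commodities, the ranking has to ignore series that have not started yet. The sketch below shows one way to restrict each month's ranking to commodities with data. It is an illustration, not the notebook's method: the helper names are hypothetical, missing observations are assumed to be NaN, and an asset is ranked only if it also has a return in the holding month.

```python
import numpy as np

def top_bottom_available(signal_row, k=3):
    """Indices of the k lowest and k highest finite entries of one formation row."""
    avail = np.flatnonzero(np.isfinite(signal_row))
    if len(avail) < 2 * k:
        return None, None                      # too few assets to form both legs
    order = avail[np.argsort(signal_row[avail])]
    return order[:k], order[-k:]

def long_short_with_gaps(returns, k=3):
    """1-month/1-month long-short returns, ranking only assets with data."""
    out = np.full(returns.shape[0] - 1, np.nan)
    for t in range(returns.shape[0] - 1):
        # assumption: require a holding-month return as well as a formation-month return
        signal = np.where(np.isfinite(returns[t + 1]), returns[t], np.nan)
        losers, winners = top_bottom_available(signal, k)
        if losers is not None:
            out[t] = returns[t + 1, winners].mean() - returns[t + 1, losers].mean()
    return out

# Tiny example: the second asset has no data in the formation month
example = np.array([[1.0, np.nan, 3.0, 2.0],
                    [10.0, 5.0, 30.0, 20.0]])
spread = long_short_with_gaps(example, k=1)
```

With only 11 commodities, the top and bottom three make up more than half of the universe, so the early long-short portfolio is less extreme relative to its cross-section than the later one drawn from 32.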


## ii.	Can the FF 3-factor and 4-factor models price each of the above commodity momentum portfolios?

In [15]:
# 3-factor model for commodities
# 1-Month 1-Month Strategy
comm_mom_3factors = factors.iloc[523:, :]
x = sm.add_constant(comm_mom_3factors[['Mkt-RF', 'SMB', 'HML']])
mod8 = sm.OLS(comm_mom_pf[:-1], x).fit()
print("For 1-month, 1-month strategy: 3-factor alpha = ", round(mod8.params[0], 2), "; 3-factor R-squared = ", round(mod8.rsquared, 2))

# 12-Month 1-Month Strategy
comm_twelve_mom_3factors = factors.iloc[534:, :]
x = sm.add_constant(comm_twelve_mom_3factors[['Mkt-RF', 'SMB', 'HML']])
mod9 = sm.OLS(comm_twelve_mom_pf[:-1], x, missing='drop').fit()
print("For 12-month, 1-month strategy: 3-factor alpha = ", round(mod9.params[0], 2), "; 3-factor R-squared = ", round(mod9.rsquared, 2))

# 12-Month 2-Month Strategy
comm_twelve_mom1_3factors = factors.iloc[534:, :]
x = sm.add_constant(comm_twelve_mom1_3factors[['Mkt-RF', 'SMB', 'HML']])
mod10 = sm.OLS(comm_twelve_mom_pf1[:-1], x, missing='drop').fit()
print("For 12-month, 2-month strategy: 3-factor alpha = ", round(mod10.params[0], 2), "; 3-factor R-squared = ", round(mod10.rsquared, 2))

For 1-month, 1-month strategy: 3-factor alpha =  0.02 ; 3-factor R-squared =  0.01
For 12-month, 1-month strategy: 3-factor alpha =  0.01 ; 3-factor R-squared =  0.0
For 12-month, 2-month strategy: 3-factor alpha =  0.01 ; 3-factor R-squared =  0.0


In [16]:
comm_mom_4factors = factors.iloc[523:, :]
x = sm.add_constant(comm_mom_4factors[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod11 = sm.OLS(comm_mom_pf[:-1], x).fit()
print("For 1-month 1-month strategy: 4-factor alpha = ", round(mod11.params[0], 2), "; 4-factor R-squared = ", round(mod11.rsquared, 2))

# 12-Month 1-Month Strategy
comm_twelve_mom_4factors = factors.iloc[534:, :]
x = sm.add_constant(comm_twelve_mom_4factors[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod12 = sm.OLS(comm_twelve_mom_pf[:-1], x, missing='drop').fit()
print("For 12-month, 1-month strategy: 4-factor alpha = ", round(mod12.params[0], 2), "; 4-factor R-squared = ", round(mod12.rsquared, 2))

# 12-Month 2-Month Strategy
comm_twelve_mom1_4factors = factors.iloc[534:, :]
x = sm.add_constant(comm_twelve_mom1_4factors[['Mkt-RF', 'SMB', 'HML', 'UMD']])
mod13 = sm.OLS(comm_twelve_mom_pf1[:-1], x, missing='drop').fit()
print("For 12-month, 2-month strategy: 4-factor alpha = ", round(mod13.params[0], 2), "; 4-factor R-squared = ", round(mod13.rsquared, 2))

For 1-month 1-month strategy: 4-factor alpha =  0.02 ; 4-factor R-squared =  0.01
For 12-month, 1-month strategy: 4-factor alpha =  0.01 ; 4-factor R-squared =  0.06
For 12-month, 2-month strategy: 4-factor alpha =  0.01 ; 4-factor R-squared =  0.04


As with the industry strategies under the three-factor model, neither the three-factor nor the four-factor model prices the commodity momentum portfolios: the alphas (0.01 to 0.02) are about the same as the raw mean returns. The four-factor R-squared is slightly higher for the 12-month strategies (0.06 and 0.04) but still extremely low.

## iii.	Compute the correlation between each of the momentum returns above and the corresponding industry momentum returns (e.g., correlation of 1,1 IND MOM with 1,1 COM MOM, ..., correlation of 12,1 IND MOM with 12,1 COM MOM).

In [17]:
#len(mom_pf),len(twelve_mom_pf), len(twelve_mom_pf1), len(comm_mom_pf),len(comm_twelve_mom_pf), len(comm_twelve_mom_pf1)

In [18]:
import numpy.ma as ma

print("The correlation between the 1-month, 1-month Industries Momentum and 1-month, 1-month Commodities Momentum is:", round(np.corrcoef(mom_pf[521:] , comm_mom_pf[0:547])[0, 1], 2))
# Masked arrays let corrcoef ignore missing values:
# https://stackoverflow.com/questions/31619578/numpy-corrcoef-compute-correlation-matrix-while-ignoring-missing-data
a = ma.corrcoef(ma.masked_invalid(twelve_mom_pf[517:1069]) , ma.masked_invalid(comm_twelve_mom_pf)) # NaNs masked; about -0.047
b = ma.corrcoef(ma.masked_invalid(twelve_mom_pf1[517:1069]) , ma.masked_invalid(comm_twelve_mom_pf1)) # about -0.101
print("The correlation between the 12-month, 1-month Industries Momentum and 12-month, 1-month Commodities Momentum is:", round(a[0,1], 2))
print("The correlation between the 12-month, 2-month Industries Momentum and 12-month, 2-month Commodities Momentum is:", round(b[0,1], 2))

The correlation between the 1-month, 1-month Industries Momentum and 1-month, 1-month Commodities Momentum is: -0.04
The correlation between the 12-month, 1-month Industries Momentum and 12-month, 1-month Commodities Momentum is: -0.05
The correlation between the 12-month, 2-month Industries Momentum and 12-month, 2-month Commodities Momentum is: -0.1
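
The slices above line up the two series by counting rows. An alternative is to align on the dates themselves, which guards against off-by-one errors when the samples start in different months. The following is a minimal sketch: `aligned_corr` is a hypothetical helper, and the dates are assumed to be available as unique integer `YYYYMM` keys for each return series.

```python
import numpy as np

def aligned_corr(dates_a, x_a, dates_b, x_b):
    """Correlation of two series over their common dates, dropping pairs with NaN."""
    _, ia, ib = np.intersect1d(dates_a, dates_b, return_indices=True)
    a = np.asarray(x_a, dtype=float)[ia]
    b = np.asarray(x_b, dtype=float)[ib]
    keep = np.isfinite(a) & np.isfinite(b)
    return np.corrcoef(a[keep], b[keep])[0, 1], int(keep.sum())

# Tiny example: overlap is 200003..200006, with one missing value in the second series
dates_a = np.array([200001, 200002, 200003, 200004, 200005, 200006])
x_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
dates_b = np.array([200003, 200004, 200005, 200006, 200007, 200008])
x_b = np.array([2.0, 4.0, np.nan, 8.0, 1.0, 1.0])
corr, n_used = aligned_corr(dates_a, x_a, dates_b, x_b)
```

Returning the number of observations alongside the correlation makes it easy to confirm that the overlap has the expected length.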


## iv.	Add the 12-month, 1-month COM momentum portfolio (not skipping a month) to the FF 3-factor model and see if this augmented factor model can explain the 12-month, 1-month industry momentum returns (not skipping a month).

In [19]:
a = twelve_mom_4factors[['Mkt-RF', 'SMB', 'HML']][522:].copy()  # copy before adding a column
a['comm mom'] = comm_twelve_mom_pf[:-5]
x = sm.add_constant(a)
mod14 = sm.OLS(twelve_mom_pf[522:], x, missing='drop').fit()
print("With 12-month, 1-month COM momentum portfolio added to the 3-factor model to try explain 12-month, 1-month IND returns: alpha =", round(mod14.params[0], 2), "; 4-factor R-squared =", round(mod14.rsquared, 2))

With 12-month, 1-month COM momentum portfolio added to the 3-factor model to try explain 12-month, 1-month IND returns: alpha = 0.75 ; 4-factor R-squared = 0.07


## v.	Now do the reverse of iv. by adding the 12-month, 1-month industry momentum portfolio to the FF 3-factor model and see if this augmented factor model can explain the 12-month, 1-month COM momentum returns.

In [20]:
b = factors.iloc[534:-4, 0:3].copy()  # first three factor columns; copy before adding a column
b['ind mom'] = twelve_mom_pf[522:]
x = sm.add_constant(b)
mod15 = sm.OLS(comm_twelve_mom_pf[:-5], x, missing='drop').fit()
print("With 12-month, 1-month IND momentum portfolio added to the 3-factor model to try explain 12-month, 1-month COM returns: alpha =", round(mod15.params[0], 2), "; 4-factor R-squared =", round(mod15.rsquared, 2))

With 12-month, 1-month IND momentum portfolio added to the 3-factor model to try explain 12-month, 1-month COM returns: alpha = 0.01 ; 4-factor R-squared = 0.04


## vi.	Do the results in iii., iv., and v. make sense?  Do they surprise you?  What story could you give these findings?

The results in iii., iv., and v. are consistent with one another. Part iii. shows that the correlations between industry and commodity momentum returns are close to zero and slightly negative (-0.04 to -0.10), so adding either momentum portfolio to the other's three-factor model does little to explain its returns: the alphas stay at 0.75 and 0.01, and the R-squared values at 0.07 and 0.04.

One possible story is that which industries or commodities win in a given period, and by how much, is largely unrelated across the two asset classes, so the two momentum strategies have a very low correlation with each other.

## h)	Using the past-return sorted portfolios in the fourth spreadsheet, conduct GRS tests using the Fama and French 4-factor model consisting of RMRF, SMB, HML, and UMD for the following:
i.	The 25 short-term 1-month past return sorted portfolios.
ii.	The 25 intermediate-term 12-month past return sorted portfolios.
iii.	The 25 long-term 60-month past return sorted portfolios.

Which sets of 25 portfolios are explained by the model and which aren’t?  Which set of portfolios does the model have the most difficult time explaining?  Which results are the most surprising?

After all of this, what have you learned about the cross-section of returns?  What have you learned about momentum strategies in general?


In [21]:
factors_new = factors[:-4]

def GRS_test(factors_new = factors_new,y = p1_return): 
  y = y[5:].subtract(factors_new['RF'],axis = 0)
  y = y[0:1069]
  x = sm.add_constant(factors_new[['Mkt-RF', 'SMB', 'HML','UMD']])

  alpha = []
  beta = []
  eps = []

  for pf in y.columns:
      y_in = y[pf]
      mod = sm.OLS(y_in,x,missing = 'drop').fit()
      alpha.append(mod.params[0])
      beta.append(mod.params[1])
      eps.append(mod.resid)

  T = y.shape[0]
  N = y.shape[1]

  eps_df = pd.DataFrame(eps).T
  var_cov = eps_df.cov() # (25,25) residual covariance matrix

  # create Rm column in rm_rf
  factors_new['RM'] = factors_new['Mkt-RF'] + factors_new['RF']

  F = ((T-N-1)/N)* ((alpha @ np.linalg.inv(var_cov) @ np.transpose(alpha)) / (1+(factors_new['RM'][0:1069].mean()/factors_new['RM'][0:1069].std())**2))
  p_value = f.sf(F, N, T-N-1)
  # print("F statistic: ", F, "p value: ", p_value)

  # coeff_summary = pd.DataFrame({'industry':y.columns, 'alpha': alpha, 'beta': beta})
  return F, p_value

In [22]:
GRS_test(factors_new = factors_new,y = p1_return)


(10.006469439905358, 2.074241872862297e-34)

In [23]:
GRS_test(factors_new = factors_new,y = p212_return)


(2.4631695829028146, 8.961085862687106e-05)

In [24]:
GRS_test(factors_new = factors[54:-4] ,y = p1360_return.iloc[59:,:])


(2.5028615016346474, 6.74703736220244e-05)

The p-values are extremely low for all three sets of portfolios, so the GRS test rejects the null hypothesis that the 25 alphas are jointly zero in every case. In other words, the four-factor model does not fully explain any of the three sets.

The rejection is strongest for the 1-month past-return portfolios (F = 10.01, p on the order of 1e-34), so these are the portfolios the model has the most difficulty explaining. The 12-month (F = 2.46) and 60-month (F = 2.50) portfolios are rejected less decisively, with p-values on the order of 1e-4, but they are still rejected at any conventional significance level.

The most surprising result is how strongly the model is rejected for the 1-month portfolios. Sorting on a single month of past returns would seem to involve a lot of randomness, yet it produces the strongest rejection of the three.

Taken together, these cross-sectional results indicate that significant $\alpha$ remains across momentum portfolios at every horizon: sorts on short-term, intermediate-term, and long-term past returns all produce returns that the four-factor model cannot fully explain.
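
For a model with K factors, the textbook GRS statistic scales the alpha quadratic form by one plus the squared maximum Sharpe ratio of the factors, and it has N and T - N - K degrees of freedom. The following is a minimal sketch of that multi-factor form, using maximum-likelihood (divide by T) covariance estimates. It is an illustration under those textbook definitions, not a reproduction of the function above, and `grs_statistic` is a hypothetical name.

```python
import numpy as np

def grs_statistic(excess_returns, factors):
    """GRS F-statistic and its degrees of freedom for a K-factor model.

    excess_returns: (T, N) test-asset excess returns; factors: (T, K) factor returns.
    """
    R = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    T, N = R.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    alpha = coef[0]                                   # (N,) intercepts
    resid = R - X @ coef
    sigma = resid.T @ resid / T                       # residual covariance (ML estimate)
    mu = F.mean(axis=0)
    omega = np.atleast_2d(np.cov(F, rowvar=False, bias=True))   # factor covariance (ML estimate)
    quad_alpha = alpha @ np.linalg.solve(sigma, alpha)
    quad_mu = mu @ np.linalg.solve(omega, mu)         # squared max Sharpe ratio of the factors
    stat = (T - N - K) / N * quad_alpha / (1.0 + quad_mu)
    return stat, (N, T - N - K)

# Synthetic data only: the same assets with zero alphas and with alphas of 1.0
rng = np.random.default_rng(2)
T, N, K = 600, 5, 4
fac = rng.normal(0.5, 4.0, size=(T, K))
loadings = rng.normal(1.0, 0.3, size=(K, N))
noise = rng.normal(size=(T, N))
stat_null, dof = grs_statistic(fac @ loadings + noise, fac)
stat_alt, _ = grs_statistic(1.0 + fac @ loadings + noise, fac)
```

A large statistic (small p-value) rejects the hypothesis that all N alphas are zero, which means the factor model fails to price the test portfolios. The p-value can be obtained with `scipy.stats.f.sf(stat, *dof)`.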