# Assignment 5

1. Please type your solutions in the code cells below
    - Do not add more code cells (the solution to each problem should be in a single code cell)
2. Once you are done, restart the kernel and rerun all the code (Kernel -> Restart & Run All) 
3. Save the resulting notebook and upload it on D2L as the solution to your assignment

In [1]:
# Import all needed packages here
import pandas as pd
import numpy as np
import yfinance as yf
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.stattools as st
from linearmodels import PanelOLS

# Display the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Problem 1 (100 points)

In this assignment, you will backtest the "momentum" strategy, which involves going long on the stocks with the highest returns in the past 12 months (the 'ret11' variable introduced below) and shorting the stocks with the lowest returns over the same period. Please follow the steps below to implement this backtest:

- **Data** (20 points)
    - Load the "clean_crsp.pkl.zip" file and keep only the 'ret', 'mktcap', and <font color='red'>'dtdate'</font> variables
    - Create a new variable called 'ret11' which, for each firm, each month, equals the compounded returns of the firm in the past 12 months, excluding the current month (e.g. for December 2010, 'ret11' equals the compounded returns from Jan 2010 to Nov 2010)
    - <font color='red'>Create a new variable 'mktcap_lag1' that, for each firm, each month, equals the market capitalization ('mktcap') of that firm in the previous month. You will need this variable to calculate value-weighted portfolio returns.</font>
    - <font color='red'>After you create the 'ret11' and 'mktcap_lag1' variables, keep only the observations after 1970 (i.e. drop all data from 1970 and earlier)</font>
    
    
- **Momentum portfolios** (40 points)
    - Every month, form 10 momentum portfolios, based on the decile the firm's 'ret11' falls into that month (portfolio 1 corresponds to decile 1, ..., portfolio 10 corresponds to decile 10).
        - Portfolios are rebalanced every month
    - Calculate the monthly equal weighted (EW) and value weighted (VW) returns of each of the 10 portfolios
        - Note that, because portfolios are rebalanced every month, the return of a portfolio (decile) in a given month is based on which firms were in that 'ret11' decile in the *previous* month
    - Calculate the returns (EW as well as VW) to the momentum strategy by subtracting returns of portfolio 1 from the returns of portfolio 10. This will be referred to as the "spread portfolio" below.
    
    
- **Analysis** (40 points)
    - Calculate and print the average EW returns for the 10 momentum portfolios as well as the spread portfolio.
        - This should be a 1-by-11 table
    - Calculate and print the average VW returns for the 10 momentum portfolios as well as the spread portfolio.
        - This should be a 1-by-11 table
    - Calculate risk-adjusted average EW returns (with respect to the Fama-French three-factor model) for the 10 momentum portfolios as well as the spread portfolio
        - This should be a 1-by-11 table
        - The numbers in this table are "alpha" coefficients from regressing each portfolio's EW returns on the Fama-French three factors
    - Calculate risk-adjusted average VW returns (with respect to the Fama-French three-factor model) for the 10 momentum portfolios as well as the spread portfolio
        - This should be a 1-by-11 table
        - The numbers in this table are "alpha" coefficients from regressing each portfolio's VW returns on the Fama-French three factors
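
As a minimal sketch of the 'ret11' definition in the Data step, the toy example below compounds lagged returns for a single hypothetical firm with 13 consecutive months of 1% returns. The `toy` frame and its values are made up for illustration. The sketch assumes there are no gaps between months and expresses the result as a gross return (the product of 1 + ret), which is one possible convention rather than a requirement.

```python
import pandas as pd

# Hypothetical toy panel: one firm, 13 consecutive months, 1% return each month
toy = pd.DataFrame({
    "permno": [1] * 13,
    "ret": [0.01] * 13,
})

# Gross compounded return over months t-11 .. t-1 (the current month is excluded)
toy["ret11"] = 1.0
for lag in range(1, 12):
    lagged_ret = toy.groupby("permno")["ret"].shift(lag)
    toy["ret11"] = toy["ret11"] * (1 + lagged_ret)

# The first 11 rows are NaN because fewer than 11 lagged returns exist
print(toy["ret11"])
```

Because the deciles depend only on the ranking of 'ret11', using the gross return or the net return (gross minus 1) leads to the same portfolios.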


In [2]:
# Load data and clean it
crsp = pd.read_pickle('./clean_crsp.pkl.zip')
crsp = crsp[['ret','mktcap','dtdate']].copy()

#crsp.sort_index(inplace = True)

# Create month variable
crsp['month'] = crsp['dtdate'].dt.year*12 + crsp['dtdate'].dt.month

# Calculate ret11
crsp['ret11'] = 1
for t in range(1,12):
    crsp['ret11'] = crsp['ret11'] * (1 + crsp.groupby('permno')['ret'].shift(t))
    crsp.loc[ crsp.groupby('permno')['month'].diff(t)!=t ,'ret11'] = np.nan

# Calculate lagged market cap
crsp['mktcap_lag1'] = crsp.groupby('permno')['mktcap'].shift()
crsp.loc[ crsp.groupby('permno')['month'].diff(1)!=1 , 'mktcap_lag1'] = np.nan

# Keep only observations after 1970 (ret11 and mktcap_lag1 were computed on the full sample first)
crsp = crsp.loc[crsp['dtdate'].dt.year>1970,:].copy()

# Create ret11 deciles
crsp['ret11_decile'] = crsp.groupby('mdate')['ret11'].transform(lambda x: pd.qcut(x, q = 10, labels = range(1,11)))

# Create portf_nr as the ret11 decile from last month
crsp['portf_nr'] = crsp.groupby('permno')['ret11_decile'].shift()
crsp.loc[crsp.groupby('permno')['month'].diff()!=1,  'portf_nr'] = np.nan

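# Intended to drop obs with unknown portf_nr. Note that unknown values are NaN (never 0),
# so this filter keeps every row; the groupby calls below skip NaN keys by default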
crsp = crsp.loc[crsp['portf_nr']!=0, :].copy()
crsp['portf_nr'].value_counts()

5.0     266210
4.0     266198
3.0     266016
6.0     265976
7.0     265842
2.0     265536
8.0     265371
9.0     264682
10.0    264021
1.0     261713
Name: portf_nr, dtype: int64
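
The decile step above can be illustrated on a small synthetic panel. This is only a sketch: the `toy` frame, its 20 hypothetical firms, and the random 'ret11' values are assumptions made for the example, and it uses `labels=False` (integer codes) instead of the categorical labels used in the cell above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy panel: 20 firms observed in 3 consecutive months,
# sorted by firm and then by month
toy = pd.DataFrame({
    "permno": np.repeat(np.arange(20), 3),
    "month": np.tile([1, 2, 3], 20),
})
toy["ret11"] = rng.normal(1.0, 0.2, size=len(toy))

# Rank firms into deciles within each month (1 = lowest, 10 = highest)
toy["decile"] = toy.groupby("month")["ret11"].transform(
    lambda x: pd.qcut(x, q=10, labels=False) + 1
)

# Portfolio membership in a month is the firm's decile in the previous month
toy["portf_nr"] = toy.groupby("permno")["decile"].shift()

print(toy.groupby(["month", "decile"]).size().unstack())
```

With 20 firms per month, each decile holds two firms, and 'portf_nr' is missing in the first month because no earlier decile exists.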

In [22]:
crsp

permno,mdate,ret,mktcap,dtdate,month,ret11,mktcap_lag1,ret11_decile,portf_nr,ret_x_size
10000,1986-02,-0.257143,1.196000e+04,1986-02-28,23834,,,,,
10000,1986-03,0.365385,1.633000e+04,1986-03-31,23835,,1.196000e+04,,,4.370005e+03
10000,1986-04,-0.098592,1.517200e+04,1986-04-30,23836,,1.633000e+04,,,-1.610007e+03
10000,1986-05,-0.222656,1.179388e+04,1986-05-30,23837,,1.517200e+04,,,-3.378137e+03
10000,1986-06,-0.005025,1.173459e+04,1986-06-30,23838,,1.179388e+04,,,-5.926424e+01
...,...,...,...,...,...,...,...,...,...,...
93436,2019-08,-0.066222,4.041284e+07,2019-08-30,24236,0.800935,4.327887e+07,4.0,3.0,-2.866014e+06
93436,2019-09,0.067639,4.335660e+07,2019-09-30,24237,0.852099,4.041284e+07,5.0,4.0,2.733484e+06
93436,2019-10,0.307427,5.676276e+07,2019-10-31,24238,0.714071,4.335660e+07,3.0,5.0,1.332899e+07
93436,2019-11,0.047695,5.947004e+07,2019-11-29,24239,0.898541,5.676276e+07,4.0,3.0,2.707300e+06


### 3.1. Equal-weighted returns

In [11]:
# Equal-weighted portfolio returns each month
ew_ret_monthly = crsp.groupby(['mdate', 'portf_nr'])['ret'].mean()

# Reshape to have returns of each portfolio side by side
ew_ret = ew_ret_monthly.unstack(level = 'portf_nr')

# Create new column that stores the returns of the "spread" portfolio
ew_ret['Spread'] = ew_ret[10] - ew_ret[1]
ew_ret

# Save the data for later use
ew_ret.to_pickle('./assignment5_ew_returns.pkl')

mdate,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,Spread
1971-02,0.102605,0.071062,0.049832,0.041638,0.033034,0.043105,0.032590,0.024926,0.021716,0.047060,-0.055545
1971-03,0.062316,0.043898,0.061652,0.058216,0.069120,0.049439,0.049004,0.051173,0.050588,0.068539,0.006223
1971-04,0.013063,0.024422,0.019538,0.033211,0.036947,0.025829,0.018798,0.031667,0.041232,0.027346,0.014282
1971-05,-0.073541,-0.059478,-0.047906,-0.053817,-0.050778,-0.044537,-0.045598,-0.053042,-0.044615,-0.033982,0.039560
1971-06,-0.054868,-0.034800,-0.024447,-0.019766,-0.016799,-0.015660,-0.016928,-0.014293,0.005182,0.011862,0.066730
...,...,...,...,...,...,...,...,...,...,...,...
2019-08,-0.077588,-0.062789,-0.084102,-0.079861,-0.051361,-0.054194,-0.042419,-0.022288,-0.026877,-0.028917,0.048671
2019-09,0.034890,0.042863,0.042723,0.037264,0.041226,0.043098,0.029307,0.021053,0.000802,-0.037450,-0.072340
2019-10,-0.055666,-0.009457,-0.001955,-0.002158,0.012009,0.006947,0.030364,0.010666,0.015954,0.015206,0.070871
2019-11,0.031109,0.021777,0.048570,0.031661,0.037947,0.039182,0.035976,0.029790,0.039486,0.053221,0.022112


### 3.2. Value-weighted returns

In [12]:
# Calculate returns times lagged market cap and sum it up for each portfolio, each month
crsp['ret_x_size'] = crsp['ret'] * crsp['mktcap_lag1']
sum_ret_x_size = crsp.groupby(['mdate','portf_nr'])['ret_x_size'].sum()

# Calculate sum of lagged market cap for each portfolio each month
sum_size = crsp.groupby(['mdate','portf_nr'])['mktcap_lag1'].sum()

# Calculate monthly VW returns
vw_ret_monthly = sum_ret_x_size / sum_size

# Reshape to have returns of each portfolio side by side
vw_ret = vw_ret_monthly.unstack(level = 'portf_nr')

# Create new column that stores the returns of the "spread" portfolio
vw_ret['Spread'] = vw_ret[10] - vw_ret[1]
vw_ret

# Save the data for later use
vw_ret.to_pickle('./assignment5_vw_returns.pkl')

mdate,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,Spread
1971-02,0.054844,0.045158,-0.002704,0.016658,0.006265,0.022770,-0.000143,0.014040,0.009292,0.040026,-0.014818
1971-03,0.087178,0.075216,0.063347,0.049451,0.062312,0.024357,0.038106,0.035950,0.032433,0.051803,-0.035375
1971-04,0.034679,0.023381,0.034183,0.042612,0.005257,0.028656,0.047063,0.047076,0.052585,0.049199,0.014519
1971-05,-0.062351,-0.042848,-0.032517,-0.031020,-0.050665,-0.025406,-0.034474,-0.047055,-0.023979,-0.027585,0.034766
1971-06,-0.047120,0.005874,-0.004075,-0.002064,0.006986,-0.002055,0.000460,-0.003938,0.026086,0.033376,0.080496
...,...,...,...,...,...,...,...,...,...,...,...
2019-08,-0.102602,-0.113539,-0.094929,-0.071649,-0.044886,-0.055158,-0.030312,-0.014248,0.004999,0.001670,0.104272
2019-09,0.056435,0.060578,0.051298,0.057880,0.038505,0.020244,0.030650,0.028094,0.007241,-0.030108,-0.086543
2019-10,-0.065123,-0.001670,0.054835,0.035638,0.036230,0.037595,0.034534,0.019331,0.012943,-0.005957,0.059166
2019-11,0.070698,0.053449,0.049761,0.073066,0.056058,0.052612,0.035344,0.032173,0.026700,0.036105,-0.034593
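
To make the value-weighting formula above concrete, here is a small sketch with three hypothetical stocks in one portfolio in one month. The numbers are made up; the point is only that the sum of return times lagged market cap, divided by the sum of lagged market cap, is a weighted average of returns.

```python
import numpy as np
import pandas as pd

# Hypothetical portfolio: three stocks in the same month
toy = pd.DataFrame({
    "ret": [0.10, -0.02, 0.04],
    "mktcap_lag1": [100.0, 300.0, 600.0],
})

# Value-weighted return: weights are last month's market caps
vw = (toy["ret"] * toy["mktcap_lag1"]).sum() / toy["mktcap_lag1"].sum()

# Equal-weighted return for comparison
ew = toy["ret"].mean()

# Same value-weighted number via numpy's weighted average
vw_check = np.average(toy["ret"], weights=toy["mktcap_lag1"])

print(f"EW: {ew:.4f}, VW: {vw:.4f}, VW check: {vw_check:.4f}")
```

The large stock with a moderate return dominates the VW figure, which is why EW and VW results can differ noticeably when small stocks drive the extreme deciles.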


## 1. Raw returns performance

### 1.1. Equal-weighted (EW) portfolios

In [13]:
# Calculate average EW returns
ew_means = ew_ret.mean()
ew_means

portf_nr
1.0       0.010037
2.0       0.007960
3.0       0.009098
4.0       0.010598
5.0       0.010967
6.0       0.012054
7.0       0.013329
8.0       0.014119
9.0       0.015357
10.0      0.016596
Spread    0.006559
dtype: float64

### 1.2. Value-weighted (VW) portfolios

In [14]:
# Calculate average VW returns
vw_means = vw_ret.mean()
vw_means

portf_nr
1.0      -0.001165
2.0       0.003739
3.0       0.005717
4.0       0.008619
5.0       0.008841
6.0       0.008817
7.0       0.010250
8.0       0.011248
9.0       0.011706
10.0      0.015089
Spread    0.016254
dtype: float64
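
The problem statement asks for 1-by-11 tables, while `.mean()` returns a Series that prints as a column. One possible way to reshape it, sketched here on a hypothetical two-portfolio frame (`toy_ret` and the row label "Average return" are made up for the example), is `to_frame().T`:

```python
import pandas as pd

# Hypothetical monthly returns for two portfolios over two months
toy_ret = pd.DataFrame({1: [0.01, 0.03], 10: [0.02, 0.06]})
toy_ret["Spread"] = toy_ret[10] - toy_ret[1]

# Column means as a one-row table (one column per portfolio)
table = toy_ret.mean().to_frame(name="Average return").T
print(table)
```

When no months are missing, the average of the spread column equals the difference between the averages of portfolios 10 and 1, which is a quick sanity check on the tables above.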

## 2. Risk-adjusted performance

In [15]:
# Load data on Fama-French factors
ff3f = pdr.DataReader('F-F_Research_Data_Factors', 'famafrench', '1971-01-01')[0]/100
ff3f.index.rename('mdate', inplace = True)
ff3f

mdate,Mkt-RF,SMB,HML,RF
1971-01,0.0484,0.0737,0.0133,0.0038
1971-02,0.0141,0.0186,-0.0122,0.0033
1971-03,0.0413,0.0250,-0.0406,0.0030
1971-04,0.0315,-0.0050,0.0071,0.0028
1971-05,-0.0398,-0.0111,-0.0148,0.0029
...,...,...,...,...
2020-11,0.1247,0.0548,0.0211,0.0001
2020-12,0.0463,0.0481,-0.0136,0.0001
2021-01,-0.0003,0.0719,0.0285,0.0000
2021-02,0.0278,0.0211,0.0708,0.0000


### 2.1. Equal-weighted portfolios

In [20]:
# Merge EW monthly portfolio returns with the risk factors
alldata = ew_ret.join(ff3f)

# Cycle through all portfolios and regress excess returns on risk factors
# First, create empty tables to store portfolio coefficients and their t-stats
ew_portf_coeff = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = ew_ret.columns)
ew_portf_tstats = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = ew_ret.columns)

# Regressions for each portfolio
for p in ew_ret.columns:
    # Set up the data
    # Dependent variable is the excess return on the portfolio,
    # except for the spread portfolio (which is already an excess return)
    y = alldata[p] - alldata['RF']
    if p == 'Spread':
        y = alldata[p]

    # Independent variables are the risk factors
    X = sm.add_constant(alldata[['Mkt-RF','SMB','HML']])
    
    # Run the regression
    res = sm.OLS(y, X, missing='drop').fit()
    res_robust = res.get_robustcov_results(cov_type = 'HAC', maxlags = 12)
    
    # Store the results
    ew_portf_coeff.loc[:,p] = res_robust.params 
    ew_portf_tstats.loc[:,p] = res_robust.tvalues 

# Take a look at the results
print("\n Portfolio alphas and factor loadings:\n")
ew_portf_coeff


 Portfolio alphas and factor loadings:



portf_nr,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,Spread
const,-0.003653,-0.004513,-0.002807,-0.000831,8.3e-05,0.001284,0.002766,0.003763,0.004958,0.005924,0.009578
Mkt-RF,1.278544,1.106739,1.037489,0.97245,0.906866,0.898976,0.880678,0.883016,0.913639,1.033239,-0.245305
SMB,1.432141,1.04906,0.852414,0.736663,0.665947,0.627604,0.658823,0.696546,0.825501,1.078132,-0.354009
HML,0.287343,0.369325,0.39378,0.407653,0.379455,0.371788,0.323478,0.231697,0.131093,-0.118158,-0.405501
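
The `const` row above is the alpha: the intercept from regressing excess returns on the three factors. The sketch below illustrates this on synthetic data with a known alpha. It deliberately swaps in plain least squares via `numpy.linalg.lstsq` instead of `sm.OLS`; the point estimates are the same as OLS, and the HAC adjustment used in the cell above changes only the standard errors and t-stats, not the coefficients. All names and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 600  # roughly 50 years of monthly observations

# Hypothetical stand-ins for the Mkt-RF, SMB and HML factor returns
factors = rng.normal(0.0, 0.04, size=(n, 3))

# Simulate excess returns with a known alpha and known factor loadings
true_alpha = 0.005
true_betas = np.array([1.0, 0.5, -0.3])
excess_ret = true_alpha + factors @ true_betas + rng.normal(0.0, 0.01, size=n)

# Add a column of ones so that the first coefficient is the intercept (alpha)
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print("estimated alpha:", alpha_hat)
print("estimated loadings:", betas_hat)
```

The estimates should land close to the simulated alpha and loadings, which shows that a portfolio's alpha is the part of its average excess return not explained by its factor exposures.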


In [21]:
y


mdate
1971-02   -0.055545
1971-03    0.006223
1971-04    0.014282
1971-05    0.039560
1971-06    0.066730
             ...   
2019-08    0.048671
2019-09   -0.072340
2019-10    0.070871
2019-11    0.022112
2019-12   -0.088021
Freq: M, Name: Spread, Length: 587, dtype: float64

### 2.2. Value-weighted portfolios

In [17]:
# Merge VW monthly portfolio returns with the risk factors
alldata = vw_ret.join(ff3f)

# Cycle through all portfolios and regress excess returns on risk factors
# First, create empty tables to store portfolio coefficients and their t-stats
vw_portf_coeff = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = vw_ret.columns)
vw_portf_tstats = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = vw_ret.columns)


# Regressions for each portfolio
for p in vw_ret.columns:
    # Set up the data
    # Dependent variable is the excess return on the portfolio,
    # except for the spread portfolio (which is already an excess return)
    y = alldata[p] - alldata['RF']
    if p == 'Spread':
        y = alldata[p]

    # Independent variables are the risk factors
    X = sm.add_constant(alldata[['Mkt-RF','SMB','HML']])
    
    # Run the regression
    res = sm.OLS(y, X, missing='drop').fit()
    res_robust = res.get_robustcov_results(cov_type = 'HAC', maxlags = 12)
    
    # Store the results
    vw_portf_coeff.loc[:,p] = res_robust.params 
    vw_portf_tstats.loc[:,p] = res_robust.tvalues 

# Take a look at the results

print("\n Portfolio alphas and factor loadings:\n")
vw_portf_coeff


 Portfolio alphas and factor loadings:



portf_nr,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,Spread
const,-0.015329,-0.008964,-0.005833,-0.002024,-0.001334,-0.000964,0.000787,0.001975,0.002608,0.005732,0.021061
Mkt-RF,1.562577,1.351324,1.198411,1.07941,1.028419,0.971549,0.962113,0.96313,0.962826,1.0599,-0.502677
SMB,0.645622,0.322295,0.12871,0.009144,-0.016549,-0.054488,-0.075153,-0.059956,0.118047,0.513474,-0.132148
HML,0.220316,0.271532,0.258783,0.232083,0.182471,0.175623,0.093181,0.019307,-0.115671,-0.385643,-0.605959
