In [1]:
import pandas as pd
import numpy as np
import pandas_datareader as pdr
import statsmodels.api as sm

## Problem 1 (100 points)

In this assignment, you will backtest the "momentum" strategy which involves going long on stocks with the highest returns in the past 12 months and shorting the stocks with the lowest returns in the past 12 months, and rebalancing these portfolios every month. Please follow the steps below to implement this backtest:

- **Data** (10 x 2 = 20 points)
    1. Load the "crspm.zip" file into a dataframe called ``crsp``. Create a new variable called ``mdate`` which converts the ``date`` variable into a monthly period date. Calculate market capitalization (``mktcap``) as number of shares outstanding (``shrout``) times the absolute value of price (``prc``). Create a new variable ``mktcap_lag1`` that, for each firm, each month, equals the market capitalization (``mktcap``) of that firm in the previous month (assume no duplicates or gaps in the data, i.e. use the ``shift()`` method for lagging). Keep only ``permno``, ``mdate``, ``ret``, ``mktcap_lag1``, and drop all rows that have any missing values in any of those variables. Print the first five and last five rows of the resulting dataset.
    2. Create a new variable called ``ret11`` which, for each firm, each month, equals the **net** compounded returns of the firm in the past 12 months, excluding the current month (e.g. for December 2010, ``ret11`` equals the compounded returns from Jan 2010 to Nov 2010). Print a table that gives us the mean and standard deviation of ``ret`` and ``ret11``.
    
- **Momentum portfolios** (10 x 4 = 40 points)

    3. Drop all rows for which ``ret11`` is missing. Then, create a variable called ``ret11_decile`` which, every month, tells us how each firm ranks (which decile) amongst all other firms with respect to their past 11-month returns (``ret11``). Firms in decile 1 will form portfolio 1, firms in decile 2 form portfolio 2, etc. Print out a table that tells us how many observations you have for each decile in your dataset.
    4. These portfolios are held for one month, and then they are re-created again (rebalanced) based on the new values of ``ret11`` for the current month. For this reason, the return of a portfolio (decile) in a given month is based on which firms were in that decile in the *previous* month. To help with this (for steps 5 and 6 below), create a new variable called ``portf_nr`` which equals the ``ret11_decile`` that the firm was in the previous month. Print out a table that tells us how many observations you have for each ``portf_nr`` in your dataset. 
    5. Calculate the monthly equal-weighted (EW) returns of each of the 10 portfolios. Calculate the EW returns to the momentum strategy by subtracting returns of portfolio 1 from the returns of portfolio 10. This will be referred to as the "spread portfolio" below. Store these returns in a new dataframe called ``ew_ret`` and print the first 5 and last 5 rows of this dataframe.
    6. Repeat step 5 using value-weighted (VW) returns. Use ``mktcap_lag1`` (i.e. the market capitalization at the end of the prior month) as weights.  Store these returns in a new dataframe called ``vw_ret`` and print the first 5 and last 5 rows of this dataframe.  
    
- **Analysis** (10 x 4 = 40 points)

    7. Calculate and print the average EW returns for the 10 momentum portfolios as well as the spread portfolio.
        - This should be an 11-by-1 table containing the EW averages.
    8. Calculate and print the average VW returns for the 10 momentum portfolios as well as the spread portfolio.
        - This should be an 11-by-1 table containing the VW averages.
    9. Calculate and print risk-adjusted average EW returns (with respect to the Fama-French three-factor model) and their t-statistics for the 10 momentum portfolios as well as the spread portfolio.
        - This should be an 11-by-2 table.
        - The numbers in this table are "alpha" coefficients from regressing each portfolio's EW returns on the Fama-French three factors (in the first column) and the t-statistics of these coefficients (in the second column).
    10. Calculate and print risk-adjusted average VW returns (with respect to the Fama-French three-factor model) and their t-statistics for the 10 momentum portfolios as well as the spread portfolio.
        - This should be an 11-by-2 table.
        - The numbers in this table are "alpha" coefficients from regressing each portfolio's VW returns on the Fama-French three factors (in the first column) and the t-statistics of these coefficients (in the second column).
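
For reference, the quantities defined in steps 2 and 5 to 10 can be written as below. The notation is ours, not the assignment's: $r_{i,t}$ is firm $i$'s return (``ret``) in month $t$, $ME_{i,t-1}$ is ``mktcap_lag1``, and $P_{p,t}$ is the set of firms whose ``portf_nr`` equals $p$ in month $t$ (i.e. firms that were in decile $p$ in month $t-1$).

$$
\text{ret11}_{i,t} = \prod_{k=1}^{11} \left(1 + r_{i,t-k}\right) - 1
$$

$$
r^{EW}_{p,t} = \frac{1}{|P_{p,t}|} \sum_{i \in P_{p,t}} r_{i,t},
\qquad
r^{VW}_{p,t} = \frac{\sum_{i \in P_{p,t}} ME_{i,t-1}\, r_{i,t}}{\sum_{i \in P_{p,t}} ME_{i,t-1}},
\qquad
r^{\text{Spread}}_{t} = r_{10,t} - r_{1,t}
$$

$$
r_{p,t} - RF_t = \alpha_p + \beta_p\, (\text{Mkt-RF})_t + s_p\, \text{SMB}_t + h_p\, \text{HML}_t + \varepsilon_{p,t}
$$

The risk-adjusted average return in steps 9 and 10 is the intercept $\alpha_p$. Because the spread portfolio is long-short, its return enters the regression without subtracting $RF_t$.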

In [2]:
# 1
print("Output for part 1:")

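# Load monthly CRSP data and convert dates to monthly periods
crsp = pd.read_pickle('../../lectures/data/crspm.zip')
crsp['mdate'] = pd.to_datetime(crsp['date']).dt.to_period('M')

# Market cap = shares outstanding x |price| (CRSP reports bid/ask averages as negative prices)
crsp['mktcap'] = crsp['shrout'] * crsp['prc'].abs()

# Lag market cap within each firm (sort by firm, then month, so shift(1) is the prior month)
crsp = crsp.sort_values(['permno','mdate'])
crsp['mktcap_lag1'] = crsp.groupby('permno')['mktcap'].shift(1)

# Keep the needed variables and drop rows with any missing values
crsp = crsp[['permno','mdate','ret','mktcap_lag1']].copy().dropna()
crsp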

Output for part 1:


,permno,mdate,ret,mktcap_lag1
1,10000.0,1986-02,-0.257143,1.610000e+04
2,10000.0,1986-03,0.365385,1.196000e+04
3,10000.0,1986-04,-0.098592,1.633000e+04
4,10000.0,1986-05,-0.222656,1.517200e+04
5,10000.0,1986-06,-0.005025,1.179386e+04
...,...,...,...,...
2553248,93436.0,2020-08,0.741452,2.666393e+08
2553249,93436.0,2020-09,-0.139087,4.643391e+08
2553250,93436.0,2020-10,-0.095499,4.067015e+08
2553251,93436.0,2020-11,0.462736,3.678235e+08
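
To make the within-firm lag concrete, here is a minimal sketch of the same steps on a made-up six-row panel (the firm IDs, prices and returns are hypothetical). Grouping by ``permno`` before ``shift(1)`` is what stops firm 2's first month from inheriting firm 1's last market cap; the negative prices mimic CRSP's convention of flagging a bid/ask average with a minus sign, which is why the absolute value is taken.

```python
import pandas as pd

# Hypothetical toy panel: two firms, three months each
df = pd.DataFrame({
    'permno': [1, 1, 1, 2, 2, 2],
    'date': ['2020-01-31', '2020-02-29', '2020-03-31'] * 2,
    'shrout': [10, 10, 12, 5, 5, 5],
    'prc': [2.0, -2.5, 3.0, 4.0, 4.0, -5.0],  # negative = bid/ask average (CRSP convention)
    'ret': [None, 0.25, 0.44, 0.1, 0.0, 0.25],
})

df['mdate'] = pd.to_datetime(df['date']).dt.to_period('M')
df['mktcap'] = df['shrout'] * df['prc'].abs()

# Sort, then lag within each firm so lags never cross firm boundaries
df = df.sort_values(['permno', 'mdate'])
df['mktcap_lag1'] = df.groupby('permno')['mktcap'].shift(1)

# Each firm's first month has no lagged market cap and is dropped
out = df[['permno', 'mdate', 'ret', 'mktcap_lag1']].dropna()
print(out)
```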


In [3]:
# 2
print("Output for part 2:")

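# Compound the 11 prior monthly returns (t-1, ..., t-11); the current month t is excluded
crsp['ret11'] = 1
for t in range(1,12):
    crsp['ret11'] = crsp['ret11'] * (1 + crsp.groupby('permno')['ret'].shift(t))

# Convert the gross compounded return to a net return
crsp['ret11'] = crsp['ret11'] - 1
crsp[['ret','ret11']].agg(['mean','std'])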

Output for part 2:


,ret,ret11
mean,0.011656,0.129892
std,0.191348,0.748724
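
The loop above compounds the 11 lagged returns by multiplying 11 shifted columns. An equivalent formulation, sketched below under the same no-gaps assumption, sums $\log(1 + r)$ over a rolling 11-month window that ends at $t-1$ and exponentiates the result. The panel here is random, hypothetical data; the block only checks that the two versions agree, including where they are missing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical panel: 2 firms x 30 consecutive months of random returns
df = pd.DataFrame({
    'permno': np.repeat([1, 2], 30),
    'ret': rng.normal(0.01, 0.05, 60),
})

# Loop version: product of (1 + ret) over lags 1..11, minus 1
loop = pd.Series(1.0, index=df.index)
for t in range(1, 12):
    loop = loop * (1 + df.groupby('permno')['ret'].shift(t))
loop = loop - 1

# Rolling version: sum of log(1 + ret) over the 11 months ending at t-1
logret = np.log1p(df['ret'])
roll = logret.groupby(df['permno']).transform(lambda s: s.shift(1).rolling(11).sum())
vec = np.expm1(roll)

print(np.allclose(loop, vec, equal_nan=True))
```

The rolling form avoids materializing 11 shifted columns, which may matter on a panel with millions of rows; it requires every return to be above -1 so that the logarithm is defined.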


In [4]:
# 3
print("Output for part 3:")

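# Keep only firm-months with a valid past-return signal
crsp = crsp.loc[crsp['ret11'].notnull(),:].copy()
# Within each month, sort firms into deciles of ret11 (1 = lowest, 10 = highest)
crsp['ret11_decile'] = crsp.groupby('mdate')['ret11'].transform(lambda x: pd.qcut(x, q = 10, labels = range(1,11)))
crsp['ret11_decile'].value_counts(sort=False)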

Output for part 3:


1     229084
2     228849
3     228808
4     228850
5     228905
6     228749
7     228803
8     228855
9     228802
10    229044
Name: ret11_decile, dtype: int64
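
A small sketch of the monthly decile sort on hypothetical data (two months, 50 random firms each). It swaps in `labels=False`, which makes `pd.qcut` return integer bin codes 0 to 9, and adds 1 to get deciles 1 to 10.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical cross-sections: 2 months x 50 firms
df = pd.DataFrame({
    'mdate': np.repeat(['2020-01', '2020-02'], 50),
    'ret11': rng.normal(0.1, 0.4, 100),
})

# Rank firms into deciles separately within each month
df['ret11_decile'] = (
    df.groupby('mdate')['ret11'].transform(lambda x: pd.qcut(x, q=10, labels=False)) + 1
)

# With 50 distinct values per month, each decile holds 5 firms
print(df.groupby('mdate')['ret11_decile'].value_counts().unstack())
```

Returning plain integers instead of the categorical produced by `labels=range(1, 11)` is a judgment call: grouping on a categorical key with pandas' `observed=False` setting can add NaN-filled rows for month/portfolio combinations that have no observations.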

In [5]:
# 4
print("Output for part 4:")

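# Sort so that shift(1) moves to the same firm's previous month
crsp.sort_values(['permno','mdate'], inplace=True)
# Portfolio held in month t = decile the firm was assigned to in month t-1
crsp['portf_nr'] = crsp.groupby('permno')['ret11_decile'].shift(1)

# Drop each firm's first month, which has no prior-month decile
crsp = crsp.loc[crsp['portf_nr'].notnull(), :].copy()
crsp['portf_nr'].value_counts(sort=False)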

Output for part 4:


1     223833
2     227185
3     227532
4     227730
5     227686
6     227477
7     227315
8     226930
9     226293
10    225875
Name: portf_nr, dtype: int64
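
The lag-and-filter logic for ``portf_nr`` on a hypothetical five-row panel. One pitfall is worth seeing explicitly: in pandas, `NaN != 0` evaluates to `True`, so a filter written as `!= 0` keeps the rows with no prior-month decile, whereas `notnull()` drops them.

```python
import pandas as pd

# Hypothetical deciles for two firms
df = pd.DataFrame({
    'permno': [1, 1, 1, 2, 2],
    'mdate': ['2020-01', '2020-02', '2020-03', '2020-02', '2020-03'],
    'ret11_decile': [3, 7, 10, 1, 2],
})
df = df.sort_values(['permno', 'mdate'])

# Portfolio held this month = decile from the firm's previous month
df['portf_nr'] = df.groupby('permno')['ret11_decile'].shift(1)

# NaN != 0 is True, so this comparison does not exclude the missing rows
print((df['portf_nr'] != 0).sum())

# notnull() removes each firm's first month
held = df.loc[df['portf_nr'].notnull()].copy()
print(held)
```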

In [6]:
# 5
print("Output for part 5:")

# Equal-weighted return: simple average across the firms in each portfolio-month
ew_ret_monthly = crsp.groupby(['mdate', 'portf_nr'])['ret'].mean()

# Reshape to have returns of each portfolio side by side
ew_ret = ew_ret_monthly.unstack(level = 'portf_nr')

# Create new column that stores the returns of the "spread" portfolio
ew_ret['Spread'] = ew_ret[10] - ew_ret[1]
ew_ret

Output for part 5:


portf_nr,1,2,3,4,5,6,7,8,9,10,Spread
mdate,,,,,,,,,,,
1981-01,,,,,,,,,,,
1981-02,-0.001611,0.018650,0.011000,0.010897,0.012980,0.016318,0.008814,0.007066,0.003615,-0.017493,-0.015882
1981-03,0.073697,0.070473,0.080238,0.082625,0.071313,0.074025,0.068516,0.069465,0.087356,0.079740,0.006043
1981-04,0.033108,0.024192,0.020516,0.029522,0.029836,0.027749,0.028870,0.029622,0.038376,0.031609,-0.001499
1981-05,0.010310,0.011268,0.016465,0.013602,0.026789,0.028721,0.026589,0.022504,0.055683,0.055878,0.045568
...,...,...,...,...,...,...,...,...,...,...,...
2020-08,0.058369,0.050536,0.066951,0.054317,0.053586,0.043611,0.035730,0.041590,0.032555,0.037719,-0.020650
2020-09,-0.073705,-0.038189,-0.035999,-0.023008,-0.031162,-0.018343,-0.023794,-0.023569,-0.012108,0.007355,0.081060
2020-10,-0.012347,0.030453,0.046157,0.028848,0.014210,0.003614,0.018308,0.009093,0.010959,-0.025737,-0.013390
2020-11,0.404211,0.242672,0.194575,0.185290,0.168104,0.167673,0.145358,0.165401,0.165987,0.216825,-0.187386
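
A toy version of the EW calculation with numbers that can be checked by hand (the returns are hypothetical). In February, portfolio 1 averages -0.01 and portfolio 10 averages 0.04, so the spread (long decile 10, short decile 1) is 0.05.

```python
import pandas as pd

# Hypothetical firm-level returns for two months and two portfolios
df = pd.DataFrame({
    'mdate':    ['2020-02'] * 4 + ['2020-03'] * 4,
    'portf_nr': [1, 1, 10, 10, 1, 1, 10, 10],
    'ret':      [-0.02, 0.00, 0.03, 0.05, 0.01, 0.03, -0.01, 0.01],
})

# Equal-weighted mean per portfolio-month, one column per portfolio
ew = df.groupby(['mdate', 'portf_nr'])['ret'].mean().unstack('portf_nr')

# Spread: long past winners (10), short past losers (1)
ew['Spread'] = ew[10] - ew[1]
print(ew)
```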


In [7]:
# 6
print("Output for part 6:")

# Weight each return by the firm's prior-month market cap and sum within each portfolio-month
crsp['ret_x_size'] = crsp['ret'] * crsp['mktcap_lag1']
sum_ret_x_size = crsp.groupby(['mdate','portf_nr'])['ret_x_size'].sum()

# Calculate sum of lagged market cap for each portfolio each month
sum_size = crsp.groupby(['mdate','portf_nr'])['mktcap_lag1'].sum()

# Calculate monthly portfolio VW returns
vw_ret_monthly = sum_ret_x_size / sum_size

# Reshape to have returns of each portfolio side by side
vw_ret = vw_ret_monthly.unstack(level = 'portf_nr')

# Create new column that stores the returns of the "spread" portfolio
vw_ret['Spread'] = vw_ret[10] - vw_ret[1]
vw_ret

Output for part 6:


portf_nr,1,2,3,4,5,6,7,8,9,10,Spread
mdate,,,,,,,,,,,
1981-01,,,,,,,,,,,
1981-02,-0.007998,0.034823,0.032598,0.011224,0.025033,0.030649,0.031447,0.000641,0.024081,-0.001883,0.006115
1981-03,0.143031,0.085449,0.044863,0.033512,0.048550,0.023256,0.034338,0.043392,0.060206,0.090606,-0.052425
1981-04,-0.011028,-0.010393,0.023165,-0.015613,-0.008582,-0.021044,-0.017775,-0.029303,-0.002552,-0.003691,0.007337
1981-05,-0.005517,0.030109,-0.009834,0.004325,0.017044,0.002600,0.000368,0.020718,0.035657,0.062959,0.068476
...,...,...,...,...,...,...,...,...,...,...,...
2020-08,0.042744,0.060993,0.055256,0.039988,0.063935,0.040496,0.050129,0.071553,0.076713,0.165289,0.122546
2020-09,-0.076537,-0.069751,-0.054620,-0.044092,-0.025421,-0.009056,-0.020338,-0.024782,-0.042851,-0.066764,0.009772
2020-10,-0.045062,0.015458,0.030485,0.013707,-0.003825,-0.031272,-0.022938,-0.018253,-0.008879,-0.048785,-0.003723
2020-11,0.373683,0.275725,0.219271,0.179983,0.143367,0.128766,0.112797,0.104732,0.075896,0.125390,-0.248294
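
A toy check of the value-weighting formula, sum(w × r) / sum(w) with w = ``mktcap_lag1``, on hypothetical numbers. As a cross-check, one portfolio is recomputed with `np.average(..., weights=...)`, which should give the same weighted mean.

```python
import numpy as np
import pandas as pd

# Hypothetical firm-months in one month, two portfolios
df = pd.DataFrame({
    'mdate':       ['2020-02'] * 4,
    'portf_nr':    [1, 1, 10, 10],
    'ret':         [0.10, -0.10, 0.02, 0.04],
    'mktcap_lag1': [100.0, 300.0, 50.0, 150.0],
})

# Value-weighted return = sum(mktcap_lag1 * ret) / sum(mktcap_lag1)
g = df.assign(ret_x_size=df['ret'] * df['mktcap_lag1']).groupby(['mdate', 'portf_nr'])
vw = (g['ret_x_size'].sum() / g['mktcap_lag1'].sum()).unstack('portf_nr')
vw['Spread'] = vw[10] - vw[1]
print(vw)

# Cross-check portfolio 1 with NumPy's weighted average
p1 = df[df['portf_nr'] == 1]
print(np.average(p1['ret'], weights=p1['mktcap_lag1']))
```

Here the larger firm in portfolio 1 lost 10%, so the VW return (-0.05) is well below the EW return (0.00), the same kind of gap visible between the EW and VW tables above.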


In [8]:
# 7
print("Output for part 7:")

ew_means = ew_ret.mean()
ew_means

Output for part 7:


portf_nr
1         0.011980
2         0.008312
3         0.009356
4         0.010551
5         0.010858
6         0.012008
7         0.012817
8         0.013812
9         0.014665
10        0.015816
Spread    0.003836
dtype: float64
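
Two properties of these averages can be checked on a tiny hypothetical table: `mean()` skips missing months (such as an all-NaN first row), and when both legs are observed in the same months, the average spread equals the difference of the two averages. The EW table above is consistent with this: 0.015816 - 0.011980 = 0.003836.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly portfolio returns with an all-missing first month
ret = pd.DataFrame({1: [np.nan, 0.01, 0.03], 10: [np.nan, 0.02, 0.06]},
                   index=['1981-01', '1981-02', '1981-03'])
ret['Spread'] = ret[10] - ret[1]

# skipna=True by default, so the NaN month does not enter the averages
means = ret.mean()
print(means)
```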

In [9]:
# 8
print("Output for part 8:")

vw_means = vw_ret.mean()
vw_means

Output for part 8:


portf_nr
1         0.000556
2         0.004428
3         0.006921
4         0.009850
5         0.009972
6         0.009068
7         0.010314
8         0.011152
9         0.010892
10        0.014960
Spread    0.014404
dtype: float64

In [10]:
# 9: Prep data for risk-adjustment of EW portfolio returns

# Load data on Fama-French factors
ff3f = pdr.DataReader('F-F_Research_Data_Factors', 'famafrench', '1971-01-01')[0]/100
ff3f.index.rename('mdate', inplace = True)
ff3f.reset_index(inplace=True)
#ff3f

# Merge EW monthly portfolio returns with the risk factors
alldata = ew_ret.merge(ff3f, how='inner', on='mdate')
alldata['const'] = 1
#alldata

In [11]:
# 9: Cycle through all portfolios and regress EW excess returns on risk factors

# First, create empty tables to store portfolio alphas and their t-statistics
ew_portf_coeff = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = ew_ret.columns)
ew_portf_tstats = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = ew_ret.columns)

# Regressions for each portfolio
for p in ew_ret.columns:
    # Set up the data
    # Dependent variable is the excess return on the portfolio...
    y = alldata[p] - alldata['RF']
    # ...except for the spread portfolio, which is already a long-short (excess) return
    if p == 'Spread':
        y = alldata[p]

    # Independent variables are the constant and the three risk factors
    X = alldata[['const','Mkt-RF','SMB','HML']]

    # Run the regression with Newey-West (HAC) standard errors using 4 lags
    res = sm.OLS(y, X, missing='drop').fit()
    res_robust = res.get_robustcov_results(cov_type = 'HAC', maxlags = 4)

    # Store the results (coefficients follow the column order of X)
    ew_portf_coeff.loc[:,p] = res_robust.params
    ew_portf_tstats.loc[:,p] = res_robust.tvalues

In [12]:
# 9: Final result
print("Output for part 9:")

ew_alphas = ew_portf_coeff.loc['const',:].to_frame(name='EW_alphas')
ew_tstats = ew_portf_tstats.loc['const',:].to_frame(name='EW_alpha_tstats')
ew_results = ew_alphas.join(ew_tstats)
ew_results

Output for part 9:


,EW_alphas,EW_alpha_tstats
portf_nr,,
1,-0.002588,-0.824595
2,-0.004648,-2.647606
3,-0.002775,-2.270109
4,-0.000846,-0.882864
5,9.2e-05,0.128254
6,0.001452,2.174552
7,0.002508,3.804629
8,0.00356,5.139446
9,0.00427,5.312719
10,0.004443,3.311024


In [17]:
# Inspect the merged data (this cell ran after In [13], so alldata holds the VW returns)
alldata

,mdate,1,2,3,4,5,6,7,8,9,10,Spread,Mkt-RF,SMB,HML,RF,const
0,1981-01,,,,,,,,,,,,-0.0504,0.0292,0.0672,0.0104,1
1,1981-02,-0.007998,0.034823,0.032598,0.011224,0.025033,0.030649,0.031447,0.000641,0.024081,-0.001883,0.006115,0.0057,-0.0034,0.0102,0.0107,1
2,1981-03,0.143031,0.085449,0.044863,0.033512,0.048550,0.023256,0.034338,0.043392,0.060206,0.090606,-0.052425,0.0356,0.0354,0.0064,0.0121,1
3,1981-04,-0.011028,-0.010393,0.023165,-0.015613,-0.008582,-0.021044,-0.017775,-0.029303,-0.002552,-0.003691,0.007337,-0.0211,0.0440,0.0228,0.0108,1
4,1981-05,-0.005517,0.030109,-0.009834,0.004325,0.017044,0.002600,0.000368,0.020718,0.035657,0.062959,0.068476,0.0011,0.0200,-0.0042,0.0115,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,2020-08,0.042744,0.060993,0.055256,0.039988,0.063935,0.040496,0.050129,0.071553,0.076713,0.165289,0.122546,0.0763,-0.0028,-0.0295,0.0001,1
476,2020-09,-0.076537,-0.069751,-0.054620,-0.044092,-0.025421,-0.009056,-0.020338,-0.024782,-0.042851,-0.066764,0.009772,-0.0363,0.0005,-0.0258,0.0001,1
477,2020-10,-0.045062,0.015458,0.030485,0.013707,-0.003825,-0.031272,-0.022938,-0.018253,-0.008879,-0.048785,-0.003723,-0.0210,0.0440,0.0414,0.0001,1
478,2020-11,0.373683,0.275725,0.219271,0.179983,0.143367,0.128766,0.112797,0.104732,0.075896,0.125390,-0.248294,0.1247,0.0563,0.0210,0.0001,1


In [13]:
# 10: Prep data for risk-adjustment of VW portfolio returns

# Merge VW monthly portfolio returns with the risk factors
alldata = vw_ret.merge(ff3f, how='inner', on='mdate')
alldata['const'] = 1
#alldata

In [14]:
# 10: Cycle through all portfolios and regress excess VW returns on risk factors

# First, create empty tables to store portfolio alphas and their t-statistics
vw_portf_coeff = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = vw_ret.columns)
vw_portf_tstats = pd.DataFrame(np.nan, index = ['const', 'Mkt-RF','SMB', 'HML'], columns = vw_ret.columns)

# Regressions for each portfolio
for p in vw_ret.columns:
    # Set up the data
    # Dependent variable is the excess return on the portfolio...
    y = alldata[p] - alldata['RF']
    # ...except for the spread portfolio, which is already a long-short (excess) return
    if p == 'Spread':
        y = alldata[p]

    # Independent variables are the constant and the three risk factors
    X = alldata[['const','Mkt-RF','SMB','HML']]

    # Run the regression with Newey-West (HAC) standard errors using 4 lags
    res = sm.OLS(y, X, missing='drop').fit()
    res_robust = res.get_robustcov_results(cov_type = 'HAC', maxlags = 4)

    # Store the results (coefficients follow the column order of X)
    vw_portf_coeff.loc[:,p] = res_robust.params
    vw_portf_tstats.loc[:,p] = res_robust.tvalues

In [15]:
# 10: final result
print("Output for part 10:")

vw_alphas = vw_portf_coeff.loc['const',:].to_frame(name='VW_alphas')
vw_tstats = vw_portf_tstats.loc['const',:].to_frame(name='VW_alpha_tstats')
vw_results = vw_alphas.join(vw_tstats)
vw_results

Output for part 10:


,VW_alphas,VW_alpha_tstats
portf_nr,,
1,-0.015835,-5.464797
2,-0.009969,-4.848099
3,-0.005654,-3.630615
4,-0.001645,-1.411047
5,-0.000828,-1.023927
6,-0.001115,-1.606668
7,0.000489,0.651764
8,0.001369,1.963663
9,0.001285,1.355828
10,0.004616,3.014059
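
To compare the two risk-adjusted tables side by side, the EW and VW results can be concatenated column-wise, since both are indexed by ``portf_nr``. The sketch below uses only the portfolio 1 and 10 rows copied from the part 9 and part 10 outputs; in the notebook itself the same call would take ``ew_results`` and ``vw_results`` directly.

```python
import pandas as pd

# Rows for portfolios 1 and 10, copied from the part 9 and part 10 outputs
ew_results = pd.DataFrame({'EW_alphas': [-0.002588, 0.004443],
                           'EW_alpha_tstats': [-0.824595, 3.311024]}, index=[1, 10])
vw_results = pd.DataFrame({'VW_alphas': [-0.015835, 0.004616],
                           'VW_alpha_tstats': [-5.464797, 3.014059]}, index=[1, 10])

# Align on the portfolio number and place the EW and VW columns next to each other
summary = pd.concat([ew_results, vw_results], axis=1)
summary.index.name = 'portf_nr'
print(summary.round(4))
```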
