**Fin 585R**  
**Diether**  
**Problem Set**  
**Momentum Portfolios**  

**Overview**

In this problem set you reproduce your second seminal empirical result in academic finance. Specifically, you reproduce and extend (the original sample was roughly 1963 to 1990) **the momentum effect** of Jegadeesh and Titman (1993) (see "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"). This empirical result spawned a huge literature in academic finance and has been a core strategy for quant hedge funds (and others) for the last 30 years. You will find out in the next couple of weeks that models like the CAPM can't explain this portfolio return pattern at all. 

Momentum portfolios are formed based on past returns. Specifically, momentum portfolios are most commonly formed based on the cumulative return from months $t-12$ to $t-2$ (you should use this past return window for your portfolios):

$$
r_{i,t-12:t-2} \approx \sum_{x=2}^{12} \log(1+r_{i,t-x})
$$

Note that it's common practice to cumulate (or compound) the returns using the log approximation above. You certainly can compound the simple returns exactly if you want (well, not for this problem set ... use log returns for the problem set):

$$
r_{i,t-12:t-2} = \left[ \prod_{x=2}^{12} \bigl(1+r_{i,t-x} \bigr) \right]  - 1
$$

The log approximation is traditionally used in this situation because it's less computationally intensive. 
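Because $\log(1+x)$ is strictly increasing, the summed log return is exactly the log of the compounded gross return, so ranking stocks on either measure gives the same ordering (and therefore the same quintile assignments). A minimal sketch with simulated returns (the numbers are made up and only illustrate the point):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly simple returns for three stocks over months t-12..t-2 (11 months)
rng = np.random.default_rng(0)
rets = pd.DataFrame(rng.normal(0.01, 0.08, size=(11, 3)), columns=['A', 'B', 'C'])

# Log approach: sum of log gross returns
log_cum = np.log1p(rets).sum()
# Exact approach: product of gross returns minus one
comp_cum = (1 + rets).prod() - 1

# The summed log return equals log(1 + compounded return) ...
print(np.allclose(log_cum, np.log1p(comp_cum)))   # True
# ... so the cross-sectional ranking is identical
print((log_cum.rank() == comp_cum.rank()).all())  # True
```

The two measures differ in level (the log sum is close to the compounded return only when returns are small), but for sorting stocks into portfolios only the ordering matters.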

The data for this problem set are monthly observations for all stocks on the NYSE, AMEX, and Nasdaq from July 1962 to September 2022. You can download the data directly using the following link: [the data](https://diether.org/prephd/06-mstk_62-22.csv). There is also a link on *Learning Suite*. The data contain the following variables that you will need for the assignment (they also contain some additional variables):

|Variable | Description                                              |
|---------|----------------------------------------------------------|
|permno   | stock identifier                                         |
|caldt    | calendar date                                            |
|ticker   | ticker symbol                                            |
|prc      | stock price (not lagged, contemporaneous with returns)   |
|me       | market equity (not lagged, contemporaneous with returns) |
|ret      | monthly return                                           |
|shr      | shares outstanding in 1000s                              |


**Tasks**

1. Form quintile-based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low-priced stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in class before the assignment. <br><br>

2. Compute the average number of stocks that are in each portfolio.<br><br>

3. Add a spread portfolio (long portfolio 4 and short portfolio 0 $\leftarrow$ it's a zero-cost L/S portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.<br><br>

4. Form quintile-based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios (note, the only difference between your equal-weight and value-weight portfolios will be the weights). Note, a value-weight portfolio is defined as follows ($me$ refers to the market value of equity): <br><br>
$$
r_{pt} = \sum_{i=1}^{n} \omega_{i}r_{it} = \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
$$<br><br>
Hint: think about splitting the formula into the following parts delineated by the parentheses:<br><br>
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}<br><br>
And then compute each part as a separate groupby. Finally, just multiply the resulting dataframes together and you will have computed the value-weight portfolio returns. <br><br>
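To convince yourself that the two-part split in the hint reproduces the weighted average, here is a minimal sketch on a hypothetical toy panel (the column names mirror the data; the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: lagged market equity and current returns
toy = pd.DataFrame({
    'caldt': ['2020-01', '2020-01', '2020-01', '2020-02', '2020-02', '2020-02'],
    'bins':  [0, 0, 1, 0, 1, 1],
    'melag': [100.0, 300.0, 50.0, 200.0, 10.0, 90.0],
    'ret':   [0.02, -0.01, 0.05, 0.03, 0.10, -0.02],
})

# Part 1: 1 / (sum of lagged market equity) within each date-portfolio
inv_mesum = 1 / toy.groupby(['caldt', 'bins'])['melag'].sum()
# Part 2: sum of lagged market equity times return within each date-portfolio
me_ret = (toy['melag'] * toy['ret']).groupby([toy['caldt'], toy['bins']]).sum()
# Multiply the two parts (they share the same (caldt, bins) index)
vw_toy = inv_mesum * me_ret

# Cross-check against numpy's weighted average for each group
check = toy.groupby(['caldt', 'bins'])[['ret', 'melag']].apply(
    lambda g: np.average(g['ret'], weights=g['melag']))
print(np.allclose(vw_toy, check))  # True
```

For example, the January portfolio 0 return is $(100 \times 0.02 + 300 \times (-0.01))/400 = -0.0025$.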

In [1]:
import pandas as pd
import numpy as np

**Background: Using the `rolling` Method**  

Given we are working with log returns, we cumulate the past returns using a sum. Therefore, `rolling().sum()` is the rolling window method that makes the most sense in this context. Let's take a look at how `rolling().sum()` works.

In [2]:
df = pd.DataFrame({'id':['a','b','c','d','e','f','g'],
                   'val':range(1,8)})
df

|   | id | val |
|---|----|-----|
| 0 | a  | 1   |
| 1 | b  | 2   |
| 2 | c  | 3   |
| 3 | d  | 4   |
| 4 | e  | 5   |
| 5 | f  | 6   |
| 6 | g  | 7   |


In [3]:
df['rsum'] = df['val'].rolling(3).sum()
df

|   | id | val | rsum |
|---|----|-----|------|
| 0 | a  | 1   | NaN  |
| 1 | b  | 2   | NaN  |
| 2 | c  | 3   | 6.0  |
| 3 | d  | 4   | 9.0  |
| 4 | e  | 5   | 12.0 |
| 5 | f  | 6   | 15.0 |
| 6 | g  | 7   | 18.0 |


Note the timing of `rolling().sum()` above. For example, the three-observation `rolling(3).sum()` for id = 'd' (row index 3) is 9 (2+3+4), so the sum includes the current observation. We'll have to take this into account to compute returns from t-12 to t-2.
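One way to see the timing: adding `.shift(1)` after the rolling sum moves each window back one row, so each row then sums the three *preceding* observations and excludes the current one. A minimal sketch on the same toy values (stored under a separate name, `toy`, so it doesn't clobber `df`):

```python
import pandas as pd

toy = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                    'val': range(1, 8)})

# Window includes the current row: rows t-2, t-1, t
toy['rsum'] = toy['val'].rolling(3).sum()
# Shift by one so the window covers rows t-3, t-2, t-1 (current row excluded)
toy['rsum_lag'] = toy['val'].rolling(3).sum().shift(1)
print(toy)
```

For id = 'd', `rsum` is 9 (2+3+4) while `rsum_lag` is 6 (1+2+3). This only works on a single series; with several stocks stacked in one column the shift has to happen within each group.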

In [4]:
df = pd.DataFrame({'id':['a','b','c','d','e','f','g','h'],
                   'g':['1','1','1','1','2','2','2','2'],
                   'val':range(1,9)})
df

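|   | id | g | val |
|---|----|---|-----|
| 0 | a  | 1 | 1   |
| 1 | b  | 1 | 2   |
| 2 | c  | 1 | 3   |
| 3 | d  | 1 | 4   |
| 4 | e  | 2 | 5   |
| 5 | f  | 2 | 6   |
| 6 | g  | 2 | 7   |
| 7 | h  | 2 | 8   |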


In [5]:
df.groupby('g')['val'].rolling(2).sum()

g   
1  0     NaN
   1     3.0
   2     5.0
   3     7.0
2  4     NaN
   5    11.0
   6    13.0
   7    15.0
Name: val, dtype: float64

**Using a Rolling Sum and Shift**

+ Can't just add `.shift()` at the end.<br><br>

+ That shifts the whole series, not within groups: in the output below, the last value from group 1 (7.0) spills into the first row of group 2.<br><br>

+ Use two separate groupbys.

In [6]:
df.groupby('g')['val'].rolling(2).sum().shift(1)

g   
1  0     NaN
   1     NaN
   2     3.0
   3     5.0
2  4     7.0
   5     NaN
   6    11.0
   7    13.0
Name: val, dtype: float64

In [7]:
df['roll'] = df.groupby('g')['val'].rolling(2).sum().reset_index(drop=True)
df['rolllag'] = df.groupby('g')['roll'].shift()
df

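|   | id | g | val | roll | rolllag |
|---|----|---|-----|------|---------|
| 0 | a  | 1 | 1   | NaN  | NaN     |
| 1 | b  | 1 | 2   | 3.0  | NaN     |
| 2 | c  | 1 | 3   | 5.0  | 3.0     |
| 3 | d  | 1 | 4   | 7.0  | 5.0     |
| 4 | e  | 2 | 5   | NaN  | NaN     |
| 5 | f  | 2 | 6   | 11.0 | NaN     |
| 6 | g  | 2 | 7   | 13.0 | 11.0    |
| 7 | h  | 2 | 8   | 15.0 | 13.0    |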
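An alternative, sketched below, keeps everything inside one `groupby` by applying the rolling sum and the shift to each group's series with `transform`. Because `transform` returns a result aligned to the original index, it does not rely on the rows being sorted by group the way `reset_index(drop=True)` does (that call realigns by position). This is just a design alternative, not the approach used in class; it reproduces the `roll` and `rolllag` columns above on a copy of the toy data:

```python
import pandas as pd

toy = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                    'g': ['1', '1', '1', '1', '2', '2', '2', '2'],
                    'val': range(1, 9)})

# Rolling 2-period sum within each group, aligned to the original index
toy['roll'] = toy.groupby('g')['val'].transform(lambda s: s.rolling(2).sum())
# Rolling sum and lag in one step; the shift stays inside each group
toy['rolllag'] = toy.groupby('g')['val'].transform(
    lambda s: s.rolling(2).sum().shift(1))
print(toy)
```

The trade-off is speed: a Python `lambda` runs once per group, which can be slower than the built-in `groupby().rolling()` path on a panel with tens of thousands of stocks.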


**Momentum portfolio construction**  

1. Data preparation.<br><br>

2. Create the portfolio formation variable.<br><br>

3. Bin the data.<br><br>

4. Create the portfolios based on the bins and the weighting scheme.<br><br>

In [8]:
df = pd.read_csv('06-mstk_62-22.csv',parse_dates=['caldt'])
df

|         | permno | caldt      | ticker | prc       | me          | ret       | shr       |
|---------|--------|------------|--------|-----------|-------------|-----------|-----------|
| 0       | 10000  | 1986-01-31 | OMFGA  | 4.37500   | 16.1000     | NaN       | 3680.0    |
| 1       | 10000  | 1986-02-28 | OMFGA  | 3.25000   | 11.9600     | -0.257143 | 3680.0    |
| 2       | 10000  | 1986-03-31 | OMFGA  | 4.43750   | 16.3300     | 0.365385  | 3680.0    |
| 3       | 10000  | 1986-04-30 | OMFGA  | 4.00000   | 15.1720     | -0.098592 | 3793.0    |
| 4       | 10000  | 1986-05-30 | OMFGA  | 3.10938   | 11.7939     | -0.222656 | 3793.0    |
| ...     | ...    | ...        | ...    | ...       | ...         | ...       | ...       |
| 3384890 | 93436  | 2022-05-31 | TSLA   | 758.26000 | 785565.0000 | -0.129197 | 1036010.0 |
| 3384891 | 93436  | 2022-06-30 | TSLA   | 673.42000 | 701030.0000 | -0.111888 | 1041000.0 |
| 3384892 | 93436  | 2022-07-29 | TSLA   | 891.45000 | 931111.0000 | 0.323765  | 1044490.0 |
| 3384893 | 93436  | 2022-08-31 | TSLA   | 275.61000 | 863616.0000 | -0.072489 | 3133470.0 |


**Cumulative rolling past returns**

1. Create log returns.<br><br>

2. Create 11-period cumulative log return windows: t-10 to t.<br><br>

3. Lag (shift) two periods so each window covers t-12 to t-2.<br><br>

In [9]:
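# Log returns so the past-return window can be cumulated with a sum
df['logret'] = np.log(1 + df['ret'])
# 11-month rolling sum within each stock (t-10 to t), requiring all 11 months;
# reset_index(drop=True) realigns by position, so df must be sorted by permno, caldt
df['mom'] = df.groupby('permno')['logret'].rolling(11, min_periods=11).sum().reset_index(drop=True)
# Lag two months within each stock so the window covers t-12 to t-2
df['mom'] = df.groupby('permno')['mom'].shift(2)
df.head(15)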

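|   | permno | caldt      | ticker | prc     | me      | ret       | shr    | logret    | mom |
|---|--------|------------|--------|---------|---------|-----------|--------|-----------|-----|
| 0 | 10000  | 1986-01-31 | OMFGA  | 4.375   | 16.1    | NaN       | 3680.0 | NaN       | NaN |
| 1 | 10000  | 1986-02-28 | OMFGA  | 3.25    | 11.96   | -0.257143 | 3680.0 | -0.297252 | NaN |
| 2 | 10000  | 1986-03-31 | OMFGA  | 4.4375  | 16.33   | 0.365385  | 3680.0 | 0.311436  | NaN |
| 3 | 10000  | 1986-04-30 | OMFGA  | 4.0     | 15.172  | -0.098592 | 3793.0 | -0.103797 | NaN |
| 4 | 10000  | 1986-05-30 | OMFGA  | 3.10938 | 11.7939 | -0.222656 | 3793.0 | -0.251872 | NaN |
| 5 | 10000  | 1986-06-30 | OMFGA  | 3.09375 | 11.7346 | -0.005025 | 3793.0 | -0.005038 | NaN |
| 6 | 10000  | 1986-07-31 | OMFGA  | 2.84375 | 10.7863 | -0.080808 | 3793.0 | -0.08426  | NaN |
| 7 | 10000  | 1986-08-29 | OMFGA  | 1.09375 | 4.14859 | -0.615385 | 3793.0 | -0.955512 | NaN |
| 8 | 10000  | 1986-09-30 | OMFGA  | 1.03125 | 3.91153 | -0.057143 | 3793.0 | -0.058841 | NaN |
| 9 | 10000  | 1986-10-31 | OMFGA  | 0.78125 | 3.00234 | -0.242424 | 3843.0 | -0.277631 | NaN |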
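As a sanity check on the window, the sketch below builds a hypothetical single-stock series with no missing months and confirms that an 11-observation rolling sum (requiring all 11 observations) followed by `shift(2)` gives, at month t, the sum of log returns from t-12 through t-2. Keep in mind that `rolling` counts rows, not calendar months, so if a stock has gaps in its monthly history the window can span more than 11 calendar months; the check assumes consecutive months. It uses `transform` rather than `reset_index(drop=True)`, but the window and lag are the same.

```python
import numpy as np
import pandas as pd

# Hypothetical single stock with 24 consecutive monthly returns
rng = np.random.default_rng(1)
stk = pd.DataFrame({'permno': 1, 'ret': rng.normal(0.01, 0.05, 24)})
stk['logret'] = np.log1p(stk['ret'])

# 11-month rolling sum within the stock, then lag two months
stk['mom'] = stk.groupby('permno')['logret'].transform(
    lambda s: s.rolling(11, min_periods=11).sum())
stk['mom'] = stk.groupby('permno')['mom'].shift(2)

# Direct check for month t = 20: sum of logret over rows 8..18 (t-12..t-2)
t = 20
direct = stk.loc[t - 12:t - 2, 'logret'].sum()
print(np.isclose(stk.loc[t, 'mom'], direct))  # True
# First non-missing mom is at row 12 (needs rows 0..10, then a two-row lag)
print(stk['mom'].first_valid_index())  # 12
```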


**Lag Variables Before Removing Any Observations**

+ Need to remove missing `mom` observations before binning.<br><br>

+ So lag price and market cap before you do that.<br><br>

+ I'll remove low-priced stocks at the same time.<br><br>

+ Remember, you must always impose this restriction using the lagged price. Otherwise, you will create a look-ahead bias in your portfolio formation.<br><br>
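A small illustration of why the order matters, using a hypothetical stock whose `mom` is missing in one month: if you filter first and lag afterward, `shift` reaches back to the last *surviving* row rather than the previous calendar month.

```python
import numpy as np
import pandas as pd

# Hypothetical stock: mom is missing in March, so that row will be dropped
stk = pd.DataFrame({
    'permno': [1, 1, 1, 1],
    'caldt': pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30']),
    'prc': [10.0, 4.0, 12.0, 11.0],
    'mom': [0.1, 0.2, np.nan, 0.3],
})

# Correct: lag within permno on the full panel, then filter
good = stk.copy()
good['prclag'] = good.groupby('permno')['prc'].shift()
good = good.query("mom == mom and prclag >= 5")

# Wrong: filter first, then lag; April picks up February's price instead of March's
bad = stk.query("mom == mom").copy()
bad['prclag'] = bad.groupby('permno')['prc'].shift()

print(good[['caldt', 'prclag']])  # April keeps March's price (12.0)
print(bad[['caldt', 'prclag']])   # April gets February's price (4.0)
```

In the wrong version April would also fail the $5 screen even though the stock traded at $12 the month before.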

In [10]:
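# Lag price and market equity within each stock before dropping any rows
df['prclag'] = df.groupby('permno')['prc'].shift()
df['melag'] = df.groupby('permno')['me'].shift(1)

# Keep rows with a non-missing mom (NaN != NaN) and a lagged price of at least $5
df = df.query("mom == mom and prclag >= 5").reset_index(drop=True)
df.head(10)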

|   | permno | caldt      | ticker | prc    | me      | ret       | shr   | logret    | mom       | prclag | melag   |
|---|--------|------------|--------|--------|---------|-----------|-------|-----------|-----------|--------|---------|
| 0 | 10001  | 1987-02-27 | GFGC   | 6.25   | 6.19375 | -0.074074 | 991.0 | -0.076961 | 0.196691  | 6.75   | 6.68925 |
| 1 | 10001  | 1987-03-31 | GFGC   | 6.375  | 6.31763 | 0.0368    | 991.0 | 0.036139  | 0.140121  | 6.25   | 6.19375 |
| 2 | 10001  | 1987-04-30 | GFGC   | 6.125  | 6.06987 | -0.039216 | 991.0 | -0.040005 | 0.038272  | 6.375  | 6.31763 |
| 3 | 10001  | 1987-05-29 | GFGC   | 5.6875 | 5.63631 | -0.071429 | 991.0 | -0.074108 | 0.064559  | 6.125  | 6.06987 |
| 4 | 10001  | 1987-06-30 | GFGC   | 5.875  | 5.82212 | 0.051429  | 991.0 | 0.05015   | 0.034406  | 5.6875 | 5.63631 |
| 5 | 10001  | 1987-07-31 | GFGC   | 6.0    | 5.946   | 0.021277  | 991.0 | 0.021053  | -0.026547 | 5.875  | 5.82212 |
| 6 | 10001  | 1987-08-31 | GFGC   | 6.5    | 6.4415  | 0.083333  | 991.0 | 0.080043  | 0.03386   | 6.0    | 5.946   |
| 7 | 10001  | 1987-09-30 | GFGC   | 6.25   | 6.2     | -0.022308 | 992.0 | -0.02256  | -0.014767 | 6.5    | 6.4415  |
| 8 | 10001  | 1987-10-30 | GFGC   | 6.375  | 6.324   | 0.02      | 992.0 | 0.019803  | 0.068358  | 6.25   | 6.2     |
| 9 | 10001  | 1987-11-30 | GFGC   | 6.1875 | 6.138   | -0.029412 | 992.0 | -0.029853 | 0.007331  | 6.375  | 6.324   |


**Bin the Data/Create Portfolio Breakpoints**

+ For the short-selling loan fee portfolios we used `cut` to create bins.<br><br>

+ Here we want to create bins based on the quintiles of the `mom` variable, so we use `qcut`.<br><br>

+ `qcut` must also be done with a groupby, since the quintile breakpoints differ every month.<br><br>

+ Specifically, we use `groupby` and `transform` to call `qcut` every month and transform the `mom` variable into bins.<br><br>

+ Use `transform` when you're mapping an Nx1 variable (`mom`) into a new Nx1 variable (`bins`).<br><br>
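A toy version of the per-month binning, with two hypothetical dates whose `mom` distributions differ, shows that `transform` returns one bin label per row and that the breakpoints are recomputed within each date:

```python
import numpy as np
import pandas as pd

# Hypothetical cross-sections: 10 stocks on each of two dates
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    'caldt': np.repeat(['2020-01-31', '2020-02-29'], 10),
    'mom': np.concatenate([rng.normal(0.0, 0.2, 10),    # date 1
                           rng.normal(0.5, 0.2, 10)]),  # date 2: shifted up
})

# Quintile labels 0..4 computed separately within each date
toy['bins'] = toy.groupby('caldt')['mom'].transform(pd.qcut, 5, labels=False)

# Each date gets two stocks per quintile, even though date 2's values are mostly higher
print(toy.groupby(['caldt', 'bins']).size().unstack())
```

With a single, pooled `qcut` instead, most of date 2's stocks would land in the top bins simply because that month's returns were high.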

In [11]:
df.groupby('caldt')['mom'].transform(pd.qcut,5)

0            (0.142, 0.286]
1            (0.0255, 0.16]
2           (-0.13, 0.0423]
3           (0.0465, 0.176]
4           (-0.0212, 0.11]
                 ...       
2215018       (0.189, 2.39]
2215019      (0.104, 2.053]
2215020      (0.107, 1.842]
2215021    (-0.0605, 0.052]
2215022     (0.0933, 1.255]
Name: mom, Length: 2215023, dtype: interval

In [12]:
df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)

0          3
1          2
2          1
3          2
4          2
          ..
2215018    4
2215019    4
2215020    4
2215021    3
2215022    4
Name: mom, Length: 2215023, dtype: int64

In [13]:
df['bins'] = df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)
df

|         | permno | caldt      | ticker | prc      | me           | ret       | shr       | logret    | mom       | prclag   | melag        | bins |
|---------|--------|------------|--------|----------|--------------|-----------|-----------|-----------|-----------|----------|--------------|------|
| 0       | 10001  | 1987-02-27 | GFGC   | 6.2500   | 6.19375      | -0.074074 | 991.0     | -0.076961 | 0.196691  | 6.7500   | 6.68925      | 3    |
| 1       | 10001  | 1987-03-31 | GFGC   | 6.3750   | 6.31763      | 0.036800  | 991.0     | 0.036139  | 0.140121  | 6.2500   | 6.19375      | 2    |
| 2       | 10001  | 1987-04-30 | GFGC   | 6.1250   | 6.06987      | -0.039216 | 991.0     | -0.040005 | 0.038272  | 6.3750   | 6.31763      | 1    |
| 3       | 10001  | 1987-05-29 | GFGC   | 5.6875   | 5.63631      | -0.071429 | 991.0     | -0.074108 | 0.064559  | 6.1250   | 6.06987      | 2    |
| 4       | 10001  | 1987-06-30 | GFGC   | 5.8750   | 5.82212      | 0.051429  | 991.0     | 0.050150  | 0.034406  | 5.6875   | 5.63631      | 2    |
| ...     | ...    | ...        | ...    | ...      | ...          | ...       | ...       | ...       | ...       | ...      | ...          | ...  |
| 2215018 | 93436  | 2022-05-31 | TSLA   | 758.2600 | 785565.00000 | -0.129197 | 1036010.0 | -0.138340 | 0.418017  | 870.7600 | 902116.00000 | 4    |
| 2215019 | 93436  | 2022-06-30 | TSLA   | 673.4200 | 701030.00000 | -0.111888 | 1041000.0 | -0.118657 | 0.331264  | 758.2600 | 785565.00000 | 4    |
| 2215020 | 93436  | 2022-07-29 | TSLA   | 891.4500 | 931111.00000 | 0.323765  | 1044490.0 | 0.280480  | 0.109376  | 673.4200 | 701030.00000 | 4    |
| 2215021 | 93436  | 2022-08-31 | TSLA   | 275.6100 | 863616.00000 | -0.072489 | 3133470.0 | -0.075250 | -0.020255 | 891.4500 | 931111.00000 | 3    |


1. Form quintile-based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low-priced stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in class before the assignment. <br><br>

In [22]:
ew = df.groupby(['caldt', 'bins'])['ret'].mean().unstack() * 100
from finance_byu.summarize import summary
summary(ew).round(2)

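| bins  | 0      | 1      | 2      | 3      | 4      |
|-------|--------|--------|--------|--------|--------|
| count | 711.0  | 711.0  | 711.0  | 711.0  | 711.0  |
| mean  | 0.41   | 0.95   | 1.14   | 1.31   | 1.62   |
| std   | 6.77   | 5.14   | 4.69   | 4.83   | 6.32   |
| tstat | 1.61   | 4.93   | 6.47   | 7.22   | 6.83   |
| pval  | 0.11   | 0.0    | 0.0    | 0.0    | 0.0    |
| min   | -27.94 | -23.77 | -25.29 | -28.54 | -31.34 |
| 25%   | -3.12  | -1.62  | -1.3   | -1.29  | -1.61  |
| 50%   | 0.6    | 1.28   | 1.63   | 1.75   | 2.04   |
| 75%   | 3.97   | 3.7    | 3.89   | 4.38   | 5.42   |
| max   | 31.71  | 23.75  | 20.51  | 18.01  | 31.63  |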
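The `tstat` row appears consistent with the usual one-sample t-statistic, the mean divided by its standard error, $\bar r / (s/\sqrt{T})$: for portfolio 0, $0.41/(6.77/\sqrt{711}) \approx 1.61$. The sketch below computes that statistic by hand for a simulated return series; it is not the `finance_byu` implementation (whose internals aren't shown here), and the `tstat` helper is hypothetical.

```python
import numpy as np
import pandas as pd

def tstat(x: pd.Series) -> float:
    """Hypothetical helper: one-sample t-statistic for H0: mean = 0 (sample std, ddof=1)."""
    x = x.dropna()
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# Simulated monthly portfolio returns in percent (illustrative only)
rng = np.random.default_rng(3)
ret = pd.Series(rng.normal(1.0, 5.0, 711))
print(round(tstat(ret), 2))
```

Applied column by column (e.g. `ew.apply(tstat)`), it should line up with the `tstat` row up to rounding.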


In [24]:
# Rename the bin columns 0-4 to p0-p4 (standard renaming syntax)
ew = (df.groupby(['caldt', 'bins'])['ret'].mean().unstack().rename('p{:.0f}'.format, axis = 'columns') * 100)
summary(ew).round(2)

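| bins  | p0     | p1     | p2     | p3     | p4     |
|-------|--------|--------|--------|--------|--------|
| count | 711.0  | 711.0  | 711.0  | 711.0  | 711.0  |
| mean  | 0.41   | 0.95   | 1.14   | 1.31   | 1.62   |
| std   | 6.77   | 5.14   | 4.69   | 4.83   | 6.32   |
| tstat | 1.61   | 4.93   | 6.47   | 7.22   | 6.83   |
| pval  | 0.11   | 0.0    | 0.0    | 0.0    | 0.0    |
| min   | -27.94 | -23.77 | -25.29 | -28.54 | -31.34 |
| 25%   | -3.12  | -1.62  | -1.3   | -1.29  | -1.61  |
| 50%   | 0.6    | 1.28   | 1.63   | 1.75   | 2.04   |
| 75%   | 3.97   | 3.7    | 3.89   | 4.38   | 5.42   |
| max   | 31.71  | 23.75  | 20.51  | 18.01  | 31.63  |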


2. Compute the average number of stocks that are in each portfolio.<br><br>

In [25]:
df.groupby(['caldt', 'bins'])['ret'].count().unstack(level = 'bins').rename('p{:.0f}'.format, axis = 'columns').mean()

bins
p0    623.123769
p1    622.592124
p2    622.578059
p3    622.621660
p4    622.963432
dtype: float64


3. Add a spread portfolio (long portfolio 4 and short portfolio 0 $\leftarrow$ it's a zero-cost L/S portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.<br><br>


In [26]:
ew['spread'] = ew['p4'] - ew['p0']
summary(ew).round(2)
# Really nice and big t-stat!

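| bins  | p0     | p1     | p2     | p3     | p4     | spread |
|-------|--------|--------|--------|--------|--------|--------|
| count | 711.0  | 711.0  | 711.0  | 711.0  | 711.0  | 711.0  |
| mean  | 0.41   | 0.95   | 1.14   | 1.31   | 1.62   | 1.21   |
| std   | 6.77   | 5.14   | 4.69   | 4.83   | 6.32   | 4.55   |
| tstat | 1.61   | 4.93   | 6.47   | 7.22   | 6.83   | 7.09   |
| pval  | 0.11   | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    |
| min   | -27.94 | -23.77 | -25.29 | -28.54 | -31.34 | -27.12 |
| 25%   | -3.12  | -1.62  | -1.3   | -1.29  | -1.61  | -0.66  |
| 50%   | 0.6    | 1.28   | 1.63   | 1.75   | 2.04   | 1.47   |
| 75%   | 3.97   | 3.7    | 3.89   | 4.38   | 5.42   | 3.35   |
| max   | 31.71  | 23.75  | 20.51  | 18.01  | 31.63  | 29.49  |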



4. Form quintile-based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios (note, the only difference between your equal-weight and value-weight portfolios will be the weights). Note, a value-weight portfolio is defined as follows ($me$ refers to the market value of equity): <br><br>
$$
r_{pt} = \sum_{i=1}^{n} \omega_{i}r_{it} = \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
$$<br><br>
Hint: think about splitting the formula into the following parts delineated by the parentheses:<br><br>
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}<br><br>
And then compute each part as a separate groupby. Finally, just multiply the resulting dataframes together and you will have computed the value-weight portfolio returns. <br><br>

In [27]:
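# Denominator: total lagged market equity in each month-portfolio
mcapsum = df.groupby(['caldt', 'bins'])['melag'].sum()

# Numerator: lagged market equity times return, summed within each month-portfolio
df['rme'] = df['ret'] * df['melag']
vw = df.groupby(['caldt', 'bins'])['rme'].sum() / mcapsum
vw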

caldt       bins
1963-07-31  0      -0.016848
            1      -0.005667
            2       0.006116
            3       0.001417
            4      -0.003177
                      ...   
2022-09-30  0      -0.120950
            1      -0.115262
            2      -0.088376
            3      -0.092416
            4      -0.060210
Length: 3555, dtype: float64
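One caveat with the cell above: `mcapsum` sums `melag` over every row in a month-portfolio, while `.sum()` on `rme` silently skips rows where `ret` is missing, so any stock with a missing return still enters the denominator and the weights no longer sum to one. A minimal sketch of a guard, assuming you want to weight only stocks that have both a return and a lagged market cap (whether any such rows remain in this sample isn't shown above; `value_weight` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def value_weight(panel: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: value-weight return by (caldt, bins), using rows with both ret and melag."""
    ok = panel.dropna(subset=['ret', 'melag'])
    mesum = ok.groupby(['caldt', 'bins'])['melag'].sum()
    me_ret = (ok['ret'] * ok['melag']).groupby([ok['caldt'], ok['bins']]).sum()
    return me_ret / mesum

# Hypothetical toy panel: the second stock has a missing return
toy = pd.DataFrame({'caldt': ['2020-01'] * 3, 'bins': [0, 0, 0],
                    'melag': [100.0, 300.0, 100.0], 'ret': [0.02, np.nan, 0.04]})
print(value_weight(toy))
```

Here the guarded return is $(100 \times 0.02 + 100 \times 0.04)/200 = 0.03$, whereas dividing by all 500 of lagged market equity would give 0.012.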

In [29]:
vw = vw.unstack(level = 'bins').rename('p{:.0f}'.format, axis = 'columns') * 100
vw

| caldt      | p0         | p1         | p2        | p3        | p4        |
|------------|------------|------------|-----------|-----------|-----------|
| 1963-07-31 | -1.684790  | -0.566724  | 0.611594  | 0.141664  | -0.317750 |
| 1963-08-30 | 5.283128   | 5.319040   | 4.752087  | 4.836893  | 6.456210  |
| 1963-09-30 | -3.279339  | 0.664446   | -1.051720 | -1.395222 | -2.508058 |
| 1963-10-31 | 2.369550   | -0.283502  | 2.815613  | 1.314008  | 6.873582  |
| 1963-11-29 | -0.775725  | 1.814131   | -1.304004 | -1.915759 | -0.519791 |
| ...        | ...        | ...        | ...       | ...       | ...       |
| 2022-05-31 | -4.686609  | 2.988749   | -0.082342 | -0.763466 | -0.343652 |
| 2022-06-30 | -11.458108 | -10.989660 | -7.952733 | -6.587350 | -7.691200 |
| 2022-07-29 | 11.306234  | 13.595063  | 9.043699  | 10.726340 | 7.016211  |
| 2022-08-31 | -1.494352  | -6.191978  | -4.266549 | -3.525443 | -1.345361 |


In [33]:
vw['spread'] = vw['p4'] - vw['p0']
summary(vw).round(3)

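| bins  | p0      | p1      | p2      | p3      | p4      | spread  |
|-------|---------|---------|---------|---------|---------|---------|
| count | 711.0   | 711.0   | 711.0   | 711.0   | 711.0   | 711.0   |
| mean  | 0.427   | 0.815   | 0.854   | 1.018   | 1.321   | 0.894   |
| std   | 6.624   | 4.836   | 4.345   | 4.462   | 5.712   | 5.636   |
| tstat | 1.719   | 4.494   | 5.242   | 6.083   | 6.167   | 4.229   |
| pval  | 0.086   | 0.0     | 0.0     | 0.0     | 0.0     | 0.0     |
| min   | -25.434 | -20.968 | -20.641 | -22.346 | -26.515 | -31.624 |
| 25%   | -3.068  | -1.764  | -1.614  | -1.59   | -1.889  | -1.727  |
| 50%   | 0.374   | 1.039   | 1.043   | 1.311   | 1.669   | 1.261   |
| 75%   | 3.931   | 3.468   | 3.597   | 3.776   | 4.805   | 3.928   |
| max   | 29.151  | 17.905  | 13.877  | 18.637  | 24.753  | 31.979  |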
