# Return Predictability

## Table of Contents
1. Statistical Framework
    - 1.1 Autocorrelations
    - 1.2 Variance Ratios
    - 1.3 Persistence across size portfolios
2. Long-Horizon Predictability
    - 2.1 Fama-French (1988)
        - 2.1.1 Empirical Implementation
    - 2.2 Predictors other than Returns Themselves


Recall that in our factor regressions, we regressed time-$t$ variables on time-$t$ variables. These regressions allowed us to answer questions like 'how do assets comove?': we compared across assets. The financial concepts were those of security selection: 'conditional on a time-$t$ factor return, what do I expect asset $i$'s return at $t$ to be?', that is, $E[R_{i,t} \mid f_t]$.

In contrast, a time-series analyst is interested in regressions of time-$(t+h)$ variables, where $h > 0$, on time-$t$ variables. They could be trying to answer 'how do risk premia vary over time?' and, relatedly, 'how persistent are returns?'. Unlike our analysis in the first quarter of the course, we now compare across time. The financial concepts are those of market timing: 'conditional on information at time $t$, what do I expect returns to be at time $t+h$?', that is, $E[R_{t+h} \mid \mathcal{I}_t]$.
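To make the contrast concrete, here is a minimal sketch of a predictive regression on simulated data (not CRSP data). It assumes returns follow an AR(1) with a hypothetical slope `phi = 0.2` and regresses $r_{t+1}$ on $r_t$ by OLS; in this setting the predictive slope estimates the first-order autocorrelation of returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) return process: r_t = mu + phi * r_{t-1} + e_t
T, mu, phi, sigma = 5000, 0.0005, 0.2, 0.01
r = np.empty(T)
r[0] = mu / (1 - phi)  # start at the unconditional mean
for t in range(1, T):
    r[t] = mu + phi * r[t - 1] + sigma * rng.standard_normal()

# Predictive regression with horizon h = 1: y = r_{t+1}, X = [1, r_t]
h = 1
y = r[h:]
X = np.column_stack([np.ones(T - h), r[:-h]])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = b_hat
print(f"predictive slope: {slope:.3f} (true phi = {phi})")
```

A contemporaneous factor regression would instead put time-$t$ variables on both sides; here the regressor is dated strictly before the dependent variable.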

## Statistical Framework

Every statistical test starts from a null hypothesis built on assumptions. The null hypothesis is the 'base case' against which we compare what we see in the data. If the data are sufficiently inconsistent with the null, we reject it.

In regression, when using ordinary least squares, for example, we assumed homoskedasticity, which led us to the variance of the coefficient estimates $\text{Var}(\hat{b} \mid X) = \sigma^2 (X'X)^{-1}$. We then tested whether a coefficient, say a CAPM alpha, was different from zero. The null was that there should be no alpha, and this null was informed by finance theory (the CAPM).
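The variance formula can be checked numerically. The sketch below uses simulated data with hypothetical parameters (a market excess return `mkt`, and an asset with true alpha of zero and beta of 1.2), estimates $\hat{b}$, replaces the unknown $\sigma^2$ with the unbiased residual variance $s^2$, and computes the t-statistic for the null of zero alpha.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated monthly excess returns (hypothetical parameters): alpha = 0, beta = 1.2
T = 600
mkt = 0.005 + 0.045 * rng.standard_normal(T)
asset = 0.0 + 1.2 * mkt + 0.03 * rng.standard_normal(T)

# OLS of the asset on a constant and the market
X = np.column_stack([np.ones(T), mkt])
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ asset
resid = asset - X @ b_hat

# Homoskedastic variance: s^2 (X'X)^{-1}, with s^2 = e'e / (T - k)
k = X.shape[1]
s2 = resid @ resid / (T - k)
var_b = s2 * XtX_inv
se_b = np.sqrt(np.diag(var_b))

# t-statistic for H0: alpha = 0 (compare with about +/-1.96 at the 5% level)
t_alpha = b_hat[0] / se_b[0]
print("alpha, beta:", b_hat, " s.e.:", se_b, " t(alpha):", t_alpha)
```

With a single regressor, the slope entry of $s^2 (X'X)^{-1}$ reduces to the familiar $s^2 / \sum_t (x_t - \bar{x})^2$.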

For return predictability, we set a null hypothesis of no predictability. For time-series processes, the natural null is that returns are uncorrelated over time. Consistent with this is the assumption that (log) stock prices follow a random walk or, equivalently, that log returns are iid over time:

$$r_t \overset{iid}{\sim} N(\mu, \sigma^2)$$
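The random-walk view and the iid view describe the same model: if log prices satisfy $p_t = p_{t-1} + r_t$, then $r_t = p_t - p_{t-1}$. A minimal simulation, with hypothetical daily values for $\mu$ and $\sigma$, makes this explicit.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical daily parameters for iid normal log returns
mu, sigma, T = 0.0003, 0.01, 2500
r = rng.normal(mu, sigma, size=T)

# Log price is a random walk with drift: p_t = p_0 + r_1 + ... + r_t
p0 = np.log(100.0)
log_price = p0 + np.concatenate([[0.0], np.cumsum(r)])

# First differences of the log price recover the iid log returns
recovered = np.diff(log_price)
print(np.allclose(recovered, r))  # True
```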

Thus, two null hypotheses that follow from the iid assumption concern the autocorrelation coefficients $\rho_j$:

$$H_0: \rho_j = 0 \quad \text{for a given } j \ge 1$$

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0$$

We saw both before in the time-series part of the course: the first is a single test, carried out lag by lag with Bartlett standard errors for the autocorrelation coefficients, and the second is a joint (portmanteau) test such as Ljung-Box.
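Under the iid null, Bartlett's formula $\text{Var}(\hat\rho_j) \approx \frac{1}{T}\left(1 + 2\sum_{i=1}^{j-1}\rho_i^2\right)$ simplifies to $1/T$, so $\text{se}(\hat\rho_j) \approx 1/\sqrt{T}$; the Ljung-Box statistic $Q(m) = T(T+2)\sum_{k=1}^{m} \hat\rho_k^2/(T-k)$ is approximately $\chi^2_m$. A quick Monte Carlo on simulated iid normal returns (the sample size and number of replications are arbitrary choices) checks the first claim for $\hat\rho_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_sims = 500, 2000

# Each row is one simulated sample of iid N(0, 1) returns
samples = rng.standard_normal((n_sims, T))

# Lag-1 sample autocorrelation for every row (demeaned, divided by the lag-0 sum)
dev = samples - samples.mean(axis=1, keepdims=True)
rho1 = (dev[:, 1:] * dev[:, :-1]).sum(axis=1) / (dev ** 2).sum(axis=1)

# Compare the Monte Carlo spread with the iid approximation 1/sqrt(T)
print("MC std of rho_1:", rho1.std(), " 1/sqrt(T):", 1 / np.sqrt(T))
# Share of samples with |rho_1| < 1.96/sqrt(T); should be close to 0.95
coverage = np.mean(np.abs(rho1) < 1.96 / np.sqrt(T))
print("coverage:", coverage)
```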

### Autocorrelations

Let's get some data and test these hypotheses for returns at various frequencies: daily, weekly, and monthly.

```python
%pip install numpy pandas statsmodels matplotlib seaborn
```


```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```

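```python
# Daily CRSP market portfolio returns (simple returns, in decimals):
# tab-separated file with no header row
x = pd.read_csv('CRSPmktportfolios_daily.txt', sep='\t',
                names=['year', 'month', 'day', 'equal_weighted_daily', 'value_weighted_daily'])
# Build a DatetimeIndex from the year/month/day columns, then drop them
x['date'] = pd.to_datetime(x[['year', 'month', 'day']])
x.set_index('date', inplace=True)
x.drop(columns=['year', 'month', 'day'], inplace=True)
```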

```python
display(x)
# Convert simple returns R to log returns r = ln(1 + R)
x_log_daily = np.log1p(x)
display(x_log_daily)
```


| date | equal_weighted_daily | value_weighted_daily |
|---|---|---|
| 1963-07-01 | -0.006464 | -0.004666 |
| 1963-07-02 | 0.007971 | 0.005069 |
| 1963-07-03 | 0.006421 | 0.005419 |
| 1963-07-05 | 0.004134 | 0.003618 |
| 1963-07-08 | -0.006111 | -0.004468 |
| ... | ... | ... |
| 2019-06-24 | -0.002901 | -0.006011 |
| 2019-06-25 | -0.008980 | -0.006024 |
| 2019-06-26 | -0.001043 | 0.000982 |
| 2019-06-27 | 0.005847 | 0.009434 |


| date | equal_weighted_daily | value_weighted_daily |
|---|---|---|
| 1963-07-01 | -0.006485 | -0.004677 |
| 1963-07-02 | 0.007939 | 0.005056 |
| 1963-07-03 | 0.006400 | 0.005404 |
| 1963-07-05 | 0.004125 | 0.003611 |
| 1963-07-08 | -0.006130 | -0.004478 |
| ... | ... | ... |
| 2019-06-24 | -0.002905 | -0.006029 |
| 2019-06-25 | -0.009021 | -0.006042 |
| 2019-06-26 | -0.001044 | 0.000982 |
| 2019-06-27 | 0.005830 | 0.009390 |
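Because $\ln(1+R)$ turns compounding into addition, multi-period log returns are sums of one-period log returns. The check below uses the four equal-weighted daily simple returns shown in the first table for the week of 1 July 1963 (4 July has no observation) and confirms that the compounded weekly simple return equals $\exp(\sum_t r_t) - 1$; the average daily log return, by contrast, is not a weekly return.

```python
import numpy as np

# Equal-weighted daily simple returns from the table, 1963-07-01 to 1963-07-05
R = np.array([-0.006464, 0.007971, 0.006421, 0.004134])

# Daily log returns r = ln(1 + R)
r = np.log1p(R)

# Weekly simple return by compounding, and via the sum of daily log returns
weekly_simple = np.prod(1 + R) - 1
weekly_from_logs = np.expm1(r.sum())

print(weekly_simple, weekly_from_logs)  # identical up to floating-point error
print(r.sum(), r.mean())                # weekly log return vs. average daily log return
```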


```python
# Converting from daily to weekly frequency.
# Log returns are additive over time, so the weekly log return is the
# SUM of the daily log returns in that week (not their mean).
# Columns keep their original '_daily' names.
x_log_weekly = x_log_daily.resample('W').sum()
x_log_weekly
```


```python
# Converting from daily to monthly frequency.
# Sum daily log returns within each calendar month; aggregating weekly
# values instead would assign weeks that straddle a month end to one month.
# Columns keep their original '_daily' names.
x_log_monthly = x_log_daily.resample('ME').sum()
x_log_monthly
```
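Aggregation choices matter at month ends. In the toy example below (random daily log returns on business days, assumed for illustration only), summing daily log returns within each calendar month gives exact monthly log returns, while summing weekly totals by month assigns each week, including any days from the previous month, to the month of the Sunday that labels it. The monthly grouping uses `to_period('M')` so the sketch does not depend on the pandas version's month-end alias.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)

# Toy daily log returns on business days (illustrative data only)
days = pd.bdate_range('2020-01-02', '2020-03-27')
daily = pd.Series(rng.normal(0.0, 0.01, len(days)), index=days)

# Direct: sum daily log returns within each calendar month
monthly_direct = daily.groupby(daily.index.to_period('M')).sum()

# Via weeks: weekly sums (weeks labelled by their ending Sunday), then summed by month
weekly = daily.resample('W').sum()
monthly_via_weeks = weekly.groupby(weekly.index.to_period('M')).sum()

print(pd.DataFrame({'direct': monthly_direct, 'via_weeks': monthly_via_weeks}))
```

Here the week ending Sunday 2 February 2020 contains only January trading days, yet the weekly route books it in February.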


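```python
# Labels for autocorrelations at lags 1 through 12
names = [f'AC{j}' for j in range(1, 13)]
```

```python
# Sample autocorrelations up to lag 12; acf() returns a NumPy array whose
# first element is the lag-0 autocorrelation (always 1)
ew_daily_acf = sm.tsa.stattools.acf(x_log_daily['equal_weighted_daily'], nlags=12, fft=True)
vw_daily_acf = sm.tsa.stattools.acf(x_log_daily['value_weighted_daily'], nlags=12, fft=True)

ew_weekly_acf = sm.tsa.stattools.acf(x_log_weekly['equal_weighted_daily'], nlags=12, fft=True)
vw_weekly_acf = sm.tsa.stattools.acf(x_log_weekly['value_weighted_daily'], nlags=12, fft=True)

ew_monthly_acf = sm.tsa.stattools.acf(x_log_monthly['equal_weighted_daily'], nlags=12, fft=True)
vw_monthly_acf = sm.tsa.stattools.acf(x_log_monthly['value_weighted_daily'], nlags=12, fft=True)

# Collect lags 1-12 (dropping lag 0) into one table, one row per series
acf_table = pd.DataFrame(
    [ew_daily_acf[1:], vw_daily_acf[1:], ew_weekly_acf[1:],
     vw_weekly_acf[1:], ew_monthly_acf[1:], vw_monthly_acf[1:]],
    index=['EW daily', 'VW daily', 'EW weekly', 'VW weekly', 'EW monthly', 'VW monthly'],
    columns=names)
acf_table
```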
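The autocorrelation table alone does not say which estimates are significant. A minimal helper, written with NumPy and SciPy rather than statsmodels so that each formula is visible, computes Bartlett standard errors $\sqrt{(1 + 2\sum_{i<j}\hat\rho_i^2)/T}$ and Ljung-Box $Q(m)$ statistics with $\chi^2_m$ p-values for lags $1, \dots, m$. The function name `autocorr_tests` and its output columns are our own choices; it is demonstrated here on simulated iid returns, not on the CRSP series.

```python
import numpy as np
import pandas as pd
from scipy import stats


def autocorr_tests(r, nlags=12):
    """Sample autocorrelations with Bartlett standard errors and Ljung-Box tests."""
    x = np.asarray(r, dtype=float)
    x = x[~np.isnan(x)]
    T = len(x)
    dev = x - x.mean()
    denom = dev @ dev
    lags = np.arange(1, nlags + 1)
    # Lag-j autocorrelation: sum (x_t - xbar)(x_{t-j} - xbar) / sum (x_t - xbar)^2
    rho = np.array([dev[j:] @ dev[:-j] / denom for j in lags])
    # Bartlett: Var(rho_j) ~ (1 + 2 * sum_{i<j} rho_i^2) / T
    cum_sq = np.concatenate([[0.0], np.cumsum(rho ** 2)[:-1]])
    se = np.sqrt((1 + 2 * cum_sq) / T)
    # Ljung-Box Q(m) = T(T+2) * sum_{k<=m} rho_k^2 / (T - k), ~ chi2(m) under the null
    q = T * (T + 2) * np.cumsum(rho ** 2 / (T - lags))
    p = stats.chi2.sf(q, df=lags)
    return pd.DataFrame({'rho': rho, 'bartlett_se': se, 't_stat': rho / se,
                         'ljung_box_Q': q, 'p_value': p},
                        index=pd.Index([f'AC{j}' for j in lags], name='lag'))


# Demonstration on simulated iid returns (illustrative data, not CRSP)
rng = np.random.default_rng(5)
table = autocorr_tests(rng.normal(0.0, 0.01, 1000), nlags=12)
print(table.round(3))
```

Applied to the notebook's data, a call such as `autocorr_tests(x_log_monthly['value_weighted_daily'])` would produce the corresponding table for monthly value-weighted returns. We expect the `rho` and `ljung_box_Q` columns to match `sm.tsa.stattools.acf(..., qstat=True)` up to floating-point error, since both use the same unadjusted autocorrelation estimator.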