# Replicating the PRF

My conjecture is that the PRF has not done a good job of predicting returns over the past 10 years. To test this, I will use the strongest version of the PRF estimator found in the paper: the price moving average measure (price minus the 3-year moving average of price) of the top 500 stocks in each period, used to predict one-month-ahead returns. For simplicity, in each period I just use the

In this notebook you will be able to see:

1. How the algorithm is applied in a given month
2. The time series of the returns over time
3. A rough estimate of how the quality of the forecasts has changed over time
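
As a minimal sketch of the signal described above (price minus its trailing moving average), the snippet below uses a toy price series and a 3-period window in place of the 36-month window. The names `toy_prices` and `price_minus_ma` are illustrative assumptions and are not part of the notebook's data or helpers.

```python
import pandas as pd

def price_minus_ma(prices: pd.Series, window: int) -> pd.Series:
    """Price minus its trailing `window`-period moving average (NaN until the window fills)."""
    return prices - prices.rolling(window).mean()

# Toy month-end price series (hypothetical values, for illustration only)
toy_prices = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 8.0],
    index=pd.to_datetime(
        ["2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30", "2000-05-31"]
    ),
)
pma = price_minus_ma(toy_prices, window=3)
print(pma)
```

The first `window - 1` entries are missing because the moving average is not yet defined there, which is why the notebook later drops rows with a missing signal.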

For the data, I use the CRSP tape from 1929 to June 2018 as well as the CRSP value-weighted returns. I've already cleaned the data to target the 

In [1]:
import sys
sys.path.append('../Code')
from utils import *
from beakerx import *
from beakerx.object import beakerx
import statsmodels.regression.linear_model as sm

# Loading the Data

In [2]:
crsp = pd.read_hdf('../Output/merged.h5')

In [3]:
sp500 = pd.read_csv('../Data/sp500_returns.txt', sep = r'\s+')

sp500_dict = {'date_vars': ['caldt'],
               'float_vars': ['vwretd'], 
               'int_vars': []}

sp500 = clean_data(sp500, sp500_dict)
sp500 = sp500.rename(columns = {'vwretd': 'S&P 500 Return', 'caldt': 'datadate'})
sp500['S&P 500 Return'] = pd.to_numeric(sp500['S&P 500 Return'], errors = 'coerce')
sp500['Next Month Return'] = sp500['S&P 500 Return'].shift(-1)
sp500 = sp500.loc[:, ['datadate', 'S&P 500 Return', 'Next Month Return']]
sp500['datadate'] = pd.to_datetime(sp500['datadate'], format = '%Y/%m/%d', errors = 'coerce')
sp500 = sp500.loc[~sp500['Next Month Return'].isnull(), :]
sp500 = sp500.loc[~sp500['datadate'].isnull(), :]
sp500 = sp500.safe_index('datadate')

Cleaning date variables:
caldt
Cleaning numeric variables:
vwretd
Cleaning integer variables:
Final data types:
caldt     datetime64[ns]
vwretd           float64
vwretx            object
usdval            object
dtype: object


In [4]:
def continuous_index(company_dataframe, num_months = 1):
    """
    Detects if a company has continuous return data. Returns "True" if the returns are continuous. False otherwise
    :param company_dataframe -- a dataframe from a groupby object, indexed on Permco and the time variable at level 2
    :param num_months -- the maximum number of months between timestamps
    :returns True or False, depending on whether the return sequence is continuous
    """
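    times = company_dataframe.index.get_level_values('datadate').sort_values()
    # Compare each timestamp with the one before it; the largest gap must stay under ~32 days per allowed month
    diffs = times[1:] - times[:-1]
    return (diffs.max().total_seconds() < 32 * num_months * 24 * 60 * 60)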

assert(continuous_index(sp500)) # Check no missing dates
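
One way to sanity-check a gap test like this is to run it on a tiny index with and without a hole. The sketch below is a standalone illustration; `max_gap_days` and the two toy date lists are hypothetical and not part of `utils`.

```python
import pandas as pd

def max_gap_days(dates) -> float:
    """Largest gap, in days, between consecutive timestamps after sorting."""
    idx = pd.DatetimeIndex(dates).sort_values()
    gaps = idx[1:] - idx[:-1]              # element-wise differences (TimedeltaIndex)
    return gaps.max() / pd.Timedelta(days=1)

complete = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"])
with_hole = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-05-31"])

print(max_gap_days(complete))   # stays under the 32-day threshold
print(max_gap_days(with_hole))  # exceeds it, so the series has a missing month
```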

In [5]:
sp500 = sp500.safe_drop(variable_list = ['index'])

In [6]:
returns_only = crsp.loc[:, ['Company Name', 'Cumulative Return', 'Market Cap (Billions, CRSP)']]

In [7]:
crsp.columns

Index(['Company Name', 'Permno', 'Ticker', 'Price', 'Bid', 'Ask',
       'Exchange Code', 'Market Cap (Billions, CRSP)', 'Return', 'Volume',
       'Price Volume (Billions)', 'Log Return', 'Cumulative Return',
       'Price Volume (3mma)', 'Market Cap (3mma)',
       'Volume (% of Market Cap, 3mma)', 'Fiscal Year', 'Gvkey', 'Permno.comp',
       'Fiscal Quarter', 'Company Name.comp', 'Currency', 'Report Date',
       'NAICS Sector Code', 'Exchange Code.comp', 'Assets, Total',
       'Common Equity, Total', 'Shareholder Equity, Total',
       'Preferred Equity, Total', 'Liabilities, Total', 'Deferred Tax Assets',
       'Long Term Debt', 'Short Term Debt', 'Cash', 'Sales', 'COGS', 'SG&A',
       'Depreciation and Amortization', 'EBITDA', 'EBIT', 'Interest Expense',
       'Taxes Payable', 'Net Income', 'Diluted EPS, Raw',
       'Diluted EPS, Adjusted', 'Operating Cash Flow', 'Financing Activities',
       'Long Term Debt, Gross Issuance', 'Long Term Debt, Retired',
       'Cash Dividen

In [8]:
returns_only = returns_only.join(sp500, on = 'datadate')

In [9]:
returns_only = returns_only.loc[returns_only.index.get_level_values('datadate') < '2018-06-30', :]
invalid_returns = returns_only.loc[(returns_only['S&P 500 Return'].isnull()) | (returns_only['S&P 500 Return'] > 1), :]

## Testing out the Price Moving Average Measure

In [10]:
PMA_NUM_YEARS = 3 # Moving average window over which to compute the PMA
ret = returns_only['Cumulative Return'].groupby(by = ['Permco'], group_keys = False).rolling(PMA_NUM_YEARS * 12).mean()
returns_only['Price Moving Average'] = ret

In [11]:
returns_only['Lagged Cumulative Return'] = returns_only['Cumulative Return'].groupby(['Permco']).shift(1) # Lag the price by one month to be more robust, in line with the momentum literature
returns_only['Price Diff to Moving Average'] = returns_only['Lagged Cumulative Return'] - returns_only['Price Moving Average']
# returns_only['Price Diff to Moving Average'] = returns_only['Cumulative Return'] - returns_only['Price Moving Average']

In [12]:
returns_only['Intercept'] = 1

In [13]:
returns_only = returns_only.loc[~returns_only['Price Diff to Moving Average'].isnull(), :]

In [14]:
# Quickly visualize the PMA
apple = returns_only.xs(7, level = 'Permco')
plot = TimePlot(title = 'Apple Price Moving Average', legendLayout=LegendLayout.HORIZONTAL,\
                      legendPosition=LegendPosition(position=LegendPosition.Position.TOP),\
                    initWidth = 1000)
plot.add(Line(displayName = 'Cumulative Return', \
              x = apple.index.get_level_values('datadate'),\
              y = apple['Cumulative Return']))
plot.add(Line(displayName = 'Price Moving Average', \
              x = apple.index.get_level_values('datadate'),\
              y = apple['Price Moving Average']))
plot.add(Line(displayName = 'Price v Moving Average', \
              x = apple.index.get_level_values('datadate'),\
              y = apple['Price Diff to Moving Average']))

## Basic Estimation

In [15]:
# Full Code for 3PRF

MIN_NON_MISSING = 0.7

def calc_ts_regression(one_company):
    valuation_on_returns = sm.OLS(one_company['Price Diff to Moving Average'], one_company[['Intercept', 'Next Month Return']]).fit()
    ret = valuation_on_returns.params['Next Month Return']
    return ret

def calc_cross_sectional_regression(one_date):
    valuations_on_loadings = sm.OLS(one_date['Price Diff to Moving Average'], one_date[['Intercept', 'TS Loading']]).fit()
    ret = valuations_on_loadings.params['TS Loading']
    next_month = one_date['Next Month Return'].values[0]
    return pd.DataFrame.from_dict({'Factor': [ret], 'Next Month Return': [next_month], 'Intercept': 1})

def calculate_forecast_on_date(full_dataframe, forecast_date, estimation_window = 5, company_per_date = 100, verbose = False):

    print('Forecasting on: ' + str(forecast_date))
        
    target_datetime = pd.to_datetime(forecast_date, format = '%Y-%m-%d')
    first_datetime = target_datetime - pd.DateOffset(years = estimation_window)
    forecast_sample = full_dataframe.loc[(full_dataframe.index.get_level_values('datadate') <= target_datetime) & (full_dataframe.index.get_level_values('datadate') > first_datetime)]
    
    # Grab the largest firms
    firms_in_sample = forecast_sample[['Market Cap (Billions, CRSP)', 'Company Name']].groupby(['Permco']).last()
    firms_in_sample = firms_in_sample.nlargest(columns = ['Market Cap (Billions, CRSP)'], n = company_per_date)
    large_firms_for_forecast = forecast_sample.join(firms_in_sample['Market Cap (Billions, CRSP)'], rsuffix = '.sort', how = 'inner')
    
    if verbose:
        print('Firms Used in Forecasting')
        print(firms_in_sample)
    
    # Keep only firms with a large enough share of non-missing observations
    share_non_missing = large_firms_for_forecast['Price Diff to Moving Average'].groupby(['Permco']).apply(lambda s: s.notna().mean())
    valid_cos = share_non_missing[share_non_missing > MIN_NON_MISSING]
    valid_cos.name = 'Share Non-Missing'
    large_firms_for_forecast = large_firms_for_forecast.join(valid_cos, how = 'inner')
    
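    # Be careful in separating today's data from yesterday's: the estimation sample ends two month-ends before
    # the forecast date (assumes month-end dates), so every 'Next Month Return' used in estimation is already realized
    yesterday = large_firms_for_forecast.loc[large_firms_for_forecast.index.get_level_values('datadate') <= target_datetime - MonthEnd(2), :]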
    
    # Do time series regressions
    ts_loadings = yesterday.groupby(['Permco']).apply(calc_ts_regression)
    ts_loadings.name = 'TS Loading'
    
    # Now do the cross-sectional regressions
    yesterday = yesterday.join(ts_loadings)
    implied_factors = yesterday.groupby(['datadate']).apply(calc_cross_sectional_regression)
    
    # Now do the forecasting regression
    return_on_factor = sm.OLS(implied_factors['Next Month Return'], implied_factors[['Intercept', 'Factor']]).fit()
    
    # Now join in the XS loadings and calculate today's factor
    today = large_firms_for_forecast.loc[large_firms_for_forecast.index.get_level_values('datadate') == target_datetime, :]
    today = today.join(ts_loadings)
    today = today.loc[~today['TS Loading'].isnull(), :]
    
    red_form_correl = np.corrcoef(today['TS Loading'], today['Price Diff to Moving Average'])[0, 1]
    
    if verbose:
        print('Todays Metrics')
        print(today[['Company Name', 'TS Loading', 'Price Diff to Moving Average']].round(2))    
        print('Correlation Between PMA and Loading: %0.2f' % red_form_correl)
    
    factor_today = calc_cross_sectional_regression(today)
    return pd.DataFrame.from_dict({'Factor': [factor_today['Factor'].values[0]], 'Next Month Return': [today['Next Month Return'].values[0]], 'datadate': [forecast_date], 'sorting_loading_correl': [red_form_correl]})

    # return today_expected_return.values[0]
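
To make the three regression steps in the cell above easier to follow, here is a rough, full-sample sketch on simulated data: (1) a time-series regression of each predictor on the target, (2) a cross-sectional regression of the predictors on those loadings in each period, and (3) a predictive regression of the target on the extracted factor. It deliberately ignores the look-ahead separation and firm selection handled above. The function names, the simulated single-factor data, and the noise levels are all assumptions made for illustration.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def three_pass_filter(X, y_next):
    """X: (T, N) predictors; y_next: (T,) target one period ahead.
    Returns (loadings, factor, fitted values)."""
    T, N = X.shape
    # Pass 1: time-series regression of each predictor on the target
    loadings = np.array([ols_slope(X[:, i], y_next) for i in range(N)])
    # Pass 2: cross-sectional regression of predictors on loadings, period by period
    factor = np.array([ols_slope(X[t, :], loadings) for t in range(T)])
    # Pass 3: predictive regression of the target on the extracted factor
    Z = np.column_stack([np.ones(T), factor])
    coef, *_ = np.linalg.lstsq(Z, y_next, rcond=None)
    return loadings, factor, Z @ coef

# Simulated data: one latent factor drives both the predictors and the target
rng = np.random.default_rng(0)
T, N = 200, 30
f = rng.normal(size=T)
beta = rng.normal(size=N)
X = np.outer(f, beta) + 0.1 * rng.normal(size=(T, N))
y_next = 0.5 * f + 0.1 * rng.normal(size=T)

loadings, factor, fitted = three_pass_filter(X, y_next)
print(abs(np.corrcoef(factor, f)[0, 1]))  # how closely the extracted factor tracks the latent one
```

In this simulation the extracted factor should track the latent factor closely because the predictors are nearly noise-free; real return data is far noisier, which is the point of the out-of-sample tests that follow.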

In [16]:
calculate_forecast_on_date(returns_only, '2010-01-31', verbose = True)

Forecasting on: 2010-01-31
Firms Used in Forecasting
        Market Cap (Billions, CRSP)                      Company Name
Permco                                                               
20678                    304.876188                  EXXON MOBIL CORP
8048                     247.151591                    MICROSOFT CORP
21880                    203.577490               WAL MART STORES INC
21446                    178.777699               PROCTER & GAMBLE CO
540                      177.835127        BERKSHIRE HATHAWAY INC DEL
7                        174.161768                         APPLE INC
21018                    173.437026                 JOHNSON & JOHNSON
20792                    171.211720               GENERAL ELECTRIC CO
20990                    160.771871  INTERNATIONAL BUSINESS MACHS COR
20436                    153.501480               JPMORGAN CHASE & CO
21394                    150.577542                        PFIZER INC
21645                    149.674720  

In [17]:
START_DATE = '1930-01-31'
date_sequence = returns_only.index.get_level_values('datadate')
date_sequence = date_sequence[date_sequence > START_DATE].unique()
date_sequence = sorted(date_sequence)

In [20]:
with Pool(4) as p:
    all_forecasts = p.map(lambda t: calculate_forecast_on_date(returns_only, t, company_per_date = 1000), date_sequence)

Forecasting on: 1930-02-28 00:00:00
Forecasting on: 1935-09-30 00:00:00
Forecasting on: 1941-04-30 00:00:00
Forecasting on: 1946-11-30 00:00:00
Forecasting on: 1930-03-31 00:00:00
Forecasting on: 1935-10-31 00:00:00
Forecasting on: 1941-05-31 00:00:00
Forecasting on: 1946-12-31 00:00:00
Forecasting on: 1930-04-30 00:00:00
Forecasting on: 1930-05-31 00:00:00
Forecasting on: 1935-11-30 00:00:00
Forecasting on: 1941-06-30 00:00:00
Forecasting on: 1947-01-31 00:00:00
Forecasting on: 1930-06-30 00:00:00
Forecasting on: 1935-12-31 00:00:00
Forecasting on: 1941-07-31 00:00:00
Forecasting on: 1930-07-31 00:00:00
Forecasting on: 1947-02-28 00:00:00
Forecasting on: 1936-01-31 00:00:00
Forecasting on: 1930-08-31 00:00:00
Forecasting on: 1941-08-31 00:00:00
Forecasting on: 1947-03-31 00:00:00
Forecasting on: 1930-09-30 00:00:00
Forecasting on: 1936-02-29 00:00:00
Forecasting on: 1941-09-30 00:00:00
Forecasting on: 1930-10-31 00:00:00
Forecasting on: 1947-04-30 00:00:00
Forecasting on: 1936-03-31 0

Forecasting on: 1945-09-30 00:00:00
Forecasting on: 1940-07-31 00:00:00
Forecasting on: 1951-01-31 00:00:00
Forecasting on: 1952-06-30 00:00:00
Forecasting on: 1945-10-31 00:00:00
Forecasting on: 1940-08-31 00:00:00
Forecasting on: 1951-02-28 00:00:00
Forecasting on: 1952-07-31 00:00:00
Forecasting on: 1945-11-30 00:00:00
Forecasting on: 1940-09-30 00:00:00
Forecasting on: 1951-03-31 00:00:00
Forecasting on: 1945-12-31 00:00:00
Forecasting on: 1952-08-31 00:00:00
Forecasting on: 1940-10-31 00:00:00
Forecasting on: 1946-01-31 00:00:00
Forecasting on: 1951-04-30 00:00:00
Forecasting on: 1952-09-30 00:00:00
Forecasting on: 1940-11-30 00:00:00
Forecasting on: 1946-02-28 00:00:00
Forecasting on: 1951-05-31 00:00:00
Forecasting on: 1940-12-31 00:00:00
Forecasting on: 1952-10-31 00:00:00
Forecasting on: 1946-03-31 00:00:00
Forecasting on: 1941-01-31 00:00:00
Forecasting on: 1951-06-30 00:00:00
Forecasting on: 1952-11-30 00:00:00
Forecasting on: 1946-04-30 00:00:00
Forecasting on: 1941-02-28 0

Forecasting on: 1962-01-31 00:00:00
Forecasting on: 1972-06-30 00:00:00
Forecasting on: 1967-05-31 00:00:00
Forecasting on: 1957-02-28 00:00:00
Forecasting on: 1962-02-28 00:00:00
Forecasting on: 1972-07-31 00:00:00
Forecasting on: 1967-06-30 00:00:00
Forecasting on: 1957-03-31 00:00:00
Forecasting on: 1962-03-31 00:00:00
Forecasting on: 1972-08-31 00:00:00
Forecasting on: 1967-07-31 00:00:00
Forecasting on: 1957-04-30 00:00:00
Forecasting on: 1962-04-30 00:00:00
Forecasting on: 1972-09-30 00:00:00
Forecasting on: 1967-08-31 00:00:00
Forecasting on: 1957-05-31 00:00:00
Forecasting on: 1962-05-31 00:00:00
Forecasting on: 1972-10-31 00:00:00
Forecasting on: 1967-09-30 00:00:00
Forecasting on: 1957-06-30 00:00:00
Forecasting on: 1962-06-30 00:00:00
Forecasting on: 1972-11-30 00:00:00
Forecasting on: 1967-10-31 00:00:00
Forecasting on: 1957-07-31 00:00:00
Forecasting on: 1962-07-31 00:00:00
Forecasting on: 1972-12-31 00:00:00
Forecasting on: 1967-11-30 00:00:00
Forecasting on: 1957-08-31 0

Forecasting on: 1983-07-31 00:00:00
Forecasting on: 1993-12-31 00:00:00
Forecasting on: 1988-11-30 00:00:00
Forecasting on: 1978-08-31 00:00:00
Forecasting on: 1983-08-31 00:00:00
Forecasting on: 1994-01-31 00:00:00
Forecasting on: 1988-12-31 00:00:00
Forecasting on: 1978-09-30 00:00:00
Forecasting on: 1983-09-30 00:00:00
Forecasting on: 1989-01-31 00:00:00
Forecasting on: 1994-02-28 00:00:00
Forecasting on: 1978-10-31 00:00:00
Forecasting on: 1983-10-31 00:00:00
Forecasting on: 1978-11-30 00:00:00
Forecasting on: 1989-02-28 00:00:00
Forecasting on: 1994-03-31 00:00:00
Forecasting on: 1983-11-30 00:00:00
Forecasting on: 1978-12-31 00:00:00
Forecasting on: 1989-03-31 00:00:00
Forecasting on: 1994-04-30 00:00:00
Forecasting on: 1983-12-31 00:00:00
Forecasting on: 1994-05-31 00:00:00
Forecasting on: 1989-04-30 00:00:00
Forecasting on: 1979-01-31 00:00:00
Forecasting on: 1984-01-31 00:00:00
Forecasting on: 1994-06-30 00:00:00
Forecasting on: 1989-05-31 00:00:00
Forecasting on: 1979-02-28 0

Forecasting on: 2000-01-31 00:00:00
Forecasting on: 2015-06-30 00:00:00
Forecasting on: 2010-05-31 00:00:00
Forecasting on: 2005-02-28 00:00:00
Forecasting on: 2000-02-29 00:00:00
Forecasting on: 2015-07-31 00:00:00
Forecasting on: 2010-06-30 00:00:00
Forecasting on: 2005-03-31 00:00:00
Forecasting on: 2000-03-31 00:00:00
Forecasting on: 2015-08-31 00:00:00
Forecasting on: 2010-07-31 00:00:00
Forecasting on: 2005-04-30 00:00:00
Forecasting on: 2000-04-30 00:00:00
Forecasting on: 2015-09-30 00:00:00
Forecasting on: 2010-08-31 00:00:00
Forecasting on: 2005-05-31 00:00:00
Forecasting on: 2000-05-31 00:00:00
Forecasting on: 2015-10-31 00:00:00
Forecasting on: 2010-09-30 00:00:00
Forecasting on: 2005-06-30 00:00:00
Forecasting on: 2000-06-30 00:00:00
Forecasting on: 2015-11-30 00:00:00
Forecasting on: 2010-10-31 00:00:00
Forecasting on: 2005-07-31 00:00:00
Forecasting on: 2000-07-31 00:00:00
Forecasting on: 2015-12-31 00:00:00
Forecasting on: 2010-11-30 00:00:00
Forecasting on: 2005-08-31 0

## Visualize The Factor

In [21]:
forecast_frame = pd.concat(all_forecasts)

In [22]:
forecast_frame['Intercept'] = 1

In [23]:
plot = TimePlot(title = 'Return Factor', legendLayout=LegendLayout.HORIZONTAL,\
                      legendPosition=LegendPosition(position=LegendPosition.Position.TOP),\
                    initWidth = 1000)
plot.add(Line(displayName = 'Return Factor (F)', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['Factor']))

## Trying to replicate their particular plot

In [24]:
ROLL_WINDOW = int(40) # Number of years for rolling window
MIN_WINDOW = int(10) # Number of years for minimum window

In [25]:
from copy import copy

def oos_by_split_date(dataframe, split_date):
    
    print('Splitting on ' + str(split_date))

    before = dataframe.loc[dataframe['datadate'] < split_date, :]    
    after = copy(dataframe.loc[dataframe['datadate'] >= split_date, :])
    
    if (before.shape[0] < MIN_WINDOW * 12) or (after.shape[0] < MIN_WINDOW * 12):
        return np.nan
    
    sample_mean_before_date = before['Next Month Return'].mean()
    factor_coef = sm.OLS(before['Next Month Return'], before[['Intercept', 'Factor']]).fit()

    pred_3prf = factor_coef.predict(after[['Intercept', 'Factor']])
    after.loc[:, '3PRF'] = pred_3prf
    after.loc[:, 'Mean'] = sample_mean_before_date
    mse_3prf = ((after.loc[:, '3PRF'] - after.loc[:, 'Next Month Return']) ** 2).mean()
    mse_sample_mean = ((after.loc[:, 'Mean'] - after.loc[:, 'Next Month Return']) ** 2).mean()
    oos_r2 = 1 - mse_3prf / mse_sample_mean
    return oos_r2
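
The out-of-sample R² used here is one minus the ratio of the model's mean squared error to that of a historical-mean benchmark. A small standalone sketch with made-up numbers shows how the sign should be read; `oos_r2_from_arrays` and the toy values are hypothetical and not taken from the data.

```python
import numpy as np

def oos_r2_from_arrays(actual, forecast, benchmark):
    """1 - MSE(forecast) / MSE(benchmark); positive means the forecast beats the benchmark."""
    actual, forecast, benchmark = (
        np.asarray(a, dtype=float) for a in (actual, forecast, benchmark)
    )
    mse_model = np.mean((actual - forecast) ** 2)
    mse_bench = np.mean((actual - benchmark) ** 2)
    return 1.0 - mse_model / mse_bench

actual = [0.02, -0.01, 0.03, 0.00]          # toy realized returns
hist_mean = [0.01] * 4                      # toy historical-mean benchmark
bad_forecast = [-0.02, 0.01, -0.03, 0.00]   # forecast with the wrong sign each period

print(oos_r2_from_arrays(actual, actual, hist_mean))        # perfect forecast
print(oos_r2_from_arrays(actual, hist_mean, hist_mean))     # no better than the benchmark
print(oos_r2_from_arrays(actual, bad_forecast, hist_mean))  # worse than the benchmark
```

A value of 1 means a perfect forecast, 0 means no improvement over the historical mean, and negative values mean the forecast does worse than simply using the mean.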

In [26]:
with Pool(4) as p:
    forecast_frame['OOS By Split Date'] = p.map(lambda t: oos_by_split_date(forecast_frame, t), date_sequence)

Splitting on 1930-02-28 00:00:00
Splitting on 1935-09-30 00:00:00
Splitting on 1941-04-30 00:00:00
Splitting on 1946-11-30 00:00:00
Splitting on 1930-03-31 00:00:00
Splitting on 1935-10-31 00:00:00
Splitting on 1930-04-30 00:00:00
Splitting on 1935-11-30 00:00:00
Splitting on 1930-05-31 00:00:00
Splitting on 1935-12-31 00:00:00
Splitting on 1930-06-30 00:00:00
Splitting on 1936-01-31 00:00:00
Splitting on 1930-07-31 00:00:00
Splitting on 1936-02-29 00:00:00
Splitting on 1930-08-31 00:00:00
Splitting on 1936-03-31 00:00:00
Splitting on 1936-04-30 00:00:00
Splitting on 1930-09-30 00:00:00
Splitting on 1936-05-31 00:00:00
Splitting on 1930-10-31 00:00:00
Splitting on 1941-05-31 00:00:00
Splitting on 1936-06-30 00:00:00
Splitting on 1930-11-30 00:00:00
Splitting on 1946-12-31 00:00:00
Splitting on 1936-07-31 00:00:00
Splitting on 1930-12-31 00:00:00
Splitting on 1936-08-31 00:00:00
Splitting on 1931-01-31 00:00:00
Splitting on 1936-09-30 00:00:00
Splitting on 1931-02-28 00:00:00
Splitting 

Splitting on 1959-02-28 00:00:00
Splitting on 1954-09-30 00:00:00
Splitting on 1944-07-31 00:00:00
Splitting on 1949-11-30 00:00:00
Splitting on 1944-08-31 00:00:00
Splitting on 1959-03-31 00:00:00
Splitting on 1954-10-31 00:00:00
Splitting on 1944-09-30 00:00:00
Splitting on 1949-12-31 00:00:00
Splitting on 1944-10-31 00:00:00
Splitting on 1959-04-30 00:00:00
Splitting on 1954-11-30 00:00:00
Splitting on 1950-01-31 00:00:00
Splitting on 1959-05-31 00:00:00
Splitting on 1950-02-28 00:00:00
Splitting on 1959-06-30 00:00:00
Splitting on 1950-03-31 00:00:00
Splitting on 1944-11-30 00:00:00
Splitting on 1959-07-31 00:00:00
Splitting on 1944-12-31 00:00:00
Splitting on 1959-08-31 00:00:00
Splitting on 1954-12-31 00:00:00
Splitting on 1955-01-31 00:00:00
Splitting on 1950-04-30 00:00:00
Splitting on 1945-01-31 00:00:00
Splitting on 1950-05-31 00:00:00
Splitting on 1955-02-28 00:00:00
Splitting on 1959-09-30 00:00:00
Splitting on 1945-02-28 00:00:00
Splitting on 1955-03-31 00:00:00
Splitting 

Splitting on 1971-07-31 00:00:00
Splitting on 1976-10-31 00:00:00
Splitting on 1966-09-30 00:00:00
Splitting on 1976-11-30 00:00:00
Splitting on 1971-08-31 00:00:00
Splitting on 1966-10-31 00:00:00
Splitting on 1980-12-31 00:00:00
Splitting on 1976-12-31 00:00:00
Splitting on 1971-09-30 00:00:00
Splitting on 1966-11-30 00:00:00
Splitting on 1971-10-31 00:00:00
Splitting on 1981-01-31 00:00:00
Splitting on 1966-12-31 00:00:00
Splitting on 1977-01-31 00:00:00
Splitting on 1977-02-28 00:00:00
Splitting on 1967-01-31 00:00:00
Splitting on 1981-02-28 00:00:00
Splitting on 1971-11-30 00:00:00
Splitting on 1971-12-31 00:00:00
Splitting on 1977-03-31 00:00:00
Splitting on 1967-02-28 00:00:00
Splitting on 1967-03-31 00:00:00
Splitting on 1972-01-31 00:00:00
Splitting on 1977-04-30 00:00:00
Splitting on 1977-05-31 00:00:00
Splitting on 1981-03-31 00:00:00
Splitting on 1967-04-30 00:00:00
Splitting on 1981-04-30 00:00:00
Splitting on 1967-05-31 00:00:00
Splitting on 1967-06-30 00:00:00
Splitting 

Splitting on 1993-09-30 00:00:00
Splitting on 1989-02-28 00:00:00
Splitting on 1993-10-31 00:00:00
Splitting on 1993-11-30 00:00:00
Splitting on 1989-03-31 00:00:00
Splitting on 1998-05-31 00:00:00
Splitting on 1993-12-31 00:00:00
Splitting on 1989-04-30 00:00:00
Splitting on 1985-10-31 00:00:00
Splitting on 1989-05-31 00:00:00
Splitting on 1998-06-30 00:00:00
Splitting on 1998-07-31 00:00:00
Splitting on 1985-11-30 00:00:00
Splitting on 1994-01-31 00:00:00
Splitting on 1994-02-28 00:00:00
Splitting on 1989-06-30 00:00:00
Splitting on 1994-03-31 00:00:00
Splitting on 1998-08-31 00:00:00
Splitting on 2002-09-30 00:00:00
Splitting on 2002-10-31 00:00:00
Splitting on 1998-09-30 00:00:00
Splitting on 1994-04-30 00:00:00
Splitting on 1989-07-31 00:00:00
Splitting on 2002-11-30 00:00:00
Splitting on 1998-10-31 00:00:00
Splitting on 1989-08-31 00:00:00
Splitting on 1998-11-30 00:00:00
Splitting on 1998-12-31 00:00:00
Splitting on 1989-09-30 00:00:00
Splitting on 2002-12-31 00:00:00
Splitting 

Splitting on 2015-09-30 00:00:00
Splitting on 2015-10-31 00:00:00
Splitting on 2015-11-30 00:00:00
Splitting on 2015-12-31 00:00:00
Splitting on 2002-04-30 00:00:00
Splitting on 2016-01-31 00:00:00
Splitting on 2006-02-28 00:00:00
Splitting on 2016-02-29 00:00:00
Splitting on 2016-03-31 00:00:00
Splitting on 2016-04-30 00:00:00
Splitting on 2016-05-31 00:00:00
Splitting on 2016-06-30 00:00:00
Splitting on 2006-03-31 00:00:00
Splitting on 2016-07-31 00:00:00
Splitting on 2002-05-31 00:00:00
Splitting on 2016-08-31 00:00:00
Splitting on 2016-09-30 00:00:00
Splitting on 2016-10-31 00:00:00
Splitting on 2016-11-30 00:00:00
Splitting on 2006-04-30 00:00:00
Splitting on 2002-06-30 00:00:00
Splitting on 2002-07-31 00:00:00
Splitting on 2016-12-31 00:00:00
Splitting on 2002-08-31 00:00:00
Splitting on 2006-05-31 00:00:00
Splitting on 2017-01-31 00:00:00
Splitting on 2017-02-28 00:00:00
Splitting on 2006-06-30 00:00:00
Splitting on 2017-03-31 00:00:00
Splitting on 2017-04-30 00:00:00
Splitting 

In [27]:
plot = TimePlot(title = 'OOS R2 (Direct Replication of Plot)', legendLayout=LegendLayout.HORIZONTAL,\
                      legendPosition=LegendPosition(position=LegendPosition.Position.TOP),\
                    initWidth = 1000)
plot.add(Line(displayName = 'OOS Performance', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['OOS By Split Date']))

## A More Natural Recursive Updating Procedure

In [47]:
# Now do the expanding and rolling forecasts

def calculate_forecast_from_extracted_factor(data):    
    yesterday = data[:-1]
    today = data.tail(1)
    reg = sm.OLS(yesterday['Next Month Return'], yesterday[['Factor', 'Intercept']]).fit()
    return np.asarray(reg.predict(today[['Factor', 'Intercept']]))[0] # Positional access, independent of the index labels

def rolling_forecast_on_date(dataframe, target_date, window, min_periods): 
    valid_dates = dataframe.loc[(dataframe['datadate'] <= target_date) & (dataframe['datadate'] >= target_date - pd.DateOffset(months = window))]
    if valid_dates.shape[0] < min_periods:
        return np.nan
    return calculate_forecast_from_extracted_factor(valid_dates)

def expanding_forecast_on_date(dataframe, target_date, min_periods):
    valid_dates = dataframe.loc[(dataframe['datadate'] <= target_date)]
    if valid_dates.shape[0] < min_periods:
        return np.nan
    return calculate_forecast_from_extracted_factor(valid_dates)
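
The recursive idea in the cell above can be sketched without pandas: at each date, fit the predictive regression only on earlier observations, then forecast the current one. The version below is a simplified illustration under an exact linear toy relation; `expanding_forecasts` and its inputs are assumptions, and the minimum-sample rule differs slightly from the notebook's.

```python
import numpy as np

def expanding_forecasts(factor, y_next, min_periods):
    """For each t, fit y_next ~ a + b * factor on observations before t, then predict at t."""
    factor = np.asarray(factor, dtype=float)
    y_next = np.asarray(y_next, dtype=float)
    out = np.full(len(factor), np.nan)
    for t in range(min_periods, len(factor)):
        # Only rows strictly before t enter the estimation sample
        X = np.column_stack([np.ones(t), factor[:t]])
        coef, *_ = np.linalg.lstsq(X, y_next[:t], rcond=None)
        out[t] = coef[0] + coef[1] * factor[t]
    return out

toy_factor = np.arange(10, dtype=float)
toy_target = 1.0 + 2.0 * toy_factor   # exact linear relation, so forecasts should match
preds = expanding_forecasts(toy_factor, toy_target, min_periods=3)
print(preds)
```

A rolling version would replace `factor[:t]` and `y_next[:t]` with the last `window` observations before `t`.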

In [48]:
with Pool(4) as p:
    forecast_frame['Rolling Forecast'] = p.map(lambda t: rolling_forecast_on_date(forecast_frame, t, ROLL_WINDOW * 12, MIN_WINDOW * 12), date_sequence)

In [49]:
with Pool(4) as p:
    forecast_frame['Expanding Forecast'] = p.map(lambda t: expanding_forecast_on_date(forecast_frame, t, MIN_WINDOW * 12), date_sequence)

In [40]:
forecast_frame['Average So Far'] = forecast_frame['Next Month Return'].shift(1).expanding(MIN_WINDOW * 12).mean()

In [41]:
forecast_frame[['Rolling Forecast', 'Expanding Forecast', 'Average So Far', 'Next Month Return']].corr()

In [42]:
forecast_frame['Expanding Forecast Error'] = (forecast_frame['Next Month Return'] - forecast_frame['Expanding Forecast']) ** 2
forecast_frame['Rolling Forecast Error'] = (forecast_frame['Next Month Return'] - forecast_frame['Rolling Forecast']) ** 2
forecast_frame['Sample Mean Error'] = (forecast_frame['Next Month Return'] - forecast_frame['Average So Far']) ** 2

In [43]:
plot = TimePlot(title = 'Expected Returns vs Actual', legendLayout=LegendLayout.HORIZONTAL,\
                      legendPosition=LegendPosition(position=LegendPosition.Position.TOP),\
                    initWidth = 1000)
plot.add(Line(displayName = 'Expanding Forecast', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['Expanding Forecast']))
plot.add(Line(displayName = 'Rolling Forecast', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['Rolling Forecast']))

In [44]:
forecast_frame['Rolling 3PRF MSE'] = forecast_frame['Rolling Forecast Error'].rolling(MIN_WINDOW * 12).mean()
forecast_frame['Expanding 3PRF MSE'] = forecast_frame['Expanding Forecast Error'].rolling(MIN_WINDOW * 12).mean()
forecast_frame['Rolling Sample Mean MSE'] = forecast_frame['Sample Mean Error'].rolling(MIN_WINDOW * 12).mean()
forecast_frame['Expanding Sample Mean MSE'] = forecast_frame['Sample Mean Error'].rolling(MIN_WINDOW * 12).mean()

In [45]:
forecast_frame['OOS R Squared (Expanding)'] = 1 - forecast_frame['Expanding 3PRF MSE'] / forecast_frame['Expanding Sample Mean MSE']
forecast_frame['OOS R Squared (Rolling)'] = 1 - forecast_frame['Rolling 3PRF MSE'] / forecast_frame['Rolling Sample Mean MSE']

In [46]:
plot = TimePlot(title = 'OOS R2', legendLayout=LegendLayout.HORIZONTAL,\
                      legendPosition=LegendPosition(position=LegendPosition.Position.TOP),\
                    initWidth = 1000)
plot.add(Line(displayName = 'OOS Performance (Expanding)', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['OOS R Squared (Expanding)']))
plot.add(Line(displayName = 'OOS Performance (Rolling)', \
              x = forecast_frame['datadate'],\
              y = forecast_frame['OOS R Squared (Rolling)']))