# Paper Data Prep

Factor data:
- 25 size-value FF portfolios 
- FF Industry portfolios
- the 6 portfolios which form SMB and HML
    - sourced from Ken French's website
- DEF (return on long-term corporate bonds minus the return on long-term government bonds)
- TERM (return on 30-year government bonds minus the short-term rate)
    - both sourced from Ibbotson
    - possible substitute for the corporate bond return: FRED total return index (https://fred.stlouisfed.org/series/BAMLCC8A015PYTRIV), converted to returns
    - possible substitute from CRSP: 20- or 30-year bond return, with the 30- or 90-day bill as the short-term rate
- DEFY (default yield spread: Moody's BAA and AAA yield spread)
    - Amit Goyal or FRED (https://fred.stlouisfed.org/series/BAA & https://fred.stlouisfed.org/series/AAA)
- TERMY (term yield spread: 10-year minus 1-year Treasury yield)
    - FRED (https://fred.stlouisfed.org/series/DGS10 & https://fred.stlouisfed.org/series/DGS1)
- RF (30-day T-bill rate)
    - CRSP
- Market portfolio (value-weighted NYSE)
- Real GDP (quarterly, seasonally adjusted annual rate)
    - FRED (https://fred.stlouisfed.org/series/GDPC1)
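The four spread factors in the list are plain differences of series. A minimal sketch, assuming the inputs have already been merged into one monthly DataFrame in percent; the column names `corp_ret`, `gov_ret` and `rf` are hypothetical, while `BAA`, `AAA`, `10 year` and `1 year` follow the names used in the FRED cell.

```python
import pandas as pd


def build_spread_factors(df: pd.DataFrame) -> pd.DataFrame:
    """Return DEF, TERM, DEFY and TERMY from a monthly frame of returns and yields.

    Assumed (hypothetical) input columns, all in percent:
      corp_ret, gov_ret : long-term corporate / government bond returns
      rf                : short-term (T-bill) rate for the month
      BAA, AAA          : Moody's yields; '10 year', '1 year' : Treasury yields
    """
    out = pd.DataFrame(index=df.index)
    out["DEF"] = df["corp_ret"] - df["gov_ret"]    # corporate minus government bond return
    out["TERM"] = df["gov_ret"] - df["rf"]         # long government bond return minus short rate
    out["DEFY"] = df["BAA"] - df["AAA"]            # default yield spread
    out["TERMY"] = df["10 year"] - df["1 year"]    # term yield spread
    return out


# Made-up numbers, only to exercise the arithmetic
example = pd.DataFrame(
    {"corp_ret": [1.2, -0.5], "gov_ret": [0.8, -0.9], "rf": [0.3, 0.3],
     "BAA": [6.1, 6.2], "AAA": [5.2, 5.4], "10 year": [4.0, 4.1], "1 year": [3.5, 3.7]},
    index=pd.to_datetime(["2000-01-31", "2000-02-29"]),
)
print(build_spread_factors(example))
```

Keeping the construction in one function makes it easy to swap in the Ibbotson, CRSP or FRED inputs without touching the spread definitions.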

Other required data:
- ISM Manufacturing Index
    - sourced from Bloomberg
- FF5 factors + Momentum
    - Ken French

```python
import pandas as pd
import numpy as np
from pandas.tseries.offsets import MonthEnd
import pandas_datareader.data as pdr
```

## FRED Data

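```python
# BAA and AAA are monthly; DGS10, DGS1 and BAMLCC8A015PYTRIV are daily. Collapse to one row per month-end.
fred_data = pdr.DataReader(['BAA', 'AAA', 'DGS10', 'DGS1', 'BAMLCC8A015PYTRIV'], 'fred', start='1947-01-01')
fred_data = fred_data.groupby(fred_data.index + MonthEnd(0)).last()  # last available value in each month
# BAMLCC8A015PYTRIV is a total return index level: convert it to a monthly return (decimal, not percent)
fred_data['BAMLCC8A015PYTRIV'] = fred_data['BAMLCC8A015PYTRIV'].pct_change(fill_method=None)
fred_data = fred_data.rename(columns={'DGS10': '10 year', 'DGS1': '1 year', 'BAMLCC8A015PYTRIV': 'corp_bond_return'})
fred_data.head()
```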

| DATE | BAA | AAA | 10 year | 1 year | corp_bond_return |
|---|---|---|---|---|---|
| 1947-01-31 | 3.13 | 2.57 | NaN | NaN | NaN |
| 1947-02-28 | 3.12 | 2.55 | NaN | NaN | NaN |
| 1947-03-31 | 3.15 | 2.55 | NaN | NaN | NaN |
| 1947-04-30 | 3.16 | 2.53 | NaN | NaN | NaN |
| 1947-05-31 | 3.17 | 2.53 | NaN | NaN | NaN |
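BAA and AAA are monthly series, while DGS10, DGS1 and BAMLCC8A015PYTRIV are daily, so a single multi-series FRED request mixes frequencies on one date index. The toy example below (made-up values) shows what `+ MonthEnd(0)` does to such an index and one way to collapse it to a single row per month-end; taking the last observation is an assumption, and a monthly mean is an equally defensible choice for yields.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Made-up values: a monthly series dated on the 1st (like BAA) and a daily series (like DGS10)
monthly = pd.Series([5.08, 5.07], index=pd.to_datetime(["1962-01-01", "1962-02-01"]), name="BAA")
daily = pd.Series(
    [4.06, 4.03, 4.05, 4.08],
    index=pd.to_datetime(["1962-01-02", "1962-01-31", "1962-02-01", "1962-02-28"]),
    name="DGS10",
)
# Outer join on dates, then sort chronologically
raw = pd.concat([monthly, daily], axis=1).sort_index()

# MonthEnd(0) rolls each date forward to its month-end and leaves month-end dates unchanged,
# so several daily rows end up sharing one label
shifted = raw.index + MonthEnd(0)
print(shifted.duplicated().any())  # True

# Grouping on the shifted dates gives one row per month; last() keeps the last non-missing value per column
by_month = raw.groupby(shifted).last()
print(by_month)
```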


### GDP

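```python
gdp_data = pdr.DataReader(['GDPC1'], 'fred', start='1947-01-01')
# FRED dates each quarter on its first day; QuarterEnd(0) relabels it at the quarter's last day
# (MonthEnd(0) would file it under the end of the quarter's first month)
gdp_data.index = gdp_data.index + pd.offsets.QuarterEnd(0)
gdp_data.head()
```

| DATE | GDPC1 |
|---|---|
| 1947-03-31 | 2034.45 |
| 1947-06-30 | 2029.024 |
| 1947-09-30 | 2024.834 |
| 1947-12-31 | 2056.508 |
| 1948-03-31 | 2087.442 |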
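FRED dates quarterly observations such as GDPC1 on the first day of the quarter, so the choice of offset decides which month-end a quarter is filed under. A short check of the two candidates; which one is appropriate depends on how GDP is matched to monthly returns, and neither accounts for GDP being released with a lag after the quarter ends.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, QuarterEnd

# First-of-quarter dates, as FRED reports quarterly series
quarter_starts = pd.to_datetime(["1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01"])

first_month_end = quarter_starts + MonthEnd(0)   # end of each quarter's first month
quarter_end = quarter_starts + QuarterEnd(0)     # end of each calendar quarter (Mar/Jun/Sep/Dec)
print(first_month_end)
print(quarter_end)
```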


## Ken French Data

```python
from pandas_datareader.famafrench import get_available_datasets
# get_available_datasets()  # uncomment to list the dataset names accepted by the 'famafrench' source
```

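```python
def get_ff_data():
    series = ['F-F_Research_Data_5_Factors_2x3', '25_Portfolios_5x5', '6_Portfolios_2x3',
              'F-F_Momentum_Factor', '30_Industry_Portfolios']
    # Table [0] of each dataset is the monthly series (value-weighted returns for the portfolio files), in percent
    frames = {name: pdr.DataReader(name, 'famafrench', start='1925-01-01')[0] for name in series}
    # The 25 and 6 size/BM files share column names (e.g. 'SMALL LoBM'); '6p_' is an arbitrary tag to keep them apart
    frames['6_Portfolios_2x3'] = frames['6_Portfolios_2x3'].add_prefix('6p_')
    return pd.concat(frames.values(), axis=1).sort_index()
```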
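The 25- and 6-portfolio size/book-to-market files share several column names (for example `SMALL LoBM` and `BIG HiBM`), so a plain `pd.concat(..., axis=1)` keeps both copies under the same label. A toy sketch with made-up numbers; `keys=` is one way to keep the sources apart, and prefixing one table with `DataFrame.add_prefix` is another.

```python
import pandas as pd

# Toy stand-ins for two Ken French tables that share a column name
dates = pd.period_range("2022-01", periods=2, freq="M")
p25 = pd.DataFrame({"SMALL LoBM": [1.0, 2.0], "ME1 BM2": [0.5, 0.1]}, index=dates)
p6 = pd.DataFrame({"SMALL LoBM": [0.7, 1.5], "BIG HiBM": [0.2, -0.3]}, index=dates)

flat = pd.concat([p25, p6], axis=1)
print(flat["SMALL LoBM"].shape)  # (2, 2): selecting the name returns BOTH columns

# keys= adds an outer column level naming the source table, so every column is unambiguous
keyed = pd.concat([p25, p6], axis=1, keys=["25_Portfolios_5x5", "6_Portfolios_2x3"])
print(keyed[("6_Portfolios_2x3", "SMALL LoBM")].tolist())  # [0.7, 1.5]
```

With `keys=`, columns are then addressed by (dataset, column) tuples, which is more verbose but makes the source of every series explicit.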

```python
ff_data = get_ff_data()
```

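```python
six = pdr.DataReader('6_Portfolios_2x3', 'famafrench', start='1925-01-01')[0]  # six 2x3 size/BM portfolios
ff_data['test_SMB'] = six[['SMALL LoBM', 'ME1 BM2', 'SMALL HiBM']].mean(axis=1) - six[['BIG LoBM', 'ME2 BM2', 'BIG HiBM']].mean(axis=1)  # 1/3 small - 1/3 big
```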

| Date | Mkt-RF | SMB | HML | RMW | CMA | RF | SMALL LoBM | ME1 BM2 | ME1 BM3 | ME1 BM4 | ... | Telcm | Servs | BusEq | Paper | Trans | Whlsl | Rtail | Meals | Fin | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1926-07 | NaN | NaN | NaN | NaN | NaN | NaN | 5.8248 | -1.7006 | 0.4875 | -1.4580 | ... | 0.83 | 9.22 | 2.06 | 7.70 | 1.91 | -23.79 | 0.07 | 1.87 | -0.02 | 5.20 |
| 1926-08 | NaN | NaN | NaN | NaN | NaN | NaN | -2.0206 | -8.0282 | 1.3796 | 1.4606 | ... | 2.17 | 2.02 | 4.39 | -2.38 | 4.85 | 5.39 | -0.75 | -0.13 | 4.47 | 6.76 |
| 1926-09 | NaN | NaN | NaN | NaN | NaN | NaN | -4.8291 | -2.6154 | -4.3417 | -3.2729 | ... | 2.41 | 2.25 | 0.19 | -5.54 | 0.07 | -7.87 | 0.25 | -0.56 | -1.61 | -3.86 |
| 1926-10 | NaN | NaN | NaN | NaN | NaN | NaN | -9.3729 | -3.5519 | -3.4948 | 3.4413 | ... | -0.11 | -2.00 | -1.09 | -5.08 | -2.61 | -15.38 | -2.20 | -4.11 | -5.51 | -8.49 |
| 1926-11 | NaN | NaN | NaN | NaN | NaN | NaN | 5.5888 | 4.1877 | 2.4623 | -4.4494 | ... | 1.63 | 3.77 | 3.64 | 3.84 | 1.61 | 4.67 | 6.52 | 4.33 | 2.34 | 4.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-09 | -9.35 | -0.97 | 0.06 | -1.51 | -0.84 | 0.19 | -13.1838 | -9.2149 | -7.2748 | -11.4204 | ... | -13.94 | -11.07 | -11.54 | -13.27 | -14.24 | -9.46 | -7.67 | -6.26 | -7.73 | -6.40 |
| 2022-10 | 7.83 | 1.86 | 8.05 | 3.07 | 6.52 | 0.23 | 3.2654 | 5.1384 | 8.7991 | 9.0880 | ... | 10.94 | 1.99 | 8.97 | 10.02 | 6.68 | 13.65 | 1.94 | 10.26 | 12.80 | 11.25 |
| 2022-11 | 4.60 | -2.67 | 1.38 | 6.01 | 3.11 | 0.29 | -5.6998 | -3.0675 | 0.6161 | 0.9805 | ... | 2.32 | 5.66 | 4.93 | 6.96 | 10.32 | 5.05 | 2.95 | 5.65 | 4.75 | 6.54 |
| 2022-12 | -6.41 | -0.16 | 1.32 | 0.09 | 4.19 | 0.33 | -6.4639 | -4.6920 | -5.2707 | -4.8792 | ... | -6.76 | -6.68 | -9.07 | -4.09 | -7.66 | -5.63 | -8.97 | -6.82 | -5.49 | -3.06 |
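The six 2x3 portfolios give SMB as the average of the three small portfolios minus the average of the three big ones, and HML as the average of the two high book-to-market (value) portfolios minus the average of the two low book-to-market (growth) portfolios. A minimal sketch using the column names of `6_Portfolios_2x3`; the toy returns are made up. Per Ken French's description, the SMB in `F-F_Research_Data_5_Factors_2x3` averages size spreads from the B/M, operating profitability and investment sorts, so a rebuilt SMB should line up more closely with the SMB from the three-factor file `F-F_Research_Data_Factors`.

```python
import pandas as pd

SMALL = ["SMALL LoBM", "ME1 BM2", "SMALL HiBM"]  # small growth, small neutral, small value
BIG = ["BIG LoBM", "ME2 BM2", "BIG HiBM"]        # big growth, big neutral, big value


def smb_hml_from_six(six: pd.DataFrame) -> pd.DataFrame:
    """Rebuild SMB and HML from the six 2x3 size/book-to-market portfolio returns."""
    smb = six[SMALL].mean(axis=1) - six[BIG].mean(axis=1)
    hml = (six[["SMALL HiBM", "BIG HiBM"]].mean(axis=1)
           - six[["SMALL LoBM", "BIG LoBM"]].mean(axis=1))
    return pd.DataFrame({"SMB": smb, "HML": hml})


# Made-up returns in percent for one month, just to check the arithmetic
toy = pd.DataFrame([[1.0, 2.0, 3.0, 0.5, 1.0, 1.5]], columns=SMALL + BIG,
                   index=pd.period_range("2022-01", periods=1, freq="M"))
print(smb_hml_from_six(toy))
```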


## CRSP Data

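```python
# Long CRSP extract (one row per month and index) -> wide frame: one row per month, one column per IndNm
crsp_data = pd.read_csv("crsp_data.csv", index_col='MthCalDt', parse_dates=True)
crsp_data = crsp_data.pivot(columns="IndNm", values='COL1')
```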
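The CRSP cell reshapes a long extract, with one row per month and index, into a wide one. A toy sketch of the same `pivot` call with made-up index names and values; `COL1` is kept only because the CRSP cell uses it. `pivot` raises a ValueError if a (date, `IndNm`) pair appears more than once, in which case `pivot_table` with an explicit `aggfunc` is the usual alternative.

```python
import pandas as pd

# Hypothetical long-format extract: one row per (month, index) pair
long_df = pd.DataFrame(
    {"MthCalDt": pd.to_datetime(["2022-11-30", "2022-11-30", "2022-12-30", "2022-12-30"]),
     "IndNm": ["Index A", "Index B", "Index A", "Index B"],
     "COL1": [0.05, 0.01, -0.06, -0.02]},
).set_index("MthCalDt")

# With no index= argument, pivot keeps the existing (date) index and spreads IndNm into columns
wide = long_df.pivot(columns="IndNm", values="COL1")
print(wide)
```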

## ISM