# Paper Data Prep

Factor data:
- 25 size-value FF portfolios 
- FF Industry portfolios
- the 6 portfolios which form SMB and HML
    - sourced from Ken French's website
- DEF (difference between the return on long-term corporate bonds and the return on long-term government bonds)
- TERM (difference between the return on 30-year government bonds and the short-term rate)
    - both sourced from Ibbotson
    - possible substitute for corporate bond returns: https://fred.stlouisfed.org/series/BAMLCC8A015PYTRIV
    - CRSP 20- or 30-year bond return, with the 30- or 90-day rate as the short-term rate
- DEFY (default yield spread: Moody's BAA and AAA yield spread)
    - Amit Goyal or FRED (https://fred.stlouisfed.org/series/BAA & https://fred.stlouisfed.org/series/AAA)
- TERMY (term yield spread: 10 year and 1 year Treasury spread)
    - FRED (https://fred.stlouisfed.org/series/DGS10 & https://fred.stlouisfed.org/series/DGS1)
- RF (30 day T-bill rate)
    - CRSP
- Market portfolio (value-weighted NYSE)
- GDP (seasonally adjusted)
    - FRED (https://fred.stlouisfed.org/series/GDPC1)

Other required data:
- ISM Manufacturing Index
    - sourced from Bloomberg
- FF5 factors + Momentum
    - Ken French

In [2]:
import pandas as pd
import numpy as np
from pandas.tseries.offsets import MonthEnd
import pandas_datareader.data as pdr

## FRED Data

In [12]:
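fred_data = pdr.DataReader(['BAA', 'AAA', 'DGS10', 'DGS1', 'BAMLCC8A015PYTRIV'], 'fred', start='1947-01-01')
fred_data = fred_data.rename(columns={'DGS10': '10 year', 'DGS1': '1 year', 'BAMLCC8A015PYTRIV': 'corp_bond_return'})
# BAA and AAA are monthly, but DGS10, DGS1 and BAMLCC8A015PYTRIV are daily, so simply
# shifting the index to month end would leave many rows per month. Group by month end
# and keep the last available observation of each series instead.
fred_data = fred_data.groupby(fred_data.index + MonthEnd(0)).last()
fred_data.head()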

DATE,BAA,AAA,10 year,1 year,corp_bond_return
1947-01-31,3.13,2.57,,,
1947-02-28,3.12,2.55,,,
1947-03-31,3.15,2.55,,,
1947-04-30,3.16,2.53,,,
1947-05-31,3.17,2.53,,,
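
The two yield spreads defined in the data list, DEFY (BAA minus AAA) and TERMY (10-year minus 1-year Treasury), follow directly from these columns. A minimal sketch, assuming the renamed column labels used above; the helper name `add_yield_spreads` is hypothetical.

```python
import pandas as pd


def add_yield_spreads(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with DEFY and TERMY columns added.

    Expects the columns 'BAA', 'AAA', '10 year' and '1 year' (yields in percent).
    """
    out = df.copy()
    # Default yield spread: Moody's BAA yield minus AAA yield.
    out["DEFY"] = out["BAA"] - out["AAA"]
    # Term yield spread: 10-year minus 1-year Treasury yield.
    out["TERMY"] = out["10 year"] - out["1 year"]
    return out
```

For the first row shown above, DEFY would be 3.13 - 2.57, roughly 0.56 percentage points, while TERMY stays missing until the Treasury series begin.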


### GDP

In [15]:
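# GDPC1 is quarterly and FRED stamps each observation on the first day of the quarter,
# so MonthEnd(0) labels Q1 1947 as 1947-01-31 (the end of the quarter's first month).
gdp_data = pdr.DataReader(['GDPC1'], 'fred', start='1947-01-01')
gdp_data.index = gdp_data.index + MonthEnd(0)
gdp_data.head()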

DATE,GDPC1
1947-01-31,2034.45
1947-04-30,2029.024
1947-07-31,2024.834
1947-10-31,2056.508
1948-01-31,2087.442
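
Because the GDP observations are quarterly, the dates above mark the first month of each quarter. If quarter-end stamps are preferred when merging with monthly data (an assumption, not something the notebook requires), one option is to roll each date forward with `QuarterEnd(0)`. The helper name `stamp_quarter_end` is hypothetical.

```python
import pandas as pd
from pandas.tseries.offsets import QuarterEnd


def stamp_quarter_end(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` whose dates are rolled forward to calendar quarter end.

    Dates already on a quarter end are left unchanged.
    """
    out = df.copy()
    out.index = out.index + QuarterEnd(0)
    return out
```

With this, both 1947-01-01 and 1947-01-31 map to 1947-03-31, so the helper works on the raw FRED index or on the month-end index built above.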


## Ken French Data

In [7]:
from pandas_datareader.famafrench import get_available_datasets
# get_available_datasets()

In [71]:
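def get_ff_data():
    """Download the Ken French datasets and join them column-wise on the monthly date index."""
    series = ['F-F_Research_Data_5_Factors_2x3', '25_Portfolios_5x5',
              'F-F_Momentum_Factor', '30_Industry_Portfolios']

    # Key 0 is the first table in each dataset, which is monthly here
    dataframes = [pdr.DataReader(data, 'famafrench', start='1925-01-01')[0] for data in series]

    # 'Date' is the index, so sort on the index directly
    return pd.concat(dataframes, axis=1).sort_index()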

In [72]:
ff_data = get_ff_data()

In [79]:
ff_data.index

PeriodIndex(['1926-07', '1926-08', '1926-09', '1926-10', '1926-11', '1926-12',
             '1927-01', '1927-02', '1927-03', '1927-04',
             ...
             '2022-04', '2022-05', '2022-06', '2022-07', '2022-08', '2022-09',
             '2022-10', '2022-11', '2022-12', '2023-01'],
            dtype='period[M]', name='Date', length=1159)
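
The Ken French data comes back on a monthly `PeriodIndex`, whereas the FRED frames above use month-end timestamps, so the two will not align in a join until one index is converted. A minimal sketch of one way to do that, assuming month-end timestamps are the target convention; the helper name `periods_to_month_end` is hypothetical.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd


def periods_to_month_end(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with its monthly PeriodIndex converted to month-end timestamps."""
    out = df.copy()
    # to_timestamp() gives the first day of each month; MonthEnd(0) rolls it to month end.
    out.index = out.index.to_timestamp() + MonthEnd(0)
    return out
```

For example, `periods_to_month_end(ff_data)` would turn `1926-07` into `1926-07-31`, matching the index convention of `fred_data`.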

## CRSP Data

In [None]:
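# Load the CRSP extract with the month-end calendar date as the index
crsp_data = pd.read_csv("crsp_data.csv", index_col='MthCalDt', parse_dates=True)
# Reshape from long to wide: one row per date, one column per index name (IndNm)
crsp_data = crsp_data.pivot(columns="IndNm", values='COL1')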

## ISM