# Recession Predictor

## Introduction
The US economy is currently in its longest expansion on record. The news is full of talk about a possible recession in 2020, the recent Treasury yield inversion, market corrections in 2018 and 2019, and generally low investor sentiment. Recessions happen every 10 years or so. Are we overdue for one?

Explain other methodologies used to predict recession.
* Federal Reserve
* Rabobank
* Blah
* Blah

Explain what I'm trying to do: not only to look at recession as a binary, as NBER describes it, but also to predict future GDP in the [] future terms.

For the decision variable, we will use GDP.

We are not trying to assign causality, as 

The following factors are considered when selecting data indicators:
* Having sufficient data, as recessions are rare occurrences.
* Having forward predictive power, which means inflection changes in the indicator should lead, or occur simultaneously with, inflection changes in GDP.
* Representing various aspects and mechanisms of the economy, such as a risk free benchmark, or inflation.
* Avoiding composite models such as the Leading Economic Indicators (Conference Board). The goal is to construct our own model.
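
One way to check the "forward predictive power" criterion is to correlate an indicator with the target at several leads. The sketch below is only an illustration on synthetic data: the helper name `lead_lag_correlation`, the `max_lead` parameter, and the 3-period delay are all assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd


def lead_lag_correlation(indicator, target, max_lead=6):
    """Correlate the indicator, shifted forward by k periods, with the target.

    A peak at k > 0 suggests the indicator leads the target by k periods.
    """
    rows = {}
    for k in range(0, max_lead + 1):
        # shift(k) moves the indicator's value from period t to period t + k,
        # so it is compared against the target k periods later.
        rows[k] = indicator.shift(k).corr(target)
    return pd.Series(rows, name="correlation")


# Synthetic data (assumption for illustration): the target copies the
# indicator with a 3-period delay, plus a little noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=200, freq="MS")
indicator = pd.Series(rng.normal(size=200), index=idx)
target = indicator.shift(3) + rng.normal(scale=0.1, size=200)

corr = lead_lag_correlation(indicator, target)
best_lead = int(corr.idxmax())
print(corr.round(2))
print("best lead:", best_lead)
```

On real indicators the peak will be far less sharp, so this is better read as a screening aid than as a test.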

For equities, the following indicators are considered:
* [S&P 500](https://finance.yahoo.com/quote/%5EGSPC?p=^GSPC)- Weighted stock index of 500 companies listed on US exchanges with the largest market cap.
* [S&P 500 PE Ratio](https://www.quandl.com/data/MULTPL/SHILLER_PE_RATIO_MONTH-Shiller-PE-Ratio-by-Month)- Passed over in favor of the Cyclically Adjusted PE Ratio.
* [**S&P 500 Cyclically Adjusted PE Ratio**](https://www.investopedia.com/terms/c/cape-ratio.asp) [(info)](https://www.quandl.com/data/MULTPL/SHILLER_PE_RATIO_MONTH-Shiller-PE-Ratio-by-Month)- Normalizes PE ratio fluctuations by averaging inflation-adjusted earnings over 10 years. This indicator is used to gauge whether the equities market is over or under-valued.
* [S&P 500 Price to Sales Ratio](https://www.quandl.com/data/MULTPL/SP500_PSR_QUARTER-S-P-500-Price-to-Sales-Ratio-by-Quarter)- Profit can be manipulated through operations or accounting, or shifted by changes in tax laws, such as Trump's tax cut. Revenue is much harder to manipulate.
* [Nonfinancial Corporate Debt as Percentage of Equity](https://fred.stlouisfed.org/series/NCBCMDPMVCE)- Financial firms tend to be more levered than nonfinancial firms.
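
To make the cyclically adjusted PE ratio concrete, here is a minimal sketch of the calculation: price divided by a trailing 10 year average of inflation-adjusted earnings. The notebook downloads a precomputed series, so the function name `cape_ratio`, the monthly `window=120`, and the numbers below are assumptions for illustration only.

```python
import pandas as pd


def cape_ratio(real_price, real_earnings, window=120):
    """Price divided by the trailing mean of inflation-adjusted earnings.

    window=120 assumes monthly data, i.e. a 10 year average.
    """
    avg_earnings = real_earnings.rolling(window).mean()
    return real_price / avg_earnings


# Made-up numbers: constant real earnings of 50 and a real price of 1000.
idx = pd.date_range("2000-01-01", periods=121, freq="MS")
real_earnings = pd.Series(50.0, index=idx)
real_price = pd.Series(1000.0, index=idx)

cape = cape_ratio(real_price, real_earnings)
# The first window - 1 months have no full 10 year history, so they are NaN.
print(cape.dropna().iloc[-1])
```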

For bonds, only government bonds are considered, as they provide a "risk free" benchmark. The following indicators are considered:
* [10 Year Constant Maturity Minus 3 Month Treasuries Yield Spread](https://fred.stlouisfed.org/series/T10Y3M)- Federal Reserve's main methodology. Financial institutions borrow at low rates short term, to lend at high rates long term. High spread means  The data only goes back to 1982, which is not enough to train this model.
* [**10 Year Constant Maturity Minus 3 Month Treasuries Secondary Market Yield Spread**](https://fred.stlouisfed.org/series/TB3MS)- Makes up for the lack of data in the primary market. Primary and secondary market spreads are very close today, due to electronic trading. However, the spread has been higher historically. This discrepancy is mostly caused by information inefficiency. Since this model relies on multiple input indicators, this flaw can be overlooked. 
* [10 Year Treasuries Constant Maturity Rate](https://fred.stlouisfed.org/series/DGS10)- Long term treasuries usually reflect investor sentiment regarding long term economic growth, with higher yield . This does not provide strong evidence of liquidity.
* 3 Months Outstanding Repo [(info)](http://law.emory.edu/ecgar/content/volume-5/issue-2/essays/repo-recession-financial-regulation.html)- Hard to find on the web. Before the Great Recession, investment banks used short term repo to inject liquidity to stay afloat. An uptick in short term repo may indicate a credit crunch. FRED only has records of contracts with the Federal Reserve as a participant, leaving out the majority of transactions.
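
A yield spread is simply the long rate minus the short rate, and an inversion is a negative spread. The sketch below shows that arithmetic on made-up rates; the helper name `yield_spread` and the sample values are assumptions, not data from FRED.

```python
import pandas as pd


def yield_spread(long_rate, short_rate):
    """Long rate minus short rate, in percentage points, on shared dates."""
    long_aligned, short_aligned = long_rate.align(short_rate, join="inner")
    return (long_aligned - short_aligned).rename("spread")


# Made-up monthly yields in percent (assumption for illustration).
idx = pd.date_range("2019-01-01", periods=4, freq="MS")
ten_year = pd.Series([2.7, 2.6, 2.4, 2.1], index=idx)
three_month = pd.Series([2.4, 2.4, 2.4, 2.3], index=idx)

spread = yield_spread(ten_year, three_month)
# The curve is inverted when the short rate exceeds the long rate.
inverted = spread < 0
print(pd.DataFrame({"spread": spread, "inverted": inverted}))
```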

For inflation, the following indicators are considered:
* [**Consumer Price Index for Urban Consumers**](https://fred.stlouisfed.org/series/CPIAUCSL) [(info)](https://www.bls.gov/opub/btn/volume-3/why-does-bls-provide-both-the-cpi-w-and-cpi-u.htm)- Price inflation covering 88% of Americans, calculated from expenditures. Uses a survey to calculate the basket of goods and services. Basket is updated every 2 years. Selected due to difficulty quantifying intangible traits such as quality.
* [Chained Consumer Price Index for Urban Consumers](https://fred.stlouisfed.org/series/SUUR0000SA0) [(info)](https://www.brookings.edu/blog/up-front/2017/12/07/the-hutchins-center-explains-the-chained-cpi/)- Similar to CPI-U, but considers substitution purchases, and weights the changes every month.

For employment, the following indicators are considered:
* [Labor Participation Rate](https://fred.stlouisfed.org/series/CIVPART)- Percent of population over 16 actively seeking or engaged in employment. Too driven by long-run macro trends, such as women's participation in the workforce, or the retirement of Baby Boomers.
* [**Total Nonfarm Payroll**](https://fred.stlouisfed.org/series/PAYEMS)- Accounts for 80% of workers who contribute to GDP, excluding proprietors, unpaid volunteers, and farm workers.
* [Wage Growth](https://fred.stlouisfed.org/series/CES0500000003)- Lack of long term data.

Lastly, the following indicators are not placed in any categories above, but are included because of their predictive power.
* [**Private Investment as Percent of GDP**](https://fred.stlouisfed.org/series/PNFI)- Investment represents expenditure on capital goods and residential properties. Provides an indicator for future productivity and GDP growth. Also a strong sign of economic recovery.
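
Private investment only becomes a percentage once it is divided by GDP for the same period. The sketch below shows that step, assuming both series are quarterly and in the same units; the helper name `percent_of_gdp` and the numbers are made up for illustration.

```python
import pandas as pd


def percent_of_gdp(component, gdp):
    """Express a component as a percentage of GDP on shared dates."""
    component, gdp = component.align(gdp, join="inner")
    return (component / gdp * 100).rename("pct_of_gdp")


# Made-up quarterly levels in the same units (assumption for illustration).
idx = pd.date_range("2018-01-01", periods=4, freq="QS")
pnfi = pd.Series([2600.0, 2650.0, 2700.0, 2750.0], index=idx)
gdp = pd.Series([20000.0, 20500.0, 20750.0, 21000.0], index=idx)

pinvest_pct = percent_of_gdp(pnfi, gdp)
print(pinvest_pct.round(2))
```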

Factors dismissed:

The following indicators were considered, but ultimately dismissed.
* Consumer Confidence Index- Locked behind paywall. Survey of consumer purchases and sentiments.
* [VIX Volatility Index](https://fred.stlouisfed.org/series/VIXCLS)- Describes the volatility, not the direction, of stocks. Also does not describe economic strength.
* [Effective Federal Funds Rate](https://fred.stlouisfed.org/series/FEDFUNDS)- Not an organic indicator of market conditions. Different Fed chairs have different doctrines. The Fed was fighting inflation in the 70s, deregulation in the 80s, inflation targeting in the 90s & 2000s, then quantitative easing in the 2010s.
* Change in working hours- Symptom, not sign, of economic strength.
* [Personal Savings Rate](https://fred.stlouisfed.org/series/PSAVERT)- Too macro driven, such as women entering workforce, or retirement of Baby Boomers.
* Incremental Capital Outputs Ratio- Hard to find on the web. Calculates how much additional capital investment is needed to create growth.

## Data Preparation

First, import the necessary packages.

In [1]:
import fredapi
import requests
import pandas as pd
import json

Explain data source: Federal Reserve Economic Data (FRED), and Quandl

#### FRED Data

In [2]:
f = fredapi.Fred(api_key='8b91217446b6307d20cb5e4fcfba70eb')
# daily data
f_t10y3m = f.get_series('T10Y3M')
# monthly data
f_cpiu = f.get_series('CPIAUCSL') # assume end of period
# quarterly data
f_gdp = f.get_series('GDP')
f_gdp_growth = f.get_series('A191RL1Q225SBEA')
f_pinvest = f.get_series('PNFI') # assume end of period, need to divide by GDP to get percentage
f_delcon = f.get_series('DRCLACBS') # end of period

We need to convert the FRED data from Series into DataFrames.

In [3]:
f_gdp = f_gdp.to_frame().reset_index()
f_gdp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294 entries, 0 to 293
Data columns (total 2 columns):
index    294 non-null datetime64[ns]
0        290 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 4.7 KB


In [4]:
f_t10y3m = f_t10y3m.to_frame().reset_index()
f_cpiu = f_cpiu.to_frame().reset_index()
f_gdp_growth = f_gdp_growth.to_frame().reset_index()
f_pinvest = f_pinvest.to_frame().reset_index()
f_delcon = f_delcon.to_frame().reset_index()
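
The conversions above leave the columns named `index` and `0`, which gets confusing once several frames are joined. A small helper can name them during the conversion. This is a sketch: `series_to_frame` and the sample values are hypothetical, not part of `fredapi`.

```python
import pandas as pd


def series_to_frame(series, name):
    """Turn a date-indexed Series into a two-column DataFrame."""
    # rename() labels the value column, rename_axis() labels the date index,
    # and reset_index() moves the dates into an ordinary column.
    return series.rename(name).rename_axis("date").reset_index()


# Made-up sample with one missing value, similar in shape to a FRED series.
sample = pd.Series(
    [100.0, None, 102.5],
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2019-07-01"]),
)
frame = series_to_frame(sample, "gdp")
print(frame.columns.tolist())
```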

In [5]:
#data = [f_t10y3m, fred_cpiu]

#### Quandl Data

In [17]:
q_sp500 = pd.DataFrame.from_dict(requests.get('https://www.quandl.com/api/v3/datasets/MULTPL/SP500_REAL_PRICE_MONTH.json?api_key=8ufKe7Y2JMsYPU3CGN7m').json()['dataset']['data'])
q_sp500_pe = pd.DataFrame.from_dict(requests.get('https://www.quandl.com/api/v3/datasets/MULTPL/SHILLER_PE_RATIO_MONTH.json?api_key=8ufKe7Y2JMsYPU3CGN7m').json()['dataset']['data'])
q_sp500_pr = pd.DataFrame.from_dict(requests.get('https://www.quandl.com/api/v3/datasets/MULTPL/SP500_PSR_QUARTER.json?api_key=8ufKe7Y2JMsYPU3CGN7m').json()['dataset']['data'])

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1787 entries, 0 to 1786
Data columns (total 2 columns):
0    1787 non-null object
1    1787 non-null float64
dtypes: float64(1), object(1)
memory usage: 28.0+ KB
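
The three requests above repeat the same unpacking of `['dataset']['data']`. A helper could do it once and also name and parse the columns. The sketch below works on a hand-written payload rather than a live request; the function name `quandl_payload_to_frame`, the presence of a `column_names` key, a first column called `Date`, and the sample values are all assumptions.

```python
import pandas as pd


def quandl_payload_to_frame(payload):
    """Build a DataFrame from a decoded Quandl dataset response."""
    dataset = payload["dataset"]
    # Assumption: the response carries a "column_names" list next to "data".
    columns = dataset.get("column_names", ["Date", "Value"])
    frame = pd.DataFrame(dataset["data"], columns=columns)
    # Parse the date strings and put the rows in chronological order.
    frame["Date"] = pd.to_datetime(frame["Date"], format="%Y-%m-%d")
    return frame.sort_values("Date").reset_index(drop=True)


# Hand-written payload with made-up values, newest row first.
payload = {
    "dataset": {
        "column_names": ["Date", "Value"],
        "data": [["2019-09-01", 29.2], ["2019-08-01", 28.7]],
    }
}
pe = quandl_payload_to_frame(payload)
print(pe)
```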


We need to convert the date column from object type to datetime type.

In [None]:
# column 0 holds the date strings; assign by label so the dtype actually changes
q_sp500_pe[0] = pd.to_datetime(q_sp500_pe[0], format='%Y-%m-%d')
q_sp500_pe.info()

Since Quandl data is dated at the beginning of the month, we subtract 1 day to get the end of the previous month, assuming no catastrophic changes occur in the span of 1 day.

In [None]:
q_sp500_pe[0] = q_sp500_pe[0] - pd.to_timedelta(1, unit='day')
q_sp500_pe.head()

#### Steps
1. get json data and convert to dataframe
2. convert fred series into dataframe
3. convert dates into datetime type
4. convert data to same timeframe (monthly)
    * daily to monthly
    * quarterly to monthly, fill in missing data
5. find reasonable start and end time
6. remove dates with null data
7. join all data into same dataframe
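
Steps 4 to 7 could be sketched roughly as follows. This is one possible approach on synthetic data, not the notebook's final method: the helper names are hypothetical, daily data is reduced to the last observation of each month, and quarterly values are assumed to be dated at the start of the quarter and are forward filled across its months.

```python
import numpy as np
import pandas as pd


def daily_to_monthly(series):
    """Keep the last non-null observation in each calendar month."""
    clean = series.dropna()
    return clean.groupby(clean.index.to_period("M")).last()


def quarterly_to_monthly(series):
    """Repeat each quarterly value for the three months of its quarter."""
    clean = series.dropna()
    clean.index = clean.index.to_period("M")
    # Extend two months past the last quarter start to cover that quarter.
    months = pd.period_range(clean.index.min(), clean.index.max() + 2, freq="M")
    return clean.reindex(months).ffill()


def build_monthly_frame(daily, quarterly):
    """Join all series on the months they share and drop incomplete rows."""
    columns = {name: daily_to_monthly(s) for name, s in daily.items()}
    columns.update(
        {name: quarterly_to_monthly(s) for name, s in quarterly.items()}
    )
    # join="inner" keeps only the overlapping date range (step 5).
    frame = pd.concat(columns, axis=1, join="inner")
    return frame.dropna()


# Synthetic inputs (assumption for illustration).
days = pd.date_range("2019-01-01", "2019-06-30", freq="D")
spread = pd.Series(np.linspace(0.5, -0.2, len(days)), index=days)
gdp = pd.Series(
    [21000.0, 21300.0],
    index=pd.to_datetime(["2019-01-01", "2019-04-01"]),
)

monthly = build_monthly_frame({"t10y3m": spread}, {"gdp": gdp})
print(monthly)
```

Monthly series can go through `daily_to_monthly` unchanged, since each month then holds a single observation. Whether forward filling is appropriate for quarterly GDP is a modeling choice; interpolation is an alternative.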

Misc.
* table of contents & navigation

Code Dump
# to view json
print(json.dumps(q_pr_ratio, indent=2))

# convert json into DataFrame
q_pe_ratio = pd.DataFrame.from_dict(r['dataset']['data'])

In [None]:
# find oldest date
# find newest date

# fred_t10y3m.head(1)
# fred_sp500.head(1)
# fred_cpiw.head(1)
# fred_gdp.head(1)
# fred_gdp_growth.head(1)
# fred_pinvest.head(1)
# fred_delcon.head(1)

## Data Analysis

## Findings

## Limitations

This predictor does not pinpoint the cause of a recession, but makes a general comment about the state of the economy. The underlying assumption is that these inputs are efficient enough to reflect long term market conditions, but inefficient enough to price in the short term fluctuations in GDP.

## Closing Thoughts