# Module 5: Regime Prediction with Machine Learning - Part 1

In this part, we describe the problem and get familiar with the dataset we will use.

## Table of Contents:
&nbsp;&nbsp;1. [Problem Description and Related Work](#1)


&nbsp;&nbsp;2. [Understand Data](#2)   

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.0 [Set Up Environment](#2.0)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.1 [Read Data and Description of the Variables](#2.1)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.2 [Exploratory Data Analysis](#2.2)


&nbsp;&nbsp;3. [References](#3) 

## 1. Problem Description and Related Work <a id="1"></a>

A business cycle describes the rise and fall in the growth of the economy over time. Each business cycle has two turning points: a trough (or bottom) and a peak. Expansion is measured from the trough of the previous business cycle to the peak of the current cycle, while recession is measured from the peak to the trough. A representation of business cycles is shown in the __[figure](https://courses.lumenlearning.com/baycollege-introbusiness/chapter/reading-the-business-cycle-definition-and-phases/)__ below. In the United States (US), the Business Cycle Dating Committee of the National Bureau of Economic Research (NBER) determines the dates of business cycles. The turning points are determined by considering monthly growth indicators of the economy such as industrial production, employment, and real income. The main focus of business cycle analysis is to understand why the economy goes through contraction and expansion periods. It is a well-studied topic in the literature and still an active research area. In our work, we are going to predict recessions in the US economy from leading macroeconomic indicators using machine learning algorithms.

<center><img src="image2.png" align="center"  ></center>

Predicting business cycle turning points, especially economic recessions, is of great importance to investors, households, and businesses.
Starting with the pioneering work of [Mitchell and Burns (1938)](#a), the analysis of business cycle indicators has become a core research area, and much work has been done in the field since then. Much of the literature indicates that a wide range of economic and financial variables contain predictive information about future recessions. [Stock and Watson (1989)](#b) established coincident and leading economic indicators for recession forecasting. [Estrella and Mishkin (1998)](#c) documented the predictive power of the slope of the term structure of Treasury yields and of the stock market for US recessions.
[Liu and Moench (2016)](#f) also showed that the Treasury term spread has the highest predictive power, and that this holds with lagged observations as well. They further indicated that balances in broker-dealer margin accounts have a significant effect on recession predictions, especially at longer horizons.
[Chionis et al. (2009)](#d) worked on forecasting recessions in Europe with European Union (EU) data. They found that the yield curve augmented with the composite stock index has significant predictive power for EU real output.
[Ng (2014)](#e) approached the problem by exploring the effectiveness of boosting and found that interest rate measures and certain employment variables have predictive power for recessions.
More recently, [Huang et al. (2018)](#g) studied the predictive power of news sentiment for recession forecasting in the US economy.



## 2. Understand Data <a id="2"></a>

### 2.0 Set Up Environment <a id="2.0"></a>

In [1]:
# load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
# suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

### 2.1 Read Data and Description of the Variables <a id="2.1"></a>

For our analysis we will use a large macroeconomic database from FRED (Federal Reserve Bank of St. Louis) designed by [McCracken and Ng (2015)](#i). It contains 129 monthly macroeconomic time series over the period 1959-2018. The data is organized into 8 categories: (1) output and income, (2) labor market, (3) housing, (4) consumption, orders and inventories, (5) money and credit, (6) interest and exchange rates, (7) prices, and (8) stock market. A detailed description of the variables in each category can be found in the __[appendix](https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/Appendix_Tables_Update.pdf)__.

In [2]:
bigmacro = pd.read_csv("Macroeconomic_Variables.csv")
bigmacro = bigmacro.rename(columns={'sasdate': 'Date'})
bigmacro.head()

Unnamed: 0,Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,...,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST,VXOCLSx
0,1/1/59,2437.296,2288.8,17.302,292258.8329,18235.77392,22.6248,23.4555,22.1893,32.4027,...,11.358,2.13,2.45,2.04,,274.9,6476.0,12298.0,84.2043,
1,2/1/59,2446.902,2297.0,17.482,294429.5453,18369.56308,23.0679,23.772,22.3816,32.6404,...,11.375,2.14,2.46,2.05,,276.0,6476.0,12298.0,83.528,
2,3/1/59,2462.689,2314.0,17.647,293425.3813,18523.05762,23.4002,23.9159,22.4914,32.6404,...,11.395,2.15,2.45,2.07,,277.4,6508.0,12349.0,81.6405,
3,4/1/59,2478.744,2330.3,17.584,299331.6505,18534.466,23.8987,24.2613,22.821,33.1553,...,11.436,2.16,2.47,2.08,,278.1,6620.0,12484.0,81.8099,
4,5/1/59,2493.228,2345.8,17.796,301372.9597,18679.66354,24.2587,24.4628,23.0407,33.3137,...,11.454,2.17,2.48,2.08,95.3,280.1,6753.0,12646.0,80.7315,
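
The `Date` column above stores two-digit years (for example `1/1/59`). When such strings are parsed with the `%y` directive, years 00-68 are read as 2000-2068, so the first ten years of this dataset would land in the future. The following is a minimal sketch of one way to handle this, assuming no observation is later than 2018; the helper name `parse_two_digit_year_dates` and the sample values are illustrative and not part of the original notebook.

```python
import pandas as pd


def parse_two_digit_year_dates(dates, last_year=2018):
    """Parse 'm/d/yy' strings, moving dates after `last_year` back 100 years."""
    parsed = pd.to_datetime(dates, format="%m/%d/%y")
    # %y maps 00-68 to 2000-2068, so 1959-1968 would be read as 2059-2068.
    in_future = parsed.dt.year > last_year
    return parsed.where(~in_future, parsed - pd.DateOffset(years=100))


# Hypothetical sample in the same format as the Date column.
sample = pd.Series(["1/1/59", "12/1/68", "1/1/69", "6/1/18"])
parsed = parse_two_digit_year_dates(sample)
print(parsed.dt.year.tolist())
```

Applied to the dataset, this would look like `bigmacro["Date"] = parse_two_digit_year_dates(bigmacro["Date"])`. The `last_year` cutoff is a simple heuristic and would need adjusting for data that extends past 2018.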


Consistent with previous work in the literature, we use the __[business cycle dating chronology provided by NBER](http://www.nber.org/cycles.html)__, which lists the dates when recessions began and ended in the US economy. According to NBER's chronology, our dataset contains 8 recession periods, with durations ranging from 6 to 18 months. We represent the regimes as "Normal" and "Recession" in our dataset.

In [3]:
Recession_periods = pd.read_csv('Recession_Periods.csv')
bigmacro.insert(loc=1, column="Regime", value=Recession_periods['Regime'].values)
bigmacro.head()

Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST,VXOCLSx
0,1/1/59,Normal,2437.296,2288.8,17.302,292258.8329,18235.77392,22.6248,23.4555,22.1893,...,11.358,2.13,2.45,2.04,,274.9,6476.0,12298.0,84.2043,
1,2/1/59,Normal,2446.902,2297.0,17.482,294429.5453,18369.56308,23.0679,23.772,22.3816,...,11.375,2.14,2.46,2.05,,276.0,6476.0,12298.0,83.528,
2,3/1/59,Normal,2462.689,2314.0,17.647,293425.3813,18523.05762,23.4002,23.9159,22.4914,...,11.395,2.15,2.45,2.07,,277.4,6508.0,12349.0,81.6405,
3,4/1/59,Normal,2478.744,2330.3,17.584,299331.6505,18534.466,23.8987,24.2613,22.821,...,11.436,2.16,2.47,2.08,,278.1,6620.0,12484.0,81.8099,
4,5/1/59,Normal,2493.228,2345.8,17.796,301372.9597,18679.66354,24.2587,24.4628,23.0407,...,11.454,2.17,2.48,2.08,95.3,280.1,6753.0,12646.0,80.7315,


In [4]:
bigmacro[["Date", "Regime"]].groupby("Regime").count()

Regime,Date
Normal,628
Recession,93
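
The `Regime` column is read from `Recession_Periods.csv`, whose construction is not shown here. As a rough sketch of how such labels could be derived from the NBER chronology, the code below hard-codes the eight NBER peak and trough months since 1959 and marks every month after a peak, up to and including the trough, as "Recession". The function name `label_regimes`, the month range (January 1959 to January 2019, inferred from the 721 rows counted above), and the peak-exclusive convention are assumptions and not necessarily what the authors did.

```python
import pandas as pd

# NBER business cycle peak and trough months since 1959.
NBER_PEAK_TROUGH = [
    ("1960-04", "1961-02"),
    ("1969-12", "1970-11"),
    ("1973-11", "1975-03"),
    ("1980-01", "1980-07"),
    ("1981-07", "1982-11"),
    ("1990-07", "1991-03"),
    ("2001-03", "2001-11"),
    ("2007-12", "2009-06"),
]


def label_regimes(months, peak_trough=NBER_PEAK_TROUGH):
    """Label each month 'Recession' if it falls after a peak, through the trough."""
    regime = pd.Series("Normal", index=months, name="Regime")
    for peak, trough in peak_trough:
        # Assumed convention: the peak month itself still counts as "Normal".
        after_peak = months > pd.Period(peak, freq="M")
        through_trough = months <= pd.Period(trough, freq="M")
        regime.loc[after_peak & through_trough] = "Recession"
    return regime


# Assumed month range: 721 monthly rows starting in January 1959.
months = pd.period_range("1959-01", "2019-01", freq="M")
regime = label_regimes(months)
print(regime.value_counts())
```

Under these assumptions the labels come out at 628 normal and 93 recession months, which agrees with the counts above. Recessions make up roughly 13% of the sample, so the two classes are clearly imbalanced.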


### 2.2 Exploratory Data Analysis <a id="2.2"></a>

Let's look at the performance of the stock market and the economy over normal and recession periods. The plots below show the S&P 500 and GDP over the period 1959-2018. Gray areas indicate the recession periods in our dataset.

<img src="plots.png" />
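
The code that produced these plots is not included in this part. A minimal sketch of the shading technique is given below: it collapses a monthly "Normal"/"Recession" series into contiguous spans and draws each span with `Axes.axvspan`. The helper `recession_spans` and the plotted series are hypothetical; the values are synthetic placeholders, not S&P 500 or GDP data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def recession_spans(regime):
    """Return (start, end) pairs for each run of 'Recession' labels.

    `end` is the first non-recession date after the run, so the shaded
    area covers the whole final recession month.
    """
    spans, start = [], None
    for date, label in regime.items():
        if label == "Recession" and start is None:
            start = date
        elif label != "Recession" and start is not None:
            spans.append((start, date))
            start = None
    if start is not None:  # series ends while still in a recession
        spans.append((start, regime.index[-1]))
    return spans


# Synthetic monthly data for illustration only.
dates = pd.date_range("2007-01-01", periods=36, freq="MS")
values = 100 + np.cumsum(np.sin(np.arange(36) / 5.0))
in_recession = (dates >= "2008-01-01") & (dates <= "2009-06-01")
regime = pd.Series(np.where(in_recession, "Recession", "Normal"), index=dates)

spans = recession_spans(regime)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(dates, values, label="synthetic series")
for start, end in spans:
    ax.axvspan(start, end, color="gray", alpha=0.3)  # gray recession band
ax.set_xlabel("Date")
ax.legend()
```

With the real data, `regime` would be the `Regime` column indexed by parsed dates, and `values` would be the series of interest.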

## 3. References <a id="3"></a>

1. **W. Mitchell and A. Burns.** "Statistical Indicators of Cyclical Revivals", _National Bureau of Economic Research,_ 1938. <a class="anchor" id="a"></a>

2. **J. H. Stock and M. W. Watson.** "New Indexes of Coincident and Leading Economic Indicators", _NBER Macroeconomics Annual,_ 1989. <a class="anchor" id="b"></a>

3. **A. Estrella and F. S. Mishkin.** "Predicting U.S. Recessions: Financial Variables as Leading Indicators", _Review of Economics and Statistics,_ 1998. <a class="anchor" id="c"></a>

4. **D. Chionis, P. Gogas and I. Pragidis.** "Predicting European Union Recessions in the Euro Era: the Yield Curve as a Forecasting Tool of Economic Activity", _International Advances in Economic Research,_ 2009. <a class="anchor" id="d"></a>

5. **S. Ng.** "Viewpoint: Boosting Recessions", _Canadian Journal of Economics,_ 2014. <a class="anchor" id="e"></a>

6. **W. Liu and E. Moench.** "What Predicts US Recessions?", _International Journal of Forecasting,_ 2016. <a class="anchor" id="f"></a>

7. **M. Y. Huang, R. R. Rojas and P. D. Convery.** "News Sentiment as Leading Indicators for Recessions", _arXiv,_ 2018. <a id="g"></a>

8. **M. McCracken and S. Ng.** "__[FRED-MD: A Monthly Database for Macroeconomic Research](https://research.stlouisfed.org/econ/mccracken/fred-databases/)__", _Working Paper,_ 2015. <a class="anchor" id="i"></a>


