# Regime Prediction with Machine Learning

*(Phillip Rowe comment):  The Federal Reserve and other Central Banks around the world have taken unprecedented action in providing supportive stimulus since the outbreak of the Covid pandemic. Also, the corrections and bounces in the market appear to occur much more rapidly.  Moreover, the S&P500 and Nasdaq are at near all-time high valuations.  I think it is a fair question, in terms of positioning of investments, "Do recessions matter any more?"*

Based on the database below, which is updated regularly, we are going to build a recession predictor, testing various machine learning models in the process.  The database of monthly macro indicators was downloaded on May 6, 2021, and the most recent entries were for the month of March 2021.

Associated Files for this project
- **Data_Preprocessing_2021.ipynb** - download monthly economic indicators from Federal Reserve and clean the data
- **Forecasting_Regimes_with_updated_data.ipynb**
- **Random_Forest_Model_Tuning.ipynb**
- **Scenario_Simulations.ipynb**

References
- **Part_1_Problem_Description_and_Data_Analysis.ipynb** - more detailed background of the data with academic references 

Source Database:

- M. McCracken and S. Ng "FRED-MD: A Monthly Database for Macroeconomic Research", Working Paper, 2015. https://research.stlouisfed.org/econ/mccracken/fred-databases/
- https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md/monthly/current.csv

## Table of Contents:
&nbsp;&nbsp;1. [Set Up Environment and Read Data](#1)

&nbsp;&nbsp;2. [Data Cleaning](#2)


## 1. Set Up Environment and Read Data <a id="1"></a>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import requests
import csv
import os

import time
import datetime

# Anaconda has all these packages
from statsmodels.tsa.stattools import adfuller # to check unit root in time series  

# of the following, only StandardScaler is used later in this notebook (Section 2, Standardize)
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

import seaborn as sns #for correlation heatmap

import warnings
warnings.filterwarnings('ignore')

In [2]:
url = 'https://files.stlouisfed.org/files/htdocs/fred-md/monthly/current.csv'
bigmacro = pd.read_csv(url)
bigmacro = bigmacro.rename(columns={'sasdate': 'Date'})
# the first row of the FRED-MD file holds transformation codes, not observations
bigmacro = bigmacro.iloc[1:]
# drop the last row of the download
bigmacro = bigmacro.iloc[:-1]

# save a dated snapshot of the raw download
ts = time.localtime()
day = time.strftime('%Y-%m-%d', ts)
bigmacro.to_csv('current_' + day + '.csv')
# bigmacro = bigmacro.reset_index()
bigmacro.tail()

Unnamed: 0,Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,...,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST,VXOCLSx
743,11/1/2020,17355.669,14021.3,118.208,1569672.0,542583.0,104.8319,101.1674,100.1821,104.844,...,120.874,25.69,29.55,23.1,76.9,21565.0,350766.1,733096.73,4609.7215,24.8047
744,12/1/2020,17386.005,14000.7,117.115,1566283.0,535972.0,105.8997,102.4056,101.3916,106.6467,...,121.328,25.77,29.64,23.12,80.7,21741.0,350336.43,733463.42,4671.9751,21.6803
745,1/1/2021,19120.289,13984.7,120.698,1616846.0,576466.0,106.8853,103.0375,102.1033,106.6088,...,121.469,25.85,29.69,23.2,79.0,22000.2,354221.61,739961.66,4754.4812,23.7684
746,2/1/2021,17741.109,14024.6,119.198,1564397.0,559893.0,104.0838,101.3872,100.7193,105.6558,...,121.706,25.81,29.69,23.25,76.8,,354467.93,740057.0,4815.6771,21.357
747,3/1/2021,21368.815,14153.1,123.529,,614449.0,105.583,102.2385,101.2736,105.1268,...,122.23,25.94,29.77,23.28,84.9,,,,4887.313,20.7201
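
The cell above discards the first row of the file, which in FRED-MD carries a transformation code for each series. A minimal sketch of keeping those codes in a separate Series is shown below. The in-memory CSV is a made-up stand-in for the real download, and the names `raw`, `tcodes` and `data` are illustrative only.

```python
import io
import pandas as pd

# Tiny stand-in for the FRED-MD layout: header, a "Transform:" row of
# transformation codes, then monthly observations (values are illustrative).
raw = io.StringIO(
    "sasdate,RPI,INDPRO\n"
    "Transform:,5,5\n"
    "1/1/1959,2437.296,22.625\n"
    "2/1/1959,2446.902,23.0681\n"
)

df = pd.read_csv(raw).rename(columns={"sasdate": "Date"})

# Keep the transformation codes instead of throwing them away.
tcodes = df.iloc[0].drop("Date").astype(float).astype(int)

# The remaining rows are the data; parse the dates so they sort correctly.
data = df.iloc[1:].reset_index(drop=True)
data["Date"] = pd.to_datetime(data["Date"], format="%m/%d/%Y")

print(tcodes.to_dict())  # {'RPI': 5, 'INDPRO': 5}
print(data.dtypes)
```

Parsing `Date` is optional here, since the rest of the notebook treats it as a plain string column.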


In [3]:
Recession_periods = pd.read_excel('Recession_Periods.xlsx')
# one regime label ('Normal' or 'Recession') per month, in the same row order as bigmacro
regime = Recession_periods['Regime'].values

In [4]:
# labels are matched by position, so len(regime) must equal len(bigmacro)
bigmacro.insert(loc=1, column="Regime", value=regime)

In [5]:
bigmacro.tail()
# 130 columns as of 3/1/2021 data

Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,UMCSENTx,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST,VXOCLSx
743,11/1/2020,Recession,17355.669,14021.3,118.208,1569672.0,542583.0,104.8319,101.1674,100.1821,...,120.874,25.69,29.55,23.1,76.9,21565.0,350766.1,733096.73,4609.7215,24.8047
744,12/1/2020,Recession,17386.005,14000.7,117.115,1566283.0,535972.0,105.8997,102.4056,101.3916,...,121.328,25.77,29.64,23.12,80.7,21741.0,350336.43,733463.42,4671.9751,21.6803
745,1/1/2021,Recession,19120.289,13984.7,120.698,1616846.0,576466.0,106.8853,103.0375,102.1033,...,121.469,25.85,29.69,23.2,79.0,22000.2,354221.61,739961.66,4754.4812,23.7684
746,2/1/2021,Recession,17741.109,14024.6,119.198,1564397.0,559893.0,104.0838,101.3872,100.7193,...,121.706,25.81,29.69,23.25,76.8,,354467.93,740057.0,4815.6771,21.357
747,3/1/2021,Recession,21368.815,14153.1,123.529,,614449.0,105.583,102.2385,101.2736,...,122.23,25.94,29.77,23.28,84.9,,,,4887.313,20.7201
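
Inserting the labels by position works only while the spreadsheet and the download have exactly the same rows in the same order. A hedged alternative is to join on the date, sketched below on toy data. The `Date` column in the regime table is an assumption about `Recession_Periods.xlsx`, and the frames `macro` and `regimes` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for the FRED-MD frame and the regime spreadsheet.
macro = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "INDPRO": [101.0, 101.2, 97.5],
})
regimes = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Regime": ["Normal", "Normal", "Recession"],
})

# validate="one_to_one" raises if either side contains duplicate dates.
labeled = macro.merge(regimes, on="Date", how="left", validate="one_to_one")

# Fail loudly if any month has no label, rather than silently misaligning.
missing = labeled["Regime"].isna().sum()
assert missing == 0, f"{missing} months have no regime label"

# Put Regime right after Date, matching the notebook's column order.
labeled = labeled[["Date", "Regime"] + [c for c in macro.columns if c != "Date"]]
print(labeled)
```

With a date join, a newly published month that is missing from the spreadsheet triggers the assertion instead of a length-mismatch error or a shifted label.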


## 2. Data Cleaning <a id="2"></a>

We will follow the steps below to clean the data and prepare it for feature selection.

1. Remove the variables with missing observations
2. Add lags of the variables as additional features
3. Test stationarity of time series
4. Standardize the dataset

In [6]:
# remove columns with more than 10 missing observations
missing_colnames = []
print(bigmacro.shape)  # 747 rows x 130 columns before removing sparse columns

for col_name in bigmacro.drop(['Date', 'Regime'], axis=1):
    n_missing = len(bigmacro) - bigmacro[col_name].count()
    if n_missing > 10:
        print(col_name + ':' + str(n_missing))
        missing_colnames.append(col_name)

bigmacro = bigmacro.drop(labels=missing_colnames, axis=1)
# dropna removes every remaining row that has a missing value (747 -> 744 rows in this run).
# The data then run from 1/1/1959 to 1/1/2021; the row index goes from 1 to 745, so besides the
# two trailing months one interior month is dropped as well
bigmacro = bigmacro.dropna(axis=0)

print(bigmacro.shape)  # 120 columns after dropping sparse columns; dropna also trims a few rows
bigmacro.head()

(747, 130)
PERMIT:12
PERMITNE:12
PERMITMW:12
PERMITS:12
PERMITW:12
ACOGNO:398
ANDENOx:109
TWEXAFEGSMTHx:168
UMCSENTx:154
VXOCLSx:42
(744, 120)


Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DDURRG3M086SBEA,DNDGRG3M086SBEA,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST
1,1/1/1959,Normal,2437.296,2288.8,17.302,292258.8329,18235.77392,22.625,23.4581,22.1904,...,56.918,17.791,11.358,2.13,2.45,2.04,274.9,6476.0,12298.0,84.2043
2,2/1/1959,Normal,2446.902,2297.0,17.482,294429.5453,18369.56308,23.0681,23.7747,22.3827,...,56.951,17.798,11.375,2.14,2.46,2.05,276.0,6476.0,12298.0,83.528
3,3/1/1959,Normal,2462.689,2314.0,17.647,293425.3813,18523.05762,23.4004,23.9186,22.4925,...,57.022,17.785,11.395,2.15,2.45,2.07,277.4,6508.0,12349.0,81.6405
4,4/1/1959,Normal,2478.744,2330.3,17.584,299331.6505,18534.466,23.8989,24.2641,22.8221,...,57.08,17.796,11.436,2.16,2.47,2.08,278.1,6620.0,12484.0,81.8099
5,5/1/1959,Normal,2493.228,2345.8,17.796,301372.9597,18679.66354,24.2589,24.4655,23.0418,...,57.175,17.777,11.454,2.17,2.48,2.08,280.1,6753.0,12646.0,80.7315
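
The per-column loop above can also be written with `isna().sum()`. The sketch below applies the same rule to a toy frame. The threshold of 2 is chosen only to suit the four-row example (the notebook uses 10), and `toy`, `max_missing` and `too_sparse` are illustrative names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Date": ["1/1/1959", "2/1/1959", "3/1/1959", "4/1/1959"],
    "Regime": ["Normal"] * 4,
    "A": [1.0, 2.0, 3.0, 4.0],           # complete
    "B": [np.nan, np.nan, np.nan, 4.0],  # 3 missing values -> column dropped
    "C": [1.0, 2.0, 3.0, np.nan],        # 1 missing value -> column kept
})

max_missing = 2  # toy threshold; the notebook uses 10 on the full dataset

# Count missing values per data column in one vectorized call.
n_missing = toy.drop(columns=["Date", "Regime"]).isna().sum()
too_sparse = n_missing[n_missing > max_missing].index.tolist()

# Drop sparse columns first, then any row that still has a gap.
cleaned = toy.drop(columns=too_sparse).dropna(axis=0)

print(too_sparse)     # ['B']
print(cleaned.shape)  # (3, 4)
```

Dropping sparse columns before calling `dropna` matters: in the other order, a single sparse column would remove most of the rows.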


In [8]:
print(bigmacro.shape)
bigmacro.tail()

(744, 120)


Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DDURRG3M086SBEA,DNDGRG3M086SBEA,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,MZMSL,DTCOLNVHFNM,DTCTHFNM,INVEST
741,9/1/2020,Recession,17698.892,14029.2,118.656,1564146.0,549528.0,102.8028,99.4293,98.785,...,86.611,98.659,120.743,25.49,29.09,23.01,21249.9,347627.43,730734.42,4425.5453
742,10/1/2020,Recession,17573.127,14121.8,118.978,1572500.0,550038.0,103.8958,100.598,99.5556,...,86.51,98.558,120.871,25.58,29.4,22.99,21369.3,348262.68,730398.69,4505.3741
743,11/1/2020,Recession,17355.669,14021.3,118.208,1569672.0,542583.0,104.8319,101.1674,100.1821,...,86.292,98.703,120.874,25.69,29.55,23.1,21565.0,350766.1,733096.73,4609.7215
744,12/1/2020,Recession,17386.005,14000.7,117.115,1566283.0,535972.0,105.8997,102.4056,101.3916,...,86.443,99.143,121.328,25.77,29.64,23.12,21741.0,350336.43,733463.42,4671.9751
745,1/1/2021,Recession,19120.289,13984.7,120.698,1616846.0,576466.0,106.8853,103.0375,102.1033,...,86.474,100.118,121.469,25.85,29.69,23.2,22000.2,354221.61,739961.66,4754.4812


In [9]:
# bigmacro Date ended at 1/1/2021 after cleaning out rows with na's.

# Add lags of 3, 6, 9, 12 and 18 months for every data column
for col in bigmacro.drop(['Date', 'Regime'], axis=1):
    for n in [3, 6, 9, 12, 18]:
        # shift(n) leaves n leading NaNs; ffill() cannot fill them, so dropna() below removes those rows
        bigmacro['{}_{}M_lag'.format(col, n)] = bigmacro[col].shift(n).ffill().values

# 1 month ahead prediction: label each row with the next month's regime
bigmacro["Regime"] = bigmacro["Regime"].shift(-1)

bigmacro = bigmacro.dropna(axis=0)
bigmacro.tail(1)
# now only goes to 12/2020, because the shifted Regime is missing for the last month
# 710 columns: 118 data columns x 5 lags = 590 new columns, plus the original 120

Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DTCTHFNM_3M_lag,DTCTHFNM_6M_lag,DTCTHFNM_9M_lag,DTCTHFNM_12M_lag,DTCTHFNM_18M_lag,INVEST_3M_lag,INVEST_6M_lag,INVEST_9M_lag,INVEST_12M_lag,INVEST_18M_lag
744,12/1/2020,Recession,17386.005,14000.7,117.115,1566283.0,535972.0,105.8997,102.4056,101.3916,...,730734.42,720036.02,721487.58,727441.38,725600.7,4425.5453,4175.7501,3864.5748,3821.8232,3631.3909
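
The row and column arithmetic of the lagging step is easier to check on a small example. The sketch below uses a single toy column `x` and lags of 3 and 6 months (both are illustrative choices, not the notebook's settings), and builds the lagged columns in one `pd.concat` instead of inserting them one at a time.

```python
import numpy as np
import pandas as pd

n_rows, lags = 30, [3, 6]
toy = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=n_rows, freq="MS"),
    "Regime": ["Normal"] * n_rows,
    "x": np.arange(n_rows, dtype=float),
})

# Build all lagged columns first, then attach them in a single concat.
lagged = {f"x_{n}M_lag": toy["x"].shift(n) for n in lags}
toy = pd.concat([toy, pd.DataFrame(lagged)], axis=1)

# One-month-ahead target: row t is labeled with the regime of month t+1.
toy["Regime"] = toy["Regime"].shift(-1)

# The largest lag removes 6 rows at the start, the target shift 1 at the end.
toy = toy.dropna(axis=0)
print(toy.shape)  # (23, 5): 30 - 6 - 1 rows, 3 + 2 columns
```

The same arithmetic gives the notebook's figure: 744 rows minus 18 (largest lag) minus 1 (target shift) leaves 725.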


In [10]:
# 120 columns before: 118 data columns plus Date and Regime
# 118 data columns x 5 different lags = 590 additional columns
# TOTAL # COLUMNS = 590 + 120 = 710
print(bigmacro.shape)
# 725 rows is 19 fewer than before: the largest lag (18 months) clips 18 rows off the start of the
# time series, and the one-month Regime shift clips one row off the end
bigmacro['Date'].iloc[[0,-1]]


(725, 710)


19      7/1/1960
744    12/1/2020
Name: Date, dtype: object

The Augmented Dickey-Fuller (ADF) test can be used to test for stationarity in macroeconomic time series. Its null hypothesis is that the series has a unit root, so a p-value above the chosen significance level means non-stationarity cannot be rejected. We will use the `adfuller` function from the `statsmodels` package. More information about the function can be found __[here](https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html)__.

In [11]:
# check stationarity: print and first-difference every column that fails the ADF test
from statsmodels.tsa.stattools import adfuller  # to check unit root in time series
threshold = 0.01  # significance level
for column in bigmacro.drop(['Date', 'Regime'], axis=1):
    result = adfuller(bigmacro[column])
    if result[1] > threshold:  # result[1] is the p-value
        print(column)
        bigmacro[column] = bigmacro[column].diff()  # replace values with the change from the prior row
bigmacro = bigmacro.dropna(axis=0)  # diff() leaves a NaN in the first row

RPI
W875RX1
DPCERA3M086SBEA
CMRMTSPLx
RETAILx
INDPRO
IPFPNSS
IPFINAL
IPCONGD
IPDCONGD
IPNCONGD
IPBUSEQ
IPMAT
IPDMAT
IPNMAT
IPMANSICS
IPB51222S
IPFUELS
CUMFNS
HWI
CLF16OV
CE16OV
UEMPMEAN
UEMPLT5
UEMP5TO14
UEMP15OV
UEMP15T26
UEMP27OV
PAYEMS
USGOOD
CES1021000001
USCONS
MANEMP
DMANEMP
NDMANEMP
SRVPRD
USTPU
USWTRADE
USTRADE
USFIRE
CES0600000007
AWOTMAN
AWHMAN
HOUSTNE
HOUSTMW
AMDMNOx
AMDMUOx
BUSINVx
ISRATIOx
M1SL
M2SL
M2REAL
BOGMBASE
TOTRESNS
NONBORRES
BUSLOANS
REALLN
NONREVSL
CONSPI
S&P 500
S&P: indust
S&P div yield
FEDFUNDS
CP3Mx
TB3MS
TB6MS
GS1
GS5
GS10
AAA
BAA
EXSZUSx
EXJPUSx
EXUSUKx
EXCAUSx
WPSFD49207
WPSFD49502
WPSID61
WPSID62
OILPRICEx
PPICMM
CPIAUCSL
CPIAPPSL
CPITRNSL
CPIMEDSL
CUSR0000SAC
CUSR0000SAD
CUSR0000SAS
CPIULFSL
CUSR0000SA0L2
CUSR0000SA0L5
PCEPI
DDURRG3M086SBEA
DNDGRG3M086SBEA
DSERRG3M086SBEA
CES0600000008
CES2000000008
CES3000000008
MZMSL
DTCOLNVHFNM
DTCTHFNM
INVEST
RPI_3M_lag
RPI_6M_lag
RPI_9M_lag
RPI_12M_lag
RPI_18M_lag
W875RX1_3M_lag
W875RX1_6M_lag
W875RX1_9M_lag
W875RX1

PCEPI_9M_lag
PCEPI_12M_lag
PCEPI_18M_lag
DDURRG3M086SBEA_3M_lag
DDURRG3M086SBEA_6M_lag
DDURRG3M086SBEA_9M_lag
DDURRG3M086SBEA_12M_lag
DDURRG3M086SBEA_18M_lag
DNDGRG3M086SBEA_3M_lag
DNDGRG3M086SBEA_6M_lag
DNDGRG3M086SBEA_9M_lag
DNDGRG3M086SBEA_12M_lag
DNDGRG3M086SBEA_18M_lag
DSERRG3M086SBEA_3M_lag
DSERRG3M086SBEA_6M_lag
DSERRG3M086SBEA_9M_lag
DSERRG3M086SBEA_12M_lag
DSERRG3M086SBEA_18M_lag
CES0600000008_3M_lag
CES0600000008_6M_lag
CES0600000008_9M_lag
CES0600000008_12M_lag
CES0600000008_18M_lag
CES2000000008_3M_lag
CES2000000008_6M_lag
CES2000000008_9M_lag
CES2000000008_12M_lag
CES2000000008_18M_lag
CES3000000008_3M_lag
CES3000000008_6M_lag
CES3000000008_9M_lag
CES3000000008_12M_lag
CES3000000008_18M_lag
MZMSL_3M_lag
MZMSL_6M_lag
MZMSL_9M_lag
MZMSL_12M_lag
MZMSL_18M_lag
DTCOLNVHFNM_3M_lag
DTCOLNVHFNM_6M_lag
DTCOLNVHFNM_9M_lag
DTCOLNVHFNM_12M_lag
DTCOLNVHFNM_18M_lag
DTCTHFNM_3M_lag
DTCTHFNM_6M_lag
DTCTHFNM_9M_lag
DTCTHFNM_12M_lag
DTCTHFNM_18M_lag
INVEST_3M_lag
INVEST_6M_lag
INVEST_9M_lag

In [50]:
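# second pass: columns that still fail the ADF test are differenced again
# (the execution count and the space-separated lag names in the output below suggest this cell
# was last run in an earlier session)
threshold = 0.01  # significance level
for column in bigmacro.drop(['Date', 'Regime'], axis=1):
    result = adfuller(bigmacro[column])
    if result[1] > threshold:
        print(column)
        bigmacro[column] = bigmacro[column].diff()
bigmacro = bigmacro.dropna(axis=0)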

M1SL
M2SL
REALLN
NONREVSL
CPIAPPSL
CPIMEDSL
CUSR0000SAD
CUSR0000SAS
DDURRG3M086SBEA
DSERRG3M086SBEA
CES0600000008
CES2000000008
CES3000000008
MZMSL
INVEST
IPNCONGD 6M lag
CUMFNS 3M lag
M1SL 3M lag
M1SL 6M lag
M1SL 9M lag
M1SL 12M lag
M1SL 18M lag
M2SL 3M lag
M2SL 6M lag
M2SL 9M lag
M2SL 12M lag
M2SL 18M lag
REALLN 3M lag
REALLN 6M lag
REALLN 9M lag
REALLN 12M lag
REALLN 18M lag
NONREVSL 3M lag
NONREVSL 6M lag
NONREVSL 9M lag
NONREVSL 12M lag
NONREVSL 18M lag
CPIAPPSL 3M lag
CPIAPPSL 6M lag
CPIAPPSL 9M lag
CPIAPPSL 12M lag
CPIAPPSL 18M lag
CPIMEDSL 3M lag
CPIMEDSL 6M lag
CPIMEDSL 9M lag
CPIMEDSL 12M lag
CPIMEDSL 18M lag
CUSR0000SAD 3M lag
CUSR0000SAD 6M lag
CUSR0000SAD 9M lag
CUSR0000SAD 12M lag
CUSR0000SAD 18M lag
CUSR0000SAS 3M lag
CUSR0000SAS 6M lag
CUSR0000SAS 9M lag
CUSR0000SAS 12M lag
CUSR0000SAS 18M lag
PCEPI 9M lag
PCEPI 12M lag
PCEPI 18M lag
DDURRG3M086SBEA 3M lag
DDURRG3M086SBEA 6M lag
DDURRG3M086SBEA 9M lag
DDURRG3M086SBEA 12M lag
DDURRG3M086SBEA 18M lag
DSERRG3M086SBEA 3M la

In [12]:
threshold = 0.01  # significance level
for column in bigmacro.drop(['Date', 'Regime'], axis=1):
    result = adfuller(bigmacro[column])
    if result[1] > threshold:
        print(column)
bigmacro = bigmacro.dropna(axis=0)
# This pass only reports: it prints the columns that still fail the ADF test at the 1% level
# but does not difference them again, so the columns listed below remain non-stationary by this test

M1SL
M2SL
M2REAL
REALLN
NONREVSL
S&P: indust
CPIAPPSL
CPIMEDSL
CUSR0000SAD
DDURRG3M086SBEA
DSERRG3M086SBEA
CES0600000008
CES2000000008
CES3000000008
MZMSL
INVEST
DPCERA3M086SBEA_6M_lag
PAYEMS_6M_lag
SRVPRD_3M_lag
SRVPRD_6M_lag
USTPU_6M_lag
USWTRADE_6M_lag
USTRADE_6M_lag
USFIRE_6M_lag
AWHMAN_6M_lag
M1SL_3M_lag
M1SL_6M_lag
M1SL_9M_lag
M1SL_12M_lag
M1SL_18M_lag
M2SL_3M_lag
M2SL_6M_lag
M2SL_9M_lag
M2SL_12M_lag
M2SL_18M_lag
M2REAL_3M_lag
M2REAL_6M_lag
BUSLOANS_3M_lag
BUSLOANS_6M_lag
REALLN_3M_lag
REALLN_6M_lag
REALLN_9M_lag
REALLN_12M_lag
REALLN_18M_lag
NONREVSL_3M_lag
NONREVSL_6M_lag
NONREVSL_9M_lag
NONREVSL_12M_lag
NONREVSL_18M_lag
S&P: indust_3M_lag
CPIAPPSL_3M_lag
CPIAPPSL_6M_lag
CPIAPPSL_9M_lag
CPIAPPSL_12M_lag
CPIAPPSL_18M_lag
CPIMEDSL_3M_lag
CPIMEDSL_6M_lag
CPIMEDSL_9M_lag
CPIMEDSL_12M_lag
CPIMEDSL_18M_lag
CUSR0000SAD_3M_lag
CUSR0000SAD_6M_lag
CUSR0000SAD_9M_lag
CUSR0000SAD_12M_lag
CUSR0000SAD_18M_lag
CUSR0000SAS_3M_lag
CUSR0000SAS_6M_lag
CUSR0000SAS_9M_lag
CUSR0000SAS_12M_lag
CUSR00

In [52]:
print(bigmacro.shape)
# Still 710 columns; each .diff() pass followed by dropna() removes one row from the start of the series
# (the output below ends at 8/1/2019, so it predates the May 2021 download; In [13] reports 724 rows)
bigmacro['Date'].iloc[[0, -1]]


(708, 710)


21     9/1/1960
728    8/1/2019
Name: Date, dtype: object

In [13]:
# Standardize
from sklearn.preprocessing import StandardScaler
features = bigmacro.drop(['Date', 'Regime'], axis=1)
col_names = features.columns

scaler = StandardScaler()
scaler.fit(features)
standardized_features = scaler.transform(features)
# features have been centered and scaled

print(standardized_features.shape)
df = pd.DataFrame(data=standardized_features, columns=col_names)
df.insert(loc=0, column="Date", value=bigmacro['Date'].values)
df.insert(loc=1, column='Regime', value=bigmacro['Regime'].values)
df.shape

(724, 708)


(724, 710)
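
For reference, standardizing a column means subtracting its mean and dividing by its standard deviation. `StandardScaler` uses the population standard deviation (`ddof=0`), so the same transform can be sketched directly in pandas. The two toy columns `a` and `b` below are simulated and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
features = pd.DataFrame(
    rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(200, 2)),
    columns=["a", "b"],
)

# z-score each column; ddof=0 matches StandardScaler's population std.
standardized = (features - features.mean()) / features.std(ddof=0)

# Each column now has mean ~0 and (population) standard deviation ~1.
print(standardized.mean().round(6).tolist())
print(standardized.std(ddof=0).round(6).tolist())
```

The pandas version keeps the column names and index, so the DataFrame does not have to be rebuilt afterwards. The scaler object is still useful when the same mean and scale must later be applied to new data.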

In [14]:
bigmacro.head()

Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DTCTHFNM_3M_lag,DTCTHFNM_6M_lag,DTCTHFNM_9M_lag,DTCTHFNM_12M_lag,DTCTHFNM_18M_lag,INVEST_3M_lag,INVEST_6M_lag,INVEST_9M_lag,INVEST_12M_lag,INVEST_18M_lag
20,8/1/1960,Recession,-4.174,-6.2,-0.017,-369.9733,64.30177,-0.0277,-0.0575,0.0,...,146.0,81.0,85.0,272.0,0.0,0.214,-2.1152,-0.3442,-1.6939,-0.6763
21,9/1/1960,Recession,3.275,0.7,0.087,4748.0925,-17.63113,-0.2492,-0.2015,-0.1373,...,257.0,95.0,169.0,186.0,51.0,-0.4994,-1.8067,0.2918,-0.4119,-1.8875
22,10/1/1960,Recession,7.607,7.6,0.089,-5606.379,165.94004,-0.0277,0.1439,0.1098,...,212.0,174.0,5.0,147.0,135.0,1.8156,-0.0578,-0.7196,-0.6691,0.1694
23,11/1/1960,Recession,-11.052,-13.9,-0.064,-2854.8054,-271.72682,-0.3323,-0.259,-0.2746,...,180.0,146.0,81.0,85.0,162.0,0.0783,0.214,-2.1152,-0.3442,-1.0784
24,12/1/1960,Recession,-11.71,-14.1,-0.225,2270.7833,-136.90054,-0.4431,-0.3454,-0.3021,...,74.0,257.0,95.0,169.0,280.0,0.9259,-0.4994,-1.8067,0.2918,-2.0343


In [15]:
# Note that bigmacro now starts at index 20 (8/1/1960), whereas at the beginning of this notebook it
# started at 1/1/1959.
# It ends at 12/1/2020, although the data were downloaded in May 2021. This is another indication that
# the data cannot be used to create a viable trading / investing system: by the time the monthly
# figures are available, a correction and bounce may already have taken place.
bigmacro.tail()

Unnamed: 0,Date,Regime,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,...,DTCTHFNM_3M_lag,DTCTHFNM_6M_lag,DTCTHFNM_9M_lag,DTCTHFNM_12M_lag,DTCTHFNM_18M_lag,INVEST_3M_lag,INVEST_6M_lag,INVEST_9M_lag,INVEST_12M_lag,INVEST_18M_lag
740,8/1/2020,Recession,-546.709,141.5,1.052,4182.0,4351.0,0.9799,1.3785,1.3432,...,-1363.66,-2691.83,185.12,2203.46,-1135.26,39.9613,4.0332,6.5273,25.0625,39.7375
741,9/1/2020,Recession,95.741,128.5,1.321,9707.0,10882.0,-0.0857,-0.414,-0.7672,...,3870.26,-5046.31,-693.34,1030.96,-4843.41,147.3571,36.2188,23.5833,41.8551,10.0658
742,10/1/2020,Recession,-125.765,92.6,0.322,8354.0,510.0,1.093,1.1687,0.7706,...,4297.44,-3958.16,1784.34,-291.94,6.22,129.6204,123.8569,2.4996,62.0678,23.2322
743,11/1/2020,Recession,-217.458,-100.5,-0.77,-2828.0,-7455.0,0.9361,0.5694,0.6265,...,2390.39,-1363.66,-2691.83,185.12,210.46,62.5068,39.9613,4.0332,6.5273,44.3045
744,12/1/2020,Recession,30.336,-20.6,-1.093,-3389.0,-6611.0,1.0678,1.2382,1.2095,...,4010.57,3870.26,-5046.31,-693.34,3106.36,57.668,147.3571,36.2188,23.5833,21.6856


In [16]:
df.to_csv('current_cleaned_' + day + '.csv', index=False)