In [1]:
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import date, timedelta
import fredapi as fd
fred = fd.Fred(api_key = 'bde7928ce1d3cc555b5d2fb725f0ef4b')

### Research Question Overview

One of the most important and influential indicators of the United States economy is the federal funds rate, often colloquially referred to as the interest rate. The federal funds rate (FFR) is the rate banks charge one another for overnight loans of reserves, and the Federal Reserve (Fed) sets a target for it in an attempt to control other key economic indicators. The Fed seeks to reach its target by setting a discount rate (the rate banks pay to borrow directly from the Fed), engaging in open market operations (buying or selling U.S. Treasury securities to change the supply of reserves), and adjusting reserve requirements (the percentage of deposits banks must keep in reserve rather than lend to clients) (#1). In practice, the federal funds rate serves as the benchmark for interest rates throughout the economy.

The Fed is tasked with promoting "maximum employment, stable prices, and moderate long-term interest rates" (#2). Although this list contains three goals, it is commonly referred to as the "dual mandate", as it is often summarized as aiming for maximum employment and low, stable inflation (#2). An increase or decrease in the FFR impacts levels of personal consumption and spending as well as corporate investment, which affects inflation. Higher interest rates increase the cost of borrowing money, which allows us to think of interest rates as the price of money. For example, when interest rates are relatively high, consumers have greater incentive to save money (and earn interest) rather than spending, while companies have less incentive to invest (since they must borrow at a higher rate to do so). This leads to an overall cooling in the economy. Conversely, lower interest rates increase personal consumption and corporate investment, which stimulates the economy. Together, these personal and corporate decisions, all based on the federal funds rate, have direct and dramatic effects on important economic indicators such as inflation, real GDP, mortgage rates, and unemployment. 

Since the interest rate influences so many economic outcomes, foreknowledge of changes in the FFR would allow consumers, producers, and investment managers to time and allocate their resources effectively. Of particular interest to our group is how changes in the FFR affect the performance of institutional investors' bond portfolios. There is an inverse relationship between interest rates and bond prices; higher interest rates lead to lower prices (#3). Thus, bond portfolios can get burned by a rapid increase in rates, leading to a painful dip in returns (#3). Even equity portfolios can be negatively impacted by higher rates. As companies have less incentive to borrow money for spending on new projects, the higher rates often lead to lower profits, and thus lower stock prices. For this reason, much effort is expended to predict the target FFR, which is determined roughly every six weeks by the Federal Open Market Committee (FOMC), the policy-making team of the Federal Reserve (#4). The FOMC has shared some indicators that it considers in its decisions, including various price indices and labor market indicators, such as the annual change in the price index for personal consumption expenditures (#5). However, it does not explicitly share the way it considers or weights these metrics.
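
The inverse relationship between rates and bond prices can be made concrete with a small present-value calculation. The sketch below is only an illustration: the function name `bond_price` and the bond's terms (a 10-year bond with a 4% annual coupon and $1,000 face value) are hypothetical and are not part of our dataset.

```python
def bond_price(face_value, coupon_rate, market_rate, years):
    """Present value of an annual-coupon bond discounted at market_rate."""
    coupon = face_value * coupon_rate
    # discount each annual coupon back to today
    pv_coupons = sum(coupon / (1 + market_rate) ** t for t in range(1, years + 1))
    # discount the face value repaid at maturity
    pv_face = face_value / (1 + market_rate) ** years
    return pv_coupons + pv_face

# hypothetical bond: 10 years, 4% annual coupon, $1,000 face value
price_at_4 = bond_price(1000, 0.04, 0.04, 10)  # market rate equals the coupon rate
price_at_6 = bond_price(1000, 0.04, 0.06, 10)  # market rate rises to 6%
```

When the market rate equals the coupon rate the bond prices at its face value, and when the market rate rises above the coupon rate the same cash flows are worth less, which is the loss a bond portfolio absorbs after a rate hike.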

Many predictive bodies, such as the Blue Chip consensus, are relatively successful at predicting changes in the FFR when most or all economic indicators are positive (or negative), but are inconsistent in accuracy when economic conditions are mixed (#6). Their historical accuracy and prediction methods are unclear. In place of their accuracy, we benchmark our methods against the naive strategy of predicting that the Fed will hold interest rates constant at every meeting. Using the techniques of regression and classification, we seek to appropriately select critical features from available economic indicators in order to develop a robust model for predicting changes in the FFR. In particular, we are concerned with finding features that best predict FOMC decisions to raise, lower, or hold constant the target FFR. Throughout this project, we seek to answer three key questions: 1) What features have the most predictive power for forecasting FOMC decisions? 2) What regression and classification methods are best suited to forecasting these decisions? 3) How accurately can we predict FOMC decisions?
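
The naive "always hold" benchmark is simple to score: its accuracy is the share of meetings at which the FOMC left the rate unchanged. The following is a minimal sketch, assuming decisions are encoded as -1, 0, and 1; the helper name `hold_baseline_accuracy` and the example sequence are made up for illustration and are not our results.

```python
import numpy as np

def hold_baseline_accuracy(decisions):
    """Share of meetings where predicting 'hold' (0) would have been correct."""
    decisions = np.asarray(decisions, dtype=float)
    # ignore days without a decision
    decisions = decisions[~np.isnan(decisions)]
    return float(np.mean(decisions == 0))

# made-up sequence of decisions: -1 = cut, 0 = hold, 1 = hike
example_decisions = [0, 0, 1, 1, 0, -1, 0, 0]
baseline = hold_baseline_accuracy(example_decisions)
```

Any classifier we train has to beat this number on the same set of meetings to be considered useful.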

Citations:
1. https://www.federalreserve.gov/monetarypolicy/fomc.htm
2. https://www.federalreserve.gov/monetarypolicy/monetary-policy-what-are-its-goals-how-does-it-work.htm
3. https://www.morningstar.com/portfolios/how-invest-your-money-fed-raises-interest-rates
4. https://www.investopedia.com/terms/f/federalfundsrate.asp
5. https://federalreserve.gov/monetarypolicy/monetary-policy-what-are-its-goals-how-does-it-work.htm
6. https://www.stlouisfed.org/publications/regional-economist/july-2000/inside-the-briefcase-the-art-of-predicting-the-federal-reserve

### Our Data

In our search for data on key economic indicators that might influence the FFR, we looked to FRED, the economic data website of the Federal Reserve Bank of St. Louis (#1). This site hosts an enormous quantity of time series data on economic indicators. This data has been carefully and methodically tracked, and is a gold standard for economic data. Many reputable news sources, including the Wall Street Journal, frequently cite FRED data. In selecting features, we focused on those with data continuously tracked from 1989 (the year our FOMC decisions data starts) to the present day. We also chose features that we felt were most likely to have predictive power in determining the FOMC's decision to raise or lower interest rates. With these two considerations in mind, we selected the features summarized in the table below.

| Feature | Variable | Frequency | Description |
| --- | --- | --- | --- |
| Bank Prime Loan Rate Changes | `loan` | daily | Rate charged by banks for short-term loans to creditworthy debtors |
| Exports of Goods and Services | `exports` | quarterly | Quarterly percent change in real exports of goods and services |
| Personal Consumption Expenditures Rate | `pce` | monthly | Measure of core inflation for personal expenditures |
| Unemployment Rate | `ue` | monthly | Number of unemployed as a percentage of the labor force |
| Change in Real GDP | `rgdp` | quarterly | Quarterly change in inflation-adjusted GDP |
| Total Vehicle Sales | `cars` | monthly | Total number of vehicle sales in millions |
| Recession Indicator | `recess` | monthly | Binary variable indicating whether the US is in a recession |
| GDP Deflator | `gdpd` | quarterly | Price index given by ratio of nominal GDP to real GDP |
| Velocity of M1 Money | `veloc` | quarterly | Ratio of nominal GDP to the quarterly average of M1 money stock |
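| New Private Housing Units Started | `house` | monthly | Number of new housing units beginning construction in thousands |
| Inflation Expectations | `mich` | monthly | University of Michigan survey measure of expected inflation |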

In addition, we hypothesized that stock market performance and the strength of the dollar might be important factors in the FOMC's decision. The stock market acts as a barometer for the health of the economy, and the strength of the dollar indicates the strength of the economy relative to other countries. Thus, we used daily closing price data from Yahoo! Finance for the S&P 500 Index (`spx`, a proxy for the total stock market) and the US Dollar Index (`usd`, the value of the USD versus a basket of foreign currencies) (#2).

We also generated binary variables for the political party of the Fed Chair (`fed_party`) and President of the United States (`potus_party`) (#2, #3). These are set to 1 if the position was held by a Republican, and 0 if held by a Democrat. This data was pulled from Wikipedia and is reliable given that the information is readily accessible on a host of websites. We hypothesized that differences in opinion on economic policy might affect the FOMC's decisions to raise or lower interest rates.
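
The party variables were gathered by hand, but the same encoding could be generated from a short list of term start dates. The sketch below is one possible approach, not the procedure we used: it takes two recent presidential inauguration dates, maps them to the 1 = Republican, 0 = Democrat encoding, and carries each value forward over a daily index. The variable names are illustrative.

```python
import pandas as pd

# term start dates mapped to the encoding above: 1 = Republican, 0 = Democrat
potus_terms = pd.Series(
    {pd.Timestamp('2017-01-20'): 1, pd.Timestamp('2021-01-20'): 0},
    name='potus_party',
)

# place the term starts on a daily index and carry each value forward
days = pd.date_range('2017-01-20', '2021-12-31', freq='D')
potus_party = potus_terms.reindex(days).ffill().astype(int)
```

The result is a daily 0/1 series that changes value only on the day a new term begins.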

Our final feature, `cli`, was pulled from The Organisation for Economic Co-operation and Development (OECD) (#4). This organization is a multinational group founded in 1961 for the purpose of stimulating economic progress and world trade (#5). Given its status as a United Nations observer, we are confident in the validity of its data. The composite leading indicator (CLI) is a statistic created to "provide early signals of turning points in business cycles". Hence, the Fed would likely care about proactively adjusting interest rates in response to changes in this indicator.

Finally, our data on the FFR (`ffr`) came from the research of an economics professor at Williams College. Kenneth Kuttner is a leading researcher in monetary policy, and the data we downloaded from his website was the backbone of a paper he published in the Journal of Finance (the top journal in academic finance) (#6). It contains information on the percentage point change (`change`) at each announcement and the direction of each change (`direction`). Our project aims to forecast the direction of each FOMC decision, which is indicated by -1 for a decrease, 0 for no change, and 1 for an increase in the FFR.

Citations:

1. https://fred.stlouisfed.org/
2. https://en.wikipedia.org/wiki/Chair_of_the_Federal_Reserve
3. https://simple.wikipedia.org/wiki/List_of_presidents_of_the_United_States
4. https://data.oecd.org/leadind/composite-leading-indicator-cli.htm
5. https://en.wikipedia.org/wiki/OECD
6. https://econ.williams.edu/faculty-pages/research/

### Data Collection and Cleaning

The feature data is pulled from four primary sources: the Federal Reserve Bank of St. Louis (FRED), Yahoo! Finance, The Organisation for Economic Co-operation and Development (OECD), and Wikipedia. FRED data is pulled through its API by passing the identifier of each data series. The data we chose from FRED all spanned the entire period of interest (1989-2023). A few data series, such as nonfarm job openings, are reportedly important to the FOMC when setting interest rates. However, because they did not span the entire time period of our interest rate data, we chose to exclude these variables. This is a limitation of our classification attempts, but attempting to fill in macroeconomic variables is far beyond the scope of this project. Furthermore, we are confident that the other features we selected have high explanatory power for FOMC decisions, which gave us solace in not including the few economic indicators with insufficient history.
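
The coverage requirement described above can be checked programmatically before a series is added to the feature list. This is a minimal sketch under our own assumptions: `covers_period` is a hypothetical helper, and the two series are synthetic stand-ins rather than real FRED data.

```python
import pandas as pd

def covers_period(series, start, end):
    """True if the series has observations on or before start and on or after end."""
    obs = series.dropna()
    if obs.empty:
        return False
    return obs.index.min() <= pd.Timestamp(start) and obs.index.max() >= pd.Timestamp(end)

# synthetic monthly series: one with a long history, one that starts too late
long_series = pd.Series(1.0, index=pd.date_range('1980-01-01', '2023-10-01', freq='MS'))
short_series = pd.Series(1.0, index=pd.date_range('2000-12-01', '2023-10-01', freq='MS'))

keep_long = covers_period(long_series, '1989-01-01', '2023-10-01')
keep_short = covers_period(short_series, '1989-01-01', '2023-10-01')
```

Because quarterly and monthly series are released with a lag, the `end` date passed to such a check should be chosen loosely rather than set to today's date.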

In [63]:
# specify what data to pull from FRED
start_dt = '1/1/1989'
name_id = [('loan', 'PRIME'), ('exports', 'A020RL1Q158SBEA'), ('pce', 'PCETRIM12M159SFRBDAL'),
           ('ue', 'UNRATE'), ('rgdp', 'A191RL1Q225SBEA'), ('cars', 'TOTALSA'), ('recess', 'USREC'),
           ('gdpd', 'A191RI1Q225SBEA'), ('veloc', 'M1V'), ('house', 'HOUST'), ('mich', 'MICH')]
fred_data = []

for name, series_id in name_id:
    # get FRED series using the proper id
    df = pd.DataFrame(fred.get_series(series_id, observation_start=start_dt))
    # make column name identifiable
    df.rename(columns={0: name}, inplace=True)
    fred_data.append(df)

# aggregate FRED data series
df_fred = pd.concat(fred_data, axis=1)

In [64]:
# specify what data to pull from Yahoo! Finance
tickers = [('spx', '^SPX'), ('usd', 'DX-Y.NYB')]
start_dt = '1989-01-01'
end_dt = str(date.today() - timedelta(1))
interval = '1d'
stock_data = []

# download each price series and keep only the adjusted close
for name, ticker in tickers:
    # get price data using ticker
    df = yf.download(ticker, start=start_dt, end=end_dt, interval=interval)
    # make column name identifiable
    df = df[['Adj Close']].rename(columns={'Adj Close': name})
    stock_data.append(df)

# aggregate stock data
df_stock = pd.concat(stock_data, axis=1)

In [65]:
# read in cli data from csv file downloaded off OECD website
df_cli = pd.read_csv('cli.csv', index_col=0)
df_cli.index = pd.to_datetime(df_cli.index)

# read in manually gathered political party data
df_party = pd.read_csv('party.csv', index_col=0)
df_party.index = pd.to_datetime(df_party.index)

In [66]:
# create feature dataset
df_features = pd.concat([df_fred, df_stock, df_cli, df_party], axis=1)

In [83]:
# pull in FOMC data 
df_fomc = pd.read_csv('ffr_clean.csv', index_col=0)
df_fomc.index = pd.to_datetime(df_fomc.index)

# encode each decision as -1 (cut), 0 (hold), or 1 (hike) from the sign of the change
decision = np.sign(df_fomc['change'].values)

# create new columns; 'mom' (momentum) starts as a copy of the decision and is
# lagged later so that each meeting sees the previous meeting's decision
df_fomc['decision'] = decision
df_fomc['mom'] = decision

# consider only scheduled meetings
df_fomc = df_fomc.query('scheduled == 1').drop(columns='scheduled')

In [84]:
# get start and end dates
start_dt = df_features.index[0]
end_dt = df_fomc.index[-1]

# get every day between start and end date
delta = end_dt - start_dt
daily = [start_dt + timedelta(days=i) for i in range(delta.days + 1)]

# make dataframe of dates
df_days = pd.DataFrame(daily)
df_days.index = pd.to_datetime(df_days.values[:,0])

In [88]:
# initialize master dataframe
df = pd.concat([df_days, df_fomc, df_features], axis=1)

# drop irrelevant columns and rows
df.drop(0, axis=1, inplace=True)
last_ix = df_days.index.get_loc(end_dt) + 1
df.drop(df.index[last_ix:], inplace=True)

# find start and stop columns
cols = df.columns.to_list()
start_ix, end_ix = cols.index('mom'), cols.index('fed_party')

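# lag mom and the market/economic features by one day so that each row only
# contains information available before that day's announcement
df_shift = df.iloc[:, start_ix:end_ix].copy()
df.iloc[:, start_ix:end_ix] = df_shift.shift()

# carry the latest observation forward to fill the days between releases
df_fill = df.iloc[:, start_ix:].copy()
df.iloc[:, start_ix:] = df_fill.ffill()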

# save master dataframe
df.to_csv('master_data.csv')
df.tail()

Unnamed: 0,ffr,change,decision,mom,loan,exports,pce,ue,rgdp,cars,recess,gdpd,veloc,house,mich,spx,usd,cli,fed_party,potus_party
2023-10-28,,,,0.0,8.5,6.0,3.64,3.9,5.2,15.91,0.0,3.5,1.511,1372.0,4.2,4117.370117,106.559998,99.47179,1.0,0.0
2023-10-29,,,,0.0,8.5,6.0,3.64,3.9,5.2,15.91,0.0,3.5,1.511,1372.0,4.2,4117.370117,106.559998,99.47179,1.0,0.0
2023-10-30,,,,0.0,8.5,6.0,3.64,3.9,5.2,15.91,0.0,3.5,1.511,1372.0,4.2,4117.370117,106.559998,99.47179,1.0,0.0
2023-10-31,,,,0.0,8.5,6.0,3.64,3.9,5.2,15.91,0.0,3.5,1.511,1372.0,4.2,4166.819824,106.120003,99.47179,1.0,0.0
2023-11-01,5.4,0.0,0.0,0.0,8.5,6.0,3.64,3.9,5.2,15.91,0.0,3.5,1.511,1372.0,4.2,4193.799805,106.660004,99.47179,1.0,0.0
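
The effect of the shift-then-fill step in the cell above is easiest to see on a toy frame. The sketch below uses made-up decisions on a six-day index; it mirrors the logic applied to the `mom` column but is not the project code itself.

```python
import numpy as np
import pandas as pd

# toy daily frame: a hike on day 2 and a hold on day 5, no meeting otherwise
days = pd.date_range('2023-01-01', periods=6, freq='D')
toy = pd.DataFrame(
    {'decision': [np.nan, 1.0, np.nan, np.nan, 0.0, np.nan]},
    index=days,
)

# lag by one day, then carry the last known value forward
toy['mom'] = toy['decision'].shift().ffill()
```

On the day of the second meeting, `mom` still holds the outcome of the first meeting, so the momentum feature never leaks the decision it is meant to help predict.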


### Robustness

The data cleaning and feature engineering code found above is flexible and simple to modify. One can easily add not only new features but also recent FOMC decisions. Adding new data series from FRED and Yahoo! Finance is the easiest modification. Simply add the identifying name of the feature as well as the series code for FRED or the ticker for Yahoo! Finance, and the for loops will handle the rest. Adding data from outside CSV files only requires adding the name of the new dataframe to the list when creating `df_features`. Thus, any number of new features (columns) from a variety of sources can be quickly added. By virtue of the `ffill()` function, any data pulled using the FRED or Yahoo! Finance API is properly extended to the last row of the dataframe, which corresponds to the most recent FOMC announcement we have data for.
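
As a small illustration of how `ffill()` extends lower-frequency data, the sketch below places two monthly observations on a daily index and fills the gaps. The values and the name `ue` are used only for illustration.

```python
import pandas as pd

# daily index spanning two monthly releases
daily_index = pd.date_range('2023-09-01', '2023-11-01', freq='D')

# two illustrative monthly observations
monthly = pd.Series(
    [3.8, 3.9],
    index=pd.to_datetime(['2023-09-01', '2023-10-01']),
    name='ue',
)

# align to the daily index, then carry each value forward until the next release
daily = monthly.reindex(daily_index).ffill()
```

Every day between releases takes the most recent published value, and the final value is carried through to the last row of the index.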

New FOMC announcement data can be appended by modifying the `ffr_clean.csv` file. This is done by adding the new target federal funds rate under the `ffr` column and the change in basis points under the `change` column. In addition, the `scheduled` column should be set to 1 if the meeting was scheduled and 0 if it was a surprise. The columns `decision` and `mom` are calculated automatically. To keep the data consistent, every new FOMC announcement should be added in chronological order, including unscheduled ones. Modifying this data is more of a hassle than adding new features, but meetings generally occur only every six weeks and each requires adding just three data points to the CSV file, so the burden is small. In addition, whenever the code is rerun, the data from FRED and Yahoo! Finance is automatically pulled from 1989 to the present day. Data coming from CSV files does not refresh automatically; it is instead filled forward from the most recent valid entry by the `ffill()` function, so it is the researcher's responsibility to ensure it is still up to date when adding new FOMC announcement data.
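
Appending an announcement could also be done in code rather than by editing the CSV by hand. The following is a minimal sketch, assuming the announcement table has a date index and the three columns `ffr`, `change`, and `scheduled`; the helper `add_announcement` is hypothetical and the appended row is a placeholder, not a recorded decision.

```python
import pandas as pd

def add_announcement(df_ffr, date, ffr, change, scheduled):
    """Return a copy of the announcement table with one new row appended."""
    date = pd.Timestamp(date)
    # enforce chronological order so the lagged columns stay consistent
    if len(df_ffr) and date <= df_ffr.index.max():
        raise ValueError('announcements must be added in chronological order')
    if scheduled not in (0, 1):
        raise ValueError('scheduled must be 0 or 1')
    new_row = pd.DataFrame(
        {'ffr': [ffr], 'change': [change], 'scheduled': [scheduled]},
        index=pd.DatetimeIndex([date]),
    )
    return pd.concat([df_ffr, new_row])

# existing table with a single row, followed by a placeholder new announcement
existing = pd.DataFrame(
    {'ffr': [5.4], 'change': [0.0], 'scheduled': [1]},
    index=pd.DatetimeIndex(['2023-11-01']),
)
updated = add_announcement(existing, '2023-12-13', 5.4, 0.0, 1)
```

The updated table can then be written back with `updated.to_csv('ffr_clean.csv')` before rerunning the cleaning cells.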

### Data Visualization and Basic Analysis

### Data labels

fred_data = [loan, exports, pce, ue, rgdp, cars, recess, gdpd, veloc, house, mich]

loan = Bank Prime Loan Rate Changes: Historical Dates of Changes and Rates (PRIME) - Daily, 1955-Jul 2023

exports = Real Exports of Goods and Services (A020RL1Q158SBEA) - Quarterly, 1947-Jul 2023

pce = Trimmed Mean PCE Inflation Rate (PCETRIM12M159SFRBDAL) -  Monthly, 1978-Oct 2023

ue = Unemployment Rate (UNRATE) - Monthly 1948-Oct 2023

rgdp = Real Gross Domestic Product (A191RL1Q225SBEA) - Quarterly 1947-Jul 2023 

cars = Total Vehicle Sales (TOTALSA) - Monthly 1976-Oct 2023

recess = NBER based Recession Indicators for the United States from the Period following the Peak through the Trough (USREC) - Monthly 1854-Nov 2023

gdpd = Gross Domestic Product: Implicit Price Deflator (A191RI1Q225SBEA)

veloc = Velocity of M1 Money Stock (M1V) - Quarterly 1959-Jul 2023

house = New Privately-Owned Housing Units Started: Total Units (HOUST) - Monthly 1959-Oct 2023

cli = Leading Indicators OECD: Leading indicators: CLI: Amplitude adjusted for OECD Total (OECDLOLITOAASTSAM) - Monthly 1961-Oct 2023 ***Use data from CSV

spx = SPX - S&P 500 Adjusted closing price

usd = USD - US Dollar Index for the international value of the US dollar relative to a basket of world currencies

fed_party = political party of Fed Chair (1=Republican)

potus_party = party of US President (1=Republican)

mom_u = momentum up, (1=previous meeting was a rate hike)

mom_d = momentum down, (1=previous meeting was a rate decrease)

In [5]:
df = pd.read_csv('master_data.csv', index_col=0)
df.index = pd.to_datetime(df.index)
df

Unnamed: 0,ffr,change,decision,loan,exports,pce,ue,rgdp,cars,recess,...,veloc,house,mich,cli,spx,usd,fed_party,potus_party,mom_u,mom_d
1989-01-01,,,,,,,,,,,...,,,,,,,,,,
1989-01-02,,,,,12.5,4.05,5.4,4.1,15.372,0.0,...,7.028,1621.0,3.5,100.53120,,,,,,
1989-01-03,,,,,12.5,4.05,5.4,4.1,15.372,0.0,...,7.028,1621.0,3.5,100.53120,,92.500000,,,,
1989-01-04,,,,,12.5,4.05,5.4,4.1,15.372,0.0,...,7.028,1621.0,3.5,100.53120,275.309998,92.169998,,,,
1989-01-05,,,,,12.5,4.05,5.4,4.1,15.372,0.0,...,7.028,1621.0,3.5,100.53120,279.429993,92.980003,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-12-04,,,,8.5,6.0,3.64,3.7,5.2,15.863,0.0,...,1.511,1372.0,4.2,99.47179,4594.629883,103.269997,1.0,0.0,,
2023-12-05,,,,8.5,6.0,3.64,3.7,5.2,15.863,0.0,...,1.511,1372.0,4.2,99.47179,4569.779785,103.639999,1.0,0.0,,
2023-12-06,,,,8.5,6.0,3.64,3.7,5.2,15.863,0.0,...,1.511,1372.0,4.2,99.47179,4567.180176,104.050003,1.0,0.0,,
2023-12-07,,,,8.5,6.0,3.64,3.7,5.2,15.863,0.0,...,1.511,1372.0,4.2,99.47179,4549.339844,104.150002,1.0,0.0,,
