# The Federal Reserve statements

This notebook scrapes https://federalreserve.gov/monetarypolicy/fomccalendars.htm for copies of the U.S. Federal Reserve's monetary policy statements.

An analysis is provided in a separate notebook.

## Do your imports

In [1]:
import pandas as pd

import re
import numpy as np

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

from webdriver_manager.chrome import ChromeDriverManager

import requests
from bs4 import BeautifulSoup
import altair as alt

## Allow Selenium to open up Chrome and automatically navigate through the website

In [2]:
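from selenium.webdriver.chrome.service import Service

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))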



Could not get version for google-chrome with the any command: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version
Current google-chrome version is UNKNOWN
Get LATEST chromedriver version for UNKNOWN google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/105.0.5195.52/chromedriver_mac64.zip
Driver has been saved in cache [/Users/prinzmagtulis/.wdm/drivers/chromedriver/mac64/105.0.5195.52]


In [3]:
driver.get("https://federalreserve.gov/monetarypolicy/fomccalendars.htm")

## Scraping proper: table

The first step is to scrape the information in the calendar table itself, leaving the contents behind each **link** for later.

In [28]:
raw_html = requests.get("https://federalreserve.gov/monetarypolicy/fomccalendars.htm").content
soup_doc = BeautifulSoup(raw_html, "html.parser")
soup_doc

<!DOCTYPE html>

<html class="no-js" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="width=device-width, initial-scale=1.0, minimum-scale=1.0 maximum-scale=1.6, user-scalable=1" name="viewport"/>
<meta content="Board of Governors of the Federal Reserve System, Federal Reserve Board of Governors, Federal Reserve Board, Federal Reserve" name="keywords"/>
<meta content="The Federal Reserve Board of Governors in Washington DC." name="description"/>
<meta content="Board of Governors of the Federal Reserve System" property="og:site_name"/>
<meta content="article" property="og:type"/>
<meta content="" property="og:image"/>
<meta content="summary" name="twitter:card"/>
<meta content="" name="twitter:image"/>
<title>The Fed - Meeting calendars and information</title>
<link href="/css/bootstrap.css" rel="stylesheet" type="text/css"/>
<link href="/css/bluesteel-theme.css" rel="stylesheet" type="text/css"/>
<script src="/js/modernizr-l

In [200]:
# Unshaded calendar rows
dataset = []
rows = soup_doc.find_all("div", {"class": "row fomc-meeting"})
for row in rows:
    data = {}
    data['month'] = row.find("div", {"class": "fomc-meeting__month col-xs-5 col-sm-3 col-md-2"}).text
    data['date'] = row.find("div", {"class": "fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1"})
    data['link'] = row.find("div", {"class": "col-xs-12 col-md-4 col-lg-2"}).select_one('a[href*=".htm"]')
    dataset.append(data)
dataset

[{'month': 'January',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">25-26</div>,
  'link': <a href="/newsevents/pressreleases/monetary20220126a.htm">HTML</a>},
 {'month': 'May',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">3-4</div>,
  'link': <a href="/newsevents/pressreleases/monetary20220504a.htm">HTML</a>},
 {'month': 'July',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">26-27</div>,
  'link': <a href="/newsevents/pressreleases/monetary20220727a.htm">HTML</a>},
 {'month': 'November',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">1-2</div>,
  'link': None},
 {'month': 'January',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">26-27</div>,
  'link': <a href="/newsevents/pressreleases/monetary20210127a.htm">HTML</a>},
 {'month': 'April',
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">27-28</d

In [177]:
# Shaded calendar rows
dataset2 = []
rows = soup_doc.find_all("div", {"class": "fomc-meeting--shaded row fomc-meeting"})
for row in rows:
    data = {}
    data['month'] = row.find("div", {"class": "fomc-meeting--shaded fomc-meeting__month col-xs-5 col-sm-3 col-md-2"})
    data['date'] = row.find("div", {"class": "fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1"})
    data['link'] = row.find("div", {"class": "col-xs-12 col-md-4 col-lg-2"}).select_one('a[href*=".htm"]')
    dataset2.append(data)
dataset2

[{'month': <div class="fomc-meeting--shaded fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>March</strong></div>,
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">15-16*</div>,
  'link': <a href="/newsevents/pressreleases/monetary20220316a.htm">HTML</a>},
 {'month': <div class="fomc-meeting--shaded fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>June</strong></div>,
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">14-15*</div>,
  'link': <a href="/newsevents/pressreleases/monetary20220615a.htm">HTML</a>},
 {'month': <div class="fomc-meeting--shaded fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>September</strong></div>,
  'date': <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">20-21*</div>,
  'link': None},
 {'month': <div class="fomc-meeting--shaded fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>December</strong></div>,
  'date': <div class="fomc-meeting__date col-xs-4 co

We arrange the information into a **list of dictionaries** in preparation for transforming it into a **data frame** for pandas analysis later.
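
The cells above store whole tags for `date` and `link`, which is why the outputs show raw `<div>` and `<a>` markup. As a minimal sketch, assuming the page keeps the class names seen above, the same rows can be reduced to plain strings before they go into a data frame. `SAMPLE_HTML` and `rows_to_records` are hypothetical names introduced only for this illustration, and the sample markup is a trimmed-down stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample of the calendar markup
# (class names copied from the cells above)
SAMPLE_HTML = """
<div class="row fomc-meeting">
  <div class="fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>January</strong></div>
  <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">25-26</div>
  <div class="col-xs-12 col-md-4 col-lg-2"><a href="/newsevents/pressreleases/monetary20220126a.htm">HTML</a></div>
</div>
<div class="row fomc-meeting">
  <div class="fomc-meeting__month col-xs-5 col-sm-3 col-md-2"><strong>November</strong></div>
  <div class="fomc-meeting__date col-xs-4 col-sm-9 col-md-10 col-lg-1">1-2</div>
  <div class="col-xs-12 col-md-4 col-lg-2"></div>
</div>
"""


def rows_to_records(html):
    """Turn each meeting row into a dict of plain strings (None when a piece is missing)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Matching on the single class token also picks up shaded rows
    for row in soup.select("div.fomc-meeting"):
        month = row.select_one("div.fomc-meeting__month")
        date = row.select_one("div.fomc-meeting__date")
        link = row.select_one('a[href*=".htm"]')
        records.append({
            "month": month.get_text(strip=True) if month else None,
            "date": date.get_text(strip=True) if date else None,
            "link": link["href"] if link else None,
        })
    return records


records = rows_to_records(SAMPLE_HTML)
```

Selecting on the `fomc-meeting` class token rather than the full class string would, under the same assumption, let one loop cover both the shaded and unshaded rows.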

## Our dataframes

Our **first data frame**

In [201]:
df1 = pd.DataFrame(dataset)
df1.head()

Unnamed: 0,month,date,link
0,January,[25-26],[HTML]
1,May,[3-4],[HTML]
2,July,[26-27],[HTML]
3,November,[1-2],
4,January,[26-27],[HTML]


Our **second data frame**

In [202]:
df2 = pd.DataFrame(dataset2)
df2.head()

Unnamed: 0,month,date,link
0,[[March]],[15-16*],[HTML]
1,[[June]],[14-15*],[HTML]
2,[[September]],[20-21*],
3,[[December]],[13-14*],
4,[[March]],[16-17*],[HTML]


We now need to **combine them**.

In [None]:
df = pd.concat([df1, df2], ignore_index=True)
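
The combined frame holds a month and a day range but no year, and rows from different years sit side by side (two January meetings appear in the output above). The statement links seem to encode the release date in the file name, for example `monetary20220126a.htm`. The sketch below recovers that date under the assumption that every statement link follows this pattern; `STATEMENT_RE` and `statement_date` are hypothetical names, not part of the original notebook.

```python
import re
from datetime import date

# Assumed file-name pattern, inferred from the links shown in the outputs above
STATEMENT_RE = re.compile(r"monetary(\d{4})(\d{2})(\d{2})[a-z]?\.htm$")


def statement_date(href):
    """Return the date encoded in a statement href, or None if it does not match."""
    if not href:
        return None
    match = STATEMENT_RE.search(href)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


example = statement_date("/newsevents/pressreleases/monetary20220126a.htm")
```

If the links are stored as strings, something like `df['link'].map(statement_date)` could add a proper date column; links that are missing or shaped differently come back as `None` rather than raising.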

## Scraping actual statements

We use BeautifulSoup for this step. The process is easier because we already have the links in our data frames; all we have to do is **access and grab** their contents one by one.

I'm commenting out the final display line to avoid printing a wall of text, but hey, try running it on your own!

In [7]:
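statements = []
for speech in dataset:
    link = speech['link']
    if link is None:  # meetings without a published statement have no link
        continue
    # The scraped hrefs are site-relative, so prepend the domain
    href = "https://federalreserve.gov" + link['href']
    raw_html = requests.get(href).content
    doc = BeautifulSoup(raw_html, "html.parser")
    # Selector kept from the original cell; check it against the statement page markup
    headers = doc.find_all(class_='large-9 large-centered columns')[1]
    text = {}
    text['link'] = speech['link']
    text['speech'] = headers.text
    statements.append(text)
#statements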
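
The hrefs scraped from the calendar page are site-relative (for example `/newsevents/pressreleases/monetary20220126a.htm`), and some meetings have no link at all, so the links need a little preparation before they can be requested. A minimal sketch using the standard library's `urljoin` follows; `BASE_URL` and `absolute_links` are hypothetical names introduced for this illustration.

```python
from urllib.parse import urljoin

# The calendar page the relative hrefs were scraped from
BASE_URL = "https://federalreserve.gov/monetarypolicy/fomccalendars.htm"


def absolute_links(hrefs, base_url=BASE_URL):
    """Resolve site-relative hrefs against base_url, skipping missing ones."""
    return [urljoin(base_url, href) for href in hrefs if href]


links = absolute_links([
    "/newsevents/pressreleases/monetary20220126a.htm",
    None,  # a meeting whose statement has not been published yet
])
```

Each resulting URL can then be passed to `requests.get` in the loop.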

Each statement is stored as a **single block** of text per row so that it lines up with its link in the df. This is, of course, not ideal and could be improved. Below is a **third data frame** (`df3`) containing the links and the statement texts themselves.

We then **merge** this information with our earlier df.

In [8]:
df3=pd.DataFrame(statements)
df3

Unnamed: 0,link,speech
0,http://www.officialgazette.gov.ph/1935/11/25/m...,\nMessage\nof\nHis Excellency Manuel L. Quezon...
1,http://www.officialgazette.gov.ph/1936/06/16/m...,\nMessage\nof\nHis Excellency Manuel L. Quezon...
2,http://www.officialgazette.gov.ph/1937/10/18/m...,\nMessage\nof\nHis Excellency Manuel L. Quezon...
3,http://www.officialgazette.gov.ph/1938/01/24/m...,\nMessage\nof\nHis Excellency Manuel L. Quezon...
4,http://www.officialgazette.gov.ph/1939/01/24/m...,\nMessage\nof\nHis Excellency Manuel L. Quezon...
...,...,...
77,https://mirror.officialgazette.gov.ph/2016/07/...,\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
78,https://mirror.officialgazette.gov.ph/2017/07/...,\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
79,https://mirror.officialgazette.gov.ph/2018/07/...,\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
80,https://mirror.officialgazette.gov.ph/2019/07/...,\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...


Our final df.

In [9]:
merged = df.merge(df3, on='link')
merged
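
An inner merge silently drops meetings whose statement was never fetched. As a hedged sketch on toy data (the two frames below are hypothetical stand-ins, with placeholder text instead of a real statement), a left merge with `indicator=True` keeps every meeting and marks which rows found a match.

```python
import pandas as pd

# Hypothetical stand-in for the calendar data frame
meetings = pd.DataFrame({
    "month": ["January", "May"],
    "link": [
        "/newsevents/pressreleases/monetary20220126a.htm",
        "/newsevents/pressreleases/monetary20220504a.htm",
    ],
})

# Hypothetical stand-in for the scraped statements (only one fetched so far)
texts = pd.DataFrame({
    "link": ["/newsevents/pressreleases/monetary20220126a.htm"],
    "speech": ["placeholder statement text"],
})

# how="left" keeps every meeting; the _merge column shows where a statement was found
merged_demo = meetings.merge(texts, on="link", how="left", indicator=True)
```

Rows flagged `left_only` in `_merge` are meetings that still lack a statement, which makes gaps easy to spot before saving.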


Unnamed: 0,president,date,title,link,venue,session,speech
0,Manuel L. Quezon,"November 25, 1935",Message to the First Assembly on National Defense,http://www.officialgazette.gov.ph/1935/11/25/m...,"Legislative Building, Manila","First National Assembly, First Session",\nMessage\nof\nHis Excellency Manuel L. Quezon...
1,Manuel L. Quezon,"June 16, 1936",On the Country’s Conditions and Problems,http://www.officialgazette.gov.ph/1936/06/16/m...,"Legislative Building, Manila","First National Assembly, First Session",\nMessage\nof\nHis Excellency Manuel L. Quezon...
2,Manuel L. Quezon,"October 18, 1937","Improvement of Philippine Conditions, Philippi...",http://www.officialgazette.gov.ph/1937/10/18/m...,"Legislative Building, Manila","First National Assembly, Second Session",\nMessage\nof\nHis Excellency Manuel L. Quezon...
3,Manuel L. Quezon,"January 24, 1938",Revision of the System of Taxation,http://www.officialgazette.gov.ph/1938/01/24/m...,"Legislative Building, Manila","First National Assembly, Third Session",\nMessage\nof\nHis Excellency Manuel L. Quezon...
4,Manuel L. Quezon,"January 24, 1939",The State of the Nation and Important Economic...,http://www.officialgazette.gov.ph/1939/01/24/m...,"Legislative Building, Manila","Second National Assembly, First Session",\nMessage\nof\nHis Excellency Manuel L. Quezon...
...,...,...,...,...,...,...,...
77,Rodrigo Roa Duterte,"July 25, 2016",State of the Nation Address,https://mirror.officialgazette.gov.ph/2016/07/...,"Batasang Pambansa, Quezon City","Seventeenth Congress, First Session",\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
78,Rodrigo Roa Duterte,"July 24, 2017",Second State of the Nation Address,https://mirror.officialgazette.gov.ph/2017/07/...,"Batasang Pambansa, Quezon City","Seventeenth Congress, Second Session",\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
79,Rodrigo Roa Duterte,"July 23, 2018",Third State of the Nation Address,https://mirror.officialgazette.gov.ph/2018/07/...,"Batasang Pambansa, Quezon City","Seventeenth Congress, Third Session",\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...
80,Rodrigo Roa Duterte,"July 22, 2019",Fourth State of the Nation Address,https://mirror.officialgazette.gov.ph/2019/07/...,"Batasang Pambansa, Quezon City","Eighteenth Congress, First Session",\n\n\n\nSTATE OF THE NATION ADDRESS OF \nRODRI...


## Save to CSV

In [10]:
#merged.to_csv('merged.csv', index=False)