# Automatically downloading real estate data

The Mortgage Bankers Association (MBA) publishes a press release every week with the results of its Weekly Mortgage Applications Survey. The survey covers over 75% of all applications for retail residential mortgages in the U.S. and includes, among other data, interest rates and changes in the number of applications. These press releases are all available on [one page](https://www.mba.org/news-research-and-resources/research-and-economics/single-family-research/weekly-applications-survey/research-and-economics-all-news-about-mbas-weekly-applications-survey) and go back to 2016, although the survey itself has been conducted since 1990. More information about the survey is available [here](https://www.mba.org/Documents/mba.org/files/Research/HistoricalWAS/WASFAQ.pdf).

The purpose of this script is to automatically download all of the press releases and extract information about interest rates.

Skip to the bottom section to see the final, complete code.

### Acknowledgements 

Thank you to Manuel Villa and Jonathan Soma for their invaluable help with this code. 

## Part 1: Get a list of URLs

I need to find the URLs of each press release on the webpage and put them into a list.

In [1]:
import requests
from bs4 import BeautifulSoup

In [5]:
url = 'https://www.mba.org/news-research-and-resources/research-and-economics/single-family-research/weekly-applications-survey/research-and-economics-all-news-about-mbas-weekly-applications-survey'

headers = {'name': "Sharon Lurye", 
           'email': "sharonrlurye@gmail.com",
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}

response = requests.get(url, headers=headers)

In [6]:
soup = BeautifulSoup(response.text, 'html.parser')

The links we want sit inside this part of the page's HTML:

```
<ul class="item-list search-results-list">
<h1 class="schema-pressrelease prop-resourcename item-title search-results-item-title"> <a href="http://www.mba.org/2021-press-releases/june/mortgage-applications-increase-in-latest-mba-weekly-survey-x281353">                 Mortgage Applications Increase in Latest MBA Weekly Survey                 </a> </h1>
<p> <span class="schema-pressrelease prop-articledate">                 June 23, 2021                 </span></p>
<br/>
```

In [7]:
links = soup.select(".search-results-list a")

links = [link['href'] for link in links]

In [5]:
len(links)

248
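
To see what the selector does without hitting the site, here is a minimal offline sketch. The HTML string is a hypothetical, trimmed-down version of the snippet above (the `example.com` URLs are placeholders), and `sample_html`, `sample_soup` and `sample_links` are names made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML modeled on the search-results list shown above
sample_html = """
<ul class="item-list search-results-list">
  <h1 class="item-title"><a href="http://example.com/release-1">First release</a></h1>
  <h1 class="item-title"><a href="http://example.com/release-2">Second release</a></h1>
</ul>
<a href="http://example.com/unrelated">Link outside the list</a>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# ".search-results-list a" matches every <a> nested anywhere inside
# an element with the class "search-results-list"
sample_links = [a["href"] for a in sample_soup.select(".search-results-list a")]
print(sample_links)
```

The link outside the list is skipped because it has no ancestor with the `search-results-list` class.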

## Part 2: Find the date of each survey and the interest rates

Let's start with the first press release only.

In [14]:
links[0]

'http://www.mba.org/2021-press-releases/june/mortgage-applications-decrease-in-latest-mba-weekly-survey-x281049'

In [16]:
url = links[0]

headers = {'name': "Sharon Lurye", 'email': "sharonrlurye@gmail.com"}

response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.text, "html.parser")


In [18]:
#Find the title of the press release
soup.find('h1', attrs={'class':'title-primary'}).text.strip()

'Mortgage Applications Decrease in Latest MBA Weekly Survey'

In [19]:
#Find the first three <p> tags (the first one is empty, so this shows the first two paragraphs)
soup.find_all('p')[0:3]

[<p></p>,
 <p><strong></strong><strong>WASHINGTON, D.C. (June 30, 2021)</strong> - <strong>Mortgage applications decreased 6.9 percent from one week earlier, </strong>according to data from the Mortgage Bankers Association's (MBA) Weekly Mortgage Applications Survey for the week ending June 25, 2021. </p>,
 <p>The Market Composite Index, a measure of mortgage loan application volume, decreased 6.9 percent on a seasonally adjusted basis from one week earlier. On an unadjusted basis, the Index decreased 7 percent compared with the previous week. The Refinance Index decreased 8 percent from the previous week and was 15 percent lower than the same week one year ago. The seasonally adjusted Purchase Index decreased 5 percent from one week earlier. The unadjusted Purchase Index decreased 6 percent compared with the previous week and was 17 percent lower than the same week one year ago.  </p>]

In [21]:
#Find the text of the first non-empty paragraph
soup.find_all('p')[1].text.strip()

"WASHINGTON, D.C. (June 30, 2021) -\xa0Mortgage applications decreased 6.9 percent from one week earlier, according to data from the Mortgage Bankers Association's (MBA) Weekly Mortgage Applications Survey for the week ending June 25, 2021."

In [28]:
#https://www.programiz.com/python-programming/regex#python-regex
#https://stackoverflow.com/questions/35413746/regex-to-match-date-like-month-name-day-comma-and-year/35413952
#https://stackoverflow.com/questions/59219456/extract-month-names-and-date-numbers-from-a-raw-string-using-regex-edit-new-te

import re

string = soup.find_all('p')[1].text
pattern = r"ending (\w+ \d+, \d\d\d\d)"

#I want it to return June 25 (the date the survey week ended) but NOT June 30 (the date the press release came out), so I anchor the pattern on the word "ending". The capture group holds only the date, so strip() is just a safeguard against stray whitespace.

result = re.findall(pattern, string)
print(result[0].strip())

June 25, 2021
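
Once the date is captured as text, it can be turned into a real date object, which makes sorting and filtering easier later. This is a minimal sketch, assuming the dates always follow the "Month D, YYYY" format seen above. `sample_text` is copied from the paragraph printed earlier, and the variable names are illustrative.

```python
import re
from datetime import datetime

sample_text = ("WASHINGTON, D.C. (June 30, 2021) - Mortgage applications decreased "
               "6.9 percent from one week earlier, according to data from the Mortgage "
               "Bankers Association's (MBA) Weekly Mortgage Applications Survey for the "
               "week ending June 25, 2021.")

# Anchor on "ending" so the press release date in parentheses is skipped
match = re.search(r"ending (\w+ \d{1,2}, \d{4})", sample_text)

# %B = full month name, %d = day of month, %Y = four-digit year
survey_date = datetime.strptime(match.group(1), "%B %d, %Y").date()
print(survey_date)
```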


In [29]:
#Print all paragraphs that mention interest rates
#https://stackabuse.com/python-check-if-string-contains-substring 

string = "interest rate"

#Loop through every paragraph and print the ones that contain the search string
for paragraph in soup.find_all('p'):
    if string in paragraph.text:
        print(paragraph.text, "\n")

The average contract interest rate for 30-year fixed-rate mortgages with conforming loan balances ($548,250 or less) increased to 3.20 percent from 3.18 percent, with points decreasing to 0.39 from 0.48 (including the origination fee) for 80 percent loan-to-value ratio (LTV) loans. The effective rate remained unchanged from last week. 

The average contract interest rate for 30-year fixed-rate mortgages with jumbo loan balances (greater than $548,250) decreased to 3.23 percent from 3.26 percent, with points decreasing to 0.33 from 0.44 (including the origination fee) for 80 percent LTV loans. The effective rate decreased from last week.   The average contract interest rate for 30-year fixed-rate mortgages backed by the FHA decreased to 3.19 percent from 3.21 percent, with points remaining unchanged at 0.34 (including the origination fee) for 80 percent LTV loans. The effective rate decreased from last week.   The average contract interest rate for 15-year fixed-rate mortgages decreased

In [30]:
#Print contract interest rate for 30-year fixed-rate mortgages under $548,250

string = "The average contract interest rate for 30-year fixed-rate mortgages with conforming loan balances"
pattern = r"(\d\.\d\d) percent"

#Find the matching paragraph and print the first rate mentioned in it
for paragraph in soup.find_all('p'):
    if string in paragraph.text:
        print(paragraph.text)
        result = re.findall(pattern, paragraph.text)
        print(result[0])

The average contract interest rate for 30-year fixed-rate mortgages with conforming loan balances ($548,250 or less) increased to 3.20 percent from 3.18 percent, with points decreasing to 0.39 from 0.48 (including the origination fee) for 80 percent loan-to-value ratio (LTV) loans. The effective rate remained unchanged from last week.
3.20


In [31]:
#Print contract interest rate for 5/1 ARMs

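string = "The average contract interest rate for 5/1 ARMs"
#This paragraph lists several loan types, so the first "x.xx percent" in it is the jumbo rate.
#Anchoring the pattern on the loan type picks out the ARM rate instead.
pattern = r"interest rate for 5/1 ARMs .*?(\d\.\d\d) percent"

#Find the matching paragraph and print the ARM rate
for paragraph in soup.find_all('p'):
    if string in paragraph.text:
        print(paragraph.text)
        result = re.findall(pattern, paragraph.text)
        print(result[0])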

The average contract interest rate for 30-year fixed-rate mortgages with jumbo loan balances (greater than $548,250) decreased to 3.23 percent from 3.26 percent, with points decreasing to 0.33 from 0.44 (including the origination fee) for 80 percent LTV loans. The effective rate decreased from last week.   The average contract interest rate for 30-year fixed-rate mortgages backed by the FHA decreased to 3.19 percent from 3.21 percent, with points remaining unchanged at 0.34 (including the origination fee) for 80 percent LTV loans. The effective rate decreased from last week.   The average contract interest rate for 15-year fixed-rate mortgages decreased to 2.56 percent from 2.58 percent, with points decreasing to 0.37 from 0.39 (including the origination fee) for 80 percent LTV loans. The effective rate decreased from last week.   The average contract interest rate for 5/1 ARMs increased to 2.98 percent from 2.69 percent, with points decreasing to 0.23 from 0.26 (including the originat
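
Because this paragraph bundles several loan types together, the first rate in the paragraph is the jumbo rate, not the ARM rate. One way to target a specific loan type is to anchor the pattern on its label. This is a minimal sketch, where `sample_paragraph` is an abridged copy of the text above and `first_rate_after` is a hypothetical helper name.

```python
import re

sample_paragraph = (
    "The average contract interest rate for 30-year fixed-rate mortgages with jumbo "
    "loan balances (greater than $548,250) decreased to 3.23 percent from 3.26 percent. "
    "The average contract interest rate for 5/1 ARMs increased to 2.98 percent "
    "from 2.69 percent."
)

def first_rate_after(label, text):
    """Return the first 'x.xx percent' value that follows label, or None."""
    # re.escape guards against special characters in the label;
    # .*? is lazy, so it stops at the nearest rate after the label
    match = re.search(re.escape(label) + r".*?(\d\.\d\d) percent", text)
    return match.group(1) if match else None

# First rate anywhere in the paragraph (the jumbo rate)
first_in_paragraph = re.findall(r"(\d\.\d\d) percent", sample_paragraph)[0]

# First rate after the ARM label
arm_rate = first_rate_after("interest rate for 5/1 ARMs", sample_paragraph)

print(first_in_paragraph, arm_rate)
```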

## Create a dataframe with the first 9 links

In [33]:
import pandas as pd

df = pd.DataFrame(links[:9], columns=['url'])
df

Unnamed: 0,url
0,http://www.mba.org/2021-press-releases/june/mo...
1,http://www.mba.org/2021-press-releases/june/mo...
2,http://www.mba.org/2021-press-releases/june/mo...
3,http://www.mba.org/2021-press-releases/june/mo...
4,http://www.mba.org/2021-press-releases/june/mo...
5,http://www.mba.org/2021-press-releases/may/mor...
6,http://www.mba.org/2021-press-releases/may/mor...
7,http://www.mba.org/2021-press-releases/may/mor...
8,http://www.mba.org/2021-press-releases/may/mor...


In [34]:
df['page_contents'] = df.url.apply(lambda url: requests.get(url).text)
df

Unnamed: 0,url,page_contents
0,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
5,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
6,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
7,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
8,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""..."
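
Fetching every page inside a `lambda` works for nine links, but for a longer list it can help to pause between requests and to keep going when one request fails. Here is a rough sketch under those assumptions. `fetch_pages` is a hypothetical helper, and its `get_text` argument stands in for whatever function actually downloads a page.

```python
import time

def fetch_pages(urls, get_text, delay=1.0):
    """Download each URL in turn, pausing between requests.

    get_text is any function that takes a URL and returns the page text.
    Pages that raise an exception are stored as None so one bad link
    does not stop the whole run.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # be polite: wait between requests
        try:
            pages.append(get_text(url))
        except Exception as error:
            print(f"Could not fetch {url}: {error}")
            pages.append(None)
    return pages

# With requests, the call might look like this (not run here):
# df['page_contents'] = fetch_pages(
#     df.url, lambda url: requests.get(url, headers=headers, timeout=30).text)
```

Passing the download function in as an argument keeps the helper easy to try out with a stand-in function before pointing it at the real site.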


In [36]:
df['title'] = df.page_contents.apply(lambda contents: BeautifulSoup(contents, 'html.parser').select_one("h1.title-primary").text.strip())
df

Unnamed: 0,url,page_contents,title
0,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...
5,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...
6,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...
7,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...
8,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...


In [37]:
#expand=False returns a single Series instead of a one-column dataframe
df['survey_date'] = df.page_contents.str.extract(r"ending (\w+ \d+, \d\d\d\d)", expand=False)
df

Unnamed: 0,url,page_contents,title,survey_date
0,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 25, 2021"
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 18, 2021"
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 11, 2021"
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 4, 2021"
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 28, 2021"
5,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 21, 2021"
6,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 14, 2021"
7,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 7, 2021"
8,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"April 30, 2021"
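
The extracted dates are still plain strings. If they need to be sorted or plotted, they can be converted with `pd.to_datetime`. This is a minimal sketch on a small stand-in dataframe (`sample_df` is hypothetical and holds three of the dates shown above).

```python
import pandas as pd

sample_df = pd.DataFrame({"survey_date": ["June 25, 2021", "May 7, 2021", "April 30, 2021"]})

# format spells out the layout: full month name, day, four-digit year.
# errors="coerce" turns anything that does not parse into NaT instead of raising.
sample_df["survey_date_parsed"] = pd.to_datetime(
    sample_df["survey_date"], format="%B %d, %Y", errors="coerce")

print(sample_df.sort_values("survey_date_parsed"))
```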


In [38]:
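#First attempt. Note that .* is greedy, so this captures the LAST "x.xx percent" on the line
#(often the previous week's rate) rather than the first. The final code uses the lazy .*? instead.
pattern = r"interest rate for 30-year fixed-rate mortgages with conforming .* (\d\.\d\d) percent"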

df['fixed_rates'] = df.page_contents.str.extract(pattern)

df

Unnamed: 0,url,page_contents,title,survey_date,fixed_rates
0,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 25, 2021",3.18
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 18, 2021",3.11
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 11, 2021",3.15
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 4, 2021",3.17
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 28, 2021",2.53
5,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 21, 2021",3.15
6,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 14, 2021",3.11
7,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 7, 2021",3.18
8,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"April 30, 2021",3.17
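
Some values in `fixed_rates` do not match the press releases. The first row shows 3.18, but the June 25 release quoted earlier says the rate increased to 3.20 percent from 3.18 percent. The likely cause is that `.*` is greedy: it runs as far along the line as it can before backtracking. Here is a small sketch comparing greedy and lazy matching on that sentence (the variable names are illustrative).

```python
import re

sentence = ("The average contract interest rate for 30-year fixed-rate mortgages with "
            "conforming loan balances ($548,250 or less) increased to 3.20 percent "
            "from 3.18 percent, with points decreasing to 0.39 from 0.48")

prefix = "interest rate for 30-year fixed-rate mortgages with conforming "

# Greedy: .* grabs as much as possible, so the LAST "x.xx percent" is captured
greedy = re.search(prefix + r".* (\d\.\d\d) percent", sentence).group(1)

# Lazy: .*? grabs as little as possible, so the FIRST "x.xx percent" is captured
lazy = re.search(prefix + r".*?(\d\.\d\d) percent", sentence).group(1)

print(greedy, lazy)
```

The greedy version returns the previous week's rate and the lazy version returns the current one.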


In [39]:
#Same greedy .* as above, so this can also pick up the previous week's rate
pattern = r"interest rate for 5/1 ARMs .* (\d\.\d\d) percent"

df['arms_rates'] = df.page_contents.str.extract(pattern)

df

Unnamed: 0,url,page_contents,title,survey_date,fixed_rates,arms_rates
0,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 25, 2021",3.18,2.69
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 18, 2021",3.11,2.69
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 11, 2021",3.15,2.54
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 4, 2021",3.17,2.54
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 28, 2021",2.53,2.81
5,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"May 21, 2021",3.15,2.58
6,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 14, 2021",3.11,2.57
7,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"May 7, 2021",3.18,2.76
8,http://www.mba.org/2021-press-releases/may/mor...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"April 30, 2021",3.17,2.59
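
`str.extract` returns strings, so the rate columns need converting before any arithmetic. This is a minimal sketch with `pd.to_numeric` on a stand-in Series (the values are the first three `arms_rates` shown above plus one missing entry, and the names are illustrative).

```python
import pandas as pd

sample_rates = pd.Series(["2.69", "2.69", "2.54", None], name="arms_rates")

# errors="coerce" converts unparseable or missing entries to NaN
numeric_rates = pd.to_numeric(sample_rates, errors="coerce")

print(numeric_rates.dtype)
print(numeric_rates.mean())  # NaN values are skipped by default
```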


# Final, complete code

Here is the full code for:

- Getting the links
- Using regular expressions to extract the datapoints we want 
- Creating a dataframe
- Checking that the scrape was accurate
- Exporting to CSV

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
url = 'https://www.mba.org/news-research-and-resources/research-and-economics/single-family-research/weekly-applications-survey/research-and-economics-all-news-about-mbas-weekly-applications-survey'

headers = {'name': "Sharon Lurye", 
           'email': "sharonrlurye@gmail.com",
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}

response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.text, 'html.parser')
links = soup.select(".search-results-list a")
links = [link['href'] for link in links]
len(links)

249

In [3]:
df = pd.DataFrame(links, columns=['url'])
df['page_contents'] = df.url.apply(lambda url: requests.get(url).text)

In [4]:
df['title'] = df.page_contents.apply(lambda contents: BeautifulSoup(contents, 'html.parser').select_one("h1.title-primary").text.strip())

#Alternative regex with the same results: \w+ \d{1,2}, \d{4}
#expand=False returns a single Series instead of a one-column dataframe
df['survey_date'] = df.page_contents.str.extract(r"ending (\w+ \d+, \d\d\d\d)", expand=False)

#Originally made the mistake of writing 'percent from', which caused it to pick up much more text
#Can also write \d\.\d{2}
#,? means the comma is optional; it matches at most one comma
pattern = r"interest rate for 30-year,? fixed-rate mortgages with conforming .*?(\d\.\d\d) percent"
df['fixed_rates'] = df.page_contents.str.extract(pattern, expand=False)

#Originally made the mistake of writing "to (\d.\d\d)" and "percent from", which caused it to miss text
pattern = r"interest rate for 5/1 ARMs .*?(\d\.\d\d) percent"
df['arms_rates'] = df.page_contents.str.extract(pattern, expand=False)

#Also keep the full matched text so the extracted rates can be checked by eye
pattern = r"(interest rate for 30-year,? fixed-rate mortgages with conforming .*?\d\.\d{2} percent)"
df['fixed_rates_full'] = df.page_contents.str.extract(pattern, expand=False)

pattern = r"(interest rate for 5/1 ARMs .*? \d\.\d{2} percent)"
df['arms_rates_full'] = df.page_contents.str.extract(pattern, expand=False)

df.head()

Unnamed: 0,url,page_contents,title,survey_date,fixed_rates,arms_rates,fixed_rates_full,arms_rates_full
0,http://www.mba.org/2021-press-releases/july/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"July 2, 2021",3.15,2.94,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs decreased to 2.94 p...
1,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 25, 2021",3.2,2.98,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs increased to 2.98 p...
2,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 18, 2021",3.18,2.69,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs remained unchanged ...
3,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Increase in Latest MBA W...,"June 11, 2021",3.11,2.69,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs increased to 2.69 p...
4,http://www.mba.org/2021-press-releases/june/mo...,"<!DOCTYPE html>\r\n\r\n<html lang=""en"" class=""...",Mortgage Applications Decrease in Latest MBA W...,"June 4, 2021",3.15,2.54,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs remained unchanged ...


In [8]:
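#drop() returns a new dataframe, so df keeps its page_contents column
#(df2 = df would only create a second name for the same dataframe)
df2 = df.drop(columns=['page_contents'])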

In [9]:
df2.head()

Unnamed: 0,url,title,survey_date,fixed_rates,arms_rates,fixed_rates_full,arms_rates_full
0,http://www.mba.org/2021-press-releases/july/mo...,Mortgage Applications Decrease in Latest MBA W...,"July 2, 2021",3.15,2.94,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs decreased to 2.94 p...
1,http://www.mba.org/2021-press-releases/june/mo...,Mortgage Applications Decrease in Latest MBA W...,"June 25, 2021",3.2,2.98,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs increased to 2.98 p...
2,http://www.mba.org/2021-press-releases/june/mo...,Mortgage Applications Increase in Latest MBA W...,"June 18, 2021",3.18,2.69,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs remained unchanged ...
3,http://www.mba.org/2021-press-releases/june/mo...,Mortgage Applications Increase in Latest MBA W...,"June 11, 2021",3.11,2.69,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs increased to 2.69 p...
4,http://www.mba.org/2021-press-releases/june/mo...,Mortgage Applications Decrease in Latest MBA W...,"June 4, 2021",3.15,2.54,interest rate for 30-year fixed-rate mortgages...,interest rate for 5/1 ARMs remained unchanged ...


In [12]:
#Look at a random sample of 5% of the dataset to check accuracy
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

pd.set_option("max_colwidth", 250)

df2.sample(frac=0.05)

Unnamed: 0,url,title,survey_date,fixed_rates,arms_rates,fixed_rates_full,arms_rates_full
223,http://www.mba.org/2016-press-releases/june/rates-drop-refi-apps-jump-in-latest-mba-weekly-survey,"Rates Drop, Refi Apps Jump in Latest MBA Weekly Survey","June 17, 2016",3.76,2.92,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($417,000 or less) decreased to its lowest level since May 2013, 3.76 percent",interest rate for 5/1 ARMs increased to 2.92 percent
125,http://www.mba.org/2019-press-releases/january/mortgage-applications-rebound-in-latest-mba-weekly-survey,Mortgage Applications Rebound in Latest MBA Weekly Survey,"January 4, 2019",4.74,4.05,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($484,350 or less) decreased to its lowest level since April 2018, 4.74 percent","interest rate for 5/1 ARMs decreased to its lowest level since August 2018, 4.05 percent"
182,http://www.mba.org/2017-press-releases/october/mortgage-applications-increase-in-latest-mba-weekly-survey,Mortgage Applications Increase in Latest MBA Weekly Survey,"October 13, 2017",4.14,3.31,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($424,100 or less) decreased to 4.14 percent",interest rate for 5/1 ARMs decreased to 3.31 percent
91,http://www.mba.org/2019-press-releases/september/mortgage-applications-flat-in-latest-mba-weekly-survey,Mortgage Applications Flat in Latest MBA Weekly Survey,"September 13, 2019",4.01,3.54,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($484,350 or less) increased to 4.01 percent",interest rate for 5/1 ARMs increased to 3.54 percent
122,http://www.mba.org/2019-press-releases/february/mortgage-applications-decline-in-latest-mba-weekly-survey,Mortgage Applications Decline in Latest MBA Weekly Survey,"February 8, 2019",4.65,3.97,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($484,350 or less) decreased to 4.65 percent",interest rate for 5/1 ARMs decreased to 3.97 percent
234,http://www.mba.org/2016-press-releases/april/refinance-applications-drive-increase-in-latest-mba-weekly-survey,Refinance Applications Drive Increase in Latest MBA Weekly Survey,"April 1, 2016",3.86,2.94,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($417,000 or less) decreased to 3.86 percent",interest rate for 5/1 ARMs decreased to 2.94 percent
20,http://www.mba.org/2021-press-releases/february/mortgage-applications-decrease-in-latest-mba-weekly-survey,Mortgage Applications Decrease in Latest MBA Weekly Survey,"February 5, 2021",2.96,2.92,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($548,250 or less) increased to 2.96 percent",interest rate for 5/1 ARMs increased to 2.92 percent
44,http://www.mba.org/2020-press-releases/august/mortgage-applications-decrease-in-latest-mba-weekly-survey-x271818,Mortgage Applications Decrease in Latest MBA Weekly Survey,"August 14, 2020",3.13,2.95,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($510,400 or less) increased to 3.13 percent",interest rate for 5/1 ARMs decreased to 2.95 percent
150,http://www.mba.org/2018-press-releases/june/mortgage-applications-increase-in-latest-mba-weekly-survey-x234290,Mortgage Applications Increase in Latest MBA Weekly Survey,"June 1, 2018",4.75,4.08,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($453,100 or less) decreased to 4.75 percent",interest rate for 5/1 ARMs decreased to 4.08 percent
50,http://www.mba.org/2020-press-releases/july/mortgage-applications-increase-in-latest-mba-weekly-survey,Mortgage Applications Increase in Latest MBA Weekly Survey,"July 3, 2020",3.26,2.98,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($510,400 or less) decreased to 3.26 percent",interest rate for 5/1 ARMs decreased to 2.98 percent
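
Besides reading a sample by eye, part of this check can be automated: each extracted rate should appear at the end of its matching `_full` string. Here is a rough sketch on a two-row stand-in dataframe, with rows copied from the sample above (`check_df` and the `rate_matches` column are hypothetical names).

```python
import pandas as pd

check_df = pd.DataFrame({
    "fixed_rates": ["4.14", "4.01"],
    "fixed_rates_full": [
        "interest rate for 30-year fixed-rate mortgages with conforming loan balances "
        "($424,100 or less) decreased to 4.14 percent",
        "interest rate for 30-year fixed-rate mortgages with conforming loan balances "
        "($484,350 or less) increased to 4.01 percent",
    ],
})

# True when the full matched text ends with "<rate> percent".
# Rows with missing values would need to be dropped first (for example with dropna).
check_df["rate_matches"] = [
    full.endswith(f"{rate} percent")
    for rate, full in zip(check_df["fixed_rates"], check_df["fixed_rates_full"])
]

print(check_df["rate_matches"].all())
```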


In [13]:
#Check for NAs
df2[df2.isnull().any(axis=1)]

#Most NAs were for genuinely missing data (i.e., the press release did not include that information) or for press releases that are not weekly survey results.
#The regex failed to pick up the fixed rate for the week ending 1/29/16 (a release that also appears twice in the list for some reason)

Unnamed: 0,url,title,survey_date,fixed_rates,arms_rates,fixed_rates_full,arms_rates_full
27,http://www.mba.org/2020-press-releases/december/mortgage-applications-increase-in-latest-mba-weekly-survey,Mortgage Applications Increase in Latest MBA Weekly Survey,"December 11, 2020",2.85,,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($510,400 or less) decreased to a survey low of 2.85 percent",
36,http://www.mba.org/2020-press-releases/october/mortgage-applications-decrease-in-latest-mba-weekly-survey,Mortgage Applications Decrease in Latest MBA Weekly Survey,"October 9, 2020",3.0,,"interest rate for 30-year fixed-rate mortgages with conforming loan balances ($510,400 or less) decreased to 3.00 percent",
155,http://www.mba.org/2018-press-releases/may/mortgage-credit-availability-unchanged-in-april,Mortgage Credit Availability Unchanged in April,,,,,
186,http://www.mba.org/2017-press-releases/september/mismo-seeks-input-on-online-notary-standards,MISMO Seeks Input on Online Notary Standards,,,,,
240,http://www.mba.org/2016-press-releases/feb/mortgage-applications-decrease-in-latest-mba-weekly-survey,Mortgage Applications Decrease in Latest MBA Weekly Survey,"January 29, 2016",,3.0,,"interest rate for 5/1 ARMs <a name=""MBA51A_RateDir""></a>decreased to 3.00 percent"
243,http://www.mba.org/2016-press-releases/feb/mortgage-applications-decrease-in-latest-mba-weekly-survey-x132879,Mortgage Applications Decrease in Latest MBA Weekly Survey,"January 29, 2016",,3.0,,"interest rate for 5/1 ARMs <a name=""MBA51A_RateDir""></a>decreased to 3.00 percent"
247,http://www.mba.org/2016-press-releases/january/mortgage-credit-availability-decreased-in-december,Mortgage Credit Availability Decreased in December,,,,,
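
The January 29, 2016 rows hint at why a pattern can miss: the raw HTML has anchor tags such as `<a name="MBA51A_RateDir"></a>` embedded in the sentence. One possible fix, sketched here and not tested against the real pages, is to run the regex on the visible text instead of the raw HTML. The `raw_html` string below is a made-up example (the position of the tag and the `Example` anchor name are assumptions), not the actual page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML: the anchor tag is placed so that it splits the phrase
# the pattern looks for (the real page's markup may differ)
raw_html = ('<p>The average contract interest rate for 5/1 '
            '<a name="Example"></a>ARMs decreased to 3.00 percent</p>')

pattern = r"interest rate for 5/1 ARMs .*?(\d\.\d\d) percent"

# On the raw HTML the tag interrupts the phrase, so there is no match
raw_match = re.search(pattern, raw_html)
print(raw_match)

# get_text() drops the tags and keeps only the visible text
visible_text = BeautifulSoup(raw_html, "html.parser").get_text()
text_match = re.search(pattern, visible_text)
print(text_match.group(1))
```

In the full script, this could mean adding a column of visible text built with `get_text()` and running the `str.extract` calls on that column.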


In [14]:
df2.to_csv("mortgage_data_2.csv")