# Problem #1 - Reverse Engineer a Publicly Available Index

## By: Matthew Pribadi

Citations: 

### Objective: 

- The goal of this project is to reverse engineer the calculation behind a publicly available index that indicates, based on various factors, whether a person should rent or buy a property.
<br><br>

### Project Summary
#### *Outcome Variable*
In this project...
<br><br>

### Terminology:
- The **FHFA House Price Index (HPI)** is a broad measure of the movement of single-family house prices.  The HPI is a weighted, repeat-sales index, meaning that it measures average price changes in repeat sales or refinancings on the same properties. This information is obtained by reviewing repeat mortgage transactions on single-family properties whose mortgages have been purchased or securitized by Fannie Mae or Freddie Mac since January 1975.
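To make the repeat-sales idea concrete, here is a deliberately simplified sketch. It is not FHFA's actual estimator, which fits a weighted regression over all transaction pairs; it only averages the log price change across pairs of sales of the same property. The property IDs, prices, and the `repeat_sales_growth` helper are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical repeat-sale pairs: each row is one property that sold twice.
pairs = pd.DataFrame({
    "property_id": [1, 2, 3],
    "first_price": [100_000.0, 200_000.0, 150_000.0],
    "second_price": [110_000.0, 220_000.0, 165_000.0],
})

def repeat_sales_growth(df):
    """Unweighted average price growth across repeat-sale pairs (illustrative only)."""
    log_change = np.log(df["second_price"] / df["first_price"])
    # Average in log space, then convert back to a percentage-style growth rate.
    return float(np.exp(log_change.mean()) - 1.0)

growth = repeat_sales_growth(pairs)
print(f"Average growth between sales: {growth:.1%}")
```

Because only properties that sold more than once enter the calculation, differences in the mix of homes sold in each period have less influence than they would in a simple average of sale prices.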

#### Assumptions/Limitations
- To merge the datasets, I take yearly averages of every dataset. This is a major limitation because it discards seasonality; with more time, I would add seasonal detail.
- The paper's methodology uses bi-annual mortgage rate averages. To save time, I use annual average mortgage rates instead.
- I am using Zillow data, which only starts in the year 2000.
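
The yearly-averaging assumption can be sketched on toy data. The frames, column names, and the `yearly_mean` helper below are hypothetical and the numbers are invented; the point is only to show how series sampled at different frequencies line up once each is collapsed to one value per calendar year.

```python
import pandas as pd

# Hypothetical home values and mortgage rates sampled on different dates.
home_values = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-31", "2000-07-31", "2001-01-31", "2001-07-31"]),
    "value": [100.0, 110.0, 120.0, 140.0],
})
rates = pd.DataFrame({
    "date": pd.to_datetime(["2000-03-03", "2000-09-01", "2001-03-02", "2001-09-07"]),
    "rate": [8.0, 7.0, 7.0, 6.0],
})

def yearly_mean(df, column):
    """Collapse a dated series to one mean value per calendar year."""
    return (
        df.groupby(df["date"].dt.year)[column]
        .mean()
        .rename_axis("year")
        .reset_index()
    )

# After averaging, both frames share a 'year' key and can be merged directly.
merged = yearly_mean(home_values, "value").merge(yearly_mean(rates, "rate"), on="year")
print(merged)
```

Grouping on the half-year instead of the year would be closer to the paper's bi-annual averages, at the cost of a slightly more involved merge key.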

## Import packages

In [None]:
# general
import io
import os
import requests
import warnings
import re

# file handling
from requests.auth import HTTPBasicAuth
from zipfile import ZipFile

# Data Science Libraries
import datetime
import numpy as np
import pandas as pd


# General plotting
import matplotlib.pyplot as plt
import pydot
import seaborn as sns

# Specialized plotting
from IPython.display import Image
from matplotlib.colors import ListedColormap
from mlxtend.plotting import scatterplotmatrix



%matplotlib inline
sns.set_theme(color_codes=True)

# pd.set_option('display.max_rows', None)
# pd.set_option('display.max_columns', None)

## Functions for Data Cleaning & Processing

## Read Data, Cleaning and Preprocessing 

In [None]:
url_zillow = '../input/buy-rent-index/County_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv'
url_HPI = '../input/buy-rent-index/HPI_master.csv'
url_rates = '../input/buy-rent-index/MORTGAGE30US.csv'


In [None]:
df_zillow = pd.read_csv(url_zillow)
df_HPI = pd.read_csv(url_HPI)

parse_dates = ['DATE']
df_rates = pd.read_csv(url_rates, parse_dates=parse_dates)


### Clean up Zillow Data
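
The Zillow file is wide: one row per region and one column per month. The cells in this section transpose the date columns and average them by year. As a minimal sketch of that reshaping, assuming a made-up table with a `Metro` column and three month-end columns:

```python
import pandas as pd

# Hypothetical wide table: one row per region, one column per month-end date.
wide = pd.DataFrame({
    "Metro": ["A", "B"],
    "2000-01-31": [100.0, 200.0],
    "2000-02-29": [110.0, 220.0],
    "2001-01-31": [120.0, 240.0],
})

monthly = wide.set_index("Metro")
monthly.columns = pd.to_datetime(monthly.columns)

# Transpose so dates become rows, average within each year, then transpose back.
yearly = monthly.T.groupby(monthly.columns.year).mean().T
print(yearly)
```

Keeping the region label as the index during the reshaping means the yearly columns stay attached to the right region, with no need to re-align rows by position afterwards.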

In [None]:
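# Note: dropna returns a new DataFrame and the result is not assigned,
# so this line leaves df_zillow unchanged
df_zillow.dropna(axis=1)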

# df_zillow.set_index(['Metro'], inplace = True)
df_zillow.head()

In [None]:
df_zillow_metro = df_zillow.iloc[:,[5,6]]

df_zillow_metro.head()

In [None]:
df_date_zillow = df_zillow.iloc[:,10:].transpose().reset_index().rename(columns={'index':'date'})
df_date_zillow.set_index(['date']).head()

In [None]:
df_date_zillow['date'] = pd.to_datetime(df_date_zillow['date'])

In [None]:
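# Average the monthly values within each year, then put regions back on the rows.
# (The original referenced df_agg_date_zillow inside its own definition, which
# raises a NameError on a fresh kernel.)
df_yearly_zillow = df_date_zillow.set_index('date')
df_agg_date_zillow = (df_yearly_zillow.groupby(df_yearly_zillow.index.year).mean()
                      .transpose()
                      .reset_index(drop=True))

df_agg_date_zillow.tail()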

In [None]:
df_cleaned_zillow = pd.concat([df_zillow_metro, df_agg_date_zillow], axis=1).dropna(axis=0)

In [None]:
df_cleaned_zillow.set_index(['State','Metro']).head()

### Clean HPI Data

In [None]:
df_red = df_HPI[df_HPI['yr']>=2000]

df_red.head()

In [None]:
# Yearly average of the non-seasonally-adjusted index for each place
df_yr_HPI = df_red.groupby(['place_name','yr']).agg({'index_nsa': 'mean'})

df_yr_HPI.head()

### Clean Mortgage Rates

In [None]:
df_rates.head()

In [None]:
# Average 30-year mortgage rate per year in the United States

df_avg_rates = df_rates.groupby(df_rates['DATE'].dt.year).agg({'MORTGAGE30US': 'mean'})

df_avg_rates.plot()