# Zillow000 - Introduction

________




In [87]:
import pandas as pd
import numpy as np
import datetime as dt
import warnings
warnings.filterwarnings('ignore')

In [91]:
# load file with custom plotly functions for time series,
# color scheme and pandas readers to concatenate series into tables
%run functions/nice_plot_pandas_getdata.ipynb

### Background
Homeownership rates increased almost steadily for 10 years, from 1994 to 2004. Over the following 12 years they declined, also almost steadily, and in April 2016 (2016 Q2) they reached 62.9%, the same level as in January 1965. A reversal then started: over the next 4 years homeownership rates rose by 2.4 percentage points, reaching levels above those seen before the crisis.

The obvious explanation for homeownership is housing affordability. Lower mortgage rates, cheaper houses and higher income should lead to higher homeownership rates, as they make housing more affordable. However, determining the causal effect of each of these variables on homeownership rates is difficult because the variables are deeply interconnected. Both income and mortgage rates affect house prices through demand. Mortgage rates and income are also highly correlated, since mortgage rates respond to a large extent to monetary policy, and monetary policy is used as a stimulus to economic activity. Income and mortgage rates are therefore themselves endogenous.

In order to circumvent the endogeneity problem, we explore a natural experiment to isolate the causal effect of house prices on homeownership.

In [53]:
s = ['Homeownership Rate for the United States [RHORUSQ156N]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
fig_homeownership = plot_nice(title='Homeownership Rate', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=110,), 
          show_series_label=False,
          recessions=True,
          y_units='%')

In [52]:
fig_homeownership.write_html("../images/jm01_homeownership.html")

# Affordability

**To be discussed:** median household income is an attractive series, but it is annual; there is a trade-off between using median income and using monthly or quarterly data.

**Adjusted the dates of all time series to match the middle of the time unit.**

**Note:** a date of 1 January means the 1st quarter; adding 45 days gives 15 February.

### load data

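A common way to measure affordability is a ratio that captures the median burden: 30% of median household income divided by the annual payment on a 30-year mortgage that finances the median house price at the prevailing mortgage rate:

$$\text{affordability} = 100 \times \frac{0.3 \times \text{median income}}{\text{PMT}(r,\ 30,\ \text{median price})}, \qquad \text{PMT}(r, n, P) = \frac{P\,r}{1-(1+r)^{-n}}$$

where $r$ is the annual mortgage rate. A value of 100 means the mortgage payment takes exactly 30% of median income. I make no adjustment to the amount financed, as there are offsetting items (transaction costs and insurance on one side, a smaller loan because households make a down payment on the other). The index does not measure exactly how burdened the median household is, but its variation over time is informative.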
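The ratio above can be checked with a small, self-contained sketch. It assumes annual payments on a 30-year fixed-rate loan with a positive rate (the same convention as `np.pmt(rate, 30, -price)` used later in the notebook); the helper names are hypothetical, not part of the notebook's function files. The inputs are the first and last rows of the quarterly table built below, so the outputs can be compared with the `aff_Census`, `aff_Zillow` and `aff_rental` columns.

```python
import numpy as np

def annual_mortgage_payment(rate_pct, price, years=30):
    """Level annual payment on a loan of `price` at annual rate `rate_pct` (in %).
    Same formula as np.pmt(rate, years, -price) with payments at period end.
    Assumes rate_pct > 0 (a zero rate would divide by zero)."""
    r = np.asarray(rate_pct, dtype=float) / 100.0
    price = np.asarray(price, dtype=float)
    return price * r / (1.0 - (1.0 + r) ** -years)

def affordability(income, rate_pct, price, years=30, share=0.3):
    """100 * (30% of median income) / annual mortgage payment on the median house.
    Values above 100 mean the payment fits within 30% of income."""
    return 100.0 * share * np.asarray(income, dtype=float) / annual_mortgage_payment(rate_pct, price, years)

def rental_affordability(income, monthly_rent, share=0.3):
    """Same idea for renters: 30% of annual income over 12 months of rent."""
    return 100.0 * share * np.asarray(income, dtype=float) / (12.0 * np.asarray(monthly_rent, dtype=float))

print(affordability(22415.00, 14.29, 80700.0))   # 1984-06-01, Census median price
print(affordability(62668.25, 4.43, 331800.0))   # 2018-03-01, Census median price
print(rental_affordability(62668.25, 1484.0))    # 2018-03-01, Zillow rent
```

Writing the annuity formula out explicitly also avoids depending on `np.pmt`, which is no longer part of recent NumPy releases.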

In this section I reconcile the differences between the series so that they have coherent dates and frequencies:
* homeownership rates and median sales prices of houses sold are quarterly;
* median income is annual;
* mortgage rates are weekly;
* annual and quarterly observations, when imported, are dated on the first day of the period.

In [7]:
s = ['Homeownership [RHORUSQ156N]',
    'Median house prices [MSPUS]',
    'Mortgage rates [MORTGAGE30US]',
    'Median income [MEHOINUSA646N]']
dic = dict_from_FRED_description(s)
df1 = make_hrizontal_tbl(dic, start=start)
df = df1.copy()

In [8]:
df1.describe()

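|       | RHORUSQ156N | MSPUS        | MORTGAGE30US | MEHOINUSA646N |
|-------|-------------|--------------|--------------|---------------|
| count | 221.0       | 229.0        | 2565.0       | 35.0          |
| mean  | 65.229412   | 135742.79476 | 7.961836     | 41430.4       |
| std   | 1.607226    | 94874.086932 | 3.219494     | 11454.184204  |
| min   | 62.9        | 17800.0      | 3.23         | 22415.0       |
| 25%   | 64.1        | 48800.0      | 5.69         | 30938.5       |
| 50%   | 64.7        | 120000.0     | 7.54         | 42228.0       |
| 75%   | 65.9        | 220900.0     | 9.72         | 50143.5       |
| max   | 69.2        | 337900.0     | 18.63        | 63179.0       |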


### set time span
As shown below, the median household income series begins in 1984, so I start all series at that date.

In [9]:
# first non-empty date of each series
for key in dic.keys():
    print(df.index[df[key].notna()][0], key)

1965-01-01 00:00:00 RHORUSQ156N
1963-01-01 00:00:00 MSPUS
1971-04-02 00:00:00 MORTGAGE30US
1984-01-01 00:00:00 MEHOINUSA646N

Before cutting the series, I impute missing values for mortgage rates. Note the date conventions of each series:
* mortgage rates are reported on the last day of the week;
* homeownership is quarterly and, when imported, is dated on the first day of the quarter (e.g., Q1 = 1-1-YYYY);
* income is annual and, when imported, is dated on the first day of the year.

**Temporary adjustment (to be corrected later):** by imputing before cutting the series, I don't lose the first mortgage value of 1984.

* impute the last observed mortgage rate into empty cells (later: use the mean mortgage rate instead)
* drop the dates in between
* shift annual income to the middle of the year, in several steps:
    - move income forward so that it matches the dates of the other quarterly data
    - then shift the index of all quarterly data
    - finally, impute income for each quarter


#### 1. shift income cells
An easy way to shift the income values, given that the number of cells between observations is not constant, is to extract the dates and the income column, add 3 months to the dates (to match the other quarterly data), and merge them back.
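A minimal sketch of this extract, shift and re-join step on a toy frame with the same layout as the FRED data (quarterly rows on the first day of each quarter, annual income on 1 January). It uses `pd.DateOffset(months=3)` for the calendar-month shift; the toy frame is illustrative, with homeownership values taken from the tables in this notebook.

```python
import pandas as pd

# quarterly series dated on the first day of each quarter
quarterly = pd.DataFrame(
    {"RHORUSQ156N": [64.6, 64.6, 64.6, 64.1]},
    index=pd.to_datetime(["1984-01-01", "1984-04-01", "1984-07-01", "1984-10-01"]),
)
# annual income dated on 1 January
income = pd.Series([22415.0], index=pd.to_datetime(["1984-01-01"]), name="MEHOINUSA646N")

# move each annual observation from 1 January to 1 April (the Q2 date)
income.index = income.index + pd.DateOffset(months=3)

# join back on the index: income lands on the Q2 row, other rows stay NaN
aligned = quarterly.join(income)
print(aligned)
```

Because the join is on dates rather than on row positions, it does not matter how many empty rows sit between two income observations.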

In [10]:
# extract dates and income
tbl = df.MEHOINUSA646N[df.MEHOINUSA646N.notna()].reset_index()
tbl.head(3)

|   | DATE       | MEHOINUSA646N |
|---|------------|---------------|
| 0 | 1984-01-01 | 22415.0       |
| 1 | 1985-01-01 | 23618.0       |
| 2 | 1986-01-01 | 24897.0       |


In [11]:
tbl.DATE = tbl.DATE + pd.DateOffset(months=3)  # 1 Jan -> 1 Apr, to match the Q2 date of the quarterly data
tbl.set_index('DATE', inplace = True)
tbl.head(3)

| DATE       | MEHOINUSA646N |
|------------|---------------|
| 1984-04-01 | 22415.0       |
| 1985-04-01 | 23618.0       |
| 1986-04-01 | 24897.0       |


In [12]:
df = df.drop(columns='MEHOINUSA646N')
df = df.join(tbl)
df[df.index>='1984-04-01'].head(3)

| DATE       | RHORUSQ156N | MSPUS   | MORTGAGE30US | MEHOINUSA646N |
|------------|-------------|---------|--------------|---------------|
| 1984-04-01 | 64.6        | 80700.0 | NaN          | 22415.0       |
| 1984-04-06 | NaN         | NaN     | 13.63        | NaN           |
| 1984-04-13 | NaN         | NaN     | 13.58        | NaN           |


#### 2. shift all quarterly data
The procedure used above for income is now applied to all quarterly data, so that dates move towards the middle of the quarter.
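The notes in this notebook hesitate between shifting quarter-start dates by 2 months and by 45 days. A small sketch comparing the two on the 1984 quarter starts:

```python
import pandas as pd

q_starts = pd.to_datetime(["1984-01-01", "1984-04-01", "1984-07-01", "1984-10-01"])

# option used in this notebook: shift by two calendar months
by_months = q_starts + pd.DateOffset(months=2)
# alternative from the notes: shift by 45 days (closer to the midpoint)
by_days = q_starts + pd.Timedelta(days=45)

for start, m, d in zip(q_starts, by_months, by_days):
    print(start.date(), "->", m.date(), "|", d.date())
```

The 45-day shift lands on a different day of the month each quarter (15 February, 16 May, ...), while the 2-month shift keeps every date on the first of a month. That matters later, because the Zillow monthly series are re-dated to the first of each month and merged on exact dates.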

In [13]:
# make a copy to use later for robustness check with zillow data.
df_full=df.copy()

In [14]:
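# split the data: originally quarterly series (plus income, already on a quarter date)
# vs. the weekly mortgage rate; the quarterly part is shifted by 2 months below
# TODO: would 45 days be better?
originally_quart = df.drop(columns='MORTGAGE30US')
originally_week = df['MORTGAGE30US'].to_frame().dropna()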

In [15]:
df[df.index>='1984-04-01'].head(3)

| DATE       | RHORUSQ156N | MSPUS   | MORTGAGE30US | MEHOINUSA646N |
|------------|-------------|---------|--------------|---------------|
| 1984-04-01 | 64.6        | 80700.0 | NaN          | 22415.0       |
| 1984-04-06 | NaN         | NaN     | 13.63        | NaN           |
| 1984-04-13 | NaN         | NaN     | 13.58        | NaN           |


In [16]:
originally_week[originally_week.index>='1984-04-01'].head(3)

| DATE       | MORTGAGE30US |
|------------|--------------|
| 1984-04-06 | 13.63        |
| 1984-04-13 | 13.58        |
| 1984-04-20 | 13.67        |


In [17]:
originally_quart[originally_quart.index>='1984-04-01'].head(3)

| DATE       | RHORUSQ156N | MSPUS   | MEHOINUSA646N |
|------------|-------------|---------|---------------|
| 1984-04-01 | 64.6        | 80700.0 | 22415.0       |
| 1984-04-06 | NaN         | NaN     | NaN           |
| 1984-04-13 | NaN         | NaN     | NaN           |


In [18]:
# alternative: originally_quart.index = originally_quart.index + dt.timedelta(days=45)
originally_quart.index = originally_quart.index + pd.DateOffset(months=2)  # quarter start -> first day of its last month (e.g. 1 Apr -> 1 Jun)

In [19]:
D = originally_quart.merge(originally_week, left_index=True, right_index=True, how='outer')
D[D.index>='1984-04-01'].head(3)

| DATE       | RHORUSQ156N | MSPUS | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|-------|---------------|--------------|
| 1984-04-03 | NaN         | NaN   | NaN           | NaN          |
| 1984-04-06 | NaN         | NaN   | NaN           | 13.63        |
| 1984-04-10 | NaN         | NaN   | NaN           | NaN          |


#### 3. impute mortgage rate

In [20]:
# see mortgage rates NAs 
#D[D.index >=max([D.index[D[key].notna()][0] for key in dic.keys()])].head(30)

In [21]:
# impute: carry the last observed weekly mortgage rate forward into empty cells
D['MORTGAGE30US'] = D['MORTGAGE30US'].ffill()

# show it works!
D[D.index >=max([D.index[D[key].notna()][0] for key in dic.keys()])].head(30)

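| DATE       | RHORUSQ156N | MSPUS   | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|---------|---------------|--------------|
| 1984-06-01 | 64.6        | 80700.0 | 22415.0       | 14.29        |
| 1984-06-06 | NaN         | NaN     | NaN           | 14.29        |
| 1984-06-08 | NaN         | NaN     | NaN           | 14.33        |
| 1984-06-13 | NaN         | NaN     | NaN           | 14.33        |
| 1984-06-15 | NaN         | NaN     | NaN           | 14.47        |
| 1984-06-20 | NaN         | NaN     | NaN           | 14.47        |
| 1984-06-22 | NaN         | NaN     | NaN           | 14.49        |
| 1984-06-27 | NaN         | NaN     | NaN           | 14.49        |
| 1984-06-29 | NaN         | NaN     | NaN           | 14.5         |
| 1984-07-04 | NaN         | NaN     | NaN           | 14.5         |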
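The outer merge, forward fill and row filter used here amount to an "as-of" join: each quarterly date takes the most recent weekly mortgage rate on or before it. A minimal sketch of the same alignment with `pd.merge_asof`, on illustrative numbers (not FRED values):

```python
import pandas as pd

# illustrative weekly rates and two quarterly observation dates
weekly = pd.DataFrame({
    "DATE": pd.to_datetime(["1984-05-25", "1984-06-08", "1984-08-31", "1984-09-07"]),
    "MORTGAGE30US": [14.0, 14.3, 14.4, 14.5],
})
quarterly = pd.DataFrame({
    "DATE": pd.to_datetime(["1984-06-01", "1984-09-01"]),
    "RHORUSQ156N": [64.6, 64.6],
})

# both frames must be sorted by the key; 'backward' picks the last weekly
# observation on or before each quarterly date
aligned = pd.merge_asof(quarterly, weekly, on="DATE", direction="backward")
print(aligned)
```

This returns one row per quarter directly, so there are no in-between weekly rows to drop afterwards.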


In [22]:
# income observations now fall on 1 June of every year
D[D.MEHOINUSA646N.notna()].head(3)

| DATE       | RHORUSQ156N | MSPUS   | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|---------|---------------|--------------|
| 1984-06-01 | 64.6        | 80700.0 | 22415.0       | 14.29        |
| 1985-06-01 | 64.1        | 84300.0 | 23618.0       | 12.71        |
| 1986-06-01 | 63.8        | 92100.0 | 24897.0       | 10.38        |


In [23]:
D[D.index>='1984-04-01'].head(3)

| DATE       | RHORUSQ156N | MSPUS | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|-------|---------------|--------------|
| 1984-04-03 | NaN         | NaN   | NaN           | 13.55        |
| 1984-04-06 | NaN         | NaN   | NaN           | 13.63        |
| 1984-04-10 | NaN         | NaN   | NaN           | 13.63        |


In [24]:
# drop dates that are not quarters
D = D[D.RHORUSQ156N.notna()]
D[D.index>='1984-03-01'].head(15)

| DATE       | RHORUSQ156N | MSPUS   | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|---------|---------------|--------------|
| 1984-03-01 | 64.6        | 78200.0 | NaN           | 13.25        |
| 1984-06-01 | 64.6        | 80700.0 | 22415.0       | 14.29        |
| 1984-09-01 | 64.6        | 81000.0 | NaN           | 14.38        |
| 1984-12-01 | 64.1        | 79900.0 | NaN           | 13.42        |
| 1985-03-01 | 64.1        | 82800.0 | NaN           | 13.02        |
| 1985-06-01 | 64.1        | 84300.0 | 23618.0       | 12.71        |
| 1985-09-01 | 63.9        | 83200.0 | NaN           | 12.11        |
| 1985-12-01 | 63.5        | 86800.0 | NaN           | 11.58        |
| 1986-03-01 | 63.6        | 88000.0 | NaN           | 10.51        |
| 1986-06-01 | 63.8        | 92100.0 | 24897.0       | 10.38        |


#### Shorten series

In [25]:
[D.index[D[key].notna()][-1] for key in dic.keys()]

[Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2018-06-01 00:00:00')]

In [26]:
# shorten series to the span where every column has data

# drop dates before the latest of the columns' first non-empty dates
D = D[D.index >=max([D.index[D[key].notna()][0] for key in dic.keys()])]

# drop dates after the earliest of the columns' last non-empty dates
D = D[D.index <=min([D.index[D[key].notna()][-1] for key in dic.keys()])]
D

| DATE       | RHORUSQ156N | MSPUS    | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|----------|---------------|--------------|
| 1984-06-01 | 64.6        | 80700.0  | 22415.0       | 14.29        |
| 1984-09-01 | 64.6        | 81000.0  | NaN           | 14.38        |
| 1984-12-01 | 64.1        | 79900.0  | NaN           | 13.42        |
| 1985-03-01 | 64.1        | 82800.0  | NaN           | 13.02        |
| 1985-06-01 | 64.1        | 84300.0  | 23618.0       | 12.71        |
| ...        | ...         | ...      | ...           | ...          |
| 2017-06-01 | 63.7        | 318200.0 | 61136.0       | 3.94         |
| 2017-09-01 | 63.9        | 320500.0 | NaN           | 3.82         |
| 2017-12-01 | 64.2        | 337900.0 | NaN           | 3.90         |
| 2018-03-01 | 64.2        | 331800.0 | NaN           | 4.43         |
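The same trimming can be written with pandas' `first_valid_index` / `last_valid_index`. A small sketch on a toy frame; `common_span` is a hypothetical helper, and it assumes no column is entirely empty (otherwise `first_valid_index()` returns `None`).

```python
import numpy as np
import pandas as pd

def common_span(df):
    """Trim df to the dates where every column has started and none has ended:
    from the latest first valid date to the earliest last valid date.
    Interior gaps are kept (they are interpolated in the next step)."""
    start = max(df[c].first_valid_index() for c in df.columns)
    end = min(df[c].last_valid_index() for c in df.columns)
    return df.loc[start:end]

toy = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, np.nan],
     "b": [np.nan, 5.0, np.nan, 6.0, 7.0]},
    index=pd.date_range("2000-01-01", periods=5, freq="QS"),
)
print(common_span(toy))
```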


### Interpolate missing data

In [27]:
D = D.interpolate(method='linear', limit_direction='forward', axis=0)
D  # the quarterly data are now complete

| DATE       | RHORUSQ156N | MSPUS    | MEHOINUSA646N | MORTGAGE30US |
|------------|-------------|----------|---------------|--------------|
| 1984-06-01 | 64.6        | 80700.0  | 22415.00      | 14.29        |
| 1984-09-01 | 64.6        | 81000.0  | 22715.75      | 14.38        |
| 1984-12-01 | 64.1        | 79900.0  | 23016.50      | 13.42        |
| 1985-03-01 | 64.1        | 82800.0  | 23317.25      | 13.02        |
| 1985-06-01 | 64.1        | 84300.0  | 23618.00      | 12.71        |
| ...        | ...         | ...      | ...           | ...          |
| 2017-06-01 | 63.7        | 318200.0 | 61136.00      | 3.94         |
| 2017-09-01 | 63.9        | 320500.0 | 61646.75      | 3.82         |
| 2017-12-01 | 64.2        | 337900.0 | 62157.50      | 3.90         |
| 2018-03-01 | 64.2        | 331800.0 | 62668.25      | 4.43         |
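`interpolate(method='linear')` ignores the dates and treats consecutive rows as equally spaced, which is why income rises by exactly (23618 - 22415) / 4 = 300.75 per quarter in the table above. A short sketch contrasting it with `method='time'`, which weights by the actual number of days between dates, on the first two annual income values:

```python
import pandas as pd

idx = pd.to_datetime(["1984-06-01", "1984-09-01", "1984-12-01", "1985-03-01", "1985-06-01"])
income = pd.Series([22415.0, None, None, None, 23618.0], index=idx)

# equal steps per row, regardless of the dates
by_position = income.interpolate(method="linear")
# steps proportional to the days elapsed (quarters are 90 to 92 days long)
by_time = income.interpolate(method="time")

print(pd.DataFrame({"linear": by_position, "time": by_time}))
```

With quarters of nearly equal length the two differ by only a few dollars, so the notebook's choice of `'linear'` is harmless here.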


**Future change:** dates represent months; change the date format to exclude days.

### Add zillow data

In [28]:
# import zillow data
zillow_p = pd.read_csv('../output/zillow_US_prices.csv', dtype={'DATE':object})
zillow_r = pd.read_csv('../output/zillow_US_rentals.csv', dtype={'DATE':object})

zillow_p.head(3)

|   | DATE       | price    |
|---|------------|----------|
| 0 | 1996-01-31 | 106884.0 |
| 1 | 1996-02-29 | 106911.0 |
| 2 | 1996-03-31 | 106962.0 |


In [29]:
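# Zillow dates are month ends (e.g. 1996-01-31): keep year-month, set the day to 1,
# and convert to datetime so the index matches the DatetimeIndex of D
zillow_p['DATE'] = pd.to_datetime(zillow_p['DATE'].str.extract(r'(\d+-\d+)', expand=False) + '-01')
zillow_p.set_index('DATE', inplace=True)
zillow_r['DATE'] = pd.to_datetime(zillow_r['DATE'].str.extract(r'(\d+-\d+)', expand=False) + '-01')
zillow_r.set_index('DATE', inplace=True)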
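The regex step and the "Future change" note above could both be handled with pandas monthly periods. A sketch under that assumption, using the first rows of `zillow_US_prices.csv` shown above and an illustrative quarterly value: option (a) reproduces the month-start timestamps without a regex; option (b) drops the day entirely, so month-end and first-of-month dates match as the same monthly period.

```python
import pandas as pd

zillow_p = pd.DataFrame({"DATE": ["1996-01-31", "1996-02-29", "1996-03-31"],
                         "price": [106884.0, 106911.0, 106962.0]})

# (a) month-start timestamps, equivalent to the regex + '-01' step
month_start = pd.to_datetime(zillow_p["DATE"]).dt.to_period("M").dt.to_timestamp()

# (b) monthly periods on both sides (illustrative quarterly value)
zillow_p.index = pd.to_datetime(zillow_p["DATE"]).dt.to_period("M")
quarterly = pd.DataFrame({"RHORUSQ156N": [65.1]}, index=pd.to_datetime(["1996-03-01"]))
quarterly.index = quarterly.index.to_period("M")
joined = quarterly.join(zillow_p["price"])
print(joined)
```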

In [30]:
D = D.merge(zillow_r, left_index=True, right_index=True, how='left')
D = D.merge(zillow_p, left_index=True, right_index=True, how='left')

In [31]:
D

| DATE       | RHORUSQ156N | MSPUS    | MEHOINUSA646N | MORTGAGE30US | rentals | price    |
|------------|-------------|----------|---------------|--------------|---------|----------|
| 1984-06-01 | 64.6        | 80700.0  | 22415.00      | 14.29        | NaN     | NaN      |
| 1984-09-01 | 64.6        | 81000.0  | 22715.75      | 14.38        | NaN     | NaN      |
| 1984-12-01 | 64.1        | 79900.0  | 23016.50      | 13.42        | NaN     | NaN      |
| 1985-03-01 | 64.1        | 82800.0  | 23317.25      | 13.02        | NaN     | NaN      |
| 1985-06-01 | 64.1        | 84300.0  | 23618.00      | 12.71        | NaN     | NaN      |
| ...        | ...         | ...      | ...           | ...          | ...     | ...      |
| 2017-06-01 | 63.7        | 318200.0 | 61136.00      | 3.94         | 1465.0  | 217122.0 |
| 2017-09-01 | 63.9        | 320500.0 | 61646.75      | 3.82         | 1464.0  | 220551.0 |
| 2017-12-01 | 64.2        | 337900.0 | 62157.50      | 3.90         | 1465.0  | 224343.0 |
| 2018-03-01 | 64.2        | 331800.0 | 62668.25      | 4.43         | 1484.0  | 228020.0 |


In [41]:
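# affordability index: 100 * 30% of median income / annual payment on a 30-year mortgage
# np.pmt was removed in NumPy 1.20; numpy_financial.pmt has the same signature
import numpy_financial as npf
D['aff_Census'] = 100 * 0.3 * D.MEHOINUSA646N / npf.pmt(D.MORTGAGE30US/100, 30, -D.MSPUS)
D['aff_Zillow'] = 100 * 0.3 * D.MEHOINUSA646N / npf.pmt(D.MORTGAGE30US/100, 30, -D.price)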

In [42]:
D['aff_rental'] = 100 * 0.3 * D.MEHOINUSA646N / (12 * D.rentals)  # rentals are monthly, income annual

In [43]:
D

| DATE       | RHORUSQ156N | MSPUS    | MEHOINUSA646N | MORTGAGE30US | rentals | price    | aff_Census | aff_Zillow | aff_rental |
|------------|-------------|----------|---------------|--------------|---------|----------|------------|------------|------------|
| 1984-06-01 | 64.6        | 80700.0  | 22415.00      | 14.29        | NaN     | NaN      | 57.251011  | NaN        | NaN        |
| 1984-09-01 | 64.6        | 81000.0  | 22715.75      | 14.38        | NaN     | NaN      | 57.467336  | NaN        | NaN        |
| 1984-12-01 | 64.1        | 79900.0  | 23016.50      | 13.42        | NaN     | NaN      | 62.923453  | NaN        | NaN        |
| 1985-03-01 | 64.1        | 82800.0  | 23317.25      | 13.02        | NaN     | NaN      | 63.236881  | NaN        | NaN        |
| 1985-06-01 | 64.1        | 84300.0  | 23618.00      | 12.71        | NaN     | NaN      | 64.302821  | NaN        | NaN        |
| ...        | ...         | ...      | ...           | ...          | ...     | ...      | ...        | ...        | ...        |
| 2017-06-01 | 63.7        | 318200.0 | 61136.00      | 3.94         | 1465.0  | 217122.0 | 100.400066 | 147.139862 | 104.327645 |
| 2017-09-01 | 63.9        | 320500.0 | 61646.75      | 3.82         | 1464.0  | 220551.0 | 101.998778 | 148.222444 | 105.271089 |
| 2017-12-01 | 64.2        | 337900.0 | 62157.50      | 3.90         | 1465.0  | 224343.0 | 96.596720  | 145.491643 | 106.070819 |
| 2018-03-01 | 64.2        | 331800.0 | 62668.25      | 4.43         | 1484.0  | 228020.0 | 93.061161  | 135.416600 | 105.573197 |


In [122]:
dic= {'aff_Census':'mrtg (Census)', 
      'aff_Zillow':'mrtg (Zillow)',
      'aff_rental':'rental (Zillow)',
     }
dic

{'aff_Census': 'mrtg (Census)',
 'aff_Zillow': 'mrtg (Zillow)',
 'aff_rental': 'rental (Zillow)'}

In [48]:
# Not being used here as it is not rendering on HTML
function= r"$\frac{30\% \text{median income}_{HH}}{\text{housing payments}}$"

In [49]:
fig = plot_nice(title="Housing affordability (HHs)", dic=dic, df=D[D.aff_Census.notna()], 
          margin=dict(autoexpand=False,l=120,r=130,t=110,), 
          show_series_label=True,
          #function = function, fsize= 20, fx= .550, fy = 1.13, 
          recessions=True,
          vertical_label_gutter=0, source = 'U.S. Census Bureau, FreddieMac, Zillow, own calculations.',
          show_endpoints=False,colors=['rgb(41, 58, 143)', 'rgb(11, 102, 189)', 'rgb(69, 144, 255)'])  # RGB channels max out at 255

In [50]:
fig.write_html("../images/jm15_affordability.html")

# Main fundamentals

# CPI - urban consumers

In [395]:
s = ['CPI all items quaterly [CPALTT01USQ661S]']
dic = dict_from_FRED_description(s)
CPIq = make_hrizontal_tbl(dic, start=start)
s = ['CPI all items monthly [CPALTT01USM661S]']
dic = dict_from_FRED_description(s)
CPIm = make_hrizontal_tbl(dic, start=start)
s = ['CPI all items annual [CPALTT01USA661S]']
dic = dict_from_FRED_description(s)
CPIa = make_hrizontal_tbl(dic, start=start)


# Unemployment

In [404]:
s = ['unemployment [UNRATE]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df = df[((df.index>='1984-01-01') & (df.index<='2020-04-01'))]
fig = plot_nice(title='Unemployment', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=38,b=0), 
          show_series_label=False,height=170,
          recessions=True,
          y_units='%')

In [397]:
fig.write_html("../images/jm_f1.html")

# Mortgage rates

In [398]:
s = ['mortgage rates [MORTGAGE30US]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df = df[((df.index>='1984-01-01') & (df.index<='2020-04-01'))]
fig = plot_nice(title='Mortgage rates', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=38,b=0), 
          show_series_label=False,height=170,
          recessions=True,
          y_units='%')

In [399]:
fig.write_html("../images/jm_f2.html")

# homeownership

In [400]:
# after adjusting the source position in plot_nice, reload the functions so the source appears correctly below
%run functions/nice_plot_pandas_getdata.ipynb

In [401]:
s = ['homeownership [RHORUSQ156N]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df = df[((df.index>='1984-01-01') & (df.index<='2020-04-01'))]
df.index = df.index + pd.DateOffset(months=2)  # quarterly data are imported on the first day of the quarter

In [402]:
fig = plot_nice(title='Homeownership', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=28,b=40), 
          show_series_label=False,height=190,
          recessions=True,
          y_units='%', colors=['red'], source = 'US BLS, FreddieMac, US Census Bureau.')

In [403]:
fig.write_html("../images/jm_f3.html")

# House Prices

In [423]:
s = ['nominal median sales price house sold [MSPUS]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df = df.join(CPIq)
df = (100*df.iloc[:,0]/df.iloc[:,1]).to_frame()
df.index = df.index + pd.DateOffset(months=2)  # quarterly data are imported on the first day of the quarter
df.columns=['median house prices sold']
dic = {'median house prices sold': 'median house prices sold'}
df = df[((df.index>='1984-06-01') & (df.index<='2018-07-01'))]

In [424]:
fig = plot_nice(title='Median house price sold', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=38,b=0), 
          show_series_label=False,height=160,
          recessions=True,
          y_units='2005 US$')

In [425]:
fig.write_html("../images/jm_f5.html")

# Income
Income starts on 1984-01-01, so I cut all series at the same start date, and likewise at the end.

In [426]:
s = ['nominal income [MEHOINUSA646N]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df = df.join(CPIa)
df = (100*df.iloc[:,0]/df.iloc[:,1]).to_frame()
df.index = df.index + pd.DateOffset(months=6)  # annual data are imported on the first day of the year
df.columns=['real income']
dic = {'real income': 'median real HH income'}  # series is deflated by CPI above
df = df[((df.index>='1984-06-01') & (df.index<='2018-07-01'))]

In [427]:
fig = plot_nice(title='Median real HH income', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=38,b=0), 
          show_series_label=False,height=160,
          recessions=True,
          y_units='2005 US$')

In [428]:
fig.write_html("../images/jm_f4.html")

In [429]:
s = ['homeownership [RHORUSQ156N]']
dic = dict_from_FRED_description(s)
df = make_hrizontal_tbl(dic, start=start)
df.index = df.index + pd.DateOffset(months=2)  # quarterly data are imported on the first day of the quarter
df = df[((df.index>='1984-06-01') & (df.index<='2018-07-01'))]
fig = plot_nice(title='Homeownership', dic=dic, df=df, 
          margin=dict(autoexpand=False,l=120,r=120,t=28,b=40), 
          show_series_label=False,height=180,
          recessions=True,
          y_units='%', colors=['red'], source = 'US Census Bureau.')

fig.write_html("../images/jm_f6.html")

## Regress homeownership on employment, affordability and the relative cost of renting


In [430]:
#### to do

In [None]:
Homeownership x affordability

### The experiment
Real Estate GTOs (Geographic Targeting Orders) were first announced in January 2016 (effective March 2016) by FinCEN, the U.S. Treasury's Financial Crimes Enforcement Network. Under Real Estate GTOs, title insurance companies in certain metropolitan areas must reveal the beneficial owners of shell companies that acquire residential real estate in all-cash purchases. When a company purchases real estate and no loan is involved, the final beneficiary is otherwise unknown. FinCEN's objective was to identify illicit money flowing into the top-end real estate market.

GTOs are valid for 180 days only, but Real Estate GTOs have been renewed since their inception, with each wave adding requirements. The most relevant ones are summarized below:

- Jan 2016: announcement for Manhattan and Miami-Dade (thresholds: US\$3 million and US\$1 million, respectively).
- 1 Mar 2016: came into force.
- 28 Aug 2016: added all other NYC boroughs and some counties in TX, CA and FL; personal and business checks included.
- 23 Aug 2017 (4th wave): **wire transfers included**; added Honolulu, HI.
- 17 Feb 2018 (6th wave): **threshold reduced to US$300,000**; added counties in NV, WA, MA and IL.
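For the event-study and diff-diff work, the wave dates above become post-treatment indicators. A minimal sketch for the two waves this study focuses on; `WAVES` and `post_wave_dummies` are hypothetical names, and county-level treatment would additionally require interacting these dummies with an indicator for the counties covered by each wave.

```python
import pandas as pd

# effective dates as listed above, for the two waves used in the analysis
WAVES = {
    "wave4_wire_transfers": pd.Timestamp("2017-08-23"),
    "wave6_threshold_300k": pd.Timestamp("2018-02-17"),
}

def post_wave_dummies(dates):
    """1 for periods dated on or after each wave's effective date, else 0."""
    dates = pd.DatetimeIndex(dates)
    return pd.DataFrame(
        {name: (dates >= start).astype(int) for name, start in WAVES.items()},
        index=dates,
    )

quarters = pd.date_range("2017-06-01", "2018-06-01", freq="3MS")
print(post_wave_dummies(quarters))
```

With periods stamped on the first of the month, a wave taking effect mid-month counts as treated only from the next stamped period; that convention should match how the price data are dated.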

Few reports reached FinCEN during the first GTO waves because of a caveat: the Treasury was not allowed to cover wire transfers, so only a small number of the transactions that met the thresholds were reported. An act of Congress lifted this restriction, and wire transfers have been included since the end of August 2017.

The available research on Real Estate GTOs (Hundtofte and Rantala, 2018) covers only the 2 initial periods (2016) and focuses on the reduction of cash sales in treated areas. They conclude that the total value of all-cash residential real estate purchases dropped dramatically immediately after the inception of the GTOs in 2016 (which then covered only NYC and Miami-Dade), while total transactions kept increasing. To evaluate the price impact, they build a pre-treatment hedonic model and compare forecasted prices after treatment with actual transaction prices.

I take a completely different perspective from Hundtofte and Rantala (2018). First, I use Real Estate GTOs as a natural experiment to answer my research question. Second, I do not aim to find the multiplier between the decrease in sales and the decrease in prices; my aim is to validate that prices changed between before and after treatment, so that the experiment correctly disentangles the endogenous variables. Third, I focus on the 4th GTO wave (introduction of wire transfers) and the 6th wave (threshold lowered to US$300,000 and introduction of cryptocurrencies). The initial versions of Real Estate GTOs had large thresholds, as they targeted high-end real estate, and their potential impact on demand was limited because purchases paid by wire transfer were exempt. Finally, I use synthetic controls rather than a hedonic approach to validate changes in prices, and I obtain individual county results rather than average treatment effects, in order to analyse the effects on rental prices.
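The core of a synthetic control is a set of donor weights that are non-negative, sum to one, and make the weighted donor series track the treated county before treatment. A simplified sketch under these assumptions: it fits only the pre-treatment outcome path (no covariate-weighting matrix, unlike the full method), solves the constrained least squares by projected gradient descent with plain NumPy, and uses simulated data. The function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(y_treated, X_donors, n_iter=5000):
    """Weights w >= 0, sum(w) = 1 minimising ||y_treated - X_donors @ w||^2
    over the pre-treatment periods (rows = periods, columns = donor counties)."""
    n_donors = X_donors.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    step = 1.0 / (2 * np.linalg.norm(X_donors, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * X_donors.T @ (X_donors @ w - y_treated)
        w = project_to_simplex(w - step * grad)
    return w

# simulated example: 24 pre-treatment periods, 3 donor counties
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3))
true_w = np.array([0.3, 0.7, 0.0])
y = X @ true_w  # treated county is an exact convex combination of the donors
w_hat = synthetic_control_weights(y, X)
print(np.round(w_hat, 3))
```

After treatment, the gap between the treated county's prices and the weighted donor prices is the estimated county-level effect. Real price levels are strongly collinear across counties, so in practice the iteration count may need to grow, or a dedicated solver may be preferable.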

Caveat: there may be lags before prices change, and further lags before rental prices change.

I set aside the reasons why someone might give up investing in real estate because of the loss of secrecy; this makes Real Estate GTOs a unique natural experiment: a demand shock on residential real estate that is entirely independent of economic activity.

------
### Motivation and research question
Higher income and net population growth, including migration, push both house prices and rents upwards. Lower interest rates can also raise house prices, both by lowering the discount rate and by pricing some households in (making monthly mortgage repayments affordable to them), which boosts demand. On the other hand, lower interest rates can justify lower rental yields if build-to-let or buy-to-let investments command yields well above other investment alternatives. But what would happen to the price-to-rent ratio if only house prices were shocked, with no change in the variables above? How long would it take to change? Would it revert to previous levels?

I investigate the dynamics of the response of rental yields to an exogenous price shock, to understand whether decreases in house prices translate into lower rents.

### Methodology
I use Zillow price and rental time series at the county level, and exploit changes in Real Estate GTO rules as a natural experiment to isolate changes in house prices from other confounders ().

### Importance
Housing is the main expenditure of most families. The housing burden can be an important driver of rising inequality, as more and more families have to spend a large share of their income on housing.

### Contribution
I focus on one specific aspect of these dynamics: how changes in house prices pass through to rents.
Previous literature....Dynamics of rental prices - Todd Sinai

Use of Real Estate GTO as a natural experiment.

------

### keep for reference (not for paper)

### Criticism on Hundtofte and Rantala
Hundtofte and Rantala have a working paper, last updated in May 2018, that studies GTO effects on the high-end residential real estate market. They document a reduction in cash purchases with, apparently, no difference in total transaction values (substitution of payment mode). Below I summarize what they have done and some initial changes and additions that I propose:

**time coverage**: only the first 2 GTOs (2016), a period in which GTOs were geared towards the high-end market (high thresholds). I think the most important periods for Real Estate GTOs are 2017 and 2018, when wire transfers were included and thresholds were lowered to US$300,000, affecting a broader base of properties and hence more likely to constitute a demand shock (to be tested on data). I would model all periods, but I believe these 2 waves will present the most interesting results.

**price impact**: they build a hedonic model based on pre-treatment prices and compare the forecasted prices with actual transaction prices after treatment. I dislike this approach and prefer diff-diff with appropriate controls for covariates (not done in their paper).

**diff-diff**: they do not present summary statistics for the treated and control groups, most likely because the groups are not comparable. I think the results can only be robust if they also take into account the geographic distribution (see the maps below).
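For reference, the basic 2x2 diff-diff estimate is the coefficient on the treated × post interaction in an OLS regression of the outcome on a constant, a treated dummy, a post dummy and their product. A minimal NumPy sketch on a simulated, noise-free panel; `did_estimate` is a hypothetical helper, and it computes no standard errors (which in a county panel should be clustered).

```python
import numpy as np
import pandas as pd

def did_estimate(df, y="y", treated="treated", post="post"):
    """OLS of y on [1, treated, post, treated*post]; returns the interaction
    coefficient, i.e. the 2x2 difference-in-differences estimate."""
    X = np.column_stack([
        np.ones(len(df)),
        df[treated],
        df[post],
        df[treated] * df[post],
    ]).astype(float)
    coef, *_ = np.linalg.lstsq(X, df[y].to_numpy(dtype=float), rcond=None)
    return coef[3]

# simulated panel: one control and one treated county, 4 periods each
panel = pd.DataFrame({
    "treated": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
})
# level 100, treated +20, common post-period change +3, treatment effect -5
panel["y"] = 100 + 20 * panel["treated"] + 3 * panel["post"] - 5 * panel["treated"] * panel["post"]
print(did_estimate(panel))
```

Appending covariate columns to `X` gives the covariate-adjusted version discussed above.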

My idea is not to focus on money laundering in real estate and the effects of lifting secrecy, as Hundtofte and Rantala do. Whatever the reasons that led agents to give up purchasing real estate once GTOs were in place, the potential effect of lower demand on mid-range segments, if confirmed, would provide a unique natural experiment, and hence is worth further investigation. In sum, I would explore several aspects of the enactment of the different GTO waves, using either a discontinuity design or diff-diff:

- how long it takes for the market to recover, if it does
- how it affects new housing starts
- whether there are kinks around US$300,000

If we confirm that the 2017/18 GTOs were effectively a shock, these results could be applied further to theoretical models, for instance to check the reduction in consumption due to lower collateral values, or to compare a pure demand shock with other kinds of shocks (e.g., an income shock).

## Code organization
I split the work into separate files, one for each task. Below is a summary of what each file does:

- Zillow000: introduction, motivation, homeownership and affordability + GTO experiment (read me).
- Zillow00: initialization file. Run at the beginning of other files to load both Zillow data and custom functions.
- Zillow0: downloads data from the web, assigns reasonable names and saves it to local disk.
- Zillow0_GCP1: downloads Census data from Google Cloud Platform.
- Zillow0_GCP2: organizes and saves Census data for use in other files.
- Zillow1: loads Zillow data saved on disk in bulk, assigning sensible variable names.
- Zillow2: maps and time series of Zillow rentals, state level.
- Zillow3: maps and time series of Zillow prices, state level.
- Zillow4: maps and time series of Zillow price-to-rent ratios, state level and by US region.
- Zillow5: maps and time series of Zillow prices, county level, top, mid and low tier distributions.
- Zillow6: organizes data in long form and saves it to local disk, adding the necessary FIPS codes.
- Zillow7: plots maps and time series at county level. Trade-off between time span and number of counties with complete information.
- Zillow8: builds tables with dummies for regression.
- Zillow9: plots thresholds vs. median prices and time series per treated county with initial treatment.
- GTO_waves: documents the processing of raw attributes of each GTO wave into clean data. When run on another code, it loads FIPS of treated counties.

The code above uses functions that perform repetitive tasks, saved in separate files in the nested folder `.\functions`. Raw data are stored in `.\input`, and cleaned or processed data in `.\output`. Finally, images generated by my code are stored in `.\images`.

In Zillow2, 3, 4 and 5 I explored the relationship between prices and rents at the state and census-region levels. I won't use them for the GTO experiment.