# Table of Contents
 <p><div class="lev1 toc-item"><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></div><div class="lev1 toc-item"><a href="#Basic-Issues" data-toc-modified-id="Basic-Issues-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Basic Issues</a></div><div class="lev1 toc-item"><a href="#Handling-Covariates" data-toc-modified-id="Handling-Covariates-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Handling Covariates</a></div>

$$
    \newcommand{\genericdel}[3]{%
      \left#1#3\right#2
    }
    \newcommand{\del}[1]{\genericdel(){#1}}
    \newcommand{\sbr}[1]{\genericdel[]{#1}}
    \newcommand{\cbr}[1]{\genericdel\{\}{#1}}
    \newcommand{\abs}[1]{\genericdel||{#1}}
    \DeclareMathOperator*{\argmin}{arg\,min}
    \DeclareMathOperator*{\argmax}{arg\,max}
    \DeclareMathOperator{\Pr}{\mathbb{P}}
    \DeclareMathOperator{\E}{\mathbb{E}}
    \DeclareMathOperator{\Ind}{\mathbb{I}}
    \DeclareMathOperator{\V}{\mathbb{V}}
    \DeclareMathOperator{\cov}{cov}
    \DeclareMathOperator{\ones}{\mathbf{1}}
    \DeclareMathOperator{\invchi}{\mathrm{Inv-\chi}^2}
    \newcommand{\effect}{\mathrm{eff}}
    \newcommand{\xtilde}{\widetilde{X}}
    \DeclareMathOperator{\normal}{\mathcal{N}}
    \DeclareMathOperator{\unif}{Uniform}
    \newcommand{\boxleft}{\unicode{x25E7}}
    \newcommand{\boxright}{\unicode{x25E8}}
    \newcommand{\discont}{\unicode{x25EB}}
    \newcommand{\jleft}{\unicode{x21E5}}
    \newcommand{\jright}{\unicode{x21E4}}
    \newcommand{\gp}{\mathcal{GP}}
$$

# Introduction

This documents my attempts to apply our GP-based RDD methodology to house prices in Tucson, AZ. The hypothesis was that the price of a house is influenced by the school district it is in, as parents wish to move to school districts with better schools. If the rules are such that students on either side of the school district line *have* to attend school in their district, then we might hope for this change in price to be discontinuous at the boundary.

Tucson was chosen for two reasons. Firstly, it is a single city with multiple school districts. Secondly, there is publicly available house-sales data. This combination was quite difficult to find (you might remember I spent a while analyzing Milwaukee data before I realized it had only a single school district), although there are other cities with such a situation. The map, where each black dot represents a sale, shows why Tucson was so promising.

![Tucson Map](Tucson_plots/tucson_sales_map.png)

I was also able to download a rich dataset of covariates for each house in Tucson, with information about square footage, the number of bedrooms and floors, whether the house has a garage, and even an indicator of the quality of the house (determined by a surveyor), amongst many other covariates.

# Basic Issues

Ultimately, I wasn't able to detect a “treatment effect” at the boundary between two of Tucson's school districts. The two most immediate explanations are that the treatment effect might be smaller than expected, and that the power of the analysis was actually quite low. In the map below, I zoom in on the most populated school district boundary, between Amphitheater and Tucson Unified School Districts. The gray lines mark a distance of one lengthscale from the boundary, where the lengthscale is that of the Gaussian Process fitted by maximum likelihood ($\ell \approx 1300~\mathrm{ft}$). Relatively few houses actually fall within this margin, and so the posterior standard deviation of the treatment effect ended up being disappointingly high.
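To get a feel for why sales outside the gray margin contribute so little, here is a minimal sketch of how quickly correlation decays with distance. It assumes a squared-exponential kernel, which is an assumption made for illustration (the kernel is not stated above); the function name `se_correlation` is hypothetical.

```python
import math

def se_correlation(distance_ft, lengthscale_ft):
    """Correlation between two points under a squared-exponential kernel (assumed)."""
    return math.exp(-0.5 * (distance_ft / lengthscale_ft) ** 2)

LENGTHSCALE_FT = 1300.0  # fitted lengthscale quoted in the text

# Correlation between a boundary point and a sale at increasing distances from it
for d in (0.0, 650.0, 1300.0, 2600.0, 3900.0):
    rho = se_correlation(d, LENGTHSCALE_FT)
    print(f"{d:6.0f} ft: correlation {rho:.3f}")
```

Under this assumed kernel, the correlation is about 0.61 at one lengthscale and about 0.01 at three lengthscales, so only the sales close to the boundary carry much information about the price surface there.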

![Zoomed Tucson Map](Tucson_plots/tucson_zoom.png)

If we color the dots (house sales) by their price, it also becomes clear that there is no discontinuous jump visible by eye. Next time I look for an example dataset, I will look for a strong signal visible by eye.

![Zoomed Tucson Map](Tucson_plots/tucson_zoom_colored.png)

# Handling Covariates

One problem I was forced to confront with this example is how to think about and how to handle covariates. The simplest thing to do was to perform a linear regression on all the data, ignoring spatial correlations, and then fit a $\gp$ to the residuals. Applying the 2$\gp$ method to the residuals gave the following "treatment effect cliff face" between Amphitheater and Tucson:

![Cliff Face](Tucson_plots/cliff_face_naive_residuals.png)

I also tried a ridge regression with a fairly weak L2 penalty. The results looked promising, but I didn't believe they were trustworthy, because ignoring the spatial correlations in the linear regression stage seems wrong to me.
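The naive two-stage approach can be sketched as follows. This is an illustration in Python with NumPy, not the code used for the analysis: the data are synthetic, the covariate names in the comments are placeholders, and `ridge_residuals` is a hypothetical helper. Setting `penalty=0` gives ordinary least squares.

```python
import numpy as np

def ridge_residuals(X, y, penalty=0.0):
    """Residuals of y after a ridge fit on the columns of X; penalty=0 is OLS."""
    n_features = X.shape[1]
    gram = X.T @ X + penalty * np.eye(n_features)
    beta_hat = np.linalg.solve(gram, X.T @ y)
    return y - X @ beta_hat, beta_hat

# Synthetic stand-in data (hypothetical, not the Tucson dataset)
rng = np.random.default_rng(0)
n_sales = 200
X = np.column_stack([
    np.ones(n_sales),                   # intercept
    rng.normal(size=n_sales),           # e.g. standardized square footage
    rng.integers(0, 2, size=n_sales),   # e.g. garage indicator
])
true_beta = np.array([12.0, 0.4, 0.1])
y = X @ true_beta + rng.normal(scale=0.2, size=n_sales)

# Stage 1: regress out the covariates, ignoring spatial correlation
residuals, beta_hat = ridge_residuals(X, y, penalty=1e-3)
# Stage 2 (not shown): fit the GP on each side of the boundary to `residuals`
```

Note that this sketch penalizes the intercept along with the other coefficients, which is harmless for a weak penalty but would usually be avoided with a stronger one.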

Because I had a lot of covariates, many of them quite sparse, I decided that some regularization of the $\beta$'s was needed, which I introduced by giving them a normal prior. This presented the opportunity of treating the linear trend as just another component of the Gaussian Process. This was tempting because it meant I could tune a single hyperparameter, the prior variance $\sigma_\beta^2$ of the $\beta$'s. Normally, adding a kernel component comes at a very low computational cost: it's just a modification of the covariance matrix, which still only needs to be inverted once, so the only cost is the additional hyperparameter to be tuned. But in this case, the shared $\beta$'s induce correlation *between* school districts, which means the data of the districts can no longer be fitted independently. To keep the computation manageable, I therefore focused on only 2015 data, with the results below. The envelope is now wider, because we have less data. There isn't much of a treatment effect, except perhaps the peak on the right (which is actually the West-most part of the boundary).

![Cliff Face](Tucson_plots/cliff_face_TA4_fullGP_2015.png)
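To spell out why folding the linear trend into the kernel couples the districts, here is a sketch of the model as I understand it from the description above. The notation is assumed for illustration: $s_i$ is the location of sale $i$, $x_i$ its covariate vector, $d(i) \in \{A, B\}$ its district, $k$ the spatial kernel, and $\sigma_y^2$ the noise variance.

$$
y_i = x_i^\top \beta + f_{d(i)}(s_i) + \epsilon_i, \qquad \beta \sim \mathcal{N}(0, \sigma_\beta^2 I), \quad f_A, f_B \overset{\text{ind}}{\sim} \mathcal{GP}(0, k), \quad \epsilon_i \sim \mathcal{N}(0, \sigma_y^2)
$$

Marginalizing over $\beta$ gives

$$
\operatorname{cov}(y_i, y_j) = \sigma_\beta^2 \, x_i^\top x_j + \mathbb{I}\{d(i) = d(j)\}\, k(s_i, s_j) + \mathbb{I}\{i = j\}\, \sigma_y^2 .
$$

The first term does not depend on the district, so the off-diagonal block of the covariance matrix between the two districts is $\sigma_\beta^2 X_A X_B^\top$ rather than zero. Without the linear component that block vanishes and each district can be fitted on its own.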

At this point, I explored a few different modifications to the methodology, which all made very little difference. This is perhaps good news, as we wouldn't want our results to be hugely dependent on details of the method, but also disappointing because I was hoping for a treatment effect to emerge.

Firstly, I stepped away from treating the linear trend as part of the $\gp$, and instead plugged in the posterior mean of the $\beta$'s. The posterior means are derived in a Bayesian generalized linear regression approach, incorporating spatial correlations and the prior $\beta \sim \normal\del{0,\sigma_\beta^2}$. This change made no difference to the results (see below), which reassured me that the posterior variance of $\beta$ isn't critical (which makes sense, as the entire dataset is large), and also that my implementations of the two procedures are consistent, and therefore less likely to be buggy.

![Cliff Face](Tucson_plots/cliff_face_TA4_plugin_2015.png)
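The consistency between the two procedures can also be checked numerically on a toy problem. The sketch below is written in Python with NumPy on synthetic data, and makes several assumptions for illustration: a squared-exponential kernel, a single spatial covariance matrix `Sigma` (kernel plus noise) for all observations, and hypothetical function names. It computes the posterior mean of $\beta$ in two ways: through the regularized generalized-least-squares formula $(X^\top \Sigma^{-1} X + \sigma_\beta^{-2} I)^{-1} X^\top \Sigma^{-1} y$, and through the GP whose covariance has the linear component $\sigma_\beta^2 X X^\top$ added.

```python
import numpy as np

def se_kernel(coords, lengthscale, signal_sd):
    """Squared-exponential covariance between all pairs of rows in coords."""
    sq_dists = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    return signal_sd**2 * np.exp(-0.5 * sq_dists / lengthscale**2)

def beta_posterior_mean_gls(X, y, Sigma, prior_sd):
    """Posterior mean of beta via the regularized generalized-least-squares form."""
    Sigma_inv_X = np.linalg.solve(Sigma, X)
    precision = X.T @ Sigma_inv_X + np.eye(X.shape[1]) / prior_sd**2
    return np.linalg.solve(precision, Sigma_inv_X.T @ y)

def beta_posterior_mean_kernel(X, y, Sigma, prior_sd):
    """Same quantity, from the GP with the linear kernel component added."""
    full_cov = Sigma + prior_sd**2 * (X @ X.T)
    return prior_sd**2 * X.T @ np.linalg.solve(full_cov, y)

# Synthetic stand-in data (hypothetical, not the Tucson dataset)
rng = np.random.default_rng(1)
n_sales, n_covariates = 60, 4
coords = rng.uniform(0.0, 5000.0, size=(n_sales, 2))  # locations in feet
X = rng.normal(size=(n_sales, n_covariates))
y = X @ rng.normal(size=n_covariates) + rng.normal(size=n_sales)

# Spatial kernel plus observation noise
Sigma = se_kernel(coords, lengthscale=1300.0, signal_sd=1.0) + 0.5**2 * np.eye(n_sales)

b_gls = beta_posterior_mean_gls(X, y, Sigma, prior_sd=2.0)
b_kernel = beta_posterior_mean_kernel(X, y, Sigma, prior_sd=2.0)
print(np.max(np.abs(b_gls - b_kernel)))  # should be at floating-point noise level
```

With fixed hyperparameters the two forms agree by a standard matrix identity, so what the plug-in approach drops is the posterior variance of $\beta$, not its mean.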

In a discussion a couple of weeks ago, Luke M also suggested that I should fit the $\beta$'s on only one of the school districts (the biggest one), to make sure that the treatment effect isn't affecting the estimates, or that the linear regression isn't somehow trying to paper over the treatment effect. Again, no difference, which reassured me that there wasn't some crazy interference between the linear regression and the rest of the model.

![Cliff Face](Tucson_plots/cliff_face_TA6_plugin_local_2015.png)

Lastly, I thought it could be a good idea to go back to using the entire dataset, in case the absence of a detectable treatment effect is simply due to a lack of power. In the meantime, I had implemented a series of improvements to Julia's GaussianProcesses package, which made it more realistic to use the entire dataset. I decided Luke's idea of fitting the parameters within the largest school district was still a good idea, for the following reasons:

* easier to think about
* less questionable statistically
* easier to implement
* quicker to fit (subset of the data)

But still, the cliff face is almost unchanged. The credible envelope doesn't get much thinner, and there still isn't a treatment effect smoking gun.

![Cliff Face](Tucson_plots/cliff_face_TA7_plugin_local_all.png)