## Application: Hedonic Price Function of Houses
---

#### Dependent variable

+ price    - sale price of a house

#### Independent variables

+ lotsize  - lot size of a property in square feet
+ bedrooms - number of bedrooms
+ bathrms  - number of full bathrooms
+ stories  - number of stories excluding basement
+ driveway - does the house have a driveway?
+ recroom  - does the house have a recreational room?
+ fullbase - does the house have a full finished basement?
+ gashw    - does the house use gas for hot water heating?
+ airco    - does the house have central air conditioning?
+ garagepl - number of garage places
+ prefarea - is the house located in the preferred neighbourhood of the city?

#### Source

Sales Prices of Houses in the City of Windsor

https://vincentarelbundock.github.io/Rdatasets/datasets.html

Anglin, P.M. and R. Gencay (1996) “Semiparametric estimation of a hedonic price function,” Journal of Applied Econometrics, 11(6), 633-648.


In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
import pymc3 as pm



#### Hedonic regression model

$$
 \log(\text{price})
 = \text{constant}
 + \beta_1\log(\text{lotsize})
 + \beta_2\text{bedrooms}
 + \beta_3\text{bathrms}
 + \beta_4\text{stories}
 + \beta_5\text{driveway}
 + \beta_6\text{recroom}
 + \beta_7\text{fullbase}
 + \beta_8\text{gashw}
 + \beta_9\text{airco}
 + \beta_{10}\text{garagepl}
 + \beta_{11}\text{prefarea}
 + \text{error}.
$$
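
Because the dependent variable is $\log(\text{price})$, a coefficient on a dummy or count variable is a log-point effect: a one-unit increase multiplies the price by $e^{\beta}$, that is, it changes the price by $100(e^{\beta}-1)$ percent. The coefficient $\beta_1$ on $\log(\text{lotsize})$ is an elasticity. The sketch below only illustrates this conversion; the helper name `percent_effect` and the coefficient values are hypothetical and are not estimates from this model.

```python
import math


def percent_effect(beta):
    """Percent change in price implied by a one-unit increase in a
    regressor whose coefficient in the log(price) equation is beta."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical coefficient values, chosen only for illustration
for beta in (0.05, 0.10, 0.20):
    print(f"beta = {beta:.2f} -> {percent_effect(beta):.2f}% change in price")
```

For small coefficients the percent effect is close to $100\beta$, so the approximation only matters for larger values.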

In [2]:
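# Load the data; the first CSV column holds the row index
data = pd.read_csv('Housing.csv', index_col=0)
column_names = data.columns
# Recode the yes/no columns as 1/0 dummy variables
qualitative = ['driveway', 'recroom', 'fullbase', 'gashw', 'airco', 'prefarea']
data[qualitative] = data[qualitative].replace({'yes': 1, 'no': 0}).astype(int)
# The model uses log(lotsize) as a regressor
data['lotsize'] = np.log(data['lotsize'])
n = data.shape[0]
# Dependent variable: log(price); price is taken to be the first column
y = np.log(data['price'].values)
# Design matrix: a column of ones for the constant, then all regressors
X = np.hstack((np.ones((y.size, 1)), data[column_names[1:]].values))
# Row labels for the posterior summary table
var_names = np.concatenate((['constant'], column_names[1:], ['$\\sigma^2$']))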

The prior distributions of $\beta$ and $\sigma^2$ are

\begin{align*}
 \beta &\sim \mathrm{Normal}\left(\mu_\beta,\Omega_\beta\right), \\
 \sigma^2 &\sim \mathrm{Inv.Gamma}\left(\frac{\nu_0}{2},\frac{\lambda_0}{2}\right).
\end{align*}
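
Written out as densities up to normalizing constants, and assuming the shape-scale parameterization of the inverse gamma distribution (which is how the `alpha` and `beta` arguments of `pm.InverseGamma` are understood here), these priors read

\begin{align*}
 p(\beta) &\propto \exp\left[-\frac{1}{2}\left(\beta-\mu_\beta\right)^{\prime}\Omega_\beta^{-1}\left(\beta-\mu_\beta\right)\right], \\
 p(\sigma^2) &\propto \left(\sigma^2\right)^{-\left(\frac{\nu_0}{2}+1\right)}\exp\left(-\frac{\lambda_0}{2\sigma^2}\right).
\end{align*}

Small values of $\nu_0$ and $\lambda_0$ together with a large $\Omega_\beta$ make both priors diffuse relative to the likelihood.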

In [3]:
k = X.shape[1]
mu_b = np.zeros(k)
Omega_b = 100.0 * np.eye(k)
nu0 = 0.02
lam0 = 0.02

#### Model setup

In [4]:
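multiple_regression = pm.Model()
with multiple_regression:
    # Priors: sigma2 ~ Inv.Gamma(nu0/2, lam0/2) and b ~ Normal(mu_b, Omega_b)
    sigma2 = pm.InverseGamma('sigma2', alpha=0.5*nu0, beta=0.5*lam0)
    b = pm.MvNormal('b', mu=mu_b, cov=Omega_b, shape=k)
    # Linear predictor X b and normal likelihood with variance sigma2
    y_hat = pm.math.dot(X, b)
    likelihood = pm.Normal('y', mu=y_hat, sd=pm.math.sqrt(sigma2), observed=y)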

#### Markov chain sampling

In [5]:
n_draws = 5000
n_chains = 4
n_tune = 1000
with multiple_regression:
    trace = pm.sample(draws=n_draws, chains=n_chains, tune=n_tune, random_seed=123)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [b, sigma2]
Sampling 4 chains: 100%|██████████| 24000/24000 [08:58<00:00, 44.54draws/s]
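
Running several chains makes it possible to check convergence by comparing between-chain and within-chain variability. The following is a minimal pure-Python sketch of the classic (non-split) Gelman-Rubin statistic for a single scalar parameter. The function name `gelman_rubin` and the simulated chains are illustrative assumptions, and the Rhat reported by PyMC3 may be computed differently in detail.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic Gelman-Rubin statistic for one scalar parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])                        # draws per chain
    chain_means = [mean(c) for c in chains]
    W = mean([variance(c) for c in chains])   # average within-chain variance
    B = n * variance(chain_means)             # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled estimate of the variance
    return (var_hat / W) ** 0.5


# Simulated draws standing in for MCMC output (not taken from the trace)
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 5.0, 5.0)]

print("chains agree:   ", gelman_rubin(mixed))
print("chains disagree:", gelman_rubin(stuck))
```

Values close to 1 indicate that the chains are exploring the same distribution, while values well above 1 suggest that they have not yet mixed.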


In [6]:
post_stats = pm.summary(trace)
post_stats.index = var_names
display(post_stats)

variable,mean,sd,mc_error,hpd_2.5,hpd_97.5,n_eff,Rhat
constant,7.743899,0.216189,0.002116,7.317813,8.159669,11051.799512,1.000113
lotsize,0.303343,0.026739,0.000262,0.252907,0.357378,10928.397596,1.000143
bedrooms,0.0342,0.014306,0.000107,0.006694,0.062564,16602.039497,0.999919
bathrms,0.165781,0.020484,0.000149,0.125799,0.205438,18465.265687,1.000066
stories,0.091734,0.012828,0.000108,0.066871,0.117143,15817.326696,1.000156
driveway,0.110141,0.02849,0.000224,0.055366,0.167297,18445.800762,1.000035
recroom,0.057639,0.026012,0.000226,0.008181,0.109855,15903.166881,0.999944
fullbase,0.104489,0.021999,0.00018,0.061412,0.147303,15980.114626,1.00002
gashw,0.179041,0.04352,0.000302,0.095071,0.265299,22202.737616,0.999924
airco,0.166724,0.021236,0.000162,0.124287,0.207731,17955.769128,0.99993
