<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/BDML_202302/blob/main/Lecture05/Notebook_SS05_Spatial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


## House Price Indices

Our objective today is to construct a model to predict house prices. From Rosen's landmark paper "Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition" (1974), we know that a differentiated good can be described by a vector of its characteristics.

In the case of a house, these characteristics may include structural attributes (e.g., number of bedrooms), neighborhood public services (e.g., local school quality), and local amenities (e.g., crime, air quality). Thus, we can write the market price of the house as:

$$
Price=f(structural\,attributes,amenities,...)
$$


However, Rosen's theory doesn't tell us much about the functional form of $f$. 
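
A common starting point in applied work, though by no means the only one, is a semi-log specification in which the log price is linear in the characteristics. As a sketch, with $x_{ik}$ denoting characteristic $k$ of house $i$ and $u_i$ an error term (notation assumed here, not taken from Rosen):

$$
\log(Price_i)=\beta_0+\sum_{k=1}^{K}\beta_k x_{ik}+u_i
$$

Under this form, $\beta_k$ is approximately the proportional change in price associated with a one-unit change in characteristic $k$, holding the others fixed.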

### The Ames Housing Data

For this exercise we are going to use housing data from Ames, Iowa, available in the `modeldata` package.

Let's load the packages:

In [1]:
# install.packages("pacman") #run this line if you use Google Colab

In [2]:
# packages
require("pacman")
p_load("tidyverse",  # data wrangling
       "modeldata",  # package with the housing data from Ames, Iowa
       "vtable",     # descriptive statistics
       "stargazer",  # tidy regression tables
       "sf",         # handling spatial data
       "broom"       # tidy model output
       )



Loading required package: pacman



And the data set:

In [3]:
data("ames", package = "modeldata")

The Ames housing data is a normal [tibble](https://tibble.tidyverse.org/).

In [4]:
head(ames)

| MS_SubClass | MS_Zoning | Lot_Frontage | Lot_Area | Street | Alley | Lot_Shape | Land_Contour | Utilities | Lot_Config | ⋯ | Fence | Misc_Feature | Misc_Val | Mo_Sold | Year_Sold | Sale_Type | Sale_Condition | Sale_Price | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<fct>` | `<fct>` | `<dbl>` | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | ⋯ | `<fct>` | `<fct>` | `<int>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<int>` | `<dbl>` | `<dbl>` |
| One_Story_1946_and_Newer_All_Styles | Residential_Low_Density | 141 | 31770 | Pave | No_Alley_Access | Slightly_Irregular | Lvl | AllPub | Corner | ⋯ | No_Fence | | 0 | 5 | 2010 | WD | Normal | 215000 | -93.61975 | 42.05403 |
| One_Story_1946_and_Newer_All_Styles | Residential_High_Density | 80 | 11622 | Pave | No_Alley_Access | Regular | Lvl | AllPub | Inside | ⋯ | Minimum_Privacy | | 0 | 6 | 2010 | WD | Normal | 105000 | -93.61976 | 42.05301 |
| One_Story_1946_and_Newer_All_Styles | Residential_Low_Density | 81 | 14267 | Pave | No_Alley_Access | Slightly_Irregular | Lvl | AllPub | Corner | ⋯ | No_Fence | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 | -93.61939 | 42.05266 |
| One_Story_1946_and_Newer_All_Styles | Residential_Low_Density | 93 | 11160 | Pave | No_Alley_Access | Regular | Lvl | AllPub | Corner | ⋯ | No_Fence | | 0 | 4 | 2010 | WD | Normal | 244000 | -93.61732 | 42.05125 |
| Two_Story_1946_and_Newer | Residential_Low_Density | 74 | 13830 | Pave | No_Alley_Access | Slightly_Irregular | Lvl | AllPub | Inside | ⋯ | Minimum_Privacy | | 0 | 3 | 2010 | WD | Normal | 189900 | -93.63893 | 42.0609 |
| Two_Story_1946_and_Newer | Residential_Low_Density | 78 | 9978 | Pave | No_Alley_Access | Slightly_Irregular | Lvl | AllPub | Inside | ⋯ | No_Fence | | 0 | 6 | 2010 | WD | Normal | 195500 | -93.63893 | 42.06078 |


The description of the variables can be viewed here: https://jse.amstat.org/v19n3/decock/DataDocumentation.txt
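
Before modelling, it can help to look at the handful of variables used in the rest of this section. A minimal sketch using base R, assuming the `modeldata` package is installed and that the column names match those shown in the output above (the name `vars` is made up for this example):

```r
# Load the data directly from modeldata (no other packages needed)
data("ames", package = "modeldata")

# Variables used below: sale price, living area, building type, year sold
vars <- c("Sale_Price", "Gr_Liv_Area", "Bldg_Type", "Year_Sold")

# Quantile summaries for the numeric columns, counts for the factor
summary(ames[, vars])

# Number of houses (rows) and variables (columns) in the data set
dim(ames)
```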

### Modelling Prices

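Let's say that the logarithm of the sale price of these houses is a linear function of their living area (size), the type of house, and the neighborhood. The cells below first recode the year of sale as a factor and then fit a simpler specification in levels, regressing `Sale_Price` on the year of sale, living area, and building type.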
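
For reference, the log-linear specification described in words above could be sketched as follows. This is only an illustration, not necessarily the model estimated in this notebook; it assumes the `modeldata` package is installed and that `ames` contains a `Neighborhood` column (not visible in the truncated output above). The name `reg_log` is made up for this example.

```r
data("ames", package = "modeldata")

# Log sale price on living area, building type, and neighborhood dummies
reg_log <- lm(log(Sale_Price) ~ Gr_Liv_Area + Bldg_Type + Neighborhood,
              data = ames)

# Coefficient on living area: approximate proportional change in price
# per additional square foot, holding type and neighborhood fixed
coef(reg_log)["Gr_Liv_Area"]
```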


In [5]:
table(ames$Year_Sold)



2006 2007 2008 2009 2010 
 625  694  622  648  341 

In [6]:
class(ames$Year_Sold)

In [8]:
# Recode the year of sale as a factor so lm() expands it into year dummies
ames <- ames %>%
  mutate(year = factor(Year_Sold, levels = c(2006, 2007, 2008, 2009, 2010),
                       labels = c("d2006", "d2007", "d2008", "d2009", "d2010")))

In [None]:
table(ames$year)

In [None]:
class(ames$year)
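
To see what this recoding implies for the regression, the sketch below builds the same kind of factor on a small made-up vector (the values in `year_sold` are hypothetical, not taken from `ames`) and shows the columns that R's `model.matrix()` would create. The first level, `d2006`, is absorbed into the intercept, so each year coefficient is read relative to 2006.

```r
# Hypothetical years of sale standing in for ames$Year_Sold
year_sold <- c(2006, 2007, 2008, 2009, 2010, 2007)

# Same recoding as in the cell above: numeric year -> labelled factor
year <- factor(year_sold,
               levels = c(2006, 2007, 2008, 2009, 2010),
               labels = c("d2006", "d2007", "d2008", "d2009", "d2010"))

# Design matrix that lm() would build from this factor (treatment contrasts)
X <- model.matrix(~ year)
colnames(X)
# "(Intercept)" "yeard2007" "yeard2008" "yeard2009" "yeard2010"
```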

In [10]:
# Levels regression: sale price on year dummies, living area, and building type
reg1 <- lm(Sale_Price ~ year + Gr_Liv_Area + Bldg_Type, data = ames)
stargazer(reg1, type = "text")


                        Dependent variable:    
                    ---------------------------
                            Sale_Price         
-----------------------------------------------
yeard2007                    1,268.544         
                            (2,984.665)        
                                               
yeard2008                   -1,933.982         
                            (3,063.130)        
                                               
yeard2009                     -70.717          
                            (3,034.576)        
                                               
yeard2010                   -3,071.273         
                            (3,641.608)        
                                               
Gr_Liv_Area                 113.814***         
                              (1.998)          
                                               
Bldg_TypeTwoFmCon         -59,017.220***       
                            (6,961.688)