In [9]:
#####READ THIS FIRST
#With the current sample size and these regressions, the natural elements in a listing photo (trees, sky, grass)
#relate to days on market differently for single family homes than for town houses.
#Only the manually created photo tags are used in these regression models.
#Please note that these are generalized models across all photos and properties.
#It would be interesting to explore this further with more categories (home properties and photo properties)
#and a larger sample (more months, bigger residential areas).



library(readr)

urlfile <- "https://raw.githubusercontent.com/jdhl85/TDI/master/redfin_2020-08-02-13-19-53.csv"
JulyRestonSold <- read_csv(url(urlfile))

Parsed with column specification:
cols(
  .default = col_double(),
  `SALE TYPE` = col_character(),
  `SOLD DATE` = col_character(),
  `PROPERTY TYPE` = col_character(),
  ADDRESS = col_character(),
  CITY = col_character(),
  `STATE OR PROVINCE` = col_character(),
  LOCATION = col_character(),
  STATUS = col_character(),
  `NEXT OPEN HOUSE START TIME` = col_logical(),
  `NEXT OPEN HOUSE END TIME` = col_logical(),
  `URL (SEE http://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING)` = col_character(),
  SOURCE = col_character(),
  `MLS#` = col_character(),
  FAVORITE = col_character(),
  INTERESTED = col_character(),
  comments = col_character()
)

See spec(...) for full column specifications.



In [10]:
library(dplyr)
library(tidyr)

#process/clean data
JulyRestonSold = JulyRestonSold %>% drop_na(score) #remove rows that are not usable
JulyRestonSold[,2:9][is.na(JulyRestonSold[,2:9])] = 0 #NA's that should be zeros
JulyRestonSold=JulyRestonSold %>% mutate(front = ifelse(combo == 1, 0, front)) #make each location of picture taken mutually exclusive
JulyRestonSold=JulyRestonSold %>% mutate(back = ifelse(combo == 1, 0, back))
JulyRestonSold=JulyRestonSold %>% mutate(side = ifelse(combo == 1, 0, side))
JulyRestonSold=JulyRestonSold %>% mutate(inside = ifelse(combo == 1, 0, inside))
JulyRestonSold$backOrSide = JulyRestonSold$back +JulyRestonSold$side #combine side and back pictures to get reasonable sample size

#clean/codify property types
JulyRestonSold=JulyRestonSold %>% mutate(`PROPERTY TYPE` = ifelse(`PROPERTY TYPE` == "Other", "Single Family Residential", `PROPERTY TYPE`))
JulyRestonSold=JulyRestonSold %>% mutate(`PROPERTY TYPE` = ifelse(`PROPERTY TYPE` == "Single Family Residential", 0, 
                                                                  ifelse(`PROPERTY TYPE` == "Townhouse", 1, 2))) #0 sfh, 1 th, 2 condo
#re-categorize to low avg high
JulyRestonSold$scoreCat = ifelse(JulyRestonSold$score == 1 | JulyRestonSold$score == 2, 1,
                                 ifelse(JulyRestonSold$score == 3, 2, 3)) #1 low, 2 avg, 3 high
#take number of nature elements in picture
JulyRestonSold$numNat = JulyRestonSold$trees +JulyRestonSold$sky +JulyRestonSold$grass #counting how many nature elements there are in picture

#get better view of data to be used/further processed
#keep columns 5,6,7,8,9,13,15,16,20,21,22,24,27,28,41,42,43
JRS_inProg = JulyRestonSold[,c(5,6,7,8,9,13,15,16,20,21,22,24,27,28,41,42,43)]
JRS_inProg$priceDiff = JRS_inProg$PRICE - JRS_inProg$`redfin est`
JRS_inProg$scoreAvg = ifelse(JRS_inProg$scoreCat==2, 1,0)
JRS_inProg$scoreHigh = ifelse(JRS_inProg$scoreCat==3, 1,0)
JRS_inProg$Nat1 = ifelse(JRS_inProg$numNat==1, 1,0)
JRS_inProg$Nat2 = ifelse(JRS_inProg$numNat==2, 1,0)
JRS_inProg$Nat3 = ifelse(JRS_inProg$numNat==3, 1,0)
#will not use condo data since it seems a bit harder to produce quality pictures of condos
JRS_inProg = subset(JRS_inProg, JRS_inProg$`PROPERTY TYPE` != 2)

#create sets for sfh and th, and remove unnecessary columns
cols4days= c("DAYS ON MARKET","backOrSide","inside","combo","trees","sky","grass","scoreAvg","scoreHigh")
JRS_days= subset(JRS_inProg, select = cols4days)
singleFamHomes = subset(JRS_inProg,JRS_inProg$`PROPERTY TYPE`==0)
sfh_days= subset(singleFamHomes, select = cols4days)
townHomes = subset(JRS_inProg,JRS_inProg$`PROPERTY TYPE`==1)
th_days= subset(townHomes, select = cols4days)

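
The hand-built dummy columns in the cell above (scoreAvg, scoreHigh, with low as the baseline) can also be produced by a factor. The following is a minimal sketch on hypothetical scores, not the Redfin data; the 1 to 5 score range and the names `score`, `scoreF`, and `mm` are assumptions made for illustration.

```r
# Hypothetical photo scores (the 1-5 range is an assumption, not taken from the data)
score <- c(1, 2, 3, 4, 5, 3)

# Same recoding as in the cell above: 1 low, 2 avg, 3 high
scoreCat <- ifelse(score %in% c(1, 2), 1, ifelse(score == 3, 2, 3))

# Hand-built dummies, with "low" as the implicit baseline
scoreAvg  <- ifelse(scoreCat == 2, 1, 0)
scoreHigh <- ifelse(scoreCat == 3, 1, 0)

# Equivalent coding through a factor: the first level becomes the baseline
scoreF <- factor(scoreCat, levels = 1:3, labels = c("low", "avg", "high"))
mm <- model.matrix(~ scoreF)
colnames(mm)
```

Passing a factor like `scoreF` directly to `glm()` would create the same two indicator columns internally, so the baseline category is stated once in the factor levels rather than implied by which dummies are left out.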


In [17]:
poiss_mod = glm(JRS_days$`DAYS ON MARKET`~., family="poisson", data=JRS_days)
summary(poiss_mod)

#For all homes (single family and town homes), listings photographed inside sold faster
#listings with trees in the photo sold faster
#listings with grass in the photo sold slower, though only marginally (p ~ 0.098)


Call:
glm(formula = JRS_days$`DAYS ON MARKET` ~ ., family = "poisson", 
    data = JRS_days)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.0726  -2.0884   0.0421   1.5787   3.4751  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.26161    0.11387  28.642  < 2e-16 ***
backOrSide   0.03211    0.13592   0.236 0.813235    
inside      -1.04119    0.29093  -3.579 0.000345 ***
combo       -0.20384    0.16285  -1.252 0.210697    
trees       -0.71818    0.11822  -6.075 1.24e-09 ***
sky          0.09223    0.05958   1.548 0.121630    
grass        0.09803    0.05931   1.653 0.098334 .  
scoreAvg    -0.09710    0.06779  -1.432 0.152036    
scoreHigh   -0.07938    0.07864  -1.009 0.312790    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 527.71  on 97  degrees of freedom
Residual deviance: 474.76  on 89  degrees of freedom
AIC: 904.33

Num
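
The residual deviance above is much larger than its degrees of freedom, which is often read as a sign of overdispersion; in that case Poisson standard errors tend to be too small. The sketch below computes that ratio from the printed values and then, on simulated counts (hypothetical data, not the Redfin sample), compares a Poisson fit with a quasi-Poisson fit. It is a rough diagnostic under these assumptions, not a re-analysis of the listing data.

```r
# Values copied from the summary output above
resid_dev <- 474.76
resid_df  <- 89
dispersion_ratio <- resid_dev / resid_df  # well above 1 suggests overdispersion

# Simulated overdispersed counts (hypothetical, for illustration only)
set.seed(1)
x <- rbinom(200, 1, 0.5)
y <- rnbinom(200, mu = exp(3 - 0.5 * x), size = 1)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Coefficients are identical; quasi-Poisson scales the standard errors
# by the square root of the estimated dispersion parameter
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
phi      <- summary(fit_quasi)$dispersion
```

If the same pattern holds for the listing data, refitting with `family = quasipoisson` would leave the estimates unchanged but widen the standard errors, so some of the starred coefficients might no longer be significant.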

In [18]:
poiss_mod = glm(sfh_days$`DAYS ON MARKET`~.,family="poisson", data=sfh_days)
summary(poiss_mod)

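#For single family homes, back or side photos sold slower than front photos (the only significant result)
#inside and trees could not be estimated (NA because of singularities)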


Call:
glm(formula = sfh_days$`DAYS ON MARKET` ~ ., family = "poisson", 
    data = sfh_days)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7247  -1.6217  -0.1189   1.2981   3.6063  

Coefficients: (2 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.53470    0.14595  17.367  < 2e-16 ***
backOrSide   0.54772    0.16490   3.322 0.000895 ***
inside            NA         NA      NA       NA    
combo        0.18290    0.20556   0.890 0.373598    
trees             NA         NA      NA       NA    
sky          0.16524    0.10489   1.575 0.115169    
grass       -0.18092    0.13203  -1.370 0.170581    
scoreAvg    -0.06234    0.13861  -0.450 0.652901    
scoreHigh    0.01635    0.14118   0.116 0.907822    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 209.44  on 42  degrees of freedom
Residual deviance: 193.58  o
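
The NA rows in the coefficient table mean glm dropped those columns because they are linear combinations of other columns. One common cause is a predictor that takes a single value within the subset; whether that is what happened here is an assumption that would need to be checked against sfh_days. A minimal sketch on a hypothetical data frame (`toy` and its values are made up for illustration):

```r
# Hypothetical subset in which 'inside' never varies
toy <- data.frame(
  days   = c(5, 12, 7, 30, 9, 14),
  grass  = c(0, 1, 1, 0, 1, 0),
  inside = c(0, 0, 0, 0, 0, 0)
)

# Columns with a single distinct value are aliased with the intercept
constant_cols <- names(toy)[sapply(toy, function(col) length(unique(col)) == 1)]

fit <- glm(days ~ ., family = poisson, data = toy)
coef(fit)  # the coefficient for 'inside' is NA
```

Running the same `sapply()` check on the real subset before fitting would show which tag columns carry no information for that property type.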

In [19]:
poiss_mod = glm(th_days$`DAYS ON MARKET`~.,family="poisson", data=th_days)
summary(poiss_mod)

#For town homes, photos taken from locations other than the front (baseline variable) sold faster
#photos with trees sold faster
#while photos with grass sold slower


Call:
glm(formula = th_days$`DAYS ON MARKET` ~ ., family = "poisson", 
    data = th_days)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.5121  -1.7620   0.1729   1.3339   3.2698  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.22377    0.11900  27.090  < 2e-16 ***
backOrSide  -0.69344    0.25790  -2.689 0.007171 ** 
inside      -0.99519    0.29432  -3.381 0.000722 ***
combo       -0.64480    0.28824  -2.237 0.025286 *  
trees       -0.63073    0.12383  -5.093 3.52e-07 ***
sky          0.05287    0.07550   0.700 0.483775    
grass        0.29248    0.07405   3.950 7.83e-05 ***
scoreAvg    -0.07645    0.08132  -0.940 0.347165    
scoreHigh   -0.14197    0.11560  -1.228 0.219419    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 310.01  on 54  degrees of freedom
Residual deviance: 236.21  on 46  degrees of freedom
AIC: 488.52

Numbe
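
Because the Poisson model uses a log link, exponentiating a coefficient gives the multiplicative change in expected days on market when that tag is present. The sketch below applies this to the significant town home estimates printed above; the vector names `est`, `rate_ratio`, and `pct_change` are introduced here for illustration.

```r
# Estimates copied from the town home summary output above
est <- c(backOrSide = -0.69344, inside = -0.99519, combo = -0.64480,
         trees = -0.63073, grass = 0.29248)

# exp(coefficient) is the ratio of expected days on market (tag present vs. absent)
rate_ratio <- exp(est)

# Expressed as a percentage change relative to the baseline
pct_change <- 100 * (rate_ratio - 1)
round(pct_change, 1)
```

Negative percentages correspond to fewer expected days on market (faster sale), positive ones to more, with all other tags held fixed.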