In [None]:
knitr::opts_chunk$set(echo = TRUE)

Disclaimer: This Markdown file is for Task B.

In [None]:
rm(list = ls())
library(plyr)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(rgdal)
library(geojsonio)
library(lubridate)
library(cowplot)
library(leaflet)
library(reshape)
library(raster)
library(RColorBrewer)
library(spatialEco)
library(htmltools)

B1. Create a GeoJSON file where each postcode is represented with a latitude, longitude value, together with minimum, maximum, mean and median house price.

Ans: We create a GeoJSON file with the required information

We load the required data. We remove the ID column from the postcodes data set, as it is redundant. We then add a column holding the year of each transaction in the original data set, which makes the data easier to view and group later on.

In [None]:
setwd("D:\\BSE\\BSE Material\\sem 2\\Data Vis\\Project")
pp_data <- read.csv("ppdata_lite.csv")


# Load file with postcodes and latitude/longitude
ukpostcodes <- read.csv("ukpostcodes.csv", header = TRUE, sep = ',')
# The first column (id) is redundant, so drop it
ukpostcodes <- ukpostcodes[-1]
# Add the year of each transaction (POSIXlt counts years from 1900)
ppdata <- pp_data %>%
  mutate(year = as.POSIXlt(date_of_transfer)$year + 1900)
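The year extraction can be checked on a few made-up dates. This is a minimal sketch with invented values, not rows from ppdata_lite.csv, and it assumes that date_of_transfer is stored as text in year-month-day order.

```r
# Toy dates (invented for illustration; not taken from ppdata_lite.csv)
toy <- data.frame(date_of_transfer = c("1995-03-14", "2008-11-02", "2015-07-30"))

# POSIXlt stores the year as an offset from 1900, hence the + 1900
toy$year <- as.POSIXlt(toy$date_of_transfer)$year + 1900

# format() gives the same result without the offset arithmetic
toy$year_alt <- as.integer(format(as.POSIXlt(toy$date_of_transfer), "%Y"))

print(toy)
```

Both columns should agree; the format() version is an alternative, not a replacement for the code used in this notebook.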

We then group the data by postcode and compute the mean, median, maximum and minimum price for every postcode available.

In [None]:
ppdata <- ppdata %>%
  group_by(postcode)%>%
  summarise_at(vars(price),list(mean_price = mean,
                                median_price = median,
                                max_price = max,
                                min_price = min))
  
head(ppdata)

ppdata2 <- ppdata[-1,]
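The grouped summary can be tried on a small invented table first. The sketch below uses across(), which dplyr offers as the newer counterpart to summarise_at(); the postcodes and prices are placeholders, and the column names are chosen to match the ones produced above.

```r
library(dplyr)

# Toy transactions (invented values; the postcodes are placeholders)
toy <- data.frame(
  postcode = c("AB1 2CD", "AB1 2CD", "AB1 2CD", "EF3 4GH", "EF3 4GH"),
  price    = c(100000, 150000, 500000, 80000, 120000)
)

# One row per postcode with four price statistics
toy_summary <- toy %>%
  group_by(postcode) %>%
  summarise(across(price,
                   list(mean = mean, median = median, max = max, min = min),
                   .names = "{.fn}_{.col}"))

print(toy_summary)
```

For the first toy postcode the mean (250000) sits well above the median (150000), because one expensive sale pulls the mean upwards.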

We then remove the first row, since it contains values starting from 0 that we do not need and that cause an error when the data is merged with ukpostcodes. Finally, we build a spatial data set and write the required GeoJSON file.

In [None]:
ppdata2 <- ppdata[-1,]
# Create GeoJSON file
merge_data <- merge(ppdata2, ukpostcodes, by = "postcode")

# sp expects the x coordinate (longitude) first, then y (latitude)
coordinates(merge_data) <- c("longitude", "latitude")

head(merge_data)

data <- merge_data

writeOGR(data, 
         layer = "merge_data", 
         check_exists = TRUE,
         overwrite_layer = TRUE,
         driver = "GeoJSON", 
         dsn = "Plot_data.geojson")
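Since rgdal was retired in 2023, the same step can also be sketched with the sf package. This is an alternative under stated assumptions, not the method used in this notebook: it assumes sf is installed and that the coordinates are WGS84 degrees (EPSG:4326). The table and file name below are invented for illustration.

```r
library(sf)

# Toy summary table (invented values; coordinates assumed to be WGS84 degrees)
toy <- data.frame(
  postcode     = c("AA1 1AA", "BB2 2BB"),
  median_price = c(150000, 100000),
  latitude     = c(51.5, 53.4),
  longitude    = c(-0.12, -2.24)
)

# coords takes the x column (longitude) first, then the y column (latitude)
toy_sf <- st_as_sf(toy, coords = c("longitude", "latitude"), crs = 4326)

# Write to a temporary file so the sketch leaves the project folder untouched
out_file <- file.path(tempdir(), "toy_plot_data.geojson")
st_write(toy_sf, out_file, driver = "GeoJSON", delete_dsn = TRUE, quiet = TRUE)

# Read the file back to confirm the points and attributes survived
check <- st_read(out_file, quiet = TRUE)
print(check)
```

Reading the file back is a cheap way to catch swapped latitude and longitude before opening the GeoJSON in a GIS application.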

B2. Open the GeoJSON file in the GIS application of your choice and colour-code the data to give an overview of areas with high, medium and low median house price. Additionally, you can visualise this information as choropleths or use shiny and add the information as markers on a map for a more interactive and impressive

Ans: The question asks us to use a GIS application to view areas with high, medium and low median house prices. After discussing this with colleagues, I use the Area spatial data, whose regions are identified by the first (in some cases the first two) letters of the postcode. This reduced the running time, a major constraint on the device used, and made the spatial data easier to display in the required plot.

This time we build the data set from the original data, rewriting the postcode column so that it keeps only the first one or two letters of each postcode.
As instructed, we then compute the mean, median, maximum and minimum house price for each of these postcode areas.

In [None]:
Dataset <- pp_data
# Keep only the leading letters of each postcode (the postcode area),
# e.g. "B1 1AA" -> "B" and "SW1A 1AA" -> "SW"
Dataset$postcode <- sub("^([[:alpha:]]{1,2}).*$", "\\1", Dataset$postcode)
Dataset <- na.omit(Dataset)
Dataset <- Dataset %>%
  group_by(postcode) %>%
  summarise_at(vars(price), list(mean_price = mean,
                                 median_price = median,
                                 max_price = max,
                                 min_price = min))
# Drop the first row, as was done for the full postcodes
Dataset <- Dataset[-1,]

head(Dataset)
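Extracting the area letters is easy to get subtly wrong, so it helps to test the rule on a few postcodes. The sketch below anchors on the leading letters and, for comparison, shows a naive variant that strips all digits and keeps the first two characters. The postcodes are only illustrative inputs.

```r
# Toy postcodes (used only to illustrate the formats)
pc <- c("B1 1AA", "SW1A 1AA", "W1A 0AX", "M60 1NW", "")

# Leading letters only: one or two letters at the start of the string
area_regex <- sub("^([[:alpha:]]{1,2}).*$", "\\1", pc)

# Naive variant for comparison: strip digits, then keep two characters
area_strip <- substr(gsub("[[:digit:]]+", "", pc), 1, 2)

print(data.frame(postcode = pc, area_regex, area_strip))
```

The naive variant turns "B1 1AA" into "B " (with a trailing space) and "W1A 0AX" into "WA", neither of which would match the intended area name when merging with the shapefile.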

We load the area-level spatial data and combine it with the data set created above. This gives a spatial data set containing the longitude, latitude, price statistics and postcode areas of the data to be plotted.

In [None]:
Area <- shapefile("shapes/Areas.shp")
class(Area) 

Map_data <- merge(Area,Dataset,by.x = 'name',by.y = "postcode")

head(Map_data)
Map_data <- sp.na.omit(Map_data)

Before plotting, we need to define how the data is divided into low, medium and high house prices. Instead of sticking to three categories, I have chosen to divide the values into six bins based on their percentiles. We then assign a colour to each of these intervals.

In [None]:
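# Upper bounds of six equal-probability bins
intervals <- quantile(Map_data$mean_price, probs = c(0.167, 0.333, 0.5, 0.667, 0.833, 1), names = FALSE, na.rm = TRUE)
# Prepend 0 so the lowest bin starts at zero
values <- append(intervals, 0, 0)
factpal <- colorBin("PRGn", bins = values, domain = Map_data$mean_price)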
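To see what the six percentile-based bins look like, the same idea can be run on a handful of invented prices in base R. This sketch uses cut() to count the values per bin; it does not involve leaflet, and the prices are made up.

```r
# Toy area-level prices (invented values)
prices <- c(90000, 110000, 125000, 140000, 160000, 185000,
            210000, 250000, 320000, 450000, 600000, 900000)

# Six equally spaced probabilities give the upper bound of each bin
breaks <- quantile(prices, probs = (1:6) / 6, names = FALSE)

# Prepend 0 so the lowest bin starts at zero
breaks <- c(0, breaks)

# cut() assigns each price to one of the six bins
bins <- cut(prices, breaks = breaks, include.lowest = TRUE)
print(table(bins))
```

Because the breaks are quantiles, each bin holds the same number of areas here, which keeps every colour visible on the map even when prices are heavily skewed.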

We then display UK property prices by postcode area, using these intervals.

In [None]:
mapplot_mean <- leaflet(Map_data) %>%
  setView(lng = -2, lat = 52.2783, zoom = 8) %>%
  addProviderTiles("Stamen.TonerHybrid") %>%
  addPolygons(fillColor = ~factpal(mean_price), weight = 0.2, fillOpacity = 0.5,
              smoothFactor = 0.2) %>%
  addLegend(pal = factpal,
            values = Map_data$mean_price,
            title = "Mean HP data")

mapplot_mean

We now use the median instead of the mean, since the mean might not reflect the typical property price in an area. The median is better suited to skewed distributions, so choosing it gives a more robust and sensible plot. It frees us from the mean's drawback of being driven by both the magnitude of the values and how often they occur.
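A small invented example illustrates the difference. The prices below are made up: five modest homes and one very expensive sale in the same area.

```r
# Toy prices for one area (invented values): modest homes plus one outlier
prices <- c(120000, 130000, 135000, 140000, 150000, 2500000)

mean_price   <- mean(prices)    # pulled upwards by the single expensive sale
median_price <- median(prices)  # stays with the typical homes

print(c(mean = mean_price, median = median_price))
```

The median stays at 137500, close to what most homes sold for, while the mean rises above 500000 because of a single sale.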

In [None]:
# Upper bounds of six equal-probability bins, now for the median price
intervals <- quantile(Map_data$median_price, probs = c(0.167, 0.333, 0.5, 0.667, 0.833, 1), names = FALSE, na.rm = TRUE)
# Prepend 0 so the lowest bin starts at zero
values <- append(intervals, 0, 0)
factpal <- colorBin("PRGn", bins = values, domain = Map_data$median_price)


mapplot_median <- leaflet(Map_data) %>%
  setView(lng = -2, lat = 52.2783, zoom = 8) %>%
  addProviderTiles("Stamen.TonerHybrid") %>%
  addPolygons(fillColor = ~factpal(median_price), weight = 0.5, fillOpacity = 0.5,
              smoothFactor = 0.25) %>%
  addLegend(pal = factpal,
            values = Map_data$median_price,
            title = "Median HP data")
mapplot_median
