Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
634 lines (486 sloc) 27.7 KB
---
title: Vancouver population density over time
authors:
- Stuart Smith
- Jens von Bergmann
date: '2019-06-09'
slug: vancouver-population-density-over-time
categories:
- cancensus
- CensusMapper
- density
- Vancouver
- zoning
tags: []
description: "A detailed look at growth in the City of Vancouver"
featured: 'cluster-map-41-1.png'
images: ["https://doodles.mountainmath.ca/posts/2019-06-09-vancouver-population-density-over-time_files/figure-html/cluster-map-41-1.png"]
featuredalt: "Clustered growth map"
featuredpath: "/posts/2019-06-09-vancouver-population-density-over-time_files/figure-html"
linktitle: ''
type: "post"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
fig.width = 8,
cache=TRUE,
message = FALSE,
warning = FALSE
)
library(tidyverse)
library(sf)
library(cancensusHelpers)
library(cancensus)
library(rayshader)
library(png)
library(raster)
library(rmapshaper)
library(gganimate)
my_map_theme <- list(
theme_void(),
theme(legend.position="bottom"),
guides(fill=guide_legend(ncol=2))
)
```
Canadian census data is freely available, alas not in a very convenient format for older data. Census data back to 1991 are [available from Statistics Canada](https://www12.statcan.gc.ca/datasets/index-eng.cfm?Temporal=2016) with an open data licence, digital geographic data is only available back to 2001. Older census data is available in digital format via paid subscription services from private entities with restrictive licences. But all data is available for free as open data in paper format.
Stuart Smith took it upon himself to go to the library and digitize (i.e. type into a spreadsheet) old census data all the way back to 1941. And he used the reference maps to trace back the areas to [tongfen](https://github.com/mountainMath/tongfen) to modern census boundaries. Which it totally awesome. And Stuart has been making neighbourhood timelines using that data to great effect.
We decided to team up and take another look at the data to understand population growth in the City of Vancouver, as well as Musqueam 2 and the UBC/UEL/UNA area.
```{r data-import}
column_names_from_first_row_pure <- function(data) {
colnames(data) = data[1, ]
data[-1,]
}
#stus_awesome_spreadsheet_all <- "https://docs.google.com/spreadsheets/d/17r0jYYKLUG1Cezng5AHRQIRSubSgWjNvuaX3otn0zDs/export?format=tsv&id=17r0jYYKLUG1Cezng5AHRQIRSubSgWjNvuaX3otn0zDs&gid=880744385"
#stus_awesome_spreadsheet_all <- "https://docs.google.com/spreadsheets/d/1eadA5pTijP5LNO60s-H7Cg6yCYGvZpaPxHP6Kfg86Lg/edit?usp=sharing"
stus_awesome_spreadsheet_all <- here::here("static/data/Census Tract Populations - All.tsv")
raw_data <- read_tsv(stus_awesome_spreadsheet_all,col_types = cols(.default = "c"))
translation_table <- raw_data[2,] %>%
dplyr::select(seq(1,38) %>% as.character) %>%
t %>%
as.tibble(rownames = "Old")
expand.delimited <- function(x, col1=1, col2=2, sep=",") {
rnum <- 1
expand_row <- function(y) {
factr <- y[col1]
strng <- toString(y[col2])
expand <- strsplit(strng, sep)[[1]]
num <- length(expand)
factor <- rep(factr,num)
return(as.data.frame(cbind(factor,expand),
row.names=seq(rnum:(rnum+num)-1)))
rnum <- (rnum+num)-1
}
expanded <- apply(x,1,expand_row)
df <- do.call("rbind", expanded)
names(df) <- c(names(x)[col1],names(x)[col2])
return(df)
}
lookup_table <- translation_table %>% expand.delimited(col1="Old",col2="V1",sep=",") %>%
set_names(c("Old","CT")) %>%
mutate(base_CT=sub("\\.\\d*","",CT)) %>%
mutate(base_CT=str_pad(as.numeric(base_CT),3,pad="0"))
lookup <- set_names(as.character(lookup_table$Old),as.character(lookup_table$base_CT))
all_data <- raw_data[3:17,] %>%
rename(Old=Characteristic) %>%
dplyr::select(c("Old",seq(1:38) %>% as.character)) %>%
t %>%
as.tibble(rownames="Old") %>% column_names_from_first_row_pure %>%
mutate_at(vars(-Old),function(x){as.numeric(gsub(",","",x))}) %>%
mutate(Old=factor(Old,levels=seq(1:38) %>% as.character))
new_data <- read_csv("https://docs.google.com/spreadsheets/d/14y3Vh8SLtbgFZo_31sHVDlY-uno2INbvD4V40loz93w/export?format=csv&id=14y3Vh8SLtbgFZo_31sHVDlY-uno2INbvD4V40loz93w&gid=0") %>%
gather(key="Year",value="Population",-base_CT) %>%
mutate(Year=gsub("Population[,]* ","",Year)) %>%
mutate(base_CT=str_pad(as.numeric(base_CT),4,pad="0")) %>%
mutate(Population=as.numeric(gsub(",","",Population)))
```
```{r}
# start_year = "1971"
# end_year = "2016"
geo_data <- get_census("CA06",regions=list(CMA="59933"),labels="short",level="CT",geo_format = 'sf') %>%
mutate(CT=sub("933","",GeoUID)) %>%
mutate(base_CT=sub("\\.\\d*","",CT)) %>%
mutate(base_CT=str_pad(as.numeric(base_CT),3,pad="0")) %>%
mutate(id=lookup[base_CT]) %>%
filter(!is.na(id)) %>%
group_by(id) %>%
summarize(area=sum(`Shape Area`)) %>%
left_join(all_data,by=c("id"="Old"))
new_geo_data <- get_census("CA06",regions=list(CMA="59933"),labels="short",level="CT",geo_format = 'sf') %>%
mutate(CT=sub("933","",GeoUID)) %>%
mutate(base_CT=sub("\\.\\d*","",CT)) %>%
group_by(base_CT) %>%
summarise(area=sum(`Shape Area`)) %>%
left_join(new_data,by="base_CT")
new_years=new_geo_data$Year %>% unique %>% sort
```
```{r}
years <- c("1941", "1951", "1956", "1961", "1966", "1971", "1976", "1981", "1986", "1991", "1996", "2001", "2006", "2011", "2016")
ds <- all_data %>%
gather(key="Year",value="Population",years,factor_key=TRUE) %>%
group_by(Old) %>%
mutate(`Min Year`=years[which(min(Population)==Population)]) %>%
mutate(`Max Year`=years[which(max(Population)==Population)]) %>%
mutate(Date=as.Date(paste0(Year,"-01-01")))
plot_data <- geo_data %>%
left_join(ds %>% summarize(`Min Year`=first(`Min Year`),`Max Year`=first(`Max Year`)), by=c("id"="Old"))
```
```{r}
city_region <- list(CT=c("9330069.01","9330069.02"),CSD=c("5915022","5915803"))
vancouver <- get_census(dataset='CA16', regions=city_region,
vectors=c(), labels="short",
geo_format='sf', level='Regions')
bbox=st_bbox(vancouver)
vector_tiles <- simpleCache(get_vector_tiles(bbox),"van_city_ubc_vector_tiles")
# vector tiles return all layers (roads, water, buildings, etc) in a list
roads <- rmapzen::as_sf(vector_tiles$roads) %>% filter(kind != "ferry")
water <- rmapzen::as_sf(vector_tiles$water)
landuse_all <- rmapzen::as_sf(vector_tiles$landuse)
green_sites <-c("beach","cemetery","dog_park","farmland","forest","golf_course","grass","garden",
"protected_area","scrub","water_park","wetland","park","nature_reserve","meadow",#"military",
"natural_wood")
industrial_sites <- c("industrial","wastewater_plant","fuel","aerodrome")
commercial_sites <- c("commercial","retail","theatre")
institutional_sites <- c("university","military")
landuse <-landuse_all %>%
filter(kind %in% c(green_sites,industrial_sites))
landuse_van <- landuse %>% st_intersection(vancouver %>% dplyr::select() %>% st_union)
brightbness <- function(color){
col2rgb(color) %>%
as.tibble %>%
mutate(mult=c(0.2126, 0.7152, 0.0722)) %>%
mutate(res=V1*mult) %>%
pull(res) %>%
sum()/255
}
contrast_color = function(color){
purrr::map(color,function(c) {ifelse(brightbness(c) > 0.5, '#000000' , '#ffffff')}) %>% unlist
#purrr::map(color,function(c) {ifelse(as.integer(sub("#","0x",c)) > (0xffffff/4*3), '#000000' , '#ffffff')}) %>% unlist
}
label_colors_old <- function(values){
kableExtra::spec_color(values, option = "magma") %>% substr(1,7) %>% contrast_color
}
label_colors <- function(values){
ifelse(values<150,"#ffffff","#000000")
}
#ggplot(landuse,aes(fill=kind)) +geom_sf()
map_data <- geo_data %>%
st_difference(landuse %>% dplyr::select() %>% st_union()) %>%
mutate(hectares=as.numeric(st_area(.))/10000) %>%
bind_cols(st_centroid(.) %>% dplyr::select() %>% sfc_as_cols %>% st_set_geometry(NULL)) %>%
gather(key="Year",value="Population",years,factor_key=TRUE) %>%
mutate(Density=Population/hectares) %>%
group_by(Year) %>%
mutate(TotalDensity=sum(Population)/sum(hectares)) %>%
mutate(label_color=label_colors(Density))
new_map_data <- new_geo_data %>%
st_difference(landuse %>% dplyr::select() %>% st_union()) %>%
mutate(hectares=as.numeric(st_area(.))/10000) %>%
bind_cols(st_centroid(.) %>% dplyr::select() %>% sfc_as_cols %>% st_set_geometry(NULL)) %>%
mutate(Density=Population/hectares) %>%
group_by(Year) %>%
mutate(TotalDensity=sum(Population)/sum(hectares)) %>%
mutate(label_color=label_colors(Density)) %>%
filter(!is.na(Year))
```
When computing density there are always some choices to be made. Ideally we like to work with net density, that is only counting area taken up by residential lots. That works well for point-in-time density calculations [like we did recently](https://doodles.mountainmath.ca/blog/2019/02/21/planned-displacement/), but raises new questions when looking at time series because the designation of lots as residential changes over time. So we would either have to use changing geographies over time, or use some kind of union geography to aggregate residential land use over time. For this we decided to go a simpler route and just take out parks, but leave industrial and commercial land uses included. None of the effects the population growth in an area, but it does effect the population density.
With these caveats in mind, let's take a look at an animated population density in Vancouver between 1941 and 2016.
```{r}
path_for_year <- function(year){
file.path(tempdir(),paste0("density_3D_",year,".png"))
}
density_map_3D <- function(year){
print(paste0("Processing ",year))
shade_data <- map_data %>% filter(Year==year) %>% ms_simplify(0.05)
max_density <- shade_data$Density %>% max
base_3d <- ggplot(data = shade_data, mapping = aes(fill=Density)) +
geom_sf(size=0) +
scale_fill_continuous(low = "#010101",high = "white",guide=FALSE) +
theme_void() +
theme(legend.position = "none",
axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.text = element_blank()) +
theme(plot.background = element_rect(fill = "black")) +
theme(panel.border = element_rect(fill = "black")) +
theme(panel.background = element_rect(fill = "black")) +
coord_sf(datum=NA, xlim=c(bbox$xmin,bbox$xmax), ylim=c(bbox$ymin,bbox$ymax))
ggsave(filename = "elevation-2d.png", plot = base_3d, width = 6, height = 4.376)
color_3d <- ggplot(data = shade_data) +
geom_sf(aes(fill=Density),size=0) +
scale_fill_viridis_c(limits=c(min(map_data$Density)-0.1,max(map_data$Density)+0.1),option="magma",guide=FALSE,trans="log") +
geom_sf(data=water,fill="lightblue",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% green_sites),fill="grey",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% industrial_sites),fill="grey",size=0,color=NA) +
geom_sf(data=roads %>% filter(kind %in% c("highway","major_road")),color="black",size=0.1) +
geom_sf(size=0.5,fill=NA,color="#444444") +
theme_void() +
theme(legend.position = "none",
axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.text = element_blank()) +
theme(plot.background = element_rect(fill = "white")) +
theme(panel.border = element_rect(fill = "white")) +
theme(panel.background = element_rect(fill = "white")) +
coord_sf(datum=NA, xlim=c(bbox$xmin,bbox$xmax), ylim=c(bbox$ymin,bbox$ymax))
ggsave(filename = "elevation-2d-color.png", plot = color_3d, width = 6, height = 4.376)
raster::raster("elevation-2d.png") -> localtif
# And convert it to a matrix:
elmat <- matrix(raster::extract(localtif,raster::extent(localtif),buffer=1000),
nrow=ncol(localtif),ncol=nrow(localtif))
elmat <- elmat[11:(nrow(elmat)-10),11:(ncol(elmat)-10)]
ecolor <- readPNG("elevation-2d-color.png")
ecolor <- ecolor[11:(nrow(ecolor)-10),11:(ncol(ecolor)-10),1:4]
ecolor[,,4] <- .9
elmat %>%
sphere_shade(progbar = FALSE,texture = "bw") %>%
add_overlay(overlay = ecolor) %>%
add_shadow(ray_shade(elmat,maxsearch = 300,zscale = 250/max_density,progbar = FALSE),0.7) %>%
plot_3d(elmat,fov=30,zscale = 250/max_density, theta=0, phi=35, windowsize=c(1024,600), zoom=0.6)
render_label(elmat,x = 150, y = 100, z = 1/10, text = year, textsize = 2, linewidth = 4)
render_snapshot(path_for_year(year))
rgl::clear3d()
unlink("elevation-2d-color.png")
unlink("elevation-2d.png")
}
```
```{r compute-animated-map}
gif_path=here::here("/static/images/van_population_animated.gif")
if (!file.exists(gif_path)) {
years %>% lapply(density_map_3D)
file_paths <- map(years,path_for_year) %>% unlist
magick::image_write_gif(magick::image_read(file_paths),
path = gif_path,
delay = 1)
#unlink(file_paths)
}
```
![Population density](/images/van_population_animated.gif)
This gives an overview of where Vancouver has added population -- and where it hasn't. We want to take a more systematic look at how this played out in the different areas.
## Classifying growth and density patterns
Looking at the changes in population density, we wanted to dive a little deeper and cluster neighbourhoods by how they grew. Trying not to pre-impose our ideas we settled on simple unsupervised k-means clustering. We did play a little bit with the number of clusters, 6 seemed like a good number to get some nuance but still keep things simple. As variables we used growth between the first and last years in our series, and current density. We applied a log scale and normalized both before clustering.
```{r}
first_year=first(years)
last_year=last(years)
span=as.integer(last_year)-as.integer(first_year)
df <- map_data %>%
st_set_geometry(NULL) %>%
dplyr::select(c("id","Year","Density")) %>%
group_by(id) %>%
mutate(min=min(Density),
max=max(Density)) %>%
spread(key="Year",value="Density") %>%
mutate(growth=(!!as.name(last_year)/!!as.name(first_year))**(1/span)) %>%
ungroup %>%
mutate(rel_density_r=log(!!as.name(last_year))) %>%
mutate(growth_r=log(growth)) %>%
mutate(rel_density=log(!!as.name(last_year))/max(log(!!as.name(last_year)))) %>%
mutate(growth=log(growth)/max(log(growth)))
set.seed(12345)
cl <- kmeans(df %>% dplyr::select(growth,rel_density),centers=6,algorithm = "MacQueen")
df$cluster=cl$cluster %>% as.character
clusters=df$cluster %>% unique %>% sort
type_color <- set_names(RColorBrewer::brewer.pal(length(clusters),"Dark2"),clusters)
ggplot(df,aes(x=growth_r,y=rel_density_r))+
geom_point(aes(color=cluster)) +
scale_color_manual(values = type_color) +
ggrepel::geom_label_repel(aes(label=id)) +
labs(title=paste0("Growth since ",first_year," vs Density"),x="Average annual growth (log scale)",y="Density (log scale)",color="K-means",caption="StatCan Census")
#ggsave("~/Desktop/cluster_scatter.png")
```
The scatter plot shows that there us a large variation in the data, ranging from population decline for Area 6 that is part of our low-growth cluster, to tremendous growth for our high-growth cluster comprised of Areas 37 and 38 where population grew by a factor of 20.
Generally the clustering makes intuitive sense, although there are some edge cases. The labels correspond to the 1941 census tract boundaries, we kept them as labels for the neighbourhoods. There is a reference map [further down](#reference_map).
We could have also used a more complex model to look at the whole time series for the clustering, but results were not much different and the simple clustering seemed to make sense when looking at the time series.
```{r fig.height=8, fig.width=8}
#df$type <- as.character(cutree(hc, k = 5))
map_data_type <- map_data %>% left_join(df %>% dplyr::select(id,cluster))
levels <- map_data_type %>% filter( Year=="2016") %>% ungroup %>% arrange(cluster) %>% pull(id)
plot_data <- map_data_type %>%
ungroup %>%
mutate(id=factor(id,levels=levels)) %>%
mutate(Date=as.Date(paste0(Year,"-05-01")))
ggplot(plot_data,aes(x=Date,y=Density,color=cluster)) +
geom_line() +
theme_bw() +
scale_color_manual(values = type_color) +
labs(title="Population density in Vancouver 1941 Census Tracts over time",color="Cluster") +
facet_wrap("id", ncol=5) + #,scales="free_y"
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
#ggsave("~/Desktop/cluster_timelines.png")
# ggsave("~/Desktop/stus_data_timelines2.png")
```
We can take these time series and condense them into our six growth and density archetypes.
```{r}
cluster_labels <- c(
"2"="Mid density from low growth",
"3"="Very low density and low growth",
"1"="Low density from low growth",
"5"="High density from medium growth",
"6"="Low density from high growth",
"4"="Low density from medium growth")
ggplot(plot_data %>% group_by(cluster,Date) %>% summarize(Density=mean(Density)),aes(x=Date,y=Density,color=cluster)) +
geom_line() +
theme_bw() +
scale_color_manual(values = type_color) +
labs(title="Population growth since 1941 and density archetypes",color="Archetype") +
facet_wrap("cluster",labeller = as_labeller(cluster_labels)) + #,scales="free_y"
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
#ggsave("~/Desktop/cluster_archetypes.png")
```
<span id="reference_map"></span>
```{r cluster-map-41, fig.height=8}
ggplot(map_data_type) +
geom_sf(aes(fill=cluster)) +
geom_sf(data=water,fill="lightblue",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% green_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% industrial_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=roads %>% filter(kind %in% c("highway","major_road")),color="black",size=0.1) +
geom_sf(data=geo_data,size=0.5,fill=NA,color="darkgrey") +
geom_text(aes(x,y,label=id)) +
scale_fill_manual(values = type_color,labels=cluster_labels) +
my_map_theme +
labs(title=paste0(first_year,"-",last_year," Growth and Density Archetypes"),
caption=paste0("StatCan Census",first_year," through ",last_year),
fill=NULL) +
coord_sf(datum=NA, xlim=c(bbox$xmin,bbox$xmax), ylim=c(bbox$ymin,bbox$ymax))
#ggsave("~/Desktop/cluster_reference_map.png")
```
Referring back to the time series for the neighbourhoods, we can examine some of the ones that did not quite fit their archetype model. Neighbourhood 16, containing Fairview and South False Creek, hit a low in 1976 before growing quite strong. So overall the growth was only low when using 1941 as the starting point, but it would qualify as medium growth when starting from 1976.
Similarly, area 5 saw a decline until 1986, but Yaletown lead to strong population growth.
Area 2 in the West End took off between the 1956 and 1961 censuses and reached a peak in 1971 that was only eclipsed in the last census, something we also see to some extent in Area 3.
Area 30 is highest density area of the purple "medium density from medium growth" category, with Joyce-Collingwood marking it's imprint even though it is located within a very confined part of that area.
The "low density and very low growth" areas stand out as being geographically disconnected, but thus is the fate of these parts of Dunbar, West Point Grey and Strathcona.
## More recent history
Memory is short, and not many Vancouverites have 1941 as a comparison in mind. While is great to take the long view on this topic, we also wanted to complement by just looking at the more recent history and using 1976 as a start year, roughly the half-way mark in our time series.
```{r}
first_year="1976"
last_year=max(new_map_data$Year)
span=as.integer(last_year)-as.integer(first_year)
df <- map_data %>%
st_set_geometry(NULL) %>%
dplyr::select(c("id","Year","Density")) %>%
group_by(id) %>%
mutate(min=min(Density),
max=max(Density)) %>%
spread(key="Year",value="Density") %>%
mutate(growth=(!!as.name(last_year)/!!as.name(first_year))**(1/span)) %>%
ungroup %>%
mutate(rel_density_r=log(!!as.name(last_year))) %>%
mutate(growth_r=log(growth)) %>%
mutate(rel_density=log(!!as.name(last_year))/max(log(!!as.name(last_year)))) %>%
mutate(growth=log(growth)/max(log(growth)))
#centers <- readRDS("data/kmeans_growth_centres.Rd")
set.seed(1234)
cl <- kmeans(df %>% dplyr::select(growth,rel_density),centers=6,algorithm = "MacQueen")
#saveRDS(cl$centers,"data/kmeans_growth_centres.Rd")
df$cluster=cl$cluster %>% as.character
clusters=df$cluster %>% unique %>% sort
type_color <- set_names(RColorBrewer::brewer.pal(length(clusters),"Set2"),clusters)
ggplot(df,aes(x=growth_r,y=rel_density_r))+
geom_point(aes(color=cluster)) +
scale_color_manual(values = type_color) +
ggrepel::geom_label_repel(aes(label=id)) +
labs(title=paste0("Growth since ",first_year," vs Density"),x="Average annual growth (log scale)",y="Density (log scale)",color="K-means",caption="StatCan Census")
```
Here the picture is quite different. Area 37 dropped back into a lower growth cluster, while Areas 5 and 16 joined 38 in the high-growth cluster.
We can take these time series and condense them into our six growth and density archetypes.
```{r, fig.height=8}
cluster_labels <- c(
"5"="Mid density from medium growth",
"6"="Low density and low growth",
"2"="Very low density from low growth",
"1"="High growth",
"4"="High density",
"3"="Low to mid density from\nlow to medium growth")
map_data_type <- map_data %>% left_join(df %>% dplyr::select(id,cluster))
ggplot(map_data_type) +
geom_sf(aes(fill=cluster)) +
geom_sf(data=water,fill="lightblue",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% green_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% industrial_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=roads %>% filter(kind %in% c("highway","major_road")),color="black",size=0.1) +
geom_sf(data=geo_data,size=0.5,fill=NA,color="darkgrey") +
geom_text(aes(x,y,label=id)) +
scale_fill_manual(values = type_color,labels=cluster_labels) +
my_map_theme +
labs(title=paste0(first_year,"-",last_year," Growth and Density Archetypes"),
caption=paste0("StatCan Census",first_year," through ",last_year),
fill=NULL) +
coord_sf(datum=NA, xlim=c(bbox$xmin,bbox$xmax), ylim=c(bbox$ymin,bbox$ymax))
#ggsave("~/Desktop/cluster_reference_map.png")
```
Using a more recent time frame paints a somewhat different pictures, albeit one that most are more familiar with.
## Finer geographies
There is a tradeoff between the length of the timeline and how coarse our geographies are. 1976 is somewhat of an inflection point in terms of geographies, so it's worthwhile to use this as a starting point for taking anther look using finer geographies.
This gives us more sub-regions to classify, but we can just run it though our established machine.
```{r}
first_year=first(new_map_data$Year)
last_year=last(years)
span=as.integer(last_year)-as.integer(first_year)
df <- new_map_data %>%
filter(!is.na(Year)) %>%
st_set_geometry(NULL) %>%
dplyr::select(c("base_CT","Year","Density")) %>%
group_by(base_CT) %>%
mutate(min=min(Density),
max=max(Density)) %>%
spread(key="Year",value="Density") %>%
mutate(growth=(!!as.name(last_year)/!!as.name(first_year))**(1/span)) %>%
ungroup %>%
mutate(rel_density_r=log(!!as.name(last_year))) %>%
mutate(growth_r=log(growth)) %>%
mutate(rel_density=log(!!as.name(last_year))/max(log(!!as.name(last_year)))) %>%
mutate(growth=log(growth)/max(log(growth)))
#centers <- readRDS("data/kmeans_growth_centres.Rd")
set.seed(123456)
cl <- kmeans(df %>% dplyr::select(growth,rel_density),centers=6,algorithm = "MacQueen")
#saveRDS(cl$centers,"data/kmeans_growth_centres.Rd")
df$cluster=cl$cluster %>% as.character
clusters=df$cluster %>% unique %>% sort
type_color <- set_names(RColorBrewer::brewer.pal(length(clusters),"Set1"),clusters)
ggplot(df,aes(x=growth_r,y=rel_density_r))+
geom_point(aes(color=cluster)) +
scale_color_manual(values = type_color) +
scale_x_continuous(limits = c(-0.03,NA)) +
scale_y_continuous(limits = c(2,6)) +
ggrepel::geom_label_repel(aes(label=base_CT),fill="#ffffff88",size = 2,alpha=0.5) +
labs(title=paste0("Growth since ",first_year," vs Density"),x="Average annual growth (log scale)",y="Density (log scale)",color="K-means",caption="StatCan Census")
```
```{r fig.height=8}
cluster_labels <- c(
"4"="Mid density from medium growth",
"3"="Mid density and low growth",
"5"="Very low density from\nvery low or negative growth",
"6"="High growth",
"1"="High density",
"2"="Low density from low to medium growth")
map_data_type <- new_map_data %>% left_join(df %>% dplyr::select(base_CT,cluster)) %>%
mutate(id=as.integer(base_CT))
ggplot(map_data_type) +
geom_sf(aes(fill=cluster)) +
geom_sf(data=water,fill="lightblue",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% green_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=landuse_van %>% filter(kind %in% industrial_sites),fill="darkgrey",size=0,color=NA) +
geom_sf(data=roads %>% filter(kind %in% c("highway","major_road")),color="black",size=0.1) +
geom_sf(data=geo_data,size=0.5,fill=NA,color="darkgrey") +
geom_text(aes(x,y,label=id),size=3) +
scale_fill_manual(values = type_color,labels=cluster_labels) +
my_map_theme +
labs(title=paste0(first_year,"-",last_year," Growth and Density Archetypes"),
caption=paste0("StatCan Census",first_year," through ",last_year),
fill=NULL) +
coord_sf(datum=NA, xlim=c(bbox$xmin,bbox$xmax), ylim=c(bbox$ymin,bbox$ymax))
#ggsave("~/Desktop/cluster_reference_map.png")
```
The slightly refined geographies add a little bit more detail, for example the mid-density centre of Kerrisdale pops out. Generally it paints a similar picture to the more coarser geography used above.
## Implications
This highlights how unevenly growth has been distributed in Vancouver. This pattern is not caused by people's preferences, it is almost entirely shaped by policy that dictate which areas should accommodate growth, and which ones should not. One modest exception is the proliferation of informal dwellings in form of secondary suites that has lead so some growth and was only fairly recently legalized.
We don't know how growth would have evolved without the imposing rigid zoning that does not just regulate nuisances by e.g. separating residential from industrial areas, but [imposes very low density on the majority of city land designated for residential use](https://doodles.mountainmath.ca/blog/2016/06/17/sdh-zoning-and-land-use/). But we do understand how cities generally function. Cities exist because the proximity of people, jobs and amenities create synergies. And this leads to [residential density genreally decaying exponentially with distance from the central business district and amenities](https://doodles.mountainmath.ca/blog/2019/03/27/density-timelines/). In Vancouver we notice the imbalance between the densities at the centre to densities only 4km out.
![Vancouver densities](https://doodles.mountainmath.ca/posts/2019-03-27-density-timelines_files/figure-html/vancouver-1.png)
The strong differential also builds up very visibly in the 3D animated map near the top, and is indicative of a massive planning failure that forces artificially low density in large swaths of the city while squeezing growth into small central areas as well as areas far outside of the City of Vancouver in other parts of Metro Vancouver. The consequence is a loss of social welfare, forcing people into longer commutes and into lower-amenity areas.
## The data
Stuart made the transcribed census data available, you can [download it here](https://docs.google.com/spreadsheets/d/1eadA5pTijP5LNO60s-H7Cg6yCYGvZpaPxHP6Kfg86Lg/edit?usp=sharing). Or [grab the code to the post](https://github.com/mountainMath/doodles/blob/master/content/posts/2019-06-09-vancouver-population-density-over-time.Rmarkdown) and play with the data that way.
You can’t perform that action at this time.