---
Mapping Transit-induced Gentrification
---

## Introduction

This hands-on exercise shows how to use the Census API to get data from the US Census Bureau and explore demographic changes over time near transit stops. Gentrification is a complex process in which the character of a neighborhood changes through rising property values or rents and the arrival of new, higher-income residents. Its multi-dimensional character can be identified in diverse ways (Easton et al., 2020). For example, Freeman (2005) proposed identifying gentrifying neighborhoods by looking at the age of the housing stock, the income and education level of residents, and the change in owner-occupied housing prices. Studies usually look at changes in these variables between two time periods (Maciag, 2015; Desmond and Gershenson, 2017). The causes behind gentrification can also be diverse. Gentrification can happen due to transit investment (Dawkins and Moeckel, 2016), urban renewal programs (Mehdipanah et al., 2018), housing policy (Gelb and Lyons, 1993), park location (Rigolon and Németh, 2020), tourism (Gotham, 2013), etc.

This exercise focuses on transit-induced gentrification in Chicago, IL and shows a simple analytical approach to identify gentrifying neighborhoods. Although gentrification is a multi-dimensional phenomenon, in this exercise we will use only one indicator (educational attainment) to identify areas that are most likely gentrifying. We will get data on the population with a bachelor’s degree or higher from the American Community Survey (ACS) of the US Census Bureau, at the census tract level, for two time periods (2010 and 2019). The change in the share of the population with a bachelor’s degree or higher will be our measure of gentrification. We will then take the locations of Chicago Transit Authority (CTA) train stations and overlay them on the map showing this change to identify the areas that are most likely experiencing transit-induced gentrification.


Here are the four key steps of this exercise:

- Collect and calculate data for 2010 (2006-2010 ACS 5-year estimate) 
- Collect and calculate data for 2019 (2015-2019 ACS 5-year estimate)
- Calculate change in % of population with bachelor’s degree or higher
- Overlay CTA train stations on demographic change map
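
To make the third step concrete, here is a minimal sketch with made-up numbers for three hypothetical tracts (the tract labels and counts are illustrative, not ACS values). It computes the share with a bachelor's degree or higher in each period and the change in percentage points.

```r
# Made-up counts for three hypothetical tracts (not real ACS estimates)
toy <- data.frame(
  tract  = c("A", "B", "C"),
  tot10  = c(2000, 1500, 1000),  # population aged 25 to 64, first period
  bach10 = c(400, 600, 100),     # of which bachelor's degree or higher
  tot19  = c(2200, 1500, 1000),  # population aged 25 to 64, second period
  bach19 = c(990, 630, 90)
)

# Share in each period (percent) and the change (percentage points)
toy$pct10 <- toy$bach10 / toy$tot10 * 100
toy$pct19 <- toy$bach19 / toy$tot19 * 100
toy$diff  <- toy$pct19 - toy$pct10

toy[, c("tract", "pct10", "pct19", "diff")]
```

In this toy table, tract A's share rises by 25 percentage points while tract C's falls by 1, so only tract A would stand out under this single indicator.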

<b>Loading required packages</b>

If any of these packages is not already installed, install it with: install.packages("package name")

In [None]:
#install.packages("sf")
#install.packages("tidycensus")
#install.packages("tidyverse")
#install.packages("ggplot2")
#install.packages("leaflet")

library(sf)
library(ggplot2)
library(leaflet)
library(tidycensus)
library(tidyverse)

<b>Get 2010 ACS data using Census API</b>

We will use the tidycensus package to get ACS data. The tidycensus package (Walker and Herman, 2021) is designed to facilitate the process of acquiring and working with US Census Bureau population data in the R environment. With this package, R users can request geometry along with attributes for their Census data, which makes mapping and spatial analysis easier. You can find the documentation for this package here: https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf
tidycensus uses the Census Application Programming Interface (API) to get decennial census or ACS data.

The API lets us download data from the Census website without going through an interactive data selection process. To do this, you will need a good understanding of the Census data structure. APIs for different Census data products are available here: https://www.census.gov/data/developers/data-sets.html
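
As a rough illustration of what a non-interactive request looks like, the sketch below assembles a Census Data API URL by hand for the same table, year, and county used later in this exercise. It only builds the string and does not send a request; the exact query that tidycensus sends may differ, and the variable names here (`base_url`, `query_url`, etc.) are chosen for illustration.

```r
# Assemble a Census Data API request URL by hand (illustration only;
# tidycensus builds and sends its own requests for you).
base_url <- "https://api.census.gov/data"
year     <- 2010
dataset  <- "acs/acs5"
variable <- "B23006_001E"   # the "E" suffix requests the estimate

query_url <- paste0(
  base_url, "/", year, "/", dataset,
  "?get=NAME,", variable,
  "&for=tract:*",      # all census tracts ...
  "&in=state:17",      # ... in Illinois
  "&in=county:031"     # ... in Cook County
)
query_url
```

Pasting a URL of this form into a browser returns the raw response, which can help in understanding how the data are organized.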

The Census API allows up to 500 queries per day without an API key. For more than 500 queries, you will need a key, which you can request here: https://api.census.gov/data/key_signup.html. You will receive the API key by email. Copy the key into the function below.

census_api_key("Copy the API Key Here", install = TRUE, overwrite = TRUE)

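The install = TRUE option stores the key in your .Renviron file, so you can reuse the same API key in future sessions. To use the key in the current session without restarting R, run readRenviron("~/.Renviron").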
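
Before requesting data, it can be useful to confirm that a key is visible to the current R session. The sketch below assumes the key lives in the `CENSUS_API_KEY` environment variable, which is where tidycensus stores it; `has_census_key()` is a hypothetical helper written for this example, not a tidycensus function.

```r
# Hypothetical helper: TRUE if a Census API key is visible in this session.
# tidycensus stores the key in the CENSUS_API_KEY environment variable.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found.")
} else {
  message("No Census API key found in this session.")
}
```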


We will use the get_acs() function from the tidycensus package to get 2010 ACS data (2006-2010 ACS 5-year estimate).

In [None]:
Pop25y10 <- get_acs(geography = "tract", variables = c("B23006_001"), year = 2010, 
                     survey = "acs5", geometry=TRUE, state=17, county = 31)

The above code downloads the 2010 ACS 5-year estimate (acs5) data at the census tract level for Cook County, IL (state FIPS code 17, county FIPS code 31), where Chicago is located. Notice that we used the geometry=TRUE option, which returns the census geometry along with the attributes. In this code, we downloaded the population aged 25 to 64 years, since the ACS table used here reports educational attainment for this age group. You can look up the available variable IDs on the Census website: https://api.census.gov/data/2010/acs/acs5/variables.html

If you want to do this analysis for another county, you will need the FIPS code of your county of interest. You can find the FIPS codes here: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697
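
FIPS codes also explain the structure of the GEOID column used throughout this exercise: a census tract GEOID is the 2-digit state code, followed by the 3-digit county code, followed by a 6-digit tract code. The sketch below builds the county prefix from the numeric codes passed to get_acs() and splits an illustrative tract GEOID into its parts; the variable names are chosen for this example.

```r
# Build the 5-digit county FIPS prefix from numeric state and county codes
state_fips   <- sprintf("%02d", 17)   # Illinois
county_fips  <- sprintf("%03d", 31)   # Cook County
county_geoid <- paste0(state_fips, county_fips)
county_geoid

# Split an illustrative 11-digit tract GEOID into its components
tract_geoid <- "17031010100"
parts <- c(
  state  = substr(tract_geoid, 1, 2),
  county = substr(tract_geoid, 3, 5),
  tract  = substr(tract_geoid, 6, 11)
)
parts
```

Because GEOIDs are character strings with leading zeros, they should be kept as text rather than converted to numbers before merging.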

Since the downloaded data (Pop25y10) is an sf (simple feature) object, we need to convert it to a data frame for our calculation.

In [None]:
Pop25y10.df <- as.data.frame(Pop25y10)

Now we will create a new column (Tot25) that stores the population estimate for ages 25 to 64 from the ACS data. We only need GEOID and this new column for further calculation, so we use the select command to copy them into a new data frame (Pop25y10.df2). We then use the head command to take a quick look at the table.

In [None]:
Pop25y10.df$Tot25 <- Pop25y10.df$estimate
Pop25y10.df2 <- Pop25y10.df %>% select("GEOID", "Tot25")
head(Pop25y10.df2)

We will now use the get_acs command to download the population with a bachelor's degree or higher within the 25 to 64 age group. Most options are the same as discussed above, except that we need to change the variable ID.

In [None]:
Bach25y10 <- get_acs(geography = "tract", variables = c("B23006_023"), year = 2010, 
                     survey = "acs5", geometry=TRUE, state=17, county = 31)

Now convert it to a data frame, calculate a new field (Bach25) that contains the population with a bachelor's degree or higher, and then create another data frame with only GEOID and the new field (Bach25).

In [None]:
Bach25y10.df <- as.data.frame(Bach25y10)
Bach25y10.df$Bach25 <- Bach25y10.df$estimate
Bach25y10.df2 <- Bach25y10.df %>% select("GEOID", "Bach25")

We will now merge the two data frames by GEOID and use the head command to take a look at the new table.

In [None]:
BachDat10 <- merge(Pop25y10.df2, Bach25y10.df2, by="GEOID")
head(BachDat10)
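
It helps to know how merge() treats keys that appear in only one table. The sketch below uses two small made-up tables (not ACS data) to show that the default is an inner join, and that `all.x = TRUE` keeps unmatched rows from the first table.

```r
# Toy tables keyed by a made-up GEOID column
pop  <- data.frame(GEOID = c("001", "002", "003"), Tot25  = c(1200, 800, 950))
bach <- data.frame(GEOID = c("001", "003", "004"), Bach25 = c(300, 475, 60))

# Default merge(): inner join, only GEOIDs present in both tables survive
inner <- merge(pop, bach, by = "GEOID")

# all.x = TRUE keeps every row of the first table and fills gaps with NA
left <- merge(pop, bach, by = "GEOID", all.x = TRUE)

inner
left
```

In this exercise both tables come from the same geography and year, so the GEOIDs should match; comparing nrow() before and after a merge is a quick way to confirm that no tracts were dropped.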

Calculate a new field (PctBach10) for storing percent of population (25 to 64 years) with bachelor's degree or higher.

In [None]:
BachDat10$PctBach10 <- BachDat10$Bach25/BachDat10$Tot25*100
head(BachDat10)
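
One thing to watch for in this calculation is tracts with no residents aged 25 to 64, where the division is 0/0. The sketch below uses made-up numbers to show what R returns in that case and one possible way to make the missing value explicit; the column names `PctRaw` and `PctSafe` are illustrative.

```r
# Toy data: the second tract has no residents aged 25 to 64
toy <- data.frame(
  GEOID  = c("001", "002", "003"),
  Tot25  = c(1000, 0, 400),
  Bach25 = c(250, 0, 100)
)

# Plain division gives NaN for the empty tract (0 / 0)
toy$PctRaw <- toy$Bach25 / toy$Tot25 * 100

# Guarded version: report NA explicitly when the denominator is zero
toy$PctSafe <- ifelse(toy$Tot25 > 0, toy$Bach25 / toy$Tot25 * 100, NA_real_)
toy
```

Either way, such tracts carry no information about educational attainment and will typically appear as unfilled (grey) polygons on the maps.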

Since this data frame does not contain a geometry column, we will merge it with the earlier data frame (Pop25y10.df) that does. We then convert the result to an sf object (using the st_as_sf command) so that we can create a map from the data.

In [None]:
Dat10 <- merge(Pop25y10.df, BachDat10, by="GEOID")
Dat10sf <- st_as_sf(Dat10)

Now we will create a map of % population with bachelor's degree or higher using ggplot2.

In [None]:
Dat10sf %>%
  ggplot(aes(fill = PctBach10)) + 
  geom_sf() + 
  scale_fill_viridis_c(option = "magma") 

<b>Get 2019 ACS data using Census API</b>

Following the same approach as for the 2010 data (shown above), we will now get 2019 ACS data (2015-2019 ACS 5-year estimate) and do the necessary calculations.

In [None]:
Pop25y19 <- get_acs(geography = "tract", variables = c("B23006_001"), year = 2019, 
                     survey = "acs5", geometry=TRUE, state=17, county = 31)

Convert the sf object (Pop25y19) to a data frame, store the total population estimate (25 to 64 years) in a new field, and then create a new data frame with GEOID and the calculated field (Tot25).

In [None]:
Pop25y19.df <- as.data.frame(Pop25y19)
Pop25y19.df$Tot25 <- Pop25y19.df$estimate
Pop25y19.df2 <- Pop25y19.df %>% select("GEOID", "Tot25")
head(Pop25y19.df2)

Use the same approach to download the population with a bachelor's degree or higher using the get_acs command.

In [None]:
Bach25y19 <- get_acs(geography = "tract", variables = c("B23006_023"), year = 2019, 
                     survey = "acs5", geometry=TRUE, state=17, county = 31)

Create a new data frame with GEOID and a new field (Bach25) for the population with a bachelor's degree or higher.

In [None]:
Bach25y19.df <- as.data.frame(Bach25y19)
Bach25y19.df$Bach25 <- Bach25y19.df$estimate
Bach25y19.df2 <- Bach25y19.df %>% select("GEOID", "Bach25")

Calculate percentage of population with bachelor's degree or higher education attainment.

In [None]:
BachDat19 <- merge(Pop25y19.df2, Bach25y19.df2, by="GEOID")
BachDat19$PctBach19 <- BachDat19$Bach25/BachDat19$Tot25*100
head(BachDat19)

Create an sf object (Dat19sf) with the calculated variable.

In [None]:
Dat19 <- merge(Pop25y19.df, BachDat19, by="GEOID")
Dat19sf <- st_as_sf(Dat19)

Now create a map of % population with bachelor's degree or higher at the census tract level.

In [None]:
Dat19sf %>%
  ggplot(aes(fill = PctBach19)) + 
  geom_sf() + 
  scale_fill_viridis_c(option = "magma") 

<b> Calculate and map demographic change </b>

We will now calculate demographic change (i.e., change in % population with bachelor's degree or higher) using the data for two years we calculated before. First, we will merge the two data frames by GEOID and then create a new field (BachDiff) to store the difference in share of population with bachelor's degree or higher.

In [None]:
DatAll <- merge (Dat19, Dat10, by="GEOID")
DatAll$BachDiff <- DatAll$PctBach19-DatAll$PctBach10

Let's take a quick look at the distribution of the new variable to decide how we should map it.

In [None]:
hist(DatAll$BachDiff)

Now we will convert the above data frame (DatAll) to an sf object (DatAll.sf) to create a map showing the change in % population with bachelor's degree or higher. 

In [None]:
DatAll.sf <-st_as_sf(DatAll)
DatAll.sf %>%
  ggplot(aes(fill = BachDiff)) + 
  geom_sf() + 
  scale_fill_viridis_c(option = "turbo") 

The map created above shows the areas where the share of residents with a bachelor's degree or higher increased between 2010 and 2019, indicating potential areas where gentrification might be happening.

<b>Map overlay with transit stops </b>

We will now overlay the CTA train station locations on the demographic change map created in the previous step to identify areas where the train stations may have contributed to gentrification.

First, we will read a shapefile that contains all the CTA train station locations. This data was collected from the City of Chicago Data Portal, available here: https://data.cityofchicago.org/dataset/CTA-L-Rail-Stations-Shapefile/vmyy-m9qj/about_data

In [None]:
CTA_stops <- st_read("Data/CTA_Stations.shp")

Let's create a map of the stations only to check out the data.

In [None]:
ggplot(CTA_stops) +
  geom_sf(size = 3)

Now we will overlay these stop locations on the demographic change map created in the previous step.

In [None]:
ggplot() + 
  geom_sf(data=DatAll.sf, aes(fill = BachDiff)) + 
  scale_fill_viridis_c(option = "turbo")+
  geom_sf(data=CTA_stops, size = 2, alpha = 0.5)

As the above map shows, some of the gentrifying areas, primarily those northwest of downtown, overlap with CTA stations. We can create an interactive map to explore this further.

We need to reproject our demographic change map (DatAll.sf) to WGS 84 (EPSG:4326), the geographic coordinate system that leaflet expects, so that it is consistent with the longitude and latitude coordinates of the CTA stations.

In [None]:
DatAll.sf2 <- st_transform (DatAll.sf, 4326)
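
The effect of st_transform can be checked on a small example. The sketch below assumes the input is in NAD83 (EPSG:4269), which is what tidycensus geometries typically use but is worth confirming with st_crs(DatAll.sf); the single point near downtown Chicago is illustrative only.

```r
library(sf)

# One illustrative point near downtown Chicago, declared as NAD83 (EPSG:4269)
pt_nad83 <- st_sfc(st_point(c(-87.63, 41.88)), crs = 4269)

# Reproject to WGS 84 (EPSG:4326)
pt_wgs84 <- st_transform(pt_nad83, 4326)

# Compare the EPSG codes before and after
st_crs(pt_nad83)$epsg
st_crs(pt_wgs84)$epsg
```

Running st_crs() on DatAll.sf2 and on CTA_stops in the same way shows whether the two layers share a coordinate reference system before they are drawn together.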

Now we will use the leaflet package to create an interactive map where we can zoom in and out to further explore whether CTA stations overlap with gentrifying areas.

In [None]:
bins <- c(50, 40, 20, 10, 0, -10, -20, -30)
pal <- colorBin("RdYlBu", domain = DatAll.sf2$BachDiff, bins = bins, reverse=TRUE)

m <- leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addPolygons(data=DatAll.sf2, fillColor = ~pal(BachDiff), fillOpacity = 0.6, stroke=FALSE) %>%
  addCircles(lng=CTA_stops$long, lat=CTA_stops$lat, popup=CTA_stops$LINES, radius=2, opacity =0.7, fill = TRUE) %>% 
  addLegend(position = "bottomright", pal = pal, values = DatAll.sf2$BachDiff, title="Change in %Bachelor degree or higher", opacity = 0.7)
m
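
When break points are set by hand, it is worth checking that they cover the full range of the data, since values outside the breaks are generally not assigned a class color. The sketch below uses base R's cut() on made-up values as a stand-in for that check; it does not reproduce leaflet's internals, and `toy_diff` is illustrative rather than actual output from this exercise.

```r
# Made-up percentage-point changes, including one value above the top break
toy_diff <- c(-25, -4, 3, 12, 38, 62)

# Same break points as the leaflet palette, written in ascending order
bins <- c(-30, -20, -10, 0, 10, 20, 40, 50)

# Assign each value to a class; values outside the breaks become NA
classes <- cut(toy_diff, breaks = bins, include.lowest = TRUE, right = FALSE)
table(classes, useNA = "ifany")
```

Applying the same cut() call to DatAll.sf2$BachDiff, or simply inspecting range(DatAll.sf2$BachDiff, na.rm = TRUE), shows whether any tracts fall outside the chosen bins.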

The interactive map shows that some areas along the Blue Line stations (toward O'Hare airport) saw a larger increase in the share of highly educated residents (bachelor's degree or higher) between 2010 and 2019 than other areas. However, areas around stations on most other lines do not show any notable change.
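
A visual reading like this can be complemented with a simple group comparison. The sketch below is only an illustration with made-up values: `near_station` is a hypothetical flag marking tracts considered close to a CTA station, which this exercise does not compute.

```r
# Made-up tract-level changes; near_station is a hypothetical flag
# (TRUE if a tract is considered close to a CTA station)
toy <- data.frame(
  GEOID        = sprintf("%03d", 1:6),
  BachDiff     = c(18, 12, 9, 2, -1, 4),
  near_station = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Mean change in percentage points for each group
group_means <- tapply(toy$BachDiff, toy$near_station, mean)
group_means
```

With real data, a difference in group means would still only be descriptive; it would not by itself show that the stations caused the change.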

<b> Reflection and challenge tasks </b>

This exercise shows a simplified way to identify gentrifying areas based on only one indicator. However, gentrification is a multi-dimensional phenomenon and it cannot be measured by one variable only. Increased concentration of highly educated population does not confirm that gentrification is happening in those areas. This exercise can be further expanded by incorporating data on income level, race/ethnicity, home value, housing stock, etc. It can also be implemented for any other city.

# References

Dawkins, C., & Moeckel, R. (2016). Transit-induced gentrification: Who will stay, and who will go?. Housing Policy Debate, 26(4-5), 801-818.

Desmond, M., & Gershenson, C. (2017). Who gets evicted? Assessing individual, neighborhood, and network factors. Social Science Research, 62, 362-377.

Easton, S., Lees, L., Hubbard, P., & Tate, N. (2020). Measuring and mapping displacement: The problem of quantification in the battle against gentrification. Urban studies, 57(2), 286-306.

Freeman, L. (2005). Displacement or succession? Urban Affairs Review, 40, 463-491.

Gelb, J., & Lyons, M. (1993). A tale of two cities: housing policy and gentrification in London and New York. Journal of Urban Affairs, 15(4), 345-366.

Gotham, K. F. (2013). Tourism gentrification: The case of New Orleans' Vieux Carre (French Quarter). In The Gentrification Debates (pp. 145-165). Routledge.

Maciag, M. (2015). Gentrification in America report: Governing the states and localities. Available at: https://www.governing.com/gov-data/census/gentrification-in-cities-governing-report.htm

Rigolon, A., & Németh, J. (2020). Green gentrification or ‘just green enough’: Do park location, size and function affect whether a place gentrifies or not?. Urban Studies, 57(2), 402-420.

Walker, K., & Herman, M. (2021). tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames. https://github.com/walkerke/tidycensus.

