# Chicago's Pandemic Rail Ridership - Part 1 Ridership Changes
> A look at changes in Chicago rail ridership during the pandemic.

- toc: true 
- badges: true
- comments: true
- categories: 
- image: 

## Chicago Rail Ridership Series

Inspired by a [post](https://www.urban.org/urban-wire/transit-ridership-dropped-heavy-rail-stations-during-covid-19-pandemic-ridership-change-depended-neighborhood-characteristics) from the Urban Institute's blog, I wanted to reproduce and extend a portion of their analysis.  

This analysis focuses on the City of Chicago and first investigates how rail ridership changed between April through June of 2019 and the same months of 2020.  As the authors highlight, rail stops that saw different changes in ridership were likely to differ in demographic characteristics.

While some individuals lost their jobs or were able to work from home, many essential workers still needed to get to a physical location to do their job.  Thus it is reasonable to assume that rail stops whose nearby residents are more likely to be essential workers would see less extreme declines in ridership.

To conduct this analysis, we will use data from the Chicago Transit Authority (CTA), which publishes daily, stop-specific ridership counts.  CTA also publishes rail stop information, including geographic coordinates.  Using American Community Survey data, we will be able to summarise the demographic characteristics of residents living within a half mile of each rail station.

For this series, we will walk through several posts to highlight different techniques.  

In the first post, we will simply read in the ridership data and analyze year-over-year changes by stop.  

In our second post, we will combine the geographic coordinates of each stop with Census data to identify individuals who reside within a half mile of each stop.

In our third post, we will summarise the demographic characteristics of the individuals living within a half mile of each stop.

Finally, we will tie these together by investigating the relationships between demographic characteristics and ridership changes.

From a policy perspective, this analysis provides grounds for investment in transit in neighborhoods that rely on it most heavily, especially in times of crisis.  In order to build resilient cities, **essential workers should have easy access to transit that is most readily available when they are most essential**.

I'm excited, let's go!

## In this Post

In this post we will be keeping things simple.  We read in the [CTA daily rail ridership data](https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f) and summarize changes in ridership across years.

## Setup

For this post we need to have `pandas` available in our environment.  We also need to make the `generate_query` function outlined in [this post](./2021-08-19-read-socrata.ipynb) available.

In [10]:
# load pandas
import pandas as pd

# import user-defined function to clean up soql queries
from resources.config import generate_query

## Read Data

We will be using data from the Chicago Transit Authority that is warehoused within the City of Chicago's Open Data Portal.  Very conveniently, Chicago uses Socrata to warehouse their Open Data.  We can use the principles from [this post](./2021-08-19-read-socrata.ipynb) to help us read in the data.
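
The linked post is not reproduced here, so as a rough sketch of the general idea: Socrata endpoints accept a SoQL statement through the `$query` URL parameter, and a helper can collapse the query's whitespace, append a row limit, and URL-encode the result.  The function below, `build_soql_url`, is a hypothetical stand-in written for illustration.  It is not the `generate_query` function from that post, whose implementation may differ.

```python
import re
from urllib.parse import quote


def build_soql_url(endpoint_url, query, limit=1000):
    """Hypothetical helper: turn a multi-line SoQL query into a request URL."""
    # collapse newlines and runs of spaces so the query fits on one line
    compact_query = re.sub(r"\s+", " ", query).strip()
    # assumption: the row limit is appended as a SoQL LIMIT clause
    full_query = f"{compact_query} limit {limit}"
    # percent-encode the query so it is safe to place in a URL
    return f"{endpoint_url}?$query={quote(full_query)}"


example_url = build_soql_url(
    endpoint_url="https://data.cityofchicago.org/resource/5neh-572f.json",
    query="""select station_id,
                    sum(rides) as rides
             group by station_id""",
    limit=10,
)
print(example_url)
```

A URL built this way can be handed to `pd.read_json`, which accepts URLs as input.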

In [3]:
ridership_endpoint = "https://data.cityofchicago.org/resource/5neh-572f.json"

ridership_query = generate_query(endpoint_url=ridership_endpoint,
                                 query = """select station_id, 
                                                   date_extract_y(date) as year,
                                                   date_extract_m(date) as month, 
                                                   sum(rides) as rides 
                                            where date>='2019-01-01'
                                            and date<='2020-07-01'
                                            group by station_id, 
                                                     date_extract_y(date),
                                                     date_extract_m(date)""",
                                 limit = 50000)

rides_df = pd.read_json(ridership_query)

## Aggregate Ridership

Here, we filter to observations in April, May, and June and then sum the number of rides for each station and year.
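
To make the filter-then-aggregate step concrete, here is a minimal sketch on made-up numbers.  The station IDs and ride counts are invented for illustration and are not CTA data.  It also shows that comparing a column to a list inside `query` behaves like `isin`.

```python
import pandas as pd

# made-up monthly totals for two hypothetical stations (not CTA data)
toy_rides_df = pd.DataFrame({
    "station_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "year":       [2019, 2019, 2020, 2020, 2019, 2019, 2020, 2020],
    "month":      [3, 4, 4, 5, 4, 6, 6, 7],
    "rides":      [100, 200, 50, 60, 300, 400, 70, 80],
})

# comparing a column to a list inside query() keeps rows whose value is in the list
via_query = toy_rides_df.query("month == [4, 5, 6]")
# the same filter written with isin()
via_isin = toy_rides_df[toy_rides_df["month"].isin([4, 5, 6])]

# sum rides within each station-year pair
toy_agg_df = via_query \
    .groupby(["station_id", "year"], as_index=False) \
    .aggregate({"rides": "sum"})
print(toy_agg_df)
```

The March and July rows drop out, leaving one total per station and year.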

In [4]:
agg_rides_df = rides_df \
    .query("month == [4,5,6]") \
    .groupby(['station_id','year'], as_index=False) \
    .aggregate({'rides':'sum'})

agg_rides_df.head()

| | station_id | year | rides |
|---|---|---|---|
| 0 | 40010 | 2019 | 140717 |
| 1 | 40010 | 2020 | 25533 |
| 2 | 40020 | 2019 | 284197 |
| 3 | 40020 | 2020 | 68098 |
| 4 | 40030 | 2019 | 125606 |


## Pivot Ridership Data Wide by Year

Using the `pivot` method, we pivot our data wide so that each station has one row, with separate columns for rides in 2019 and 2020.
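
As a small illustration of the reshape, the sketch below pivots made-up long-format totals (again invented, not CTA data) into one row per station.  Prefixing the year with `year_` turns the integer years into descriptive string column names.

```python
import pandas as pd

# made-up long-format totals for two hypothetical stations (not CTA data)
toy_long_df = pd.DataFrame({
    "station_id": [1, 1, 2, 2],
    "year":       [2019, 2020, 2019, 2020],
    "rides":      [200, 110, 700, 70],
})

# one row per station, one column per year
toy_wide_df = toy_long_df \
    .assign(year=lambda x: "year_" + x["year"].astype(str)) \
    .pivot(index="station_id", columns="year", values="rides") \
    .reset_index()

# pivot() leaves the name "year" on the column axis; clearing it is optional
toy_wide_df.columns.name = None
print(toy_wide_df)
```

Note that `pivot` raises a `ValueError` if any station-year pair appears more than once, which is why the data are aggregated before this step.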

In [5]:
agg_rides_wide_df = agg_rides_df \
    .assign(year=lambda x: 'year_' + x['year'].astype(str)) \
    .pivot(index = 'station_id', columns='year', values='rides').reset_index()

agg_rides_wide_df.head()

| | station_id | year_2019 | year_2020 |
|---|---|---|---|
| 0 | 40010 | 140717 | 25533 |
| 1 | 40020 | 284197 | 68098 |
| 2 | 40030 | 125606 | 40887 |
| 3 | 40040 | 564445 | 29337 |
| 4 | 40050 | 286127 | 41275 |


## Calculate Year-Over-Year Change

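Finally, we calculate the absolute and relative change in ridership from 2019 to 2020.  Note that `pct_change` is expressed as a fraction of 2019 ridership, so a value of -0.82 corresponds to a decline of roughly 82 percent.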
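
As a quick arithmetic check, the sketch below applies the two formulas to the April through June totals of the first two stations shown in the pivoted output above.  The data frame name `check_df` is introduced here only for illustration.

```python
import pandas as pd

# April-June totals for the first two stations in the pivoted output above
check_df = pd.DataFrame({
    "station_id": [40010, 40020],
    "year_2019":  [140717, 284197],
    "year_2020":  [25533, 68098],
})

check_df = check_df.assign(
    # difference in rides between the two years
    abs_change=lambda x: x["year_2020"] - x["year_2019"],
    # fraction of 2019 ridership; multiply by 100 for a percentage
    pct_change=lambda x: x["abs_change"] / x["year_2019"],
)
print(check_df.round({"pct_change": 4}))
```

Station 40010 lost 115,184 rides, which is a decline of about 82 percent relative to 2019.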

In [15]:
rides_change_df = agg_rides_wide_df \
    .assign(abs_change = lambda x: x['year_2020'] - x['year_2019'],
            pct_change = lambda x: x['abs_change'] / x['year_2019'])

# write to csv for next post
rides_change_df.to_csv(path_or_buf="./resources/rides_change_df.csv")

rides_change_df.head()

| | station_id | year_2019 | year_2020 | abs_change | pct_change |
|---|---|---|---|---|---|
| 0 | 40010 | 140717 | 25533 | -115184 | -0.818551 |
| 1 | 40020 | 284197 | 68098 | -216099 | -0.760385 |
| 2 | 40030 | 125606 | 40887 | -84719 | -0.674482 |
| 3 | 40040 | 564445 | 29337 | -535108 | -0.948025 |
| 4 | 40050 | 286127 | 41275 | -244852 | -0.855746 |


## Summary

We now have a clean data frame with changes in ridership for each rail station.  In the next post in the series we will explore how to identify Census respondents within a half mile of each station. 