# Analysis of CTA Ridership Data

In [1]:
```python
import sqlite3
import pandas as pd

conn = sqlite3.connect('cta_ridership.db')
```

### Check that data has been loaded into tables as expected

In [7]:
```python
# Annual ridership data
annual_ridership_query = """
SELECT * FROM annual_boarding_totals;
"""

annual_df = pd.read_sql(annual_ridership_query, conn)
annual_df
```

|   | year | bus | paratransit | rail | total |
|---|---|---|---|---|---|
| 0 | 1988 | 430089500 | 435400 | 174436000 | 604960900 |
| 1 | 1989 | 420572700 | 924800 | 168658800 | 590156300 |
| 2 | 1990 | 421183734 | 930802 | 165732575 | 587847111 |
| 3 | 1991 | 392088602 | 949460 | 147608116 | 540646178 |
| 4 | 1992 | 370335119 | 1011669 | 137372830 | 508719618 |
| 5 | 1993 | 326655953 | 1167904 | 135369734 | 463193591 |
| 6 | 1994 | 331520700 | 1209900 | 143579100 | 476309700 |
| 7 | 1995 | 306075585 | 1270274 | 135461619 | 442807478 |
| 8 | 1996 | 302115116 | 1244209 | 142040486 | 445399811 |
| 9 | 1997 | 287628293 | 1235085 | 151010374 | 439873752 |


In [None]:
```python
# Daily bus ridership data
bus_query = """
SELECT * FROM daily_ridership_bus_routes;
"""

bus_df = pd.read_sql(bus_query, conn)
bus_df
```

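|   | route | date | daytype | rides |
|---|---|---|---|---|
| 0 | 1 | 2001-01-02 | W | 5813 |
| 1 | 1 | 2001-01-03 | W | 6809 |
| 2 | 1 | 2001-01-04 | W | 6907 |
| 3 | 1 | 2001-01-05 | W | 6154 |
| 4 | 1 | 2001-01-08 | W | 6126 |
| ... | ... | ... | ... | ... |
| 1081968 | X99 | 2005-03-21 | W | 55 |
| 1081969 | X99 | 2005-03-22 | W | 58 |
| 1081970 | X99 | 2005-03-23 | W | 49 |
| 1081971 | X99 | 2005-03-24 | W | 53 |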


In [17]:
```python
# Daily L (train) ridership data
train_ridership_query = """
SELECT * FROM daily_ridership_l_stations;
"""

train_df = pd.read_sql(train_ridership_query, conn)
train_df
```

|   | station_id | stationname | date | daytype | rides |
|---|---|---|---|---|---|
| 0 | 40350 | UIC-Halsted | 2001-01-01 | U | 273 |
| 1 | 41130 | Halsted-Orange | 2001-01-01 | U | 306 |
| 2 | 40760 | Granville | 2001-01-01 | U | 1059 |
| 3 | 40070 | Jackson/Dearborn | 2001-01-01 | U | 649 |
| 4 | 40090 | Damen-Brown | 2001-01-01 | U | 411 |
| ... | ... | ... | ... | ... | ... |
| 1284672 | 41670 | Conservatory | 2025-08-31 | U | 558 |
| 1284673 | 41680 | Oakton-Skokie | 2025-08-31 | U | 250 |
| 1284674 | 41690 | Cermak-McCormick Place | 2025-08-31 | U | 1459 |
| 1284675 | 41700 | Washington/Wabash | 2025-08-31 | U | 6586 |

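Displaying each full table only shows its first and last few rows. A quicker sanity check on the load is to report each table's row count and date range. The sketch below does that with the standard-library `sqlite3` module, assuming each daily table has a text `date` column in `YYYY-MM-DD` form, as the outputs above suggest. The helper name `summarize_tables` is hypothetical, and the demo uses a tiny in-memory database with made-up rows rather than `cta_ridership.db`.

```python
import sqlite3


def summarize_tables(conn, tables):
    """Return {table: (row_count, first_date, last_date)} for each table.

    Assumes every table has a text `date` column in YYYY-MM-DD form,
    so MIN/MAX on the string give the earliest and latest dates.
    """
    summary = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so only pass
        # trusted, hard-coded names here.
        row = conn.execute(
            f"SELECT COUNT(*), MIN(date), MAX(date) FROM {table}"
        ).fetchone()
        summary[table] = row
    return summary


# Tiny in-memory demo (made-up rows, not real CTA data).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE daily_ridership_bus_routes "
    "(route TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.executemany(
    "INSERT INTO daily_ridership_bus_routes VALUES (?, ?, ?, ?)",
    [
        ("1", "2001-01-02", "W", 100),
        ("1", "2001-01-03", "W", 120),
        ("X99", "2005-03-24", "W", 50),
    ],
)
result = summarize_tables(demo, ["daily_ridership_bus_routes"])
print(result)  # {'daily_ridership_bus_routes': (3, '2001-01-02', '2005-03-24')}
```

Called with `conn` and both daily table names, this would show at a glance where each table's data ends (the train output above stops at 2025-08-31).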

## Begin Analysis

### Analysis 1: How has ridership on the L and buses changed in the 5 years before and the 5 years after COVID-19?

In [21]:
```python
# Analysis window: 2014-11-01 to 2025-11-01, bounded by 2025's data availability
covid_ridership_impacts_query = """
with total_ridership as
   (SELECT train.date as date, train.rides as train_ride_count, bus.rides as bus_ride_count
    FROM daily_ridership_l_stations train
    INNER JOIN daily_ridership_bus_routes bus
        ON train.date = bus.date
    WHERE train.date BETWEEN '2014-11-01' AND '2025-11-01')
SELECT 'pre-pandemic' as timeframe, sum(train_ride_count), sum(bus_ride_count) FROM total_ridership WHERE date < '2020-01-01'
UNION
SELECT 'post-pandemic' as timeframe, sum(train_ride_count), sum(bus_ride_count) FROM total_ridership WHERE date >= '2020-01-01'
"""

covid_ridership_df = pd.read_sql(covid_ridership_impacts_query, conn)
covid_ridership_df
```

|   | timeframe | sum(train_ride_count) | sum(bus_ride_count) |
|---|---|---|---|
| 0 | post-pandemic | 58407267537 | 119981507510 |
| 1 | pre-pandemic | 117245789043 | 186724939005 |

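Both daily tables hold many rows per date (one per station, one per bus route), so joining them on `date` alone pairs every station row with every route row for that day. Each station's rides are then counted once per bus route and each route's rides once per station, which inflates both sums. One way to avoid that, sketched below under the same table and column names, is to collapse each table to one total per date first and then join the daily totals. Because the two windows are not the same length, the sketch also reports average rides per day in each period, which compares more fairly than raw totals. The function name `pandemic_comparison` and the demo rows are hypothetical.

```python
import sqlite3

# Collapse each table to one total per date *before* joining, so the join
# pairs exactly one daily train total with one daily bus total.
QUERY = """
WITH train_daily AS (
    SELECT date, SUM(rides) AS train_rides
    FROM daily_ridership_l_stations
    WHERE date BETWEEN '2014-11-01' AND '2025-11-01'
    GROUP BY date
),
bus_daily AS (
    SELECT date, SUM(rides) AS bus_rides
    FROM daily_ridership_bus_routes
    WHERE date BETWEEN '2014-11-01' AND '2025-11-01'
    GROUP BY date
)
SELECT
    CASE WHEN t.date < '2020-01-01' THEN 'pre-pandemic'
         ELSE 'post-pandemic' END AS timeframe,
    COUNT(*)           AS days,  -- dates present in both tables
    SUM(t.train_rides) AS train_rides,
    SUM(b.bus_rides)   AS bus_rides,
    AVG(t.train_rides) AS avg_daily_train,
    AVG(b.bus_rides)   AS avg_daily_bus
FROM train_daily AS t
JOIN bus_daily AS b ON t.date = b.date
GROUP BY timeframe
ORDER BY timeframe DESC  -- DESC lists 'pre-pandemic' first
"""


def pandemic_comparison(conn):
    """Return rows of (timeframe, days, train, bus, avg_train, avg_bus)."""
    return conn.execute(QUERY).fetchall()


# Tiny in-memory demo with made-up rows: two stations and two routes per day.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE daily_ridership_l_stations "
    "(station_id INTEGER, stationname TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.execute(
    "CREATE TABLE daily_ridership_bus_routes "
    "(route TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.executemany(
    "INSERT INTO daily_ridership_l_stations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "2019-06-01", "A", 10), (2, "B", "2019-06-01", "A", 20),
        (1, "A", "2019-06-02", "U", 10), (2, "B", "2019-06-02", "U", 10),
        (1, "A", "2021-06-01", "W", 5), (2, "B", "2021-06-01", "W", 15),
    ],
)
demo.executemany(
    "INSERT INTO daily_ridership_bus_routes VALUES (?, ?, ?, ?)",
    [
        ("1", "2019-06-01", "A", 100), ("2", "2019-06-01", "A", 200),
        ("1", "2019-06-02", "U", 100), ("2", "2019-06-02", "U", 100),
        ("1", "2021-06-01", "W", 50), ("2", "2021-06-01", "W", 50),
    ],
)

rows = pandemic_comparison(demo)
for row in rows:
    print(row)
# ('pre-pandemic', 2, 50, 500, 25.0, 250.0)
# ('post-pandemic', 1, 20, 100, 20.0, 100.0)
```

On the real database, `pd.read_sql(QUERY, conn)` would return the same result as a DataFrame.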

#### Thoughts
I'm really surprised to see how much ridership on both buses and trains has boomed after the pandemic. I honestly expected the opposite result, especially because the "post-pandemic" data includes 2020 in its counts. The rate of increase is what shocks me most. Chicagoans only have half the year to spend time outdoors, and 2020 took that away from the city's residents. Maybe people are compensating, or are more grateful for the time they get to spend outdoors? Or we could be seeing a decrease in car ownership in the city. It's hard to tell without other statistics on hand.\
I've heard that although Chicago lost the largest number of residents moving out of the city in a single year (I believe back in 2023), the same year also saw an increase in population (I contributed to the latter statistic).

### Analysis 2: What is the busiest train station of 2025 so far? What about the least busy?

In [25]:
```python
# GROUP BY already returns one row per station name, so DISTINCT is unnecessary
l_station_popularity_query = """
SELECT stationname, sum(rides) as total_rides
FROM daily_ridership_l_stations
WHERE date >= '2025-01-01'
GROUP BY stationname
ORDER BY total_rides DESC
"""

l_ridership_df = pd.read_sql(l_station_popularity_query, conn)
l_ridership_df
```

|   | stationname | total_rides |
|---|---|---|
| 0 | Lake/State | 2263853 |
| 1 | O'Hare Airport | 2115712 |
| 2 | Clark/Lake | 2008968 |
| 3 | Fullerton | 1677029 |
| 4 | State/Lake | 1673523 |
| ... | ... | ... |
| 139 | Berwyn | 74444 |
| 140 | Lawrence | 71637 |
| 141 | Halsted/63rd | 62312 |
| 142 | Kostner | 60504 |

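Grouping by `stationname` alone returns only the name, which is why the two State & Lake entries are hard to tell apart; it would also silently merge any two stations that happen to share a name. Since `daily_ridership_l_stations` also has a `station_id` column, one option is to group by both and keep the ID in the result so each row can be checked against a list of L stops. The sketch below assumes those column names and returns the busiest and least busy stations from a start date onward. The function name `station_extremes` and all demo IDs, names, and counts are made up for illustration.

```python
import sqlite3

QUERY = """
SELECT station_id, stationname, SUM(rides) AS total_rides
FROM daily_ridership_l_stations
WHERE date >= ?
GROUP BY station_id, stationname
ORDER BY total_rides DESC
"""


def station_extremes(conn, start_date):
    """Return (busiest, least_busy) as (station_id, stationname, total_rides).

    Assumes at least one row matches; ties are returned in arbitrary order.
    """
    rows = conn.execute(QUERY, (start_date,)).fetchall()
    return rows[0], rows[-1]


# Tiny in-memory demo with made-up stations.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE daily_ridership_l_stations "
    "(station_id INTEGER, stationname TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.executemany(
    "INSERT INTO daily_ridership_l_stations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Station A", "2025-01-02", "W", 900),
        (1, "Station A", "2025-01-03", "W", 800),
        (2, "Station B", "2025-01-02", "W", 300),
        (3, "Station C", "2025-01-02", "W", 50),
        (3, "Station C", "2024-12-31", "W", 5000),  # before start_date, excluded
    ],
)

busiest, least = station_extremes(demo, "2025-01-01")
print(busiest)  # (1, 'Station A', 1700)
print(least)    # (3, 'Station C', 50)
```

With pandas, `pd.read_sql(QUERY, conn, params=("2025-01-01",))` would return the full ranking, IDs included, as a DataFrame.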

#### Thoughts
I'm not surprised to see State & Lake with the highest ridership, considering it's located right in the middle of downtown Chicago. However, I do see another State & Lake entry at number 4 (the data lists them as `Lake/State` and `State/Lake`). It's difficult to tell which of the two entries belongs to the Red Line subway station and which to the elevated station serving the Brown, Purple, Green, Pink, and Orange lines. I'd say it's safe to assume that the latter holds the number 1 spot, considering this is where transit from the North, South, and West sides links together. Still, seeing the Red Line station this high isn't surprising either, as it's located right next to the Chicago Theatre. Additionally, the Red Line takes commuters to both the Sox and Cubs stadiums.\
\
King Drive landing at the bottom is not too surprising. This station is located a few blocks away from the notorious O Block. Unfortunately, because of the area's bad reputation, it isn't frequented by many people other than residents.

### Analysis 3: In the spirit of the holidays, let's look at how Christmas Day ridership has changed over the years.

In [38]:
```python
holiday_query = """
SELECT strftime('%Y', train.date) AS year, sum(train.rides) as total_train_rides, sum(bus.rides) as total_bus_rides
FROM daily_ridership_l_stations train
    JOIN daily_ridership_bus_routes bus
    ON train.date = bus.date
WHERE train.date LIKE '%-12-25'
GROUP BY year
ORDER BY year DESC
"""

holiday_ridership_df = pd.read_sql(holiday_query, conn)
holiday_ridership_df
```

|   | year | total_train_rides | total_bus_rides |
|---|---|---|---|
| 0 | 2024 | 6352906 | 20926224 |
| 1 | 2023 | 5972724 | 20652775 |
| 2 | 2022 | 4015082 | 12048465 |
| 3 | 2021 | 5019396 | 14431274 |
| 4 | 2020 | 2900006 | 11468457 |
| 5 | 2019 | 9510492 | 28360046 |
| 6 | 2018 | 9151878 | 26994384 |
| 7 | 2017 | 8484066 | 23678065 |
| 8 | 2016 | 9297460 | 28584432 |
| 9 | 2015 | 11723800 | 33117552 |

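The Christmas query joins per-station rows to per-route rows on `date`, so each sum is multiplied by the number of rows on the other side of the join for that day. A sketch of one alternative, under the same table and column assumptions, filters and totals each table by year on its own and only then joins the two yearly results. It matches Christmas with `strftime('%m-%d', date) = '12-25'`, which states the intent more directly than the `LIKE` pattern. The function name `christmas_totals` and the demo rows are hypothetical.

```python
import sqlite3

QUERY = """
WITH train_xmas AS (
    SELECT strftime('%Y', date) AS year, SUM(rides) AS train_rides
    FROM daily_ridership_l_stations
    WHERE strftime('%m-%d', date) = '12-25'
    GROUP BY year
),
bus_xmas AS (
    SELECT strftime('%Y', date) AS year, SUM(rides) AS bus_rides
    FROM daily_ridership_bus_routes
    WHERE strftime('%m-%d', date) = '12-25'
    GROUP BY year
)
-- One row per year on each side, so the join cannot multiply the totals.
SELECT t.year, t.train_rides, b.bus_rides
FROM train_xmas AS t
JOIN bus_xmas AS b ON t.year = b.year
ORDER BY t.year DESC
"""


def christmas_totals(conn):
    """Return rows of (year, train_rides, bus_rides), newest year first."""
    return conn.execute(QUERY).fetchall()


# Tiny in-memory demo with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE daily_ridership_l_stations "
    "(station_id INTEGER, stationname TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.execute(
    "CREATE TABLE daily_ridership_bus_routes "
    "(route TEXT, date TEXT, daytype TEXT, rides INTEGER)"
)
demo.executemany(
    "INSERT INTO daily_ridership_l_stations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "2024-12-25", "U", 10), (2, "B", "2024-12-25", "U", 30),
        (1, "A", "2024-12-24", "W", 999),  # not Christmas, excluded
        (1, "A", "2023-12-25", "U", 7),
    ],
)
demo.executemany(
    "INSERT INTO daily_ridership_bus_routes VALUES (?, ?, ?, ?)",
    [
        ("1", "2024-12-25", "U", 100), ("2", "2024-12-25", "U", 200),
        ("1", "2023-12-25", "U", 50),
    ],
)

print(christmas_totals(demo))  # [('2024', 40, 300), ('2023', 7, 50)]
```

Because the final step is an inner join on year, a year missing from either table drops out; a `LEFT JOIN` from whichever table covers more years would keep it.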

#### Thoughts
I contributed to Christmas Day ridership in 2022, 2023, and 2024. I truly didn't expect numbers this high, but plenty of people call Chicago home, so that shouldn't be too surprising.\
It makes sense that ridership on both the L and buses held up in 2008-2011, after the major financial crisis. Presumably more residents took public transit to save money on gas and/or avoid buying a car.\
As expected, there is a major dip in 2020, though recovery seems to come rather quickly. L usage on Christmas Day 2024 matches the levels seen back in 2001.