## MTA Daily Ridership - Line Chart

On March 8, 2021, The New York Times published an article titled "[How Coronavirus Has Changed New York City Transit in One Chart](https://www.nytimes.com/interactive/2021/03/08/climate/nyc-transit-covid.html)". The chart looks like this:

<div>
<img src="https://static01.nyt.com/images/2021/03/07/us/nyc-transit-covid-promo-1615150889393/nyc-transit-covid-promo-1615150889393-superJumbo.png" width="800"/>
</div>

This chart shows the percentage decline in ridership for bridges and tunnels, subways, buses, the LIRR, and Metro-North. It visualizes how profoundly the pandemic disrupted New York City's large public transit system. It also shows that although daily ridership had bounced back somewhat by March 2021, it had not fully recovered to pre-pandemic levels. It is interesting to extend this chart with more recent data to see whether ridership has since recovered from the pandemic disruption.

I have attempted to reproduce this chart and extend it through December 2023, using the following dataset:

* [MTA Daily Ridership Data: Beginning 2020](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew/about_data)

This Python script uses Altair to visualize the decline in NYC public transit ridership relative to pre-pandemic levels. It processes data for subways, buses, the LIRR, Metro-North, and bridges and tunnels, and computes a 3-day moving average of each mode's ridership decline compared with the comparable pre-pandemic day.

The chart includes:

* Line plots tracking the decline for each transit mode.
* Markers and labels for the latest data points.
* A red vertical line marking NYC's March 22, 2020 lockdown.
* A dashed green horizontal line representing 100% pre-pandemic ridership recovery.

The final 2000x800 resolution chart clearly illustrates transit recovery trends over time.
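
The decline and smoothing steps described above can be sketched on a handful of made-up numbers. This is a minimal pure-Python illustration, not the notebook's pandas code: the `shares` values are hypothetical, the helper names `percent_decline` and `trailing_mean` are invented here, and it assumes the dataset stores each day's ridership as a fraction of the comparable pre-pandemic day (1.0 meaning fully recovered).

```python
def percent_decline(shares_of_baseline):
    """Convert fractions of pre-pandemic ridership into percent decline."""
    return [(1 - share) * 100 for share in shares_of_baseline]


def trailing_mean(values, window=3):
    """Trailing moving average; None until the window is full."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough days yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical daily shares of pre-pandemic ridership (assumption, not real data).
shares = [1.00, 0.90, 0.50, 0.20, 0.10]

decline = percent_decline(shares)
smoothed = trailing_mean(decline, window=3)

# Negate so that values below zero mean ridership is below the baseline.
plotted = [None if v is None else -v for v in smoothed]

for day, value in enumerate(plotted, start=1):
    print(day, None if value is None else round(value, 1))
```

The first two smoothed entries are empty because a 3-day window is not yet full, which mirrors how `rolling(window=3).mean()` behaves in pandas.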

In [1]:
import altair as alt
import pandas as pd

url = "https://github.com/qnzhou/practical_data_visualization_in_python/files/14484180/MTA_Daily_Ridership_Data__Beginning_2020_20240304.csv"
data = pd.read_csv(url)

In [2]:
modes = ['Subways', 'Buses', 'LIRR', 'Metro-North', 'Bridges and Tunnels']

In [3]:
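# Parse the date strings so Altair can treat them as a temporal axis.
data['Date'] = pd.to_datetime(data['Date'], format='%m/%d/%Y')
# Convert each share of pre-pandemic ridership into a percent decline.
# This overwrites the column in place, so run the cell only once per load.
for mode in modes:
    data[f'{mode}: % of Comparable Pre-Pandemic Day'] = (1 - data[f'{mode}: % of Comparable Pre-Pandemic Day']) * 100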

In [4]:
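# 3-day moving average of the decline, negated so that values below zero
# mean ridership is below the pre-pandemic baseline.
for mode in modes:
    data[f'{mode} 3-day MA'] = -(data[f'{mode}: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean())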

In [5]:
long_data = pd.melt(data, id_vars=['Date'], 
                    value_vars=[f'{mode} 3-day MA' for mode in modes],
                    var_name='category', value_name='value')
# Keep the most recent row per category for the end-of-line markers and labels.
last_data_points = long_data.sort_values('Date').groupby('category').tail(1).reset_index(drop=True)


In [23]:
# Chart

# Shared x/y encodings; the x-axis grid is turned off.
base = alt.Chart(long_data).encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date', grid=False)),
    y=alt.Y('value:Q', axis=alt.Axis(title='Percent Decline from 2019 Ridership'), scale=alt.Scale(domain=[-100, 60]))
)

# Colors are paired with the domain entries by position.
lines = base.mark_line(opacity=0.7).encode(
    color=alt.Color('category:N', scale=alt.Scale(
        domain=[f'{mode} 3-day MA' for mode in modes],
        range=['#00008B',  # Dark Blue for Subways
               '#4682B4',  # Medium Blue for Buses
               '#ADD8E6',  # Light Blue for LIRR
               '#D8BFD8',  # Light Purple for Metro-North
               '#9370DB']  # Medium Purple for Bridges and Tunnels
    ))
)
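
Altair pairs the `domain` and `range` lists of a scale by position, so the order of `modes` decides which color each line gets. The following sketch uses only plain Python and the same two lists as the cell above to print the resulting pairing as a quick sanity check; the names `palette` and `color_for` are introduced here for illustration.

```python
modes = ['Subways', 'Buses', 'LIRR', 'Metro-North', 'Bridges and Tunnels']

# Same hex values, in the same order, as the `range` list in the chart cell.
palette = ['#00008B', '#4682B4', '#ADD8E6', '#D8BFD8', '#9370DB']

# Category names as they appear in the melted data frame.
domain = [f'{mode} 3-day MA' for mode in modes]

# Position i in the domain receives position i in the palette.
color_for = dict(zip(domain, palette))

for category, hex_color in color_for.items():
    print(f'{category}: {hex_color}')
```

Reordering `modes` without reordering the palette changes which mode gets which color, so it is worth rechecking this pairing after any change to either list.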

In [24]:
points = alt.Chart(last_data_points).mark_point(filled=True).encode(
    x='Date:T',
    y='value:Q',
    color=alt.Color('category:N', scale=alt.Scale(
        domain=[f'{mode} 3-day MA' for mode in modes],
        range=['#00008B',  # Dark Blue for Subways
               '#4682B4',  # Medium Blue for Buses
               '#ADD8E6',  # Light Blue for LIRR
               '#D8BFD8',  # Light Purple for Metro-North
               '#9370DB']  # Medium Purple for Bridges and Tunnels
    )),
    size=alt.value(60),  # Adjust size as needed
)



In [25]:
labels = alt.Chart(last_data_points).mark_text(align='left', dx=5, dy=3).encode(
    x='Date:T',
    y='value:Q',
    text='category:N',
    color=alt.Color('category:N', scale=alt.Scale(
        domain=[f'{mode} 3-day MA' for mode in modes],
        range=['#00008B',  # Dark Blue for Subways
               '#4682B4',  # Medium Blue for Buses
               '#ADD8E6',  # Light Blue for LIRR
               '#D8BFD8',  # Light Purple for Metro-North
               '#9370DB']  # Medium Purple for Bridges and Tunnels
    ))
)

In [26]:
lockdown_vline = alt.Chart(pd.DataFrame({'Date': [pd.to_datetime('2020-03-22')]})).mark_rule(
    color='red', strokeWidth=2
).encode(
    x='Date:T'
)

lockdown_text = alt.Chart(pd.DataFrame({'Date': [pd.to_datetime('2020-03-22')], 'Text': ['New York lockdown\nMarch 22']})).mark_text(
    align='left',
    baseline='middle',
    dx=5,
    dy=50,
    fontSize=15,
    color='grey'
).encode(
    x='Date:T',
    text='Text:N'
)

In [27]:
# A 0% decline corresponds to 100% of pre-pandemic ridership.
hundred_percent_hline = alt.Chart(pd.DataFrame({'Percent': [0]})).mark_rule(
    color='green', strokeWidth=2, strokeDash=[2, 2]
).encode(
    y='Percent:Q'
)

In [28]:
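# Note: this axis object is not passed to any encoding in this notebook,
# so it does not affect the rendered chart.
y_axis = alt.Axis(
    values=list(range(-100, 61, 20)),
    grid=True,
    title='Percent Decline from 2019 Ridership'
)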

In [29]:
chart = alt.layer(lines, points, labels, lockdown_vline, lockdown_text, hundred_percent_hline).properties(
    width=2000,
    height=800,
    title="Percent Decline from 2019 Ridership"
)

alt.data_transformers.disable_max_rows()


In [30]:
chart

In [24]:
chart.save('final_visualization.html')