## MTA Daily Ridership

On March 8 2021, The New York Times published an article named "[How Corona Virus Has Changed New York Cit Transit in One Chart](https://www.nytimes.com/interactive/2021/03/08/climate/nyc-transit-covid.html)". The chart looks like the following:

<div>
<img src="https://static01.nyt.com/images/2021/03/07/us/nyc-transit-covid-promo-1615150889393/nyc-transit-covid-promo-1615150889393-superJumbo.png" width="800"/>
</div>

This chart shows the percentage of decline of ridership for bridges/tunnels, subways, buses, LIRR and Metro North. It visualizes the profound disruption of the pandemic on the large public transit system in New York City. It also shows that although the daily ridership has bounced back somewhat by March 2021, it has not fully recovered to the pre-pandemic level. It is interesting to extend this chart to include more recent data to see if we have recovered from the pandemic disruption by now.

In this assignment, your task is to reproduce and extend this chart to December 2023. The following dataset is used:

* [MTA Daily Ridership Data: Beginning 2020](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew/about_data)

The chart should be a line chart just like the one in the New York Times. Here are the requirements:

* The X axis should go from March 2020 all the way to Dec 31, 2023.
* The Y axis should show the percentage decline from the pre-pandemic ridership level.
* There should be 5 curves corresponding to bridges/tunnels, subway, buses, LIRR and Metro North just like in the original NYT chart.
* The data to encode in the Y-axis should be the 3-day moving average of the daily ridership data. I.e. The data used for Jan 3, 2023 should be the average of the data of Jan 1, Jan 2 and Jan 3 in 2023.
* Each curve should be labeled at the end of the curve (i.e. with a dot and text at the location of the last data point).
* A vertical line showing the date of New York lockdown on March 22, 2020.
* A horizontal line showing the 100% level.

Please submit the complete notebook and the resulting visualization in .png, .svg or .html format.

In [1]:
import altair as alt
import pandas as pd

url = "https://github.com/qnzhou/practical_data_visualization_in_python/files/14484180/MTA_Daily_Ridership_Data__Beginning_2020_20240304.csv"
data = pd.read_csv(url)

In [2]:

pip install "vegafusion[embed]>=1.5.0"

Note: you may need to restart the kernel to use updated packages.


In [3]:
alt.data_transformers.enable('vegafusion')


DataTransformerRegistry.enable('vegafusion')

In [4]:
# Your code here...

In [5]:
data.head(20)

Unnamed: 0,Date,Subways: Total Estimated Ridership,Subways: % of Comparable Pre-Pandemic Day,Buses: Total Estimated Ridership,Buses: % of Comparable Pre-Pandemic Day,LIRR: Total Estimated Ridership,LIRR: % of Comparable Pre-Pandemic Day,Metro-North: Total Estimated Ridership,Metro-North: % of Comparable Pre-Pandemic Day,Access-A-Ride: Total Scheduled Trips,Access-A-Ride: % of Comparable Pre-Pandemic Day,Bridges and Tunnels: Total Traffic,Bridges and Tunnels: % of Comparable Pre-Pandemic Day,Staten Island Railway: Total Estimated Ridership,Staten Island Railway: % of Comparable Pre-Pandemic Day
0,03/01/2020,2212965,0.97,984908,0.99,,,55826,0.59,19922,1.13,786961,0.98,1636.0,0.52
1,03/02/2020,5329915,0.96,2209066,0.99,321569.0,1.03,180702,0.66,30338,1.02,874620,0.95,17140.0,1.07
2,03/03/2020,5481103,0.98,2228608,0.99,319727.0,1.02,190648,0.69,32767,1.1,882175,0.96,17453.0,1.09
3,03/04/2020,5498809,0.99,2177165,0.97,311662.0,0.99,192689,0.7,34297,1.15,905558,0.98,17136.0,1.07
4,03/05/2020,5496453,0.99,2244515,1.0,307597.0,0.98,194387,0.7,33209,1.12,929298,1.01,17203.0,1.08
5,03/06/2020,5189447,0.93,2066743,0.92,289171.0,0.92,205056,0.74,30970,1.04,945408,1.03,15285.0,0.96
6,03/07/2020,2814637,0.92,1249085,0.94,106058.0,0.98,75839,0.56,18117,1.07,827908,0.95,2445.0,0.48
7,03/08/2020,2120656,0.93,957163,0.96,81565.0,0.94,60800,0.64,19477,1.11,765084,0.95,1672.0,0.53
8,03/09/2020,4973513,0.89,2124770,0.95,277001.0,0.88,183953,0.67,29609,1.0,860073,0.93,16122.0,1.01
9,03/10/2020,4867818,0.87,2111989,0.94,259324.0,0.83,179050,0.65,31315,1.05,855585,0.93,15805.0,0.99


In [6]:
# bridges_tunnels.head(20)

In [7]:
data.tail()

Unnamed: 0,Date,Subways: Total Estimated Ridership,Subways: % of Comparable Pre-Pandemic Day,Buses: Total Estimated Ridership,Buses: % of Comparable Pre-Pandemic Day,LIRR: Total Estimated Ridership,LIRR: % of Comparable Pre-Pandemic Day,Metro-North: Total Estimated Ridership,Metro-North: % of Comparable Pre-Pandemic Day,Access-A-Ride: Total Scheduled Trips,Access-A-Ride: % of Comparable Pre-Pandemic Day,Bridges and Tunnels: Total Traffic,Bridges and Tunnels: % of Comparable Pre-Pandemic Day,Staten Island Railway: Total Estimated Ridership,Staten Island Railway: % of Comparable Pre-Pandemic Day
1456,02/25/2024,1742005,0.79,608429,0.62,88104.0,1.13,78248,0.86,19822,1.18,822624,1.09,1799.0,0.64
1457,02/26/2024,3440336,0.63,1271113,0.59,217914.0,0.72,187543,0.7,30867,1.05,873604,0.99,6969.0,0.43
1458,02/27/2024,3803092,0.7,1323564,0.62,233207.0,0.77,208043,0.77,33571,1.14,893537,1.01,7658.0,0.47
1459,02/28/2024,3784529,0.7,1265477,0.59,228110.0,0.75,199672,0.74,33664,1.14,889784,1.01,7427.0,0.46
1460,02/29/2024,3900237,0.72,1182262,0.55,232010.0,0.77,204829,0.76,33637,1.14,957616,1.08,7291.0,0.45


In [8]:
data.columns

Index(['Date', 'Subways: Total Estimated Ridership',
       'Subways: % of Comparable Pre-Pandemic Day',
       'Buses: Total Estimated Ridership',
       'Buses: % of Comparable Pre-Pandemic Day',
       'LIRR: Total Estimated Ridership',
       'LIRR: % of Comparable Pre-Pandemic Day',
       'Metro-North: Total Estimated Ridership',
       'Metro-North: % of Comparable Pre-Pandemic Day',
       'Access-A-Ride: Total Scheduled Trips',
       'Access-A-Ride: % of Comparable Pre-Pandemic Day',
       'Bridges and Tunnels: Total Traffic',
       'Bridges and Tunnels: % of Comparable Pre-Pandemic Day',
       'Staten Island Railway: Total Estimated Ridership',
       'Staten Island Railway: % of Comparable Pre-Pandemic Day'],
      dtype='object')

In [9]:
data["Date"] = pd.to_datetime(data["Date"])

In [10]:
data.dtypes

Date                                                       datetime64[ns]
Subways: Total Estimated Ridership                                  int64
Subways: % of Comparable Pre-Pandemic Day                         float64
Buses: Total Estimated Ridership                                    int64
Buses: % of Comparable Pre-Pandemic Day                           float64
LIRR: Total Estimated Ridership                                   float64
LIRR: % of Comparable Pre-Pandemic Day                            float64
Metro-North: Total Estimated Ridership                              int64
Metro-North: % of Comparable Pre-Pandemic Day                     float64
Access-A-Ride: Total Scheduled Trips                                int64
Access-A-Ride: % of Comparable Pre-Pandemic Day                   float64
Bridges and Tunnels: Total Traffic                                  int64
Bridges and Tunnels: % of Comparable Pre-Pandemic Day             float64
Staten Island Railway: Total Estimated

In [11]:
data.shape

(1461, 15)

In [12]:
# data.isna().sum()
# data.dropna(inplace = True)

In [13]:

# Calculate the 3-day moving average for each mode of transport
data['Subways_3d_avg'] = data['Subways: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean()
data['Buses_3d_avg'] = data['Buses: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean()
data['LIRR_3d_avg'] = data['LIRR: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean()
data['MetroNorth_3d_avg'] = data['Metro-North: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean()
data['BridgesTunnels_3d_avg'] = data['Bridges and Tunnels: % of Comparable Pre-Pandemic Day'].rolling(window=3).mean()



In [14]:
data.head()

Unnamed: 0,Date,Subways: Total Estimated Ridership,Subways: % of Comparable Pre-Pandemic Day,Buses: Total Estimated Ridership,Buses: % of Comparable Pre-Pandemic Day,LIRR: Total Estimated Ridership,LIRR: % of Comparable Pre-Pandemic Day,Metro-North: Total Estimated Ridership,Metro-North: % of Comparable Pre-Pandemic Day,Access-A-Ride: Total Scheduled Trips,Access-A-Ride: % of Comparable Pre-Pandemic Day,Bridges and Tunnels: Total Traffic,Bridges and Tunnels: % of Comparable Pre-Pandemic Day,Staten Island Railway: Total Estimated Ridership,Staten Island Railway: % of Comparable Pre-Pandemic Day,Subways_3d_avg,Buses_3d_avg,LIRR_3d_avg,MetroNorth_3d_avg,BridgesTunnels_3d_avg
0,2020-03-01,2212965,0.97,984908,0.99,,,55826,0.59,19922,1.13,786961,0.98,1636.0,0.52,,,,,
1,2020-03-02,5329915,0.96,2209066,0.99,321569.0,1.03,180702,0.66,30338,1.02,874620,0.95,17140.0,1.07,,,,,
2,2020-03-03,5481103,0.98,2228608,0.99,319727.0,1.02,190648,0.69,32767,1.1,882175,0.96,17453.0,1.09,0.97,0.99,,0.646667,0.963333
3,2020-03-04,5498809,0.99,2177165,0.97,311662.0,0.99,192689,0.7,34297,1.15,905558,0.98,17136.0,1.07,0.976667,0.983333,1.013333,0.683333,0.963333
4,2020-03-05,5496453,0.99,2244515,1.0,307597.0,0.98,194387,0.7,33209,1.12,929298,1.01,17203.0,1.08,0.986667,0.986667,0.996667,0.696667,0.983333


In [15]:
# # Selecting the columns for the melt
# columns_to_melt = [
#     'Subways: % of Comparable Pre-Pandemic Day',
#     'Buses: % of Comparable Pre-Pandemic Day',
#     'LIRR: % of Comparable Pre-Pandemic Day',
#     'Metro-North: % of Comparable Pre-Pandemic Day',
#     'Bridges and Tunnels: % of Comparable Pre-Pandemic Day'
# ]

# # Melting the data to long format
# melted_data = data.melt(id_vars=['Date'], 
#                         value_vars=columns_to_melt, 
#                         var_name='Transit_System', 
#                         value_name='Percentage')

# # Renaming the transit systems to cleaner names
# melted_data['Transit_System'] = melted_data['Transit_System'].replace({
#     'Subways: % of Comparable Pre-Pandemic Day': 'Subway',
#     'Buses: % of Comparable Pre-Pandemic Day': 'Bus',
#     'LIRR: % of Comparable Pre-Pandemic Day': 'LIRR',
#     'Metro-North: % of Comparable Pre-Pandemic Day': 'Metro-North',
#     'Bridges and Tunnels: % of Comparable Pre-Pandemic Day': 'Bridges and Tunnels'
# })





In [16]:


# melted_data

In [17]:

# chart = alt.Chart(melted_data).mark_line().encode(
#     x=alt.X('Date:T', title='Date'),
#     y=alt.Y('Percentage:Q', title='% of Pre-Pandemic Ridership'),
#     color=alt.Color('Transit_System:N', title='Transit System'),
#     tooltip=['Date:T', 'Transit_System:N', 'Percentage:Q']
# ).properties(
#     title='MTA Daily Ridership (March 2020 - December 2023)',
#     width=800,
#     height=400
# )

In [18]:
subway = pd.DataFrame({
    'Date': data['Date'],
    'Transit_System': 'Subway',
    'Percentage': data['Subways_3d_avg']
})

bus = pd.DataFrame({
    'Date': data['Date'],
    'Transit_System': 'Bus',
    'Percentage': data['Buses_3d_avg']
})

lirr = pd.DataFrame({
    'Date': data['Date'],
    'Transit_System': 'LIRR',
    'Percentage': data['LIRR_3d_avg']
})

metro_north = pd.DataFrame({
    'Date': data['Date'],
    'Transit_System': 'Metro-North',
    'Percentage': data['MetroNorth_3d_avg']
})

bridges_tunnels = pd.DataFrame({
    'Date': data['Date'],
    'Transit_System': 'Bridges and Tunnels',
    'Percentage': data['BridgesTunnels_3d_avg']
})


In [19]:
# Concatenating the data frames
all_transit_data = pd.concat([subway, bus, lirr, metro_north, bridges_tunnels], ignore_index=True)
all_transit_data

Unnamed: 0,Date,Transit_System,Percentage
0,2020-03-01,Subway,
1,2020-03-02,Subway,
2,2020-03-03,Subway,0.970000
3,2020-03-04,Subway,0.976667
4,2020-03-05,Subway,0.986667
...,...,...,...
7300,2024-02-25,Bridges and Tunnels,1.056667
7301,2024-02-26,Bridges and Tunnels,1.040000
7302,2024-02-27,Bridges and Tunnels,1.030000
7303,2024-02-28,Bridges and Tunnels,1.003333


In [20]:
# line_chart = alt.Chart(all_transit_data).mark_line().encode(
#     x=alt.X('Date:T', title='Date'),
#     y=alt.Y('Percentage:Q', title='% of Pre-Pandemic Ridership'),
#     color=alt.Color('Transit_System:N', title='Transit System'),
#     tooltip=['Date:T', 'Transit_System:N', 'Percentage:Q']
# ).properties(
#     width=800,
#     height=400,
#     title="MTA Daily Ridership Recovery (March 2020 - December 2023)"
# )

In [21]:
# Adjust the line chart with y-axis scale starting from -100 to 100
line_chart = alt.Chart(all_transit_data).mark_line().encode(
    x=alt.X('Date:T', title='Date'),
    y=alt.Y('Percentage:Q',
            title='% of Pre-Pandemic Ridership',
            scale=alt.Scale(domain=[-100, 100]),  # Setting range from -100% to 100%
            axis=alt.Axis(format='.0f')  # Ensure integer labels without decimals
           ),
    color=alt.Color('Transit_System:N', title='Transit System'),
    tooltip=[
        'Date:T',
        'Transit_System:N',
        alt.Tooltip('Percentage:Q', title='% of Pre-Pandemic', format='.0f')
    ]
).transform_calculate(
    Percentage="(datum.Percentage - 1) * 100"  # Convert decimals to percentages
).properties(
    width=800,
    height=400,
    title="MTA Daily Ridership Recovery (March 2020 - December 2023)"
)

In [22]:
line_chart

In [23]:
# Adding the lockdown vertical line (March 22, 2020)
lockdown_line = alt.Chart(pd.DataFrame({'Date': ['2020-03-22']})).mark_rule(strokeDash=[5, 5],strokeWidth=1.69, color='black').encode(
    x='Date:T',
)
lockdown_line

In [24]:
horizontal_line = alt.Chart(pd.DataFrame({'y': [0]})).mark_rule(strokeWidth=1, color='black').encode(
    y='y'
)
horizontal_line

In [25]:
# # labels = all_transit_data.groupby('Transit_System').agg({'Date': 'max'}).reset_index()
# # labels = pd.merge(labels, all_transit_data, on=['Transit_System', 'Date'])

# # text_labels = alt.Chart(labels).mark_text(align='left', dx=5).encode(
# #     x='Date:T',
# #     y='Percentage:Q',
# #     text='Transit_System:N',
# #     color='Transit_System:N'
# # )
# # text_labels

# # ----- 
# last_points = all_transit_data.groupby('Transit_System').tail(1)

# # Create text labels at the last data point of each transit system
# # text = alt.Chart(last_points).mark_text(align='left', dx=5).encode(
# #     x=alt.X('Date:T', title='Date'),  # Same x-axis encoding as the line chart
# #     y=alt.Y('Percentage:Q', title='% of Pre-Pandemic Ridership'),  # Same y-axis encoding as the line chart
# #     text='Transit_System:N',  # Label with the transit system name
# #     color='Transit_System:N'  # Ensure label color matches the line color
# # )

# # ----- 


# # Calculate maximum date for each transit system
# # labels = all_transit_data.groupby('Transit_System').agg({'Date': 'max'}).reset_index()
# # labels = pd.merge(labels, all_transit_data, on=['Transit_System', 'Date'])

# # Create text labels with consistent scale
# text_labels = alt.Chart(last_points).mark_text(align='left', dx=5).encode(
#     x='Date:T',
    
#     y=alt.Y('Percentage:Q', scale=alt.Scale(domain=[-100, 100]),
#             axis=alt.Axis(format='.0f')  # Ensure integer labels without decimals

           
#            ),  # Ensure consistent y-scale
#     text='Transit_System:N',
#     color='Transit_System:N'
# )


# text_labels



In [26]:
# Get the last data point for each transit system
last_points = all_transit_data.groupby('Transit_System').tail(1)

# Create text labels at the last data point of each transit system
text_labels = alt.Chart(last_points).mark_text(align='left', dx=5).encode(
    x=alt.X('Date:T', title='Date'),  # Same x-axis as the line chart

    
    # Transform the Y axis to match the scale from -100 to 100 by converting decimals to percentages
    y=alt.Y('Percentage:Q', 
            scale=alt.Scale(domain=[-100, 100]),  # Consistent scale
            axis=alt.Axis(format='.0f'),  # Ensure integer labels without decimals
   
            title='% of Pre-Pandemic Ridership'),
    text=alt.Text('Transit_System:N'),  # Label with the transit system name
    color='Transit_System:N'  # Color to match the lines
    
).transform_calculate(
    # Convert the decimals to percentage format for correct placement on the y-axis
    Percentage="(datum.Percentage * 100) - 100"
)


text_labels

In [27]:
last_points

Unnamed: 0,Date,Transit_System,Percentage
1460,2024-02-29,Subway,0.706667
2921,2024-02-29,Bus,0.586667
4382,2024-02-29,LIRR,0.763333
5843,2024-02-29,Metro-North,0.756667
7304,2024-02-29,Bridges and Tunnels,1.033333


In [28]:
dots = alt.Chart(last_points).mark_point(size=69, fillOpacity=1, filled=True).encode(
    x=alt.X('Date:T', title='Date'),  # Same x-axis as the line chart
    # Transform the Y axis to match the scale from -100 to 100 by converting decimals to percentages

    
    y=alt.Y('Percentage:Q', 
            scale=alt.Scale(domain=[-100, 100]),  # Consistent scale
            axis=alt.Axis(format='.0f'),  # Ensure integer labels without decimals
            title='% of Pre-Pandemic Ridership'),
    color='Transit_System:N',  # Color to match the lines
    tooltip=[
        'Date:T',
        'Transit_System:N',
        alt.Tooltip('Percentage:Q', title='% of Pre-Pandemic', format='.0f')
    ]
).transform_calculate(
    # Convert the decimals to percentage format for correct placement on the y-axis
    Percentage="(datum.Percentage * 100) - 100"
)



dots

In [29]:
title_text1 = alt.Chart(pd.DataFrame({'x': [0.5], 'y': [0.5]})).mark_text(
    text='New York Lockdown',
    align='left',
    baseline='middle',
    fontSize=15,
    fontWeight='bold',
    color='black'
).encode(
    x=alt.value(18),  # Adjust x position inside the chart
    y=alt.value(50)   # Adjust y position inside the chart
)
title_text1

In [30]:
title_text2= alt.Chart(pd.DataFrame({'x': [0.5], 'y': [0.5]})).mark_text(
    text='March 22, 2020',
    align='left',
    baseline='middle',
    fontSize=11,
    fontWeight=900,
    color='black'
).encode(
    x=alt.value(18),  # Adjust x position inside the chart
    y=alt.value(70)   # Adjust y position inside the chart
)
title_text2

In [31]:
title_text3= alt.Chart(pd.DataFrame({'x': [0.5], 'y': [0.5]})).mark_text(
    text='Rolling three-day average',
    align='center',
    baseline='middle',
    fontSize=15,
    fontWeight=500,
    color='black'
).encode(
    x=alt.value(680),  # Adjust x position inside the chart
    y=alt.value(350)   # Adjust y position inside the chart
)
title_text3

In [32]:
final = line_chart + dots+ lockdown_line + horizontal_line + text_labels + title_text1 + title_text2 + title_text3
final

In [33]:
print("GG")

GG
