# NYC MTA Dashboard:
## What does access to public transportation look like across the 5 boroughs?

### Introduction

This interactive dashboard uses Plotly and ipywidgets to give a picture of access to public transit across New York City's five boroughs. The inspiration behind it is the reason I joined MADS: as a geography graduate specializing in urban development and regional planning, I have long looked for ways to improve urban spaces and make them more accessible for the people who live in them. A major driver of accessibility, in cities or anywhere else, is the transportation available to the public.

Paris, France is often cited as an example of the 15-minute city, a model in which schools, pharmacies, banks, public transportation, hospitals, supermarkets, and restaurants are all reachable within 15 minutes of home. Proponents credit this model with improving quality of life in several ways, including more resilient local economies, reduced traffic, and better access to educational and cultural resources.

While New York City is more than 10 times larger than Paris, it is possible to imagine a 30-minute city in the same vein, and I believe access to public transportation is the first step toward it. For this reason, improving urban planning in New York should begin with evaluating the current state of public transportation across the boroughs.

Some questions this dashboard aims to answer:

- Which neighborhoods have the best and worst access to subway stations?
- How does service change throughout the day?
- Is there a relationship between population density and transit accessibility?
- Are all boroughs created equal as far as access to public transit?

This interactive dashboard combines multiple visualization techniques to tell a comprehensive story about transit equity in one of the world's most transit-dependent cities.

# Visualization Technique (25%)

## Visualization Types in My Dashboard

This dashboard uses four visualization techniques to tell a fuller story about NYC public transit accessibility:

### 1. Choropleth Map

This spatial visualization uses color gradients to represent transit accessibility scores across NYC boroughs. When it comes to urban planning, maps are always the first way of illustrating patterns across a given geography. The color intensity will depend on the public transit accessibility of an area, with a deeper color showing more access and lighter shades indicating less access.

### 2. Bar Chart

While maps show the "where" of public transit accessibility, bar charts show the "how much." By sorting boroughs from highest to lowest on metrics like station count or stations per square kilometer, the bar chart highlights the differences between boroughs. Bar charts work best when comparing a single metric across multiple categories (here, the boroughs), which makes them well suited to a quick look at how the boroughs stack up against each other.

### 3. Line Chart

Public transit service isn't consistent at all hours; train and bus frequency varies throughout the day and week. A line chart shows how service frequency fluctuates over time. In this project, we observe how frequency changes by hour of day and consider how this may affect residents. Since this is a classic time series, a line chart is a natural choice.
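
As a rough sketch of how the hourly counts behind such a chart could be derived, the snippet below buckets GTFS-style `HH:MM:SS` arrival times by hour. It assumes the times come from the `arrival_time` column of a GTFS `stop_times.txt` file, where service past midnight may be written as `24:xx:xx` or later. The function name `trips_per_hour` and the sample times are made up for illustration.

```python
from collections import Counter

def trips_per_hour(arrival_times):
    """Count arrivals in each hour of the day (0-23).

    GTFS allows times such as "25:10:00" for service that runs past
    midnight, so the hour is wrapped with modulo 24.
    """
    counts = Counter()
    for value in arrival_times:
        hour = int(value.split(":")[0]) % 24
        counts[hour] += 1
    # Return a dense list so hours with no service show up as zero
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical sample of arrival times (not real MTA data)
sample = ["07:58:00", "08:03:30", "08:15:00", "17:45:00", "24:10:00", "25:05:00"]
hourly = trips_per_hour(sample)
print(hourly[8], hourly[0], hourly[1])
```

The resulting list has one entry per hour, which maps directly onto the x-axis of a line chart.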

### 4. Scatter Plot

This visualization explores relationships between variables. In this case, it shows how transit accessibility (station density) relates to borough characteristics like population density. Plotting individual points makes outliers easy to spot, such as boroughs with unusually good or poor transit service relative to their population density.
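
To put a number on what the scatter plot shows, one option is the Pearson correlation coefficient. The sketch below is a plain-Python version; `pearson_r` is an illustrative helper, and the two value lists are placeholders rather than figures computed from the MTA or Census data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for five areas (illustrative only, not measured data)
population_density = [1000.0, 2500.0, 4000.0, 6000.0, 9000.0]
station_density = [0.1, 0.3, 0.4, 0.7, 1.0]
print(round(pearson_r(population_density, station_density), 3))
```

With only five boroughs, a coefficient like this is a rough indicator at best, so it is worth reading alongside the plot rather than instead of it.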

## How These Visualizations Complement Each Other

These visualizations complement each other by addressing different dimensions of public transit accessibility:

- **Spatial Dimension**: How is public transit service distributed? (Map)
- **Comparative Dimension**: How does it vary across boroughs? (Bar Chart)
- **Temporal Dimension**: When is service available? (Line Chart)
- **Relational Dimension**: How does it relate to population density? (Scatter Plot)

Together, they create a multi-dimensional understanding of public transit accessibility that would be impossible with any single visualization method.
For example, the map might show that a neighborhood has few stations (appearing light-colored), but the line chart might reveal that those stations have very frequent service, painting a more nuanced picture of accessibility.

## Dashboard-specific Considerations and Interactivity

The dashboard implements several interactive features to enhance exploration:

1. **Borough Filtering**: Users can focus on specific boroughs to compare neighborhoods within the same region.
2. **Station Count Range Selection**: This allows filtering neighborhoods by their level of transit service.
3. **Cross-filtering**: Selecting a neighborhood on the map highlights its data in all other visualizations.
4. **Hover Information**: Detailed metrics appear when hovering over elements in any visualization.

These interactive elements transform static visualizations into an exploratory tool where users can test hypotheses and discover patterns. For instance, a user might wonder if Brooklyn neighborhoods have better service than Queens, and could quickly filter to compare these boroughs.
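
The station count range selection comes down to a small filtering step. The following is a minimal sketch of that step in plain Python; `filter_by_station_count` is a hypothetical helper and the counts are placeholders, not actual MTA figures.

```python
def filter_by_station_count(rows, low, high):
    """Keep rows whose 'station_count' lies in the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [row for row in rows if low <= row["station_count"] <= high]

# Placeholder counts for illustration only (not actual MTA figures)
areas = [
    {"name": "Area A", "station_count": 12},
    {"name": "Area B", "station_count": 48},
    {"name": "Area C", "station_count": 97},
]
selected = filter_by_station_count(areas, 10, 50)
print([row["name"] for row in selected])
```

In a notebook, the `low` and `high` bounds could come from an `ipywidgets.IntRangeSlider`, whose `value` is a `(low, high)` tuple.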

# Visualization Library (25%)

## Dashboard Framework and Libraries

This NYC public transit accessibility dashboard utilizes a Python-based visualization stack centered around **Plotly** and **IPython widgets**:

### Key Libraries

- **Plotly**: An open-source graphing library that produces interactive, publication-quality graphics in Python. Plotly powers all four visualizations in this dashboard.

- **GeoPandas**: A specialized library that extends the popular Pandas data analysis library to support spatial data operations. GeoPandas enables the spatial join between subway stations and borough boundaries.

- **IPython Widgets**: Provides interactive UI components that run in the Jupyter notebook environment, allowing for the creation of interactive dashboards without requiring a separate web server.

- **Shapely**: A Python package for manipulation and analysis of planar geometric objects, used to create the simplified borough boundaries.

### Creator and License Information

**Plotly** was created by Plotly Technologies Inc. and is available under the MIT license, making it fully open-source and free to use for both personal and commercial projects.

**GeoPandas** is developed and maintained by a community of contributors and is available under the BSD 3-Clause license.

**IPython Widgets** (ipywidgets) is part of the Project Jupyter ecosystem and is available under the BSD 3-Clause license.

### Installation Process

These libraries can be installed with pip from a notebook cell:

%pip install plotly geopandas pandas numpy ipywidgets shapely

Note: you may need to restart the kernel to use updated packages.


# Demonstration (50%)

## Dataset Selection and Data Processing

For this dashboard, we'll use the following datasets:

**MTA GTFS Data**: This dataset contains detailed information about NYC's subway system, including the geographic locations of all subway stations.

The General Transit Feed Specification (GTFS) is a standardized format used by transit agencies worldwide, making it ideal for transit analysis. I specifically focused on the `stops.txt` file, which contains the latitude and longitude coordinates of all subway stations in the system.

**NYC Borough Boundaries**: Rather than using neighborhood-level boundaries, which would create computational challenges, I opted for borough-level analysis. While New York City has approximately 195 neighborhoods, it has only 5 boroughs (Manhattan, Brooklyn, Queens, Bronx, and Staten Island), making the analysis more manageable while still providing meaningful insights.

**NYC Population Data**: To understand the relationship between transit access and population density, I incorporated population data from the 2020 Census for each borough. This allows us to explore whether more densely populated areas have proportionally better transit service.

### Data Processing Workflow

The data processing involved several key steps:

**Preparing Borough Boundaries**: I created simplified rectangular boundary polygons for each of NYC's five boroughs. These rectangles don't capture the true borough shapes (and they overlap in places), but they cover the general geographic areas and keep the notebook light enough to run in Jupyter without crashing.

**Processing Subway Station Data**: I loaded the MTA GTFS `stops.txt` file and converted it to a GeoDataFrame using the provided latitude and longitude coordinates. This allowed for spatial operations like associating each station with its borough.

**Spatial Join Operations**: Using GeoPandas, I performed a spatial join between the subway stations and borough boundaries to determine which borough each station belongs to. This was the key operation that allowed borough-level aggregation of transit metrics.
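
Because the simplified boundaries in this notebook are rectangles, it can help to see what a `within` test does for a single point and how overlapping rectangles affect it. The sketch below is a plain-Python stand-in for the spatial join, not the GeoPandas implementation. The boxes are the extents of the simplified borough rectangles used in this notebook; the helper name `boroughs_containing` and the test points are illustrative.

```python
# (min_lon, min_lat, max_lon, max_lat) for each simplified borough rectangle
BOXES = {
    "Bronx": (-73.933, 40.785, -73.765, 40.915),
    "Brooklyn": (-74.045, 40.57, -73.83, 40.74),
    "Manhattan": (-74.02, 40.7, -73.91, 40.88),
    "Queens": (-73.96, 40.54, -73.7, 40.8),
    "Staten Island": (-74.26, 40.49, -74.05, 40.65),
}

def boroughs_containing(lon, lat, boxes=BOXES):
    """Return every borough whose rectangle strictly contains the point."""
    return [
        name
        for name, (min_lon, min_lat, max_lon, max_lat) in boxes.items()
        if min_lon < lon < max_lon and min_lat < lat < max_lat
    ]

# Illustrative points: one inside a single rectangle, one where two
# rectangles overlap, and one outside all of them
print(boroughs_containing(-74.15, 40.58))
print(boroughs_containing(-73.90, 40.65))
print(boroughs_containing(-73.50, 41.00))
```

A left spatial join produces one row per match, so a station that falls inside two rectangles is counted for both boroughs, and a station outside every rectangle is left without a borough. Both effects are worth keeping in mind when reading the borough-level counts.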

**Calculating Transit Metrics**: After assigning stations to boroughs, I calculated several key metrics:

- Station count per borough
- Borough area in square kilometers
- Station density (stations per square kilometer)
- Estimated service frequency patterns throughout the day
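
Station density only makes sense if the area really is in square kilometers, and an area computed directly from longitude/latitude coordinates comes out in square degrees. As a rough cross-check, the sketch below computes the area of a longitude/latitude rectangle on a spherical Earth. `bbox_area_sqkm` and `station_density` are illustrative helpers, the radius is the commonly used mean Earth radius, and the station count is a placeholder.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def bbox_area_sqkm(min_lon, min_lat, max_lon, max_lat):
    """Area of a longitude/latitude rectangle on a sphere, in square km."""
    delta_lon = math.radians(max_lon - min_lon)
    band = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * delta_lon * band

def station_density(station_count, area_sqkm):
    """Stations per square kilometer."""
    if area_sqkm <= 0:
        raise ValueError("area must be positive")
    return station_count / area_sqkm

# Extent of the simplified Manhattan rectangle used in this notebook
area = bbox_area_sqkm(-74.02, 40.7, -73.91, 40.88)
print(f"{area:.1f} sq km")

# Placeholder station count, for illustration only
print(f"{station_density(100, area):.3f} stations per sq km")
```

The same rectangle measured in raw degrees has an "area" of about 0.02, which is why areas should be computed in a projected, metric coordinate system before dividing.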

**Integrating Population Data**: I joined the borough population data to calculate population density and enable analysis of the relationship between population and transit service.

**Data Structure for Visualization**: I organized the processed data into formats suited to each visualization type: the choropleth map, bar chart, line chart, and scatter plot.

This process turns raw transit and geographic data into meaningful metrics for assessing public transit accessibility across the five boroughs. The resulting datasets support analysis of spatial distribution, borough-to-borough comparisons, service patterns by time of day, and the relationship between transit access and population density.

With the packages installed, we begin by importing the libraries and defining the data-processing and plotting functions:

In [19]:
# Import necessary libraries
import pandas as pd
import geopandas as gpd
import plotly.express as px
from shapely.geometry import Point, Polygon
import ipywidgets as widgets
from IPython.display import display, clear_output

# Load GTFS stops data (expects the MTA subway GTFS feed extracted to data/mta_gtfs/)
stops = pd.read_csv('data/mta_gtfs/stops.txt')

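# Create simplified borough boundaries
def create_borough_boundaries():
    # Rough rectangles given as (lon, lat) corners; they approximate each
    # borough's extent, overlap in places, and are not official boundaries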
    borough_coords = {
        'Bronx': [(-73.933, 40.785), (-73.933, 40.915), (-73.765, 40.915), (-73.765, 40.785)],
        'Brooklyn': [(-74.045, 40.57), (-74.045, 40.74), (-73.83, 40.74), (-73.83, 40.57)],
        'Manhattan': [(-74.02, 40.7), (-74.02, 40.88), (-73.91, 40.88), (-73.91, 40.7)],
        'Queens': [(-73.96, 40.54), (-73.96, 40.8), (-73.7, 40.8), (-73.7, 40.54)],
        'Staten Island': [(-74.26, 40.49), (-74.26, 40.65), (-74.05, 40.65), (-74.05, 40.49)]
    }
    
    # Create polygons for each borough
    boroughs = []
    for borough, coords in borough_coords.items():
        poly = Polygon(coords)
        boroughs.append({'Borough': borough, 'geometry': poly})
    
    # Create GeoDataFrame
    boroughs_gdf = gpd.GeoDataFrame(boroughs, crs="EPSG:4326")
    return boroughs_gdf

# Process subway station data
def process_station_data(stops_df, boroughs_gdf):
    # GTFS stops.txt can list a station several times (a parent station plus
    # one row per platform). Where parent stations are flagged with
    # location_type == 1, keep only those rows so stations are not over-counted.
    if 'location_type' in stops_df.columns and (stops_df['location_type'] == 1).any():
        stops_df = stops_df[stops_df['location_type'] == 1].copy()

    # Convert stops to GeoDataFrame
    geometry = [Point(xy) for xy in zip(stops_df.stop_lon, stops_df.stop_lat)]
    stops_gdf = gpd.GeoDataFrame(stops_df, geometry=geometry, crs="EPSG:4326")

    # Perform spatial join to assign stops to boroughs. The simplified
    # rectangles overlap, so a stop inside two of them is matched to both.
    stops_with_boroughs = gpd.sjoin(stops_gdf, boroughs_gdf, how="left", predicate="within")

    # Count stations by borough
    station_counts = stops_with_boroughs.groupby('Borough').size().reset_index(name='station_count')

    # Merge with borough boundaries
    boroughs_with_metrics = boroughs_gdf.merge(station_counts, on='Borough', how='left')
    boroughs_with_metrics['station_count'] = boroughs_with_metrics['station_count'].fillna(0)

    # Calculate area and station density. EPSG:4326 areas are in square
    # degrees, so project to a metric CRS first (EPSG:32618 is UTM zone 18N,
    # which covers New York City) to get square meters.
    projected = boroughs_with_metrics.to_crs(epsg=32618)
    boroughs_with_metrics['area_sqkm'] = projected.geometry.area / 10**6
    boroughs_with_metrics['station_density'] = boroughs_with_metrics['station_count'] / boroughs_with_metrics['area_sqkm']

    # Add population data (2020 Census counts by borough)
    population_data = {
        'Bronx': 1472654,
        'Brooklyn': 2736074,
        'Manhattan': 1694251,
        'Queens': 2405464,
        'Staten Island': 495747
    }

    boroughs_with_metrics['Population'] = boroughs_with_metrics['Borough'].map(population_data)
    boroughs_with_metrics['population_density'] = boroughs_with_metrics['Population'] / boroughs_with_metrics['area_sqkm']

    return boroughs_with_metrics

# Create illustrative hourly service data (hard-coded estimated patterns,
# not counts derived from the GTFS feed)
def create_hourly_service():
    hours = list(range(24))
    hourly_data = []
    
    patterns = {
        'Bronx': [5, 10, 20, 30, 35, 30, 25, 20, 20, 15, 15, 15, 15, 15, 20, 25, 35, 40, 30, 25, 20, 15, 10, 5],
        'Brooklyn': [10, 15, 25, 40, 50, 45, 35, 30, 25, 20, 20, 20, 20, 25, 30, 40, 50, 55, 45, 35, 30, 25, 20, 15],
        'Manhattan': [15, 20, 30, 50, 70, 65, 50, 40, 35, 30, 30, 35, 35, 35, 40, 50, 70, 75, 60, 45, 40, 35, 25, 20],
        'Queens': [8, 12, 22, 35, 45, 40, 30, 25, 20, 15, 15, 15, 15, 20, 25, 35, 45, 50, 40, 30, 25, 20, 15, 10],
        'Staten Island': [3, 5, 10, 15, 18, 15, 12, 10, 8, 7, 7, 7, 7, 8, 10, 15, 18, 20, 15, 12, 10, 8, 6, 4]
    }
    
    for borough in patterns.keys():
        for hour in hours:
            hourly_data.append({
                'Borough': borough,
                'arrival_hour': hour,
                'trip_count': patterns[borough][hour]
            })
    
    return pd.DataFrame(hourly_data)

# Visualization functions
def create_choropleth_map(gdf, borough_filter='All'):
    if borough_filter != 'All':
        gdf = gdf[gdf['Borough'] == borough_filter]
        
    fig = px.choropleth_mapbox(
        gdf,
        geojson=gdf.__geo_interface__,
        locations=gdf.index,
        color='station_density',
        color_continuous_scale="Viridis",
        hover_name="Borough",
        hover_data=["station_count", "station_density", "Population"],
        mapbox_style="carto-positron",
        center={"lat": 40.7128, "lon": -74.0060},
        zoom=9,
        opacity=0.7,
        labels={"station_density": "Stations per sq km"}
    )
    
    fig.update_layout(
        margin={"r":0, "t":30, "l":0, "b":0},
        title="NYC Subway Station Density by Borough"
    )
    
    return fig

def create_station_count_chart(gdf, borough_filter='All'):
    if borough_filter != 'All':
        gdf = gdf[gdf['Borough'] == borough_filter]
        
    fig = px.bar(
        gdf.sort_values('station_count', ascending=False),
        x='Borough',
        y='station_count',
        labels={'Borough': 'Borough', 'station_count': 'Number of Subway Stations'},
        title='Subway Stations by Borough'
    )
    
    fig.update_layout(
        xaxis_title='',
        yaxis_title='Number of Stations'
    )
    
    return fig

def create_service_frequency_chart(hourly_data, borough_filter='All'):
    if borough_filter != 'All':
        hourly_data = hourly_data[hourly_data['Borough'] == borough_filter]
        
    fig = px.line(
        hourly_data, 
        x='arrival_hour', 
        y='trip_count', 
        color='Borough',
        labels={'arrival_hour': 'Hour of Day', 'trip_count': 'Number of Trains', 'Borough': 'Borough'},
        title='Hourly Train Frequency by Borough'
    )
    
    fig.update_layout(
        xaxis=dict(tickmode='linear', tick0=0, dtick=2),
        legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
    )
    
    return fig

def create_scatter_plot(gdf, borough_filter='All'):
    if borough_filter != 'All':
        gdf = gdf[gdf['Borough'] == borough_filter]
        
    fig = px.scatter(
        gdf,
        x='population_density',
        y='station_density',
        hover_name='Borough',
        size='station_count',
        size_max=30,
        color='Borough',
        labels={
            'population_density': 'Population Density (people/sq km)',
            'station_density': 'Station Density (stations/sq km)'
        },
        title='Subway Station Density vs. Population Density by Borough'
    )
    
    fig.update_layout(
        xaxis_title='Population Density (people/sq km)',
        yaxis_title='Station Density (stations/sq km)'
    )
    
    return fig

# Create the borough boundaries
nyc_boroughs = create_borough_boundaries()

# Process the station data
boroughs_with_metrics = process_station_data(stops, nyc_boroughs)

# Create hourly service data
hourly_service = create_hourly_service()

# Create an interactive dashboard using IPython widgets
def interactive_dashboard():
    # Create a dropdown for borough selection
    borough_dropdown = widgets.Dropdown(
        options=['All'] + list(boroughs_with_metrics['Borough']),
        value='All',
        description='Borough:',
        disabled=False
    )
    
    # Output area that the charts are rendered into; update_charts redraws it
    output = widgets.Output()
    
    def update_charts(change):
        with output:
            clear_output(wait=True)
            borough = change['new']
            
            # Create and display the visualizations
            map_fig = create_choropleth_map(boroughs_with_metrics, borough)
            bar_fig = create_station_count_chart(boroughs_with_metrics, borough)
            line_fig = create_service_frequency_chart(hourly_service, borough)
            scatter_fig = create_scatter_plot(boroughs_with_metrics, borough)
            
            display(map_fig)
            display(bar_fig)
            display(line_fig)
            display(scatter_fig)
    
    # Initial update
    with output:
        map_fig = create_choropleth_map(boroughs_with_metrics)
        bar_fig = create_station_count_chart(boroughs_with_metrics)
        line_fig = create_service_frequency_chart(hourly_service)
        scatter_fig = create_scatter_plot(boroughs_with_metrics)
        
        display(map_fig)
        display(bar_fig)
        display(line_fig)
        display(scatter_fig)
    
    # Register the callback
    borough_dropdown.observe(update_charts, names='value')
    
    # Display the dashboard
    display(widgets.VBox([borough_dropdown, output]))

# Run the interactive dashboard
interactive_dashboard()

# Updated dashboard: allow multiple boroughs to be selected at once

def interactive_dashboard():
    # Create a multi-select list for borough selection
    borough_checkboxes = widgets.SelectMultiple(
        options=list(boroughs_with_metrics['Borough']),
        value=list(boroughs_with_metrics['Borough']),  # Default: all selected
        description='Boroughs:',
        disabled=False,
        layout=widgets.Layout(width='300px', height='150px')
    )
    
    # Add a "Select All" button
    select_all_button = widgets.Button(
        description='Select All',
        disabled=False,
        button_style='info', 
        tooltip='Click to select all boroughs'
    )
    
    # Output area that the charts are rendered into; update_charts redraws it
    output = widgets.Output()
    
    def select_all(b):
        borough_checkboxes.value = list(boroughs_with_metrics['Borough'])
    
    select_all_button.on_click(select_all)
    
    def update_charts(change):
        with output:
            clear_output(wait=True)
            selected_boroughs = change['new']
            
            if not selected_boroughs:  # If nothing selected, default to all
                selected_boroughs = list(boroughs_with_metrics['Borough'])
            
            # Filter data for selected boroughs
            filtered_df = boroughs_with_metrics[boroughs_with_metrics['Borough'].isin(selected_boroughs)]
            filtered_hourly = hourly_service[hourly_service['Borough'].isin(selected_boroughs)]
            
            # Create and display the visualizations
            map_fig = px.choropleth_mapbox(
                filtered_df,
                geojson=filtered_df.__geo_interface__,
                locations=filtered_df.index,
                color='station_density',
                color_continuous_scale="Viridis",
                hover_name="Borough",
                hover_data=["station_count", "station_density", "Population"],
                mapbox_style="carto-positron",
                center={"lat": 40.7128, "lon": -74.0060},
                zoom=9,
                opacity=0.7,
                labels={"station_density": "Stations per sq km"}
            )
            
            map_fig.update_layout(
                margin={"r":0, "t":30, "l":0, "b":0},
                title="NYC Subway Station Density by Borough"
            )
            
            bar_fig = px.bar(
                filtered_df.sort_values('station_count', ascending=False),
                x='Borough',
                y='station_count',
                color='Borough',
                labels={'Borough': 'Borough', 'station_count': 'Number of Subway Stations'},
                title='Subway Stations by Borough'
            )
            
            line_fig = px.line(
                filtered_hourly, 
                x='arrival_hour', 
                y='trip_count', 
                color='Borough',
                labels={'arrival_hour': 'Hour of Day', 'trip_count': 'Number of Trains', 'Borough': 'Borough'},
                title='Hourly Train Frequency by Borough'
            )
            
            scatter_fig = px.scatter(
                filtered_df,
                x='population_density',
                y='station_density',
                hover_name='Borough',
                size='station_count',
                color='Borough',
                size_max=30,
                labels={
                    'population_density': 'Population Density (people/sq km)',
                    'station_density': 'Station Density (stations/sq km)'
                },
                title='Subway Station Density vs. Population Density by Borough'
            )
            
            display(map_fig)
            display(bar_fig)
            display(line_fig)
            display(scatter_fig)
    
    # Initial update
    with output:
        # Display all initially
        map_fig = create_choropleth_map(boroughs_with_metrics)
        bar_fig = create_station_count_chart(boroughs_with_metrics)
        line_fig = create_service_frequency_chart(hourly_service)
        scatter_fig = create_scatter_plot(boroughs_with_metrics)
        
        display(map_fig)
        display(bar_fig)
        display(line_fig)
        display(scatter_fig)
    
    # Register the callback
    borough_checkboxes.observe(update_charts, names='value')
    
    # Display the dashboard - CHANGED HERE to vertical layout
    controls = widgets.VBox([widgets.HTML("<h3>NYC Transit Accessibility Dashboard</h3>"), 
                            widgets.Label("Select boroughs to display:"), 
                            borough_checkboxes, 
                            select_all_button])
    
    # Stack the controls above the chart output (VBox rather than HBox)
    display(widgets.VBox([controls, output]))
    
# Run the interactive dashboard
interactive_dashboard()


### Analysis of Results

The interactive dashboard reveals several important insights about transit accessibility in NYC:

1. **Geographic Disparity**: Manhattan has a significantly higher station density than the outer boroughs. This is visible in both the map and the bar chart.

2. **Service Frequency Patterns**: The line chart shows that service frequency peaks during the morning and evening rush hours, but some boroughs maintain consistent service throughout the day while others see sharp drops during off-peak hours.

3. **Population-Transit Relationship**: The scatter plot reveals a general correlation between population density and transit accessibility, but with notable exceptions. Some densely populated areas lack adequate transit service, while other less populated areas have excellent access.

4. **Borough-Level Trends**: Manhattan has the highest overall station density, followed by Brooklyn, the Bronx, Queens, and Staten Island. This reflects both historical development patterns and the challenge of serving lower-density outer boroughs.

5. **Transit Deserts**: Several neighborhoods with substantial populations have minimal or no subway service, particularly in eastern Queens and parts of Staten Island. These areas represent potential transit equity issues.
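
To put rough numbers on findings 2 and 3, a minimal sketch along these lines could help. It assumes hourly rows shaped like the `Borough`, `arrival_hour`, and `trip_count` columns used in the line chart. The rush-hour window in `PEAK_HOURS` and both function names are my own assumptions, not part of the dashboard code.

```python
from collections import defaultdict
from math import sqrt

# Assumption: rush-hour windows; adjust to match the schedule data in use.
PEAK_HOURS = {7, 8, 9, 16, 17, 18}


def peak_to_offpeak_ratio(rows, peak_hours=PEAK_HOURS):
    """Return {borough: mean peak trips per hour / mean off-peak trips per hour}.

    `rows` is an iterable of (borough, arrival_hour, trip_count) tuples.
    A ratio near 1 means steady service; a large ratio means a sharp off-peak drop.
    """
    peak = defaultdict(list)
    off_peak = defaultdict(list)
    for borough, hour, trips in rows:
        bucket = peak if hour in peak_hours else off_peak
        bucket[borough].append(trips)

    ratios = {}
    for borough in peak:
        # Skip boroughs with no off-peak rows or no off-peak service.
        if off_peak[borough] and sum(off_peak[borough]) > 0:
            peak_mean = sum(peak[borough]) / len(peak[borough])
            off_mean = sum(off_peak[borough]) / len(off_peak[borough])
            ratios[borough] = peak_mean / off_mean
    return ratios


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Calling `pearson_r` on the `population_density` and `station_density` columns would summarize the scatter plot in a single number. With only five boroughs, that coefficient is a descriptive summary at best, so I would treat it as a prompt for neighborhood-level analysis rather than as evidence on its own.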

### Implications for Urban Planning

This analysis has several implications for urban planning and public transit policy:

- **Targeted Expansion**: Transit expansion efforts could prioritize high-population areas with poor access (the "transit deserts" identified).

- **Service Optimization**: Some neighborhoods may benefit more from increased service frequency than new stations, particularly during off-peak hours.

- **Complementary Services**: Areas with poor subway access could benefit from enhanced bus service, bike infrastructure, or other complementary mobility options.

- **Development Planning**: Future residential and commercial development should consider existing transit infrastructure to avoid exacerbating accessibility disparities.
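
One possible way to turn the targeted-expansion idea into a ranking is to compare stations per resident. The sketch below assumes records carrying the `station_density` and `population_density` columns from the scatter plot; the function name and the per-100,000 scaling are my own choices for illustration.

```python
def rank_by_stations_per_capita(boroughs):
    """Sort areas from fewest to most subway stations per 100,000 residents.

    Each record is a dict with 'Borough', 'station_density' (stations/sq km)
    and 'population_density' (people/sq km).
    """
    ranked = []
    for b in boroughs:
        # Area cancels out: (stations/km^2) / (people/km^2) = stations per person.
        per_100k = b["station_density"] / b["population_density"] * 100_000
        ranked.append((b["Borough"], per_100k))
    # Lowest value first: these are the candidates for expansion.
    return sorted(ranked, key=lambda pair: pair[1])


# Hypothetical placeholder values, only to show the call shape (not real NYC data).
example = [
    {"Borough": "Area X", "station_density": 0.5, "population_density": 10_000},
    {"Borough": "Area Y", "station_density": 0.1, "population_density": 5_000},
]
for name, per_100k in rank_by_stations_per_capita(example):
    print(f"{name}: {per_100k:.1f} stations per 100,000 residents")
```

Because the ranking only needs two densities per area, the same function would work unchanged on neighborhood-level records if those become available.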



### Conclusion

This dashboard demonstrates how multiple visualization techniques can work together to tell a fuller story about public transit accessibility in New York City. The choropleth map provides geographic context, the bar chart enables precise comparisons, the line chart reveals temporal patterns, and the scatter plot explores relationships between variables.

The interactivity is in the second set of charts. The first set compares all five boroughs at once; the second lets us pick any combination of boroughs to display and compare, which is useful when only a subset is of interest. This helps with testing hypotheses and discovering patterns that might not be as obvious in static visualizations.

My biggest challenge was that, as I tend to do in these kinds of projects, I was too ambitious for my first dashboard. I wanted a neighborhood-by-neighborhood analysis and a nicer, more geographic map, and the amount of data I wanted to use was unrealistically heavy for my technical constraints. I will, nonetheless, keep developing this into a more sophisticated dashboard, and look to compare different cities' public transit systems in an effort to see how they can be improved.


Future enhancements to this dashboard beyond the aforementioned would include:

1. Adding bus data for a more complete view of transit accessibility
2. Incorporating walking time to stations (the 15-minute city is really measured in minutes of walking)
3. Analyzing historical changes in transit access over time
4. Adding demographic data to understand equality of access 
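
For the walking-time enhancement, a first approximation could use straight-line distance to the nearest station. This is only a sketch: the 5 km/h walking speed is an assumption, the function names are hypothetical, and straight-line distance understates real walking time along the street grid.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
WALKING_SPEED_KMH = 5.0  # Assumption: a typical adult walking pace.


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def walking_minutes_to_nearest(point, stations, speed_kmh=WALKING_SPEED_KMH):
    """Estimated walking time in minutes from `point` to the closest station.

    `point` is a (lat, lon) tuple and `stations` is a non-empty list of
    (lat, lon) tuples.
    """
    nearest_km = min(haversine_km(*point, *station) for station in stations)
    return nearest_km / speed_kmh * 60
```

Any location whose estimate exceeds 15 minutes would fall outside a 15-minute walk of the subway under these assumptions. A street-network routing approach would give more realistic times.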