# UBER NYC OPERATIONAL OPTIMIZATION AND SUPPLY STRATEGY

## 1. INTRODUCTION

**Report Purpose**:   
Provide a comprehensive overview of Uber's operational system in New York, ranging from core system mechanics (2019-2025) to specific current bottlenecks (2023-2025) to propose supply optimization and pricing strategies.    

**Data Scope**:  
- Long-term data (2019-2025): Analysis of market stability and recovery trends (DRT)
- Focused data (2023-2025): Detailed diagnosis of current issues with congestion, geographic barriers, and weather impacts.
- Data: tlc_sample_processed (2019-2025), agg_network_monthly.parquet, agg_timeline_hourly.parquet

## 2. OPERATIONAL MECHANISMS AND MACROECONOMIC TRENDS
(Based on core system analysis and long-term data 2019-2025)

### 2.1. Data Loading and Review

### 2.2. Overall Analysis

### 2.3. Demand Analysis

### 2.3. Demand Recovery Index (DRT)

## 3. CURRENT MARKET DIAGNOSIS

In [1]:
FILE_PATHS = [
    r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\HVFHV subsets 2019-2025 - Samples\HVFHV subsets 2019-2025 - Samples\tlc_sample_2023_processed.parquet",
    r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\HVFHV subsets 2019-2025 - Samples\HVFHV subsets 2019-2025 - Samples\tlc_sample_2024_processed.parquet",
    r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\HVFHV subsets 2019-2025 - Samples\HVFHV subsets 2019-2025 - Samples\tlc_sample_2025_processed.parquet"
]

In [2]:
pip install -U kaleido

Note: you may need to restart the kernel to use updated packages.


### 3.1. Growth Hotspots

In [5]:
import polars as pl
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import os
import uber_style

# --- CONFIG ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"
SCALE_FACTOR = 100 

def analyze_dashboard_final_dumbbell(file_paths, output_dir="plots"):
    """
    Generates or loads the Hotspots Dashboard (Stacked + Dumbbell).
    Saves output as JSON and HTML in the specified directory.
    """
    
    # Define filenames
    json_filename = "hotspots_dashboard.json"
    html_filename = "hotspots_dashboard.html"
    
    # Ensure output directory exists
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"üìÅ Created directory: {output_dir}")
        
    json_path = os.path.join(output_dir, json_filename)
    html_path = os.path.join(output_dir, html_filename)

    # --- CHECK IF PLOT EXISTS ---
    if os.path.exists(json_path):
        print(f"‚úÖ Plot already exists at {json_path}. Loading from file...")
        fig = pio.read_json(json_path)
        print("üéâ Dashboard loaded successfully (Data processing skipped).")
        
        # Uncomment the line below to show the plot when loading from file
        # fig.show() 
        return fig

    print(f"üöÄ Plot not found. Initializing Dashboard Generation (Dumbbell Fixed)...")
    
    # ==========================================================================
    # PART 1: DATA PROCESSING
    # ==========================================================================
    
    # 1.1 Load Data
    dfs = []
    for path in file_paths:
        try:
            filename = os.path.basename(path)
            year_label = ''.join(filter(str.isdigit, filename))[:4]
            d = pl.read_parquet(path, columns=[
                "pickup_zone", "dropoff_zone", "pickup_month", 
                "cultural_day_type", "pickup_date"
            ])
            months_active = d["pickup_month"].n_unique()
            d = d.with_columns([
                pl.lit(year_label).alias("Year"),
                pl.lit(months_active).alias("Months_Active"),
                pl.col("cultural_day_type").str.to_lowercase()
            ])
            dfs.append(d)
        except Exception: continue

    if not dfs: return None
    df_main = pl.concat(dfs)
    
    # 1.2 Data for Chart 1 (Hotspots)
    df_pu = df_main.select([pl.col("pickup_zone").alias("Zone"), pl.col("Year"), pl.col("Months_Active")])
    df_do = df_main.select([pl.col("dropoff_zone").alias("Zone"), pl.col("Year"), pl.col("Months_Active")])
    df_activity = pl.concat([df_pu, df_do])
    
    hotspots = (
        df_activity.group_by(["Zone", "Year", "Months_Active"])
        .agg((pl.len() * SCALE_FACTOR).alias("Real_Vol"))
        .with_columns(
            ((pl.col("Real_Vol") / pl.col("Months_Active")) * 12).alias("Projected_Vol")
        )
    )
    
    # Filter Top 15
    top_zones_list = (
        hotspots.group_by("Zone")
        .agg(pl.sum("Projected_Vol").alias("Grand_Total"))
        .sort("Grand_Total", descending=True)
        .head(15)
        .get_column("Zone")
    )
    
    df_chart1 = hotspots.filter(pl.col("Zone").is_in(top_zones_list)).to_pandas()
    
    # 1.3 Data for Chart 2 (Intensity - Dumbbell)
    target_types = ["workday", "weekend_night"]
    df_intensity = df_main.filter(pl.col("cultural_day_type").is_in(target_types))
    
    day_counts = (
        df_intensity.group_by("cultural_day_type")
        .agg(pl.col("pickup_date").n_unique().alias("n_days"))
        .to_pandas().set_index("cultural_day_type")["n_days"].to_dict()
    )
    
    df_pu_int = df_intensity.select([pl.col("pickup_zone").alias("Zone"), pl.col("cultural_day_type")])
    df_do_int = df_intensity.select([pl.col("dropoff_zone").alias("Zone"), pl.col("cultural_day_type")])
    df_int_activity = pl.concat([df_pu_int, df_do_int])
    
    df_chart2 = (
        df_int_activity.group_by(["Zone", "cultural_day_type"])
        .agg((pl.len() * SCALE_FACTOR).alias("Total_Vol"))
        .to_pandas()
    )
    df_chart2["n_days"] = df_chart2["cultural_day_type"].map(day_counts)
    df_chart2["Daily_Avg"] = df_chart2["Total_Vol"] / df_chart2["n_days"]
    
    df_chart2 = df_chart2[df_chart2["Zone"].isin(top_zones_list)]
    df_pivot = df_chart2.pivot(index="Zone", columns="cultural_day_type", values="Daily_Avg").reset_index()
    
    zone_order = top_zones_list.to_list()
    zone_order.reverse() 

    print("üìä Rendering Dashboard...")

    # ==========================================================================
    # PART 2: VISUALIZATION
    # ==========================================================================
    fig = make_subplots(
        rows=1, cols=2,
        column_widths=[0.55, 0.45],
        subplot_titles=(
            f"<b>Top 15 Market Scale (Stacked)</b>", 
            f"<b>Daily Intensity: Workday vs. Nightlife</b>"
        ),
        horizontal_spacing=0.12
    )
    
    # --- CHART 1: STACKED BAR ---
    colors = {"2023": "#D3EFDE", "2024": uber_style.UBER_GREEN, "2025": "#0E3F25"}
    
    for year in ["2023", "2024", "2025"]:
        year_data = df_chart1[df_chart1["Year"] == year]
        fig.add_trace(
            go.Bar(
                y=year_data["Zone"],
                x=year_data["Projected_Vol"],
                name=year,
                orientation='h',
                marker_color=colors[year],
                texttemplate="%{x:.2s}", 
                textposition='auto',
                textfont=dict(color=uber_style.UBER_WHITE if year == "2025" else uber_style.UBER_BLACK, size=9)
            ),
            row=1, col=1
        )

    # --- CHART 2: DUMBBELL PLOT ---
    # Connector lines (Hide legend)
    fig.add_trace(
        go.Scatter(
            x=[val for pair in zip(df_pivot["workday"], df_pivot["weekend_night"], [None]*len(df_pivot)) for val in pair],
            y=[val for pair in zip(df_pivot["Zone"], df_pivot["Zone"], [None]*len(df_pivot)) for val in pair],
            mode="lines",
            line=dict(color=uber_style.GRAY_300, width=3),
            showlegend=False,
            hoverinfo="skip"
        ),
        row=1, col=2
    )
    
    # Workday
    fig.add_trace(
        go.Scatter(
            x=df_pivot["workday"],
            y=df_pivot["Zone"],
            mode="markers",
            name="Workday",
            marker=dict(color=uber_style.UBER_GREEN, size=10, symbol="circle"),
        ),
        row=1, col=2
    )
    
    # Weekend Night
    fig.add_trace(
        go.Scatter(
            x=df_pivot["weekend_night"],
            y=df_pivot["Zone"],
            mode="markers",
            name="Weekend Night",
            marker=dict(color=uber_style.UBER_PURPLE, size=10, symbol="circle"),
        ),
        row=1, col=2
    )

    # ==========================================================================
    # PART 3: LAYOUT (REFINED)
    # ==========================================================================
    fig.update_layout(
        title=dict(
            text="<b>Strategic Hotspot Analysis: Scale & Behavior</b>",
            y=0.98, x=0.5, xanchor='right', yanchor='top'
        ),
        barmode='stack', 
        height=750,
        margin=dict(l=260, r=50, t=100, b=80),
        legend=dict(
            orientation="h", 
            y=1.06, x=0.7, 
            xanchor="center",
            traceorder="normal" 
        ),
        plot_bgcolor="white"
    )
    
    # Axes
    fig.update_xaxes(title_text="Annualized Volume (Est.)", row=1, col=1, showgrid=True, gridcolor=uber_style.GRAY_100)
    fig.update_yaxes(categoryorder='array', categoryarray=zone_order, row=1, col=1)
    
    fig.update_xaxes(title_text="Avg Trips / Day", row=1, col=2, showgrid=True, gridcolor=uber_style.GRAY_100)
    fig.update_yaxes(categoryorder='array', categoryarray=zone_order, row=1, col=2, showticklabels=False)

    # Branding
    fig = uber_style.apply_uber_branding(
        fig,
        source="Source: NYC TLC HVFHV (Scaled x100)",
        footer_y=-0.1
    )

    # --- SAVE TO FILE ---
    print(f"üíæ Saving dashboard to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print(f"‚úÖ Dashboard saved successfully: {json_filename}, {html_filename}")

    # Uncomment the line below to show the plot immediately
    # fig.show()
    
    return fig

# EXECUTE (The plot will not show automatically)
fig = analyze_dashboard_final_dumbbell(FILE_PATHS)
# To show later:
fig.show()

‚úÖ Plot already exists at plots\hotspots_dashboard.json. Loading from file...
üéâ Dashboard loaded successfully (Data processing skipped).


**Context & Forecast:**
Following the recovery (2023) and stabilization (2024) phases, the 2025 forecast indicates a slight market deceleration in core zones driven by the implementation of the **CBD Congestion Fee**.  
The Pivot: Current hotspots are no longer just areas of high demand; they are signals for Uber to shift from "broad coverage" to "targeted infrastructure investment" (e.g., smart signage, dedicated pickup lanes) to reduce operational friction.

Based on behavioral data, we segment the market into 4 strategic clusters to optimize supply allocation:

#### 1. JFK & LaGuardia Airports
* Characteristics: Inelastic Demand. Although representing only ~8% of the data, these are "super-nodes" with continuous, dual-direction flows. Customers travel regardless of price.
* Strategy: Reliability First (24/7 Supply).  
These are defensive strongholds. Insufficient supply here results in immediate loss of market share and High Ticket Value revenue to Yellow Taxis and Lyft.

#### 2. "The Utility Core": Midtown, Penn Station
* Characteristics (Workday Logic): Compulsory/Utility Demand.
    * Behavior: Dominated by office hours (Mon-Fri). Flows are highly directional (AM: Inbound $\rightarrow$ PM: Outbound) and predictable.
    * Weekend: Becomes a "Ghost Town" with significantly reduced demand.
* Strategy: Maximize Reliability & Efficiency.
    * Focus supply during weekdays.
    * Action: Deploy Uber Shuttle / Commuter Pass products to aggregate fixed demand and lower costs for frequent commuters.
    * Scale down supply on weekend nights to avoid resource waste.

#### 3. "The Leisure Hubs": East Village, Williamsburg
* Characteristics (Weekend Night Logic): Leisure Demand.
    * Behavior: Explodes between 10 PM - 3 AM on Fridays and Saturdays. Customers are time-sensitive (need a car fast) rather than price-sensitive.
    * Economics: High willingness to pay Surge Pricing and Tips.
* Strategy: Yield Maximization.
    * Shape supply towards these zones on weekend nights to capture high margins.
    * Prioritize Availability algorithms to minimize wait times (reducing friction for intoxicated/impatient riders).

#### 4. "The Stabilizers": Crown Heights, Bushwick
* Characteristics: Resilient Demand.
    * Consistent usage throughout the week (commuting on weekdays, local travel on weekends).
* Strategy: Base Load Maintenance.
    * Use these zones as a Supply Buffer on off-peak days (Mondays, Sundays) to help drivers maintain a baseline income when the core zones cool down.

### 3.2. Wait Time Analysis

In [11]:
import polars as pl
import pandas as pd
import geopandas as gpd
import plotly.express as px
import plotly.io as pio
import os
import uber_style  # Your style template

# --- STEP 0: REGISTER STYLE ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

# --- CONFIGURATION ---
sample_path_2023 = r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\HVFHV subsets 2019-2025 - Samples\HVFHV subsets 2019-2025 - Samples\tlc_sample_2023_processed.parquet"
agg_network_path = r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\HVFHV subsets 2019-2025 - Aggregates\HVFHV subsets 2019-2025 - Aggregates\Aggregates_Processed\agg_network_monthly.parquet"
shapefile_path = r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\taxi_zones\taxi_zones.shp"

# --- SAVE/LOAD CONFIGURATION ---
output_dir = "plots"
plot_name = "map_wait_time_latency"
json_path = os.path.join(output_dir, f"{plot_name}.json")
html_path = os.path.join(output_dir, f"{plot_name}.html")

# Ensure output directory exists
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# ==============================================================================
# LOGIC: CHECK IF PLOT EXISTS -> LOAD OR GENERATE
# ==============================================================================

if os.path.exists(json_path):
    print(f"‚úÖ Plot found at '{json_path}'. Loading from file (Skipping processing)...")
    fig = pio.read_json(json_path)
    
else:
    print(f"üöÄ Plot not found. Starting Data Processing & Generation...")

    # 1. LOAD & OPTIMIZE SHAPEFILE (The Canvas)
    print("   üó∫Ô∏è Loading & Optimizing Shapefile...")
    try:
        gdf_zones = gpd.read_file(shapefile_path).to_crs(epsg=4326)
        # [IMPORTANT] Simplify geometry for 10x smoother Map
        gdf_zones["geometry"] = gdf_zones["geometry"].simplify(tolerance=0.001, preserve_topology=True)
    except Exception as e:
        print(f"Error: {e}")
        raise

    # 2. DATA AGGREGATION
    print("   üöÄ Loading Operational Metrics...")
    q_lookup = (
        pl.scan_parquet(sample_path_2023)
        .select(["PULocationID", "pickup_zone", "pickup_borough"])
        .unique(subset=["PULocationID"])
    )
    df_lookup = q_lookup.collect()

    q_net = (
        pl.scan_parquet(agg_network_path)
        .filter(pl.col("pickup_year") >= 2023)
        .join(df_lookup.lazy(), on="PULocationID", how="left")
        .group_by(["PULocationID", "pickup_zone", "pickup_borough"])
        .agg([
            pl.col("trip_count").sum().alias("Total_Volume"),
            ((pl.col("avg_wait_time") * pl.col("trip_count")).sum() / pl.col("trip_count").sum()).alias("Avg_Wait_Time_min"),
            ((pl.col("avg_driver_response") * pl.col("trip_count")).sum() / pl.col("trip_count").sum()).alias("Avg_Response_Time_min")
        ])
        .filter(pl.col("Total_Volume") > 5000)
    )
    df_data = q_net.collect().to_pandas()

    # 3. MERGE GEOMETRY & DATA
    gdf_plot = gdf_zones.merge(df_data, left_on="LocationID", right_on="PULocationID", how="inner")

    # 4. VISUALIZATION
    print("   üìä Generating Smooth Interactive Map...")

    # Diverging Color Scale
    UBER_DIVERGING = [
        (0.00, uber_style.UBER_GREEN),   # Good (Fast)
        (0.50, uber_style.GRAY_100),     # Average
        (1.00, uber_style.UBER_RED)      # Bad (Slow)
    ]

    fig = px.choropleth_mapbox(
        gdf_plot,
        geojson=gdf_plot.geometry,
        locations=gdf_plot.index,
        color="Avg_Wait_Time_min",
        range_color=[2, 8],
        
        # Tooltips
        hover_name="pickup_zone",
        hover_data={
            "pickup_borough": True,
            "Total_Volume": ":,.0f",
            "Avg_Wait_Time_min": ":.1f",
            "Avg_Response_Time_min": ":.1f"
        },
        
        # Styling
        color_continuous_scale=UBER_DIVERGING,
        mapbox_style="carto-positron", 
        zoom=10,
        center={"lat": 40.73, "lon": -73.93},
        opacity=0.7, 
        height=800
    )

    # 5. BRANDING & LAYOUT
    fig = uber_style.apply_uber_branding(
        fig,
        title="Operational Map: Wait Time Latency",
        subtitle="Avg. Passenger Wait Time (Min) by Zone | 2023 - 2025",
        source="Source: NYC TLC HVFHV Records (1% Sample)",
        footer_y=-0.05,
        logo_y=-0.1
    )

    fig.update_layout(
        margin=dict(l=0, r=0, t=50, b=0), 
        coloraxis_colorbar=dict(
            title="Wait Time<br>(Minutes)",
            thicknessmode="pixels", thickness=15,
            lenmode="pixels", len=300,
            yanchor="top", y=0.95,
            xanchor="left", x=0.02, 
            bgcolor="rgba(255,255,255,0.9)",
            tickfont=dict(family="Uber Move", size=12, color="black")
        )
    )

    # --- SAVE TO FILE ---
    print(f"   üíæ Saving figure to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print("   ‚úÖ Generation & Save Complete.")

# --- OPTIONAL: DISPLAY ---
# Uncomment the line below to interact with the map
fig.show(config={'scrollZoom': True})

üöÄ Plot not found. Starting Data Processing & Generation...
   üó∫Ô∏è Loading & Optimizing Shapefile...
   üöÄ Loading Operational Metrics...
   üìä Generating Smooth Interactive Map...
   üíæ Saving figure to plots...
   ‚úÖ Generation & Save Complete.



Based on the wait time heatmap, the market is divided into three distinct operational zones requiring specific intervention strategies.

#### 1. High Liquidity Zones
* Areas: Green zones (Most Regions in Manhattan and the Surroundings).
* Status: Efficient. Supply meets demand immediately.
* Analysis: This area has driver density. Low wait times indicate high liquidity.
* Note: Low wait times do not guarantee fast trip speeds. We must cross-reference with Traffic Speed metrics to ensure vehicles are not stuck in congestion immediately after pickup.

#### 2. Physical Congestion Zones
* Areas: Large orange zones (JFK, LaGuardia Airports).
* Status: Process Bottlenecks.
* Analysis: High wait times here are not caused by a lack of vehicles (supply is abundant). The root causes are complex terminal pickup procedures and sudden demand surges when flights land, creating local bottlenecks.

#### 3. Structural & Geographic Friction Zones
These are "red alert" areas where the standard operational model fails due to external factors.

* A. Staten Island (Economic Structure Issue):
    * Status: Record high wait times despite the large land area.
    * Cause: Low demand density. Drivers act rationally by avoiding this area to prevent idle time. When a request occurs, the system must dispatch a driver from far away.

* B. The Rockaways (Geographic Isolation):
    * Status: The "Dead End" effect.
    * Cause: The area is isolated by Jamaica Bay with only two entry points. Drivers fear "Deadhead" risks (driving back empty) and toll costs reduce their profit margins.

* C. Corona Park (Navigational Friction):
    * Status: High pickup latency.
    * Cause: The park is too vast without a clear grid system. GPS pin drift causes riders and drivers to lose time trying to locate each other.


### TARGETED SOLUTIONS

We propose shifting from generic solutions to tailored mechanisms for each friction type:

| Friction Type | Target Area | Strategic Solution | Mechanism |
| :--- | :--- | :--- | :--- |
| **Economic** (Low Density) | Staten Island | Uber Reserve | Shift from On-demand to Scheduled rides. This gives the system 30-60 minutes to dispatch a remote driver efficiently without making the rider wait. |
| **Geographic** (Tolls/Isolation) | The Rockaways | Return Toll Reimbursement | Automatically add a surcharge to the fare if the trip forces a driver across a toll bridge with a high risk of an empty return trip. |
| **Navigational** (Vast Parks) | Corona Park | Smart Meeting Points | Disable random GPS pins. The app only displays fixed "Blue Dots" (main gates/intersections), forcing riders to walk to a feasible pickup spot. |
| **Crowds** (Events) | Stadiums | Uber Zone (PIN Code) | Implement a FIFO queue. Riders receive a 6-digit PIN and take the first available car in the line, eliminating the need to find a specific license plate. |

## 4. DEEP ANALYSIS OF BOTTLENECKS

### 4.1. Dead Mileage

**Definition & Impact**:
Dead Mileage (Zero-Passenger Miles) represents the most critical inefficiency in the network. It occurs when a driver travels empty from a drop-off point to a new pickup location.
* Operational Cost: Increases fuel consumption and vehicle depreciation without revenue.
* System Friction: Reduces overall fleet availability, triggering unnecessary price surges in adjacent neighborhoods.

We diagnose this by calculating Net Flow Imbalance (Drop-offs minus Pickups) to identify where the network is "leaking" supply.

In [None]:
# --- STEP 0: REGISTER UBER STYLE ---
# Ensure template is activated
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

def analyze_network_imbalance_final(file_paths):
    print("üöÄ Analyzing Dead Mileage (Final Version)...")
    
    # --- 1. DATA LOADING & PREP ---
    dfs = []
    cols = ["pickup_zone", "dropoff_zone"]
    
    for path in file_paths:
        try:
            # Load sample data
            df = pd.read_parquet(path, columns=cols)
            dfs.append(df)
        except Exception:
            continue
            
    if not dfs: return
    df_main = pd.concat(dfs)
    
    # Calculate Pickups & Dropoffs
    pu = df_main["pickup_zone"].value_counts().reset_index()
    pu.columns = ["Zone", "Pickups"]
    
    do = df_main["dropoff_zone"].value_counts().reset_index()
    do.columns = ["Zone", "Dropoffs"]
    
    # Merge
    df_bal = pd.merge(pu, do, on="Zone", how="outer").fillna(0)
    
    # --- [IMPORTANT] SCALING x100 ---
    # Convert from 1% sample to actual estimate
    df_bal["Pickups"] = df_bal["Pickups"] * 100
    df_bal["Dropoffs"] = df_bal["Dropoffs"] * 100
    
    # Calculate Net Flow: Positive = Excess cars, Negative = Shortage
    df_bal["Net_Inflow"] = df_bal["Dropoffs"] - df_bal["Pickups"]
    df_bal["Total_Activity"] = df_bal["Dropoffs"] + df_bal["Pickups"]
    
    # --- 2. FILTERING & CATEGORIZATION ---
    # Only take zones with high activity (> 50k trips) for C-level reporting
    df_viz = df_bal[df_bal["Total_Activity"] > 50000].sort_values("Net_Inflow", ascending=True)
    
    # Split into 2 groups
    top_deficit = df_viz.head(10).copy() # Shortage (High Demand)
    top_surplus = df_viz.tail(10).copy() # Excess (Oversupply)
    
    # Assign Category labels (for Legend display)
    top_deficit["Category"] = "High Demand (Opportunity)"
    top_surplus["Category"] = "Drivers Available (Oversupply)"
    
    # Combine and sort for symmetric chart display
    plot_data = pd.concat([top_deficit, top_surplus])
    plot_data["Abs_Inflow"] = plot_data["Net_Inflow"].abs()
    plot_data = plot_data.sort_values("Abs_Inflow", ascending=True)

    print("üìä Rendering high-definition chart...")

    # --- 3. VISUALIZATION (Black & Red) ---
    fig = px.bar(
        plot_data,
        x="Net_Inflow",
        y="Zone",
        orientation='h',
        color="Category",
        text_auto='.2s', # Show abbreviated numbers (e.g.: 120k)
        
        # COLOR MAPPING
        color_discrete_map={
            "High Demand (Opportunity)": uber_style.UBER_BLACK,   # Black: Core, Strong
            "Drivers Available (Oversupply)": uber_style.UBER_RED # Red: Alert, Available
        },
        # Custom data for Tooltip
        custom_data=["Pickups", "Dropoffs"]
    )

    # --- 4. LAYOUT POLISHING (STORYTELLING WITH DATA) ---
    fig.update_layout(
        # Impressive title
        title=dict(
            text="<b>Network Imbalance: Supply vs. Demand Gaps</b>",
            font=dict(size=22, family="Uber Move, Roboto, sans-serif")
        ),
        
        # Fine-tune axes
        xaxis_title="Net Vehicle Flow (Dropoffs - Pickups)",
        yaxis_title=None, # Hide Y-axis title to reduce clutter
        
        # [FIX] MARGINS: Increase left margin (l=200) for long zone names
        margin=dict(l=220, r=50, t=110, b=100),
        
        # # [FIX] LEGEND: Move to top to save horizontal space
        legend=dict(
            title=None,
            orientation="h",
            yanchor="bottom", y=1.02,
            xanchor="left", x=0.5,
            font=dict(size=12)
        ),
        
        # Space between bars
        bargap=0.35,
        height=750,
        
        # Clean white background (Minimalism)
        plot_bgcolor="white"
    )
    
    # Y-axis formatting (Zone Names)
    fig.update_yaxes(
        tickfont=dict(size=11, color="#333333"), # Dark gray for easy reading
        ticksuffix="   " # Small gap between text and chart
    )
    
    # X-axis formatting (Grid only vertical)
    fig.update_xaxes(
        showgrid=False, 
        gridcolor="#E5E5E5",
        zeroline=True, 
        zerolinecolor="black", 
        zerolinewidth=1.5 # Bold zero line to divide Risk/Opportunity
    )
    
    # Data label formatting
    fig.update_traces(
        textfont_size=11,
        textposition='outside', # Numbers outside bars for clarity
        cliponaxis=False,       # Allow numbers to overflow frame if needed
        
        # Detailed tooltip for data inspection
        hovertemplate="<b>%{y}</b><br>Net Flow: %{x:,.0f}<br>Pickups: %{customdata[0]:,.0f}<br>Dropoffs: %{customdata[1]:,.0f}<extra></extra>"
    )

    # # --- 5. UBER BRANDING FOOTER ---
    # fig = uber_style.apply_uber_branding(
    #     fig,
    #     source="Source: NYC TLC HVFHV Records (Scaled x100)",
    #     footer_y=-0.1 # Push footer to bottom
    # )
    
    fig.show()

# --- CALL FUNCTION ---
analyze_network_imbalance_final(FILE_PATHS)

üöÄ Analyzing Dead Mileage (Final Version)...
üìä Rendering high-definition chart...


In [13]:
# --- STEP 0: REGISTER UBER STYLE ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

def analyze_network_imbalance_final(file_paths, output_dir="plots"):
    """
    Generates or loads the Network Imbalance (Diverging Bar) Chart.
    Saves output as JSON and HTML in the specified directory.
    """
    
    # Define filenames
    json_filename = "network_imbalance.json"
    html_filename = "network_imbalance.html"
    
    # Ensure output directory exists
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        
    json_path = os.path.join(output_dir, json_filename)
    html_path = os.path.join(output_dir, html_filename)

    # --- CHECK IF PLOT EXISTS ---
    if os.path.exists(json_path):
        print(f"‚úÖ Plot found at '{json_path}'. Loading from file...")
        fig = pio.read_json(json_path)
        # fig.show()
        return fig

    print("üöÄ Analyzing Dead Mileage (Generating New Plot)...")
    
    # --- 1. DATA LOADING & PREP ---
    dfs = []
    cols = ["pickup_zone", "dropoff_zone"]
    
    for path in file_paths:
        try:
            # Load sample data
            df = pd.read_parquet(path, columns=cols)
            dfs.append(df)
        except Exception:
            continue
            
    if not dfs: return None
    df_main = pd.concat(dfs)
    
    # Calculate Pickups & Dropoffs
    pu = df_main["pickup_zone"].value_counts().reset_index()
    pu.columns = ["Zone", "Pickups"]
    
    do = df_main["dropoff_zone"].value_counts().reset_index()
    do.columns = ["Zone", "Dropoffs"]
    
    # Merge
    df_bal = pd.merge(pu, do, on="Zone", how="outer").fillna(0)
    
    # --- [IMPORTANT] SCALING x100 ---
    # Convert from 1% sample to actual estimate
    df_bal["Pickups"] = df_bal["Pickups"] * 100
    df_bal["Dropoffs"] = df_bal["Dropoffs"] * 100
    
    # Calculate Net Flow: Positive = Excess cars, Negative = Shortage
    df_bal["Net_Inflow"] = df_bal["Dropoffs"] - df_bal["Pickups"]
    df_bal["Total_Activity"] = df_bal["Dropoffs"] + df_bal["Pickups"]
    
    # --- 2. FILTERING & CATEGORIZATION ---
    # Only take zones with high activity (> 50k trips)
    df_viz = df_bal[df_bal["Total_Activity"] > 50000].sort_values("Net_Inflow", ascending=True)
    
    # Split into 2 groups
    top_deficit = df_viz.head(10).copy() # Shortage (High Demand)
    top_surplus = df_viz.tail(10).copy() # Excess (Oversupply)
    
    # Assign Category labels
    top_deficit["Category"] = "High Demand (Opportunity)"
    top_surplus["Category"] = "Drivers Available (Oversupply)"
    
    # Combine and sort
    plot_data = pd.concat([top_deficit, top_surplus])
    plot_data["Abs_Inflow"] = plot_data["Net_Inflow"].abs()
    plot_data = plot_data.sort_values("Abs_Inflow", ascending=True)

    print("üìä Rendering high-definition chart...")

    # --- 3. VISUALIZATION (Black & Red) ---
    fig = px.bar(
        plot_data,
        x="Net_Inflow",
        y="Zone",
        orientation='h',
        color="Category",
        text_auto='.2s', 
        
        # COLOR MAPPING
        color_discrete_map={
            "High Demand (Opportunity)": uber_style.UBER_BLACK,   # Black: Core
            "Drivers Available (Oversupply)": uber_style.UBER_RED # Red: Alert
        },
        custom_data=["Pickups", "Dropoffs"]
    )

    # --- 4. LAYOUT POLISHING ---
    fig.update_layout(
        title=dict(
            text="<b>Network Imbalance: Supply vs. Demand Gaps</b>",
            font=dict(size=22, family="Uber Move, Roboto, sans-serif")
        ),
        xaxis_title="Net Vehicle Flow (Dropoffs - Pickups)",
        yaxis_title=None,
        
        # MARGINS
        margin=dict(l=220, r=50, t=110, b=100),
        
        # LEGEND
        legend=dict(
            title=None,
            orientation="h",
            yanchor="bottom", y=1.02,
            xanchor="left", x=0.5,
            font=dict(size=12)
        ),
        
        bargap=0.35,
        height=750,
        plot_bgcolor="white"
    )
    
    # Y-axis formatting
    fig.update_yaxes(
        tickfont=dict(size=11, color="#333333"),
        ticksuffix="   "
    )
    
    # X-axis formatting
    fig.update_xaxes(
        showgrid=False, 
        gridcolor="#E5E5E5",
        zeroline=True, 
        zerolinecolor="black", 
        zerolinewidth=1.5
    )
    
    # Data label formatting
    fig.update_traces(
        textfont_size=11,
        textposition='outside',
        cliponaxis=False,
        hovertemplate="<b>%{y}</b><br>Net Flow: %{x:,.0f}<br>Pickups: %{customdata[0]:,.0f}<br>Dropoffs: %{customdata[1]:,.0f}<extra></extra>"
    )

    # Branding
    fig = uber_style.apply_uber_branding(
        fig,
        source="Source: NYC TLC HVFHV Records (Scaled x100)",
        footer_y=-0.1 
    )
    
    # --- SAVE TO FILE ---
    print(f"üíæ Saving figure to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print("‚úÖ Generation & Save Complete.")

    #fig.show()
    return fig

# --- CALL FUNCTION ---
analyze_network_imbalance_final(FILE_PATHS)

‚úÖ Plot found at 'plots\network_imbalance.json'. Loading from file...


#### 1. Excess Drop-offs: JFK, LaGuardia, Penn Station
* Data Signal: Massive inflow of vehicles, but significantly lower outflow of active trips.
* Root Cause (Modal Split Asymmetry):
    * Inbound (To Airport): Passengers prioritize Uber for convenience (luggage handling).
    * Outbound (From Airport): Passengers shift to substitutes like the AirTrain, Bus, or the dedicated Yellow Cab queues (cheaper/no wait time).
* The "Deadhead" Trap: Drivers dropping off at JFK/LGA face a dilemma: join a long queue in the "Wait Lot" or drive empty back to Queens/Brooklyn. Both options destroy hourly earnings.

#### 2. Excess Pickups: Lower East Side, East Village
* Data Signal: High request volume but low local vehicle turnover (insufficient drop-offs to recycle into new supply).
* Operational Result: The system must dispatch cars from distant zones, increasing global ETA and dead mileage across the city.


### STRATEGIC SOLUTION: HIGH-CAPACITY CONSOLIDATION

To solve the Airport problems, we must shift from individual unit optimization to high-capacity transit logic.

The Pivot: Uber Shuttle (Hybrid Transit)
* Status (2024-2025): Expansion of direct shuttle routes connecting Manhattan hubs to LaGuardia (LGA) and major event venues.
* Operational Mechanism:
    * Consolidation: Replaces ~50 individual UberX pickups with a single high-capacity vehicle.
    * Friction Reduction: Eliminates the chaotic pickup at terminals. A single vehicle loads at a dedicated zone, drastically reducing dwell time and curb congestion.
* Strategic Value: Converts the "Dead Mileage" problem into a "Line Haul" efficiency model, capturing the price-sensitive travelers who previously defected to the AirTrain.

### 4.2. Traffic Corridors

*Analysis of high-volume routes to identify where the current vehicle-based model is inefficient.*

We define a "Corridor" as a high-frequency pair of Pickup and Dropoff zones. By overlaying speed data onto volume data, we identify "Red Corridors"‚Äîroutes carrying massive passenger volume but suffering from critically low speeds.

In [14]:
# --- CONFIG ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

def analyze_top10_fixed_layout(file_paths, output_dir="plots"):
    """
    Generates or loads the Top 10 Busiest Corridors chart.
    Applies margins fix for long text and RdYlGn colorscale.
    """
    
    # Define filenames
    json_filename = "top_10_corridors_speed.json"
    html_filename = "top_10_corridors_speed.html"
    
    # Ensure output directory exists
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"üìÅ Created directory: {output_dir}")
        
    json_path = os.path.join(output_dir, json_filename)
    html_path = os.path.join(output_dir, html_filename)

    # --- CHECK IF PLOT EXISTS ---
    if os.path.exists(json_path):
        print(f"‚úÖ Plot found at '{json_path}'. Loading from file...")
        fig = pio.read_json(json_path)
        # fig.show()
        return fig

    print("üöÄ Plot not found. Drawing Top 10 (Generating New Plot)...")
    
    # 1. Load & Process Data
    queries = []
    for p in file_paths:
        try:
            queries.append(pl.scan_parquet(p))
        except:
            continue
    if not queries: return None
    lf = pl.concat(queries)

    # Process Undirected
    lf_processed = (
        lf.filter((pl.col("trip_km") > 0.5) & (pl.col("displacement_speed_kmh").is_between(1, 100)))
        .with_columns([
            pl.min_horizontal("pickup_zone", "dropoff_zone").alias("Zone_1"),
            pl.max_horizontal("pickup_zone", "dropoff_zone").alias("Zone_2")
        ])
        .with_columns((pl.col("Zone_1") + " ‚Üî " + pl.col("Zone_2")).alias("Corridor"))
    )

    # Aggregation
    corridor_stats = (
        lf_processed.group_by("Corridor")
        .agg([
            (pl.len() * 100).alias("Trip_Count"), 
            pl.col("displacement_speed_kmh").median().alias("Median_Speed")
        ])
        .filter(pl.col("Trip_Count") > 15000)
        .sort("Trip_Count", descending=True)
        .head(10)
        .collect()
        .to_pandas()
    )

    print("üìä Rendering chart...")
    
    # 2. Visualization
    fig = px.bar(
        corridor_stats,
        x="Trip_Count",
        y="Corridor",
        orientation='h',
        color="Median_Speed",
        color_continuous_scale="RdYlGn", 
        range_color=[5, 35], 
        text_auto='.2s',
        hover_data={"Median_Speed": ":.1f"}
    )
    
    # 3. FIX DISPLAY BUG (FIX LAYOUT)
    fig.update_layout(
        yaxis={'categoryorder':'total ascending', 'title': None},
        xaxis={'title': "Total Trip Volume (Est.)"},
        
        # INCREASE MARGINS SIGNIFICANTLY
        margin=dict(l=350, r=120, t=110, b=80),
        
        height=700, 
        bargap=0.3,
        
        # Legend (Colorbar)
        coloraxis_colorbar=dict(
            title="Avg Speed (km/h)",
            orientation="h",
            yanchor="bottom", y=1.05,
            xanchor="left", x=0.2,
            thickness=15,
            len=0.4
        )
    )
    
    # Auto-adjust Y-axis for text
    fig.update_yaxes(
        tickfont=dict(size=11), 
        automargin=True
    )

    # Data Labels
    fig.update_traces(
        textposition='outside', 
        cliponaxis=False
    )
    
    # Apply Branding
    fig = uber_style.apply_uber_branding(
        fig,
        title="Top 10 Busiest Corridors: Volume vs. Speed",
        subtitle="Green > 35km/h (Fast) | Red < 5km/h (Congested)",
        source="Source: NYC TLC HVFHV",
        footer_y=-0.1,
        logo_y=-0.13
    )
    
    # --- SAVE TO FILE ---
    print(f"üíæ Saving figure to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print("‚úÖ Generation & Save Complete.")
    
    # fig.show()
    return fig

# Call function
analyze_top10_fixed_layout(FILE_PATHS)

üöÄ Plot not found. Drawing Top 10 (Generating New Plot)...
üìä Rendering chart...
üíæ Saving figure to plots...
‚úÖ Generation & Save Complete.


#### A. Brooklyn Cluster: Intra-Borough Connectivity Gap
* The Conflict: Routes between Bushwick North and Bushwick South/Williamsburg are consistently congested (High Volume / Low Speed).
* Data Insight:
    * Infrastructure Limit: The NYC Subway system is radial (Manhattan-centric), making cross-town travel within Brooklyn difficult.
    * Behavioral Mismatch: Young residents use Uber for very short trips (1-3 km) for leisure, clogging narrow one-way streets.
* Strategic Solution: Micromobility Integration. This is a prime market for e-bikes and scooters. Integrating Lime/Citi Bike or promoting Uber Moto for trips under 3km will offload traffic from the streets.

#### B. Queens Cluster: First-Mile Feeder Friction
* The Conflict: The corridor from Jamaica to South Jamaica is severely gridlocked.
* Data Insight:
    * Transit Desert: South Jamaica has high density but no subway access. Residents rely on Uber to reach the Jamaica transit hub.
    * Bottleneck: Individual Uber drop-offs at the station entrance compete with buses, causing local paralysis.
* Strategic Solution: Aggregation & Smart Zoning.
    * Shift to Uber Shuttle or Uber Pool to consolidate these individual trips.
    * Coordinate with the MTA to establish "Smart Drop-off Zones" 1-2 blocks away from the main entrance to improve flow.

**Conclusion:**
In these peripheral zones, Uber acts as a bus service. The optimization strategy is not to add more cars, but to facilitate a Modal Shift: Two-wheelers for Brooklyn and Shuttles for Queens.

### 4.3. Border Friction

*Investigation into cross-borough connectivity and the economic value of paid infrastructure.*

This analysis utilizes two key datasets to determine if geographical bottlenecks (bridges/tunnels) are the driver of network inefficiency and whether paid routes offer a viable solution.

In [19]:
import polars as pl
import pandas as pd
import geopandas as gpd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import os
import uber_style

# --- CONFIG ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

def create_dashboard_smooth_map(file_paths, shapefile_path=r"D:\gi√°o tr√¨nh nƒÉm 3 k√¨ 1\Data Prep & Visualization\Uber\taxi_zones\taxi_zones.shp", output_dir="plots"):
    
    # --- SAVE/LOAD CONFIGURATION ---
    plot_name = "strategic_mobility_dashboard_v3"
    json_filename = f"{plot_name}.json"
    html_filename = f"{plot_name}.html"
    
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    
    json_path = os.path.join(output_dir, json_filename)
    html_path = os.path.join(output_dir, html_filename)

    # --- CHECK IF PLOT EXISTS -> LOAD ---
    if os.path.exists(json_path):
        print(f"‚úÖ Plot found at '{json_path}'. Loading from file...")
        fig = pio.read_json(json_path)
        # fig.show(config={'scrollZoom': True})
        return fig

    # --- ELSE: GENERATE PLOT ---
    print("üöÄ Plot not found. Initializing Dashboard Generation...")
    
    # ==========================================================================
    # PART 1: NUMERICAL DATA PROCESSING (LEFT SIDE)
    # ==========================================================================
    queries = []
    for p in file_paths:
        try: queries.append(pl.scan_parquet(p))
        except: continue
    if not queries: return None
    lf = pl.concat(queries)
    
    # 1.1 Friction Data
    print("   1. Computing Friction Index...")
    lf_friction = (
        lf.filter(
            (pl.col("trip_km") > 0.5) &
            (pl.col("displacement_speed_kmh").is_between(1, 100)) &
            (pl.col("pickup_borough") != pl.col("dropoff_borough"))
        )
        .with_columns([
            pl.min_horizontal("pickup_borough", "dropoff_borough").alias("Boro_1"),
            pl.max_horizontal("pickup_borough", "dropoff_borough").alias("Boro_2")
        ])
        .with_columns((pl.col("Boro_1") + " ‚Üî " + pl.col("Boro_2")).alias("Borough_Flow"))
        .group_by("Borough_Flow")
        .agg([pl.len().alias("Trip_Count"), pl.col("displacement_speed_kmh").median().alias("Median_Speed")])
        .filter(pl.col("Trip_Count") > 2000)
        .sort("Median_Speed")
        .collect().to_pandas()
    )
    # Color Logic
    top3 = lf_friction.head(3)["Borough_Flow"].tolist()
    lf_friction["Color"] = lf_friction["Borough_Flow"].apply(lambda x: uber_style.UBER_BLACK if x in top3 else uber_style.GRAY_500)

    # 1.2 Speed Premium Data
    print("   2. Computing Speed Premium...")
    target_boros = ["Brooklyn", "Queens", "Bronx"]
    schema = lf.collect_schema().names()
    toll_col = "tolls" if "tolls" in schema else "tolls_amount"
    
    df_premium = (
        lf.filter(
            (pl.col("trip_km") > 2) &
            (
                ((pl.col("pickup_borough") == "Manhattan") & (pl.col("dropoff_borough").is_in(target_boros))) |
                ((pl.col("dropoff_borough") == "Manhattan") & (pl.col("pickup_borough").is_in(target_boros)))
            )
        )
        .with_columns([
            pl.when(pl.col("pickup_borough") == "Manhattan").then(pl.col("dropoff_borough")).otherwise(pl.col("pickup_borough")).alias("Connected_Borough"),
            pl.when(pl.col(toll_col) > 0).then(pl.lit("Paid")).otherwise(pl.lit("Free")).alias("Path_Type")
        ])
        .group_by(["Connected_Borough", "Path_Type"]).agg(pl.col("displacement_speed_kmh").median().alias("Median_Speed"))
        .collect().to_pandas()
        .pivot(index="Connected_Borough", columns="Path_Type", values="Median_Speed").reset_index()
    )
    df_premium["Sorter"] = df_premium["Connected_Borough"].map({"Brooklyn":1, "Bronx":2, "Queens":3})
    df_premium = df_premium.sort_values("Sorter")

    # 1.3 Map Processing (Boundary Only)
    print("   3. Optimizing map (Boundary Drawing)...")
    try:
        gdf_zones = gpd.read_file(shapefile_path).to_crs(epsg=4326)
        gdf_boroughs = gdf_zones.dissolve(by='borough').reset_index()
        main_boros = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"]
        gdf_boroughs = gdf_boroughs[gdf_boroughs['borough'].isin(main_boros)]
        gdf_boroughs["geometry"] = gdf_boroughs["geometry"].simplify(tolerance=0.001, preserve_topology=True)
        gdf_boroughs['lat'] = gdf_boroughs.geometry.centroid.y
        gdf_boroughs['lon'] = gdf_boroughs.geometry.centroid.x
    except Exception as e:
        print(f"‚ö†Ô∏è Map Error: {e}")
        gdf_boroughs = None

    print("üìä Rendering Dashboard...")

    # ==========================================================================
    # PART 2: VISUALIZATION
    # ==========================================================================
    fig = make_subplots(
        rows=2, cols=2,
        column_widths=[0.45, 0.55], 
        row_heights=[0.5, 0.5],
        specs=[
            [{"type": "xy"}, {"type": "mapbox", "rowspan": 2}], 
            [{"type": "xy"}, None]
        ],
        subplot_titles=(
            "<b>1. Inter-Borough Friction (Bottlenecks)</b>", 
            "<b>3. NYC Borough Reference</b>",
            "<b>2. Speed Premium (Paid vs Free)</b>"
        ),
        horizontal_spacing=0.08,
        vertical_spacing=0.15
    )

    # Chart 1: Bar
    fig.add_trace(go.Bar(
        y=lf_friction["Borough_Flow"], x=lf_friction["Median_Speed"], orientation='h',
        marker_color=lf_friction["Color"], text=lf_friction["Median_Speed"].apply(lambda x: f"{x:.1f}"),
        textposition='outside', showlegend=False
    ), row=1, col=1)

    # Chart 2: Dumbbell
    for i, row in df_premium.iterrows():
        fig.add_trace(go.Scatter(x=[row["Free"], row["Paid"]], y=[row["Connected_Borough"], row["Connected_Borough"]], mode="lines", line=dict(color=uber_style.GRAY_600, width=3), showlegend=False), row=2, col=1)
    fig.add_trace(go.Scatter(x=df_premium["Free"], y=df_premium["Connected_Borough"], mode="markers+text", marker=dict(color=uber_style.UBER_ORANGE, size=10), text=df_premium["Free"].apply(lambda x: f"{x:.1f}"), textposition="bottom center", name="Free", showlegend=True), row=2, col=1)
    fig.add_trace(go.Scatter(x=df_premium["Paid"], y=df_premium["Connected_Borough"], mode="markers+text", marker=dict(color=uber_style.UBER_GREEN, size=12), text=df_premium["Paid"].apply(lambda x: f"{x:.1f}"), textposition="top center", name="Paid", showlegend=True), row=2, col=1)

    # Chart 3: Map (Optimized)
    if gdf_boroughs is not None:
        # Draw boundary 
        fig.add_trace(
            go.Choroplethmapbox(
                geojson=gdf_boroughs.geometry.__geo_interface__,
                locations=gdf_boroughs.index,
                z=gdf_boroughs.index,
                colorscale=[[0, 'rgba(0,0,0,0)'], [1, 'rgba(0,0,0,0)']], 
                marker_line_color="#444444", 
                marker_line_width=0.75,
                showscale=False,
                hoverinfo='text',
                text=gdf_boroughs['borough']
            ),
            row=1, col=2
        )
        
        # Borough name labels
        fig.add_trace(
            go.Scattermapbox(
                lat=gdf_boroughs['lat'], lon=gdf_boroughs['lon'],
                mode='text',
                text=gdf_boroughs['borough'].str.upper(),
                textfont=dict(size=14, color="black", weight="bold"),
                hoverinfo='none'
            ),
            row=1, col=2
        )

    # ==========================================================================
    # PART 4: LAYOUT & INTERACTION
    # ==========================================================================
    fig.update_layout(
        title=dict(text="<b>Strategic Mobility Dashboard</b>", y=0.98, x=0.5, xanchor='center', yanchor='top'),
        height=900,
        width=1300,
        margin=dict(l=150, r=50, t=100, b=50),
        plot_bgcolor="white",
        
        # Mapbox Style
        mapbox=dict(
            style="carto-positron",
            center=dict(lat=40.73, lon=-73.93),
            zoom=9.5,
        ),
        
        legend=dict(x=0.0, y=0.48, orientation="h", bgcolor="rgba(255,255,255,0.8)")
    )

    # Axis Config
    fig.update_xaxes(title="Avg Speed (km/h)", row=1, col=1)
    fig.update_yaxes(automargin=True, row=1, col=1)
    fig.update_xaxes(title="Median Speed (km/h)", row=2, col=1, range=[10, 45])
    fig.update_yaxes(categoryorder="array", categoryarray=["Brooklyn", "Bronx", "Queens"], row=2, col=1)

    # --- SAVE TO FILE ---
    print(f"üíæ Saving figure to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print("‚úÖ Generation & Save Complete.")

    fig.show(config={'scrollZoom': True})


# Call function
create_dashboard_smooth_map(FILE_PATHS)

üöÄ Plot not found. Initializing Dashboard Generation...
   1. Computing Friction Index...
   2. Computing Speed Premium...
   3. Optimizing map (Boundary Drawing)...
üìä Rendering Dashboard...
üíæ Saving figure to plots...
‚úÖ Generation & Save Complete.



Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.



Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.




The data confirms low average speed from Bronx, Queens and Brooklyn to Manhattan. The reason for this might be these regions rely heavily on limited bridge/tunnel access to reach Manhattan. These entry points act as funnels, causing severe congestion that drags down the city-wide average speed.  

**Conclusion:** The physical infrastructure connecting the boroughs is the primary constraint on network velocity, creating a structural ceiling on how fast the fleet can move.

Given that border crossings are the bottleneck, we analyze whether paying for premium infrastructure (Tolls) successfully bypasses this congestion. We calculate the **Return on Investment (ROI)** of tolls by comparing speed differentials between Paid and Free routes.  

1. High Efficiency Zone (Manhattan ‚Üí Bronx)
* Verdict: "High ROI / Value for Money"
* Data: Paid routes (e.g., Henry Hudson Bridge) are ~8 km/h faster (+50%) than free alternatives.
* Insight: Tolled infrastructure here effectively bypasses local street congestion.
* Action: Default to Paid Route. The app should explicitly highlight the time savings (e.g., "Pay $X to save 20 mins") to encourage conversion.

2. Saturation Zone (Manhattan ‚Üí Queens)
* Verdict: "Low Marginal Benefit"
* Data: The speed difference is negligible (~3 km/h).
* Insight: Capacity Saturation. The Queens-Midtown Tunnel (Paid) is often just as congested as the Queensboro Bridge (Free). The infrastructure has reached its physical limit; paying extra yields minimal time savings.
* Action: Prioritize Saver Mode. Algorithms should not nudge users toward paid routes unless the time saving exceeds a 5-minute threshold to maintain price competitiveness.

3. Structural Failure Zone (Manhattan ‚Üí Brooklyn)
* Verdict: "Systemic Gridlock"
* Data: Both options fail. Absolute speeds are critically low for both Free (~14 km/h) and Paid (~18 km/h) routes.
* Insight: Whether using the Battery Tunnel or Brooklyn Bridge, vehicles are trapped in city-wide congestion. Financial instruments (tolls) cannot solve this volume issue.
* Action: Manage Expectations.
    * Implement Pre-Surge logic to balance supply.
    * Increase ETA Buffer Time to prevent cancellations due to delays.
    * During peak hours, suggest Modal Shifts (Subway/Moto) near bridge entries.

### 4.4. Weather Impact

*Diagnosis of how environmental factors degrade network performance.*

**Methodology:**
We established a "Clear Baseline" (0%) using the median speed of ideal days (No Rain, No Snow, Low Wind, Clear Visibility). We then measured the percentage deviation in Speed and Wait Times for specific weather conditions.

In [22]:
import polars as pl
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
import os
import uber_style

# --- CONFIG ---
pio.templates["uber"] = uber_style.uber_style_template
pio.templates.default = "uber"

def analyze_weather_logical_order(file_paths, output_dir="plots"):
    """
    Generates or loads the Weather Impact Tornado Chart (Logically Ordered).
    """
    
    # --- SAVE/LOAD CONFIGURATION ---
    plot_name = "weather_impact_tornado_logical"
    json_filename = f"{plot_name}.json"
    html_filename = f"{plot_name}.html"
    
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    
    json_path = os.path.join(output_dir, json_filename)
    html_path = os.path.join(output_dir, html_filename)

    # --- CHECK IF PLOT EXISTS -> LOAD ---
    if os.path.exists(json_path):
        print(f"‚úÖ Plot found at '{json_path}'. Loading from file...")
        fig = pio.read_json(json_path)
        # fig.show()
        return fig

    # --- ELSE: GENERATE PLOT ---
    print("üöÄ Plot not found. Analyzing Weather Impact (Generating New Plot)...")
    
    # 1. Load Data
    q = pl.concat([pl.scan_parquet(p) for p in file_paths])
    schema = q.collect_schema().names()
    
    # 2. Baseline
    baseline_df = (
        q.filter(
            (pl.col("rain_intensity") == "none") & 
            (pl.col("snow_intensity") == "none") &
            (pl.col("wind_intensity").is_in(["calm", "breezy"])) &
            (pl.col("visibility_status") == "clear")
        )
        .select([
            pl.col("displacement_speed_kmh").median().alias("Base_Speed"),
            pl.col("total_wait_time_min").mean().alias("Base_Wait")
        ])
        .collect()
    )
    base_speed = baseline_df["Base_Speed"][0]
    base_wait = baseline_df["Base_Wait"][0]

    # 3. Helper Function
    def get_impact(col_name, type_label, valid_values):
        if col_name not in schema: return None
        
        df = (
            q.filter(pl.col(col_name).is_in(valid_values))
            .group_by(col_name)
            .agg([
                pl.col("displacement_speed_kmh").median().alias("Speed"),
                pl.col("total_wait_time_min").mean().alias("Wait")
            ])
            .collect()
            .to_pandas()
        )
        if df.empty: return None

        df["Speed_Drop_Pct"] = ((base_speed - df["Speed"]) / base_speed) * 100
        df["Wait_Increase_Pct"] = ((df["Wait"] - base_wait) / base_wait) * 100
        
        df["Category"] = type_label
        df["Level"] = df[col_name]
        df["Display_Label"] = type_label + " | " + df[col_name].str.capitalize()
        return df

    # 4. Get Data
    dfs = []
    dfs.append(get_impact("rain_intensity", "Rain", ["light", "moderate", "heavy", "violent"]))
    dfs.append(get_impact("snow_intensity", "Snow", ["trace_light", "moderate", "heavy", "severe"]))
    dfs.append(get_impact("visibility_status", "Fog", ["reduced", "poor_fog"]))
    dfs.append(get_impact("wind_intensity", "Wind", ["windy", "gale", "storm"]))
    
    df_viz = pd.concat([d for d in dfs if d is not None])

    # 5. SORT LOGIC BY ATTRIBUTE (HIERARCHY)
    custom_order = [
        "Rain | Light", "Rain | Moderate", "Rain | Heavy", "Rain | Violent",
        "Snow | Trace_light", "Snow | Moderate", "Snow | Heavy", "Snow | Severe",
        "Fog | Reduced", "Fog | Poor_fog",
        "Wind | Windy", "Wind | Gale", "Wind | Storm"
    ]
    
    df_viz["Display_Label"] = pd.Categorical(df_viz["Display_Label"], categories=custom_order, ordered=True)
    df_viz = df_viz.sort_values("Display_Label", ascending=True)

    print("üìä Rendering logical tornado chart...")

    # 6. Visualization
    fig = go.Figure()

    # LEFT: Speed Drop (Red)
    fig.add_trace(go.Bar(
        y=df_viz["Display_Label"],
        x=-df_viz["Speed_Drop_Pct"], 
        orientation='h',
        name="Speed Loss",
        marker=dict(color=uber_style.UBER_RED),
        text=df_viz["Speed_Drop_Pct"].apply(lambda x: f"-{x:.1f}%"),
        textposition='outside',
        textfont=dict(color=uber_style.UBER_RED, size=11)
    ))

    # RIGHT: Wait Time Increase (Black)
    fig.add_trace(go.Bar(
        y=df_viz["Display_Label"],
        x=df_viz["Wait_Increase_Pct"],
        orientation='h',
        name="Wait Increase",
        marker=dict(color=uber_style.UBER_BLACK),
        text=df_viz["Wait_Increase_Pct"].apply(lambda x: f"+{x:.1f}%"),
        textposition='outside',
        textfont=dict(color=uber_style.UBER_BLACK, size=11)
    ))

    # 7. Layout
    fig.update_layout(
        title=dict(text="<b>Weather Sensitivity: Intensity vs. Impact</b>"),
        barmode='overlay',
        
        xaxis=dict(
            title="Impact Magnitude (%)",
            showgrid=True, gridcolor=uber_style.GRAY_100,
            zeroline=True, zerolinecolor=uber_style.GRAY_500,
            tickmode='array',
            tickvals=[-40, -20, 0, 20, 40, 60],
            ticktext=['40%', '20%', '0%', '20%', '40%', '60%']
        ),
        
        yaxis=dict(title=None),
        legend=dict(orientation="h", y=1.0, x=0.8, xanchor="center"),
        margin=dict(l=150, r=50, t=100, b=80),
        height=750,
        plot_bgcolor="white"
    )
    
    fig = uber_style.apply_uber_branding(
        fig,
        subtitle="Ordered by weather severity to show operational degradation trends",
        source="Source: NYC TLC HVFHV (Vs. Clear Baseline)",
        footer_y=-0.1,
        logo_y=-0.13
    )

    # --- SAVE TO FILE ---
    print(f"üíæ Saving figure to {output_dir}...")
    fig.write_json(json_path)
    fig.write_html(html_path)
    print("‚úÖ Generation & Save Complete.")

    # fig.show()
    return fig

analyze_weather_logical_order(FILE_PATHS)

üöÄ Plot not found. Analyzing Weather Impact (Generating New Plot)...
üìä Rendering logical tornado chart...
üíæ Saving figure to plots...
‚úÖ Generation & Save Complete.



1. Rain: The Supply Shock
* Data: Speed remains relatively stable (-6%), but Wait Times surge (+28%).
* Insight: Rain is not primarily a traffic congestion issue. It causes a supply-demand fracture. Demand increases as pedestrians switch to cars, while supply decreases as drivers log off to avoid hassle or risk.

2. Wind: The Speed Trap
* Data: Shows the sharpest drop in Speed (-7.1%) as vehicles slow down on bridges for safety, yet Wait Times remain stable (-0.9%).
* Insight: The market remains in equilibrium. Strong winds slow the fleet down but do not cause a shortage of drivers.

3. Fog: The Double Hazard
* Data: Simultaneous drop in Speed (-6.1%) and spike in Wait Times (+18.1%).
* Insight: This represents the worst overall user experience condition, degrading both safety and availability.

4. Snow: Negligible Impact
* Insight: Impact is minimal in the dataset, likely due to recorded instances being light snow or rapid melting.


### STRATEGIC RECOMMENDATIONS

Based on the diagnosis, we propose distinct protocols for Rain and Wind events.

For Rain (Retention Strategy)
* Action: Implement Pre-Surge Incentives.
* Mechanism: Utilize weather forecasting to trigger a Rain Bonus notification to drivers 30 minutes before precipitation starts. The goal is to prevent the initial log-off wave and maintain supply density.

For Wind (Expectation Strategy)
* Action: Dynamic ETA Adjustment.
* Mechanism: Automatically add a 7-10% buffer to estimated arrival times when wind speeds exceed 20mph, particularly for routes crossing major bridges. This ensures the app sets realistic expectations for customers.