<!-- HIDDEN H1 FOR OUTLINE VIEW -->
<h1 id="atlas" style="display: none;">1. The Project Atlas: File Inventory</h1>
<!-- VISIBLE H1 -->
<h1 id="atlas-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: white; font-size: 22px; font-weight: bold; background-color: #0771A4; border-radius: 4px; padding: 12px 0px 12px 15px; margin-top: 20px;">1. The Project Atlas: File Inventory</h1>


This section serves as the central navigation hub. Before diving into analysis, identify which file suits your specific question.

| File Category | Filename Pattern | Format | Primary Use Case | Target Audience |
| :--- | :--- | :--- | :--- | :--- |
| **The Flesh** | `tlc_sample_20XX.parquet` | Parquet | **ML Modeling & Deep Dives.** Use this when you need row-level distributions, interaction effects, or to train predictive models. Contains ~1% of the full dataset (Stratified Sample). | Data Scientists |
| **The Backbone** | `agg_timeline_hourly.parquet` | Parquet | **Time Series Analysis.** Aggregated by Hour. Use for plotting volume trends, revenue growth, and seasonality over years. | Analysts / Viz |
| **The Map** | `agg_network_monthly.parquet` | Parquet | **Geospatial Analysis.** Aggregated by Route (Zone-to-Zone). Use for heatmaps, flow maps, and route efficiency studies. | GIS / Viz |
| **The Wallet** | `agg_pricing_distribution.parquet` | Parquet | **Economic Analysis.** Aggregated by Day & Weather. Use for inflation, surge pricing analysis, and driver pay volatility. | Economists / Strategy |
| **The KPI** | `agg_executive_daily.csv` | CSV | **High-Level Summary.** A lightweight daily summary. Use for "Big Number" dashboards and Excel quick checks. | Management |
| **The Evidence** | `TLC_Universal_Audit_*.csv` | CSV | **Quality Control.** Statistical proof of data integrity (Null counts, Physics paradox checks). | Data Engineers |

<!-- HIDDEN H1 FOR OUTLINE VIEW -->
<h1 id="dictionary-processed" style="display: none;">2. Processed Data Dictionary (The "Flesh")</h1>
<!-- VISIBLE H1 -->
<h1 id="dictionary-processed-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: white; font-size: 22px; font-weight: bold; background-color: #0771A4; border-radius: 4px; padding: 12px 0px 12px 15px; margin-top: 20px;">2. Processed Data Dictionary (The "Flesh")</h1>


**Total Features:** 70  
**Source:** `tlc_sample_20XX.parquet`  
**Context:** This dataset represents individual rides, cleaned and enriched with weather, geospatial, and economic data.


<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-1-temporal" style="display: none;">Group 1: Temporal Context (17 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-1-temporal-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 1: Temporal Context (17 Features)</h2>


*Features that place the ride in human and machine time.*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `pickup_datetime` | Datetime | The exact timestamp when the trip started. | 2019–2025 | **The anchor.** Use for sorting and granular time plotting. |
| `dropoff_datetime` | Datetime | The exact timestamp when the trip ended. | 2019–2025 | Used to calculate duration. |
| `pickup_date` | Date | The calendar date of the trip. | YYYY-MM-DD | Optimized for grouping by day without re-casting. |
| `pickup_year` | Int32 | The year of the trip. | 2019–2025 | High-level yearly faceting. |
| `pickup_month` | UInt32 | The month of the trip. | 1–12 | Seasonality analysis. |
| `pickup_day` | UInt32 | The day of the month. | 1–31 | Intra-month trend analysis. |
| `pickup_hour` | UInt32 | The hour of the day (24h). | 0–23 | Hourly demand analysis. |
| `pickup_dow` | UInt32 | The day of the week (ISO). | 1 (Mon) – 7 (Sun) | Weekly cycle analysis. |
| `time_of_day_bin` | String | Functional categorization of the hour. | `morning_rush`, `midday`, `evening_rush`, `evening`, `late_night` | **Storytelling.** Better than raw hours for describing "Commute" vs "Nightlife". |
| `cultural_day_type` | String | A sociological definition of the day type. | `workday`, `weekend_night`, `weekend_day`, `sunday_rest` | **Crucial.** Distinguishes "Friday Night Party" (Weekend Night) from "Monday Morning Grind" (Workday). |
| `pandemic_phase` | String | The COVID-19 era of the trip. | `pre_pandemic`, `lockdown`, `recovery`, `new_normal` | Essential for contextualizing 2020–2021 volume drops. |
| `cyclical_hour_sin` | Float64 | Sine transformation of the hour. | -1.0 to 1.0 | **ML Only.** Allows models to understand that Hour 23 is adjacent to Hour 0. |
| `cyclical_hour_cos` | Float64 | Cosine transformation of the hour. | -1.0 to 1.0 | **ML Only.** |
| `cyclical_month_sin` | Float64 | Sine transformation of the month. | -1.0 to 1.0 | **ML Only.** |
| `cyclical_month_cos` | Float64 | Cosine transformation of the month. | -1.0 to 1.0 | **ML Only.** |
| `cyclical_day_sin` | Float64 | Sine transformation of the day of week. | -1.0 to 1.0 | **ML Only.** |
| `cyclical_day_cos` | Float64 | Cosine transformation of the day of week. | -1.0 to 1.0 | **ML Only.** |

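The cyclical features can be illustrated with a short sketch. The formula below (angle = 2π · value / period, with periods of 24, 12, and 7) is the standard sin/cos encoding and is an assumption here: the dictionary does not state the exact formula or whether months and weekdays are shifted to start at 0. The helper names `cyclical_encode` and `circular_distance` are hypothetical.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle and return (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def circular_distance(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Assumed period for hours: 24 (months would use 12, weekdays 7).
hour_0 = cyclical_encode(0, 24)
hour_1 = cyclical_encode(1, 24)
hour_12 = cyclical_encode(12, 24)
hour_23 = cyclical_encode(23, 24)

# Hour 23 sits as close to hour 0 as hour 1 does,
# while hour 12 lies on the opposite side of the circle.
near = circular_distance(hour_23, hour_0)
far = circular_distance(hour_12, hour_0)
```

Both the sine and the cosine column are needed together: either one alone maps two different hours to the same value.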

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-2-geo" style="display: none;">Group 2: Geospatial & Trip Context (10 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-2-geo-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 2: Geospatial & Trip Context (10 Features)</h2>

*Features that define the "Where" and the "Who".*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `PULocationID` | Int32 | TLC Taxi Zone ID where the trip began. | 1–263 | Join with Shapefile for maps. |
| `DOLocationID` | Int32 | TLC Taxi Zone ID where the trip ended. | 1–263 | Join with Shapefile for maps. |
| `pickup_borough` | String | The NYC Borough of the pickup. | Manhattan, Brooklyn, Queens, Bronx, Staten Island, EWR | High-level geographic grouping. |
| `dropoff_borough` | String | The NYC Borough of the dropoff. | (Same as above) | High-level geographic grouping. |
| `pickup_zone` | String | The name of the neighborhood (e.g., "East Village"). | (Variable) | Human-readable labels for charts. |
| `dropoff_zone` | String | The name of the neighborhood. | (Variable) | Human-readable labels for charts. |
| `borough_flow` | String | A string describing the movement path. | e.g., "Manhattan -> Brooklyn" | Simplifies flow analysis (Sankey diagrams). |
| `borough_flow_type` | String | Classification of the transit path. | `manhattan_internal`, `manhattan_outer_commute`, `outer_inter`, `outer_intra` | **Storytelling.** Highlights the "Transit Desert" economy (Outer-to-Outer trips). |
| `trip_type_zone` | String | Granular classification of distance. | `intra_zone`, `intra_borough`, `inter_borough` | Differentiates local errands from cross-city commutes. |
| `trip_archetype` | String | A behavioral classification of the trip's purpose. | `commute`, `nightlife`, `airport`, `leisure` | **Storytelling.** Inferred based on Time + Location + Day. |

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-3-physics" style="display: none;">Group 3: Physics & Service Metrics (11 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-3-physics-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 3: Physics & Service Metrics (11 Features)</h2>


*Features measuring the physical reality of the movement.*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `trip_km` | Float32 | The actual distance driven (Odometer). | 0.15 – 120.0 | The basis for billing and cost analysis. |
| `duration_seconds` | Float64 | Total trip time in seconds. | 60 – 15,000 | The raw measure of time spent. |
| `duration_min` | Float64 | Total trip time in minutes. | 1.0 – 250.0 | Human-readable duration. |
| `straight_line_dist_km`| Float64 | The "As the crow flies" distance between centroids. | > 0 | Used to calculate efficiency. |
| `bearing_degrees` | Float64 | The compass direction of travel (0=North). | 0.0 – 360.0 | Analyzes flow direction (e.g., "Everyone heads North in the evening"). |
| `speed_kmh` | Float64 | Average speed based on *driven* distance (`trip_km` / `time`). | 1.0 – 100.0 | Measures how fast the wheels turned. High on highways. |
| `displacement_speed_kmh`| Float64 | Effective speed based on *straight line* distance. | > 0 | **The Gridlock Detector.** Measures how fast you *actually* got closer to your destination. Low values = Stuck in traffic. |
| `tortuosity_index` | Float64 | Ratio of Driven Dist / Straight Line Dist. | >= 1.0 | **Efficiency Metric.** 1.0 = Straight line. > 1.5 = Detours or complex street grids. |
| `total_wait_time_min` | Float64 | Time between App Request and Pickup. | > 0 (or Null) | Measures system latency and passenger wait pain. **Nulls:** Negative values (Time Travel paradox) forced to `Null`. |
| `driver_response_time_min`| Float64 | Time between App Request and Driver Arrival. | > 0 (or Null) | Measures driver supply availability. **Nulls:** *same as above.* |
| `boarding_time_min` | Float64 | Time between Driver Arrival and Trip Start. | > 0 (or Null) | Measures "Curb Friction" (Passenger lateness). **Nulls:** *same as above.* |

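The relationships between the movement metrics in the table above can be sketched as follows. This is a minimal illustration built from the definitions in the table, not the pipeline's actual code; the function names `movement_metrics` and `null_if_negative` are hypothetical, and treating only strictly negative intervals as null is an assumption based on the "Nulls" note.

```python
def movement_metrics(trip_km, duration_seconds, straight_line_dist_km):
    """Derive speed and efficiency metrics from distance and duration."""
    if duration_seconds <= 0 or straight_line_dist_km <= 0:
        raise ValueError("duration and straight-line distance must be positive")
    hours = duration_seconds / 3600.0
    return {
        "duration_min": duration_seconds / 60.0,
        # Speed over the distance actually driven.
        "speed_kmh": trip_km / hours,
        # Speed over the centroid-to-centroid distance.
        "displacement_speed_kmh": straight_line_dist_km / hours,
        # Driven distance relative to the straight line.
        "tortuosity_index": trip_km / straight_line_dist_km,
    }

def null_if_negative(minutes):
    """Return None for negative intervals (request logged after pickup)."""
    return None if minutes < 0 else minutes

# Illustrative values only: 6 km driven in 30 minutes, 4 km apart in a straight line.
example = movement_metrics(trip_km=6.0, duration_seconds=1800.0, straight_line_dist_km=4.0)
```

In this example the car averaged 12 km/h on the road but closed the straight-line gap at only 8 km/h, which gives a tortuosity of 1.5.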

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-4-finance" style="display: none;">Group 4: Financials & Economics (17 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-4-finance-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 4: Financials & Economics (17 Features)</h2>

*The "Wallet War" features tracking the flow of money.*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `base_passenger_fare` | Float32 | The core price of the ride (before taxes/tips). | 0.10 – 300.0 | The baseline demand signal. |
| `tolls` | Float32 | Bridge and tunnel fees. | 0.0 – 50.0 | Pass-through costs. |
| `bcf` | Float32 | Black Car Fund tax (~2.5-3%). | 0.0 – 15.0 | Mandatory NY State tax. |
| `sales_tax` | Float32 | NY Sales Tax (~8.875%). | 0.0 – 40.0 | Mandatory tax. |
| `congestion_surcharge` | Float32 | Fee for entering Manhattan (< 96th St). | 0.0 – 2.75 | Policy impact metric. |
| `airport_fee` | Float32 | Fee for pickup/dropoff at JFK/LGA/EWR. | 0.0 – 6.0 | Tourism tax. |
| `cbd_congestion_fee` | Float32 | The new congestion zone fee (Jan 2025+). | 0.0 or 2.50 | **New.** Tracks the impact of the 2025 policy change. |
| `tips` | Float32 | Voluntary gratuity from passenger. | 0.0 – 300.0 | Measures customer satisfaction/generosity. |
| `driver_pay` | Float32 | The net earnings of the driver. | 0.01 – 200.0 | **Key Target.** Used to calculate driver wages. |
| `total_rider_cost` | Float32 | Sum of the 8 rider-facing columns above (everything except `driver_pay`). | > 0 | **The True Cost.** What the rider actually saw on their credit card bill. |
| `cost_per_km` | Float32 | Total Cost / Trip KM. | > 0 | **Luxury Index.** High values indicate premium zones (Short Manhattan trips). |
| `driver_revenue_share` | Float32 | Driver Pay / Base Fare. | 0.0 – 1.0+ | **Fairness Metric.** What % of the core fare goes to the driver? |
| `uber_take_rate_proxy` | Float32 | 1 - Driver Revenue Share. | < 1.0 | **Platform Tax.** Estimated % kept by Uber. |
| `pay_per_hour` | Float32 | Driver Pay / Duration (Hours). | > 0 | **Livability Metric.** Can be compared to Minimum Wage. |
| `tipping_pct` | Float32 | Tips / Base Fare. | 0.0 – 4.0 | Normalized generosity metric. |
| `is_generous_tip` | UInt8 | Flag if Tip > 25% of Fare. | 0 / 1 | Identifies "Whale" tippers. |
| `is_subsidized` | UInt8 | Flag if Driver Pay > Base Fare. | 0 / 1 | Identifies trips where Uber likely lost money to incentivize drivers. |

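The derived financial columns follow directly from the definitions in the table above. The sketch below recomputes them for a single trip held in a plain dictionary; `derive_financials` is a hypothetical helper, the sample amounts are illustrative and not taken from the data, and the strict `>` comparisons for the two flags follow the wording of the table.

```python
# The 8 rider-facing components of total_rider_cost (driver_pay is excluded).
RIDER_COST_COMPONENTS = (
    "base_passenger_fare", "tolls", "bcf", "sales_tax",
    "congestion_surcharge", "airport_fee", "cbd_congestion_fee", "tips",
)

def derive_financials(trip, duration_min, trip_km):
    """Recompute the derived financial columns for one trip."""
    base = trip["base_passenger_fare"]
    total = sum(trip[name] for name in RIDER_COST_COMPONENTS)
    share = trip["driver_pay"] / base
    tip_pct = trip["tips"] / base
    return {
        "total_rider_cost": total,
        "cost_per_km": total / trip_km,
        "driver_revenue_share": share,
        "uber_take_rate_proxy": 1.0 - share,
        "pay_per_hour": trip["driver_pay"] / (duration_min / 60.0),
        "tipping_pct": tip_pct,
        "is_generous_tip": int(tip_pct > 0.25),
        "is_subsidized": int(trip["driver_pay"] > base),
    }

# Illustrative trip: 6 km, 30 minutes.
sample_trip = {
    "base_passenger_fare": 20.0, "tolls": 0.0, "bcf": 0.5, "sales_tax": 1.75,
    "congestion_surcharge": 2.75, "airport_fee": 0.0, "cbd_congestion_fee": 0.0,
    "tips": 5.0, "driver_pay": 15.0,
}
derived = derive_financials(sample_trip, duration_min=30.0, trip_km=6.0)
```

A tip of exactly 25% does not set `is_generous_tip` under this reading, because the table defines the flag with a strict "greater than".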
<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-5-flags" style="display: none;">Group 5: Service Flags (5 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-5-flags-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 5: Service Flags (5 Features)</h2>


*Binary indicators (0/1) regarding specific ride attributes.*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `shared_request_flag` | UInt8 | Did passenger request a pool? | 0 / 1 | Measures willingness to share. |
| `shared_match_flag` | UInt8 | Did the pool actually match? | 0 / 1 | Measures system liquidity/density. |
| `access_a_ride_flag` | UInt8 | Administered by MTA? | 0 / 1 | Paratransit integration. |
| `wav_request_flag` | UInt8 | Requested Wheelchair Accessible Vehicle? | 0 / 1 | Equity/Accessibility metric. |
| `wav_match_flag` | UInt8 | Was the vehicle WAV? | 0 / 1 | Supply side of accessibility. |

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="group-6-weather" style="display: none;">Group 6: Meteorological Context (10 Features)</h2>
<!-- VISIBLE H2 -->
<h2 id="group-6-weather-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Group 6: Meteorological Context (10 Features)</h2>


*External weather conditions for New York (Visual Crossing, hourly).*

| Feature Name | Type | Definition | Range / Values | Rationale & Usage |
| :--- | :--- | :--- | :--- | :--- |
| `temp` | Float64 | Air temperature in Celsius. | °C (range not documented) | Raw thermal comfort. |
| `conditions` | String | Raw summary from API. | e.g., "Rain, Overcast" | Descriptive text. |
| `rain_intensity` | String | Categorical rain volume. | `none`, `light`, `moderate`, `heavy` | Granular impact of rain on traffic. |
| `snow_intensity` | String | Categorical snow volume. | `none`, `trace_light`, `moderate`, `heavy`, `severe` | **Chaos Metric.** Snow stops the city. |
| `wind_intensity` | String | Categorical wind speed. | `calm`, `breezy`, `windy`, `gale` | High wind increases "Walk Aversion". |
| `visibility_status` | String | Categorical visibility distance. | `clear`, `reduced`, `poor_fog` | Safety metric impacting speed. |
| `weather_state` | String | Hierarchical summary of the hour. | `snowing`, `snow_on_ground`, `raining`, `clear_cloudy` | **Best for Viz.** Prioritizes the most disruptive weather (Snow > Rain). |
| `is_bad_weather` | UInt8 | Flag for generally poor conditions. | 0 / 1 | Simple filter for "Miserable Days". |
| `is_extreme_weather` | UInt8 | Flag for severe/dangerous conditions. | 0 / 1 | Identifies outlier days (blizzards, hurricanes). |
| `temp_bin` | String | Categorical temperature bucket. | `freezing`, `cold`, `mild`, `warm`, `hot` | Simplifies thermal analysis. |


**Bin thresholds for the categorical weather features:**

* **Temperature (`temp_bin`, °C):** `freezing` (< 0), `cold` (0–10), `mild` (10–20), `warm` (20–28), `hot` (> 28).
* **Rain (`rain_intensity`, mm):** `light` (< 1), `moderate` (1–5), `heavy` (> 5).
* **Snow (`snow_intensity`, cm):** `trace_light` (< 2.5), `moderate` (2.5–10), `heavy` (10–20), `severe` (> 20).
* **Wind (`wind_intensity`, km/h):** `breezy` (15–40), `windy` (40–62), `gale` (> 62).
* **Visibility (`visibility_status`, km):** `reduced` (1–10), `poor_fog` (< 1).

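The thresholds listed above can be applied with a small lookup, sketched here for temperature and wind. Two points are assumptions rather than documented behavior: a value that falls exactly on an edge is placed in the higher bin, and wind below 15 km/h is labeled `calm`. The helper `to_bin` is hypothetical.

```python
from bisect import bisect_right

# Bin edges taken from the threshold list; labels match the stored values.
TEMP_EDGES_C = [0.0, 10.0, 20.0, 28.0]
TEMP_LABELS = ["freezing", "cold", "mild", "warm", "hot"]

WIND_EDGES_KMH = [15.0, 40.0, 62.0]
WIND_LABELS = ["calm", "breezy", "windy", "gale"]

def to_bin(value, edges, labels):
    """Return the label of the bin that contains value.

    bisect_right counts the edges that are <= value, so a value equal
    to an edge lands in the higher bin (an assumed convention).
    """
    return labels[bisect_right(edges, value)]

temp_examples = [to_bin(t, TEMP_EDGES_C, TEMP_LABELS) for t in (-3.0, 0.0, 15.0, 30.0)]
wind_examples = [to_bin(w, WIND_EDGES_KMH, WIND_LABELS) for w in (5.0, 50.0, 70.0)]
```

The same function covers rain, snow, and visibility once their edges and labels are supplied in ascending order.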

<!-- HIDDEN H1 FOR OUTLINE VIEW -->
<h1 id="dictionary-aggregates" style="display: none;">3. Aggregate Data Dictionaries (The "Backbones")</h1>
<!-- VISIBLE H1 -->
<h1 id="dictionary-aggregates-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: white; font-size: 22px; font-weight: bold; background-color: #0771A4; border-radius: 4px; padding: 12px 0px 12px 15px; margin-top: 20px;">3. Aggregate Data Dictionaries (The "Backbones")</h1>

This section details the four pre-calculated Data Marts designed for high-speed analysis without loading the full dataset.

### **NOTE: Do not confuse fare with cost!**

**1. The "Base Price" Metrics (Pure Economics)**
*   **Source Column:** `base_passenger_fare`
*   **What it includes:** Just the ride cost. No taxes, no tips, no tolls.
*   **Aggregate Metrics:**
    *   `total_fare_amt` (Mart 1)
    *   `total_fare_revenue` (Mart 4)
    *   `median_fare` (Mart 3)
    *   `p90_fare_surge_proxy` (Mart 3)
*   **Why use this?** This is the clean signal of **Supply vs. Demand**. If this number goes up, it means "Surge Pricing" or "Longer Distance." It is not affected by whether the user tipped or not.

**2. The "Total Cost" Metrics (The Receipt)**
*   **Source Column:** `total_rider_cost` (The Sum of 8 components)
*   **What it includes:** Base + Tolls + Tips + Surcharge + Airport + Tax + BCF + CBD.
*   **Aggregate Metrics:**
    *   `total_revenue_gross` (Mart 1)
    *   `total_gross_booking_value` (Mart 4)
    *   `avg_cost` (Mart 2)
*   **Why use this?** This is the **Consumer Inflation** signal. It answers "How much does it actually cost to leave the house?"

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="mart-1-timeline" style="display: none;">Mart 1: The Timeline Backbone</h2>
<!-- VISIBLE H2 -->
<h2 id="mart-1-timeline-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Mart 1: The Timeline Backbone</h2>


**File:** `agg_timeline_hourly.parquet`  
**Grain:** Hourly by trip segment (`borough_flow_type`, `trip_archetype`, `cultural_day_type`).  
**Use Case:** Trend analysis, Seasonality, Volume forecasting.



| Feature Name | Type | Definition | Rationale |
| :--- | :--- | :--- | :--- |
| `pickup_year` | Int32 | Year of trip. | Time hierarchy. |
| `pickup_month` | UInt32 | Month of trip (1-12). | Seasonality. |
| `pickup_day` | UInt32 | Day of month (1-31). | Daily volume tracking. |
| `pickup_hour` | UInt32 | Hour of day (0-23). | Intraday patterns. |
| `borough_flow_type` | String | e.g., `manhattan_internal`. | Analyzing Commuter vs. Intra-borough trends. |
| `trip_archetype` | String | e.g., `commute`, `nightlife`. | Analyzing behavioral segments. |
| `cultural_day_type` | String | e.g., `weekend_night`. | Distinguishing nightlife demand. |
| `trip_count` | UInt32 | Total number of trips in this hour/group. | **Primary Volume Metric.** |
| `total_fare_amt` | Float64 | Sum of Base Fares. | Core Revenue (Platform + Driver). |
| `total_driver_pay` | Float64 | Sum of Driver Pay. | Total Driver Earnings pool. |
| `total_cbd_fee` | Float64 | Sum of CBD Congestion Fees. | Impact of 2025 policy. |
| `total_revenue_gross` | Float64 | Sum of Total Rider Cost. | **Gross Booking Value (GBV).** |
| `total_tips` | Float64 | Sum of Tips. | Total "Generosity Economy". |
| `avg_trip_km` | Float64 | Mean distance traveled. | Detecting "Short Trip" trends. |
| `avg_speed_kmh` | Float64 | Mean travel speed. | Detecting systemic congestion. |
| `bad_weather_count` | UInt32 | Count of trips during `is_bad_weather=1`. | Weather impact volume. |
| `extreme_weather_count` | UInt32 | Count of trips during `is_extreme_weather=1`. | Chaos impact volume. |

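When rolling this mart up to a coarser grain, the `total_*` columns can be summed, but the `avg_*` columns need to be weighted by `trip_count`. The sketch below shows one way to do that on rows held as plain dictionaries; `roll_up_daily` is a hypothetical helper and the two sample rows are illustrative values, not real data.

```python
from collections import defaultdict

def roll_up_daily(rows):
    """Aggregate hourly rows to one record per calendar day."""
    sums = defaultdict(lambda: {"trip_count": 0, "total_fare_amt": 0.0, "km_sum": 0.0})
    for row in rows:
        key = (row["pickup_year"], row["pickup_month"], row["pickup_day"])
        day = sums[key]
        day["trip_count"] += row["trip_count"]
        day["total_fare_amt"] += row["total_fare_amt"]
        # Convert the hourly mean back to a sum so it can be re-averaged.
        day["km_sum"] += row["avg_trip_km"] * row["trip_count"]
    result = {}
    for key, day in sums.items():
        trips = day["trip_count"]
        result[key] = {
            "trip_count": trips,
            "total_fare_amt": day["total_fare_amt"],
            "avg_trip_km": day["km_sum"] / trips if trips else None,
        }
    return result

hourly_rows = [
    {"pickup_year": 2024, "pickup_month": 1, "pickup_day": 15, "pickup_hour": 8,
     "trip_count": 100, "total_fare_amt": 2000.0, "avg_trip_km": 5.0},
    {"pickup_year": 2024, "pickup_month": 1, "pickup_day": 15, "pickup_hour": 9,
     "trip_count": 300, "total_fare_amt": 4500.0, "avg_trip_km": 3.0},
]
daily = roll_up_daily(hourly_rows)
```

Here the trip-weighted mean distance is 3.5 km, whereas a plain average of the two hourly means would give 4.0 km and overstate the quieter hour.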

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="mart-2-network" style="display: none;">Mart 2: The Network Backbone</h2>
<!-- VISIBLE H2 -->
<h2 id="mart-2-network-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Mart 2: The Network Backbone</h2>


**File:** `agg_network_monthly.parquet`  
**Grain:** Monthly by Origin-Destination (Zone).  
**Use Case:** GIS Mapping, Route Efficiency, Transit Gap Analysis.


| Feature Name | Type | Definition | Rationale |
| :--- | :--- | :--- | :--- |
| `pickup_year` | Int32 | Year. | Time hierarchy. |
| `pickup_month` | UInt32 | Month. | Time hierarchy. |
| `PULocationID` | Int32 | Origin Zone ID. | **Map Join Key.** |
| `DOLocationID` | Int32 | Destination Zone ID. | **Map Join Key.** |
| `pickup_borough` | String | Origin Borough. | High-level filtering. |
| `dropoff_borough` | String | Destination Borough. | High-level filtering. |
| `trip_count` | UInt32 | Volume on this specific route. | Route Popularity (Edge Weight). |
| `avg_duration_min` | Float64 | Mean trip time. | Route Duration. |
| `avg_cost` | Float64 | Mean Total Rider Cost. | "How expensive is this route?" |
| `avg_displacement_speed`| Float64 | Mean straight-line speed. | **Gridlock Metric.** Low values = inefficient route. |
| `avg_wait_time` | Float64 | Mean time (Request $\to$ Pickup). | **Service Quality.** Identifies underserved zones. |
| `avg_driver_response` | Float64 | Mean time (Request $\to$ Arrival). | Driver Supply proximity. |

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="mart-3-economic" style="display: none;">Mart 3: The Economic Backbone</h2>
<!-- VISIBLE H2 -->
<h2 id="mart-3-economic-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Mart 3: The Economic Backbone</h2>

**File:** `agg_pricing_distribution.parquet`  
**Grain:** Daily by Weather & Time Bin.  
**Use Case:** Price sensitivity, Driver Pay Analysis, Inflation tracking.



| Feature Name | Type | Definition | Rationale |
| :--- | :--- | :--- | :--- |
| `pickup_date` | Date | Calendar Date. | Daily grouping. |
| `time_of_day_bin` | String | e.g., `morning_rush`. | Intraday economic cycles. |
| `weather_state` | String | e.g., `snowing`. | Weather impact on price. |
| `borough_flow_type` | String | e.g., `outer_inter`. | Geographic economic variance. |
| `trip_count` | UInt32 | Volume. | Sample size context. |
| `avg_driver_share` | Float64 | Mean (Driver Pay / Base Fare). | **Fairness.** Does share drop during surges? |
| `std_driver_share` | Float64 | Standard Deviation of Share. | **Volatility.** Is driver pay consistent or gambling? |
| `avg_take_rate` | Float64 | Mean (1 - Driver Share). | Platform Margin proxy. |
| `avg_tip_pct` | Float64 | Mean (Tips / Base Fare). | **Generosity.** Does rain increase tips? |
| `avg_hourly_wage` | Float64 | Mean (Driver Pay / Duration). | **Livability.** Earnings per hour worked. |
| `median_fare` | Float64 | 50th Percentile Base Fare. | "Typical Price." |
| `p90_fare_surge_proxy` | Float64 | 90th Percentile Base Fare. | **Surge Detector.** High P90 indicates price spikes. |
| `dominant_rain` | String | Most common rain intensity (Mode). | Context for the day. |

<!-- HIDDEN H2 FOR OUTLINE VIEW -->
<h2 id="mart-4-executive" style="display: none;">Mart 4: The Executive Summary</h2>
<!-- VISIBLE H2 -->
<h2 id="mart-4-executive-visible" style="font-family: 'Roboto Condensed', 'Arial Narrow', sans-serif; color: #38545f; font-size: 18px; font-weight: 500; background-color: #f9fbfb; border-top: 4px solid #0c75ab; border-radius: 2px; border-bottom: 1px solid #D9D9D9; padding: 10px 0px 10px 15px; margin-top: 15px;">Mart 4: The Executive Summary</h2>

**File:** `agg_executive_daily.csv`  
**Grain:** Global Daily.  
**Use Case:** KPI Cards, Excel Dashboards, High-level reporting.



| Feature Name | Type | Definition | Rationale |
| :--- | :--- | :--- | :--- |
| `pickup_date` | Date | Calendar Date. | Time Index. |
| `total_trips` | Int64 | Total daily volume. | **Headline Volume.** |
| `total_fare_revenue` | Float64 | Total Base Fare. | Core Revenue. |
| `total_gross_booking_value`| Float64| Total Rider Cost (All in). | **Headline GBV.** |
| `total_tips` | Float64 | Total Tips. | Total Tip Economy. |
| `total_km_traveled` | Float64 | Sum of Trip KMs. | Fleet Mileage. |
| `bad_weather_trip_count` | Int64 | Trips during bad weather. | "How many miserable rides?" |
| `extreme_weather_trip_count`| Int64| Trips during extreme weather. | "How many chaotic rides?" |
| `avg_wait_time` | Float64 | Global mean wait time. | **Headline Service Quality.** |