# **New York City Yellow Taxi Data**

## Objective
In this case study you will be learning exploratory data analysis (EDA) with the help of a dataset on yellow taxi rides in New York City. This will enable you to understand why EDA is an important step in the process of data science and machine learning.

## **Problem Statement**
As an analyst at an upcoming taxi operation in NYC, you are tasked with using the 2023 taxi trip data to uncover insights that could help optimise taxi operations. The goal is to analyse patterns in the data that can inform strategic decisions to improve service efficiency, maximise revenue, and enhance passenger experience.

## Tasks
You need to perform the following steps for successfully completing this assignment:
1. Data Loading
2. Data Cleaning
3. Exploratory Analysis: Bivariate and Multivariate
4. Creating Visualisations to Support the Analysis
5. Deriving Insights and Stating Conclusions

---

**NOTE:** The marks given along with headings and sub-headings are cumulative marks for those particular headings/sub-headings.<br>

The actual marks for each task are specified within the tasks themselves.

For example, marks given with heading *2* or sub-heading *2.1* are the cumulative marks, for your reference only. <br>

The marks you will receive for completing tasks are given with the tasks.

Suppose the marks for two tasks are: 3 marks for 2.1.1 and 2 marks for 3.2.2, or
* 2.1.1 [3 marks]
* 3.2.2 [2 marks]

then, you will earn 3 marks for completing task 2.1.1 and 2 marks for completing task 3.2.2.


---

## Data Understanding
The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The data is stored in Parquet format (*.parquet*). The dataset is from 2009 to 2024. However, for this assignment, we will only be using the data from 2023.

The data for each month is stored in a separate Parquet file, so you will get twelve files, one for each month of 2023.

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers like vendors and taxi hailing apps. <br>

You can find the link to the TLC trip records page here: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

###  Data Description
You can find the data description here: [Data Dictionary](https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf)

**Trip Records**



|Field Name       |Description |
|:----------------|:-----------|
| VendorID | A code indicating the TPEP provider that provided the record. <br> 1= Creative Mobile Technologies, LLC; <br> 2= VeriFone Inc. |
| tpep_pickup_datetime | The date and time when the meter was engaged.  |
| tpep_dropoff_datetime | The date and time when the meter was disengaged.   |
| Passenger_count | The number of passengers in the vehicle. <br> This is a driver-entered value. |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
| PULocationID | TLC Taxi Zone in which the taximeter was engaged |
| DOLocationID | TLC Taxi Zone in which the taximeter was disengaged |
|RateCodeID |The final rate code in effect at the end of the trip.<br> 1 = Standard rate <br> 2 = JFK <br> 3 = Newark <br>4 = Nassau or Westchester <br>5 = Negotiated fare <br>6 = Group ride |
|Store_and_fwd_flag |This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server.  <br>Y= store and forward trip <br>N= not a store and forward trip |
|Payment_type| A numeric code signifying how the passenger paid for the trip. <br> 1 = Credit card <br>2 = Cash <br>3 = No charge <br>4 = Dispute <br>5 = Unknown <br>6 = Voided trip |
|Fare_amount| The time-and-distance fare calculated by the meter. |
|Extra | Miscellaneous extras and surcharges. Currently, this only includes the 0.50 and 1 USD rush hour and overnight charges. |
|MTA_tax |0.50 USD MTA tax that is automatically triggered based on the metered rate in use. |
|Improvement_surcharge | 0.30 USD improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |
|Tip_amount |Tip amount – This field is automatically populated for credit card tips. Cash tips are not included. |
| Tolls_amount | Total amount of all tolls paid in trip.  |
| total_amount | The total amount charged to passengers. Does not include cash tips. |
|Congestion_Surcharge |Total amount collected in trip for NYS congestion surcharge. |
| Airport_fee | 1.25 USD for pick up only at LaGuardia and John F. Kennedy Airports|

Although the amounts of extra charges and taxes applied are specified in the data dictionary, you will see that some cases have different values of these charges in the actual data.
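One way to see this is to tabulate the distinct values of a charge column and their share of rows. The sketch below is illustrative only: the `sample` rows are made up, and `value_shares` is a hypothetical helper, not part of the assignment.

```python
import pandas as pd

# Hypothetical sample; the real data has columns with these names.
sample = pd.DataFrame({
    "mta_tax": [0.5, 0.5, 0.0, -0.5, 0.5],
    "improvement_surcharge": [1.0, 1.0, 0.3, 1.0, 1.0],
})

def value_shares(df, column):
    """Return each distinct value of `column` with its share of rows."""
    return df[column].value_counts(normalize=True, dropna=False).sort_index()

mta_shares = value_shares(sample, "mta_tax")
print(mta_shares)
```

Values that differ from the documented amount (including negative ones) are worth inspecting before deciding whether to keep, correct, or drop those rows.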

**Taxi Zones**

Each of the trip records contains a field corresponding to the location of the pickup or drop-off of the trip, populated by numbers ranging from 1-263.

These numbers correspond to taxi zones, which may be downloaded as a table or map/shapefile and matched to the trip records using a join.

This is covered in more detail in later sections.
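As a rough illustration of such a join, the sketch below uses two tiny made-up tables. The `LocationID` and `zone` column names follow the zone table used later in this notebook; the sample rows and the `attach_zone` helper are assumptions for demonstration.

```python
import pandas as pd

# Hypothetical miniature trip table; 999 is deliberately not a valid zone ID.
trips = pd.DataFrame({
    "PULocationID": [132, 161, 999],
    "fare_amount": [52.0, 11.4, 8.0],
})

# Hypothetical excerpt of the taxi zone lookup table.
zones = pd.DataFrame({
    "LocationID": [132, 161],
    "zone": ["JFK Airport", "Midtown Center"],
})

def attach_zone(trips_df, zones_df, id_col, out_col):
    """Left-join zone names onto trips so unmatched IDs stay visible as NaN."""
    lookup = zones_df.rename(columns={"LocationID": id_col, "zone": out_col})
    return trips_df.merge(lookup[[id_col, out_col]], on=id_col, how="left")

trips_with_zone = attach_zone(trips, zones, "PULocationID", "pickup_zone")
print(trips_with_zone)
```

A left join keeps every trip, so rows whose location ID has no match show up with a missing zone name instead of silently disappearing.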

---

# **1 Data Preparation**

## **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


from google.colab import files
files.upload()

## **1.1 Load the Dataset**

In [None]:
import pandas as pd
df = pd.read_parquet("2023-1.parquet")
df.info()

In [None]:
# Sample 5% of the DataFrame
sampled_df = df.sample(frac=0.05, random_state=42) # Using a random_state for reproducibility

# Display information about the sampled DataFrame
# (info() prints its summary directly and returns None, so no display() is needed)
sampled_df.info()

**1. Upload more files**

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
import pandas as pd

for filename in uploaded.keys():
  print(f"\n Reading file: {filename}")
  try:
    df = pd.read_parquet(filename)
    sample_df = df.sample(frac=0.05, random_state=42)
    sampled_filename = f"sampled_{filename.replace('.parquet', '.csv')}"
    sample_df.to_csv(sampled_filename, index=False)
    print(f"saved 5% sample to: {sampled_filename}")
  except Exception as e:
    print(f"Error reading {filename}: {e}")

In [None]:
from google.colab import files

# Download every sampled CSV created in the previous cell
for filename in uploaded.keys():
    sampled_filename = f"sampled_{filename.replace('.parquet', '.csv')}"
    files.download(sampled_filename)



**Upload CSV files and Combine**

In [None]:
from google.colab import files
uploaded = files.upload()


In [None]:
import pandas as pd

# Read every uploaded sample and collect the frames in a list
sampled_frames = []

for filename in uploaded.keys():
  print(f"Reading file: {filename}")
  sampled_frames.append(pd.read_csv(filename))

# Stack the monthly samples into a single DataFrame
Combined_dataframe = pd.concat(sampled_frames, ignore_index=True)
print("Combined all sampled files successfully!")
print(f"Combined Dataframe shape: {Combined_dataframe.shape}")

In [None]:
Combined_dataframe.to_csv("Combined_dataframe.csv", index=False)
files.download("Combined_dataframe.csv")

**Read both Taxi_Records Combined_CSV and Taxi_zone Shapefile**

In [None]:
import pandas as pd
import zipfile
import os

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
import pandas as pd
df = pd.read_csv("Combined_dataframe.csv")

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
import geopandas as gpd
import zipfile
import os

with zipfile.ZipFile("taxi_zones 2.zip", "r") as zip_ref:
  zip_ref.extractall("taxi_zones")

# List files in the nested directory
print(os.listdir("taxi_zones/taxi_zones 2"))

shp_files = [f for f in os.listdir("taxi_zones/taxi_zones 2") if f.endswith(".shp")]
zones_gdf = gpd.read_file("taxi_zones/taxi_zones 2/" + shp_files[0])

# **2 Data Cleaning**

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
print(df.columns.tolist())

In [None]:
critical_cols = [
    'tpep_pickup_datetime', 'tpep_dropoff_datetime', 'pulocationid', 'dolocationid',
    'store_and_fwd_flag', 'payment_type', 'fare_amount', 'total_amount', 'passenger_count',
]
# Drop rows that are missing any field the analysis depends on
df = df.dropna(subset=critical_cols)
print("Updated DataFrame shape:", df.shape)


In [None]:
print(df[critical_cols].isnull().sum())

In [None]:
print(df.dtypes)

In [None]:
print(df[critical_cols])
display(df[critical_cols].head())

In [None]:
df['payment_type'].describe()

**Removing bad records**


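Dropping records with non-positive fares, totals, distances, or passenger counts, along with exact duplicate rows, keeps ‘ghost’ trips and system glitches from distorting the analysis and helps product managers spot genuine trends and issues.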

In [None]:
df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
df = df[(df['fare_amount']>0) & (df['total_amount']>0) & (df['trip_distance'] > 0) & (df['passenger_count'] >0)]
df = df.drop_duplicates()
print(df.head())

In [None]:
print("Number of duplicates rows left:", df.duplicated().sum())
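To see how much each rule contributes, it can help to count failures per rule before filtering. The following is a minimal sketch on made-up rows; the `rules` dictionary and the sample values are assumptions, while the conditions mirror the filter used above.

```python
import pandas as pd

# Hypothetical trips; rows 1-3 break at least one rule, row 4 duplicates row 0.
trips = pd.DataFrame({
    "fare_amount":     [10.0, -3.0, 8.0, 12.0, 10.0],
    "total_amount":    [13.0, -3.0, 0.0, 15.0, 13.0],
    "trip_distance":   [2.1, 1.0, 1.5, 0.0, 2.1],
    "passenger_count": [1, 1, 2, 1, 1],
})

# One boolean mask per validity rule.
rules = {
    "fare_amount > 0": trips["fare_amount"] > 0,
    "total_amount > 0": trips["total_amount"] > 0,
    "trip_distance > 0": trips["trip_distance"] > 0,
    "passenger_count > 0": trips["passenger_count"] > 0,
}

# Rows failing each rule (a row can fail more than one rule).
failures = {name: int((~mask).sum()) for name, mask in rules.items()}

# Keep rows that pass every rule, then drop exact duplicates.
keep = pd.concat(rules, axis=1).all(axis=1)
cleaned = trips[keep].drop_duplicates()

print(failures)
print("Rows kept:", len(cleaned))
```

A per-rule count makes it easier to notice when a single condition removes an unexpectedly large share of the data.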

In [None]:
df['trip_duration_min'] = (df['tpep_dropoff_datetime'] - df['tpep_pickup_datetime']).dt.total_seconds() / 60

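# Keep trips with a positive duration under 3 hours; copy so later column assignments do not trigger warnings
df = df[(df['trip_duration_min'] > 0) & (df['trip_duration_min'] < 180)].copy()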

display(df[['tpep_pickup_datetime', 'tpep_dropoff_datetime', 'trip_duration_min']].head())

In [None]:
# For pickup datetime
df['pickup_date'] = df['tpep_pickup_datetime'].dt.date         # Just the date (YYYY-MM-DD)
df['pickup_time'] = df['tpep_pickup_datetime'].dt.time         # Just the time (HH:MM:SS)


# For dropoff datetime
df['dropoff_date'] = df['tpep_dropoff_datetime'].dt.date
df['dropoff_time'] = df['tpep_dropoff_datetime'].dt.time
print(df[['tpep_pickup_datetime', 'tpep_dropoff_datetime', 'pickup_date', 'pickup_time', 'dropoff_date', 'dropoff_time']].head())

In [None]:
df = df.drop(['ratecodeid' , 'airport_fee'], axis=1)
display(df.head())

**Outlier filtering for fare and distance**

Removing outliers ensures product and business metrics (like ‘average trip time’ on the homepage or ‘surge pricing triggers’) reflect real customer journeys, not system glitches or edge cases.

In [None]:
# Calculate IQR for 'fare_amount' and 'trip_distance'
Q1_fare = df['fare_amount'].quantile(0.25)
Q3_fare = df['fare_amount'].quantile(0.75)
IQR_fare = Q3_fare - Q1_fare

Q1_distance = df['trip_distance'].quantile(0.25)
Q3_distance = df['trip_distance'].quantile(0.75)
IQR_distance = Q3_distance - Q1_distance

# Define upper outlier bounds (non-positive values were already removed above)
upper_bound_fare = Q3_fare + 1.5 * IQR_fare
upper_bound_distance = Q3_distance + 1.5 * IQR_distance

# Filter outliers; copy so later column assignments do not trigger warnings
df = df[
    (df['fare_amount'] <= upper_bound_fare)
    & (df['trip_distance'] <= upper_bound_distance)
].copy()

print("DataFrame shape after outlier filtering:", df.shape)
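The upper bound used above is the usual Tukey fence, Q3 + 1.5 × IQR. A small worked example on made-up fares shows the arithmetic; `iqr_upper_fence` is a hypothetical helper name, and the quantiles use pandas' default linear interpolation.

```python
import pandas as pd

def iqr_upper_fence(values, k=1.5):
    """Return Q3 + k * IQR for a numeric Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)

# Hypothetical fares with one obvious outlier.
fares = pd.Series([5.0, 7.0, 9.0, 11.0, 13.0, 200.0])

# Q1 = 7.5 and Q3 = 12.5, so IQR = 5.0 and the fence is 12.5 + 1.5 * 5.0 = 20.0
fence = iqr_upper_fence(fares)
kept = fares[fares <= fence]

print("Upper fence:", fence)
print("Fares kept:", kept.tolist())
```

Because the fence depends on the quartiles of whatever data is passed in, applying it after the earlier validity filters gives a different cut-off than applying it to the raw data.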

# **3 Exploratory Data Analysis**

**Number of trips by Trip Duration**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8,4))
df['trip_duration_min'].hist(bins=60)
plt.xlabel('Trip Duration (min)')
plt.ylabel('Number of trips')
plt.title('Trip Duration Distribution')
plt.show()

1. Understanding the typical ride time helps optimize ETA algorithms, plan feature rollouts (like ‘ride share’ vs ‘express’ options), and identify segments needing experience improvements.
2. Uncovering patterns **(e.g., most trips are under 20 minutes)** supports marketing claims (‘quickest ride in the city!’) and operational tweaks (fleet positioning).
3. If unusually **many trips are very short or very long**, there may be **mismatches between user needs and service delivery.**
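Claims such as "most trips are under 20 minutes" can be checked numerically rather than read off the histogram. A minimal sketch, using made-up durations in place of `df['trip_duration_min']`:

```python
import pandas as pd

# Hypothetical trip durations in minutes.
durations = pd.Series(
    [4.0, 8.5, 12.0, 15.0, 19.0, 22.0, 35.0, 61.0],
    name="trip_duration_min",
)

# The mean of a boolean mask is the share of rows where the condition holds.
share_under_20 = (durations < 20).mean()

# Median and 90th percentile summarize the typical trip and the long tail.
summary = durations.quantile([0.5, 0.9])

print(f"Share of trips under 20 minutes: {share_under_20:.1%}")
print(summary)
```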

**Number of trips by Distance**

In [None]:
plt.figure(figsize=(8,4))
df['trip_distance'].hist(bins=60)
plt.xlabel('Trip Distance (miles)')
plt.ylabel('Number of trips')
plt.title('Trip Distance Distribution')
plt.show()

**Trip Distance Distribution Insights**

1. The majority of NYC taxi trips are short (0.5–2 miles), confirming that taxis serve mostly intra-city mobility needs.
2. The sharp decline in trip counts for distances beyond 2 miles suggests alternative transportation is preferred for longer routes, although the IQR filter applied during cleaning also removes the longest trips from this plot.
3. The most common trip length is just under 1 mile, which is ideal for product features like flat fares, express rides, or bundled promos.

**Number of trips by Hour of Day**

In [None]:
df['pickup_hour'] = df['tpep_pickup_datetime'].dt.hour
df['pickup_weekday'] = df['tpep_pickup_datetime'].dt.day_name()

plt.figure(figsize=(8,4))
df['pickup_hour'].value_counts().sort_index().plot(kind='line')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Trips')
plt.title('Trips by Hour of Day')
plt.show()

1. Peak demand is during the afternoon and early evening:
   * Trips increase steadily from 7 AM, peaking between 3 PM and 7 PM, likely reflecting office commutes, after-school travel, and early evening social outings.
2. Late-night and early-morning trips are fewest:
   * Trips are minimal from midnight to 5 AM and then rise steadily through the morning. This is a classic urban pattern, with only essential rides late at night.
3. Sharp drop after 7 PM:
   * Possible factors: shift to public transit, fewer late-night events, or pricing changes.
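The peak hour can also be read directly from the hourly counts. The sketch below uses a handful of made-up pickup timestamps; reindexing over all 24 hours is a design choice here so that hours with no trips appear as zero instead of being omitted.

```python
import pandas as pd

# Hypothetical pickup timestamps.
pickups = pd.to_datetime(pd.Series([
    "2023-03-01 08:15", "2023-03-01 17:05", "2023-03-01 17:40",
    "2023-03-01 18:20", "2023-03-02 17:55", "2023-03-02 02:10",
]))

# Count trips per hour and make sure every hour 0-23 is present.
trips_per_hour = (
    pickups.dt.hour.value_counts().reindex(range(24), fill_value=0)
)

# idxmax returns the index label (the hour) with the most trips.
peak_hour = trips_per_hour.idxmax()

print("Busiest hour:", peak_hour)
print("Trips in that hour:", trips_per_hour.loc[peak_hour])
```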

**Fare Distribution by Top Pickup/Dropoff Zones**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Merge df with zones_gdf for pickup zones
pickup_zones = df.merge(zones_gdf[['LocationID', 'zone']], left_on='pulocationid', right_on='LocationID', how='left')

# Merge df with zones_gdf for dropoff zones
dropoff_zones = df.merge(zones_gdf[['LocationID', 'zone']], left_on='dolocationid', right_on='LocationID', how='left')

# Get the top 10 pickup zones by count
top_pickup_names = pickup_zones['zone'].value_counts().head(10).index

# Filter data for just those zones
pickup_top10 = pickup_zones[pickup_zones['zone'].isin(top_pickup_names)]

# Create a boxplot for fare distribution in top 10 pickup zones
plt.figure(figsize=(12, 6))
sns.boxplot(data=pickup_top10, x='zone', y='fare_amount')
plt.xlabel('Pickup Zone')
plt.ylabel('Fare Amount ($)')
plt.title('Fare Distribution by Top 10 Pickup Zones')
plt.xticks(rotation=45)
plt.show()

# Get the top 10 dropoff zones by count
top_dropoff_names = dropoff_zones['zone'].value_counts().head(10).index

# Filter data for just those zones
dropoff_top10 = dropoff_zones[dropoff_zones['zone'].isin(top_dropoff_names)]

# Create a boxplot for fare distribution in top 10 dropoff zones
plt.figure(figsize=(12, 6))
sns.boxplot(data=dropoff_top10, x='zone', y='fare_amount')
plt.xlabel('Dropoff Zone')
plt.ylabel('Fare Amount ($)')
plt.title('Fare Distribution by Top 10 Dropoff Zones')
plt.xticks(rotation=45)
plt.show()

1. Consistent Median Fares Across Zones
   * The median fare (middle line of each box) is surprisingly similar across most top pickup and dropoff zones, generally around $10–$13.
   * This suggests that the most popular short-distance rides are priced similarly, regardless of neighborhood.

2. Significant Fare Spread Within Each Zone
   * The box heights and whiskers show that fare amounts can vary widely within each zone.
   * Some zones (like Midtown East, Midtown Center) have wider boxes and longer whiskers, indicating more variability, likely due to traffic, longer routes, or diverse trip patterns in central business areas.

3. Outliers Are Present in Every Zone
   * The dots above each box are outlier fares, likely representing longer trips, detours, airport runs, or premium surcharges.
   * These outliers show that even in typically short-trip zones, occasional high fares occur.

4. Some Zones Have Slightly Higher Fares
   * A few zones (for example, Midtown Center, Lincoln Square East) show slightly higher medians and longer upper whiskers, which could be due to higher demand for longer trips from business districts.

**Tip rate vs Fare amount**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df['tip_rate'] = df['tip_amount'] / df['fare_amount']
df['tip_rate'] = df['tip_rate'].replace([np.inf, -np.inf], np.nan).clip(lower=0, upper=2)

plt.figure(figsize=(8,4))
df['tip_rate'].dropna().hist(bins=60)
plt.xlabel('Tip Rate (tip/fare)')
plt.ylabel('Number of trips')
plt.title('Tip Rate Distribution')
plt.show()


1. Most tips are below 25% of fare:
   * The vast majority of trips have a tip rate (tip/fare) between 0% and 25%.
   * This is consistent with standard tipping expectations for NYC taxis, where 10–20% is typical.

2. Very few extreme tip rates:
   * There are very few trips with tip rates above 50%; most of these are likely data anomalies or rare acts of generosity.
   * The right-skewed distribution suggests most customers are price-sensitive or use default tipping buttons in the app/card reader.
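Since the data dictionary states that cash tips are not included in `Tip_amount`, cash trips appear with a tip of zero and pull the overall tip rate down. A minimal sketch of the effect, on made-up rows and assuming payment code 1 means credit card as in the data dictionary:

```python
import pandas as pd

# Hypothetical trips: payment_type 1 = credit card, 2 = cash.
trips = pd.DataFrame({
    "payment_type": [1, 1, 2, 2, 1],
    "fare_amount": [10.0, 20.0, 10.0, 15.0, 8.0],
    "tip_amount": [2.0, 5.0, 0.0, 0.0, 0.0],
})
trips["tip_rate"] = trips["tip_amount"] / trips["fare_amount"]

# Restrict to card payments, the only trips where tips are recorded.
card_trips = trips[trips["payment_type"] == 1]

mean_all = trips["tip_rate"].mean()
mean_card = card_trips["tip_rate"].mean()

print(f"Mean tip rate, all trips:  {mean_all:.2%}")
print(f"Mean tip rate, card trips: {mean_card:.2%}")
```

Comparing the two figures shows how much of the mass near zero in the histogram may come from unrecorded cash tips rather than from passengers who did not tip.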

**Taxi rides vs Payment type**

In [None]:
# Define the mapping as per the NYC taxi data dictionary
payment_type_map = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip"
}

df['payment_type_label'] = df['payment_type'].map(payment_type_map)
print(df[['payment_type', 'payment_type_label']].head())

# Count trips per labelled payment type; dropna=False keeps any unmapped codes visible
paytype_counts = df['payment_type_label'].value_counts(dropna=False)

plt.figure(figsize=(6,4))
paytype_counts.plot(kind='bar')
plt.xlabel('Payment Type')
plt.ylabel('Trips')
plt.title('Trips by Payment Type')
plt.show()



1. Credit card is dominant:
   * Over 80% of taxi rides are paid by credit card (code 1, labeled “Credit card”).
   * This signals strong adoption of digital payments, likely due to app-based hailing and convenience.

2. Cash is a distant second:
   * Cash rides (code 2) make up a much smaller fraction, roughly 15–20%.
   * This could reflect tourists, users with privacy concerns, or those with less access to cards.
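The shares quoted above can be computed directly instead of being estimated from the bar heights. A minimal sketch with made-up payment codes (the real input would be `df['payment_type']`):

```python
import pandas as pd

# Subset of the data dictionary mapping used in this notebook.
payment_type_map = {1: "Credit card", 2: "Cash", 3: "No charge", 4: "Dispute"}

# Hypothetical payment codes for ten trips.
codes = pd.Series([1, 1, 1, 2, 1, 4, 2, 1, 1, 1])

# normalize=True returns fractions; convert to percentages for reporting.
shares = (
    codes.map(payment_type_map)
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)

print(shares)
```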

**Mean Fare vs Mean Trip Distance by Pickup Zone**

In [None]:
# Group by pickup zone to find mean fare and mean trip distance
zone_stats = pickup_zones.groupby('zone').agg(
    mean_fare=('fare_amount', 'mean'),
    mean_distance=('trip_distance', 'mean'),
    trip_count=('fare_amount', 'count')
).reset_index()

zone_stats = zone_stats.sort_values('mean_fare', ascending=False)
# display() is needed because this is not the last expression in the cell
display(zone_stats.head(10))

# Pearson correlation between the zone-level means
correlation = zone_stats['mean_distance'].corr(zone_stats['mean_fare'])
print(f"Correlation between mean trip distance and mean fare across zones: {correlation:.2f}")

plt.figure(figsize=(8,5))
plt.scatter(zone_stats['mean_distance'], zone_stats['mean_fare'])
plt.xlabel('Mean Trip Distance (miles) by Zone')
plt.ylabel('Mean Fare ($) by Zone')
plt.title('Zone-Level Correlation: Mean Distance vs Mean Fare')
plt.grid(True)
plt.show()

**Number of Trips by Hour and Day of Week**

In [None]:
import seaborn as sns

# Create a pivot table: days as rows, hours as columns
heatmap_data = df.pivot_table(index='pickup_weekday', columns='pickup_hour', values='fare_amount', aggfunc='count')
# Put the weekdays in calendar order
weekday_order = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
heatmap_data = heatmap_data.reindex(weekday_order)

plt.figure(figsize=(12,6))
sns.heatmap(heatmap_data, cmap='Blues')
plt.title('Number of Trips: Hour vs. Day of Week')
plt.xlabel('Hour of Day')
plt.ylabel('Day of Week')
plt.show()


1. The strongest demand is during afternoons and early evenings, peaking 4–7 PM on most days.
2. Friday evenings see the highest ride volume, while Saturday demand starts earlier and stays high through the day.
3. Weekday patterns are highly consistent: trip demand rises steadily after 7 AM, peaks in the evening, and drops sharply after 8 PM.
4. Product, ops, and marketing teams should use these insights to optimize fleet distribution, **schedule promos**, and communicate ETAs during peak periods.
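The weekday-by-hour grid behind such a heatmap can be built so that empty cells are explicit zeros rather than missing values. The sketch below is one possible approach on made-up timestamps; the `frame` and `counts` names are illustrative.

```python
import pandas as pd

# Hypothetical pickups: two on a Friday evening, one Saturday, one Wednesday.
pickups = pd.to_datetime(pd.Series([
    "2023-03-03 18:05", "2023-03-03 18:40",
    "2023-03-04 13:10", "2023-03-01 08:30",
]))

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

frame = pd.DataFrame({
    "weekday": pickups.dt.day_name(),
    "hour": pickups.dt.hour,
})

# Count trips per (weekday, hour), then force a full 7 x 24 grid of zeros.
counts = (
    frame.groupby(["weekday", "hour"]).size()
    .unstack(fill_value=0)
    .reindex(index=weekday_order, columns=range(24), fill_value=0)
)

print(counts.loc["Friday", 18])
print(counts.shape)
```

With a complete grid, every weekday and hour appears on the axes even in a small sample, which keeps heatmaps comparable across subsets of the data.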

# **4 Conclusion**

## **4.1 Final Insights and Recommendations**

### **Key Insights**
* ***Trip Timing Patterns:*** Taxi demand in NYC consistently peaks during afternoons and early evenings (especially 4–7 PM), with Friday evenings and Saturdays being the busiest. Late nights and early mornings see the lowest activity across all days.

* ***Zone & Distance Trends:*** Most trips are short (1–2 miles), with average fares clustering around $12–$18. A strong positive correlation exists between trip distance and fare across zones, confirming distance as the key driver of taxi pricing.

* ***Payment Behavior:*** Credit card dominates as the preferred payment method (over 80% of rides), followed by cash. Other payment types are negligible, indicating digital payment flows are well established in the market.

* ***Tipping Patterns:*** The majority of tips fall between 10–25% of the fare, with very few extreme outliers. Because cash tips are not recorded in the data, tipping behavior can only be measured for credit card payments, which is also where default digital tipping prompts apply.

* ***Spatial Demand:*** Certain central business and entertainment districts (like Midtown) show higher fare variability, possibly due to congestion, trip length diversity, or frequent surcharges.



### **Recommendations**

1. ***Optimize Fleet and Pricing for Peak Hours:*** Prioritize driver allocation and dynamic pricing strategies for high-demand periods, especially Friday evenings and Saturday afternoons/evenings. Use predictive scheduling for weekday afternoon peaks.

2. ***Enhance Digital Payment and Tipping Experience:*** Focus UX and loyalty features on credit card users, but offer incentives for cash users to switch to digital. Default tipping options (15%, 20%, 25%) are aligned with most user behavior.

3. ***Improve Fare Transparency:*** Leverage the strong fare-distance correlation to offer accurate fare estimators and up-front pricing in the app, especially for zones with high variability or outliers.

4. ***Zone-Specific Product Opportunities:*** Investigate zones with unusually high fares or fare spread for targeted promotions, flat fares, or user education. Flag outlier trips for potential fraud review or enhanced customer support.

5. ***Targeted Promotions for Off-Peak Hours:*** Use discounts or loyalty programs during low-demand times (late night, early morning) to balance utilization and attract new segments.

6. ***Communication & Operations:*** Proactively communicate expected wait times and surge pricing during predictable peaks. Ensure safety protocols and quick-response availability during late-night low-demand windows.
