# Project 2: NYC Yellow Taxi Trips vs. Daily 311 Complaints
In this project, I combine two NYC datasets:
1. Yellow Taxi trip records
2. 311 Service Requests

I chose to compare 311 service requests with yellow taxi trips because the two datasets reflect different sides of daily life in New York City. People call 311 when something is wrong or when they need city services, so daily complaint volume is a rough signal of how active or disrupted the city is. Taxi usage, on the other hand, reflects how much people are moving around the city. By comparing the two, I wanted to see whether days with more city activity or disruption also correspond to days with greater taxi demand, which could hint at how changes in city conditions relate to transportation patterns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Default figure size for matplotlib plots
plt.rcParams["figure.figsize"] = (12, 6)
```


# Step 1: Load Yellow Taxi Data
I use one month of Yellow Taxi trip records (January 2019) to keep the file size manageable. The original data is provided as a CSV file by NYC TLC / NYC Open Data and is saved locally as `yellow_tripdata_2019-01.csv`.

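```python
# Load the January 2019 Yellow Taxi trip records
taxi = pd.read_csv("yellow_tripdata_2019-01.csv")

# In a notebook only the last expression is displayed, so this shows the column names
taxi.head()
taxi.columns
```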


```
Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'RatecodeID', 'store_and_fwd_flag',
       'PULocationID', 'DOLocationID', 'payment_type', 'fare_amount', 'extra',
       'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge',
       'total_amount'],
      dtype='object')
```
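A full month of yellow taxi records can run to millions of rows, and the analysis only needs two of the columns listed above. The sketch below is one way to build the same daily counts without holding the whole file in memory: it reads only `tpep_pickup_datetime` and `trip_distance` in chunks via `pd.read_csv(usecols=..., chunksize=...)` and applies the Step 2 cleaning rules to each chunk. The helper name `daily_trip_counts` and the toy rows are hypothetical, not part of the original notebook.

```python
import io

import pandas as pd


def daily_trip_counts(csv_source, chunksize=1_000_000):
    """Count trips per pickup date while reading the CSV in chunks.

    Only the two columns the analysis needs are loaded, and the same cleaning
    rules as Step 2 are applied to each chunk before counting.
    """
    totals = {}
    reader = pd.read_csv(
        csv_source,
        usecols=["tpep_pickup_datetime", "trip_distance"],
        chunksize=chunksize,
    )
    for chunk in reader:
        pickup = pd.to_datetime(chunk["tpep_pickup_datetime"], errors="coerce")
        # Keep rows with a parseable timestamp and a positive trip distance
        keep = pickup.notna() & (chunk["trip_distance"] > 0)
        for day, n in pickup[keep].dt.date.value_counts().items():
            totals[day] = totals.get(day, 0) + int(n)
    return (
        pd.Series(totals, name="trip_count", dtype="int64")
        .rename_axis("date")
        .sort_index()
        .reset_index()
    )


# Tiny in-memory stand-in for the real file (toy rows, not TLC data)
sample = io.StringIO(
    "tpep_pickup_datetime,trip_distance\n"
    "2019-01-01 00:46:40,1.5\n"
    "2019-01-01 08:10:00,0\n"
    "2019-01-02 09:00:00,2.0\n"
    "not a date,3.0\n"
)
print(daily_trip_counts(sample, chunksize=2))
```

On the real file, `daily_trip_counts("yellow_tripdata_2019-01.csv")` should produce a table with the same `date` / `trip_count` columns as `daily_taxi`, at the cost of a little extra bookkeeping per chunk.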

# Step 2: Clean Taxi Data & Create Daily Trip Counts


```python
# Parse pickup timestamps; unparseable values become NaT and are dropped
taxi["tpep_pickup_datetime"] = pd.to_datetime(taxi["tpep_pickup_datetime"], errors="coerce")
taxi = taxi.dropna(subset=["tpep_pickup_datetime"])
taxi["date"] = taxi["tpep_pickup_datetime"].dt.date

# Drop trips with a zero (or negative) recorded distance
taxi = taxi[taxi["trip_distance"] > 0]

# Count trips per calendar day
daily_taxi = (
    taxi.groupby("date")
        .size()
        .reset_index(name="trip_count")
        .sort_values("date")
)

daily_taxi.head()
```




|   | date       | trip_count |
|---|------------|------------|
| 0 | 2018-12-31 | 5          |
| 1 | 2019-01-02 | 2          |
| 2 | 2019-01-04 | 2          |
| 3 | 2019-01-07 | 1          |
| 4 | 2019-01-08 | 6          |
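The daily table above starts at 2018-12-31, a date outside the study month. The inner join in Step 4 already drops that row (the 311 counts start on 2019-01-01), but filtering explicitly keeps `daily_taxi` itself limited to January 2019. A minimal sketch, assuming `date` holds `datetime.date` objects as produced by `.dt.date`; the helper `restrict_to_month` and the toy table are hypothetical.

```python
import datetime as dt

import pandas as pd


def restrict_to_month(daily, year, month):
    """Keep only rows of `daily` whose `date` falls in the given year and month."""
    start = dt.date(year, month, 1)
    # First day of the following month (December rolls over into January)
    end = dt.date(year + month // 12, month % 12 + 1, 1)
    mask = daily["date"].map(lambda d: start <= d < end)
    return daily[mask].reset_index(drop=True)


# Toy daily table: one row before, one inside, one after January 2019
example = pd.DataFrame(
    {
        "date": [dt.date(2018, 12, 31), dt.date(2019, 1, 2), dt.date(2019, 2, 1)],
        "trip_count": [5, 2, 7],
    }
)
print(restrict_to_month(example, 2019, 1))
```

Applied to the notebook, `daily_taxi = restrict_to_month(daily_taxi, 2019, 1)` would be the corresponding step.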


# Step 3: Load 311 Data
I load a local CSV export of the NYC Open Data 311 Service Requests for January 2019, saved as `311_Service_Requests_201901.csv`.

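```python
# Load the January 2019 311 Service Requests export.
# low_memory=False parses the file in one pass so pandas infers each column's
# dtype consistently, which avoids mixed-type DtypeWarnings on wide files.
data_311 = pd.read_csv("311_Service_Requests_201901.csv", low_memory=False)

# Parse the creation timestamp and derive a calendar date.
# pandas may warn that it cannot infer the timestamp format; passing an
# explicit format= that matches the export would silence that warning.
datetime_col = "Created Date"
data_311[datetime_col] = pd.to_datetime(data_311[datetime_col], errors="coerce")
data_311 = data_311.dropna(subset=[datetime_col])
data_311["date"] = data_311[datetime_col].dt.date

# Count service requests per calendar day
daily_311 = (
    data_311.groupby("date")
            .size()
            .reset_index(name="complaint_count")
            .sort_values("date")
)

daily_311.head()
```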






|   | date       | complaint_count |
|---|------------|-----------------|
| 0 | 2019-01-01 | 4854            |
| 1 | 2019-01-02 | 8134            |
| 2 | 2019-01-03 | 7855            |
| 3 | 2019-01-04 | 7718            |
| 4 | 2019-01-05 | 5169            |
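In the five rows above, the lowest counts fall on 2019-01-01 (New Year's Day) and 2019-01-05 (a Saturday), which hints at a day-of-week or holiday effect that could shape both complaint volume and taxi demand. A minimal sketch for tagging each date, assuming a `date` column of `datetime.date` values; the helper `add_weekday_columns` is hypothetical, while `Series.dt.day_name()` and `Series.dt.dayofweek` are standard pandas accessors.

```python
import datetime as dt

import pandas as pd


def add_weekday_columns(daily):
    """Return a copy of a daily count table with weekday name and weekend flag added."""
    out = daily.copy()
    days = pd.to_datetime(out["date"])
    out["weekday"] = days.dt.day_name()
    # dt.dayofweek numbers days Monday=0 ... Sunday=6
    out["is_weekend"] = days.dt.dayofweek >= 5
    return out


# The first five rows of daily_311 shown above
first_days = pd.DataFrame(
    {
        "date": [dt.date(2019, 1, d) for d in range(1, 6)],
        "complaint_count": [4854, 8134, 7855, 7718, 5169],
    }
)
print(add_weekday_columns(first_days))
```

Running the same helper on the merged table would make it possible to compare weekdays and weekends separately rather than pooling them.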


# Step 4: Merge Taxi and 311 on Date

```python
# Keep only dates present in both daily tables (inner join on the calendar date)
df = (
    daily_taxi.merge(daily_311, on="date", how="inner")
              .sort_values("date")
)

df
```

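|   | date       | trip_count | complaint_count |
|---|------------|------------|-----------------|
| 0 | 2019-01-02 | 2          | 8134            |
| 1 | 2019-01-04 | 2          | 7718            |
| 2 | 2019-01-07 | 1          | 8620            |
| 3 | 2019-01-08 | 6          | 7904            |
| 4 | 2019-01-10 | 2          | 7313            |
| 5 | 2019-01-12 | 4          | 6669            |
| 6 | 2019-01-13 | 2          | 5116            |
| 7 | 2019-01-16 | 1          | 8146            |
| 8 | 2019-01-18 | 3          | 6732            |
| 9 | 2019-01-19 | 6          | 5307            |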
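Because the join is `how="inner"`, any date that appears in only one table silently disappears. One way to see how much coverage is lost is an outer join with `indicator=True`, which adds a `_merge` column marking each row as `"both"`, `"left_only"`, or `"right_only"`. A minimal sketch; the helper `join_coverage` and the toy tables are hypothetical.

```python
import datetime as dt

import pandas as pd


def join_coverage(left, right, on="date"):
    """Count keys found in both tables, only in `left`, or only in `right`."""
    merged = left.merge(right, on=on, how="outer", indicator=True)
    # indicator=True adds a categorical `_merge` column: "left_only", "right_only" or "both"
    counts = merged["_merge"].value_counts()
    return {str(k): int(v) for k, v in counts.items()}


# Toy tables shaped like daily_taxi and daily_311
taxi_days = pd.DataFrame(
    {"date": [dt.date(2018, 12, 31), dt.date(2019, 1, 2)], "trip_count": [5, 2]}
)
calls_days = pd.DataFrame(
    {"date": [dt.date(2019, 1, 1), dt.date(2019, 1, 2)], "complaint_count": [4854, 8134]}
)
print(join_coverage(taxi_days, calls_days))
```

On the notebook data, `join_coverage(daily_taxi, daily_311)` would count 2018-12-31 as `left_only` and every January date without a recorded taxi trip as `right_only`.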


# Step 5: Visualize Daily Taxi Trips vs. Daily 311 Complaints

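```python
import plotly.express as px

# Scatter plot with an ordinary least squares (OLS) trendline.
# plotly's trendline="ols" requires the statsmodels package to be installed.
fig = px.scatter(
    df,
    x="complaint_count",
    y="trip_count",
    trendline="ols",
    title="NYC 311 Complaints vs Taxi Trips (Jan 2019)",
    labels={
        "complaint_count": "Daily 311 Complaint Count",
        "trip_count": "Daily Yellow Taxi Trip Count"
    }
)

fig.show()

# Pearson correlation (the DataFrame.corr() default) between the two daily counts
df[["trip_count", "complaint_count"]].corr()
```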


|                 | trip_count | complaint_count |
|-----------------|------------|-----------------|
| trip_count      | 1.0        | 0.112354        |
| complaint_count | 0.112354   | 1.0             |
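Both the OLS trendline and the Pearson correlation come from the same centered sums, so they can be cross-checked by hand. The stdlib-only sketch below computes the least-squares slope and intercept of trips on complaints together with Pearson's r; the helper name `ols_and_pearson` is hypothetical, and the inputs are the ten merged rows listed in Step 4.

```python
import math


def ols_and_pearson(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-products and sums of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# The ten merged rows listed in Step 4
complaints = [8134, 7718, 8620, 7904, 7313, 6669, 5116, 8146, 6732, 5307]
trips = [2, 2, 1, 6, 2, 4, 2, 1, 3, 6]
slope, intercept, r = ols_and_pearson(complaints, trips)
print(f"slope={slope:.6f}, intercept={intercept:.3f}, r={r:.3f}")
```

Run on those ten rows, this sketch gives a negative slope and r of about -0.42, not the 0.112 reported by `.corr()`. If the Step 4 listing is the complete merged table, the two outputs cannot both come from the same `df`, so rerunning the notebook from top to bottom would be worth doing before relying on either number.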


# Takeaway

In this project, I created a scatter plot with a linear (OLS) trendline and calculated the Pearson correlation between daily 311 complaint counts and daily taxi trip counts. The trendline shows a slight upward slope, suggesting that days with more 311 complaints tend to have somewhat higher taxi trip counts. However, the pattern is weak: the correlation is only 0.112, a positive but very small relationship. The estimate also rests on only the 10 merged dates listed in Step 4, each with between 1 and 6 recorded trips, which suggests the local taxi file contains only a small sample of January 2019 trips.
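With so few overlapping days, it is worth checking whether r = 0.112 can be distinguished from zero at all. The standard test for a Pearson correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal sketch, assuming n = 10 (the number of merged rows shown in Step 4) and taking the two-sided 5% critical value 2.306 for 8 degrees of freedom from a standard t-table; the helper name `pearson_t_statistic` is hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 8 degrees of freedom (t-table value).
# Only valid for n = 10; a different n needs a different critical value.
T_CRIT_DF8 = 2.306


def pearson_t_statistic(r, n):
    """t statistic for testing H0: rho = 0 given a sample Pearson r from n pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t = pearson_t_statistic(0.112354, 10)
print(f"t = {t:.3f}, significant at 5%: {abs(t) > T_CRIT_DF8}")
```

This gives t of about 0.32, far below 2.306, so with ten days of data the correlation is statistically indistinguishable from zero at the 5% level.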

This means that although the city appears slightly more active on days with higher complaint volumes, 311 request frequency explains very little of the variation in taxi demand: with r = 0.112, r² is about 0.013, or roughly 1% of the variance. The weak relationship makes sense given that taxi usage is influenced by many other factors that 311 data does not capture, such as weather, day of the week, and holidays. Additionally, 311 complaints cover a wide range of issues, from noise to heating problems, so a single daily total is hard to interpret as one measure of city activity.
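One way to probe the point that 311 complaints mix very different issues is to split the daily totals by complaint type. A minimal sketch, assuming the export contains the standard `Created Date` and `Complaint Type` columns; the helper `daily_counts_by_type` and the toy rows are illustrative only.

```python
import pandas as pd


def daily_counts_by_type(requests, date_col="Created Date", type_col="Complaint Type", top=5):
    """Daily request counts for the `top` most frequent complaint types (one column per type)."""
    created = pd.to_datetime(requests[date_col], errors="coerce")
    # Drop rows whose timestamp could not be parsed or whose type is missing
    frame = pd.DataFrame({"date": created.dt.date, "type": requests[type_col]}).dropna()
    top_types = frame["type"].value_counts().nlargest(top).index
    frame = frame[frame["type"].isin(top_types)]
    return (
        frame.groupby(["date", "type"])
        .size()
        .unstack(fill_value=0)  # days with no requests of a type get 0
        .sort_index()
    )


# Toy rows shaped like the 311 export (not real data)
toy_requests = pd.DataFrame(
    {
        "Created Date": [
            "2019-01-01 00:10:00",
            "2019-01-01 01:00:00",
            "2019-01-02 09:30:00",
            "2019-01-02 10:00:00",
            "bad",
        ],
        "Complaint Type": [
            "Noise - Residential",
            "HEAT/HOT WATER",
            "HEAT/HOT WATER",
            "HEAT/HOT WATER",
            "Noise - Residential",
        ],
    }
)
print(daily_counts_by_type(toy_requests, top=2))
```

After `reset_index()`, each type column could be merged with `daily_taxi` on `date` and correlated with `trip_count` separately, rather than relying on the pooled daily total.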

Overall, the analysis shows that daily 311 complaints and taxi trips do not move together strongly. The exercise demonstrates how two separate city datasets can be merged on a shared date key and visualized, and it highlights how many factors shape urban mobility patterns.