# Project 3: Noise complaints in New York City

## Prompt

Dataset(s) to be used:  
311_requests_2018-19_sample_clean.csv  
(Cleaned subset of https://storage.googleapis.com/python-public-policy/311_2018-2019.csv)

Analysis question:  
Are there more "Noise - Residential" complaints on weekends than on weekdays in NYC between August 2018 and August 2019?

Columns that will (likely) be used:  
- Created Date  
- Complaint Type  
- Borough  
- Incident Zip  

(If you’re using multiple datasets) Columns to be used to merge/join them:  
- [Not applicable for now]

Hypothesis:  
There are more "Noise - Residential" complaints per day on weekends than on weekdays.  

## Data and setup

In this section, I load the 311 sample dataset, which covers August 2018 to August 2019, into a pandas DataFrame.  
The filter to "Noise - Residential" complaints is applied in the next section, after the date columns are parsed.


In [3]:
import pandas as pd
import plotly.express as px

import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"

In [4]:
requests_311 = pd.read_csv(
    "311_requests_2018-19_sample_clean.csv"
)


Columns (8,20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
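The mixed-type warning above appears when pandas infers different types for different chunks of the same column. Going by the dtype listing, column 8 is `Incident Zip`. The following is a minimal sketch, on a made-up three-row CSV rather than the real file, of one way to avoid the ambiguity: declaring the column as a string on import. The toy data and variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Toy CSV standing in for the real file (hypothetical rows).
# The ZIP column mixes digit-only values with free text.
csv_text = (
    "Unique Key,Incident Zip,Complaint Type\n"
    "1,10001,Noise - Residential\n"
    "2,UNKNOWN,Noise - Residential\n"
    "3,07030,Noise - Residential\n"
)

# Declaring the column as str gives it a single type and keeps leading zeros.
toy = pd.read_csv(io.StringIO(csv_text), dtype={"Incident Zip": str})

zip_values = toy["Incident Zip"].tolist()
print(zip_values)
```

The same `dtype=` argument could be passed to the real `pd.read_csv` call. Passing `low_memory=False`, as the warning suggests, is the other option.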



In [5]:
requests_311.head()
requests_311.dtypes

Unique Key                          int64
Created Date                       object
Closed Date                        object
Agency                             object
Agency Name                        object
Complaint Type                     object
Descriptor                         object
Location Type                      object
Incident Zip                       object
Incident Address                   object
Street Name                        object
Cross Street 1                     object
Cross Street 2                     object
Intersection Street 1              object
Intersection Street 2              object
Address Type                       object
City                               object
Landmark                           object
Facility Type                      object
Status                             object
Due Date                           object
Resolution Description             object
Resolution Action Updated Date     object
Community Board                   

## From individual complaints to daily counts

Here I convert the `Created Date` column to datetime, keep only the "Noise - Residential" complaints, and resample the data by day.  
This gives me a time series of how many "Noise - Residential" complaints were filed on each date.


In [6]:
requests_311["Created Date"] = pd.to_datetime(
    requests_311["Created Date"],
    format="%m/%d/%Y %I:%M:%S %p",
    errors="coerce",
)
requests_311["Closed Date"] = pd.to_datetime(
    requests_311["Closed Date"],
    format="%m/%d/%Y %I:%M:%S %p",
    errors="coerce",
)

requests_311[["Created Date", "Closed Date"]].head()
requests_311.dtypes

Unique Key                                 int64
Created Date                      datetime64[ns]
Closed Date                       datetime64[ns]
Agency                                    object
Agency Name                               object
Complaint Type                            object
Descriptor                                object
Location Type                             object
Incident Zip                              object
Incident Address                          object
Street Name                               object
Cross Street 1                            object
Cross Street 2                            object
Intersection Street 1                     object
Intersection Street 2                     object
Address Type                              object
City                                      object
Landmark                                  object
Facility Type                             object
Status                                    object
Due Date            
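Because `errors="coerce"` silently turns unparseable strings into `NaT`, it can be worth counting how many values were lost. A minimal sketch on made-up strings, using the same format string as the cell above:

```python
import pandas as pd

# Hypothetical raw values: two valid timestamps and one malformed entry.
raw = pd.Series(
    ["08/04/2018 11:30:00 PM", "not a date", "08/05/2018 01:15:00 AM"]
)

parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

# Rows that failed to parse become NaT and can be counted with isna().
n_unparsed = int(parsed.isna().sum())
print(parsed)
print("unparsed rows:", n_unparsed)
```

Running the same `isna().sum()` check on `requests_311["Created Date"]` would show whether any complaints drop out of the daily counts.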

In [7]:
noise = requests_311[requests_311["Complaint Type"] == "Noise - Residential"].copy()

noise.head()
noise["Complaint Type"].value_counts()

Complaint Type
Noise - Residential    41311
Name: count, dtype: int64

In [8]:
noise_per_day = (
    noise
    .resample("D", on="Created Date")
    .size()
    .reset_index(name="count_requests")
)

noise_per_day.head()

| | Created Date | count_requests |
|---|---|---|
| 0 | 2018-08-01 | 50 |
| 1 | 2018-08-02 | 49 |
| 2 | 2018-08-03 | 65 |
| 3 | 2018-08-04 | 162 |
| 4 | 2018-08-05 | 191 |
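One property of `resample("D", ...).size()` that matters for the averages later on: calendar days with no complaints still get a row, with a count of zero, as long as they fall between the first and last date. A minimal sketch on three made-up timestamps:

```python
import pandas as pd

# Hypothetical complaints: two on Aug 1, none on Aug 2, one on Aug 3.
toy = pd.DataFrame(
    {
        "Created Date": pd.to_datetime(
            ["2018-08-01 09:00", "2018-08-01 22:30", "2018-08-03 01:10"]
        )
    }
)

per_day = (
    toy
    .resample("D", on="Created Date")
    .size()
    .reset_index(name="count_requests")
)

counts = per_day["count_requests"].tolist()
print(per_day)
```

The empty day appears as a zero rather than being skipped, so daily means are taken over every calendar day in the range.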


In [9]:
fig = px.line(
    noise_per_day,
    x="Created Date",
    y="count_requests",
    title="Noise - Residential complaints per day",
)
fig.show()

## Comparing weekdays and weekends

Next, I classify each day as a weekday or weekend using the day of week.
Then I group by this flag to calculate the mean and median number of complaints for weekdays and for weekends.


In [10]:
noise_per_day["day_of_week"] = noise_per_day["Created Date"].dt.dayofweek

noise_per_day["is_weekend"] = noise_per_day["day_of_week"].isin([5, 6])

noise_per_day.head()
noise_per_day["is_weekend"].value_counts()

is_weekend
False    278
True     111
Name: count, dtype: int64
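The `isin([5, 6])` test relies on the pandas convention that `dt.dayofweek` numbers Monday as 0 and Sunday as 6. A small sketch over one made-up week confirms which days get flagged:

```python
import pandas as pd

# One full week starting on Monday, 2018-08-06.
week = pd.DataFrame(
    {"Created Date": pd.date_range("2018-08-06", periods=7, freq="D")}
)

week["day_of_week"] = week["Created Date"].dt.dayofweek  # Monday=0 ... Sunday=6
week["day_name"] = week["Created Date"].dt.day_name()
week["is_weekend"] = week["day_of_week"].isin([5, 6])

weekend_names = week.loc[week["is_weekend"], "day_name"].tolist()
print(week)
```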

In [11]:
weekend_stats = (
    noise_per_day
    .groupby("is_weekend")["count_requests"]
    .agg(["mean", "median", "count"])
    .reset_index()
)

weekend_stats

| | is_weekend | mean | median | count |
|---|---|---|---|---|
| 0 | False | 76.179856 | 70.5 | 278 |
| 1 | True | 181.378378 | 167.0 | 111 |


In [12]:
weekend_stats["day_type"] = weekend_stats["is_weekend"].map(
    {False: "Weekday", True: "Weekend"}
)
weekend_stats

| | is_weekend | mean | median | count | day_type |
|---|---|---|---|---|---|
| 0 | False | 76.179856 | 70.5 | 278 | Weekday |
| 1 | True | 181.378378 | 167.0 | 111 | Weekend |


## Visualizing the pattern

Finally, I compare weekdays with weekends using bar charts of the average daily complaint count.
The charts show that weekend days tend to have far more complaints than weekdays.

In [13]:
fig = px.bar(
    weekend_stats,
    x="day_type",
    y="mean",
    title="Average daily noise complaints: Weekend vs Weekday",
    labels={"mean": "Average number of complaints", "day_type": "Day type"},
)
fig.show()

In [14]:
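# Attach a borough label to each day's total.
# Caution: this keeps only the first Borough value recorded each day and pairs it
# with the citywide daily count, so count_requests is not a per-borough count.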
noise_per_day_boro = (
    noise
    .resample("D", on="Created Date")
    .agg({"Borough": "first"})
    .merge(
        noise
        .resample("D", on="Created Date")
        .size()
        .rename("count_requests"),
        left_index=True,
        right_index=True,
    )
    .reset_index()
)

# Create day_of_week and is_weekend again
noise_per_day_boro["day_of_week"] = noise_per_day_boro["Created Date"].dt.dayofweek
noise_per_day_boro["is_weekend"] = noise_per_day_boro["day_of_week"].isin([5, 6])

In [15]:
boro_weekend_stats = (
    noise_per_day_boro
    .groupby(["Borough", "is_weekend"])["count_requests"]
    .mean()
    .reset_index()
)

boro_weekend_stats["day_type"] = boro_weekend_stats["is_weekend"].map(
    {False: "Weekday", True: "Weekend"}
)

boro_weekend_stats.head()

| | Borough | is_weekend | count_requests | day_type |
|---|---|---|---|---|
| 0 | BRONX | False | 77.333333 | Weekday |
| 1 | BRONX | True | 182.782609 | Weekend |
| 2 | BROOKLYN | False | 76.375 | Weekday |
| 3 | BROOKLYN | True | 178.210526 | Weekend |
| 4 | MANHATTAN | False | 72.68 | Weekday |


In [16]:
fig = px.bar(
    boro_weekend_stats,
    x="day_type",
    y="count_requests",
    color="day_type",
    facet_col="Borough",
    title="Average daily noise complaints by borough and day type",
    labels={"count_requests": "Average number of complaints", "day_type": "Day type"},
)
fig.show()
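One thing to note about the borough breakdown: `.agg({"Borough": "first"})` keeps a single borough per day and the merged count is the citywide total, which is why the borough averages sit so close to the overall weekday and weekend means. A true per-borough count would group on borough and day together. The following is a minimal sketch on made-up rows, assuming the same column names as the notebook. It is one possible approach, not a rerun of the analysis.

```python
import pandas as pd

# Hypothetical complaints across two boroughs on a Friday and a Saturday.
toy = pd.DataFrame(
    {
        "Created Date": pd.to_datetime(
            [
                "2018-08-03 10:00", "2018-08-03 21:00",                      # Friday
                "2018-08-04 01:00", "2018-08-04 02:00", "2018-08-04 23:00",  # Saturday
            ]
        ),
        "Borough": ["BRONX", "BROOKLYN", "BRONX", "BRONX", "BROOKLYN"],
    }
)

# Count complaints for each (borough, calendar day) pair.
per_day_boro = (
    toy
    .groupby(["Borough", pd.Grouper(key="Created Date", freq="D")])
    .size()
    .reset_index(name="count_requests")
)
per_day_boro["is_weekend"] = per_day_boro["Created Date"].dt.dayofweek.isin([5, 6])

# Average daily count per borough and day type.
boro_means = (
    per_day_boro
    .groupby(["Borough", "is_weekend"])["count_requests"]
    .mean()
    .reset_index()
)
print(boro_means)
```

With this grouping, a borough-day with no complaints may simply be missing rather than counted as zero, so the real data might need reindexing over all dates before averaging.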

## Results

From the daily time series, I summarize the key numbers below:

- On **weekdays**, the average number of daily "Noise - Residential" complaints is **about 76.2** (median 70.5, 278 days).
- On **weekends**, the average number of daily "Noise - Residential" complaints is **about 181.4** (median 167.0, 111 days).

The weekend average is about 2.4 times the weekday average (181.4 versus 76.2 complaints per day). This supports my original hypothesis that weekends have more noise complaints per day than weekdays.
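The summary numbers can be cross-checked with a little arithmetic. Multiplying each mean by its number of days should recover the group totals, and those should add up to the 41,311 filtered complaints. The sketch below uses only the figures reported in the table above:

```python
# Figures copied from the weekend_stats table.
weekday_mean, weekday_days = 76.179856, 278
weekend_mean, weekend_days = 181.378378, 111

# Ratio of the two daily averages.
ratio = weekend_mean / weekday_mean

# mean * count recovers the total complaints in each group.
weekday_total = round(weekday_mean * weekday_days)
weekend_total = round(weekend_mean * weekend_days)

# Weekends' share of all complaints versus their share of all days.
weekend_share_of_complaints = weekend_total / (weekday_total + weekend_total)
weekend_share_of_days = weekend_days / (weekday_days + weekend_days)

print(f"ratio of daily means: {ratio:.2f}")
print(f"totals: {weekday_total} weekday + {weekend_total} weekend")
print(f"weekend share of complaints: {weekend_share_of_complaints:.1%}")
print(f"weekend share of days: {weekend_share_of_days:.1%}")
```

The totals come to 21,178 weekday and 20,133 weekend complaints. Weekdays still account for slightly more complaints in total, because there are more of them. The weekend effect shows up in the per-day rate: weekends are roughly 29% of the days but nearly half of the complaints.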

## Conclusion

Using 311 "Noise - Residential" complaints from August 2018 to August 2019, I find that weekend days have a much higher average number of daily complaints than weekdays. Therefore, my hypothesis that there are more noise complaints per day on weekends than on weekdays is supported by the data.

This result is consistent with residential neighborhoods in New York City being noisier on weekends, possibly because more people are at home or hosting social activities. Complaint counts measure reporting rather than noise itself, so the pattern could also partly reflect more residents being home to hear and report it.

Limitations:
- The dataset only covers requests from August 2018 to August 2019.
- I treat every day equally and do not control for holidays or weather.
- I assume that residents are equally likely to report noise on weekdays and on weekends.
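The holiday limitation could be explored by flagging public holidays that fall on weekdays and treating them as a separate day type. The following is a minimal sketch, assuming the US federal holiday calendar that ships with pandas is an acceptable stand-in for holidays observed in NYC. The five-day date range is made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical stretch of days around Christmas 2018 (Saturday to Wednesday).
days = pd.DataFrame(
    {"Created Date": pd.date_range("2018-12-22", "2018-12-26", freq="D")}
)

# Federal holidays that fall inside the date range.
holidays = USFederalHolidayCalendar().holidays(
    start=days["Created Date"].min(), end=days["Created Date"].max()
)

days["is_weekend"] = days["Created Date"].dt.dayofweek.isin([5, 6])
days["is_holiday"] = days["Created Date"].isin(holidays)

# Three-way label: weekend, weekday holiday, or ordinary weekday.
days["day_type"] = "Weekday"
days.loc[days["is_weekend"], "day_type"] = "Weekend"
days.loc[days["is_holiday"] & ~days["is_weekend"], "day_type"] = "Holiday (weekday)"

day_types = days["day_type"].tolist()
print(days)
```

Grouping `noise_per_day` by a label like this, instead of by `is_weekend` alone, would show whether weekday holidays look more like weekends.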

Future work:
- Compare noise complaints in different years.
- Combine this dataset with population or land-use data.
- Explore other types of complaints, such as "HEAT/HOT WATER".