# NYC 311 Noise Complaint Analysis

**Author:** Olivia Mohning  
**Date:** July 2025  
**Dataset:** `311_noise_complaints_sample_small.csv`  

This project explores temporal and spatial patterns in NYC 311 noise complaints and forecasts complaint trends to inform NYPD resource planning. It involves data cleaning, exploratory data analysis (EDA), time series forecasting, and predictive modeling using Python, SQL, and key data science libraries.

## Environment and Tools

- **Languages & Interfaces**: Python 3.11, Jupyter Notebooks, SQL (via SQLite3 and PostgreSQL)
- **Core Libraries**:
  - Data manipulation: `pandas`, `numpy`
  - Visualization: `seaborn`, `matplotlib`, `plotly`
  - Machine learning and modeling: `scikit-learn`
  - Time series analysis: `pandas`, `scikit-learn` (optionally `statsmodels`)
- **Key Concepts**:
  - Time series forecasting
  - Cross-validation and the bias–variance tradeoff
  - Linear regression, binary classification, clustering
  - Dimensionality reduction
  - Regression vs. classification
  - Supervised, unsupervised, and reinforcement learning (conceptual overview)
  - Vectors, matrices, derivatives, integrals, probability, and statistics (theoretical foundations)
- **Data Access**: `sqlite3` for SQL queries in Python; PostgreSQL for external RDBMS queries
- **Visualization Tools**: Static and interactive plots using `matplotlib`, `seaborn`, and `plotly`; external dashboards via Tableau

## Table of Contents

1. [Imports and Setup](#Imports-and-Setup)
2. [Data Preview & Basic Structure](#Data-Preview-&-Basic-Structure)
3. [Data Cleaning](#Data-Cleaning)
4. [Early Visualizations](#Early-Visualizations)
5. [Time-Based Trends](#Time-Based-Trends) *(coming soon)*
6. [Forecasting and Modeling](#Forecasting-and-Modeling) *(coming soon)*
7. [Conclusions and Next Steps](#Conclusions-and-Next-Steps) *(coming soon)*

## Imports and Setup

Load core libraries for data analysis.

In [1]:
import sys
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import sklearn
import sqlite3

print("Python:", sys.version)
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", sns.__version__)
print("plotly:", plotly.__version__)
print("scikit-learn:", sklearn.__version__)
print("sqlite3:", sqlite3.sqlite_version)

Python: 3.11.13 | packaged by conda-forge | (main, Jun  4 2025, 14:52:34) [Clang 18.1.8 ]
numpy: 2.2.6
pandas: 2.3.0
matplotlib: 3.10.3
seaborn: 0.13.2
plotly: 6.2.0
scikit-learn: 1.7.0
sqlite3: 3.50.2


## Data Preview & Basic Structure

Load the dataset, preview its dimensions, and inspect the columns to get a sense of the data. Running the below code snippet, we can see that the dataset is much smaller sample of the original data. This sample was pulled in late July 2025, and includes only 20,000 observations for ease of use.

In [2]:
# Loading in a random sample of 20,000 instances from the NYC 311 noise complaint public dataset
df = pd.read_csv("311_noise_complaints_sample_small.csv")

# Preview dataset size
print(f"Dataset shape: {df.shape}")

Dataset shape: (20000, 41)


In [3]:
# Looking at a sample of the data
df.head()

Unnamed: 0,unique_key,created_date,closed_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,...,vehicle_type,taxi_company_borough,taxi_pick_up_location,bridge_highway_name,bridge_highway_direction,road_ramp,bridge_highway_segment,latitude,longitude,location
0,65213897,2025-06-09T12:51:30.000,2025-06-11T13:30:50.000,DSNY,Department of Sanitation,Missed Collection,Compost,Street,11369.0,87-04 31 AVENUE,...,,,,,,,,40.75973,-73.88122,"\n, \n(40.75972962354946, -73.88121963439833)"
1,65202348,2025-06-08T23:48:33.000,2025-06-09T00:09:22.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10453.0,40 RICHMOND PLAZA,...,,,,,,,,40.852196,-73.922327,"\n, \n(40.8521960366985, -73.92232722755996)"
2,65271096,2025-06-15T15:18:38.000,2025-06-15T19:24:01.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034.0,229 SEAMAN AVENUE,...,,,,,,,,40.871239,-73.918899,"\n, \n(40.87123914834134, -73.91889895540905)"
3,65214828,2025-06-09T11:11:29.000,,HPD,Department of Housing Preservation and Develop...,APPLIANCE,REFRIGERATOR,RESIDENTIAL BUILDING,10128.0,1845 1 AVENUE,...,,,,,,,,40.78282,-73.945045,"\n, \n(40.782819840288106, -73.94504513362945)"
4,65190650,2025-06-07T17:56:28.000,,DOHMH,Department of Health and Mental Hygiene,Rodent,Rat Sighting,Parking Lot/Garage,11218.0,344 CONEY ISLAND AVENUE,...,,,,,,,,40.649497,-73.971761,"\n, \n(40.649497274437046, -73.97176123316513)"


In [4]:
# Column names, data types, and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 41 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   unique_key                      20000 non-null  int64  
 1   created_date                    20000 non-null  object 
 2   closed_date                     16668 non-null  object 
 3   agency                          20000 non-null  object 
 4   agency_name                     20000 non-null  object 
 5   complaint_type                  20000 non-null  object 
 6   descriptor                      19461 non-null  object 
 7   location_type                   17596 non-null  object 
 8   incident_zip                    19825 non-null  float64
 9   incident_address                19312 non-null  object 
 10  street_name                     19310 non-null  object 
 11  cross_street_1                  17116 non-null  object 
 12  cross_street_2                  

## Mini "Codebook"

Below is a reference key describing the columns, adapted from the [NYC Open Data documentation](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data_dictionary).

- **unique_key**: Unique identifier for each service request  
- **created_date**: Date and time the complaint was created  
- **closed_date**: Date and time the complaint was closed (if closed)  
- **agency**: Code for the agency handling the complaint  
- **agency_name**: Full name of the agency  
- **complaint_type**: General category of the complaint (e.g., "Noise")  
- **descriptor**: More detailed sub-category of the complaint  
- **location_type**: Type of location (e.g., residential building, street, etc.)  
- **incident_zip**: ZIP code where the incident occurred  
- **incident_address**: Street address where the complaint occurred  
- **street_name**: Street name only  
- **cross_street_1** / **cross_street_2**: Intersecting streets near the incident  
- **intersection_street_1** / **intersection_street_2**: Alternate fields for intersection location  
- **address_type**: How the address was provided (e.g., exact, intersection, etc.)  
- **city**: Name of the city the incident occurred in  
- **landmark**: Noted nearby landmark (if applicable)  
- **facility_type**: Type of facility involved (very sparse)  
- **status**: Current status of the complaint (e.g., “Closed”, “Open”)  
- **due_date**: When the agency aimed to resolve the issue by (rarely filled)  
- **resolution_description**: Description of the resolution or response  
- **resolution_action_updated_date**: Timestamp of last resolution update  
- **community_board**: Community board jurisdiction  
- **bbl**: Borough-Block-Lot code (for NYC land lots)  
- **borough**: NYC borough (Manhattan, Bronx, etc.)  
- **x_coordinate_state_plane** / **y_coordinate_state_plane**: NYC-specific coordinates  
- **open_data_channel_type**: How the complaint was submitted (phone, app, etc.)  
- **park_facility_name**: Park facility name (if applicable)  
- **park_borough**: Borough assigned to the park  
- **vehicle_type**: Type of vehicle involved (sparse)  
- **taxi_company_borough**: Borough of taxi company (rare)  
- **taxi_pick_up_location**: Taxi pick-up area (rare)  
- **bridge_highway_name**, **bridge_highway_direction**, **road_ramp**, **bridge_highway_segment**: Location data for complaints on highways/bridges (rare)  
- **latitude** / **longitude**: Geographic coordinates  
- **location**: Combined lat/long string


## Data Cleaning
1. Check the complaint_type value counts of the original dataset.
2. Save a copy of the original dataframe, df, to noise_df, which will include only the rows where the complaint_type column includes the word "noise" - indiscriminate of the string's case. Check complaint_type value counts for noise_df.
3. Drop the now empty columns from df where complaint_type did not include the word "noise." These columns are now empty and safe to drop.
4. Check for any redundant entries, or duplicate rows.
5. Check if any columns have a disproportionately large number of NULL values and deal with them accordingly.
# BOOKMARK

In [40]:
# 1. Checking the complaint_type value counts of our original dataset
df['complaint_type'].value_counts()

complaint_type
Illegal Parking                 3719
Noise - Residential             2467
Noise - Street/Sidewalk         1709
Blocked Driveway                1092
Street Condition                 568
                                ... 
Uprooted Stump                     1
For Hire Vehicle Report            1
AHV Inspection Unit                1
Building Drinking Water Tank       1
Lifeguard                          1
Name: count, Length: 144, dtype: int64

In [41]:
# 2. Saving a copy of df as noise_df, including only noise-related complaint types, and checking the complaint_type value counts
noise_df = df[df['complaint_type'].str.contains("noise", case=False, na=False)].copy()

noise_df['complaint_type'].value_counts()

complaint_type
Noise - Residential         2467
Noise - Street/Sidewalk     1709
Noise - Commercial           493
Noise - Vehicle              349
Noise                        326
Noise - Park                 109
Noise - Helicopter            85
Noise - House of Worship      12
Name: count, dtype: int64

In [42]:
# 3. Dropping columns where every value is now null after filtering for just noise complaints and checking the shape
noise_df = noise_df.dropna(axis=1, how='all')

noise_df.shape

(5550, 33)

In [43]:
# 4. Checking for redundant entries
dupes = noise_df.duplicated().sum()
print(f"Duplicate rows: {dupes}")

Duplicate rows: 0


In [53]:
# 5. Check if any columns contain a disproportionately large number of NULL values
null_pct = (noise_df.isna().mean() * 100).round(2)
null_pct = null_pct[null_pct > 0].sort_values(ascending=False)
null_pct_formatted = null_pct.astype(str) + '%'
print(null_pct_formatted)

vehicle_type                      93.71%
bbl                               11.23%
landmark                           10.0%
location_type                      5.87%
intersection_street_2              5.87%
intersection_street_1              5.75%
city                               4.11%
closed_date                        3.96%
resolution_description             3.03%
resolution_action_updated_date     2.58%
x_coordinate_state_plane           0.59%
y_coordinate_state_plane           0.59%
latitude                           0.59%
longitude                          0.59%
location                           0.59%
cross_street_2                     0.58%
cross_street_1                     0.45%
street_name                        0.31%
incident_address                   0.29%
address_type                       0.02%
incident_zip                       0.02%
dtype: object


Above, we see the column, vehicle_type, contains over 90% NULL values. This is likely because the only complaint types that would need to include a vehicle type are the 349 "Noise - Vehicle" complaint types. Let's confirm that first:

In [74]:
# Complaints where complaint_type IS "Noise - Vehicle"
vehicle_only = noise_df[noise_df['complaint_type'] == "Noise - Vehicle"]
vehicle_only_null_pct = (vehicle_only['vehicle_type'].isna().mean() * 100).round(2)
print(f"NULLs in vehicle_type where complaint_type is 'Noise - Vehicle': {vehicle_only_null_pct}%")

# Complaints where complaint_type is NOT "Noise - Vehicle"
non_vehicle = noise_df[noise_df['complaint_type'] != "Noise - Vehicle"]
non_vehicle_null_pct = (non_vehicle['vehicle_type'].isna().mean() * 100).round(2)
print(f"NULLs in vehicle_type where complaint_type is NOT 'Noise - Vehicle': {non_vehicle_null_pct}% \n")

NULLs in vehicle_type where complaint_type is 'Noise - Vehicle': 0.0%
NULLs in vehicle_type where complaint_type is NOT 'Noise - Vehicle': 100.0% 



# BOOKMARK

In [10]:
# Viewing a random sample of instances to decide if any columns must be reformatted
with pd.option_context('display.max_columns', None):
    display(noise_df.sample(15, random_state=42))

Unnamed: 0,unique_key,created_date,closed_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,street_name,cross_street_1,cross_street_2,intersection_street_1,intersection_street_2,address_type,city,landmark,status,resolution_description,resolution_action_updated_date,community_board,bbl,borough,x_coordinate_state_plane,y_coordinate_state_plane,open_data_channel_type,park_facility_name,park_borough,vehicle_type,latitude,longitude,location
6843,65234814,2025-06-11T23:29:54.000,2025-06-12T00:42:19.000,NYPD,New York City Police Department,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10004.0,47 STONE STREET,STONE STREET,COENTIES ALLEY,MILL LANE,COENTIES ALLEY,MILL LANE,ADDRESS,NEW YORK,STONE STREET,Closed,The Police Department responded to the complai...,2025-06-12T00:42:23.000,01 MANHATTAN,1000298000.0,MANHATTAN,981390.0,195892.0,ONLINE,Unspecified,MANHATTAN,,40.704355,-74.010315,"\n, \n(40.70435458995396, -74.01031512653833)"
18866,65264690,2025-06-14T15:27:27.000,2025-06-14T16:06:14.000,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11236.0,10622 FARRAGUT ROAD,FARRAGUT ROAD,EAST 105 STREET,EAST 108 STREET,EAST 105 STREET,EAST 108 STREET,ADDRESS,BROOKLYN,FARRAGUT ROAD,Closed,The Police Department responded to the complai...,2025-06-14T16:06:18.000,18 BROOKLYN,3081740000.0,BROOKLYN,1012907.0,176581.0,ONLINE,Unspecified,BROOKLYN,,40.651304,-73.896725,"\n, \n(40.65130435746643, -73.89672531475608)"
15312,65219843,2025-06-10T23:56:16.000,2025-06-11T00:56:36.000,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11206.0,141 MONTROSE AVENUE,MONTROSE AVENUE,MANHATTAN AVENUE,GRAHAM AVENUE,MANHATTAN AVENUE,GRAHAM AVENUE,ADDRESS,BROOKLYN,MONTROSE AVENUE,Closed,The Police Department responded to the complai...,2025-06-11T00:56:40.000,01 BROOKLYN,3030520000.0,BROOKLYN,999829.0,196964.0,ONLINE,Unspecified,BROOKLYN,,40.707284,-73.943809,"\n, \n(40.70728372457755, -73.94380894115152)"
742,65265720,2025-06-14T21:32:29.000,2025-06-14T21:50:11.000,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11235.0,206 CORBIN PLACE,CORBIN PLACE,ORIENTAL BOULEVARD,DEAD END,ORIENTAL BOULEVARD,DEAD END,ADDRESS,BROOKLYN,CORBIN PLACE,Closed,The Police Department responded to the complai...,2025-06-14T21:50:15.000,13 BROOKLYN,3087230000.0,BROOKLYN,997078.0,149362.0,ONLINE,Unspecified,BROOKLYN,,40.576631,-73.953822,"\n, \n(40.576630914121814, -73.95382188377533)"
12129,65199101,2025-06-07T15:07:55.000,2025-06-07T16:22:15.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11385.0,55-06 MYRTLE AVENUE,MYRTLE AVENUE,MADISON STREET,PUTNAM AVENUE,MADISON STREET,PUTNAM AVENUE,ADDRESS,RIDGEWOOD,MYRTLE AVENUE,Closed,The Police Department responded to the complai...,2025-06-07T16:22:20.000,05 QUEENS,4035450000.0,QUEENS,1009767.0,194287.0,ONLINE,Unspecified,QUEENS,,40.699913,-73.907974,"\n, \n(40.699912913907106, -73.9079742673068)"
18209,65274883,2025-06-15T03:53:06.000,2025-06-15T04:45:55.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10458.0,2340 CROTONA AVENUE,CROTONA AVENUE,EAST 183 STREET,EAST 187 STREET,EAST 183 STREET,EAST 187 STREET,ADDRESS,BRONX,CROTONA AVENUE,Closed,The Police Department responded to the complai...,2025-06-15T04:46:02.000,06 BRONX,2031020000.0,BRONX,1016286.0,249983.0,PHONE,Unspecified,BRONX,,40.852762,-73.884198,"\n, \n(40.85276240967357, -73.8841983188242)"
3956,65257782,2025-06-14T00:45:37.000,2025-06-14T01:02:28.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10453.0,DAVIDSON AVENUE,DAVIDSON AVENUE,DAVIDSON AVENUE,WEST 176 STREET,DAVIDSON AVENUE,WEST 176 STREET,INTERSECTION,,,Closed,The Police Department responded to the complai...,2025-06-14T01:02:31.000,05 BRONX,,BRONX,1008326.0,248396.0,PHONE,Unspecified,BRONX,,40.848432,-73.912977,"\n, \n(40.84843186100232, -73.91297729395332)"
7029,65214548,2025-06-09T19:02:42.000,2025-06-09T19:23:14.000,NYPD,New York City Police Department,Noise - Residential,Loud Talking,Residential Building/House,11204.0,1 DAHL COURT,DAHL COURT,DEAD END,58 STREET,DEAD END,58 STREET,ADDRESS,BROOKLYN,DAHL COURT,Closed,The Police Department responded to the complai...,2025-06-09T19:23:17.000,12 BROOKLYN,3054940000.0,BROOKLYN,988666.0,166099.0,ONLINE,Unspecified,BROOKLYN,,40.622579,-73.984092,"\n, \n(40.6225787685612, -73.98409238126375)"
16176,65232097,2025-06-11T22:49:28.000,2025-06-11T23:28:28.000,NYPD,New York City Police Department,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10455.0,990 LEGGETT AVENUE,LEGGETT AVENUE,BECK STREET,FOX STREET,BECK STREET,FOX STREET,ADDRESS,BRONX,LEGGETT AVENUE,Closed,The Police Department responded to the complai...,2025-06-11T23:28:31.000,02 BRONX,2026840000.0,BRONX,1011965.0,236061.0,ONLINE,Unspecified,BRONX,SUV,40.814565,-73.899875,"\n, \n(40.814565178701955, -73.89987509966802)"
1334,65215816,2025-06-09T23:04:36.000,2025-06-10T00:12:20.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10460.0,985 EAST 174 STREET,EAST 174 STREET,BRYANT AVENUE,LONGFELLOW AVENUE,BRYANT AVENUE,LONGFELLOW AVENUE,ADDRESS,BRONX,EAST 174 STREET,Closed,The Police Department responded to the complai...,2025-06-10T00:12:23.000,03 BRONX,2029980000.0,BRONX,1016184.0,243988.0,MOBILE,Unspecified,BRONX,,40.836308,-73.884596,"\n, \n(40.83630827673318, -73.88459557090619)"


## Further Data Cleaning

Some findings from the above random sample of instances, and plan of action for more cleaning:
1. `created_date`: In datetime64, may need conversion.
2. `closed_date`: Not needed for our purposes. Drop it.
3. `agency` and `agency_name`: Always NYPD; probably not useful for filtering or analysis.
4. `complaint_type` and `descriptor`: Rich source of info. Good candidates for grouping/severity classification later.
5. `location_type`: Potentially interesting; might correlate with complaint_type or borough.
6. `incident_zip`: Probably not necessary if `borough` included. Will check.
7. `incident_address`, `street_name`: Useful for mapping or aggregating. Street name redundant after address. Could combine/clean.
8. `cross_street_1`, `cross_street_2`, `intersection_street_1`, `intersection_street_2`: Redundant after address. Could clean.
9. `address_type`: Could be useful. Explore further.
10. `city`, `landmark`: Consider omitting in favor of borough and address.
11. `status`, `resolution_description`: Could be useful. Explore further.
12. `resolution_action_updated_date`: Compare to created_date and closed_date to determine whether useful.
13. `community_board`, `bbl`: May or may not be useful.
15. `borough`: Definitely useful. Central for geographic analysis.
16. `x_coordinate_state_plane`, `y_coordinate_state_plane`, `latitude`, `longitude`, `location`: Probably redundant, location is messy, latitude/longitude likely best for plotting.
17. `open_data_channel_type`: Keep for now, might show behavioral trends of complaint callers.
18. `park_facility_name`: Likely just noise, consider dropping.
19. `park_borough`: Redundant after borough, drop it.
20. `vehicle_type`: Mostly full of NULLs, also irrelevant. Drop it.

In [11]:
# 1. Converting created_date to datetime for later tests
df['created_date'] = pd.to_datetime(df['created_date'], errors='coerce')

# Confirming created_date is in datetime
print(df['created_date'].dtype)
# expected: datetime64[ns]

# Looking for any non-datetime rows that didn't get parsed
bad_dates = df['created_date'].apply(lambda x: isinstance(x, str)).sum()
print(f"Non-datetime entries: {bad_dates}")

datetime64[ns]
Non-datetime entries: 0


In [12]:
# 2. Droping `closed_date` since it's only useful for response-time analysis, which we won't be doing
noise_df.drop(columns=['closed_date'], inplace=True)

In [13]:
# 3. Inspecting the variety of values in agency and agency_name
print(noise_df['agency'].value_counts(dropna=False))
print("\n")

print(noise_df['agency_name'].value_counts(dropna=False))
print("\n")

# See what complaint types go to each non-NYPD agency
print((noise_df[noise_df['agency'] != 'NYPD']).groupby('agency')['complaint_type'].value_counts())
print("\n")

# Since complaint handling agency doesn't affect our forecasting goal, we'll drop 'agency' and 'agency_name'
noise_df.drop(columns=['agency', 'agency_name'], inplace=True)

agency
NYPD    5139
DEP      326
EDC       85
Name: count, dtype: int64


agency_name
New York City Police Department           5139
Department of Environmental Protection     326
Economic Development Corporation            85
Name: count, dtype: int64


agency  complaint_type    
DEP     Noise                 326
EDC     Noise - Helicopter     85
Name: count, dtype: int64




In [14]:
# 4. Looking at complaint_type and descriptor to see what's there
display(noise_df['complaint_type'].value_counts())
print("\n")
display(noise_df['descriptor'].value_counts())

complaint_type
Noise - Residential         2467
Noise - Street/Sidewalk     1709
Noise - Commercial           493
Noise - Vehicle              349
Noise                        326
Noise - Park                 109
Noise - Helicopter            85
Noise - House of Worship      12
Name: count, dtype: int64





descriptor
Loud Music/Party                                    3317
Banging/Pounding                                     742
Loud Talking                                         682
Car/Truck Music                                      204
Noise: Construction Before/After Hours (NM1)         153
Car/Truck Horn                                        86
Other                                                 81
Engine Idling                                         72
Noise, Barking Dog (NR5)                              46
Noise: Construction Equipment (NC1)                   43
Loud Television                                       36
Noise: air condition/ventilation equipment (NV1)      31
Noise: Alarms (NR3)                                   20
Noise: Jack Hammering (NC2)                           12
Noise: Boat(Engine,Music,Etc) (NR10)                   7
Noise:  lawn care equipment (NCL)                      6
Noise, Ice Cream Truck (NR4)                           4
Noise, Other Animals

In [15]:
# 5. Looking at location_type to see what's there
display(noise_df['location_type'].value_counts(dropna=False).head(10))

location_type
Residential Building/House    2467
Street/Sidewalk               2058
NaN                            326
Store/Commercial               294
Club/Bar/Restaurant            199
Park/Playground                109
Above Address                   85
House of Worship                12
Name: count, dtype: int64

In [16]:
# Explore rows where location_type is missing (NaN)
nan_loc = noise_df[noise_df['location_type'].isna()]

print(f"Rows with NaN location_type: {len(nan_loc)}")

# How are the NaNs distributed across complaint types?
print("\nNaN location_type by complaint_type:")
display(nan_loc['complaint_type'].value_counts())

# Peek at a few examples
display(nan_loc[['created_date', 'borough', 'complaint_type', 'descriptor']].head(15))

Rows with NaN location_type: 326

NaN location_type by complaint_type:


complaint_type
Noise    326
Name: count, dtype: int64

Unnamed: 0,created_date,borough,complaint_type,descriptor
42,2025-06-13T04:17:00.000,MANHATTAN,Noise,Noise: Construction Before/After Hours (NM1)
83,2025-06-10T07:50:00.000,MANHATTAN,Noise,Noise: Construction Equipment (NC1)
95,2025-06-07T20:35:00.000,MANHATTAN,Noise,Noise: Alarms (NR3)
160,2025-06-12T23:47:00.000,BRONX,Noise,Noise: Construction Equipment (NC1)
235,2025-06-14T17:29:00.000,BRONX,Noise,Noise: air condition/ventilation equipment (NV1)
302,2025-06-16T21:41:00.000,BRONX,Noise,Noise: Construction Before/After Hours (NM1)
336,2025-06-12T19:59:00.000,MANHATTAN,Noise,Noise: Alarms (NR3)
360,2025-06-12T18:05:00.000,MANHATTAN,Noise,Noise: air condition/ventilation equipment (NV1)
396,2025-06-11T07:14:00.000,QUEENS,Noise,Noise: Construction Equipment (NC1)
414,2025-06-07T12:11:00.000,BROOKLYN,Noise,Noise: Construction Before/After Hours (NM1)


In the above sample of instances, we see that there are NaN values in the `location_type` column, which all fall under wherever `complaint_type` is just "Noise." We'll deal with this in a bit. First, we'll compare `location_type` and `complaint_type` to check for any overlapping information that may be redundant, as well as any information that is found in one column but not the other.

In [17]:
# Running a cross-tab to check overlap between complaint_type and location_type
ctab = pd.crosstab(noise_df['complaint_type'], noise_df['location_type'], normalize='index')  # row-wise %
display(ctab.round(2))

location_type,Above Address,Club/Bar/Restaurant,House of Worship,Park/Playground,Residential Building/House,Store/Commercial,Street/Sidewalk
complaint_type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Noise - Commercial,0.0,0.4,0.0,0.0,0.0,0.6,0.0
Noise - Helicopter,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Noise - House of Worship,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Noise - Park,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Noise - Residential,0.0,0.0,0.0,0.0,1.0,0.0,0.0
Noise - Street/Sidewalk,0.0,0.0,0.0,0.0,0.0,0.0,1.0
Noise - Vehicle,0.0,0.0,0.0,0.0,0.0,0.0,1.0


Indeed, notice that the above table shows a lot of overlap between `complaint_type` and `location_type`. We will clean this up for ease of use. Some notes on how we'll tackle that:
1. I find "complaint_type" more explanatory than "location_type" so I'd like to remove "location_type" if possible.
2. Before I remove "location_type," I want to separate out the "Noise - Commercial" complaint_type into "Club/Bar/Restaurant" and "Store/Commercial" which are both varieties of location_type that I'd like to keep separate for further study.
3. Drop location_type all together.
4. Rename "Noise" to "Noise - Unspecified." This is where those NaN values for location_type were found. We've dropped location_type, so the NaNs are irrelevant, and it's safe to say we don't need to fill them as the type of noise was never specified in the data.
5. Finally, I'll remove "Noise - " as a prefix because it's now useless.

In [18]:
# Make a copy to avoid SettingWithCopyWarning
noise_df = noise_df.copy()

noise_df.loc[
    (noise_df['complaint_type'] == 'Noise - Commercial') & 
    (noise_df['location_type'] == 'Club/Bar/Restaurant'), 
    'complaint_type'
] = 'Noise - Bar/Club/Restaurant'

noise_df.loc[
    (noise_df['complaint_type'] == 'Noise - Commercial') & 
    (noise_df['location_type'] == 'Store/Commercial'), 
    'complaint_type'
] = 'Noise - Store/Commercial'

# Now drop location_type
noise_df.drop(columns=['location_type'], inplace=True)

# Replace "Noise" with "Noise - Unspecified" and remove "Noise - " from prefixes of all values
noise_df['complaint_type'] = noise_df['complaint_type'].replace('Noise', 'Noise - Unspecified')
noise_df['complaint_type'] = noise_df['complaint_type'].str.replace('Noise - ', '', regex=False)

# Print all unique complaint types line by line
for complaint in sorted(noise_df['complaint_type'].unique()):
    print(complaint)

Bar/Club/Restaurant
Helicopter
House of Worship
Park
Residential
Store/Commercial
Street/Sidewalk
Unspecified
Vehicle


In [19]:
# 6. incident_zip: Probably not necessary if borough included. Will check.


In [20]:
# 7. incident_address, street_name: Useful for mapping or aggregating. Street name redundant after address. Could combine/clean.


In [21]:
# 8. cross_street_1, cross_street_2, intersection_street_1, intersection_street_2: Redundant after address. Could clean.


In [22]:
# 9. address_type: Could be useful. Explore further.


In [23]:
# 10. city, landmark: Consider omitting in favor of borough and address.


In [24]:
# 11. status, resolution_description: Could be useful. Explore further.


In [25]:
# 12. resolution_action_updated_date: Compare to created_date and closed_date to determine whether useful.


In [26]:
# 13. community_board, bbl: May or may not be useful.


In [27]:
# 14. borough: Definitely useful. Central for geographic analysis.


In [28]:
# 15. x_coordinate_state_plane, y_coordinate_state_plane, latitude, longitude, location: Probably redundant, location is messy,
# latitude/longitude likely best for plotting.


In [29]:
# 16. open_data_channel_type: Keep for now, might show behavioral trends of complaint callers.


In [30]:
# 17. park_facility_name: Likely just noise, consider dropping.


In [31]:
# 18. park_borough: Redundant after borough, drop it.


In [32]:
# 19. vehicle_type: Mostly full of NULLs, also irrelevant. Drop it.


In [33]:
# Taking a look to ensure everything looks good
noise_df.head(20)

Unnamed: 0,unique_key,created_date,complaint_type,descriptor,incident_zip,incident_address,street_name,cross_street_1,cross_street_2,intersection_street_1,...,borough,x_coordinate_state_plane,y_coordinate_state_plane,open_data_channel_type,park_facility_name,park_borough,vehicle_type,latitude,longitude,location
1,65202348,2025-06-08T23:48:33.000,Residential,Banging/Pounding,10453.0,40 RICHMOND PLAZA,RICHMOND PLAZA,DEAD END,HARLEM RIVER PARK BRIDGE,DEAD END,...,BRONX,1005738.0,249765.0,PHONE,Unspecified,BRONX,,40.852196,-73.922327,"\n, \n(40.8521960366985, -73.92232722755996)"
2,65271096,2025-06-15T15:18:38.000,Street/Sidewalk,Loud Music/Party,10034.0,229 SEAMAN AVENUE,SEAMAN AVENUE,WEST 214 STREET,WEST 215 STREET,WEST 214 STREET,...,MANHATTAN,1006680.0,256704.0,MOBILE,Unspecified,MANHATTAN,,40.871239,-73.918899,"\n, \n(40.87123914834134, -73.91889895540905)"
7,65204700,2025-06-08T22:17:27.000,Street/Sidewalk,Loud Music/Party,10031.0,534 WEST 153 STREET,WEST 153 STREET,AMSTERDAM AVENUE,BROADWAY,AMSTERDAM AVENUE,...,MANHATTAN,999469.0,241935.0,MOBILE,Unspecified,MANHATTAN,,40.830718,-73.945006,"\n, \n(40.83071800761314, -73.94500557250639)"
11,65250887,2025-06-12T21:43:18.000,Residential,Banging/Pounding,10472.0,1040 ROSEDALE AVENUE,ROSEDALE AVENUE,BRUCKNER BOULEVARD,WATSON AVENUE,BRUCKNER BOULEVARD,...,BRONX,1020818.0,240247.0,ONLINE,Unspecified,BRONX,,40.826022,-73.867869,"\n, \n(40.82602234487356, -73.86786944351837)"
14,65277378,2025-06-15T13:38:52.000,Store/Commercial,Loud Music/Party,11101.0,43-40 NORTHERN BOULEVARD,NORTHERN BOULEVARD,43 STREET,35 AVENUE,43 STREET,...,QUEENS,1006502.0,213793.0,ONLINE,Unspecified,QUEENS,,40.753461,-73.919685,"\n, \n(40.753460924751224, -73.91968481352149)"
15,65191268,2025-06-08T01:56:52.000,Bar/Club/Restaurant,Loud Music/Party,11354.0,137-72 NORTHERN BOULEVARD,NORTHERN BOULEVARD,LEAVITT STREET,UNION STREET,LEAVITT STREET,...,QUEENS,1031725.0,217601.0,PHONE,Unspecified,QUEENS,,40.763813,-73.82862,"\n, \n(40.7638134402157, -73.82861953775803)"
16,65290763,2025-06-18T00:11:33.000,Street/Sidewalk,Loud Talking,11237.0,44 WILSON AVENUE,WILSON AVENUE,GEORGE STREET,MELROSE STREET,GEORGE STREET,...,BROOKLYN,1003829.0,195325.0,ONLINE,Unspecified,BROOKLYN,,40.702777,-73.929386,"\n, \n(40.70277711007218, -73.92938632693719)"
24,65245872,2025-06-12T23:02:13.000,Residential,Banging/Pounding,10314.0,44 ANJALI LOOP,ANJALI LOOP,BEND,BEND,BEND,...,STATEN ISLAND,943065.0,153927.0,ONLINE,Unspecified,STATEN ISLAND,,40.589075,-74.148285,"\n, \n(40.58907453407397, -74.1482850736618)"
26,65193163,2025-06-08T02:08:34.000,Street/Sidewalk,Loud Music/Party,10452.0,1416 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 171 STREET,BOSCOBEL PLACE,WEST 171 STREET,...,BRONX,1004868.0,246731.0,MOBILE,Unspecified,BRONX,,40.843871,-73.925481,"\n, \n(40.843870679050376, -73.92548134860958)"
27,65191365,2025-06-07T12:50:52.000,Residential,Loud Music/Party,11206.0,54 MELROSE STREET,MELROSE STREET,BROADWAY,BUSHWICK AVENUE,BROADWAY,...,BROOKLYN,1002266.0,193732.0,ONLINE,Unspecified,BROOKLYN,,40.698408,-73.935028,"\n, \n(40.69840801910899, -73.9350277060994)"


## Early Visualizations
Following our more extensive data exploration and cleaning seen above, we will build some preliminary data visualizations to increase our understanding of where the noise is concentrated in space and time/seasonality.

We'll start with bar graphs showing complaint type by Borough.

In [34]:
# 1. Bar graph showing complaint_type for Brooklyn


In [35]:
# 2. Bar graph showing complaint_type for The Bronx


In [36]:
# 3. Bar graph showing complaint_type for Manhattan


In [37]:
# 4. Bar graph showing complaint_type for Queens


In [38]:
# 5. Bar graph showing complaint_type for Staten Island


In [39]:
# 6. Bar graph showing descriptor


## Time-Based Trends
Time based trends description

## Forecasting and Modeling
Forecasting and modeling description

## Conclusions and Next Steps
Forecasting and modeling description