# Analysis of the NYC 311 Dataset

The NYC 311 dataset contains records of non-emergency service requests made by residents of New York City. These complaints are submitted via phone, web, or mobile app and include issues like noise complaints, illegal parking, blocked driveways, and more.

Each record typically includes:
- The type of complaint
- Date and time the complaint was created and closed
- The location of the incident (ZIP code, street address)
- The agency responsible
- Status and resolution details

You will use your knowledge of Pandas to do the following activity.

### Step 1: Load and Inspect the NYC 311 Dataset

In this step, you'll load the NYC 311 complaint dataset from CSV and explore its structure.

- Call the DataFrame `df_311`
- Check how many rows and columns it contains.
- View the first 5 rows to understand the kind of data you're working with.
- List all column names.
- **Pay special attention to the date columns** (e.g., `Created Date`, `Closed Date`, etc.) and observe any formatting inconsistencies.

**Question:**  
What kinds of issues do you notice in the formatting of the date columns?

In [2]:
# Your code here
import pandas as pd
df_311 = pd.read_csv(r'C:\MDA\2025\Term_1\Python for data Analysis (CPSC 610-2)\Week_6\NYC311data_cleaned 1.csv')
df_311.info()
df_311.isnull().sum()
df_311.head()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89396 entries, 0 to 89395
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Unique Key                      89396 non-null  int64 
 1   Created Date                    89396 non-null  object
 2   Closed Date                     89381 non-null  object
 3   Complaint Type                  89396 non-null  object
 4   Descriptor                      87544 non-null  object
 5   Location Type                   89352 non-null  object
 6   Incident Zip                    89396 non-null  int64 
 7   Incident Address                76111 non-null  object
 8   Street Name                     76111 non-null  object
 9   Cross Street 1                  75267 non-null  object
 10  Cross Street 2                  75260 non-null  object
 11  Address Type                    89334 non-null  object
 12  Status                          89396 non-null

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
0,31685705,10-05-15 22:45,10-06-15 2:09,Blocked Driveway,No Access,Street/Sidewalk,11373,43-31 ELBERTSON STREET,ELBERTSON STREET,LAMONT AVENUE,43 AVENUE,ADDRESS,Closed,10-06-15 6:45,The Police Department responded to the complai...,10-06-15 2:09,04 QUEENS
1,31426484,08/30/2015 09:04:52 PM,08/30/2015 11:53:46 PM,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10030,100 WEST 141 STREET,WEST 141 STREET,LENOX AVENUE,7 AVENUE,ADDRESS,Closed,08/31/2015 05:04:52 AM,The Police Department responded to the complai...,08/30/2015 11:53:46 PM,10 MANHATTAN
2,31473909,09-06-15 23:56,09-07-15 7:30,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,73 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,WEST 204 STREET,ADDRESS,Closed,09-07-15 7:56,The Police Department responded to the complai...,09-07-15 7:30,12 MANHATTAN
3,31530153,09/14/2015 11:05:21 AM,09/14/2015 12:52:54 PM,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,,INTERSECTION,Closed,09/14/2015 07:05:21 PM,The Police Department responded and upon arriv...,09/14/2015 12:52:54 PM,05 QUEENS
4,31562497,09/18/2015 09:53:11 PM,09/19/2015 04:34:07 AM,Derelict Vehicle,With License Plate,Street/Sidewalk,11235,611 BANNER AVENUE,BANNER AVENUE,BRIGHTON 6 STREET,BRIGHTON 7 STREET,ADDRESS,Closed,09/19/2015 05:53:11 AM,The Police Department responded to the complai...,09/19/2015 04:34:07 AM,13 BROOKLYN


### Step 2: Convert Date Columns to Datetime

In this step, you'll convert the following columns to proper datetime format:

- `Created Date`
- `Closed Date`
- `Due Date`
- `Resolution Action Updated Date`

Since the dataset contains **mixed date formats**, you should **not specify a date format**. Instead, use `errors='coerce'` to safely handle problematic rows (they will become `NaT`).

You may get warnings during this step — **you may ignore them**. The warning may ask you to specify a format, but since there are different kinds of date/time formats in those columns, letting Pandas to fix them is the most robust method.

In [8]:
# Your code here
df_311['Created Date'] = pd.to_datetime(df_311['Created Date'], errors='coerce')
df_311['Closed Date'] = pd.to_datetime(df_311['Closed Date'], errors='coerce')
df_311['Due Date'] = pd.to_datetime(df_311['Due Date'], errors='coerce')
df_311['Resolution Action Updated Date'] = pd.to_datetime(df_311['Resolution Action Updated Date'], errors='coerce')
df_311.info()

  df_311['Created Date'] = pd.to_datetime(df_311['Created Date'], errors='coerce')
  df_311['Closed Date'] = pd.to_datetime(df_311['Closed Date'], errors='coerce')
  df_311['Due Date'] = pd.to_datetime(df_311['Due Date'], errors='coerce')
  df_311['Resolution Action Updated Date'] = pd.to_datetime(df_311['Resolution Action Updated Date'], errors='coerce')


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89396 entries, 0 to 89395
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   Unique Key                      89396 non-null  int64         
 1   Created Date                    89396 non-null  datetime64[ns]
 2   Closed Date                     89381 non-null  datetime64[ns]
 3   Complaint Type                  89396 non-null  object        
 4   Descriptor                      87544 non-null  object        
 5   Location Type                   89352 non-null  object        
 6   Incident Zip                    89396 non-null  int64         
 7   Incident Address                76111 non-null  object        
 8   Street Name                     76111 non-null  object        
 9   Cross Street 1                  75267 non-null  object        
 10  Cross Street 2                  75260 non-null  object        
 11  Ad

### Step 3: Sorting Practice Questions to Explore `df_311`

Use the dataset `df_311` to practice various **sorting techniques**.

1. **Basic sorting (ascending)**:  
   - Sort the DataFrame by `Created Date` in ascending order and show the first 5 rows.

2. **Descending sort**:  
   - Sort by `Closed Date` in descending order and display the top 5 complaints with the latest closing times.

3. **Sorting by multiple columns**:  
   - Sort first by `Complaint Type` (A–Z), then by `Created Date` (newest first).

4. **Sorting with a custom function**:  
   - Sort complaints by the **length** of the `Descriptor` column, longest first.

5. **In-place sorting**:  
   - Sort the DataFrame by `Due Date` in-place, then show the last 5 rows.

In [17]:
# 1. Sort the DataFrame by Created Date in ascending order and show the first 5 rows.
# Your code here
df_311_sorted = df_311.sort_values(by='Created Date')
df_311_sorted.head()

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
87211,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,ELLERY STREET,ADDRESS,Closed,2015-03-29 08:33:03,The Police Department responded to the complai...,2015-03-29 03:40:20,03 BROOKLYN
34299,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,MALCOLM X BOULEVARD,ADDRESS,Closed,2015-03-29 08:35:28,The Police Department responded to the complai...,2015-03-29 04:14:27,03 BROOKLYN
50645,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,WEST 14 STREET,ADDRESS,Closed,2015-03-29 08:37:15,The Police Department reviewed your complaint ...,2015-03-29 01:02:39,02 MANHATTAN
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN


In [18]:
# 2. Sort by Closed Date in descending order and display the top 5 complaints with the latest closing times.
# Your code here
df_311_closed_sorted = df_311.sort_values(by='Closed Date', ascending=False)
df_311_closed_sorted.head()

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
6831,32308423,2015-12-31 23:31:40,2016-01-03 16:22:00,Blocked Driveway,No Access,Street/Sidewalk,10467,3025 WALLACE AVENUE,WALLACE AVENUE,ADEE AVENUE,BURKE AVENUE,ADDRESS,Closed,2016-01-01 07:31:00,The Police Department responded to the complai...,2016-01-03 16:22:00,12 BRONX
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX
59869,32310624,2015-12-31 18:22:08,2016-01-01 08:27:00,Blocked Driveway,Partial Access,Street/Sidewalk,11416,97-46 77 STREET,77 STREET,97 AVENUE,101 AVENUE,ADDRESS,Closed,2016-01-01 02:22:00,The Police Department responded and upon arriv...,2016-01-01 08:27:00,09 QUEENS
17264,32306268,2015-12-31 20:02:03,2016-01-01 07:47:00,Blocked Driveway,No Access,Street/Sidewalk,10461,1659 WILLIAMSBRIDGE ROAD,WILLIAMSBRIDGE ROAD,PIERCE AVENUE,VAN NEST AVENUE,ADDRESS,Closed,2016-01-01 04:02:00,The Police Department responded to the complai...,2016-01-01 07:47:00,11 BRONX
2304,32308708,2015-12-31 21:43:01,2016-01-01 07:44:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,11220,167 SENATOR STREET,SENATOR STREET,COLONIAL ROAD,RIDGE BOULEVARD,ADDRESS,Closed,2016-01-01 05:43:00,The Police Department responded and upon arriv...,2016-01-01 07:44:00,10 BROOKLYN


In [22]:
# 3. Sort first by Complaint Type (A–Z), then by Created Date (newest first).
# Your code here
df_311_sorted_complaint = df_311.sort_values(by=['Complaint Type', 'Created Date'], ascending=[True, False])
df_311_sorted_complaint.head()


Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
6937,32306470,2015-12-31 22:38:04,2015-12-31 23:04:14,Animal Abuse,Tortured,Residential Building/House,11427,89-27 218 STREET,218 STREET,89 AVENUE,90 AVENUE,ADDRESS,Closed,2016-01-01 06:38:00,The Police Department responded to the complai...,2015-12-31 23:04:14,13 QUEENS
4650,32309522,2015-12-31 21:22:54,2015-12-31 22:33:00,Animal Abuse,Neglected,Residential Building/House,11415,84-87 129 STREET,129 STREET,KEW GARDENS ROAD,METROPOLITAN AVENUE,ADDRESS,Closed,2016-01-01 05:22:00,The Police Department responded to the complai...,2015-12-31 22:33:00,09 QUEENS
66899,32310128,2015-12-31 16:59:34,2015-12-31 21:14:28,Animal Abuse,Neglected,Residential Building/House,11416,102-04 89 STREET,89 STREET,102 AVENUE,102 ROAD,ADDRESS,Closed,2016-01-01 00:59:00,The Police Department responded to the complai...,2015-12-31 21:14:28,09 QUEENS
36331,32305067,2015-12-31 14:08:45,2015-12-31 15:02:57,Animal Abuse,Neglected,Residential Building/House,10472,1327 STRATFORD AVENUE,STRATFORD AVENUE,EAST 172 STREET,EAST 174 STREET,ADDRESS,Closed,2015-12-31 22:08:45,The Police Department responded to the complai...,2015-12-31 15:02:57,09 BRONX
82916,32310105,2015-12-31 13:54:38,2015-12-31 14:38:22,Animal Abuse,Other (complaint details),Street/Sidewalk,10023,BROADWAY,BROADWAY,WEST 73 STREET,WEST 74 STREET,BLOCKFACE,Closed,2015-12-31 21:54:38,The Police Department responded to the complai...,2015-12-31 14:38:22,07 MANHATTAN


In [12]:
# 4. Sort complaints by the length of the Descriptor column, longest first.
df_311_sorted_descriptor = df_311.sort_values(by='Descriptor', ascending=False)
df_311_sorted_descriptor.head()

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
43,31081303,07/15/2015 12:34:04 PM,07/15/2015 03:02:35 PM,Illegal Parking,30.0,Street/Sidewalk,10462,1190 COMMERCE AVENUE,COMMERCE AVENUE,ELLIS AVENUE,NEWBOLD AVENUE,ADDRESS,Closed,07/15/2015 08:34:04 PM,The Police Department responded to the complai...,07/15/2015 03:02:35 PM,09 BRONX
79990,32172945,12-09-15 10:18,12-09-15 16:00,Illegal Parking,30.0,Street/Sidewalk,11001,255-33 JERICHO TURNPIKE,JERICHO TURNPIKE,CITY LIMIT,256 STREET,ADDRESS,Closed,12-09-15 18:18,The Police Department responded to the complai...,12-09-15 15:59,13 QUEENS
79995,32142307,12-06-15 20:18,12-06-15 23:19,Illegal Parking,30.0,Street/Sidewalk,11203,,,,,INTERSECTION,Closed,12-07-15 4:18,The Police Department responded and upon arriv...,12-06-15 23:19,17 BROOKLYN
89368,31263274,08-07-15 19:02,08-07-15 21:04,Illegal Parking,30.0,Street/Sidewalk,11234,,,,,INTERSECTION,Closed,08-08-15 3:02,The Police Department responded and upon arriv...,08-07-15 21:04,18 BROOKLYN
9615,30981004,07-01-15 15:30,07-01-15 18:52,Illegal Parking,30.0,Street/Sidewalk,11206,672 PARK AVENUE,PARK AVENUE,MARCY AVENUE,TOMPKINS AVENUE,ADDRESS,Closed,07-01-15 23:30,The Police Department responded and upon arriv...,07-01-15 18:52,03 BROOKLYN


In [31]:
# 5. Sort the DataFrame by Due Date in-place, then show the last 5 rows.
# Your code here
df_311.sort_values(by='Due Date', inplace=True)
df_311.tail()

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX
77179,32305071,2015-12-31 23:52:58,2016-01-01 07:41:00,Blocked Driveway,No Access,Street/Sidewalk,11372,34-06 73 STREET,73 STREET,34 AVENUE,35 AVENUE,ADDRESS,Closed,2016-01-01 07:52:00,The Police Department responded and upon arriv...,2016-01-01 07:41:00,03 QUEENS
41975,32306559,2015-12-31 23:55:32,2016-01-01 01:53:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10032,524 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,ADDRESS,Closed,2016-01-01 07:55:00,The Police Department issued a summons in resp...,2016-01-01 01:53:00,12 MANHATTAN
80818,32306529,2015-12-31 23:56:58,2016-01-01 03:24:00,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11373,87-14 57 ROAD,57 ROAD,SEABURY STREET,HOFFMAN DRIVE,ADDRESS,Closed,2016-01-01 07:56:00,The Police Department responded and upon arriv...,2016-01-01 03:24:00,04 QUEENS
15623,32310363,2015-12-31 23:59:45,2016-01-01 00:55:00,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,71 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,WEST 204 STREET,ADDRESS,Closed,2016-01-01 07:59:00,The Police Department responded and upon arriv...,2016-01-01 00:55:00,12 MANHATTAN


### Step 4: Filtering Practice Questions to Explore `df_311`

Use the dataset `df_311` to answer the following questions by applying **filtering techniques** covered in this module.

1. **Boolean indexing**:  
   - Show all rows where the status is `"Closed"`.

2. **isin() for multiple values**:  
   - Show complaints where the complaint type is either `"Illegal Parking"`, `"Noise - Street/Sidewalk"`, or `"Blocked Driveway"`.

3. **Multiple conditions**:  
   - Show `"Blocked Driveway"` complaints where the incident ZIP is either `10007` or `10307`.

4. **String method – startswith()**:  
   - Find all complaints where the street name starts with the letter `'E'`.

5. **iloc[]**:  
   - Display the first 3 rows and the 2nd to 5th columns using integer-based indexing.

6. **loc[]**:  
   - Use label-based selection to show `Complaint Type`, `Created Date`, and `Status` for ZIP code `11373`.

7. **between()**:  
   - Filter complaints where the `Created Date` falls between **October 1, 2015** and **October 2, 2015**.


In [32]:
# 1. Show all rows where the status is "Closed".
# Your code here
df_311[df_311['Status'] == 'Closed']


Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
87211,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,ELLERY STREET,ADDRESS,Closed,2015-03-29 08:33:03,The Police Department responded to the complai...,2015-03-29 03:40:20,03 BROOKLYN
34299,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,MALCOLM X BOULEVARD,ADDRESS,Closed,2015-03-29 08:35:28,The Police Department responded to the complai...,2015-03-29 04:14:27,03 BROOKLYN
50645,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,WEST 14 STREET,ADDRESS,Closed,2015-03-29 08:37:15,The Police Department reviewed your complaint ...,2015-03-29 01:02:39,02 MANHATTAN
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX
77179,32305071,2015-12-31 23:52:58,2016-01-01 07:41:00,Blocked Driveway,No Access,Street/Sidewalk,11372,34-06 73 STREET,73 STREET,34 AVENUE,35 AVENUE,ADDRESS,Closed,2016-01-01 07:52:00,The Police Department responded and upon arriv...,2016-01-01 07:41:00,03 QUEENS
41975,32306559,2015-12-31 23:55:32,2016-01-01 01:53:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10032,524 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,ADDRESS,Closed,2016-01-01 07:55:00,The Police Department issued a summons in resp...,2016-01-01 01:53:00,12 MANHATTAN
80818,32306529,2015-12-31 23:56:58,2016-01-01 03:24:00,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11373,87-14 57 ROAD,57 ROAD,SEABURY STREET,HOFFMAN DRIVE,ADDRESS,Closed,2016-01-01 07:56:00,The Police Department responded and upon arriv...,2016-01-01 03:24:00,04 QUEENS


In [33]:
# 2. Show complaints where the complaint type is either "Illegal Parking", "Noise - Street/Sidewalk", or "Blocked Driveway".
# Your code here
df_311[df_311['Complaint Type'].isin(['Illegal Parking', 'Noise - Street/Sidewalk', 'Blocked Driveway'])]

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
34299,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,MALCOLM X BOULEVARD,ADDRESS,Closed,2015-03-29 08:35:28,The Police Department responded to the complai...,2015-03-29 04:14:27,03 BROOKLYN
85595,30281254,2015-03-29 00:49:51,2015-03-29 02:02:21,Blocked Driveway,No Access,Street/Sidewalk,11385,1912 GROVE STREET,GROVE STREET,WOODWARD AVENUE,ST JOHNS ROAD,ADDRESS,Closed,2015-03-29 08:49:51,The Police Department responded and upon arriv...,2015-03-29 02:02:21,05 QUEENS
48736,30283901,2015-03-29 00:58:08,2015-03-29 04:44:02,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10019,446 WEST 49 STREET,WEST 49 STREET,9 AVENUE,10 AVENUE,ADDRESS,Closed,2015-03-29 08:58:08,The Police Department responded to the complai...,2015-03-29 04:44:02,04 MANHATTAN
11022,30279563,2015-03-29 01:04:01,2015-03-29 01:23:04,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10014,,,,,INTERSECTION,Closed,2015-03-29 09:04:01,The Police Department responded to the complai...,2015-03-29 01:23:04,02 MANHATTAN
79089,30280534,2015-03-29 01:15:51,2015-03-29 02:03:34,Blocked Driveway,No Access,Street/Sidewalk,11358,198-18 32 AVENUE,32 AVENUE,FRANCIS LEWIS BOULEVARD,JORDAN STREET,ADDRESS,Closed,2015-03-29 09:15:51,The Police Department responded to the complai...,2015-03-29 02:03:34,11 QUEENS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX
77179,32305071,2015-12-31 23:52:58,2016-01-01 07:41:00,Blocked Driveway,No Access,Street/Sidewalk,11372,34-06 73 STREET,73 STREET,34 AVENUE,35 AVENUE,ADDRESS,Closed,2016-01-01 07:52:00,The Police Department responded and upon arriv...,2016-01-01 07:41:00,03 QUEENS
41975,32306559,2015-12-31 23:55:32,2016-01-01 01:53:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10032,524 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,ADDRESS,Closed,2016-01-01 07:55:00,The Police Department issued a summons in resp...,2016-01-01 01:53:00,12 MANHATTAN
80818,32306529,2015-12-31 23:56:58,2016-01-01 03:24:00,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11373,87-14 57 ROAD,57 ROAD,SEABURY STREET,HOFFMAN DRIVE,ADDRESS,Closed,2016-01-01 07:56:00,The Police Department responded and upon arriv...,2016-01-01 03:24:00,04 QUEENS


In [34]:
# 3. Show "Blocked Driveway" complaints where the incident ZIP is either 10007 or 10307.
# Your code here
df_311[(df_311['Complaint Type'] == 'Blocked Driveway') & (df_311['Incident Zip'].isin([10007, 10307]))]

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
84929,30741450,2015-05-31 20:59:20,2015-05-31 22:38:13,Blocked Driveway,No Access,Street/Sidewalk,10307,17 SEACREST LANE,SEACREST LANE,SPRAGUE AVENUE,SUNSET LANE,ADDRESS,Closed,2015-06-01 04:59:00,The Police Department responded to the complai...,2015-05-31 22:38:13,03 STATEN ISLAND
78229,31158664,2015-07-25 18:30:12,2015-07-25 20:14:54,Blocked Driveway,No Access,Street/Sidewalk,10307,7336 AMBOY ROAD,AMBOY ROAD,FISHER AVENUE,WOOD AVENUE,ADDRESS,Closed,2015-07-26 02:30:12,The Police Department responded and upon arriv...,2015-07-25 20:14:54,03 STATEN ISLAND
53690,31519452,2015-09-12 21:22:00,2015-09-12 22:39:00,Blocked Driveway,No Access,Street/Sidewalk,10307,243 CHELSEA STREET,CHELSEA STREET,HYLAN BOULEVARD,CLERMONT AVENUE,ADDRESS,Closed,2015-09-13 05:22:15,The Police Department responded to the complai...,2015-09-12 22:39:00,03 STATEN ISLAND
61493,31693876,2015-10-06 14:25:00,2015-10-06 23:12:00,Blocked Driveway,No Access,Street/Sidewalk,10007,71 READE STREET,READE STREET,BROADWAY,CHURCH STREET,ADDRESS,Closed,2015-10-06 22:25:00,The Police Department responded to the complai...,2015-10-06 23:12:00,01 MANHATTAN
54978,31713260,2015-10-08 19:13:00,2015-10-08 23:39:00,Blocked Driveway,No Access,Street/Sidewalk,10007,77 READE STREET,READE STREET,BROADWAY,CHURCH STREET,ADDRESS,Closed,2015-10-09 03:13:00,The Police Department responded and upon arriv...,2015-10-08 23:39:00,01 MANHATTAN
47645,31884584,2015-10-31 23:45:30,2015-11-01 08:17:00,Blocked Driveway,No Access,Street/Sidewalk,10307,41 HALE STREET,HALE STREET,AMBOY ROAD,LENHART STREET,ADDRESS,Closed,2015-11-01 07:45:00,The Police Department issued a summons in resp...,2015-11-01 08:17:00,03 STATEN ISLAND
61211,32017774,2015-11-18 14:33:03,2015-11-18 14:49:44,Blocked Driveway,No Access,Street/Sidewalk,10007,8 WARREN STREET,WARREN STREET,BROADWAY,CHURCH STREET,ADDRESS,Closed,2015-11-18 22:33:03,The Police Department responded to the complai...,2015-11-18 14:49:44,01 MANHATTAN
13199,32038599,2015-11-21 10:29:47,2015-11-21 12:52:30,Blocked Driveway,Partial Access,Street/Sidewalk,10307,626 CRAIG AVENUE,CRAIG AVENUE,AMBOY ROAD,SUMMIT ROAD,ADDRESS,Closed,2015-11-21 18:29:47,The Police Department responded and upon arriv...,2015-11-21 12:52:30,03 STATEN ISLAND
82123,32131313,2015-12-04 14:23:00,2015-12-04 19:53:00,Blocked Driveway,Partial Access,Street/Sidewalk,10007,101 BARCLAY STREET,BARCLAY STREET,GREENWICH STREET,WASHINGTON STREET,ADDRESS,Closed,2015-12-04 22:23:00,The Police Department responded and upon arriv...,2015-12-04 19:53:00,01 MANHATTAN


In [35]:
# 4. Find all complaints where the street name starts with the letter 'E'.
# Your code here
df_311[df_311['Street Name'].str.startswith('E', na=False)]

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN
72193,30280817,2015-03-29 00:57:25,2015-03-29 02:25:31,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11203,196 EAST 51 STREET,EAST 51 STREET,WINTHROP STREET,CLARKSON AVENUE,ADDRESS,Closed,2015-03-29 08:57:25,The Police Department responded to the complai...,2015-03-29 02:25:31,17 BROOKLYN
8400,30281419,2015-03-29 01:12:31,2015-03-29 01:47:54,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10003,100 EAST 20 STREET,EAST 20 STREET,PARK AVENUE SOUTH,GRAMERCY PARK,ADDRESS,Closed,2015-03-29 09:12:31,Your request can not be processed at this time...,2015-03-29 01:47:54,06 MANHATTAN
83905,30280099,2015-03-29 01:33:12,2015-03-29 04:50:25,Noise - Vehicle,Car/Truck Horn,Street/Sidewalk,10029,169 EAST EAST 111 STREET,EAST 111 STREET,LEXINGTON AVENUE,3 AVENUE,ADDRESS,Closed,2015-03-29 09:33:12,The Police Department issued a summons in resp...,2015-03-29 04:50:25,11 MANHATTAN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6135,32305164,2015-12-31 13:40:53,2015-12-31 19:37:52,Noise - Commercial,Loud Music/Party,Store/Commercial,10029,112 EAST 116 ST-MARIN BOULEVARD,EAST 116 ST-MARIN BOULEVARD,PARK AVENUE,LEXINGTON AVENUE,ADDRESS,Closed,2015-12-31 21:40:53,The Police Department responded to the complai...,2015-12-31 19:37:52,11 MANHATTAN
85438,32307008,2015-12-31 14:43:17,2015-12-31 15:40:34,Blocked Driveway,No Access,Street/Sidewalk,11213,1231 EASTERN PARKWAY,EASTERN PARKWAY,ROCHESTER AVENUE,BUFFALO AVENUE,ADDRESS,Closed,2015-12-31 22:43:17,The Police Department responded and upon arriv...,2015-12-31 15:40:34,08 BROOKLYN
34787,32306573,2015-12-31 17:08:07,2015-12-31 18:57:26,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11210,622 EAST 28 STREET,EAST 28 STREET,FARRAGUT ROAD,FLATBUSH AVENUE,ADDRESS,Closed,2016-01-01 01:08:00,The Police Department responded to the complai...,2015-12-31 18:57:26,14 BROOKLYN
83902,32308422,2015-12-31 21:12:43,2015-12-31 22:50:08,Blocked Driveway,No Access,Street/Sidewalk,11236,1319 EAST 85 STREET,EAST 85 STREET,AVENUE M,AVENUE N,ADDRESS,Closed,2016-01-01 05:12:00,The Police Department responded and upon arriv...,2015-12-31 22:50:08,18 BROOKLYN


In [4]:
# 5. Display the first 3 rows and the 2nd to 5th columns using integer-based indexing.
# Your code here
df_311.iloc[0:3, 1:5]


Unnamed: 0,Created Date,Closed Date,Complaint Type,Descriptor
0,10-05-15 22:45,10-06-15 2:09,Blocked Driveway,No Access
1,08/30/2015 09:04:52 PM,08/30/2015 11:53:46 PM,Noise - Street/Sidewalk,Loud Talking
2,09-06-15 23:56,09-07-15 7:30,Noise - Street/Sidewalk,Loud Music/Party


In [5]:
# 6. Show Complaint Type, Created Date, and Status for ZIP code 11373 using label-based selection.
# Your code here
df_311.loc[df_311['Incident Zip'] == 11373, ['Complaint Type', 'Created Date', 'Status']]


Unnamed: 0,Complaint Type,Created Date,Status
0,Blocked Driveway,10-05-15 22:45,Closed
99,Illegal Parking,05/21/2015 11:50:19 PM,Closed
573,Blocked Driveway,09-08-15 1:45,Closed
767,Animal Abuse,07/29/2015 10:36:21 AM,Closed
973,Homeless Encampment,08/24/2015 04:38:38 PM,Closed
...,...,...,...
88933,Illegal Parking,08/25/2015 12:17:56 PM,Closed
89017,Illegal Parking,06/24/2015 08:39:12 AM,Closed
89129,Blocked Driveway,05-08-15 22:38,Closed
89202,Blocked Driveway,06-11-15 17:55,Closed


In [6]:
# 7. Filter complaints where the Created Date falls between October 1, 2015 and October 2, 2015.
# Your code here
df_311.loc[(df_311['Created Date'] >= '2015-10-01') & (df_311['Created Date'] <= '2015-10-02')]

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board


### Step 5: Calculate Response Time in Hours

To analyze how long it takes to respond to a 311 complaint, we'll calculate the **response time** by subtracting the `Created Date` from the `Closed Date`.

- Subtracting two datetime columns gives a **Timedelta** object.
- To convert the timedelta to a numeric value (in seconds), use `.dt.total_seconds()`.
- Finally, divide by 3600 to convert seconds to **hours**.

We will store the result in a new column called `Response Time (hrs)`.

In [12]:
# Your code here
df_311['Response Time (hrs)'] = df_311['Closed Date'] - df_311['Created Date']
df_311['Response Time (hrs)'] = df_311['Response Time (hrs)'].dt.total_seconds() / 3600  # convert to hours
df_311[['Created Date', 'Closed Date', 'Response Time (hrs)']].head()


Unnamed: 0,Created Date,Closed Date,Response Time (hrs)
0,2015-10-05 22:45:00,2015-10-06 02:09:00,3.4
1,2015-08-30 21:04:52,2015-08-30 23:53:46,2.815
2,2015-09-06 23:56:00,2015-09-07 07:30:00,7.566667
3,2015-09-14 11:05:21,2015-09-14 12:52:54,1.7925
4,2015-09-18 21:53:11,2015-09-19 04:34:07,6.682222


### Step 6: Analyze Response Times

1. **Filter long response times**  
   Identify complaints where the response time exceeded **24 hours**.

2. **Classify as 'Fast' or 'Slow'**  
   Based on the `Response Time (hrs)`, classify each complaint as:
   - `'Fast'` if response time is **6 hours or less**
   - `'Slow'` if it took **more than 6 hours**

This helps in understanding how efficiently complaints were addressed.

In [16]:
# Filter complaints that took longer than 24 hours to close
# Your code here
df_311[df_311['Response Time (hrs)'] > 24]


Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,Resolution Time,Response Time (hrs),Response Speed
15,31428361,2015-08-31 16:54:40,2015-09-02 16:40:00,Animal Abuse,Other (complaint details),Street/Sidewalk,11218,1115 DITMAS AVENUE,DITMAS AVENUE,EAST 11 STREET,EAST 12 STREET,ADDRESS,Closed,2015-09-01 00:54:00,The Police Department responded to the complai...,2015-09-02 16:40:00,14 BROOKLYN,1 days 23:45:20,47.755556,Slow
45,31978484,2015-11-15 02:12:28,2015-11-16 11:02:43,Blocked Driveway,No Access,Street/Sidewalk,10453,1740 POPHAM AVENUE,POPHAM AVENUE,WEST 176 STREET,PALISADE PLACE,ADDRESS,Closed,2015-11-15 10:12:28,The Police Department responded to the complai...,2015-11-16 11:02:43,05 BRONX,1 days 08:50:15,32.837500,Slow
60,31213592,2015-08-01 08:48:00,2015-08-03 09:54:00,Noise - House of Worship,Banging/Pounding,House of Worship,11423,196-12 JAMAICA AVENUE,JAMAICA AVENUE,196 STREET,WOODHULL AVENUE,ADDRESS,Closed,2015-08-01 16:48:00,The Police Department responded to the complai...,2015-08-03 09:54:00,12 QUEENS,2 days 01:06:00,49.100000,Slow
134,31769825,2015-10-17 18:26:47,2015-10-19 00:16:41,Blocked Driveway,No Access,Street/Sidewalk,10453,141 WEST 179 STREET,WEST 179 STREET,ANDREWS AVENUE,LORING PLACE,ADDRESS,Closed,2015-10-18 02:26:47,The Police Department responded to the complai...,2015-10-19 00:16:41,05 BRONX,1 days 05:49:54,29.831667,Slow
170,30561986,2015-05-06 10:03:00,2015-05-08 10:35:00,Derelict Vehicle,With License Plate,Street/Sidewalk,11434,179-29 150 ROAD,150 ROAD,NORTH BOUNDARY ROAD,182 STREET,ADDRESS,Closed,2015-05-06 18:03:00,The Police Department responded to the complai...,2015-05-08 10:35:00,13 QUEENS,2 days 00:32:00,48.533333,Slow
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88819,32205968,2015-12-15 12:59:12,2015-12-16 17:57:24,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11210,1599 FLATBUSH AVENUE,FLATBUSH AVENUE,AVENUE H,EAST 32 STREET,ADDRESS,Closed,2015-12-15 20:59:12,The Police Department responded to the complai...,2015-12-16 17:57:24,14 BROOKLYN,1 days 04:58:12,28.970000,Slow
88870,30296970,2015-03-30 11:14:26,2015-03-31 23:36:11,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11416,95-16 91 STREET,91 STREET,95 AVENUE,97 AVENUE,ADDRESS,Closed,2015-03-30 19:14:26,The Police Department responded to the complai...,2015-03-31 23:36:11,09 QUEENS,1 days 12:21:45,36.362500,Slow
89019,30301616,2015-03-31 21:50:29,2015-04-02 00:34:00,Derelict Vehicle,With License Plate,Street/Sidewalk,11218,158 STRATFORD ROAD,STRATFORD ROAD,TURNER PLACE,HINCKLEY PLACE,ADDRESS,Closed,2015-04-01 05:50:00,The Police Department reviewed your complaint ...,2015-04-02 00:34:00,14 BROOKLYN,1 days 02:43:31,26.725278,Slow
89215,31809394,2015-10-24 15:20:31,2015-10-26 18:48:23,Illegal Parking,Commercial Overnight Parking,Street/Sidewalk,11422,,,,,INTERSECTION,Closed,2015-10-24 23:20:31,The Police Department responded to the complai...,2015-10-26 18:48:07,13 QUEENS,2 days 03:27:52,51.464444,Slow


In [17]:
# Classify responses based on a 6-hour threshold
# Your code here
df_311['Response Speed'] = df_311['Response Time (hrs)'].apply(lambda x: 'Fast' if x <= 6 else 'Slow')
df_311[['Created Date', 'Closed Date', 'Response Time (hrs)', 'Response Speed']].head()


Unnamed: 0,Created Date,Closed Date,Response Time (hrs),Response Speed
0,2015-10-05 22:45:00,2015-10-06 02:09:00,3.4,Fast
1,2015-08-30 21:04:52,2015-08-30 23:53:46,2.815,Fast
2,2015-09-06 23:56:00,2015-09-07 07:30:00,7.566667,Slow
3,2015-09-14 11:05:21,2015-09-14 12:52:54,1.7925,Fast
4,2015-09-18 21:53:11,2015-09-19 04:34:07,6.682222,Slow


### Step 7: Average Response Time by Complaint Type

To understand which types of complaints take longer to resolve on average:

- Use `groupby()` on the `Complaint Type` column.
- Calculate the **mean** of `Response Time (hrs)` for each group.

In [18]:
# Your code here
# Using group by on the complaint type column calculate the mean response time
df_311.groupby('Complaint Type')['Response Time (hrs)'].mean()

Complaint Type
Animal Abuse                 5.213205
Bike/Roller/Skate Chronic    3.885261
Blocked Driveway             4.780392
Derelict Vehicle             7.545651
Disorderly Youth             2.977696
Drinking                     3.843890
Graffiti                     7.888535
Homeless Encampment          4.392481
Illegal Fireworks            3.173807
Illegal Parking              4.451517
Noise - Commercial           3.097158
Noise - House of Worship     3.374917
Noise - Park                 3.358954
Noise - Street/Sidewalk      3.410908
Noise - Vehicle              3.720984
Panhandling                  3.318418
Posting Advertisement        1.969223
Squeegee                     1.179167
Traffic                      3.335331
Urinating in Public          3.631113
Vending                      3.941029
Name: Response Time (hrs), dtype: float64

### Step 8: Load and Inspect ZIP Code Information

In this step, you'll load the ZIP code information dataset and inspect its structure. The file name is `zip_code_info.csv`

- How many ZIP codes are listed?
- What columns are available?
- View the first few rows to understand the kind of information it provides (e.g., population, borough).


In [None]:
# Your code here
zip_code_info = pd.read_csv(r'C:\MDA\2025\Term_1\Python for data Analysis (CPSC 610-2)\Week_6\zip_code_info.csv')



191

In [58]:
print('\nColumns available:')
print(zip_code_info.columns.tolist())
print('\nFirst few rows:')
print(zip_code_info.head())
print('\nNumber of unique zip codes:')
print(zip_code_info['zip'].nunique())



Columns available:
['zip', 'Borough', 'city', 'state_id', 'population']

First few rows:
     zip    Borough      city state_id  population
0  10001  MANHATTAN  New York       NY     29079.0
1  10002  MANHATTAN  New York       NY     75517.0
2  10003  MANHATTAN  New York       NY     53825.0
3  10004  MANHATTAN  New York       NY      3875.0
4  10005  MANHATTAN  New York       NY      9238.0

Number of unique zip codes:
191


### Step 9: Find Complaints in the Most Populated ZIP Code

To understand how complaints are distributed in areas with dense populations:

1. Identify the ZIP code with the **highest population** from `zip_info`.
2. Filter the complaints dataframe to include only those that occurred in that ZIP.
3. Get the total number of complaints in that zip code.
4. Display a few sample complaints from that ZIP.

In [60]:
# Your code here
highest_population_zip = zip_code_info.loc[zip_code_info['population'].idxmax()]
print('\nZip code with the highest population:')
print(highest_population_zip)
complaints_11368 = zip_code_info[zip_code_info['zip'] == 11368]
print('\n Only complaints for zip code 11368:')
print(complaints_11368.head())
print('\nNumber of complaints for zip code 11368:')
print(complaints_11368.shape[0])


Zip code with the highest population:
zip              11368
Borough         QUEENS
city            Corona
state_id            NY
population    107060.0
Name: 153, dtype: object

 Only complaints for zip code 11368:
       zip Borough    city state_id  population
153  11368  QUEENS  Corona       NY    107060.0

Number of complaints for zip code 11368:
1


### Step 10: Merge Complaint Data with ZIP Code Demographics

We want to enrich the 311 dataset (`df_311`) with additional info from `zip_info`, such as population and borough name.

Here's what we do:

- Use `.merge()` to combine `df_311` and `zip_info`.
- Match `Incident Zip` from the 311 data with `zip` from the ZIP info.
- Use a **left join** to:
  - Keep all rows from `df_311` (even if some ZIPs don’t match).
  - Add population and borough data **only where a match exists**.

After merging:
- You'll see all columns from `zip_info`.
- The `zip` column (from `zip_info`) becomes redundant — it duplicates `Incident Zip` — so drop it.

In [64]:
# Your code here
df_311_merged = pd.merge(df_311, zip_code_info, left_on='Incident Zip', right_on='zip', how='left')
df_311_merged = df_311_merged.drop(columns=['zip'])
df_311_merged.head()


Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,...,Resolution Description,Resolution Action Updated Date,Community Board,Resolution Time,Response Time (hrs),Response Speed,Borough,city,state_id,population
0,31685705,2015-10-05 22:45:00,2015-10-06 02:09:00,Blocked Driveway,No Access,Street/Sidewalk,11373,43-31 ELBERTSON STREET,ELBERTSON STREET,LAMONT AVENUE,...,The Police Department responded to the complai...,2015-10-06 02:09:00,04 QUEENS,0 days 03:24:00,3.4,Fast,QUEENS,Elmhurst,NY,99433.0
1,31426484,2015-08-30 21:04:52,2015-08-30 23:53:46,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10030,100 WEST 141 STREET,WEST 141 STREET,LENOX AVENUE,...,The Police Department responded to the complai...,2015-08-30 23:53:46,10 MANHATTAN,0 days 02:48:54,2.815,Fast,MANHATTAN,New York,NY,30781.0
2,31473909,2015-09-06 23:56:00,2015-09-07 07:30:00,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,73 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,...,The Police Department responded to the complai...,2015-09-07 07:30:00,12 MANHATTAN,0 days 07:34:00,7.566667,Slow,MANHATTAN,New York,NY,39037.0
3,31530153,2015-09-14 11:05:21,2015-09-14 12:52:54,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,...,The Police Department responded and upon arriv...,2015-09-14 12:52:54,05 QUEENS,0 days 01:47:33,1.7925,Fast,BROOKLYN,Ridgewood,NY,103865.0
4,31530153,2015-09-14 11:05:21,2015-09-14 12:52:54,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,...,The Police Department responded and upon arriv...,2015-09-14 12:52:54,05 QUEENS,0 days 01:47:33,1.7925,Fast,QUEENS,Ridgewood,NY,103865.0


### Step 11: Map Borough Names to Abbreviations

To simplify analysis and plots, we'll map full borough names to shorter codes using a dictionary.

- Use the `.map()` function on the `Borough` column.
- Provide a dictionary where keys are full names and values are abbreviations.
- Store the result in a new column called `Borough Code`.

Example:
- `"MANHATTAN"` → `"MH"`
- `"BROOKLYN"` → `"BK"`

In [69]:
# Your code here
borough_map = {'MANHATTAN': 'MH', 'BROOKLYN': 'BK', 'QUEENS': 'QN', 'BRONX': 'BX', 'STATEN ISLAND': 'SI'}
df_311_merged['Borough Code'] = df_311_merged['Borough'].map(borough_map)
df_311_merged.head()


Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,...,Resolution Action Updated Date,Community Board,Resolution Time,Response Time (hrs),Response Speed,Borough,city,state_id,population,Borough Code
0,31685705,2015-10-05 22:45:00,2015-10-06 02:09:00,Blocked Driveway,No Access,Street/Sidewalk,11373,43-31 ELBERTSON STREET,ELBERTSON STREET,LAMONT AVENUE,...,2015-10-06 02:09:00,04 QUEENS,0 days 03:24:00,3.4,Fast,QUEENS,Elmhurst,NY,99433.0,QN
1,31426484,2015-08-30 21:04:52,2015-08-30 23:53:46,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10030,100 WEST 141 STREET,WEST 141 STREET,LENOX AVENUE,...,2015-08-30 23:53:46,10 MANHATTAN,0 days 02:48:54,2.815,Fast,MANHATTAN,New York,NY,30781.0,MH
2,31473909,2015-09-06 23:56:00,2015-09-07 07:30:00,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,73 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,...,2015-09-07 07:30:00,12 MANHATTAN,0 days 07:34:00,7.566667,Slow,MANHATTAN,New York,NY,39037.0,MH
3,31530153,2015-09-14 11:05:21,2015-09-14 12:52:54,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,...,2015-09-14 12:52:54,05 QUEENS,0 days 01:47:33,1.7925,Fast,BROOKLYN,Ridgewood,NY,103865.0,BK
4,31530153,2015-09-14 11:05:21,2015-09-14 12:52:54,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,...,2015-09-14 12:52:54,05 QUEENS,0 days 01:47:33,1.7925,Fast,QUEENS,Ridgewood,NY,103865.0,QN


### Step 12: Calculate Total Complaints and Complaints per 1,000 Residents

We want to understand how complaint volume compares across ZIP codes, adjusting for population size.

#### Goal:
- Count how many 311 complaints were made **per ZIP code**.
- Normalize by population to get **complaints per 1,000 residents**. Complaints per 1000 is calculated as: total complaints divided by population, then multiplied by 1000.

#### How?
1. Use `.groupby()` to group data by ZIP.
2. Use `.agg()` to compute:
   - The total number of complaints using `count` on the `Unique Key`.
   - The population using the `first` non-null value from the `population` column.
   
   You can pass this as a dictionary with:
   ```python
   .agg({'column name 1': 'aggregation function 1', 'column name 2': 'aggregation function 2', ...})


In [None]:
# Your code here
# Calculate total complaints and complaints per 1,000 residents by ZIP code
complaints_by_zip = df_311_merged.groupby('Incident Zip').agg({
    'Unique Key': 'count',
    'population': 'first'
}).rename(columns={'Unique Key': 'Total Complaints'})

# Calculate complaints per 1,000 residents
complaints_by_zip['Complaints per 1000'] = (complaints_by_zip['Total Complaints'] / complaints_by_zip['population'])
# Display all dataframe
complaints_by_zip


Unnamed: 0_level_0,Total Complaints,population,Complaints per 1000
Incident Zip,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
10000,24,,
10001,380,29079.0,0.013068
10002,857,75517.0,0.011348
10003,811,53825.0,0.015067
10004,128,3875.0,0.033032
...,...,...,...
11691,218,68704.0,0.003173
11692,75,23247.0,0.003226
11693,113,13066.0,0.008648
11694,218,21430.0,0.010173


### Step 13: How to JOIN the DataFrames?

- `.join()` is mainly used to **combine two DataFrames by their index**.
- It's best used when both DataFrames have a meaningful index (e.g., ZIP code).

For this question, you already have a DataFrame:
- `zip_info`: Demographic data per ZIP.

And you create another one called `complaints_by_zip`
- `zip_summary`: Total complaints per ZIP

**Index both of them** with zip column.

**Task:**
1. Use `.join()` to perform a **left join** to add demographic info only for ZIPs that appear in the complaint summary.
2. Use `.join()` again with **how='right'** to include **all ZIPs**, even those with no complaints.
3. Compare the size of the two results.

In [86]:
# Your code here
# zip_summary = ?
# zip_info_indexed = ?
# Step 13: How to JOIN the DataFrames?

# Set index to ZIP code for both DataFrames
zip_info_indexed = zip_code_info.set_index('zip')
zip_summary = complaints_by_zip

# 1. Left join: only ZIPs in complaint summary
left_joined = zip_summary.join(zip_info_indexed, how='left', lsuffix='_complaints', rsuffix='_zipinfo')
print("Left join shape:", left_joined.shape)

# 2. Right join: all ZIPs, even those with no complaints
right_joined = zip_summary.join(zip_info_indexed, how='right', lsuffix='_complaints', rsuffix='_zipinfo')
print("Right join shape:", right_joined.shape)

# 3. Compare the size of the two results
print("Number of rows in left join:", left_joined.shape[0])
print("Number of rows in right join:", right_joined.shape[0])


Left join shape: (195, 7)
Right join shape: (196, 7)
Number of rows in left join: 195
Number of rows in right join: 196


### Step 14: Analyze Complaint Types Across Boroughs with a Pivot Table

To summarize and compare complaint volumes across boroughs and types, we use a **pivot table**. Pivot tables are used when you want to **summarize grouped data** in a 2D format for easy comparison.

#### What is a pivot table?
A pivot table reshapes data:
- **Rows (`index`)** represent categories you want to group by (e.g., `Borough Code`).
- **Columns (`columns`)** represent subcategories (e.g., `Complaint Type`).
- **Values (`values`)** are the numbers you want to compute (e.g., count of complaints).
- **aggfunc** defines what to compute: `count`, `sum`, `mean`, etc.

####  In this case:
- We group complaints by **borough code**.
- For each borough, we count the number of complaints of each **type**.
- We use `fill_value=0` to replace missing combinations with zero (no complaints of that type).

In [87]:
# Your code here
complaints_by_borough_type = df_311_merged.groupby(['Borough Code', 'Complaint Type']).size().unstack(fill_value=0)
complaints_by_borough_type

Complaint Type,Animal Abuse,Bike/Roller/Skate Chronic,Blocked Driveway,Derelict Vehicle,Disorderly Youth,Drinking,Graffiti,Homeless Encampment,Illegal Fireworks,Illegal Parking,...,Noise - House of Worship,Noise - Park,Noise - Street/Sidewalk,Noise - Vehicle,Panhandling,Posting Advertisement,Squeegee,Traffic,Urinating in Public,Vending
Borough Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
BK,749,42,8945,1698,26,74,19,267,21,8971,...,109,462,4171,1611,14,14,0,335,39,161
BX,454,5,3873,581,12,47,0,81,10,2357,...,26,180,2709,1018,3,7,0,104,14,92
MH,422,73,771,177,24,89,8,876,8,3726,...,56,398,6201,1641,69,8,1,450,66,730
QN,611,18,10060,2551,21,113,15,172,16,6955,...,124,196,1678,872,13,10,0,383,49,159
SI,156,1,645,529,9,64,1,21,3,1433,...,4,17,231,123,7,154,0,50,6,6


### Step 15: Convert Pivot Table to Long Format with `melt()`

We previously created a **pivot table** to compare complaint counts by borough and type. Now, we'll use `melt()` to convert that wide-format table back into a **long-format** DataFrame.


#### What is `melt()`?

- `melt()` is used to **unpivot** or **flatten** a DataFrame.
- It turns columns into rows, which is useful when:
  - You want to plot or analyze categorical data more easily.
  - You need a **tidy format**: one row per observation.


#### In this case:
- `id_vars='Borough Code'` means we'll keep that column as-is.
- All other columns (complaint types) become values in a new column: `'Complaint Type'`.
- Their counts go into `'Complaint Count'`.

This is useful for **grouped bar plots**, heatmaps, or exporting clean data.

In [91]:
# Your code here
complaints_long = complaints_by_borough_type.reset_index().melt(
    id_vars='Borough Code',
    var_name='Complaint Type',
    value_name='Complaint Count'
)
complaints_long


Unnamed: 0,Borough Code,Complaint Type,Complaint Count
0,BK,Animal Abuse,749
1,BX,Animal Abuse,454
2,MH,Animal Abuse,422
3,QN,Animal Abuse,611
4,SI,Animal Abuse,156
...,...,...,...
100,BK,Vending,161
101,BX,Vending,92
102,MH,Vending,730
103,QN,Vending,159


### Step 16: Save Complaint Summary and ZIP Info for Visualization

We've computed the **total complaints and complaints per 1,000 residents** by ZIP code (`complaints_by_zip`), we’ll save this summary to a file for future use in visualizations (e.g., maps, bar charts).

Do these:
- Save `complaints_by_zip` to a **CSV file** for general use. Remember to deal with the index.
- Also use the given code to export both `complaints_by_zip` and the full `zip_info` DataFrame to an **Excel file** with two separate sheets.

**Note:** To use `pd.ExcelWriter` for saving `.xlsx` files, you need to have `openpyxl` library installed. If you get an error for that code cell, you can install it using the command:  
`pip install openpyxl`


In [92]:
# Save complaints_by_zip to CSV
complaints_by_zip.to_csv("complaints_by_zip.csv", index=False)



In [93]:
# Save both DataFrames to separate sheets in one Excel file
with pd.ExcelWriter("zip_complaints_summary.xlsx") as writer:
    complaints_by_zip.to_excel(writer, sheet_name="Complaint Summary", index=False)
    zip_code_info.to_excel(writer, sheet_name="ZIP Info", index=False)

# Bonus Section: Performance Comparison: Vectorized Operations vs. apply() vs. iterrows()

When working with pandas, **how** you write your operations can make a huge difference in performance — especially on large datasets.

In this example, we compare three common ways to assign a new column based on `Response Time (hrs)`. We have already solved this problem but let's revisit it:

**Question was**: Label each row as `'Fast'` if the response time is 6 hours or less, and `'Slow'` otherwise.

You can solve this in different ways:

#### 1. `apply()` with lambda as we did
#### 2. `np.where` – Vectorized
#### 3. `iterrows()` – Avoid if Possible. This is like a for loop on rows of the dataframe

**time.time() is used to compute the time taken to compute a block of code**

`np.where` works like this and you can assign it to a column:
```python
np.where(df['col'] <= x, 'A', 'B')

`iterrows()` loops over a DataFrame row by row, returning each row as a (index, Series) pair.

Use only when absolutely necessary (e.g., complex logic that can't be vectorized) as this is very time-consuming.

```python
for index, row in df.iterrows():
    # Access values like a dictionary
    value = row['ColumnName']
    # Perform operations here


In [94]:
import time
import numpy as np

# 1. Vectorized with np.where
start = time.time()
df_311['Speed Category'] = np.where(df_311['Response Time (hrs)'] <= 6, 'Fast', 'Slow')
print("np.where time:", round(time.time() - start, 4), "seconds")

# 2. apply() with lambda
start = time.time()
df_311['Speed Category'] = df_311['Response Time (hrs)'].apply(lambda x: 'Fast' if x <= 6 else 'Slow')
print("apply() time:", round(time.time() - start, 4), "seconds")

# 3. iterrows()
start = time.time()
speed_labels = []
for _, row in df_311.iterrows():
    speed_labels.append('Fast' if row['Response Time (hrs)'] <= 6 else 'Slow')
df_311['Speed Category'] = speed_labels
print("iterrows() time:", round(time.time() - start, 4), "seconds")

np.where time: 0.0182 seconds
apply() time: 0.0614 seconds
iterrows() time: 2.6337 seconds


### A randomly generated DataFrame
to show the difference of vectorized operations better

In [95]:
import pandas as pd
import numpy as np
import time

# Create a large DataFrame
df_test = pd.DataFrame({'value': np.arange(1000000)})

# 1. Vectorized
start = time.time()
df_test['squared_vec'] = df_test['value'] ** 2
print("Vectorized time:", round(time.time() - start, 4), "seconds")

# 2. apply() with lambda
start = time.time()
df_test['squared_apply'] = df_test['value'].apply(lambda x: x ** 2)
print("apply() time:", round(time.time() - start, 4), "seconds")

# 3. iterrows()
start = time.time()
squared = []
for _, row in df_test.iterrows():
    squared.append(row['value'] ** 2)
df_test['squared_iterrows'] = squared
print("iterrows() time:", round(time.time() - start, 4), "seconds")


Vectorized time: 0.0254 seconds
apply() time: 0.3908 seconds
iterrows() time: 16.6409 seconds
