# Assignment: Data Wrangling

In this assignment, we will practice data wrangling techniques on real-world data.

## NYC film permits

New York City is a very popular film/TV shooting location. To claim exclusive use of city property (e.g. a sidewalk or a park) for a shoot, a production needs a permit. The New York City Mayor's Office of Media and Entertainment releases all granted [film permits data](https://data.cityofnewyork.us/City-Government/Film-Permits/tg4x-b46p) to the public. In this part, we will wrangle this data.

Let us first retrieve the data!

In [1]:
import pandas as pd

url = "https://data.cityofnewyork.us/api/views/tg4x-b46p/rows.csv"
    
df = pd.read_csv(url)

Your task for this part is to clean up this data. Specifically, the cleaned-up dataframe should have the following properties:
1. All columns must have the correct data type. In particular, `StartDateTime`, `EndDateTime`, and `EnteredOn` should have `datetime64` as their data type.
2. The `ZipCode(s)` column may contain multiple zip codes in one cell. In the cleaned-up dataframe, we will replace the `ZipCode(s)` column with a `ZipCode` column, where each cell contains only a single zip code. Rows with multiple zip codes in the input dataframe should be repeated once for each zip code in the output dataframe.
3. The output dataframe should not contain any missing data.
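
The three properties above can be checked mechanically once the cleanup is finished. The following is a minimal sketch, not part of the assignment: `check_clean` is a hypothetical helper, and the one-row `toy` frame is made up purely for illustration.

```python
import pandas as pd

DATETIME_COLS = ["StartDateTime", "EndDateTime", "EnteredOn"]


def check_clean(frame: pd.DataFrame, zip_col: str = "ZipCode") -> dict:
    """Return one boolean per required property (hypothetical helper)."""
    return {
        # Property 1: the three timestamp columns are datetime64.
        "datetimes_typed": all(
            pd.api.types.is_datetime64_any_dtype(frame[c]) for c in DATETIME_COLS
        ),
        # Property 2: no cell of the zip column still holds a comma-separated list.
        "single_zip_per_cell": not frame[zip_col].astype(str).str.contains(",").any(),
        # Property 3: no missing values anywhere.
        "no_missing": not frame.isna().any().any(),
    }


# Made-up one-row frame, only to exercise the helper.
toy = pd.DataFrame({
    "StartDateTime": pd.to_datetime(["2023-02-17 09:00:00"]),
    "EndDateTime": pd.to_datetime(["2023-02-18 12:00:00"]),
    "EnteredOn": pd.to_datetime(["2023-02-14 22:47:33"]),
    "ZipCode": ["11222"],
})
report = check_clean(toy)
print(report)
```

Passing `zip_col="ZipCode(s)"` lets the same check run on a frame that has not renamed the column yet.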

In [2]:
df.head().T

Unnamed: 0,0,1,2,3,4
EventID,696255,714139,705334,746696,717328
EventType,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Theater Load in and Load Outs
StartDateTime,02/17/2023 09:00:00 AM,05/12/2023 01:00:00 PM,04/10/2023 09:00:00 AM,11/07/2023 06:00:00 AM,05/31/2023 12:01:00 AM
EndDateTime,02/18/2023 12:00:00 PM,05/13/2023 05:00:00 AM,04/10/2023 10:00:00 PM,11/07/2023 10:00:00 PM,06/01/2023 06:00:00 AM
EnteredOn,02/14/2023 10:47:33 PM,05/04/2023 02:27:51 PM,03/30/2023 05:17:29 PM,10/31/2023 12:08:00 PM,05/17/2023 11:55:51 AM
EventAgency,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment
ParkingHeld,KINGSLAND AVENUE between DEAD END and GREENPOI...,WEST 26 STREET between 12 AVENUE and 11 AVEN...,SOUTH STREET between BROAD STREET and OLD SLIP...,NORTH HENRY STREET between GREENPOINT AVENUE a...,WEST 55 STREET between 11 AVENUE and 12 AVEN...
Borough,Brooklyn,Manhattan,Manhattan,Brooklyn,Manhattan
CommunityBoard(s),1,4,"1, 3",1,4
PolicePrecinct(s),94,10,"1, 5",94,18


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7870 entries, 0 to 7869
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   EventID            7870 non-null   int64 
 1   EventType          7870 non-null   object
 2   StartDateTime      7870 non-null   object
 3   EndDateTime        7870 non-null   object
 4   EnteredOn          7870 non-null   object
 5   EventAgency        7870 non-null   object
 6   ParkingHeld        7870 non-null   object
 7   Borough            7870 non-null   object
 8   CommunityBoard(s)  7868 non-null   object
 9   PolicePrecinct(s)  7868 non-null   object
 10  Category           7870 non-null   object
 11  SubCategoryName    7870 non-null   object
 12  Country            7870 non-null   object
 13  ZipCode(s)         7868 non-null   object
dtypes: int64(1), object(13)
memory usage: 860.9+ KB


In [4]:
df["EventType"].value_counts()

Shooting Permit                  6160
Theater Load in and Load Outs    1512
Rigging Permit                    151
DCAS Prep/Shoot/Wrap Permit        47
Name: EventType, dtype: int64

In [5]:
df["StartDateTime"] = pd.to_datetime(df["StartDateTime"])
df["EndDateTime"] = pd.to_datetime(df["EndDateTime"])
df["EnteredOn"] = pd.to_datetime(df["EnteredOn"])
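
The calls above let pandas infer the timestamp layout. A possible alternative is to state the layout explicitly, which makes parsing fail loudly on any row that deviates. The sketch below assumes every row follows the `MM/DD/YYYY hh:mm:ss AM/PM` pattern seen in `df.head()`; that has only been confirmed for the five rows shown.

```python
import pandas as pd

# Two timestamp strings copied from the df.head() output above.
raw = pd.Series(["02/17/2023 09:00:00 AM", "05/12/2023 01:00:00 PM"])

# %I is the 12-hour clock and %p the AM/PM marker.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")
print(parsed)
```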

In [6]:
df["EventAgency"].value_counts()

Mayor's Office of Media & Entertainment    7870
Name: EventAgency, dtype: int64

In [7]:
len(df["ParkingHeld"].value_counts())

4214

In [8]:
df["Borough"].value_counts()

Manhattan        3904
Brooklyn         2517
Queens           1102
Bronx             306
Staten Island      41
Name: Borough, dtype: int64

In [9]:
df["CommunityBoard(s)"].value_counts()

1              1851
2              1042
5               909
4               539
7               436
               ... 
11, 2, 6, 8       1
3, 4, 8           1
12, 13            1
1, 3, 6, 7        1
14, 2, 6          1
Name: CommunityBoard(s), Length: 231, dtype: int64

In [10]:
len(df["PolicePrecinct(s)"].value_counts())

538

In [11]:
len(df["Category"].value_counts())

9

In [12]:
len(df["SubCategoryName"].value_counts())

26

In [13]:
df["ZipCode(s)"] = df["ZipCode(s)"].astype(str)
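
One thing to watch with this cast: `df.info()` reports 7868 non-null zip values out of 7870, and in pandas 1.x and 2.x `astype(str)` turns a missing value into the literal text `"nan"`, which a later `dropna()` no longer recognizes as missing. A version-independent way to avoid this, sketched on a made-up three-element series, is to drop the missing entries before casting:

```python
import numpy as np
import pandas as pd

# Made-up values: one plain zip, one missing entry, one multi-zip cell.
zips = pd.Series(["11222", np.nan, "10001, 10002"], name="ZipCode(s)")

missing_before = int(zips.isna().sum())  # count missing entries while they are still NaN
kept = zips.dropna().astype(str)         # cast only the entries that are present
print(missing_before, kept.tolist())
```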

In [14]:
zip_df = df[df['ZipCode(s)'].str.contains(',')]
# zip_df.T

In [15]:
# The column was already cast to str in cell 13, so no second cast is needed.
# Take an explicit copy instead, so later edits to zip_df never act on a slice of df.
zip_df = zip_df.copy()


In [16]:
expanded_rows = []
for index, row in zip_df.iterrows():
    zip_code_list = row["ZipCode(s)"].split(",")

    for zip_code in zip_code_list:
        new_row = row.copy()
        new_row["ZipCode(s)"] = zip_code.strip()
        
        expanded_rows.append(new_row)

expanded_df = pd.DataFrame(expanded_rows)
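
The row-by-row loop above works, but pandas also offers `DataFrame.explode`, which repeats a row once per list element. The sketch below is one possible alternative, not the notebook's method; the `toy` frame and its values are made up. It also produces the `ZipCode` column named in the task description.

```python
import pandas as pd

# Made-up frame: event 1 spans three zip codes, event 2 only one.
toy = pd.DataFrame({
    "EventID": [1, 2],
    "ZipCode(s)": ["10004, 10005, 10038", "10003"],
})

exploded = (
    toy.assign(ZipCode=toy["ZipCode(s)"].str.split(","))  # string -> list of strings
       .explode("ZipCode")                                # one row per list element
       .drop(columns="ZipCode(s)")
)
exploded["ZipCode"] = exploded["ZipCode"].str.strip()     # remove the space after each comma
exploded = exploded.reset_index(drop=True)
print(exploded)
```

Because single-zip rows simply become one-element lists, this approach does not need the frame to be split into multi-zip and single-zip parts first.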


In [17]:
expanded_df.T

Unnamed: 0,2,2.1,2.2,11,11.1,21,21.1,23,23.1,23.2,...,7858,7859,7859.1,7861,7861.1,7861.2,7866,7866.1,7866.2,7866.3
EventID,705334,705334,705334,744752,744752,703094,703094,730357,730357,730357,...,793845,786257,786257,793865,793865,793865,793926,793926,793926,793926
EventType,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,...,Shooting Permit,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit
StartDateTime,2023-04-10 09:00:00,2023-04-10 09:00:00,2023-04-10 09:00:00,2023-10-17 09:00:00,2023-10-17 09:00:00,2023-03-24 11:00:00,2023-03-24 11:00:00,2023-07-19 16:00:00,2023-07-19 16:00:00,2023-07-19 16:00:00,...,2024-06-20 06:00:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 09:00:00,2024-06-20 09:00:00,2024-06-20 09:00:00,2024-06-20 09:00:00
EndDateTime,2023-04-10 22:00:00,2023-04-10 22:00:00,2023-04-10 22:00:00,2023-10-17 23:45:00,2023-10-17 23:45:00,2023-03-25 03:00:00,2023-03-25 03:00:00,2023-07-20 02:00:00,2023-07-20 02:00:00,2023-07-20 02:00:00,...,2024-06-20 19:00:00,2024-06-21 06:00:00,2024-06-21 06:00:00,2024-06-23 05:00:00,2024-06-23 05:00:00,2024-06-23 05:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00
EnteredOn,2023-03-30 17:17:29,2023-03-30 17:17:29,2023-03-30 17:17:29,2023-10-13 14:34:03,2023-10-13 14:34:03,2023-03-21 10:50:14,2023-03-21 10:50:14,2023-07-14 06:45:54,2023-07-14 06:45:54,2023-07-14 06:45:54,...,2024-06-17 15:59:16,2024-05-21 13:50:20,2024-05-21 13:50:20,2024-06-17 16:39:48,2024-06-17 16:39:48,2024-06-17 16:39:48,2024-06-17 22:24:41,2024-06-17 22:24:41,2024-06-17 22:24:41,2024-06-17 22:24:41
EventAgency,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,...,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment
ParkingHeld,SOUTH STREET between BROAD STREET and OLD SLIP...,SOUTH STREET between BROAD STREET and OLD SLIP...,SOUTH STREET between BROAD STREET and OLD SLIP...,PIKE STREET between CHERRY STREET and MONROE S...,PIKE STREET between CHERRY STREET and MONROE S...,EAST 13 STREET between 1 AVENUE and 2 AVENUE...,EAST 13 STREET between 1 AVENUE and 2 AVENUE...,GREENE STREET between GRAND STREET and BROOME ...,GREENE STREET between GRAND STREET and BROOME ...,GREENE STREET between GRAND STREET and BROOME ...,...,LENOX AVENUE between WEST 119 STREET and WEST...,EAST 11 STREET between 3 AVENUE and 4 AVENUE...,EAST 11 STREET between 3 AVENUE and 4 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...
Borough,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,...,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Brooklyn,Brooklyn,Brooklyn,Brooklyn
CommunityBoard(s),"1, 3","1, 3","1, 3","1, 3","1, 3",3,3,"2, 3","2, 3","2, 3",...,"10, 11","11, 3","11, 3","4, 5","4, 5","4, 5","2, 3, 4","2, 3, 4","2, 3, 4","2, 3, 4"
PolicePrecinct(s),"1, 5","1, 5","1, 5","5, 7, 94","5, 7, 94",9,9,"1, 5, 7, 9","1, 5, 7, 9","1, 5, 7, 9",...,"25, 28","23, 9","23, 9","10, 14","10, 14","10, 14","79, 81, 83, 84","79, 81, 83, 84","79, 81, 83, 84","79, 81, 83, 84"


In [18]:
clean_df = df[~df['ZipCode(s)'].str.contains(',')]

In [19]:
clean_df.T

Unnamed: 0,0,1,3,4,5,6,7,8,9,10,...,7856,7857,7860,7862,7863,7864,7865,7867,7868,7869
EventID,696255,714139,746696,717328,712928,745314,694231,747840,704967,693518,...,793749,792611,791664,792350,793384,792026,793722,793783,785081,792643
EventType,Shooting Permit,Shooting Permit,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Shooting Permit,...,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit,Rigging Permit,Shooting Permit,Shooting Permit,Theater Load in and Load Outs,Theater Load in and Load Outs,Shooting Permit
StartDateTime,2023-02-17 09:00:00,2023-05-12 13:00:00,2023-11-07 06:00:00,2023-05-31 00:01:00,2023-05-05 07:00:00,2023-10-19 00:01:00,2023-02-10 07:00:00,2023-11-10 06:00:00,2023-04-03 07:00:00,2023-02-02 07:00:00,...,2024-06-20 07:00:00,2024-06-20 10:00:00,2024-06-20 07:00:00,2024-06-20 07:00:00,2024-06-20 07:00:00,2024-06-20 06:00:00,2024-06-19 12:00:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 07:00:00
EndDateTime,2023-02-18 12:00:00,2023-05-13 05:00:00,2023-11-07 22:00:00,2023-06-01 06:00:00,2023-05-05 21:00:00,2023-10-19 23:59:00,2023-02-10 21:00:00,2023-11-10 23:59:00,2023-04-03 21:00:00,2023-02-02 23:00:00,...,2024-06-20 19:00:00,2024-06-21 02:00:00,2024-06-20 19:00:00,2024-06-20 21:00:00,2024-06-20 19:00:00,2024-06-20 21:00:00,2024-06-20 04:00:00,2024-06-24 23:59:00,2024-06-21 06:00:00,2024-06-20 22:00:00
EnteredOn,2023-02-14 22:47:33,2023-05-04 14:27:51,2023-10-31 12:08:00,2023-05-17 11:55:51,2023-05-01 11:09:00,2023-10-17 15:08:32,2023-02-02 17:15:08,2023-11-01 13:33:39,2023-03-29 13:12:23,2023-01-30 11:09:47,...,2024-06-17 12:05:40,2024-06-12 16:00:23,2024-06-09 23:47:19,2024-06-11 20:08:25,2024-06-14 17:34:31,2024-06-11 08:51:15,2024-06-17 11:01:29,2024-06-17 13:21:55,2024-05-16 12:54:50,2024-06-12 17:10:02
EventAgency,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,...,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment
ParkingHeld,KINGSLAND AVENUE between DEAD END and GREENPOI...,WEST 26 STREET between 12 AVENUE and 11 AVEN...,NORTH HENRY STREET between GREENPOINT AVENUE a...,WEST 55 STREET between 11 AVENUE and 12 AVEN...,COLGATE AVENUE between STORY AVENUE and LAFAYE...,WEST 55 STREET between 11 AVENUE and 12 AVEN...,22 STREET between 43 AVENUE and QUEENS PLAZA S...,FROST STREET between DEBEVOISE AVENUE and MORG...,EAGLE STREET between FRANKLIN STREET and WEST ...,CALYER STREET between DIAMOND STREET and JEWEL...,...,THIRD AVENUE between EAST 58 STREET and EAST...,COURT SQUARE W between JACKSON AVE. and DEAD E...,WEST 91 STREET between AMSTERDAM AVENUE and ...,"34 AVENUE between 35 STREET and 36 STREET, 35...",BOGART STREET between JOHNSON AVENUE and DEAD END,WEST 44 STREET between 6 AVENUE and 7 AVENUE...,"BUTLER STREET between 3 AVENUE and 4 AVENUE, ...",WEST 35 STREET between 8 AVENUE and 9 AVENUE...,FROST STREET between DEBEVOISE AVENUE and MORG...,COLUMBUS AVENUE between WEST 62 STREET and W...
Borough,Brooklyn,Manhattan,Brooklyn,Manhattan,Bronx,Manhattan,Queens,Brooklyn,Brooklyn,Brooklyn,...,Manhattan,Queens,Manhattan,Queens,Brooklyn,Manhattan,Brooklyn,Manhattan,Brooklyn,Manhattan
CommunityBoard(s),1,4,1,4,9,4,2,1,1,1,...,6,2,7,1,1,5,6,4,1,"4, 7"
PolicePrecinct(s),94,10,94,18,43,18,108,94,94,94,...,17,108,24,114,90,"14, 18",78,14,94,"18, 20"


In [20]:
final_df = pd.concat([clean_df, expanded_df], ignore_index=True)

In [21]:
final_df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,12152,12153,12154,12155,12156,12157,12158,12159,12160,12161
EventID,696255,714139,746696,717328,712928,745314,694231,747840,704967,693518,...,793845,786257,786257,793865,793865,793865,793926,793926,793926,793926
EventType,Shooting Permit,Shooting Permit,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Theater Load in and Load Outs,Shooting Permit,Shooting Permit,...,Shooting Permit,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Theater Load in and Load Outs,Shooting Permit,Shooting Permit,Shooting Permit,Shooting Permit
StartDateTime,2023-02-17 09:00:00,2023-05-12 13:00:00,2023-11-07 06:00:00,2023-05-31 00:01:00,2023-05-05 07:00:00,2023-10-19 00:01:00,2023-02-10 07:00:00,2023-11-10 06:00:00,2023-04-03 07:00:00,2023-02-02 07:00:00,...,2024-06-20 06:00:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 00:01:00,2024-06-20 09:00:00,2024-06-20 09:00:00,2024-06-20 09:00:00,2024-06-20 09:00:00
EndDateTime,2023-02-18 12:00:00,2023-05-13 05:00:00,2023-11-07 22:00:00,2023-06-01 06:00:00,2023-05-05 21:00:00,2023-10-19 23:59:00,2023-02-10 21:00:00,2023-11-10 23:59:00,2023-04-03 21:00:00,2023-02-02 23:00:00,...,2024-06-20 19:00:00,2024-06-21 06:00:00,2024-06-21 06:00:00,2024-06-23 05:00:00,2024-06-23 05:00:00,2024-06-23 05:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00,2024-06-20 22:00:00
EnteredOn,2023-02-14 22:47:33,2023-05-04 14:27:51,2023-10-31 12:08:00,2023-05-17 11:55:51,2023-05-01 11:09:00,2023-10-17 15:08:32,2023-02-02 17:15:08,2023-11-01 13:33:39,2023-03-29 13:12:23,2023-01-30 11:09:47,...,2024-06-17 15:59:16,2024-05-21 13:50:20,2024-05-21 13:50:20,2024-06-17 16:39:48,2024-06-17 16:39:48,2024-06-17 16:39:48,2024-06-17 22:24:41,2024-06-17 22:24:41,2024-06-17 22:24:41,2024-06-17 22:24:41
EventAgency,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,...,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment,Mayor's Office of Media & Entertainment
ParkingHeld,KINGSLAND AVENUE between DEAD END and GREENPOI...,WEST 26 STREET between 12 AVENUE and 11 AVEN...,NORTH HENRY STREET between GREENPOINT AVENUE a...,WEST 55 STREET between 11 AVENUE and 12 AVEN...,COLGATE AVENUE between STORY AVENUE and LAFAYE...,WEST 55 STREET between 11 AVENUE and 12 AVEN...,22 STREET between 43 AVENUE and QUEENS PLAZA S...,FROST STREET between DEBEVOISE AVENUE and MORG...,EAGLE STREET between FRANKLIN STREET and WEST ...,CALYER STREET between DIAMOND STREET and JEWEL...,...,LENOX AVENUE between WEST 119 STREET and WEST...,EAST 11 STREET between 3 AVENUE and 4 AVENUE...,EAST 11 STREET between 3 AVENUE and 4 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,WEST 31 STREET between 7 AVENUE and 8 AVENUE...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...,SCHERMERHORN STREET between COURT STREET and B...
Borough,Brooklyn,Manhattan,Brooklyn,Manhattan,Bronx,Manhattan,Queens,Brooklyn,Brooklyn,Brooklyn,...,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Manhattan,Brooklyn,Brooklyn,Brooklyn,Brooklyn
CommunityBoard(s),1,4,1,4,9,4,2,1,1,1,...,"10, 11","11, 3","11, 3","4, 5","4, 5","4, 5","2, 3, 4","2, 3, 4","2, 3, 4","2, 3, 4"
PolicePrecinct(s),94,10,94,18,43,18,108,94,94,94,...,"25, 28","23, 9","23, 9","10, 14","10, 14","10, 14","79, 81, 83, 84","79, 81, 83, 84","79, 81, 83, 84","79, 81, 83, 84"


In [22]:
# Drop every row that still contains a missing value (property 3).
final_df.dropna(inplace=True)

In [23]:
final_df.isna().sum().sum()

0

Once the dataframe is clean, please answer the following questions with code:
1. Which borough has the most _unique_ events for television, film and theater respectively?
2. Which zip code has the most _unique_ events for television, film and theater respectively?

In [24]:
final_df.columns

Index(['EventID', 'EventType', 'StartDateTime', 'EndDateTime', 'EnteredOn',
       'EventAgency', 'ParkingHeld', 'Borough', 'CommunityBoard(s)',
       'PolicePrecinct(s)', 'Category', 'SubCategoryName', 'Country',
       'ZipCode(s)'],
      dtype='object')

In [25]:
final_df["Category"].value_counts()


Television           5515
Theater              2149
Film                 1608
Commercial           1249
Still Photography     903
WEB                   535
Documentary            96
Student                53
Music Video            52
Name: Category, dtype: int64

In [26]:
# entertainment = final_df[final_df["Category"].isin(["Television", "Theater", "Film"])]

# # len(entertainment["EventID"].unique()
# # print (entertainment.shape)

# entertainment = entertainment.drop_duplicates(subset = ["EventID"])

# # entertainment.groupby("Borough")
# entertainment.groupby("Borough").count()

# # de = abc [abc["Category"] == "Brooklyn" ] 
# # entertainment.shape

# # group = entertainment.groupby("Borough")


In [27]:
entertainment = final_df[final_df["Category"].isin(["Television", "Theater", "Film"])]

entertainment = entertainment.drop_duplicates(subset = ["EventID"])

borough_event_counts = entertainment.groupby(["Borough", "Category"])["EventID"].nunique()

In [28]:
borough_event_counts

Borough        Category  
Bronx          Film            44
               Television     223
               Theater          3
Brooklyn       Film           360
               Television    1199
               Theater        343
Manhattan      Film           416
               Television    1310
               Theater       1190
Queens         Film           122
               Television     874
Staten Island  Film            28
               Television       5
Name: EventID, dtype: int64

In [29]:
# For each category, slice out its per-borough counts, then take the top borough and its count.
boro_tv = borough_event_counts.xs("Television", level="Category").idxmax()
tv_c = borough_event_counts.xs("Television", level="Category").max()

boro_f = borough_event_counts.xs("Film", level="Category").idxmax()
f_c = borough_event_counts.xs("Film", level="Category").max()

boro_th = borough_event_counts.xs("Theater", level="Category").idxmax()
th_c = borough_event_counts.xs("Theater", level="Category").max()

print(f"Borough with highest number of TELEVISION events is {boro_tv} with {tv_c} events.")
print(f"Borough with highest number of FILM events is {boro_f} with {f_c} events.")
print(f"Borough with highest number of Theater events is {boro_th} with {th_c} events.")

Borough with highest number of TELEVISION events is Manhattan with 1310 events.
Borough with highest number of FILM events is Manhattan with 416 events.
Borough with highest number of Theater events is Manhattan with 1190 events.
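
The six `xs` lookups in cell 29 can also be written as a single pivot. The sketch below uses a made-up count series with the same two index levels as `borough_event_counts`; the numbers are illustrative only.

```python
import pandas as pd

# Made-up counts indexed by (Borough, Category), mirroring borough_event_counts.
counts = pd.Series(
    [3, 7, 5, 2],
    index=pd.MultiIndex.from_tuples(
        [("Bronx", "Film"), ("Bronx", "Television"),
         ("Queens", "Film"), ("Queens", "Television")],
        names=["Borough", "Category"],
    ),
)

table = counts.unstack("Category")  # rows: Borough, columns: Category
best_borough = table.idxmax()       # per category, the borough with the largest count
best_count = table.max()            # per category, that largest count

for category in table.columns:
    print(f"{category}: {best_borough[category]} with {best_count[category]} events")
```

When a borough has no events in some category, `unstack` fills that cell with NaN; `idxmax` and `max` skip NaN by default, so the winners are unaffected, although the counts then print as floats.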


In [30]:
entertainment2 = final_df[final_df["Category"].isin(["Television", "Film", "Theater"])]
# Caveat: de-duplicating on EventID alone keeps only the first zip code row of each
# multi-zip event, so such an event is counted under a single zip code below.
entertainment2 = entertainment2.drop_duplicates(subset="EventID")

zip_event_counts = entertainment2.groupby(["ZipCode(s)", "Category"])["EventID"].nunique()



zip_event_counts

ZipCode(s)  Category  
0           Film           1
            Television     1
00083       Film           1
            Television     1
10001       Film          26
                          ..
11435       Television     4
11691       Television     1
11693       Television     1
11694       Film           4
            Television     2
Name: EventID, Length: 270, dtype: int64
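
Cell 30 drops duplicate `EventID` rows before grouping by zip code. Since `nunique()` already counts each event once per group, that step may hide zip codes rather than protect against double counting. The made-up three-row frame below sketches the difference; whether it changes the answers for the real data has not been checked here.

```python
import pandas as pd

# Made-up rows: event 1 touches two zip codes, event 2 touches one.
rows = pd.DataFrame({
    "EventID": [1, 1, 2],
    "Category": ["Film", "Film", "Film"],
    "ZipCode(s)": ["10001", "10002", "10002"],
})

# Mirrors cell 30: event 1 survives only with its first zip code.
first_zip_only = (
    rows.drop_duplicates(subset="EventID")
        .groupby(["ZipCode(s)", "Category"])["EventID"].nunique()
)

# Without the de-duplication, nunique() still counts each event once per zip code.
every_zip = rows.groupby(["ZipCode(s)", "Category"])["EventID"].nunique()

print(first_zip_only.to_dict())
print(every_zip.to_dict())
```

In the toy data, zip code 10002 hosts two distinct events, yet the first variant reports only one.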

In [31]:
# Same pattern as for boroughs: per category, the top zip code and its count.
zip_tv = zip_event_counts.xs("Television", level="Category").idxmax()
tv_c2 = zip_event_counts.xs("Television", level="Category").max()

zip_f = zip_event_counts.xs("Film", level="Category").idxmax()
f_c2 = zip_event_counts.xs("Film", level="Category").max()

zip_th = zip_event_counts.xs("Theater", level="Category").idxmax()
th_c2 = zip_event_counts.xs("Theater", level="Category").max()


In [32]:
print(f"Borough with highest number of TELEVISION events is {boro_tv} with {tv_c} events.")
print(f"Borough with highest number of FILM events is {boro_f} with {f_c} events.")
print(f"Borough with highest number of Theater events is {boro_th} with {th_c} events.")
print()

print(f"ZipCode with highest number of TELEVISION events is {zip_tv} with {tv_c2} events.")
print(f"ZipCode with highest number of FILM events is {zip_f} with {f_c2} events.")
print(f"ZipCode with highest number of Theater events is {zip_th} with {th_c2} events.")


Borough with highest number of TELEVISION events is Manhattan with 1310 events.
Borough with highest number of FILM events is Manhattan with 416 events.
Borough with highest number of Theater events is Manhattan with 1190 events.

ZipCode with highest number of TELEVISION events is 11222 with 778 events.
ZipCode with highest number of FILM events is 10002 with 71 events.
ZipCode with highest number of Theater events is 10019 with 295 events.


## Answer

### Student name: Rohan Gore (N19332535)

### Your answer:
1. The borough with the most unique TV events is ...
    Answer: Manhattan
   
2. The borough with the most unique film events is ...
    Answer: Manhattan

3. The borough with the most unique theater events is ...
    Answer: Manhattan


4. The zip code with the most unique TV events is ...
    Answer: 11222

5. The zip code with the most unique film events is ...
    Answer: 10002
 
6. The zip code with the most unique theater events is ...
    Answer: 10019


