#### Analysis of "hazardous" and "immediately hazardous" housing maintenance code violations (Jan. 1, 2018 to Dec. 31, 2022)

Data was downloaded from [NYC Open Data](https://data.cityofnewyork.us/Housing-Development/Housing-Maintenance-Code-Violations/wvxf-dwi5/explore/query/SELECT%0A%20%20%60violationid%60%2C%0A%20%20%60buildingid%60%2C%0A%20%20%60registrationid%60%2C%0A%20%20%60boroid%60%2C%0A%20%20%60boro%60%2C%0A%20%20%60housenumber%60%2C%0A%20%20%60lowhousenumber%60%2C%0A%20%20%60highhousenumber%60%2C%0A%20%20%60streetname%60%2C%0A%20%20%60streetcode%60%2C%0A%20%20%60zip%60%2C%0A%20%20%60apartment%60%2C%0A%20%20%60story%60%2C%0A%20%20%60block%60%2C%0A%20%20%60lot%60%2C%0A%20%20%60class%60%2C%0A%20%20%60inspectiondate%60%2C%0A%20%20%60approveddate%60%2C%0A%20%20%60originalcertifybydate%60%2C%0A%20%20%60originalcorrectbydate%60%2C%0A%20%20%60newcertifybydate%60%2C%0A%20%20%60newcorrectbydate%60%2C%0A%20%20%60certifieddate%60%2C%0A%20%20%60ordernumber%60%2C%0A%20%20%60novid%60%2C%0A%20%20%60novdescription%60%2C%0A%20%20%60novissueddate%60%2C%0A%20%20%60currentstatusid%60%2C%0A%20%20%60currentstatus%60%2C%0A%20%20%60currentstatusdate%60%2C%0A%20%20%60novtype%60%2C%0A%20%20%60violationstatus%60%2C%0A%20%20%60rentimpairing%60%2C%0A%20%20%60latitude%60%2C%0A%20%20%60longitude%60%2C%0A%20%20%60communityboard%60%2C%0A%20%20%60councildistrict%60%2C%0A%20%20%60censustract%60%2C%0A%20%20%60bin%60%2C%0A%20%20%60bbl%60%2C%0A%20%20%60nta%60/page/filter) as a .csv file

The data was filtered on the NYC Open Data portal to the selected years and to Class B and Class C violations only. Class B ("hazardous") and Class C ("immediately hazardous") violations indicate conditions considered hazardous to the health and safety of tenants [(NYC Housing Preservation and Development)](https://www.nyc.gov/site/hpd/services-and-information/penalties-and-fees.page).

The analysis compares the average number of violations per building citywide with the average number of violations for 13 buildings tied to Robert Izsak.

In [1]:
import pandas as pd

In [2]:
# Read the ID columns as strings so they are treated as labels, not numbers
dtypes = {
    "ViolationID": "str",
    "BuildingID": "str",
}

housing_data = pd.read_csv(
    "../data/Housing_Maintenance_Code_Violations-6.csv",
    dtype=dtypes,
    # Parse these columns as dates while reading the file
    parse_dates=[
        "ApprovedDate",
        "OriginalCertifyByDate",
        "OriginalCorrectByDate",
        "NewCertifyByDate",
        "NewCorrectByDate",
        "CertifiedDate",
    ],
)


In [3]:
housing_data

Unnamed: 0,ViolationID,BuildingID,RegistrationID,BoroID,Borough,HouseNumber,LowHouseNumber,HighHouseNumber,StreetName,StreetCode,...,ViolationStatus,RentImpairing,Latitude,Longitude,CommunityBoard,CouncilDistrict,CensusTract,BIN,BBL,NTA
0,13957715,146093,348239,3,BROOKLYN,225,225,225,23 STREET,7480,...,Close,N,40.660628,-73.995235,7.0,38.0,145.0,3009467.0,3.006460e+09,Sunset Park West
1,13975317,119085,211560,2,BRONX,1890,1890,1890,WALTON AVENUE,72720,...,Close,N,40.849519,-73.909293,5.0,14.0,23301.0,2007904.0,2.028270e+09,Mount Hope
2,14018854,744424,500617,5,STATEN ISLAND,192,192,192,HEBERTON AVENUE,32450,...,Open,N,40.635605,-74.132229,1.0,49.0,207.0,5023964.0,5.010210e+09,Port Richmond
3,13998721,43514,105846,1,MANHATTAN,552,552,554,WEST 185 STREET,37450,...,Close,N,40.851277,-73.931018,12.0,10.0,269.0,1063766.0,1.021570e+09,Washington Heights North
4,13998722,43514,105846,1,MANHATTAN,552,552,554,WEST 185 STREET,37450,...,Close,N,40.851277,-73.931018,12.0,10.0,269.0,1063766.0,1.021570e+09,Washington Heights North
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
784363,15611001,939397,431260,4,QUEENS,148-15,148-13,148-19,90 AVENUE,18140,...,Close,N,40.703307,-73.806940,12.0,24.0,240.0,,,Jamaica
784364,15689756,50311,212324,2,BRONX,289,289,289,BONNER PLACE,13520,...,Open,N,40.827652,-73.916712,4.0,16.0,173.0,2001993.0,2.024230e+09,Concourse-Concourse Village
784365,16027755,932377,226443,2,BRONX,507,507,507,EAST 161 STREET,26490,...,Open,N,40.823066,-73.910004,3.0,17.0,141.0,2116648.0,2.023660e+09,Morrisania
784366,16271443,20629,140066,1,MANHATTAN,107,107,131,EAST 126 STREET,19470,...,Open,N,40.805507,-73.937997,11.0,9.0,242.0,1054496.0,1.017750e+09,East Harlem (North)


In [4]:
housing_data.columns

Index(['ViolationID', 'BuildingID', 'RegistrationID', 'BoroID', 'Borough',
       'HouseNumber', 'LowHouseNumber', 'HighHouseNumber', 'StreetName',
       'StreetCode', 'Postcode', 'Apartment', 'Story', 'Block', 'Lot', 'Class',
       'InspectionDate', 'ApprovedDate', 'OriginalCertifyByDate',
       'OriginalCorrectByDate', 'NewCertifyByDate', 'NewCorrectByDate',
       'CertifiedDate', 'OrderNumber', 'NOVID', 'NOVDescription',
       'NOVIssuedDate', 'CurrentStatusID', 'CurrentStatus',
       'CurrentStatusDate', 'NovType', 'ViolationStatus', 'RentImpairing',
       'Latitude', 'Longitude', 'CommunityBoard', 'CouncilDistrict',
       'CensusTract', 'BIN', 'BBL', 'NTA'],
      dtype='object')

In [5]:
housing_data["ApprovedDate"].describe()

count                           784368
mean     2020-10-12 13:09:25.852762368
min                2018-01-02 00:00:00
25%                2019-08-06 00:00:00
50%                2020-12-09 00:00:00
75%                2022-01-18 00:00:00
max                2023-10-14 00:00:00
Name: ApprovedDate, dtype: object

#### Create tallies of violations per building
- Used .groupby() to count the ViolationIDs (a unique value for each violation) associated with each BuildingID (a unique value for each building)
- Used .describe() to view the average count of ViolationIDs per BuildingID

In [6]:
violation_tallies = (
    housing_data.groupby("BuildingID")["ViolationID"]
    .count()
    .reset_index()
    .sort_values(by="ViolationID", ascending=False)
)

violation_tallies

Unnamed: 0,BuildingID,ViolationID
1606,111945,647
54234,94506,574
22259,344337,566
16240,28946,546
40059,64592,519
...,...,...
42115,676760,1
15972,287586,1
15959,287571,1
15954,287561,1
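
As a minimal sketch of what the tallying step does, the same chain can be run on a small toy table. The IDs below are hypothetical and are not taken from the HPD file. One detail worth noting: .count() only counts non-missing ViolationID values, while .size() counts every row, so the two can differ if an ID is missing.

```python
import pandas as pd

# Toy data with hypothetical IDs (not from the HPD file).
# The last row has a missing ViolationID on purpose.
toy = pd.DataFrame(
    {
        "BuildingID": ["b1", "b1", "b2", "b3", "b3", "b3"],
        "ViolationID": ["v1", "v2", "v3", "v4", "v5", None],
    }
)

# Non-missing ViolationID values per building (the approach used in the notebook)
counts = toy.groupby("BuildingID")["ViolationID"].count()

# Total rows per building, including rows with a missing ViolationID
sizes = toy.groupby("BuildingID").size()

# Same shape as violation_tallies: one row per building, largest tally first
toy_tallies = counts.reset_index().sort_values(by="ViolationID", ascending=False)
print(toy_tallies)
```

In the full dataset the two approaches should agree as long as every row has a ViolationID, which the count of 784,368 in the outputs above suggests.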


#### Calculate averages
- Used .describe() to look at citywide averages

In [7]:
violation_tallies.describe()

Unnamed: 0,ViolationID
count,57188.0
mean,13.715605
std,29.169384
min,1.0
25%,2.0
50%,4.0
75%,12.0
max,647.0


#### Filter down data to buildings for the investigation

- Created a new column titled "izsak_buildings," using a lambda function that flags the rows in the citywide data belonging to the 13 buildings tied to the landlord Robert Izsak

In [8]:
# BuildingIDs of the 13 buildings tied to Robert Izsak
izsak_building_ids = {
    "323187", "74521", "347340", "142778", "192024", "202731", "301247",
    "391660", "347632", "256023", "347311", "187832", "206597",
}

# Flag each row whose BuildingID is one of the 13 buildings
housing_data["izsak_buildings"] = housing_data["BuildingID"].apply(
    lambda x: str(x) in izsak_building_ids
)
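
The same flag could also be built without a per-row lambda, using the pandas method Series.isin. The sketch below is an alternative under stated assumptions, not the method used in this notebook. It runs on a toy table that reuses two of the BuildingIDs listed above plus one hypothetical ID ("999999").

```python
import pandas as pd

# Two of the BuildingIDs from the list above, used here for illustration
ids_of_interest = ["323187", "74521"]

# Toy table; "999999" is a hypothetical ID that is not on the list
toy = pd.DataFrame({"BuildingID": ["323187", "999999", "74521", "323187"]})

# isin() returns True for each row whose BuildingID appears in the list
toy["izsak_buildings"] = toy["BuildingID"].isin(ids_of_interest)

flag_counts = toy["izsak_buildings"].value_counts()
print(flag_counts)
```

Because BuildingID is read in as a string, the IDs in the list need to be strings as well for the comparison to match.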

In [9]:
housing_data["izsak_buildings"].value_counts()

izsak_buildings
False    783778
True        590
Name: count, dtype: int64

In [10]:
housing_subset = housing_data[housing_data["izsak_buildings"]]

In [11]:
len(housing_subset)

590

In [12]:
housing_subset

Unnamed: 0,ViolationID,BuildingID,RegistrationID,BoroID,Borough,HouseNumber,LowHouseNumber,HighHouseNumber,StreetName,StreetCode,...,RentImpairing,Latitude,Longitude,CommunityBoard,CouncilDistrict,CensusTract,BIN,BBL,NTA,izsak_buildings
649,13556399,323187,337276,3,BROOKLYN,84,84,84,LAWRENCE AVENUE,54630,...,N,40.629599,-73.975769,14.0,44.0,480.0,3127878.0,3.054220e+09,Flatbush,True
913,15183291,74521,202798,2,BRONX,1701,1701,1701,EASTBURN AVENUE,29720,...,N,40.844336,-73.908519,4.0,15.0,22902.0,2007399.0,2.027940e+09,Mount Hope,True
914,15183292,74521,202798,2,BRONX,1701,1701,1701,EASTBURN AVENUE,29720,...,N,40.844336,-73.908519,4.0,15.0,22902.0,2007399.0,2.027940e+09,Mount Hope,True
1024,15387386,323187,337276,3,BROOKLYN,84,84,84,LAWRENCE AVENUE,54630,...,N,40.629599,-73.975769,14.0,44.0,480.0,3127878.0,3.054220e+09,Flatbush,True
1025,15387387,323187,337276,3,BROOKLYN,84,84,84,LAWRENCE AVENUE,54630,...,N,40.629599,-73.975769,14.0,44.0,480.0,3127878.0,3.054220e+09,Flatbush,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
782032,12368908,301247,300894,3,BROOKLYN,3111,3101,3111,GLENWOOD ROAD,44430,...,N,40.634082,-73.946602,17.0,45.0,78602.0,3113984.0,3.050060e+09,East Flatbush-Farragut,True
782332,13978532,301247,300894,3,BROOKLYN,3111,3101,3111,GLENWOOD ROAD,44430,...,N,40.634082,-73.946602,17.0,45.0,78602.0,3113984.0,3.050060e+09,East Flatbush-Farragut,True
783390,14363465,347340,315466,3,BROOKLYN,1928,1928,1940,OCEAN AVENUE,67530,...,N,40.614910,-73.954361,14.0,48.0,546.0,3181611.0,3.067570e+09,Midwood,True
783560,15306236,347340,315466,3,BROOKLYN,1928,1928,1940,OCEAN AVENUE,67530,...,N,40.614910,-73.954361,14.0,48.0,546.0,3181611.0,3.067570e+09,Midwood,True



#### Create tallies of violations for the filtered buildings
- Followed the same steps as above for the citywide violations, this time on the filtered data set

In [13]:
notable_violation_tallies = (
    housing_subset.groupby("BuildingID")["ViolationID"]
    .count()
    .reset_index()
    .sort_values(by="ViolationID", ascending=False)
)

notable_violation_tallies

Unnamed: 0,BuildingID,ViolationID
7,323187,123
12,74521,94
9,347340,90
0,142778,58
2,192024,56
3,202731,34
6,301247,33
11,391660,29
10,347632,18
5,256023,17


In [14]:
notable_violation_tallies.describe()

Unnamed: 0,ViolationID
count,13.0
mean,45.384615
std,36.516066
min,9.0
25%,17.0
50%,33.0
75%,58.0
max,123.0
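
To make the comparison explicit, the means and medians from the two .describe() outputs above can be set side by side. This is a small arithmetic sketch with the figures copied in by hand from those outputs, and the variable names are illustrative only. With only 13 buildings in the subset, the ratios are best read as rough indicators.

```python
# Figures copied from the two .describe() outputs above
citywide_mean = 13.715605
citywide_median = 4.0
izsak_mean = 45.384615
izsak_median = 33.0
izsak_building_count = 13

# How many times higher the subset's typical tally is than the citywide one
mean_ratio = izsak_mean / citywide_mean
median_ratio = izsak_median / citywide_median

# Consistency check: mean times building count should recover the row count (590)
implied_total = izsak_mean * izsak_building_count

print(f"mean ratio: {mean_ratio:.2f}")
print(f"median ratio: {median_ratio:.2f}")
print(f"implied total violations: {implied_total:.0f}")
```

The implied total matches the 590 rows flagged in the filtering step, which is a quick check that no violations were lost between filtering and tallying.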


In [15]:
violation_tallies.to_csv('../output/citywide_violation_tallies.csv')

In [16]:
notable_violation_tallies.to_csv('../output/izsak_building_violation_tallies.csv')