# Exploring NYCDOH's Violations Dataset

Lucky, a software engineer from Arizona, is moving to New York! They're looking for an apartment to rent and want to know which neighborhoods to avoid. They've asked us to help with their research and narrow down their StreetEasy filters because, well, they're a fan of hot water and really hate rodents.

As a result, we're killing two birds with one stone:

We get to work on our data wrangling and data visualization skills, and also help out a friend in need! So let's dive in!

In [17]:
import pandas as pd
import hvplot.pandas  
import geopandas as gpd

import plotly.express as px 

import panel as pn
pn.extension()
import holoviews as hv
hv.extension('bokeh')
import warnings
warnings.filterwarnings('ignore')

In [3]:
#uncomment and run for cleaned CSV file

# import data_cleaning
# closed_violations = data_cleaning.clean('Housing_Maintenance_Code_Violations.csv')
# closed_violations.to_csv('Housing_Maintenance_CV_cleaned.csv')

In [6]:
closed_violations = pd.read_csv('Housing_Maintenance_CV_cleaned.csv')

### Looking Through Time:

To help Lucky, we first want to look at violations over a period of time; in our case, we'll use violations from the year 2000 onward.
This will let us spot trends, filter by year, and hopefully filter out some bad apartments!

We'll primarily be working with two columns in our analysis: InspectionDate and OriginalCorrectByDate.

**InspectionDate** is when the violation was observed  <br>
**OriginalCorrectByDate** is the date by which the owner was expected to correct the violation

From those two, we'll create two new columns:

**ViolationLength**: The time between the inspection date and the date by which the owner was expected to correct the violation <br>
**ViolationYear**: The year in which the inspection took place

These will help us with our visualizations and with plotting things over time.
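As a rough illustration of how the two derived columns could be built, here is a minimal sketch on a toy table. The dates are invented, and measuring `ViolationLength` in days is an assumption; the actual `data_cleaning` module may compute these columns differently.

```python
import pandas as pd

# Toy rows standing in for the raw dataset; these dates are invented.
raw = pd.DataFrame({
    "InspectionDate": ["2015-03-01", "2018-11-20"],
    "OriginalCorrectByDate": ["2015-03-31", "2018-11-25"],
})

# Parse both columns as dates; unparseable values become NaT.
inspected = pd.to_datetime(raw["InspectionDate"], errors="coerce")
correct_by = pd.to_datetime(raw["OriginalCorrectByDate"], errors="coerce")

# Assumption: ViolationLength is the gap between the two dates, in days.
raw["ViolationLength"] = (correct_by - inspected).dt.days
raw["ViolationYear"] = inspected.dt.year

# Keep only inspections from the year 2000 onward.
recent = raw[raw["ViolationYear"] >= 2000]
```

Using `errors="coerce"` keeps malformed dates from raising; those rows end up with missing values and can be dropped or inspected separately.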

Our first visualization shows the average length of violations over time!

#### Length of Violations on Average per Year by Class

In [7]:
violation_length = closed_violations.groupby(['Class','ViolationYear'])['ViolationLength'].mean().reset_index().round(2)
vio_length = list(violation_length.Class.unique())


In [8]:
first_select = pn.widgets.Select(name='Select Class:', value='A', options= vio_length)

@pn.depends(first_select)
def violation_length_plot(vio_length):
  return violation_length[violation_length['Class']==vio_length].hvplot('ViolationYear','ViolationLength',kind='line',yformatter='%.0f', color="#EF830F", title='Average Violation Lengths by Year (and Class)')

This data follows expectations: on average, the timeframe given to resolve a violation is longer for Class A violations (least serious) than for Class C violations (most serious).
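One way to check this kind of claim without reading it off the plot is to compare the mean length per class directly. The sketch below uses an invented toy table with the same column names, so the numbers are illustrative only and are not results from the dataset.

```python
import pandas as pd

# Invented toy data using the notebook's column names.
toy = pd.DataFrame({
    "Class": ["A", "A", "B", "C", "C"],
    "ViolationYear": [2015, 2016, 2015, 2015, 2016],
    "ViolationLength": [90.0, 110.0, 40.0, 10.0, 20.0],
})

# Mean correction window per class, across all years.
by_class = toy.groupby("Class")["ViolationLength"].mean()

# Sort from longest to shortest window to see the ordering at a glance.
ordered = by_class.sort_values(ascending=False)
print(ordered)
```

Running the same two lines against `closed_violations` would show whether the A-versus-C ordering holds across the whole dataset rather than for a single selected class.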

## Class Violations by Borough
Now that we've explored violations over time, let's look at violations within boroughs.
Lucky is primarily looking to live in Brooklyn, as it's close to their friends and has a lot of cool coffee shops and climbing gyms.
But before we dive deep into the Brooklyn neighborhoods, let's get the bigger picture.

In [9]:
violations_by_borough = closed_violations.groupby(['Class','Borough'])['ViolationID'].count().reset_index(name='Count')
class_violations = list(violations_by_borough.Class.unique())


In [10]:
second_select = pn.widgets.Select(name='Select Class:', value='C', options= class_violations)


@pn.depends(second_select)
def violations_by_borough_plot(class_violations):
  return violations_by_borough[violations_by_borough['Class']==class_violations].hvplot('Borough','Count',kind='bar',yformatter='%.0f', color="#ff6f69", title='Violations by Borough (and Class)')

In [11]:
uZip = pn.widgets.TextInput(name="Enter the zipcode", value='11213', width=100)

@pn.depends(uZip)
def plotly_vioMap(paramZip):
  postcode_Violations = closed_violations[closed_violations['Postcode'] == paramZip]

  figMapbox = px.scatter_mapbox(
                          postcode_Violations,
                          lat=postcode_Violations.Latitude,
                          lon=postcode_Violations.Longitude,
                          color='Class',
                          hover_name="Address",
                          hover_data=['Class'],
                          opacity=.2,
                          mapbox_style='carto-positron',
                          zoom=13)

  return figMapbox
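One thing to watch with the ZIP code filter: the `TextInput` widget always returns a string, while a `Postcode` column read back from a CSV may well be numeric (for example `11213.0`), in which case the equality test would match nothing. The helper below is a hypothetical sketch, not part of the notebook, that puts both sides in the same 5-digit string form before comparing.

```python
def normalize_zip(value):
    """Return a 5-digit ZIP string, or None if value does not look like one.

    Hypothetical helper: accepts ints, floats, or strings.
    """
    if value is None:
        return None
    text = str(value).strip()
    # Floats read from a CSV print as '11213.0'; drop the trailing '.0'.
    if text.endswith(".0"):
        text = text[:-2]
    # Anything that is not exactly five digits (including 'nan') is rejected.
    if text.isdigit() and len(text) == 5:
        return text
    return None


print(normalize_zip(11213.0))    # numeric value from a CSV
print(normalize_zip(" 11213 "))  # user input with stray spaces
print(normalize_zip("abc"))      # invalid input
```

Under that assumption, the filter could compare `closed_violations['Postcode'].map(normalize_zip)` against `normalize_zip(paramZip)` so that both sides share one representation.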

In [18]:
class_c_violations = closed_violations[closed_violations['Class']=='C'] 
closed_violations['Address']=closed_violations['HouseNumber']+ ' '+ closed_violations['StreetName']

ntaVIO= class_c_violations.groupby(['NTA'])['BuildingID'].count().reset_index()
ntaVIO.rename(columns={'BuildingID':'Count'}, inplace=True)
ntaShape = gpd.read_file("NTA map.geojson")

In [19]:
# Step 1: keep NTAs with at most 1,000,000 violations
vioAMOUNT= ntaVIO[ntaVIO['Count']<= 1000000]

# Step 2: group by neighborhood and .sum() to get a count per NTA
vioTotalAmt = vioAMOUNT.groupby(['NTA'])['Count'].sum().reset_index() # reset_index() returns a DataFrame

# Step 3: merge the per-NTA counts with the NTA shapes on the neighborhood name

vioAmtShape = pd.merge(ntaShape,vioTotalAmt, 
                      how='inner', 
                      left_on='ntaname', right_on='NTA')

# Set the index of the merged DataFrame to the NTA field
vioAmtShape.set_index("NTA", inplace=True)


# Use choropleth_mapbox and its attributes to set the desired visual properties
figVioPxChoro = px.choropleth_mapbox(vioAmtShape,
                          geojson=vioAmtShape.geometry,
                          locations=vioAmtShape.index,
                          color="Count",
                          color_continuous_scale=px.colors.sequential.Teal, 
                          
                          center={"lat": 40.754932, "lon": -73.984016}, 
                          mapbox_style="carto-positron",
                          zoom=9)
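Because the merge uses `how='inner'`, any neighborhood whose name is spelled differently in the two tables silently disappears from the map. A minimal sketch of how to spot such mismatches uses the `indicator` option of `pd.merge`. The toy names and counts below are invented, and plain DataFrames stand in for the GeoDataFrame.

```python
import pandas as pd

# Invented stand-ins for ntaShape (shapes) and vioTotalAmt (counts).
shapes = pd.DataFrame(
    {"ntaname": ["Flatbush", "Crown Heights North", "Park Slope-Gowanus"]}
)
counts = pd.DataFrame(
    {"NTA": ["Flatbush", "Crown Heights North", "Unknown Area"],
     "Count": [120, 95, 7]}
)

# An outer merge with indicator=True labels each row by where it came from.
check = pd.merge(shapes, counts, how="outer",
                 left_on="ntaname", right_on="NTA", indicator=True)

# Neighborhoods with a shape but no counts, and counts with no shape.
shapes_only = check.loc[check["_merge"] == "left_only", "ntaname"].tolist()
counts_only = check.loc[check["_merge"] == "right_only", "NTA"].tolist()

print("No counts for:", shapes_only)
print("No shape for:", counts_only)
```

If both lists come back empty on the real data, the inner merge is dropping nothing.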

In [26]:
# Add a slider whose range runs from the minimum to the maximum violation count per NTA in the dataset above
vioSlider=pn.widgets.IntSlider(name="Slide to see  class C violation totals by neighborhood",
                              start=int(min(ntaVIO['Count'])),
                              end=int(max(ntaVIO['Count'])),
                              value=2000)
#this chart allows the user to slide through violation amounts

@pn.depends(vioSlider)
def plotly_violationSliderChoroMap(paramSlider):
    # Step 1: keep NTAs whose violation count is at most the slider value
    vioAMOUNT= ntaVIO[ntaVIO['Count']<= paramSlider]

    # Step 2: group by NTA and .sum() to get a count per neighborhood
    vioTotalAmt = vioAMOUNT.groupby(['NTA'])['Count'].sum().reset_index() # reset_index() returns a DataFrame

    # Step 3: merge the per-NTA counts with the NTA shapes on the neighborhood name

    vioAmtShape = pd.merge(ntaShape,vioTotalAmt, 
                              how='inner', 
                              left_on='ntaname', right_on='NTA')

    # Set the index of the merged DataFrame to the NTA field
    vioAmtShape.set_index("NTA", inplace=True)


    # Use choropleth_mapbox and its attributes to set the desired visual properties
    figPxChoro = px.choropleth_mapbox(vioAmtShape,
                          geojson=vioAmtShape.geometry,
                          locations=vioAmtShape.index,
                          color="Count",
                          color_continuous_scale=px.colors.sequential.Teal, 
                          
                          center={"lat": 40.754932, "lon": -73.984016}, 
                          mapbox_style="carto-positron",
                          zoom=9)

    # Return the figure container
    return figPxChoro

In [46]:
uNTAcode = pn.widgets.MultiChoice(name='Select Neighborhood:',
                                  value=['Crown Heights North','Flatbush'],
                                  options=list(closed_violations.NTA.unique()),
                                  max_items= 200)
uNTAcode

#more useful choropleth map with options to pick specific neighborhoods

@pn.depends(uNTAcode)
def plotly_violationChoroMap(uSelect):
  
  # step 1: get the list of all number of violations
  vioAMOUNT= ntaVIO[ntaVIO['NTA'].isin(uSelect)]

# Step 2: Now groupby this data on NTA and perform .sum() to get a count by neighborhood
  vioTotalAmt= vioAMOUNT.groupby(['NTA'])['Count'].sum().reset_index() 

#3 Merge the count by nta/count dataset and NTA SHAPE files on NTA field

  vioAmtShape = pd.merge(ntaShape,vioTotalAmt, 
                            how='inner', 
                            left_on='ntaname', right_on='NTA')

#set index of merged DF to NTA field
  vioAmtShape.set_index("NTA", inplace=True)


  # Use choropleth_mapbox and its attributes to set the desired visual properties
  figPxChoro = px.choropleth_mapbox(vioAmtShape,
                        geojson=vioAmtShape.geometry,
                        locations=vioAmtShape.index,
                        color="Count",
                        color_continuous_scale=px.colors.sequential.Teal, 
                        
                        center={"lat": 40.754932, "lon": -73.984016}, 
                        mapbox_style="carto-positron",
                        zoom=9)

# Return the figure container
  return figPxChoro

In [50]:
# create a gridspec 
autoGS = pn.GridSpec(sizing_mode='stretch_both', width=300, height=300)

autoGS[0,0] = pn.Column(first_select, margin=0)
autoGS[0,1:3] = pn.Column(violation_length_plot, margin=0, align="center")

autoGS[1,0] = pn.Column(second_select, margin=0)
autoGS[1,1:3] = pn.Column(violations_by_borough_plot, margin=0, align="center")

autoGS[2,0] = pn.Column(uZip, margin=0)
autoGS[2,1:3] = pn.Column(plotly_vioMap, margin=0, align="center")

# autoGS[4,0:3] = pn.Column(figVioPxChoro, margin=0, align="center")
autoGS[4, :3] = pn.Spacer(background='#0000')

autoGS[5,0] = pn.Column(vioSlider, margin=0)
autoGS[5,1] = pn.Column(plotly_violationSliderChoroMap, margin=0, align="center")

autoGS[6, :3] = pn.Spacer(background='#0000')


autoGS[7,0] = pn.Column(uNTAcode, margin=0)
autoGS[7,1] = pn.Column(plotly_violationChoroMap, margin=0, align="center")
# Launch the viz as an app in a separate browser window with .show()
autoGS.show()

Launching server at http://localhost:43315


<panel.io.server.Server at 0x7f22b69e7df0>