# Improving the Response to NYC "Social Distancing Violation" Reports 

## 1. Introduction

### 1.1 Background

As the Covid 19 pandemic sweeps across the United States many states, counties and cities have implemented "social distancing" laws relating to a restriction on the number and distance at which citizens can congregate. New York City (NYC) has  facilitated enforcement of their local distancing ordinance by allowing citizens to report suspected violations by phone, and on-line using a "311" service, which is analogous to the "911" emergency response system but for lower urgency issues. The NYC Police Department (PD) assigns each report to one or more officers for review, and investigation at the location of the complaint. Many of these reports turn out to be non-actionable (i.e. the investigation does not result in a successful resoluton of the issue or issuance of a violation to suspects.)

### 1.2 Problem

Responding to non-actionable reports requires a police department to task officers to non-productive work that could be assigned to other emergency related tasks. This project attempts to lower the number of unactionable social distancing reports by looking at associated date from two approaches:

A. Detect and define distinct clusters within the reports that might indicate locations that could be included in more routined PD patrols to deter social distancing violators and thus reduce the necessity of citizen reports and one-off dispatches of officers to handle those calls.
B. Review reports to determine the likelihood of the report being actionable. 


### 1.3 Interest

This type of information should be of interest to police department leadership as a factor in their pursuit of effectively assigning officers to types of neighborhoods to patrol, as well as in their efforts to predict which reports would likely result in effective action, so as to allow them to prioritize their response to reports in times when police staffing resources are limited. The leaders of the communities (e.g. community boards) where social distancing reports were made would also likely be interested in understanding how their community compared to others in terms of social distancing related incidents.

## 2. Data Acquisition and Cleaning

### 2.1 Data Sources
311 reports for social distancing violations in NYC can be found in the NYC Open dataset found here:  (https://opendata.cityofnewyork.us/calls). Data related to the social venues in the immediate region around the location of the social distancing reports can be found using the Foursquare API found here: (https://foursquare.com/).

### 2.2 Data Requirements:

From the FourSquare API (https://foursquare.com/): 
    - Venue categories (to obtain counts of venues (restaraunts, bars, parks) to determine characteristics of area around 311 report)
    - latitude
    - longitude
From the NYC Open Data for 311 reports (https://opendata.cityofnewyork.us/calls):
    - Unique ID
    - date/time of report
    - call source (online / phone call)
    - Location type (where social distancing violation occured)
    - resolution type (investigation ended with no resolution, investigation ended with further investigation needed, report filed)
    - incident zip (zip code for location of potential violation)
    - community board (sub-division of NYC borough where report was made)
    - borough (NYC borough where report was made)
    - latitude  
    - longitude
    
These will be joined on latitude/longitude for analysis
        

## Create a pandas dataframe of latest 311 reports about "Social Distancing" in NYC 

### Load 311 call data from NYC Open Data (311 call data)

In [1]:
import requests
import pandas as pd

#Request all 311 calls for "Social Distancing" complaints
r = requests.get('https://data.cityofnewyork.us/resource/erm2-nwe9.json?descriptor=Social%20Distancing')

j = r.json()
df = pd.DataFrame.from_dict(j)
print("Columns list = ",df.columns," and data shape=",df.shape)

Columns list =  Index(['unique_key', 'created_date', 'closed_date', 'agency', 'agency_name',
       'complaint_type', 'descriptor', 'incident_zip', 'incident_address',
       'street_name', 'cross_street_1', 'cross_street_2',
       'intersection_street_1', 'intersection_street_2', 'city', 'landmark',
       'status', 'resolution_description', 'resolution_action_updated_date',
       'community_board', 'bbl', 'borough', 'x_coordinate_state_plane',
       'y_coordinate_state_plane', 'open_data_channel_type',
       'park_facility_name', 'park_borough', 'latitude', 'longitude',
       'location', 'location_type'],
      dtype='object')  and data shape= (1000, 31)


### 311 Call Data Preparation 

### Filter out rows that have no associated location

In [2]:
# Remove rows where latitude or longitude are "NaN" (not a number)
def isNaN(num):
    return num != num
df = df[isNaN(df['latitude']) == False]
df = df[isNaN(df['longitude']) == False]
#Note: It is important to reset the index for a dataframe after filtering out rows or problems will occur using iteration later
df = df.reset_index(drop=True)
#df.head(1000)

### Remove columns that are not needed from 311 data for our project

In [3]:
# Make a smaller dataframe to keep just data we are interested in for our project
df = df.loc[:,['unique_key','created_date','open_data_channel_type','location_type','resolution_description',
                     'incident_zip','community_board','borough','latitude','longitude']]
#df.head(3)

### Modify resolution_column to become a numeric value (0.0 for non-actionable report, 0.5 for an indeterminate report, and 1.0 for an actionable report) to support numeric processing

In [4]:
# set-up list of resolution descriptions that indicate 311 call was actionable (i.e. police were able to do something 
# based on the 311 "Social Distancing" call. Note: descriptions provided by looking at complete set of unique descriptions
actionable_description = ['The Police Department responded to the complaint and took action to fix the condition.',
                         'The Police Department responded to the complaint and a report was prepared.']
follow_up_description = ['The Police Department reviewed your complaint and provided additional information below.',
                        'Your complaint has been received by the Police Department and additional information will be availab']
def actionable_call(s):
    if s['resolution_description'] in actionable_description:
        # 311 call resulted in police either fixing the situation or filing a complaint report
        return 1.0
    elif s['resolution_description'] in follow_up_description:
        # 311 Call resulted in follow-on information, but no action with target(s) of call
        return 0.5
    else:
        # No confirmation and/or action happened as a result of the 311 "social distanceing call"
        return 0.0
df['actionable_call']=df.apply(actionable_call,axis=1)
#df.head(3)

### Modify "open data channel type" column to become a numeric value (0.0 for phone call, 1.0 for on-line)

In [5]:
# convert call source to 0 if phone call and 1 if online
def check_online_call(s):
    if s['open_data_channel_type'] == "PHONE":
        # 311 call was a voice phone call
        return 0.0
    else:
        # "social distanceing call" was made online
        return 1.0
df['online_call']=df.apply(check_online_call,axis=1)
#df.head(3)

### Modify date/time column to become a time of date value to support numeric processing

In [6]:
from datetime import datetime as dtl
# from dateutil.parser import parse as pdt
# convert call datetime string to a float representing the part of the day the call was made
# 0.0 = night, 0.25 = morning, 0.5 = afternoon, 0.75 = evening
def set_time_of_call(s):
    date_time_str = s['created_date']
    time_idx = date_time_str.find('T')
    hour_str = date_time_str[time_idx+1:(time_idx+3)]
    hour = float(hour_str)
    if hour < 6.0:
        tod= 0.0
    elif hour < 13:
        tod = 0.25
    elif hour < 19:
        tod = 0.5
    else:
        tod = 0.75
    #print("The hour is: ",hour_str," and time of day slot is: ",tod)
    return tod
df['part_of_day'] = df.apply(set_time_of_call,axis=1)
df.head(3)

Unnamed: 0,unique_key,created_date,open_data_channel_type,location_type,resolution_description,incident_zip,community_board,borough,latitude,longitude,actionable_call,online_call,part_of_day
0,45958936,2020-04-07T18:51:01.000,PHONE,,The Police Department responded to the complai...,11226,14 BROOKLYN,BROOKLYN,40.63821942108572,-73.95581502290973,1.0,0.0,0.5
1,45961849,2020-04-07T20:35:21.000,ONLINE,Residential Building/House,The Police Department responded to the complai...,11223,15 BROOKLYN,BROOKLYN,40.60276221644951,-73.96370711005132,0.0,1.0,0.75
2,45958982,2020-04-07T18:51:37.000,ONLINE,Park/Playground,The Police Department responded to the complai...,10016,06 MANHATTAN,MANHATTAN,40.742232667442735,-73.97465944083966,0.0,1.0,0.5


### Convert community_board to float for later correlation processing

In [7]:
community_board_array = ['07 MANHATTAN', '12 BROOKLYN', '28 BRONX', '05 MANHATTAN',
       '02 BROOKLYN', '03 MANHATTAN', '09 QUEENS', '01 BRONX', '02 BRONX',
       '04 BRONX', '11 MANHATTAN', '01 BROOKLYN', '06 BROOKLYN',
       '11 QUEENS', '04 BROOKLYN', '11 BROOKLYN', '08 BRONX',
       '15 BROOKLYN', '10 MANHATTAN', '16 BROOKLYN', '12 MANHATTAN',
       '12 BRONX', '04 MANHATTAN', '05 BROOKLYN', '82 QUEENS',
       '02 QUEENS', '01 STATEN ISLAND', '02 STATEN ISLAND', '07 QUEENS',
       '07 BRONX', '05 QUEENS', '03 BROOKLYN', '03 QUEENS', '08 QUEENS',
       '12 QUEENS', '10 BROOKLYN', '03 STATEN ISLAND', '14 BROOKLYN',
       '18 BROOKLYN', '09 BRONX', '07 BROOKLYN', '09 MANHATTAN',
       '02 MANHATTAN', '06 BRONX', '13 QUEENS', '11 BRONX', '14 QUEENS',
       '06 QUEENS', '08 BROOKLYN', '05 BRONX', '03 BRONX', '08 MANHATTAN',
       '09 BROOKLYN', '06 MANHATTAN', '01 QUEENS', '64 MANHATTAN',
       '13 BROOKLYN', '10 BRONX', '04 QUEENS', '55 BROOKLYN', '80 QUEENS',
       '01 MANHATTAN', '17 BROOKLYN', '83 QUEENS', '10 QUEENS',
       '81 QUEENS', '56 BROOKLYN']
community_board_values = [0.0,0.015,0.030,0.045,0.060,0.75,0.09,0.105,0.120,0.135,
                         0.150,0.165,0.180,0.195,0.210,0.225,0.240,0.255,0.270,0.285,
                         0.30,0.315,0.330,0.345,0.360,0.375,0.39,0.405,0.420,0.435,
                         0.450,0.465,0.480,0.495,0.510,0.525,0.540,0.555,0.570,0.585,
                         0.60,0.615,0.630,0.645,0.660,0.675,0.69,0.705,0.720,0.735,
                         0.750,0.765,0.780,0.795,0.810,0.825,0.840,0.855,0.870,0.885,                         
                         0.90,0.915,0.930,0.945,0.960,0.975,0.99]
names_and_values = zip(community_board_array,community_board_values)
community_board_dict = {}
for aCName, aCValue in names_and_values:
    community_board_dict[aCName] = aCValue

In [8]:
def convert_board_to_float(s):
    rvalue = community_board_dict[s["community_board"]]
    return rvalue
df["community_value"] = df.apply(convert_board_to_float,axis=1)

## Update dataframe to include Foursquare API data of counts of Venues in proximity to latest 311 report 

### Define Foursquare Credentials

In [9]:
CLIENT_ID = '' # My Foursquare ID
CLIENT_SECRET = '' # My Foursquare Secret
VERSION = '20180604'
LIMIT = 100
#radius = 80 # radius in meters, 80 meters is about one NYC block
radius = 160 # radius in meters, 80 meters is about one NYC block

### Load Needed Python Libraries

In [10]:
# Install geopy and folium if they haven't been installed
#!conda install geopy -c conda-forge
#!conda install folium -c conda-forge
# import pandas json (data), folium (mapping), and geopy (geolocation) libraries 
from pandas.io.json import json_normalize
import folium
from geopy.geocoders import Nominatim 

### Create Functions to Make it Easier to Process Foursquare API Data

In [11]:
# function that calls the Foursquare API to get venues for a given latitude and longitued
def get_foursquare_venues(latitude,longitude,fs_radius=160,fs_limit=100):
    # fs_limit is maximum number of venues that can be returned
    # 100 was chosen as default fs_limit as that is a compromise between a large number of venues that still throttles API usage
    # fs_radius is radius in meters away from latitude/longitude provides for which venues are provided
    # 80 was chosen for the default fs_radius as 80.4672 is the number of meters in a NYC block
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
       CLIENT_ID, 
       CLIENT_SECRET, 
       VERSION, 
       latitude, 
       longitude, 
       fs_radius, 
       fs_limit)
    #print("URL=",url)
    results = requests.get(url).json()
    #print("results are: ",results)
    venues = results['response']['groups'][0]['items']
    nearby_venues = pd.json_normalize(venues)
    return nearby_venues

In [12]:
# function to return foursquare venues for a pandas dataframe that has latitudes and longitudes
def get_venues(s):
    latitude = s['latitude']
    longitude = s['longitude']
    if ((latitude == "nan") or (longitude == "nan")):
        print("Row found where latitude=",latitude," and longitude=",longitude)
        nearby_venues = "Error: latitude or longitude is nan"
    else:
        nearby_venues = get_foursquare_venues(latitude,longitude,radius,LIMIT)
    return nearby_venues

In [13]:
#Note: this applies Foursquare venues to 311-calls dataframe and can take a minute or more to run
df['venues']=df.apply(get_venues,axis=1)

### Make Foursquare API Call to Get All Venue Categories

In [14]:
#API call to Foursquare to get all venue categories
url = 'https://api.foursquare.com/v2/venues/categories'

params = dict(
client_id='',
client_secret='',
v='20180323'
)
resp = requests.get(url=url, params=params)
#code test functionality
#response_meta_df = pd.read_json(resp.text, orient='records')
#top_of_response_df = response_meta_df["response"]

### Function to Build a Python Dictionary from Reference "Categories"

In [15]:
# Create a python dictionary: key=unique row ID, value = array consisting of row name and parent key 
def build_categories_dict(aDict,aDF,parent_key='top'):
    for index,row in aDF.iterrows():
        row_key=row['id']
        name=row['name']
        categories=row['categories']
        #print("parent_key=",parent_key,", row id=",row_key,", name = ",name)
        aDict[row_key] = [name,parent_key]
        if len(categories) >0:
            new_df = pd.DataFrame.from_records(categories)
            build_categories_dict(aDict,new_df,row_key)
    return aDict

### Load Foursquare Reference Dictionary

In [16]:
# Build a foursquare key/[name,parent_key] dictionary for use in rolling-up venue IDs
response_meta_df = pd.read_json(resp.text, orient='records')
top_of_response_df = response_meta_df["response"]
# Note: the 2nd row of the URL response data begins the "categories" hierarchy
categories_df =pd.DataFrame.from_records(top_of_response_df[2])
categories_df.head(10)
venue_dictionary = {'venue_key': ['venue_name','parent_key']}
venue_dictionary = build_categories_dict(venue_dictionary,categories_df) 

#Test call for the newly loaded reference dictionary
#print("Name of venue is",venue_dictionary['4bf58dd8d48988d129941735'][0])

### Function to Get the Highest Level Category Name for a Foursquare Venue  

In [17]:
# There are 10 top level categories in the hierarchy of Foursquare venue names
# Given a Foursquare Venue key this function will return the top level parent category name
def get_toplevel_category_name(rowKey,aVenueDictionary=venue_dictionary):
    entry_list = aVenueDictionary[rowKey]
    name = entry_list[0]
    parent_key = entry_list[1]
    if parent_key == "top":
        category_name=name
    else:
        category_name = get_toplevel_category_name(parent_key)
    return category_name

#Test call for retrieving top level category for a venue
#print("Top level Category for ",venue_dictionary['4bf58dd8d48988d1e9941735'][0]," is ",get_toplevel_category_name("4bf58dd8d48988d1e9941735"))    

## Build data frame of venues associated with 311-calls

In [18]:
#initialize an empty venues data frame
venues_df = pd.DataFrame(data=None, index=None, columns={'callId','venueId','name','category','groupName','latitude','longitude'},
                     dtype=None, copy=False)
# function to build a pandas data frame of venue locations
def build_venues_df(aCalls_df,venues_df):
    for index,row in aCalls_df.iterrows():
        call_id = row['unique_key']
        row_venue_df = row['venues']
        for v_index,v_row in row_venue_df.iterrows():
            #print(v_row)
            venue_key=v_row['venue.id']
            vc = v_row['venue.categories']
            venue_category_key = vc[0]['id']
            venue_category=get_toplevel_category_name(venue_category_key)
            venue_name=v_row['venue.name']
            venue_latitude=v_row['venue.location.lat']
            venue_longitude=v_row['venue.location.lng']
            #print("call_id=",call_id," venue_id=",venue_key," venue_category=",venue_category," venue name=",venue_name)
            #create a new venues_df row
            new_row = {'callId':call_id,'venueId':venue_key,'name':venue_name,'category':venue_category_key,
                       'groupName':venue_category,'latitude':venue_latitude,'longitude':venue_longitude}
            #append the row to the venues_df
            venues_df = venues_df.append(new_row,ignore_index=True)
    return venues_df

#NOTE: this call will take a few seconds to complete
venues_df=build_venues_df(df,venues_df)

#function test 
venues_df.head(3)

Unnamed: 0,callId,latitude,venueId,groupName,longitude,name,category
0,45961849,40.602509,4edfc25ee300bf196b8ef2b1,Shop & Service,-73.963721,Avenue S Supermarket,4bf58dd8d48988d118951735
1,45958982,40.741909,4ff84741e4b0b8fdaa7e9aad,Shop & Service,-73.974707,NYU Langone Hospitals Center Pharmacy,4bf58dd8d48988d10f951735
2,45958982,40.742177,4d9c8fe948b6224b174a109f,College & University,-73.973461,NYU Medical Center Student Cafeteria,4bf58dd8d48988d1a1941735


## Join 311 Call Data to Foursquare data
### Function to count the number of toplevel foursquare venue location categories for each 311 call

In [19]:
# function to initialize an NxN array with zeroes
def init_array(x,y):
    counts_array = []
    print(x,y)
    for i in range(x+1):
        anArray = []
        for j in range(y):
            anArray.append(0)
        counts_array.append(anArray)
    return counts_array

# function to update the counts of top level foursquare venue categories from 311 social distancing reports 
def updateGroupCounts(aCallsDataFrame,aVenuesDataFrame,counts_array):
    for index,row in aCallsDataFrame.iterrows():
        call_id = row['unique_key']
        #print("*",index)
        food_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Food")]).shape[0]
        counts_array[index][0]=food_count
        outdoor_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Outdoors & Recreation")]).shape[0]
        counts_array[index][1]=outdoor_count
        shop_service_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Shop & Service")]).shape[0]
        counts_array[index][2]=shop_service_count
        nightlife_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Nightlife Spot")]).shape[0]
        counts_array[index][3]=nightlife_count
        travel_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Travel & Transport")]).shape[0]
        counts_array[index][4]=travel_count
        arts_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Arts & Entertainment")]).shape[0]
        counts_array[index][5]=arts_count
        residence_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Residence")]).shape[0]
        counts_array[index][6]=residence_count
        professional_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Professional & Other Places")]).shape[0]
        counts_array[index][7]=professional_count        
        event_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "Event")]).shape[0]
        counts_array[index][8]=event_count
        college_count = (aVenuesDataFrame[(aVenuesDataFrame.callId == call_id) & (aVenuesDataFrame.groupName == "College & University")]).shape[0]
        counts_array[index][9]=college_count
        TOTAL_count = food_count + outdoor_count + shop_service_count + nightlife_count + travel_count + arts_count 
        + residence_count + professional_count + college_count
        counts_array[index][10]=TOTAL_count
        #print(counts_array[index])
        #print(index)
    return counts_array

#Initialize an empty array with N rows and 10 columns, N being the number of rows in a given DataFrame
my_array = init_array((df.shape[0]),11)
#Fill in the array with the counts of different venue categories for venues associated with each 311 call
counts_array=updateGroupCounts(df,venues_df,my_array)
#Build a dataframe from the array so that the array columns can be used as new columns for the 311 call dataframe
counts_df = pd.DataFrame.from_dict(counts_array)
#Associate each category count column with the 311 call dataframe
df["cat_counts_food"] = counts_df[0]
df["cat_counts_outdoor"] = counts_df[1]
df["cat_counts_shop"] = counts_df[2]
df["cat_counts_nightlife"] = counts_df[3]
df["cat_counts_travel"] = counts_df[4]
df["cat_counts_arts"] = counts_df[5]
df["cat_counts_residence"] = counts_df[6]
df["cat_counts_professional"] = counts_df[7]
df["cat_counts_event"] = counts_df[8]
df["cat_counts_college"] = counts_df[9]
df["TOTAL_count"] = counts_df[10]
#function test
df.head(1000)

985 11


Unnamed: 0,unique_key,created_date,open_data_channel_type,location_type,resolution_description,incident_zip,community_board,borough,latitude,longitude,...,cat_counts_outdoor,cat_counts_shop,cat_counts_nightlife,cat_counts_travel,cat_counts_arts,cat_counts_residence,cat_counts_professional,cat_counts_event,cat_counts_college,TOTAL_count
0,45958936,2020-04-07T18:51:01.000,PHONE,,The Police Department responded to the complai...,11226,14 BROOKLYN,BROOKLYN,40.63821942108572,-73.95581502290973,...,0,0,0,0,0,0,0,0,0,0
1,45961849,2020-04-07T20:35:21.000,ONLINE,Residential Building/House,The Police Department responded to the complai...,11223,15 BROOKLYN,BROOKLYN,40.60276221644951,-73.96370711005132,...,0,1,0,0,0,0,0,0,0,1
2,45958982,2020-04-07T18:51:37.000,ONLINE,Park/Playground,The Police Department responded to the complai...,10016,06 MANHATTAN,MANHATTAN,40.742232667442735,-73.97465944083966,...,1,1,0,1,0,0,0,0,1,4
3,45963489,2020-04-08T01:03:58.000,PHONE,,,11377,02 QUEENS,QUEENS,40.745591755207094,-73.91219856976753,...,0,3,1,0,0,0,0,0,0,15
4,45958166,2020-04-07T21:51:59.000,PHONE,Street/Sidewalk,The Police Department responded to the complai...,10469,12 BRONX,BRONX,40.87709100647922,-73.85254795151418,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
980,45954711,2020-04-06T16:30:51.000,PHONE,,The Police Department responded to the complai...,10454,01 BRONX,BRONX,40.809107143867614,-73.92288685607681,...,0,1,0,1,0,0,0,0,0,5
981,45953615,2020-04-06T22:37:28.000,ONLINE,Residential Building/House,The Police Department responded to the complai...,10029,11 MANHATTAN,MANHATTAN,40.797540724287586,-73.94693997826751,...,0,1,0,0,1,0,0,0,0,4
982,45954698,2020-04-06T20:01:23.000,PHONE,,The Police Department responded to the complai...,10467,07 BRONX,BRONX,40.87168232993852,-73.87951910905315,...,2,1,0,0,0,0,0,0,0,4
983,45953632,2020-04-06T22:04:06.000,ONLINE,Store/Commercial,The Police Department responded to the complai...,11201,02 BROOKLYN,BROOKLYN,40.69731336572483,-73.9822966403807,...,3,0,0,1,0,0,0,0,0,6


## Data Exploration
### Get basic data statistics for our "311 Calls" Dataframe

In [20]:
df.describe()

Unnamed: 0,actionable_call,online_call,part_of_day,community_value,cat_counts_food,cat_counts_outdoor,cat_counts_shop,cat_counts_nightlife,cat_counts_travel,cat_counts_arts,cat_counts_residence,cat_counts_professional,cat_counts_event,cat_counts_college,TOTAL_count
count,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0,985.0
mean,0.377665,0.507614,0.497208,0.463081,4.194924,0.738071,1.901523,0.619289,0.285279,0.240609,0.013198,0.057868,0.0,0.00203,7.979695
std,0.481896,0.500196,0.190087,0.265406,5.603659,1.26009,2.481117,1.306734,0.639833,0.773005,0.11418,0.233612,0.0,0.045038,8.906903
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.5,0.225,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
50%,0.0,1.0,0.5,0.45,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
75%,1.0,1.0,0.75,0.72,6.0,1.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0
max,1.0,1.0,0.75,0.99,51.0,11.0,17.0,10.0,6.0,10.0,1.0,1.0,0.0,1.0,67.0


In [21]:
venues_df.describe()

Unnamed: 0,latitude,longitude
count,7932.0,7932.0
mean,40.736663,-73.943795
std,0.07212,0.062351
min,40.510751,-74.241883
25%,40.695111,-73.982855
50%,40.740981,-73.952376
75%,40.778487,-73.914315
max,40.905467,-73.710851


In [22]:
len(venue_dictionary)

944

In [23]:
#df.corr(method ='pearson')
df.corr(method ='kendall')

Unnamed: 0,actionable_call,online_call,part_of_day,community_value,cat_counts_food,cat_counts_outdoor,cat_counts_shop,cat_counts_nightlife,cat_counts_travel,cat_counts_arts,cat_counts_residence,cat_counts_professional,cat_counts_event,cat_counts_college,TOTAL_count
actionable_call,1.0,-0.021291,-0.034183,0.084142,-0.025704,0.050261,-0.015046,-0.0189,0.000642,0.053788,0.037989,-0.050298,,-0.035259,0.017273
online_call,-0.021291,1.0,0.030088,0.007511,-0.007143,-0.010826,-0.033318,0.000891,0.048998,0.043911,0.02493,0.052756,,-0.000687,-0.007872
part_of_day,-0.034183,0.030088,1.0,-0.017083,-0.038051,-0.050577,-0.091869,-0.000168,-0.067648,-0.038002,-0.006218,0.031204,,-0.001634,-0.077038
community_value,0.084142,0.007511,-0.017083,1.0,0.011555,0.068685,0.039602,0.032298,0.010886,-0.03051,0.001641,0.009825,,0.044509,0.038437
cat_counts_food,-0.025704,-0.007143,-0.038051,0.011555,1.0,0.100289,0.521981,0.437977,0.094537,0.189779,0.015237,0.116215,,0.012726,0.780341
cat_counts_outdoor,0.050261,-0.010826,-0.050577,0.068685,0.100289,1.0,0.115514,0.234351,0.070776,0.17835,0.107666,0.085043,,0.049691,0.286693
cat_counts_shop,-0.015046,-0.033318,-0.091869,0.039602,0.521981,0.115514,1.0,0.338343,0.088848,0.150791,0.060531,0.101778,,0.032913,0.64219
cat_counts_nightlife,-0.0189,0.000891,-0.000168,0.032298,0.437977,0.234351,0.338343,1.0,0.078679,0.3035,0.08624,0.130855,,0.013132,0.501736
cat_counts_travel,0.000642,0.048998,-0.067648,0.010886,0.094537,0.070776,0.088848,0.078679,1.0,0.074305,-0.017929,-0.004334,,0.028224,0.197198
cat_counts_arts,0.053788,0.043911,-0.038002,-0.03051,0.189779,0.17835,0.150791,0.3035,0.074305,1.0,0.057124,0.167615,,-0.018658,0.273522


In [27]:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])

ModuleNotFoundError: No module named 'matplotlib.colorbar'

In [26]:
#!conda install -c conda-forge matplotlib
!pip install matplotlib
import matplotlib.pyplot as plt
x=df["actionable_call"]
y=df["cat_counts_food"]
plt.plot(x,y,'o',color='black')
plt.ylabel("Number of food locations")
plt.xlabel("actionalable call")
plt.show()



ModuleNotFoundError: No module named 'matplotlib.colorbar'

In [None]:
# Create a figure instance
fig = plt.figure(1, figsize=(9, 6))

# Create an axes instance
ax = fig.add_subplot(111)
controls = ["food","shopping", "night life","outdoors","travel","arts","residence","professional"]
ax.set_ylabel("Numbers of Venues in Radius per 311 Report")
# Create the boxplot
bp = ax.boxplot([df["cat_counts_food"],df["cat_counts_shop"],df["cat_counts_nightlife"],df["cat_counts_outdoor"],
                 df["cat_counts_travel"],df["cat_counts_arts"],df["cat_counts_residence"],df["cat_counts_professional"]],
                patch_artist=True,
                widths=0.5,
                labels=controls)

In [None]:
latitude = df["latitude"][0]
longitude = df["longitude"][0]
print("Lat=",latitude," and longitude =",longitude)
nearby_venues = get_foursquare_venues(latitude,longitude)
print(nearby_venues)

In [None]:
a = range(1,100)
print(a)
venues_df['groupName'].unique()s

In [None]:
#food_count = (df[(venues_df['callId']=="45947452") and (df[venues_df['groupName']=="Food"])]).shape[0]
#food_df = venues_df[(venues_df.callId='45947452')&(venues_df.groupName=="Outdoors & Recreation")]
dfo = venues_df
new_df = dfo[(dfo.callId == "45947452") & (dfo.groupName == "Outdoors & Recreation")]
print(new_df.shape[0])

In [None]:
venues_df["groupName"].unique()

In [None]:

#print(counts_array)
counts_array[0][5]=9
print(counts_array)

## Data Visualizations
### Map of 311-Social Distancing Reports (red/green dots) to Foursquare Venues

In [None]:
neighborhood_latitude=40.75688
neighborhood_longitude=-73.9828
venues_map = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=15)
# add a red circle marker to represent each 311 call
#folium.features.CircleMarker(
for index,row in df.iterrows():
    circle_color = 'red'
    circle_radius = 10
    if (row["actionable_call"]>0.01):
        circle_color = 'green'
        circle_raidus = 15
    folium.vector_layers.CircleMarker(
        [row["latitude"], row["longitude"]],
        radius=circle_radius,
        color=circle_color,
        popup=('311-Call in ' + row["community_board"]),
        fill = True,
        fill_color = circle_color,
        fill_opacity = 0.6
    ).add_to(venues_map)
# add all venues as blue circle markers
#for lat, lng, label in zip(nearby_venues.lat, nearby_venues.lng, nearby_venues.categories):
for index,row in venues_df.iterrows():
    #    folium.features.CircleMarker(
    lat = row["latitude"]
    lng = row["longitude"]
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=2,
        color='blue',
        popup=row["name"],
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

### Display map (Social Distancing Reports with Social Venues)
Foursquare Venues are small blue circles, non-actionable social-distancing reports are red, actionable reports are green

In [None]:
venues_map

In [None]:
neighborhood_latitude=40.75688
neighborhood_longitude=-73.9828
venues_map = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=15)
# add a red circle marker to represent each 311 call
#folium.features.CircleMarker(
for index,row in df.iterrows():
    circle_color = 'red'
    circle_radius = 5
    if (row["actionable_call"]>0.1):
        if (row["actionable_call"]>0.5):
            circle_color = 'green'
            circle_radius = 10
        else:
            circle_color='yellow'
            circle_radus = 7
    folium.vector_layers.CircleMarker(
        [row["latitude"], row["longitude"]],
        radius=circle_radius,
        color=circle_color,
        popup=('311-Call in ' + row["community_board"]),
        fill = True,
        fill_color = circle_color,
        fill_opacity = 0.6
    ).add_to(venues_map)

### Display map (Social Distancing Reports no Social Venues)
Foursquare Venues are small blue circles, non-actionable social-distancing reports are red, actionable reports are yellow 
and green

In [None]:
venues_map

## Use Density-Based (DBSCAN) Clustering Algorithm to Potentially Identify Clusters of Non-Actionable 311 Social Distancing Reports 

In [None]:
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Get location data from our 311-Report DataFrame
ndf = df.loc[:,['longitude','latitude','actionable_call']]
X=ndf.values
X = StandardScaler().fit_transform(X)

# MODELING
epsilon = 0.3
minimumSamples = 5
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(X)
labels = db.labels_
labels

# DISTINGUISH OUTLIERS
# First, create an array of booleans using the labels from db.
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
core_samples_mask
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_clusters_
# Remove repetition in labels by turning it into a set.
unique_labels = set(labels)
unique_labels

# DATA VISUALIZATION
# Create colors for the clusters.
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
# Plot the points with colors
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = 'k'

    class_member_mask = (labels == k)

    # Plot the datapoints that are clustered
    xy = X[class_member_mask & core_samples_mask]
    plt.scatter(xy[:, 0], xy[:, 1],s=50, c=[col], marker=u'o', alpha=0.5)

    # Plot the outliers
    xy = X[class_member_mask & ~core_samples_mask]
    plt.scatter(xy[:, 0], xy[:, 1],s=50, c=[col], marker=u'o', alpha=0.5)


### Visualization of data using basemap toolkit

In [None]:
!pip install --upgrade matplotlib
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from mpl_tookits.basemap import Basemap
from pylab import rcParams
rcParams['figure.figsize'] = (14,10)

# Get location data from our 311-Report DataFrame
ndf = df.loc[:,['latitude','longitude','actionable_call']]
X=ndf.values
X = StandardScaler().fit_transform(X)

#Establish base map 
llon=-140
ulon=-50
llat=40
ulat=65

my_map = Basemap(projection='merc',
            resolution = 'l', area_thresh = 1000.0,
            llcrnrlon=llon, llcrnrlat=llat, #min longitude (llcrnrlon) and latitude (llcrnrlat)
            urcrnrlon=ulon, urcrnrlat=ulat) #max longitude (urcrnrlon) and latitude (urcrnrlat)

my_map.drawcoastlines()
my_map.drawcountries()
# my_map.drawmapboundary()
my_map.fillcontinents(color = 'white', alpha = 0.3)
my_map.shadedrelief()

xs,ys = my_map(np.asarray(ndf.longitude), np.asarray(ndf.latitude))
pdf['xm']= xs.tolist()
pdf['ym'] =ys.tolist()

#Visualization1
for index,row in pdf.iterrows():
#   x,y = my_map(row.Long, row.Lat)
   my_map.plot(row.xm, row.ym,markerfacecolor =([1,0,0]),  marker='o', markersize= 5, alpha = 0.75)
#plt.text(x,y,stn)
plt.show()



In [None]:
# MODELING
epsilon = 0.3
minimumSamples = 7
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(X)
labels = db.labels_
labels

# DISTINGUISH OUTLIERS
# First, create an array of booleans using the labels from db.
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
core_samples_mask
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_clusters_
# Remove repetition in labels by turning it into a set.
unique_labels = set(labels)
unique_labels

# DATA VISUALIZATION
# Create colors for the clusters.
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
# Plot the points with colors
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = 'k'

    class_member_mask = (labels == k)

    # Plot the datapoints that are clustered
    xy = X[class_member_mask & core_samples_mask]
    plt.scatter(xy[:, 0], xy[:, 1],s=50, c=[col], marker=u'o', alpha=0.5)

    # Plot the outliers
    xy = X[class_member_mask & ~core_samples_mask]
    plt.scatter(xy[:, 0], xy[:, 1],s=50, c=[col], marker=u'o', alpha=0.5)


In [None]:
df_venues.shape

In [None]:
df.shape