# Violent neighbourhoods in Toronto: what do they have in common?
An analysis of Toronto neighbourhoods in relation to their crime rates and what each community has to offer

## Introduction

A little research shows that Toronto is not among the most dangerous cities in the world, but that does not mean Torontonians are exempt from crime. [Data from Toronto Police Service](https://data.torontopolice.on.ca/pages/major-crime-indicators) shows that in 2019, despite a 19% decrease in the murder rate, all other crime categories increased by an average of 10% compared with 2018.

But what characteristics do these violent neighbourhoods have in common?

This report attempts to relate crime occurrence in Toronto to urban and community infrastructure and facilities, such as parks, restaurants and other venues. The results might help urban planners and the police.

## Data
The first task is to list the most violent Toronto neighbourhoods by analysing their crime data. For that I will use the [Neighbourhood Crime Rates](https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-/data) data, obtained from the Toronto Police Service website. The data consists of 2014-2019 crime data by neighbourhood. Counts are available for Assault, Auto Theft, Break and Enter, Robbery, Theft Over and Homicide. The data also includes four year averages and crime rates per 100,000 people by neighbourhood, based on the 2016 Census population.
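
To make the "rate per 100,000 people" figure concrete, here is a minimal sketch of how such a rate is derived from a count and a population. The neighbourhood names, counts and populations are hypothetical values chosen for illustration; they are not taken from the Toronto Police Service dataset, which already ships the rates precomputed.

```python
import pandas as pd

# Hypothetical counts and populations, for illustration only
# (not values from the Toronto Police Service dataset).
toy = pd.DataFrame(
    {
        "Neighbourhood": ["A", "B"],
        "Assault_count": [120, 45],
        "Population": [30000, 9000],
    }
)

# Rate per 100,000 residents = count / population * 100,000
toy["Assault_rate"] = toy["Assault_count"] / toy["Population"] * 100_000

print(toy)
```

In this toy case neighbourhood B has fewer assaults than A but a higher rate, which is why rates rather than raw counts are used to compare neighbourhoods of different sizes.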

Then I will use the [Foursquare API](https://developer.foursquare.com/) to retrieve all the venues in each neighbourhood. 
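
As a rough sketch of the venue retrieval step, the snippet below builds the query parameters for a venue search and flattens a response into rows. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its nested `response -> groups -> items -> venue` layout; Foursquare has revised its API since, so the endpoint, parameter names and field names should be checked against the current documentation. The helper names `explore_params` and `parse_venues`, the placeholder credentials and the sample payload are all hypothetical, and no network request is made here.

```python
# Legacy v2 endpoint (an assumption; verify against the current Foursquare docs)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lng, client_id, client_secret,
                   radius=500, limit=100, version="20200401"):
    """Build the query parameters for one neighbourhood centre point."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, format YYYYMMDD
        "ll": f"{lat},{lng}",    # latitude,longitude of the search centre
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues returned
    }


def parse_venues(payload, neighbourhood):
    """Flatten a venues/explore style payload into one dict per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            location = venue.get("location", {})
            rows.append({
                "Neighbourhood": neighbourhood,
                "Venue": venue.get("name"),
                "Category": categories[0].get("name"),
                "Latitude": location.get("lat"),
                "Longitude": location.get("lng"),
            })
    return rows


# Hand-written stand-in for an API response (hypothetical venues)
sample_payload = {
    "response": {
        "groups": [
            {"items": [
                {"venue": {"name": "Example Park",
                           "categories": [{"name": "Park"}],
                           "location": {"lat": 43.688, "lng": -79.398}}},
                {"venue": {"name": "Example Diner",
                           "categories": [{"name": "Diner"}],
                           "location": {"lat": 43.689, "lng": -79.397}}},
            ]}
        ]
    }
}

params = explore_params(43.688, -79.398, "CLIENT_ID", "CLIENT_SECRET")
venue_rows = parse_venues(sample_payload, "Yonge-St.Clair")
print(venue_rows)
```

In a real run the parameters would be sent with an HTTP client to `EXPLORE_URL`, and the returned JSON passed to `parse_venues` once per neighbourhood.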

## Methodology

As this report tries to explain which neighbourhood features are linked to crime indices, I will use the average data instead of breaking it down by year, for the sake of simplicity. I will plot these data on a map to give an overview of the crime locations.

Then I will extract venue information for each of these neighbourhoods and also plot it on the map.

For the data analysis and machine learning tasks, I will compare the neighbourhoods using the venue and crime data. For that I will follow these steps:

0. Account the total cr
1. Classify the neighbourhoods by the types of crime that happen there.  
    1.1. Show on the map.  
    1.2. Run a clustering algorithm to try to separate the types of crime.  

2. Classify the neighbourhoods by the types of venues that occur there.  
    2.1. Run a clustering algorithm to try to separate the types of venues by their occurrence.  

3. Statistics.  
    3.1. Multivariate analysis: Try to find a relationship between the types of venues and the types of crime  

4. Machine Learning.  
    4.1. Build a classification (Bayesian) model to show the probability of being a victim of a crime in each neighbourhood.  

## Analysis

In [9]:
# Import libraries
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import geopandas as gpd
from bokeh.io import reset_output, output_notebook, show
from bokeh.plotting import figure, output_file
from bokeh.palettes import brewer
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
import json

### 0. Obtain neighbourhood crime data

In [10]:
# Read crime data (Spreadsheet) from the csv file obtained at:
# https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-/data
crime = pd.read_csv('Neighbourhood_Crime_Rates_(Boundary_File)_.csv', index_col='OBJECTID')

# Read districts geo data (Shapefile) into a geopandas dataframe
# https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-/data
t_geodata = gpd.read_file('Neighbourhood_Crime_Rates_(Boundary_File)_-shp/f412cc26-3247-4ac5-95e6-0f3d6fc509ad2020329-1-gs8ft1.c4okq.shx')

#### For the sake of simplicity I will be using the "Crime rate" data, which represents the number of crimes per 100,000 population in 2019.

In [11]:
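# Sum the per-category rate columns into one total crime rate per neighbourhood
# (the column keeps the name 'crime_rate_avg' because later cells refer to it)
crime['crime_rate_avg'] = crime.filter(regex='Rate').sum(axis=1).astype(int)

# Append the total crime rate to the geopandas dataframe, matching rows on OBJECTID
# rather than on position: 'crime' is indexed from 1, the geodataframe from 0
# (assumes the shapefile carries an OBJECTID attribute, as the CSV does)
t_geodata['crime_rate_avg'] = t_geodata['OBJECTID'].map(crime['crime_rate_avg'])

# Reproject the whole geodataframe to EPSG:4269 (NAD83 geographic coordinates)
t_geodata = t_geodata.to_crs(epsg=4269)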

### 0.1. Create map for all crimes

In [12]:
# Serialise the geodataframe to a GeoJSON string
# (to_json() already returns a string, so no json.loads/json.dumps round trip is needed)
json_data = t_geodata.to_json()

# Input GeoJSON source that contains features for plotting.
geosource = GeoJSONDataSource(geojson=json_data)

# Define hover tool (it is attached to the figure further down)
hover = HoverTool(tooltips=[('District','@Neighbourh'),
                            ('Crime Rate', '@crime_rate_avg')])

# Create figure
p = figure(title = 'Toronto: Number of crimes per 100,000 population in 2019 (All types of crimes)', 
           plot_height=600,
           plot_width=950,
           tools = 'save')
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

# Add patch renderer to figure. 
palette = brewer['Spectral'][11]
color_mapper = LinearColorMapper(palette=palette,
                                 low=t_geodata['crime_rate_avg'].min(),
                                 high=t_geodata['crime_rate_avg'].max())
p.patches('xs', 'ys', source=geosource,
          line_color='white',
          line_width=0.25,
          fill_alpha=1,
          fill_color={'field': 'crime_rate_avg', 'transform': color_mapper})

# Add color bar
color_bar = ColorBar(color_mapper=color_mapper, 
                     label_standoff=8, 
                     width=500, 
                     height=15,
                     border_line_color=None,
                     location=(0,0),
                     orientation='horizontal')

# Specify figure layout.
p.add_layout(color_bar, 'below')

# Add hover tool
p.add_tools(hover)

# Display figure in Jupyter Notebook.
reset_output()
output_notebook()

# Display figure.
show(p)

### 1. Classify the neighbourhoods by the types of crime that happen there.

In [13]:
crime_rate = pd.DataFrame(crime['Neighbourhood']).join(crime.filter(regex='Rate'))
crime_rate.columns = ['Neighbourhood', 
                      'Assault', 
                      'Auto Theft',
                      'Break and Enter', 
                      'Homicide', 
                      'Robbery',
                      'Theft Over']

In [14]:
def return_most_common_crimes(row, num_top_crimes):
    # Skip the first entry (the neighbourhood name) and rank the crime rates
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # Return the names of the crime types with the highest rates
    return row_categories_sorted.index.values[0:num_top_crimes]


num_top_crimes = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top crimes
columns = ['Neighbourhood']
for ind in range(num_top_crimes):
    # ordinal suffix: 1st, 2nd, 3rd, then 4th, 5th, ...
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append(f'{ind+1}{suffix} Most Common Crime')

# create a new dataframe
neighborhoods_crimes_sorted = pd.DataFrame(columns=columns)
neighborhoods_crimes_sorted['Neighbourhood'] = crime_rate['Neighbourhood']

for ind in np.arange(crime_rate.shape[0]):
    neighborhoods_crimes_sorted.iloc[ind, 1:] = return_most_common_crimes(crime_rate.iloc[ind, :], num_top_crimes)

neighborhoods_crimes_sorted.head()

| OBJECTID | Neighbourhood | 1st Most Common Crime | 2nd Most Common Crime | 3rd Most Common Crime | 4th Most Common Crime | 5th Most Common Crime |
|---|---|---|---|---|---|---|
| 1 | Yonge-St.Clair | Assault | Break and Enter | Theft Over | Auto Theft | Robbery |
| 2 | York University Heights | Assault | Auto Theft | Break and Enter | Robbery | Theft Over |
| 3 | Lansing-Westgate | Assault | Break and Enter | Auto Theft | Theft Over | Robbery |
| 4 | Yorkdale-Glen Park | Assault | Break and Enter | Auto Theft | Robbery | Theft Over |
| 5 | Stonegate-Queensway | Assault | Break and Enter | Auto Theft | Robbery | Theft Over |
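
The row-by-row ranking can also be expressed without an explicit loop. The sketch below is one possible vectorised alternative using `numpy.argsort`; the function name `rank_crimes`, the `Rank N` column labels and the toy rates are hypothetical and only illustrate the idea on a small frame shaped like `crime_rate`.

```python
import numpy as np
import pandas as pd

# Toy rates with hypothetical values, shaped like the crime_rate dataframe
toy_rates = pd.DataFrame(
    {
        "Neighbourhood": ["A", "B"],
        "Assault": [500.0, 300.0],
        "Auto Theft": [100.0, 350.0],
        "Robbery": [200.0, 50.0],
    }
)


def rank_crimes(rates, top_n):
    """Return the top_n crime types per row, ordered from highest to lowest rate."""
    values = rates.drop(columns="Neighbourhood")
    # argsort on the negated values gives column positions from highest to lowest
    order = np.argsort(-values.to_numpy(), axis=1, kind="stable")[:, :top_n]
    ranked = pd.DataFrame(
        values.columns.to_numpy()[order],
        index=rates.index,
        columns=[f"Rank {i + 1}" for i in range(top_n)],
    )
    ranked.insert(0, "Neighbourhood", rates["Neighbourhood"])
    return ranked


toy_ranked = rank_crimes(toy_rates, top_n=2)
print(toy_ranked)
```

Because Assault ranks first in every neighbourhood shown in the table, the second and third positions are likely to be more informative when comparing neighbourhoods.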


### 1.1. Show on the map.

### 1.2. Run a clustering algorithm to try to separate the types of crime.
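
This step is not implemented yet. As a starting point, here is a minimal k-means sketch written in plain NumPy, run on standardised crime rates. The toy rate matrix is hypothetical, the choice of k=2 is arbitrary, and the deterministic farthest-first initialisation is a simplification chosen to keep the example reproducible; a library implementation would be the more usual choice for the real analysis.

```python
import numpy as np


def standardise(x):
    """Scale each column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def farthest_first(x, k):
    """Deterministic start: row 0, then repeatedly the row farthest from those chosen."""
    chosen = [0]
    for _ in range(k - 1):
        dist = np.linalg.norm(x[:, None, :] - x[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(dist.argmax()))
    return x[chosen]


def kmeans(x, k, n_iter=100):
    """Plain k-means: assign rows to the nearest centroid, then recompute centroids."""
    centroids = farthest_first(x, k)
    for _ in range(n_iter):
        # Distance of every row to every centroid, shape (n_rows, k)
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical rates per 100,000 (columns: Assault, Auto Theft, Robbery)
toy_rates = np.array([
    [900.0, 100.0, 150.0],
    [950.0, 120.0, 160.0],
    [880.0,  90.0, 140.0],
    [200.0, 400.0,  30.0],
    [220.0, 420.0,  35.0],
    [180.0, 380.0,  25.0],
])

labels, centroids = kmeans(standardise(toy_rates), k=2)
print(labels)
```

Standardising first keeps Assault, which has by far the largest rates, from dominating the distance computation.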

### 2. Classify the neighbourhoods by the types of venues that occur there.  

### 2.1. Run a clustering algorithm to try to separate the types of venues by their occurrence.   
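
This step is not implemented yet. Before venues can be clustered, each neighbourhood needs a numeric profile. One common way to build it, sketched below under the assumption that the venue data has one row per venue with `Neighbourhood` and `Category` columns (hypothetical names and toy values), is to one-hot encode the categories and average them per neighbourhood.

```python
import pandas as pd

# Hypothetical venue list: one row per venue
toy_venues = pd.DataFrame(
    {
        "Neighbourhood": ["A", "A", "A", "B", "B"],
        "Category": ["Park", "Restaurant", "Park", "Bar", "Restaurant"],
    }
)

# One indicator column per venue category
onehot = pd.get_dummies(toy_venues["Category"], dtype=float)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# Mean of the indicators = share of each category among a neighbourhood's venues
venue_freq = onehot.groupby("Neighbourhood").mean()
print(venue_freq)
```

Each row of `venue_freq` sums to 1, so neighbourhoods with many venues and neighbourhoods with few venues become comparable, and the resulting table can be passed to a clustering algorithm.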

### 3. Statistics.  

### 3.1. Multivariate analysis: Try to find a relationship between the types of venues and the types of crime  
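
This step is not implemented yet. A simple first look at the venue and crime relationship could be a cross-correlation table between venue shares and crime rates. The sketch below uses Pearson correlation via `DataFrame.corr`; the column names and values are hypothetical and were chosen so that the pattern is obvious.

```python
import pandas as pd

# Hypothetical per-neighbourhood table: venue shares next to crime rates
toy = pd.DataFrame(
    {
        "Bar": [0.1, 0.2, 0.3, 0.4],
        "Park": [0.4, 0.3, 0.2, 0.1],
        "Assault": [100.0, 200.0, 300.0, 400.0],
        "Robbery": [50.0, 40.0, 45.0, 55.0],
    },
    index=["A", "B", "C", "D"],
)

venue_cols = ["Bar", "Park"]
crime_cols = ["Assault", "Robbery"]

# Full correlation matrix, then keep only the venue-by-crime block
cross_corr = toy.corr().loc[venue_cols, crime_cols]
print(cross_corr)
```

A strong correlation in such a table would only indicate association, not that a venue type causes a crime type, and with roughly 140 neighbourhoods the estimates would need to be read with care.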

### 4. Machine Learning.  

### 4.1. Build a classification (Bayesian) model to show the probability of being a victim of a crime in each neighbourhood. 
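
This step is not implemented yet. One possible reading of "Bayesian classification" is a Gaussian naive Bayes model that predicts whether a neighbourhood falls in a high-crime class from its venue profile. The sketch below implements that in plain NumPy on hypothetical data. Note the assumption: it estimates the probability that a neighbourhood belongs to the high-crime class, which is not the same as an individual's probability of becoming a victim. The function names and the high/low labels are illustrative only.

```python
import numpy as np


def fit_gaussian_nb(x, y):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    # Small constant keeps every variance strictly positive
    variances = np.array([x[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_proba(x, model):
    """Posterior class probabilities, shape (n_samples, n_classes)."""
    classes, priors, means, variances = model
    diff = x[:, None, :] - means[None, :, :]
    # Gaussian log-likelihood, summed over features (naive independence assumption)
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    # Subtract the row maximum before exponentiating for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical venue shares (columns: Bar, Park) and labels (1 = high crime, 0 = low crime)
x_train = np.array([
    [0.40, 0.05], [0.35, 0.10], [0.45, 0.08],
    [0.05, 0.40], [0.10, 0.35], [0.08, 0.45],
])
y_train = np.array([1, 1, 1, 0, 0, 0])

model = fit_gaussian_nb(x_train, y_train)
proba = predict_proba(np.array([[0.38, 0.07]]), model)
print(proba)
```

The columns of `proba` follow the sorted class labels, so the second column holds the estimated probability of the high-crime class for the new neighbourhood.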