## SOFE3720 | FinalProject - Neighbourhoods in Toronto

## Table of Contents
* [Project Description](#description)

## Project Description <a name="description"></a>

Take a city such as Toronto and segment it into separate neighbourhoods using the geographical
coordinates of each neighbourhood. Then, combining location data from the Foursquare API with
artificial intelligence (clustering), group the neighbourhoods into clusters. You will learn the
skills and tools needed to use location data to explore a geographical location. You are free to
be as creative as you want: come up with a new way to leverage Foursquare location data to
explore neighbourhoods of your choice, or pose a problem that the Foursquare location data can
help solve. The main task is to determine the most common area in Toronto for at least two of
the following topics and find the correlation between them:

1. Crime rate (e.g. assault, break and enter, homicide, and robbery)
2. Population information (e.g. age, marital status, education, and income)
3. Income source and taxes
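
As a rough illustration of the correlation step, the sketch below computes Pearson correlations between a crime rate and two population measures using `DataFrame.corr`. The neighbourhood labels, column names, and all numbers are hypothetical placeholders invented for the example, not project data.

```python
import pandas as pd

# Hypothetical per-neighbourhood figures, invented purely for illustration
stats = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E'],
    'AssaultRate': [820.0, 410.0, 655.0, 300.0, 515.0],
    'MedianIncome': [31000, 58000, 40000, 72000, 47000],
    'Population': [21000, 15500, 18200, 12000, 16800],
})

# Pearson correlation between every pair of numeric columns
corr_matrix = stats.drop(columns='Neighbourhood').corr()

# Correlation between the two topics of interest
assault_vs_income = corr_matrix.loc['AssaultRate', 'MedianIncome']
print(corr_matrix.round(2))
print('Assault rate vs. median income:', round(assault_vs_income, 2))
```

A value near -1 or +1 suggests a strong linear relationship between the two topics, while a value near 0 suggests little linear relationship.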

### Importing Libraries

In [338]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np
import json

from geopy.geocoders import Nominatim

import requests

from pandas import json_normalize  # the pandas.io.json import path was deprecated in pandas 1.0

import matplotlib.cm as cm             
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

from bs4 import BeautifulSoup as bs
from IPython.display import display_html
import urllib

print("Libraries imported.")

Libraries imported.


### Extract Postal Codes

In [339]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_table_data = requests.get(url).text 

soup = bs(html_table_data, 'html5lib')
tb_rows = soup.find('table').tbody.find_all('tr')

# Collect rows in a list first; DataFrame.append was removed in pandas 2.0
records = []
for row in tb_rows:
    for cell in row.find_all('td'):
        if cell.span.text != 'Not assigned':
            # Split the cell text at '(' into borough and neighbourhood list
            span = cell.span.text.split('(')
            records.append({'PostalCode': cell.b.text,
                            'Borough': span[0],
                            'Neighbourhood': span[1][:-1]})

df = pd.DataFrame(records, columns=['PostalCode', 'Borough', 'Neighbourhood'])

df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

df = df.sort_values('PostalCode').reset_index(drop = True)
df.head(10)

| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | Kennedy Park / Ionview / East Birchmount Park |
| 7 | M1L | Scarborough | Golden Mile / Clairlea / Oakridge |
| 8 | M1M | Scarborough | Cliffside / Cliffcrest / Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff / Cliffside West |
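
The parsing step above relies on each table cell holding the borough followed by the neighbourhoods in parentheses. A minimal sketch of that split is shown below. The helper name `parse_cell` and the sample strings are assumptions made for illustration; unlike indexing into `split('(')`, `str.partition` does not fail when a cell has no parentheses.

```python
def parse_cell(cell_text):
    """Split 'Borough(Neighbourhood / ...)' into (borough, neighbourhoods)."""
    # partition returns the text before '(', the separator, and the rest
    borough, _, rest = cell_text.partition('(')
    # Drop the single closing parenthesis if one is present
    if rest.endswith(')'):
        rest = rest[:-1]
    return borough.strip(), rest.strip()


# Assumed cell formats, based on the table output above
print(parse_cell('Scarborough(Malvern / Rouge)'))
print(parse_cell("Queen's Park"))
```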


### Extract Latitude and Longitude of Postal Codes and Merge with Table Data

In [340]:
# Read the coordinates CSV directly from the URL; this avoids shelling out to wget,
# which is not available by default on Windows
geo_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv'
geospatial_data = pd.read_csv(geo_url)
geospatial_data.columns = ['PostalCode', 'Latitude', 'Longitude']
geospatial_data.head(10)
df.shape

(103, 3)

In [341]:
df = df.join(geospatial_data.set_index('PostalCode'), on = 'PostalCode')
df = df.assign(Neighbourhood=df.Neighbourhood.str.split(" / ")).explode('Neighbourhood')

df.head(10)


| | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 |
| 0 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 |
| 1 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 |
| 1 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood | 43.763573 | -79.188711 |
| 2 | M1E | Scarborough | Morningside | 43.763573 | -79.188711 |
| 2 | M1E | Scarborough | West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
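
The repeated index labels in the output come from `explode`, which copies the original row label onto every row it produces. The sketch below reproduces the split-and-explode step on a small two-row frame (values taken from the first table) and adds a `reset_index` call, which is an optional extra step not used in the cell above.

```python
import pandas as pd

# Two-row sample mirroring the columns used above
sample = pd.DataFrame({
    'PostalCode': ['M1B', 'M1G'],
    'Neighbourhood': ['Malvern / Rouge', 'Woburn'],
})

# Split on " / " into lists, then emit one row per list element
exploded = sample.assign(
    Neighbourhood=sample['Neighbourhood'].str.split(' / ')
).explode('Neighbourhood')

# explode repeats the original index label, so renumber the rows
exploded = exploded.reset_index(drop=True)
print(exploded)
```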


### Create Clustered Map of Toronto Neighbourhoods

In [342]:
df.Borough.value_counts()

Etobicoke                 44
Scarborough               38
North York                36
Downtown Toronto          35
Central Toronto           16
West Toronto              13
Etobicoke Northwest        9
York                       8
East Toronto               6
East York                  5
Downtown Toronto Stn A     1
Mississauga                1
East York/East Toronto     1
Queen's Park               1
East Toronto Business      1
Name: Borough, dtype: int64

In [343]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = 'ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [344]:
borough_array = ['North York', 'York', 'East York', 'Downtown Toronto', 'Central Toronto', 'West Toronto',
                 'East Toronto', 'Downtown Toronto Stn A', 'East Toronto Business', 'East York/East Toronto',
                 'Scarborough', 'Etobicoke', 'Etobicoke Northwest', "Queen's Park", 'Mississauga']

# work on a copy so the merged dataframe stays untouched
df1 = df.copy()

# one colour per borough; every borough is drawn in blue for now
colors_array = ['blue'] * len(borough_array)

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per neighbourhood, grouped by borough
for borough, color in zip(borough_array, colors_array):
    df2 = df1[df1.Borough == borough]
    for lat, lng, neighbourhood in zip(df2['Latitude'], df2['Longitude'], df2['Neighbourhood']):
        label = folium.Popup('{}, {}'.format(neighbourhood, borough), parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=1).add_to(map_toronto)

map_toronto
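
The map above draws every marker in the same colour. One possible way to colour markers by cluster is sketched below: it groups a few neighbourhoods with `KMeans` on their coordinates and assigns one hex colour per cluster. The coordinates are copied from the table earlier in this document; the choice of `k = 2`, the use of raw latitude and longitude as features, and the `Cluster` and `Colour` column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

# Coordinates copied from the merged table shown earlier
coords = pd.DataFrame({
    'Neighbourhood': ['Malvern', 'Rouge Hill', 'Guildwood', 'Woburn', 'Cedarbrae'],
    'Latitude': [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
    'Longitude': [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
})

k = 2  # assumed number of clusters, chosen only for this sketch
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
coords['Cluster'] = kmeans.fit_predict(coords[['Latitude', 'Longitude']])

# One hex colour per cluster, sampled evenly from a matplotlib colormap
palette = [colors.to_hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]
coords['Colour'] = coords['Cluster'].map(lambda label: palette[label])
print(coords[['Neighbourhood', 'Cluster', 'Colour']])
```

The `Colour` value of each row could then be passed as `color` and `fill_color` to `folium.CircleMarker` in place of the fixed `'blue'`.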


### Types of Crime Rates Based on Location

In [345]:
crime_url = 'https://opendata.arcgis.com/datasets/af500b5abb7240399853b35a2362d0c0_0.csv'

In [346]:
# Read the crime data straight from the URL; this avoids shelling out to wget,
# which is not available by default on Windows
crime_data = pd.read_csv(crime_url)
crime_data = crime_data[["Neighbourhood", "Population", "Assault_Rate_2019", "AutoTheft_Rate_2019", "BreakandEnter_Rate_2019", "Homicide_Rate_2019", "Robbery_Rate_2019", "TheftOver_Rate_2019"]]

# inner join on the neighbourhood name; rows without an exact name match are dropped
df = df.merge(crime_data, on='Neighbourhood')

df.head()
df.shape


(33, 12)
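
Only 33 rows survive the merge, presumably because the inner join keeps just those neighbourhoods whose names match exactly in both tables. A minimal sketch of one way to list the unmatched names is shown below, using a left merge with `indicator=True`. The two small frames and their values are hypothetical stand-ins for the postal-code and crime tables.

```python
import pandas as pd

# Hypothetical stand-ins for the postal-code and crime tables
postal = pd.DataFrame({'Neighbourhood': ['Malvern', 'Rouge', 'Woburn', 'Cedarbrae']})
crime = pd.DataFrame({
    'Neighbourhood': ['Malvern', 'Woburn', 'West Hill'],
    'Assault_Rate_2019': [1.0, 2.0, 3.0],  # placeholder values
})

# A left merge keeps every postal-code row; _merge records where each row matched
checked = postal.merge(crime, on='Neighbourhood', how='left', indicator=True)

# Rows tagged 'left_only' have no counterpart in the crime table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Neighbourhood'].tolist()
print(unmatched)
```

Inspecting the unmatched names is a quick way to spot spelling or naming differences between the two sources before deciding how to reconcile them.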