<a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"><img src="https://cdn.torontolife.com/wp-content/uploads/2016/10/toronto-skyline-803x603-1476458932.jpg" width="140" align="center"></a>

# Clustering Neighborhoods in Toronto
### Author: Patrick Franco Alves

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Install the required packages</a>

2. <a href="#item2">BeautifulSoup to scrape the data</a>

3. <a href="#item3">Storing in pandas dataset</a>

4. <a href="#item4">Map of Toronto using Folium</a>
    
5. <a href="#item5">Pizza Clusters of Toronto</a>
    
</font>
</div>

## 1.  Install the required packages

In [3]:
! conda install -c anaconda beautifulsoup4 --yes
#! conda install -c conda-forge geopy --yes 
#! conda install -c anaconda lxml --yes

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    soupsieve-1.9.3            |           py36_0          60 KB  anaconda
    openssl-1.1.1              |       h7b6447c_0         5.0 MB  anaconda
    certifi-2019.9.11          |           py36_0         154 KB  anaconda
    beautifulsoup4-4.8.0       |           py36_0         147 KB  anaconda
    ------------------------------------------------------------
                                           Total:         5.4 MB

The following NEW packages will be INSTALLED:

    soupsieve:      1.9.3-py36_0      anaconda   

The following packages will be UPDATED

## 2.  BeautifulSoup to scrape the data

In [4]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
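# transforming a JSON structure into a pandas dataframe
try:
    from pandas import json_normalize  # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize  # older pandas releases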

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


##### Use BeautifulSoup to scrape the data.

In [5]:
website_text = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(website_text, "html5lib")

In [6]:
table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

## 3.  Storing in pandas dataset

In [7]:
# locate the postal-code table and collect its rows
table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

# keep the text of every <td>; the header row has no <td> cells and yields an empty list
data = []
for row in table_rows:
    data.append([t.text.strip() for t in row.find_all('td')])

df = pd.DataFrame(data, columns=['PostalCode', 'Borough', 'Neighbourhood'])
df = df[~df['PostalCode'].isnull()]  # drop the empty row left by the header
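
To see what the row loop in In [7] collects, here is a minimal sketch run against a small inline HTML table instead of the live Wikipedia page. The HTML snippet and the `rows_from_table` helper are illustrative assumptions, and the built-in `html.parser` backend is used only so that the sketch does not depend on `html5lib`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; not the real page content.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

def rows_from_table(html):
    """Return the stripped <td> texts of every data row in the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.text.strip() for td in tr.find_all("td")]
        if cells:  # the header row holds only <th> cells, so it comes back empty
            rows.append(cells)
    return rows

rows = rows_from_table(SAMPLE_HTML)
print(rows)
```

Skipping empty rows inside the loop is an alternative to filtering null postal codes after the DataFrame is built; either approach removes the header row.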

In [8]:
df.head(10)

| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 1 | M1A | Not assigned | Not assigned |
| 2 | M2A | Not assigned | Not assigned |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Harbourfront |
| 6 | M5A | Downtown Toronto | Regent Park |
| 7 | M6A | North York | Lawrence Heights |
| 8 | M6A | North York | Lawrence Manor |
| 9 | M7A | Queen's Park | Not assigned |
| 10 | M8A | Not assigned | Not assigned |


In [9]:
df[df.Borough != 'Not assigned'].groupby('Neighbourhood').count()

| Neighbourhood | PostalCode | Borough |
|---|---|---|
| Adelaide | 1 | 1 |
| Agincourt | 1 | 1 |
| Agincourt North | 1 | 1 |
| Albion Gardens | 1 | 1 |
| Alderwood | 1 | 1 |
| ... | ... | ... |
| Woodbine Heights | 1 | 1 |
| York Mills | 1 | 1 |
| York Mills West | 1 | 1 |
| York University | 1 | 1 |


In [10]:
df2 = df[df.Borough != 'Not assigned']

In [11]:
print("Before deleting Not assigned rows",df.shape)
print("After deleting Not assigned rows",df2.shape)

Before deleting Not assigned rows (288, 3)
After deleting Not assigned rows (211, 3)
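
The filtered frame still holds one row per neighbourhood, so a postal code such as M5A appears twice, and M7A keeps a `Not assigned` neighbourhood. The sketch below shows one possible way to tidy this on a hand-made sample. The `tidy_postal_codes` helper and the rule of reusing the borough name for an unnamed neighbourhood are assumptions for illustration, not steps taken in this notebook.

```python
import pandas as pd

# Small hand-made sample mirroring rows shown in df.head(10) above.
sample = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighbourhood"],
)

def tidy_postal_codes(frame):
    """Hypothetical helper: drop unassigned boroughs, then keep one row per postal code."""
    out = frame[frame["Borough"] != "Not assigned"].copy()
    # Assumption: an unnamed neighbourhood is labelled with its borough name.
    unnamed = out["Neighbourhood"] == "Not assigned"
    out.loc[unnamed, "Neighbourhood"] = out.loc[unnamed, "Borough"]
    # Join neighbourhoods that share a postal code into one comma-separated cell.
    return (
        out.groupby(["PostalCode", "Borough"], as_index=False)["Neighbourhood"]
        .agg(", ".join)
    )

tidy = tidy_postal_codes(sample)
print(tidy)
```

With one row per postal code, a later merge on the postal code would place a single marker per coordinate pair rather than stacking several on the same point.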


##### Use geopy to get the latitude and longitude of Toronto.

In [12]:
from geopy.geocoders import Nominatim 

In [13]:
# convert an address into latitude and longitude values
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Latitude and longitude of Toronto: {}, {}.'.format(latitude, longitude))

Latitude and longitude of Toronto: 43.653963, -79.387207.
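
geopy's `geocode` returns `None` when an address has no match, in which case reading `location.latitude` raises an `AttributeError`. A minimal sketch of a guard follows; `coordinates_or_raise` is a hypothetical helper, and the fake geocoder exists only so the example runs without a network call.

```python
def coordinates_or_raise(geocode, address):
    """Call `geocode(address)` and return (latitude, longitude), or raise if nothing matched."""
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude

# Stand-in geocoder for illustration only; with geopy one would pass `geolocator.geocode`.
class FakeLocation:
    latitude = 43.653963
    longitude = -79.387207

def fake_geocode(address):
    return FakeLocation() if address == "Toronto, Ontario" else None

print(coordinates_or_raise(fake_geocode, "Toronto, Ontario"))
```

In the notebook this could be called as `coordinates_or_raise(geolocator.geocode, address)`.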


## 4. Map of Toronto using folium

In [17]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

In [18]:
map_toronto

In [19]:
!wget -q -O 'Geospatial_Coordinates.csv' http://cocl.us/Geospatial_data

#http://cocl.us/Geospatial_data

print('Data downloaded!')

Data downloaded!


In [20]:
geo_df = pd.read_csv('Geospatial_Coordinates.csv')
geo_df.head()

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [21]:
#merge
geo_df.columns

Index(['Postal Code', 'Latitude', 'Longitude'], dtype='object')

In [22]:
df2.columns

Index(['PostalCode', 'Borough', 'Neighbourhood'], dtype='object')

In [23]:
neighborhoods = df2.merge(geo_df, left_on='PostalCode', right_on='Postal Code')

In [24]:
neighborhoods.head()

| | PostalCode | Borough | Neighbourhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | M3A | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | M4A | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | M5A | 43.65426 | -79.360636 |
| 3 | M5A | Downtown Toronto | Regent Park | M5A | 43.65426 | -79.360636 |
| 4 | M6A | North York | Lawrence Heights | M6A | 43.718518 | -79.464763 |
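
The merge above is an inner join, so any postal code missing from the coordinates file disappears silently, and both key columns (`PostalCode` and `Postal Code`) end up in the result. The sketch below shows one way to surface unmatched codes and keep a single key column. The sample frames are hand-made, and `M9Z` is a made-up code used only to trigger the unmatched case.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M9Z"],
                     "Borough": ["North York", "North York", "Hypothetical"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

merged = left.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),  # align the key names
    on="PostalCode",
    how="left",        # keep every row of `left`, matched or not
    indicator=True,    # adds a `_merge` column describing each row's origin
)
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

Rows flagged `left_only` have no coordinates and would need to be dropped or geocoded separately before plotting.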


In [20]:
#import folium  
#from IPython.display import HTML, display

In [108]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=14)

# add one circle marker per neighbourhood, labelled "Neighbourhood, Borough"
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

display(map_toronto)

In [109]:
map_toronto.save('map_toronto.html')

In [110]:
# Map.save() writes an HTML document whatever the file extension, so a file
# name ending in .png would not contain an image. To get a PNG, take a browser
# screenshot of the saved HTML file instead.
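
At `zoom_start=14` the view is centred on downtown, so markers for outlying neighbourhoods may fall outside the initial viewport. One option is to compute a bounding box from the marker coordinates and hand it to folium's `Map.fit_bounds`. The `bounding_box` helper below is a hypothetical sketch and needs only the standard library.

```python
def bounding_box(points):
    """Return [[south, west], [north, east]] for an iterable of (lat, lng) pairs."""
    lats, lngs = zip(*points)
    return [[min(lats), min(lngs)], [max(lats), max(lngs)]]

# Coordinates taken from the first rows of the merged `neighborhoods` table.
points = [(43.753259, -79.329656), (43.725882, -79.315572),
          (43.65426, -79.360636), (43.718518, -79.464763)]
box = bounding_box(points)
print(box)
# With folium, the box could then be applied to a map object via `fit_bounds(box)`.
```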

## 5. Pizza Clusters of Toronto

In [111]:
CLIENT_ID = 'USC23TI2Y3K0WLFYXPJUXXEJVZ4NUEMT3CR5T3ZCNTJ3IB55' # your Foursquare ID
CLIENT_SECRET = 'U2QVNANSIRP4OBT5MWECVSUP0VR1PVEQLDKY5S4HY4CNCP0S' # your Foursquare Secret
VERSION = '20191013'
LIMIT = 50
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: USC23TI2Y3K0WLFYXPJUXXEJVZ4NUEMT3CR5T3ZCNTJ3IB55
CLIENT_SECRET: U2QVNANSIRP4OBT5MWECVSUP0VR1PVEQLDKY5S4HY4CNCP0S
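
Hard-coding and printing API credentials makes them visible to anyone who can read the notebook. A common alternative, sketched here, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helpers are assumptions chosen for this example.

```python
import os

def load_credentials(env=os.environ):
    """Read the two Foursquare values from environment variables (names are assumptions)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demo with placeholder values instead of real credentials.
demo_env = {"FOURSQUARE_CLIENT_ID": "DEMO_ID_1234",
            "FOURSQUARE_CLIENT_SECRET": "DEMO_SECRET_5678"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_SECRET: " + mask(client_secret))
```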


In [87]:
address = 'Toronto Tower'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.64635315 -79.4023620504047



In [89]:
search_query = 'pizza'
radius = 1000
print(search_query + ' .... OK!')

pizza .... OK!


In [90]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=USC23TI2Y3K0WLFYXPJUXXEJVZ4NUEMT3CR5T3ZCNTJ3IB55&client_secret=U2QVNANSIRP4OBT5MWECVSUP0VR1PVEQLDKY5S4HY4CNCP0S&ll=43.64635315,-79.4023620504047&v=20191013&query=pizza&radius=1000&limit=50'
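
Building the URL with `str.format` works for a one-word query such as `pizza`, but values containing spaces or `&` would not be escaped. The sketch below assembles the same endpoint and parameters with `urllib.parse.urlencode`. The `build_search_url` helper and the placeholder credentials are illustrative assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the same search URL as above, letting urlencode escape each value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; the multi-word query shows why escaping matters.
url = build_search_url("ID", "SECRET", 43.64635315, -79.4023620504047,
                       "20191013", "pizza & pasta", 1000, 50)
print(url)
```

With `requests`, passing the dictionary through the `params` argument of `requests.get` achieves the same escaping without building the string by hand.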

In [91]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5da475b70d2be7002c877c13'},
 'response': {'venues': [{'id': '4bc3a04cabf49521650bc493',
    'name': 'Pizza Pizza',
    'location': {'address': '655 Queen St W',
     'lat': 43.64711589206853,
     'lng': -79.40415196491139,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.64711589206853,
       'lng': -79.40415196491139}],
     'distance': 167,
     'postalCode': 'M6J 1E8',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['655 Queen St W', 'Toronto ON M6J 1E8', 'Canada']},
    'categories': [{'id': '4bf58dd8d48988d1ca941735',
      'name': 'Pizza Place',
      'pluralName': 'Pizza Places',
      'shortName': 'Pizza',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1571059127',
    'hasPerk': False},
   {'id': '4af23cabf964a520d4e621e3',
    'name': 'Pizza Rustica Restauran
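
The response carries a status in `meta.code` (200 in the output above). Checking it before indexing into `results['response']['venues']` gives a clearer error when a request fails. This is a minimal sketch: `extract_venues` is a hypothetical helper, and the failing payload is invented for the demonstration.

```python
def extract_venues(payload):
    """Return the venue list from a search response, raising on a non-200 meta code."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError("Foursquare request failed with meta code {}".format(code))
    return payload.get("response", {}).get("venues", [])

# Trimmed-down payloads shaped like the output above (the second one is hypothetical).
ok = {"meta": {"code": 200}, "response": {"venues": [{"name": "Pizza Pizza"}]}}
bad = {"meta": {"code": 400}, "response": {}}

print([v["name"] for v in extract_venues(ok)])
```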

In [93]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,4bc3a04cabf49521650bc493,Pizza Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1571059127,False,655 Queen St W,43.647116,-79.404152,"[{'label': 'display', 'lat': 43.64711589206853...",167,M6J 1E8,CA,Toronto,ON,Canada,"[655 Queen St W, Toronto ON M6J 1E8, Canada]",
1,4af23cabf964a520d4e621e3,Pizza Rustica Restaurant & Bar,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1571059127,False,37 Blue Jays Way,43.644919,-79.391844,"[{'label': 'display', 'lat': 43.64491920354413...",862,M5V 3P5,CA,Toronto,ON,Canada,"[37 Blue Jays Way, Toronto ON M5V 3P5, Canada]",
2,4adc9f99f964a520092e21e3,Mamma's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1571059127,False,405 Richmond St. W,43.647943,-79.395647,"[{'label': 'display', 'lat': 43.64794272203376...",569,,CA,Toronto,ON,Canada,"[405 Richmond St. W (at Spadina Ave.), Toronto...",at Spadina Ave.
3,4dde735f7d8bb03c06b3db73,Boston Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1571059127,False,250 Front St,43.644059,-79.388815,"[{'label': 'display', 'lat': 43.64405883528329...",1120,M5V 3G5,CA,Toronto,ON,Canada,"[250 Front St (John St), Toronto ON M5V 3G5, C...",John St
4,4b7f161bf964a5206e1530e3,Pizza Nova,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1571059127,False,371 Front Street W,43.643272,-79.391836,"[{'label': 'display', 'lat': 43.64327168873745...",914,M5V 3S8,CA,Toronto,ON,Canada,"[371 Front Street W (at Blue Jay Way), Toronto...",at Blue Jay Way


In [94]:
dataframe.shape

(37, 17)
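
The dotted column names such as `location.lat` come from `json_normalize`, which flattens nested dictionaries into one column per leaf and joins the keys with a dot. A small sketch on a single trimmed venue follows, assuming pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# One venue trimmed from the response shown earlier.
venues_sample = [{
    "id": "4bc3a04cabf49521650bc493",
    "name": "Pizza Pizza",
    "location": {"lat": 43.64711589206853, "lng": -79.40415196491139, "distance": 167},
}]

flat = pd.json_normalize(venues_sample)  # nested keys become "location.<key>" columns
print(sorted(flat.columns))
```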

In [95]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() so the column assignment below works on an independent frame
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    # fall back to 'venue.categories' when the row has no 'categories' entry
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# replace the list of category dicts with the name of the first category
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,id
0,Pizza Pizza,Pizza Place,655 Queen St W,43.647116,-79.404152,"[{'label': 'display', 'lat': 43.64711589206853...",167,M6J 1E8,CA,Toronto,ON,Canada,"[655 Queen St W, Toronto ON M6J 1E8, Canada]",,4bc3a04cabf49521650bc493
1,Pizza Rustica Restaurant & Bar,Pizza Place,37 Blue Jays Way,43.644919,-79.391844,"[{'label': 'display', 'lat': 43.64491920354413...",862,M5V 3P5,CA,Toronto,ON,Canada,"[37 Blue Jays Way, Toronto ON M5V 3P5, Canada]",,4af23cabf964a520d4e621e3
2,Mamma's Pizza,Pizza Place,405 Richmond St. W,43.647943,-79.395647,"[{'label': 'display', 'lat': 43.64794272203376...",569,,CA,Toronto,ON,Canada,"[405 Richmond St. W (at Spadina Ave.), Toronto...",at Spadina Ave.,4adc9f99f964a520092e21e3
3,Boston Pizza,Pizza Place,250 Front St,43.644059,-79.388815,"[{'label': 'display', 'lat': 43.64405883528329...",1120,M5V 3G5,CA,Toronto,ON,Canada,"[250 Front St (John St), Toronto ON M5V 3G5, C...",John St,4dde735f7d8bb03c06b3db73
4,Pizza Nova,Pizza Place,371 Front Street W,43.643272,-79.391836,"[{'label': 'display', 'lat': 43.64327168873745...",914,M5V 3S8,CA,Toronto,ON,Canada,"[371 Front Street W (at Blue Jay Way), Toronto...",at Blue Jay Way,4b7f161bf964a5206e1530e3
5,Pizzaiolo,Pizza Place,521 King St. W,43.644793,-79.39823,"[{'label': 'display', 'lat': 43.64479341856585...",375,M5V 1K4,CA,Toronto,ON,Canada,"[521 King St. W (at Brant St.), Toronto ON M5V...",at Brant St.,4aeb18d8f964a5204bbe21e3
6,Pizza Pizza,Pizza Place,540 King Street West,43.644943,-79.398023,"[{'label': 'display', 'lat': 43.6449428004504,...",383,M5V 1M3,CA,Toronto,ON,Canada,"[540 King Street West, Toronto ON M5V 1M3, Can...",,4b02ec72f964a5200c4b22e3
7,Pizza Thick & The Ice Cream Bake Shop,Pizza Place,536 Queen St W,43.647512,-79.402509,"[{'label': 'display', 'lat': 43.647512, 'lng':...",129,M5V 2B5,CA,Toronto,ON,Canada,"[536 Queen St W, Toronto ON M5V 2B5, Canada]",,580a5dfb38fa4875ae680a3f
8,Pizzaiolo,Pizza Place,609 Queen Street West,43.647457,-79.402476,"[{'label': 'display', 'lat': 43.64745689946616...",123,,CA,Toronto,ON,Canada,"[609 Queen Street West (Bathurst), Toronto ON,...",Bathurst,4ae09982f964a520d08021e3
9,Pizza Aiolo,Pizza Place,609 queen street west,43.647499,-79.402481,"[{'label': 'display', 'lat': 43.647499, 'lng':...",127,,CA,Toronto,ON,Canada,"[609 queen street west, Toronto ON, Canada]",,4beb2f0162c0c92811d4e1d4
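
The `distance` column appears to be the distance in metres from the search point to each venue. As a rough cross-check, the sketch below computes the great-circle (haversine) distance between the search centre and the first venue. Whether Foursquare uses this exact formula is an assumption; the result should land within a few metres of the listed 167.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres, an approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

origin = (43.64635315, -79.4023620504047)              # search centre from In [87]
pizza_pizza = (43.64711589206853, -79.40415196491139)  # row 0, listed distance 167

d = haversine_m(*origin, *pizza_pizza)
print(round(d))
```

Row 3 (Boston Pizza) lists a distance of 1120 although `radius` was set to 1000, so a local filter such as `dataframe_filtered[dataframe_filtered['distance'] <= radius]` may be worth applying before mapping.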


In [100]:
poutine2 = dataframe_filtered[['lat','lng','formattedAddress']]

In [101]:
poutine2.head()

| | lat | lng | formattedAddress |
|---|---|---|---|
| 0 | 43.647116 | -79.404152 | [655 Queen St W, Toronto ON M6J 1E8, Canada] |
| 1 | 43.644919 | -79.391844 | [37 Blue Jays Way, Toronto ON M5V 3P5, Canada] |
| 2 | 43.647943 | -79.395647 | [405 Richmond St. W (at Spadina Ave.), Toronto... |
| 3 | 43.644059 | -79.388815 | [250 Front St (John St), Toronto ON M5V 3G5, C... |
| 4 | 43.643272 | -79.391836 | [371 Front Street W (at Blue Jay Way), Toronto... |


In [102]:
poutine2.shape

(37, 3)

In [105]:
map_pizza = folium.Map(location=[latitude, longitude], zoom_start=14)

# add one circle marker per pizza venue, labelled with its address
for lat, lng, address in zip(poutine2['lat'], poutine2['lng'], poutine2['formattedAddress']):
    label = ', '.join(address)  # formattedAddress is a list of address lines
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_pizza)

display(map_pizza)

In [106]:
map_pizza.save('map_pizza.html')

In [107]:
# Map.save() writes an HTML document whatever the file extension, so a file
# name ending in .png would not contain an image. To get a PNG, take a browser
# screenshot of the saved HTML file instead.