<b>Importing the Data Processing and Geo-encoding Libraries</b>

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

import json # library to handle JSON files

import requests # library to handle requests

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          92 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.21.0-py_0



Downloading and Extracting Packages
geopy-1.21.0         | 58 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ##################################### |

<b>Importing Beautiful Soup</b>

In [3]:
!conda install --yes -c anaconda  beautifulsoup4

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.8.2       |           py36_0         161 KB  anaconda
    ca-certificates-2020.1.1   |                0         132 KB  anaconda
    certifi-2020.4.5.1         |           py36_0         159 KB  anaconda
    openssl-1.1.1              |       h7b6447c_0         5.0 MB  anaconda
    soupsieve-2.0              |             py_0          33 KB  anaconda
    ------------------------------------------------------------
                                           Total:         5.5 MB

The following NEW packages will be INSTALLED:

  beautifulsoup4     anaconda/linux-64::beautifulsoup4-4.8.2-py36_0
  soupsieve          a

In [4]:
from bs4 import BeautifulSoup # library to parse HTML and XML documents
print('Beautifulsoup imported')

Beautifulsoup imported


<b>Scrapping Data and Converting to dataframe</b>

In [5]:
# send the GET request
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [6]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [7]:
# create three lists to store table data
postalCodeList = []
boroughList = []
neighborhoodList = []

In [8]:
# append the data into the respective lists
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if(len(cells) > 0):
        postalCodeList.append(cells[0].text.rstrip('\n'))
        boroughList.append(cells[1].text.rstrip('\n'))
        neighborhoodList.append(cells[2].text.rstrip('\n'))

In [9]:
# create a new DataFrame from the three lists
toronto_df = pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})

toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


<b>Data Processing</b>

In [10]:
# drop cells with a borough that is Not assigned
toronto_df_dropna = toronto_df[toronto_df.Borough != "Not assigned"].reset_index(drop=True)
toronto_df_dropna.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


<b>Group the Data by Neighbourhoods</b>

In [11]:
# group neighborhoods in the same borough
toronto_df_grouped = toronto_df_dropna.groupby(["PostalCode", "Borough"], as_index=False).agg(lambda x: ", ".join(x))
toronto_df_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [12]:
# for Neighborhood="Not assigned", make the value the same as Borough
for index, row in toronto_df_grouped.iterrows():
    if row["Neighborhood"] == "Not assigned":
        row["Neighborhood"] = row["Borough"]
        
toronto_df_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [13]:
# print the number of rows of the cleaned dataframe
toronto_df_grouped.shape

(103, 3)

<b>Loading Coordinates from given csv</b>

In [14]:
# load the coordinates from the csv file on Coursera
coordinates = pd.read_csv("Geospatial_Coordinates.csv")
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
# rename the column "PostalCode"
coordinates.rename(columns={"Postal Code": "PostalCode"}, inplace=True)
coordinates.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
# merge two table on the column "PostalCode"
toronto_df_new = toronto_df_grouped.merge(coordinates, on="PostalCode", how="left")
toronto_df_new.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<b>Get the coordinates of Toronto and Create a map with neighbourhoods superimposed</b>

In [17]:
address = 'Toronto'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [18]:
import folium # map rendering library

In [19]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df_new['Latitude'], toronto_df_new['Longitude'], toronto_df_new['Borough'], toronto_df_new['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto 

<b>Use the Foursquare API to explore the neighborhoods</b>

In [20]:
CLIENT_ID = 'VAWXW5EBT00H3V1AN143DJRQHF3FGRF5Z1EJG44TWHPIZ3ZF' # your Foursquare ID
CLIENT_SECRET = 'Z5K3X30QBGQS15JEXAAEIR0BWIXNDH3UL0WXRZYEI1NCD4VB' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:Riddhiman')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:Riddhiman
CLIENT_ID: VAWXW5EBT00H3V1AN143DJRQHF3FGRF5Z1EJG44TWHPIZ3ZF
CLIENT_SECRET:Z5K3X30QBGQS15JEXAAEIR0BWIXNDH3UL0WXRZYEI1NCD4VB


<b>Getting Top 100 venues within a radius of 500 meters</b>

In [21]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(toronto_df_new['Latitude'], toronto_df_new['Longitude'], toronto_df_new['PostalCode'], toronto_df_new['Borough'], toronto_df_new['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['PostalCode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2136, 9)


Unnamed: 0,PostalCode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
3,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant


<b> See the List of Unique Categories in the venues </b>

In [23]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 269 uniques categories.


In [24]:
venues_df['VenueCategory'].unique()[:50]

array(['Fast Food Restaurant', 'Bar', 'Construction & Landscaping',
       'Electronics Store', 'Mexican Restaurant', 'Rental Car Location',
       'Bank', 'Medical Center', 'Intersection', 'Breakfast Spot',
       'Coffee Shop', 'Korean Restaurant', 'Hakka Restaurant',
       'Caribbean Restaurant', 'Thai Restaurant', 'Athletics & Sports',
       'Gas Station', 'Bakery', 'Fried Chicken Joint', 'Playground',
       'Jewelry Store', "Women's Store", 'Department Store',
       'Discount Store', 'Chinese Restaurant', 'Hobby Shop',
       'Bus Station', 'Train Station', 'Ice Cream Shop', 'Bus Line',
       'Metro Station', 'Park', 'Soccer Field', 'Motel',
       'American Restaurant', 'Café', 'General Entertainment',
       'Skating Rink', 'College Stadium', 'Indian Restaurant',
       'Vietnamese Restaurant', 'Pet Store', 'Thrift / Vintage Store',
       'Sandwich Place', 'Middle Eastern Restaurant', 'Shopping Mall',
       'Auto Garage', 'Latin American Restaurant', 'Lounge',
       'Ita

<b>Analyze Each Area</b>

In [25]:
# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
toronto_onehot['PostalCode'] = venues_df['PostalCode'] 
toronto_onehot['Borough'] = venues_df['Borough'] 
toronto_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(2136, 272)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,Scarborough,Malvern / Rouge,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1E,Scarborough,Guildwood / Morningside / West Hill,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,Scarborough,Guildwood / Morningside / West Hill,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
#Grouping rows by neighbourhood using mean frequency
toronto_grouped = toronto_onehot.groupby(["PostalCode", "Borough", "Neighborhoods"]).mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped

(99, 272)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,Scarborough,Malvern / Rouge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,Scarborough,Guildwood / Morningside / West Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,Scarborough,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,Scarborough,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,M9M,North York,Humberlea / Emery,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
95,M9P,Etobicoke,Westmount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<b>Getting Top10 venues per Neighbourhood</b>

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']
neighborhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = toronto_grouped['Neighborhoods']

for ind in np.arange(toronto_grouped.shape[0]):
    row_categories = toronto_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted

(99, 13)


Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,Construction & Landscaping,Bar,Yoga Studio,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
2,M1E,Scarborough,Guildwood / Morningside / West Hill,Rental Car Location,Bank,Medical Center,Electronics Store,Mexican Restaurant,Intersection,Breakfast Spot,Cupcake Shop,Curling Ice,Farmers Market
3,M1G,Scarborough,Woburn,Coffee Shop,Korean Restaurant,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Yoga Studio
4,M1H,Scarborough,Cedarbrae,Hakka Restaurant,Bank,Caribbean Restaurant,Gas Station,Thai Restaurant,Athletics & Sports,Fried Chicken Joint,Bakery,Electronics Store,Eastern European Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,M9M,North York,Humberlea / Emery,Baseball Field,Food Service,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
95,M9P,Etobicoke,Westmount,Pizza Place,Coffee Shop,Sandwich Place,Intersection,Chinese Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
96,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...,Park,Mobile Phone Shop,Pizza Place,Sandwich Place,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
97,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...,Grocery Store,Discount Store,Fried Chicken Joint,Fast Food Restaurant,Beer Store,Sandwich Place,Japanese Restaurant,Liquor Store,Pharmacy,Pizza Place


<b>Clustering of Neighborhoods</b>

In [29]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [30]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(["PostalCode", "Borough", "Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [47]:
#Insert Clustering Labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [64]:
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M1B,Scarborough,Malvern / Rouge,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
1,2,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,Construction & Landscaping,Bar,Yoga Studio,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
2,0,M1E,Scarborough,Guildwood / Morningside / West Hill,Rental Car Location,Bank,Medical Center,Electronics Store,Mexican Restaurant,Intersection,Breakfast Spot,Cupcake Shop,Curling Ice,Farmers Market
3,0,M1G,Scarborough,Woburn,Coffee Shop,Korean Restaurant,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Yoga Studio
4,0,M1H,Scarborough,Cedarbrae,Hakka Restaurant,Bank,Caribbean Restaurant,Gas Station,Thai Restaurant,Athletics & Sports,Fried Chicken Joint,Bakery,Electronics Store,Eastern European Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,1,M9M,North York,Humberlea / Emery,Baseball Field,Food Service,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
95,0,M9P,Etobicoke,Westmount,Pizza Place,Coffee Shop,Sandwich Place,Intersection,Chinese Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
96,0,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...,Park,Mobile Phone Shop,Pizza Place,Sandwich Place,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
97,0,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...,Grocery Store,Discount Store,Fried Chicken Joint,Fast Food Restaurant,Beer Store,Sandwich Place,Japanese Restaurant,Liquor Store,Pharmacy,Pizza Place


In [65]:
#create new dataframe
tornto_merged = toronto_df_new.copy()
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
68,M5V,Downtown Toronto,CN Tower / King and Spadina / Railway Lands / ...,43.628947,-79.394420
67,M5T,Downtown Toronto,Kensington Market / Chinatown / Grange Park,43.653206,-79.400049
66,M5S,Downtown Toronto,University of Toronto / Harbord,43.662696,-79.400049
65,M5R,Central Toronto,The Annex / North Midtown / Yorkville,43.672710,-79.405678
...,...,...,...,...,...
21,M2M,North York,Willowdale / Newtonbrook,43.789053,-79.408493
16,M1X,Scarborough,Upper Rouge,43.836125,-79.205636
93,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
94,M9B,Etobicoke,West Deane Park / Princess Gardens / Martin Gr...,43.650943,-79.554724


In [66]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.drop(["Borough", "Neighborhoods"], 1).set_index("PostalCode"), on="PostalCode")

print(toronto_merged.shape)
toronto_merged.head() # check the last columns!

(103, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,0.0,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
68,M5V,Downtown Toronto,CN Tower / King and Spadina / Railway Lands / ...,43.628947,-79.39442,0.0,Airport Service,Airport Lounge,Harbor / Marina,Coffee Shop,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Bar,Boutique
67,M5T,Downtown Toronto,Kensington Market / Chinatown / Grange Park,43.653206,-79.400049,0.0,Café,Coffee Shop,Vietnamese Restaurant,Mexican Restaurant,Gaming Cafe,Bar,Bakery,Dessert Shop,Arts & Crafts Store,Grocery Store
66,M5S,Downtown Toronto,University of Toronto / Harbord,43.662696,-79.400049,0.0,Café,Italian Restaurant,Japanese Restaurant,Bar,Bookstore,Bakery,Restaurant,Yoga Studio,Bank,Beer Bar
65,M5R,Central Toronto,The Annex / North Midtown / Yorkville,43.67271,-79.405678,0.0,Sandwich Place,Café,Coffee Shop,Park,Pizza Place,Burger Joint,Donut Shop,Cheese Shop,Pub,Indian Restaurant


In [67]:
# sort the results by Cluster Labels
print(toronto_merged.shape)
toronto_merged.sort_values(["Cluster Labels"], inplace=True)
toronto_merged

(103, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,0.0,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
10,M1P,Scarborough,Dorset Park / Wexford Heights / Scarborough To...,43.757410,-79.273304,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Thrift / Vintage Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
11,M1R,Scarborough,Wexford / Maryvale,43.750072,-79.295849,0.0,Breakfast Spot,Vietnamese Restaurant,Middle Eastern Restaurant,Auto Garage,Bakery,Sandwich Place,Shopping Mall,Event Space,Ethiopian Restaurant,Empanada Restaurant
12,M1S,Scarborough,Agincourt,43.794200,-79.262029,0.0,Lounge,Skating Rink,Breakfast Spot,Latin American Restaurant,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
13,M1T,Scarborough,Clarks Corners / Tam O'Shanter / Sullivan,43.781638,-79.304302,0.0,Pizza Place,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Bank,Fried Chicken Joint,Chinese Restaurant,Thai Restaurant,Gas Station,Shopping Mall
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21,M2M,North York,Willowdale / Newtonbrook,43.789053,-79.408493,4.0,Piano Bar,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
16,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,
93,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
94,M9B,Etobicoke,West Deane Park / Princess Gardens / Martin Gr...,43.650943,-79.554724,,,,,,,,,,,


In [68]:
#Checking the datatypes
toronto_merged.dtypes

PostalCode                 object
Borough                    object
Neighborhood               object
Latitude                  float64
Longitude                 float64
Cluster Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

In [69]:
#dropping NaN cluster labels
toronto_merged.dropna(axis=0,inplace=True)
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,0.0,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
10,M1P,Scarborough,Dorset Park / Wexford Heights / Scarborough To...,43.757410,-79.273304,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Thrift / Vintage Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
11,M1R,Scarborough,Wexford / Maryvale,43.750072,-79.295849,0.0,Breakfast Spot,Vietnamese Restaurant,Middle Eastern Restaurant,Auto Garage,Bakery,Sandwich Place,Shopping Mall,Event Space,Ethiopian Restaurant,Empanada Restaurant
12,M1S,Scarborough,Agincourt,43.794200,-79.262029,0.0,Lounge,Skating Rink,Breakfast Spot,Latin American Restaurant,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
13,M1T,Scarborough,Clarks Corners / Tam O'Shanter / Sullivan,43.781638,-79.304302,0.0,Pizza Place,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Bank,Fried Chicken Joint,Chinese Restaurant,Thai Restaurant,Gas Station,Shopping Mall
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30,M3K,North York,Downsview,43.737473,-79.464763,3.0,Park,Airport,Construction & Landscaping,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
40,M4J,East York,East Toronto,43.685347,-79.338106,3.0,Park,Convenience Store,Coffee Shop,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
44,M4N,Central Toronto,Lawrence Park,43.728020,-79.388790,3.0,Park,Lawyer,Swim School,Bus Line,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
14,M1V,Scarborough,Milliken / Agincourt North / Steeles East / L'...,43.815252,-79.284577,3.0,Park,Playground,Coffee Shop,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


In [75]:
#converting cluster labels to integer
toronto_merged.astype({'Cluster Labels':'int64'}).dtypes
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype('int64')

In [76]:
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,0,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
10,M1P,Scarborough,Dorset Park / Wexford Heights / Scarborough To...,43.757410,-79.273304,0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Thrift / Vintage Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
11,M1R,Scarborough,Wexford / Maryvale,43.750072,-79.295849,0,Breakfast Spot,Vietnamese Restaurant,Middle Eastern Restaurant,Auto Garage,Bakery,Sandwich Place,Shopping Mall,Event Space,Ethiopian Restaurant,Empanada Restaurant
12,M1S,Scarborough,Agincourt,43.794200,-79.262029,0,Lounge,Skating Rink,Breakfast Spot,Latin American Restaurant,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
13,M1T,Scarborough,Clarks Corners / Tam O'Shanter / Sullivan,43.781638,-79.304302,0,Pizza Place,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Bank,Fried Chicken Joint,Chinese Restaurant,Thai Restaurant,Gas Station,Shopping Mall
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30,M3K,North York,Downsview,43.737473,-79.464763,3,Park,Airport,Construction & Landscaping,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
40,M4J,East York,East Toronto,43.685347,-79.338106,3,Park,Convenience Store,Coffee Shop,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
44,M4N,Central Toronto,Lawrence Park,43.728020,-79.388790,3,Park,Lawyer,Swim School,Bus Line,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
14,M1V,Scarborough,Milliken / Agincourt North / Steeles East / L'...,43.815252,-79.284577,3,Park,Playground,Coffee Shop,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


<b>Visualizing the Clusters</b>

In [54]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [77]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Borough'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<b>Examine Clusters</b>

<b>Cluster 1</b>

In [78]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,0,Fast Food Restaurant,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
10,Scarborough,0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Thrift / Vintage Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
11,Scarborough,0,Breakfast Spot,Vietnamese Restaurant,Middle Eastern Restaurant,Auto Garage,Bakery,Sandwich Place,Shopping Mall,Event Space,Ethiopian Restaurant,Empanada Restaurant
12,Scarborough,0,Lounge,Skating Rink,Breakfast Spot,Latin American Restaurant,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
13,Scarborough,0,Pizza Place,Fast Food Restaurant,Coffee Shop,Italian Restaurant,Bank,Fried Chicken Joint,Chinese Restaurant,Thai Restaurant,Gas Station,Shopping Mall
...,...,...,...,...,...,...,...,...,...,...,...,...
95,Etobicoke,0,Coffee Shop,Park,Liquor Store,Shopping Plaza,Cosmetics Shop,Pizza Place,Café,Beer Store,Eastern European Restaurant,Doner Restaurant
96,North York,0,Empanada Restaurant,Pizza Place,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
99,Etobicoke,0,Pizza Place,Coffee Shop,Sandwich Place,Intersection,Chinese Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
100,Etobicoke,0,Park,Mobile Phone Shop,Pizza Place,Sandwich Place,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


<b>Cluster 2</b>

In [80]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Etobicoke,1,Baseball Field,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Fish & Chips Shop
97,North York,1,Baseball Field,Food Service,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant


<b>Cluster 3</b>

In [81]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,2,Construction & Landscaping,Bar,Yoga Studio,Empanada Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


<b>Cluster 4</b>

In [82]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
90,Etobicoke,3,Park,Pool,River,Smoke Shop,Concert Hall,Comfort Food Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant
48,Central Toronto,3,Park,Playground,Summer Camp,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
50,Downtown Toronto,3,Park,Trail,Playground,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
79,North York,3,Park,Basketball Court,Bakery,Construction & Landscaping,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
74,York,3,Park,Women's Store,Pool,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Discount Store
23,North York,3,Park,Bank,Convenience Store,Bar,Electronics Store,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
25,North York,3,Park,Food & Drink Shop,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
30,North York,3,Park,Airport,Construction & Landscaping,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
40,East York,3,Park,Convenience Store,Coffee Shop,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
44,Central Toronto,3,Park,Lawyer,Swim School,Bus Line,Yoga Studio,Eastern European Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


<b>Cluster 5</b>

In [83]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,North York,4,Piano Bar,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant


<b>Observations</b>

Cluster 1 has mostly restaurants as the most popular venue. Cluster 2 has Baseball Field as same. Cluster 3 has construction & Landscaping. Cluster 4 has Parks as the most popular venue. For Cluster 5, it is Piano Bar. So the clusters are pretty marketed with respect to customers' tastes.