# Coursera data science capstone project

This notebook contains the capstone project for the Coursera Data Science course.

***
# BEGINNING OF PART 1

Retrieving postcodes, boroughs and neighbourhoods in Toronto, then handling missing values and duplicated postcodes

In [1]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

## Reading the data with web scraping

We will get the postcode, borough and neighbourhood data of Toronto from Wikipedia

In [2]:
#making the request to the Wikipedia page that contains the data for Toronto postcodes, borough and neighbourhoods
url_toronto_postcodes = "http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page_request = requests.get(url_toronto_postcodes).text

#scraping the page to get the html table
page_html = BeautifulSoup(page_request,"lxml")
postcode_table = page_html.find("table", class_="wikitable sortable")

In [3]:
#reading the data from the html table
df_toronto_postcode = pd.read_html(str(postcode_table))[0]
df_toronto_postcode

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
285,M8Z,Etobicoke,South of Bloor


## Preparing the data

### Removing rows with not assigned borough

We first check which values appear in the Borough column.
This lets us confirm afterwards that every row with an unassigned borough has been removed.

In [4]:
#how many rows do we have with borough = Not assigned? 
df_toronto_postcode["Borough"].value_counts()

Not assigned        77
Etobicoke           44
North York          38
Scarborough         37
Downtown Toronto    36
Central Toronto     17
West Toronto        13
York                 9
East Toronto         7
East York            6
Queen's Park         2
Mississauga          1
Name: Borough, dtype: int64

In [5]:
#Removing rows with not assigned borough
df_toronto_postcode = df_toronto_postcode[df_toronto_postcode["Borough"] != "Not assigned"].copy()

#Confirming the remaining values for borough
df_toronto_postcode["Borough"].value_counts()

Etobicoke           44
North York          38
Scarborough         37
Downtown Toronto    36
Central Toronto     17
West Toronto        13
York                 9
East Toronto         7
East York            6
Queen's Park         2
Mississauga          1
Name: Borough, dtype: int64

In [6]:
#Resetting the index after dropping rows
df_toronto_postcode.reset_index(drop=True, inplace=True)
df_toronto_postcode.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Queen's Park,Not assigned
6,M9A,Queen's Park,Queen's Park
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North


### Replacing not assigned neighbourhood with the borough name

We first check which rows do not have an assigned neighbourhood

In [7]:
df_toronto_postcode[df_toronto_postcode["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postcode,Borough,Neighbourhood
5,M7A,Queen's Park,Not assigned


In [8]:
#For those rows we assign the borough name to the neighbourhood column
df_toronto_postcode.loc[df_toronto_postcode["Neighbourhood"] == "Not assigned", "Neighbourhood"] = df_toronto_postcode.loc[df_toronto_postcode["Neighbourhood"] == "Not assigned", "Borough"]

In [9]:
#Then we confirm that no neighbourhood remained with a not assigned value
df_toronto_postcode[df_toronto_postcode["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postcode,Borough,Neighbourhood
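
The same replacement can also be written with `Series.mask`, which swaps in a value wherever a condition is true. This is only a minimal sketch on a two-row toy dataframe (the rows are copied from the table above, and the name `toy` is just for illustration), not a change to the cells above.

```python
import pandas as pd

# Toy dataframe with the same columns as df_toronto_postcode
toy = pd.DataFrame({
    "Postcode": ["M7A", "M1B"],
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighbourhood": ["Not assigned", "Rouge"],
})

# Where the neighbourhood is "Not assigned", take the borough name instead;
# every other row keeps its original neighbourhood
toy["Neighbourhood"] = toy["Neighbourhood"].mask(
    toy["Neighbourhood"] == "Not assigned", toy["Borough"]
)

print(toy)
```

Compared with the double `.loc` assignment, the condition is written only once, which makes it harder for the two sides to drift apart.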


### Concatenating neighbourhoods from the same postcode

More than one neighbourhood can exist in one postal code area. For example, in the table above the M6A postcode is listed twice and has two neighbourhoods: Lawrence Heights and Lawrence Manor. These two rows will be combined into one row with the neighbourhoods separated by a comma. The same is applied to all the postcodes.

In [10]:
#Let's define a function that returns all the neighbourhoods sharing a given row's postcode, joined by commas
def get_neighbourhoods_for_postcode(row):
    neighbourhood_series = df_toronto_postcode.loc[df_toronto_postcode["Postcode"] == row["Postcode"], "Neighbourhood"]
    neighbourhood_list = neighbourhood_series.tolist()
    return ",".join(neighbourhood_list)

#We apply that function to all the rows in the data and place the results in a new column
df_toronto_postcode["Neighbourhood List"] = df_toronto_postcode.apply(get_neighbourhoods_for_postcode, axis=1)

In [11]:
#We then drop the old neighbourhood column
df_toronto_postcode.drop(["Neighbourhood"],axis=1,inplace=True)

In [12]:
#Then we remove the duplicate rows keeping only the first occurrence 
df_toronto_postcode.drop_duplicates(keep="first",inplace=True)

In [13]:
#Let's rename the columns and reset the index
df_toronto_postcode.columns = ["Postcode", "Borough", "Neighbourhood"]
df_toronto_postcode.reset_index(drop=True, inplace=True)
df_toronto_postcode.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge,Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson,Garden District"


In [14]:
#Let's check the shape of the dataframe
df_toronto_postcode.shape

(103, 3)
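
The apply / drop / drop_duplicates sequence above can probably be condensed into a single `groupby`. The sketch below uses a small toy dataframe (rows taken from the tables above) and assumes that each postcode belongs to exactly one borough, which holds for the rows shown here but is not verified for the full table.

```python
import pandas as pd

# Toy rows in the same shape as df_toronto_postcode before concatenation
toy = pd.DataFrame({
    "Postcode": ["M5A", "M6A", "M6A", "M1B", "M1B"],
    "Borough": ["Downtown Toronto", "North York", "North York",
                "Scarborough", "Scarborough"],
    "Neighbourhood": ["Harbourfront", "Lawrence Heights", "Lawrence Manor",
                      "Rouge", "Malvern"],
})

# Group by postcode (and borough), then join the neighbourhood names of each group.
# sort=False keeps the groups in order of first appearance.
combined = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
       .agg(",".join)
       .reset_index()
)

print(combined)
```

This avoids filtering the whole dataframe once per row, so it should scale better than the row-wise `apply`, although for a table of this size the difference is negligible.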

# END OF PART 1
***
# BEGINNING OF PART 2

Getting the latitude and longitude of each postcode

In [15]:
#Let's read the latitude and longitude of each postcode in Toronto from a csv file
df_coordinates = pd.read_csv("./Geospatial_Coordinates.csv")
df_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We have to join our old dataframe with the one with the coordinates data

In [16]:
#First we rename the columns of the coordinates dataframe so that the postcode column has the same name as in the other dataframe
df_coordinates.columns = ["Postcode","Latitude","Longitude"]

#Now perform the actual merge
df_toronto_postcode_coord = pd.merge(df_toronto_postcode, df_coordinates, how='left', on="Postcode")
df_toronto_postcode_coord.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


Let's check whether any latitude or longitude value is missing. If the merge found coordinates for every postcode, the count of both columns should match the number of rows in the previous dataframe (103 rows).

In [17]:
#Use describe method to check the count of numeric columns
df_toronto_postcode_coord.describe()

Unnamed: 0,Latitude,Longitude
count,103.0,103.0
mean,43.704608,-79.397153
std,0.052463,0.097146
min,43.602414,-79.615819
25%,43.660567,-79.464763
50%,43.696948,-79.38879
75%,43.74532,-79.340923
max,43.836125,-79.160497
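
A more direct check than reading the `describe` count is to ask pandas which rows failed to match. The sketch below is an illustration on toy data: `M0X` is a made-up postcode added only to force a miss, and the variable names are hypothetical. It relies on the `indicator` and `validate` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy left table; "M0X" is a made-up postcode with no coordinates
postcodes = pd.DataFrame({"Postcode": ["M3A", "M4A", "M0X"]})

# Toy right table with coordinates taken from the merged output above
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column saying where each row came from;
# validate="one_to_one" raises an error if a postcode is duplicated on either side
merged = postcodes.merge(
    coords, how="left", on="Postcode", indicator=True, validate="one_to_one"
)

# Postcodes that found no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()

# Number of missing values per coordinate column
missing = merged[["Latitude", "Longitude"]].isna().sum()

print(unmatched)
print(missing)
```

On the real data, an empty `unmatched` list would say the same thing as a count of 103 in both columns.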


# END OF PART 2
***
# BEGINNING OF PART 3

For this last section, we will explore and cluster the neighbourhoods in Toronto and plot them on a map.
Let's first install some necessary libraries.

In [18]:
#Install geopy and import Nominatim (to convert address into latitude and longitude)
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Install folium to plot the map
!conda install -c conda-forge folium=0.5.0 --yes 
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



We should first find the latitude and longitude of Toronto to plot the map

In [19]:
toronto_address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(toronto_address)
toronto_latitude = location.latitude
toronto_longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(toronto_latitude, toronto_longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


We are now ready to plot our neighbourhood data on a map of Toronto

In [20]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df_toronto_postcode_coord['Latitude'], df_toronto_postcode_coord['Longitude'], df_toronto_postcode_coord['Borough'], df_toronto_postcode_coord['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Exploring the first neighbourhood

To have an idea of how to explore the neighbourhoods in Toronto, let's explore the first entry in our data using the Foursquare API

In [21]:
#Let's retrieve our foursquare credentials
#The Foursquare ID
CLIENT_ID = ""
#The Foursquare Secret
CLIENT_SECRET = ""
#The Foursquare API version
VERSION = "20191117" 
with open("credentials","r") as credentials_file:
    CLIENT_ID = credentials_file.readline().replace("\n","")
    CLIENT_SECRET = credentials_file.readline().replace("\n","")
print("Done getting the credentials!")

Done getting the credentials!
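
The credentials file is assumed to hold the ID on its first line and the secret on its second. A slightly more defensive reader could look like the sketch below; `read_credentials` is a hypothetical helper, and the demonstration writes placeholder values to a temporary file rather than touching the real `credentials` file.

```python
import tempfile
from pathlib import Path


def read_credentials(path="credentials"):
    """Return (client_id, client_secret) from a two-line text file."""
    # Strip whitespace and ignore blank lines
    values = [
        line.strip()
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]
    if len(values) < 2:
        raise ValueError(
            "Expected two non-empty lines (ID, then secret) in {}".format(path)
        )
    return values[0], values[1]


# Demonstration with a throwaway file holding placeholder values
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "credentials"
    demo_path.write_text("my-client-id\nmy-client-secret\n")
    demo_id, demo_secret = read_credentials(demo_path)

print("Loaded an ID of length", len(demo_id))
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the API.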


In [22]:
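import json

#Transform JSON data into a pandas dataframe
#pandas 1.0 and later expose json_normalize at the top level; older versions keep it in pandas.io.json
try:
    from pandas import json_normalize
except ImportError:
    from pandas.io.json import json_normalize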

In [23]:
#Get the data of the 1st neighbourhood in our data
neighborhood_latitude = df_toronto_postcode_coord.loc[0,"Latitude"]
neighborhood_longitude = df_toronto_postcode_coord.loc[0,"Longitude"]

#Define a radius for exploration
RADIUS = 1000

#Set the limit of the first 100 venues to be retrieved
LIMIT = 100

#Build the url with the parameters defined previously and perform the request
url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}".format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, RADIUS, LIMIT)
results = requests.get(url).json()
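
Instead of formatting the whole query string by hand, the parameters can be kept in a dictionary. The sketch below only builds and prints the URL with the standard library, so it runs without credentials; `build_explore_params` and `EXPLORE_ENDPOINT` are hypothetical names, and the ID and secret are placeholders.

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(client_id, client_secret, latitude, longitude,
                         version, radius, limit):
    """Collect the query parameters of the explore request in a dictionary."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(latitude, longitude),
        "v": version,
        "radius": radius,
        "limit": limit,
    }


# Placeholder credentials; coordinates of the first neighbourhood in the data
params = build_explore_params(
    "YOUR_ID", "YOUR_SECRET", 43.753259, -79.329656, "20191117", 1000, 100
)
query_string = urlencode(params)
print(EXPLORE_ENDPOINT + "?" + query_string)

# With requests, the same dictionary can be passed directly, for example:
# results = requests.get(EXPLORE_ENDPOINT, params=params, timeout=10).json()
```

Letting the library encode the parameters avoids mistakes with special characters, and a `timeout` keeps the notebook from hanging if the service does not answer.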

In [24]:
#From the json in the response, let's separate the interesting data of the venues
venues = results['response']['groups'][0]['items']
    
#Flatten the json into a dataframe
nearby_venues = json_normalize(venues) 
nearby_venues.head()

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.location.cc,venue.location.neighborhood,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet
0,e-0-4b8991cbf964a520814232e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4b8991cbf964a520814232e3,Allwyn's Bakery,81 Underhill drive,43.75984,-79.324719,"[{'label': 'display', 'lat': 43.75984035203157...",833,...,CA,Parkwoods - Donalda,Toronto,ON,Canada,"[81 Underhill drive, Toronto ON M3A 1Z5, Canada]","[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",0,[],
1,e-0-4e8d9dcdd5fbbbb6b3003c7b-1,0,"[{'summary': 'This spot is popular', 'type': '...",4e8d9dcdd5fbbbb6b3003c7b,Brookbanks Park,Toronto,43.751976,-79.33214,"[{'label': 'display', 'lat': 43.75197604605557...",245,...,CA,,Toronto,ON,Canada,"[Toronto, Toronto ON, Canada]","[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",0,[],
2,e-0-57e286f2498e43d84d92d34a-2,0,"[{'summary': 'This spot is popular', 'type': '...",57e286f2498e43d84d92d34a,Tim Hortons,215 Brookbanks,43.760668,-79.326368,"[{'label': 'display', 'lat': 43.76066827030228...",866,...,CA,,Toronto,ON,Canada,"[215 Brookbanks (York Miils Rd), Toronto ON M3...","[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",0,[],York Miils Rd
3,e-0-58a8dcaa6119f47b9a94dc05-3,0,"[{'summary': 'This spot is popular', 'type': '...",58a8dcaa6119f47b9a94dc05,A&W Canada,1277 York Mills Road,43.760643,-79.326865,"[{'label': 'display', 'lat': 43.76064307616131...",852,...,CA,,Toronto,ON,Canada,"[1277 York Mills Road, Toronto ON M3A 1Z5, Can...","[{'id': '4bf58dd8d48988d16e941735', 'name': 'F...",0,[],
4,e-0-4bafa285f964a5203a123ce3-4,0,"[{'summary': 'This spot is popular', 'type': '...",4bafa285f964a5203a123ce3,Bruno's valu-mart,83 Underhill,43.746143,-79.32463,"[{'label': 'display', 'lat': 43.746143, 'lng':...",889,...,CA,,Don Mills,ON,Canada,"[83 Underhill (at Donwood Plaza), Don Mills ON...","[{'id': '4bf58dd8d48988d118951735', 'name': 'G...",0,[],at Donwood Plaza
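
The dotted column names above (`venue.name`, `venue.location.lat`, ...) come from the way `json_normalize` flattens nested dictionaries. A minimal sketch, assuming pandas 1.0 or later (where the function is available as `pd.json_normalize`) and a hand-written item that only imitates the shape of the real response:

```python
import pandas as pd

# One hand-written item shaped like an entry of results['response']['groups'][0]['items']
items = [
    {
        "venue": {
            "name": "Brookbanks Park",
            "categories": [{"name": "Park"}],
            "location": {"lat": 43.751976, "lng": -79.33214},
        }
    }
]

# Nested dictionaries become columns joined with "."; lists are kept as they are
flat = pd.json_normalize(items)

print(sorted(flat.columns))
print(flat.loc[0, "venue.categories"])
```

Because the list of categories is left untouched, a separate step is still needed to pull the category name out of it.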


In [25]:
#Define a function that returns the category list of a given row
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        #The flattened dataframe uses the dotted column name instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [26]:
#We will use only some columns from the "json dataframe"
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

#Then we retrieve the list of the venues categories for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

In [27]:
#We finally rename the columns
nearby_venues.columns = ["Venue", "Venue categories", "Venue latitude", "Venue longitude"]

nearby_venues.head()

Unnamed: 0,Venue,Venue categories,Venue latitude,Venue longitude
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719
1,Brookbanks Park,Park,43.751976,-79.33214
2,Tim Hortons,Café,43.760668,-79.326368
3,A&W Canada,Fast Food Restaurant,43.760643,-79.326865
4,Bruno's valu-mart,Grocery Store,43.746143,-79.32463


In [28]:
nearby_venues.shape[0]

29

In [29]:
#Let's not forget that we are interested in clustering neighbourhoods
#So, let's retake the neighbourhood data and add it to the new dataframe
nearby_venues["Neighbourhood name"] = df_toronto_postcode_coord.loc[0,"Neighbourhood"]
nearby_venues["Neighbourhood Latitude"] = df_toronto_postcode_coord.loc[0,"Latitude"]
nearby_venues["Neighbourhood Longitude"] = df_toronto_postcode_coord.loc[0,"Longitude"]
nearby_venues.head()

Unnamed: 0,Venue,Venue categories,Venue latitude,Venue longitude,Neighbourhood name,Neighbourhood Latitude,Neighbourhood Longitude
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719,Parkwoods,43.753259,-79.329656
1,Brookbanks Park,Park,43.751976,-79.33214,Parkwoods,43.753259,-79.329656
2,Tim Hortons,Café,43.760668,-79.326368,Parkwoods,43.753259,-79.329656
3,A&W Canada,Fast Food Restaurant,43.760643,-79.326865,Parkwoods,43.753259,-79.329656
4,Bruno's valu-mart,Grocery Store,43.746143,-79.32463,Parkwoods,43.753259,-79.329656


## Explore all neighbourhoods in Toronto

Now that we have seen what to do with one neighbourhood, we can define the process for all neighbourhoods.
We start by defining a function to repeat the process above but now for all neighbourhoods in Toronto

In [30]:
#From the series of neighbourhoods, latitudes and longitudes,
#returns a dataframe with neighbourhood and venues categories data
def get_near_venues(neighbourhoods, latitudes, longitudes,radius=500):
    
    df_near_venues = pd.DataFrame()
    
    for neighbourhood, latitude, longitude in zip(neighbourhoods,latitudes,longitudes):
        #print(neighbourhood)
        
        #construct the url and perform the request
        url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}".format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
        results = requests.get(url).json()
        
        #from the json response get the interesting data for venues
        venues = results['response']['groups'][0]['items']
    
        #flatten JSON
        nearby_venues = json_normalize(venues)
        #print(nearby_venues.shape)
        #filter columns
        if nearby_venues.shape[0] > 0:
            filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
            nearby_venues = nearby_venues.loc[:, filtered_columns]

            #filter the category for each row
            nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
        
            #clean columns
            nearby_venues.columns = ["Venue", "Venue categories", "Venue latitude", "Venue longitude"]
    
            nearby_venues["Neighbourhood name"] = neighbourhood
            nearby_venues["Neighbourhood Latitude"] = latitude
            nearby_venues["Neighbourhood Longitude"] = longitude
        
            #append the data to the other venues already found
            df_near_venues = df_near_venues.append(nearby_venues,ignore_index=True)
        
    return df_near_venues
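
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the function above needs an older pandas to run as written. A common replacement is to collect the per-neighbourhood dataframes in a list and concatenate them once at the end. The sketch below shows only that pattern on toy frames; `combine_frames` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate a list of per-neighbourhood dataframes, skipping empty ones."""
    non_empty = [frame for frame in frames if not frame.empty]
    if not non_empty:
        return pd.DataFrame()
    return pd.concat(non_empty, ignore_index=True)


# Toy frames standing in for the result of each request
frames = [
    pd.DataFrame({"Venue": ["Brookbanks Park"], "Neighbourhood name": ["Parkwoods"]}),
    pd.DataFrame(),  # a neighbourhood with no venues
    pd.DataFrame({"Venue": ["Tim Hortons"], "Neighbourhood name": ["Parkwoods"]}),
]

combined = combine_frames(frames)
print(combined)
```

Inside `get_near_venues`, this would mean appending `nearby_venues` to a plain Python list in the loop and calling `pd.concat` once before returning, which also avoids copying the growing dataframe on every iteration.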

In [31]:
df_toronto_venues = get_near_venues(df_toronto_postcode_coord["Neighbourhood"],df_toronto_postcode_coord["Latitude"],df_toronto_postcode_coord["Longitude"],radius=1000)

In [32]:
df_toronto_venues.head()

Unnamed: 0,Venue,Venue categories,Venue latitude,Venue longitude,Neighbourhood name,Neighbourhood Latitude,Neighbourhood Longitude
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719,Parkwoods,43.753259,-79.329656
1,Brookbanks Park,Park,43.751976,-79.33214,Parkwoods,43.753259,-79.329656
2,Tim Hortons,Café,43.760668,-79.326368,Parkwoods,43.753259,-79.329656
3,A&W Canada,Fast Food Restaurant,43.760643,-79.326865,Parkwoods,43.753259,-79.329656
4,Bruno's valu-mart,Grocery Store,43.746143,-79.32463,Parkwoods,43.753259,-79.329656


In [33]:
#Check the data grouped by each neighbourhood
pd.set_option('display.max_rows', None)
df_toronto_venues.groupby('Neighbourhood name').count()

Unnamed: 0_level_0,Venue,Venue categories,Venue latitude,Venue longitude,Neighbourhood Latitude,Neighbourhood Longitude
Neighbourhood name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Agincourt,45,45,45,45,45,45
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",29,29,29,29,29,29
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",13,13,13,13,13,13
"Alderwood,Long Branch",28,28,28,28,28,28
"Bathurst Manor,Downsview North,Wilson Heights",31,31,31,31,31,31
Bayview Village,13,13,13,13,13,13
"Bedford Park,Lawrence Manor East",42,42,42,42,42,42
Berczy Park,100,100,100,100,100,100
"Birch Cliff,Cliffside West",13,13,13,13,13,13


In [34]:
#Check number of venues and unique categories using the describe method
df_toronto_venues.describe(include="all")

Unnamed: 0,Venue,Venue categories,Venue latitude,Venue longitude,Neighbourhood name,Neighbourhood Latitude,Neighbourhood Longitude
count,4887,4887,4887.0,4887.0,4887,4887.0,4887.0
unique,2910,331,,,101,,
top,Starbucks,Coffee Shop,,,Queen's Park,,
freq,110,379,,,112,,
mean,,,43.683909,-79.393098,,43.684202,-79.392763
std,,,0.044558,0.068797,,0.044695,0.068907
min,,,43.593866,-79.62696,,43.602414,-79.615819
25%,,,43.650656,-79.41949,,43.651494,-79.41975
50%,,,43.66607,-79.386541,,43.668999,-79.384568
75%,,,43.70727,-79.361053,,43.70906,-79.360636


## Analysis of the neighbourhoods and their venues' categories

Now that we have data for all neighbourhoods, let's prepare for clustering.
To do so, we need to perform one-hot encoding on the category values.

In [35]:
#Perform one-hot encoding of the categories
df_toronto_onehot = pd.get_dummies(df_toronto_venues[["Venue categories"]],prefix="", prefix_sep="")

#Add the neighbourhood name in the dataframe
df_toronto_onehot["Neighbourhood"] = df_toronto_venues["Neighbourhood name"]

#Put the neighbourhood column in the first position
new_column_order = [df_toronto_onehot.columns[-1]] + list(df_toronto_onehot.columns[:-1])
df_toronto_onehot = df_toronto_onehot[new_column_order]

pd.set_option('display.max_columns', None)

df_toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
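
With more than 300 columns the table above is hard to read, so here is the same idea on a toy dataframe with two neighbourhoods and two categories (all values are made up for illustration). Each venue becomes a row of 0/1 indicators, and averaging those indicators per neighbourhood gives the share of venues in each category. `dtype=int` is passed so the indicators are integers regardless of the pandas version.

```python
import pandas as pd

# Toy venue table with the same two columns used by the notebook
toy_venues = pd.DataFrame({
    "Neighbourhood name": ["A", "A", "B"],
    "Venue categories": ["Park", "Cafe", "Park"],
})

# One indicator column per category, without any prefix in the column names
toy_onehot = pd.get_dummies(
    toy_venues[["Venue categories"]], prefix="", prefix_sep="", dtype=int
)

# Put the neighbourhood name in the first column
toy_onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood name"])

# Mean of the indicators = frequency of each category within a neighbourhood
toy_grouped = toy_onehot.groupby("Neighbourhood").mean().reset_index()

print(toy_onehot)
print(toy_grouped)
```

Neighbourhood A has one park and one cafe, so both frequencies are 0.5; neighbourhood B has only a park, so its park frequency is 1.0.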


Let's group the rows by neighbourhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighbourhood

In [36]:
df_toronto_grouped = df_toronto_onehot.groupby(["Neighbourhood"]).mean().reset_index()
df_toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.044444,0.0,0.0,0.0,0.0,0.177778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.172414,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
df_toronto_grouped.shape

(101, 332)

For the analysis, we need to identify the top n most common venue categories in each neighbourhood, so we define a function for that

In [38]:
#Given a row (one neighbourhood) and a number n, return the n most frequent categories for that row
def get_top_venue_categories(row,top):
    row_categories = row.iloc[1:]
    row_categories = row_categories.sort_values(ascending=False)
    return row_categories.index.values[0:top]
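As a quick check of the function above, here is a minimal sketch on a made-up row. The neighbourhood name `Toyville` and the frequencies are hypothetical and do not come from the Foursquare data; the function body is repeated only so the snippet runs on its own.

```python
import pandas as pd

def get_top_venue_categories(row, top):
    # Skip the first entry (the neighbourhood name) and rank the rest
    row_categories = row.iloc[1:]
    row_categories = row_categories.sort_values(ascending=False)
    return row_categories.index.values[0:top]

# Hypothetical row: one neighbourhood with four category frequencies
toy_row = pd.Series(
    {"Neighbourhood": "Toyville", "Bakery": 0.1, "Café": 0.5, "Park": 0.3, "Zoo": 0.1}
)

top_two = get_top_venue_categories(toy_row, 2)
print(list(top_two))
```

Note that when two categories have the same frequency, their relative order depends on the sort, so ties near the cut-off may be resolved arbitrarily.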

In [39]:
#Let's look first at the top 3 categories in each neighbourhood
top = 3

#Auxiliary suffixes to prettify the column names
suffix = ["st","nd","rd"]

#Set up the column names
columns = ['Neighbourhood']
for i in range(top):
    try:
        columns.append('{}{} Most common venue category'.format(i+1, suffix[i]))
    except IndexError:
        #No dedicated suffix beyond the third position, so fall back to "th"
        columns.append('{}th Most common venue category'.format(i+1))

#Create the new dataframe that will contain the top categories for each neighbourhood
neighbourhood_top_venue_categories = pd.DataFrame(columns=columns)

#Populate this new dataframe using the neighbourhood data and the previously defined function
neighbourhood_top_venue_categories["Neighbourhood"] = df_toronto_grouped["Neighbourhood"]
for i in np.arange(df_toronto_grouped.shape[0]):
    neighbourhood_top_venue_categories.iloc[i,1:] = get_top_venue_categories(df_toronto_grouped.iloc[i,:],top)

In [40]:
neighbourhood_top_venue_categories.head()

Unnamed: 0,Neighbourhood,1st Most common venue category,2nd Most common venue category,3rd Most common venue category
0,"Adelaide,King,Richmond",Café,Coffee Shop,Hotel
1,Agincourt,Chinese Restaurant,Shopping Mall,Supermarket
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Pizza Place,Bakery
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Pizza Place,Grocery Store,Park
4,"Alderwood,Long Branch",Discount Store,Pharmacy,Park


## Clustering the neighbourhoods based on the venue categories

We then cluster the neighbourhoods using the k-means algorithm

In [41]:
#Import k-means from scikit-learn for the clustering stage
from sklearn.cluster import KMeans

In [42]:
#Let's use 7 clusters
k = 7
toronto_clustering = df_toronto_grouped.drop(["Neighbourhood"],axis=1)
kmeans = KMeans(init="k-means++", n_clusters=k, n_init=12)
kmeans.fit(toronto_clustering)
kmeans.labels_[0:10]

array([5, 6, 6, 6, 0, 6, 1, 5, 5, 5], dtype=int32)
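The notebook fixes the number of clusters at 7 without showing how that value was chosen. One common sanity check is to compare silhouette scores across several candidate values. The sketch below does this on synthetic data, not on the Toronto venue frequencies. The names `toy_features`, `scores` and `best_k` are hypothetical, and `random_state` is an addition (it is not set in the cell above) so that repeated runs give the same labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix: three well-separated blobs in 2D
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
toy_features = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centres])

# Fit k-means for several candidate cluster counts and record the silhouette score
scores = {}
for n_clusters in range(2, 7):
    model = KMeans(init="k-means++", n_clusters=n_clusters, n_init=12, random_state=0)
    labels = model.fit_predict(toy_features)
    scores[n_clusters] = silhouette_score(toy_features, labels)

# The candidate with the highest score is one reasonable choice for k
best_k = max(scores, key=scores.get)
print(best_k)
```

On real data the scores are usually much closer together than in this toy case, so the result is a guide rather than a definitive answer.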

In [43]:
#Add the labels into our previous dataframe
neighbourhood_top_venue_categories.insert(0,"Cluster labels",kmeans.labels_)
neighbourhood_top_venue_categories.head(10)

Unnamed: 0,Cluster labels,Neighbourhood,1st Most common venue category,2nd Most common venue category,3rd Most common venue category
0,5,"Adelaide,King,Richmond",Café,Coffee Shop,Hotel
1,6,Agincourt,Chinese Restaurant,Shopping Mall,Supermarket
2,6,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Pizza Place,Bakery
3,6,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Pizza Place,Grocery Store,Park
4,0,"Alderwood,Long Branch",Discount Store,Pharmacy,Park
5,6,"Bathurst Manor,Downsview North,Wilson Heights",Coffee Shop,Pizza Place,Mediterranean Restaurant
6,1,Bayview Village,Bank,Japanese Restaurant,Intersection
7,5,"Bedford Park,Lawrence Manor East",Italian Restaurant,Coffee Shop,Thai Restaurant
8,5,Berczy Park,Coffee Shop,Hotel,Café
9,5,"Birch Cliff,Cliffside West",Thai Restaurant,Restaurant,Photography Studio


In order to plot the clusters on the map, we need to merge them with our original neighbourhood data

In [44]:
#Let's merge the labeled cluster data with our original neighbourhood data
toronto_merged_data = df_toronto_postcode_coord.join(neighbourhood_top_venue_categories.set_index("Neighbourhood"),on="Neighbourhood")
toronto_merged_data.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most common venue category,2nd Most common venue category,3rd Most common venue category
0,M3A,North York,Parkwoods,43.753259,-79.329656,6.0,Park,Pharmacy,Convenience Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,5.0,Coffee Shop,Sporting Goods Shop,Intersection
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,5.0,Coffee Shop,Café,Pub
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,6.0,Fast Food Restaurant,Coffee Shop,Clothing Store
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,5.0,Coffee Shop,Park,Gastropub
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242,5.0,Coffee Shop,Park,Gastropub
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,6.0,Fast Food Restaurant,Coffee Shop,Gym
7,M3B,North York,Don Mills North,43.745906,-79.352188,5.0,Japanese Restaurant,Pizza Place,Coffee Shop
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937,6.0,Coffee Shop,Pizza Place,Brewery
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,5.0,Coffee Shop,Clothing Store,Fast Food Restaurant


In [45]:
#Neighbourhoods without a cluster label get the value k, one past the real labels (0 to k-1)
#Labels can be missing because the explore API call might not return venues for every location
toronto_merged_data.fillna(value={"Cluster labels":k},inplace=True)
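To see why the labels show up as `6.0` and `5.0` in the merged table, here is a minimal sketch on toy data. A left join leaves `NaN` for rows without a match, which forces the column to float. All dataframe contents and the names `postcodes`, `labelled` and `merged` are hypothetical, and the final `astype(int)` is an optional extra step that the notebook does not perform.

```python
import pandas as pd

k = 2  # hypothetical number of clusters for this toy example

postcodes = pd.DataFrame(
    {"Postcode": ["A1", "B2", "C3"], "Neighbourhood": ["North", "South", "East"]}
)
labelled = pd.DataFrame({"Neighbourhood": ["North", "South"], "Cluster labels": [0, 1]})

# Left join on the neighbourhood name; "East" has no match, so its label is NaN
merged = postcodes.join(labelled.set_index("Neighbourhood"), on="Neighbourhood")
n_missing = int(merged["Cluster labels"].isna().sum())

# Use k as the "no cluster" marker, then cast the labels back to integers
merged = merged.fillna(value={"Cluster labels": k})
merged["Cluster labels"] = merged["Cluster labels"].astype(int)
print(merged["Cluster labels"].tolist())
```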

In [46]:
toronto_merged_data.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most common venue category,2nd Most common venue category,3rd Most common venue category
0,M3A,North York,Parkwoods,43.753259,-79.329656,6.0,Park,Pharmacy,Convenience Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,5.0,Coffee Shop,Sporting Goods Shop,Intersection
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,5.0,Coffee Shop,Café,Pub
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,6.0,Fast Food Restaurant,Coffee Shop,Clothing Store
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,5.0,Coffee Shop,Park,Gastropub
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242,5.0,Coffee Shop,Park,Gastropub
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,6.0,Fast Food Restaurant,Coffee Shop,Gym
7,M3B,North York,Don Mills North,43.745906,-79.352188,5.0,Japanese Restaurant,Pizza Place,Coffee Shop
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937,6.0,Coffee Shop,Pizza Place,Brewery
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,5.0,Coffee Shop,Clothing Store,Fast Food Restaurant


We are now ready to plot the neighbourhood data with the cluster labels on a map of Toronto

In [47]:
#Create map
map_toronto_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

#Set the color scheme: one color per cluster label, plus one for the "no cluster" value k
colors_array = cm.rainbow(np.linspace(0, 1, k+1))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Add markers to the map
for lat, lon, neighbourhood, cluster in zip(toronto_merged_data['Latitude'], toronto_merged_data['Longitude'], toronto_merged_data['Neighbourhood'], toronto_merged_data['Cluster labels']):
    cluster = int(cluster)
    label = folium.Popup(str(neighbourhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_toronto_clusters)
       
map_toronto_clusters
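The marker colors above are looked up with `rainbow[cluster-1]`, so label 0 wraps around to the last entry of the list. The sketch below, which assumes `matplotlib` is installed and uses the hypothetical name `index_for_label`, checks that this lookup still gives every label (including the "no cluster" value `k`) its own distinct color.

```python
import numpy as np
from matplotlib import cm, colors

k = 7  # same number of clusters as in the notebook

# One hex color per possible label value 0..k
colors_array = cm.rainbow(np.linspace(0, 1, k + 1))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Python's negative indexing makes rainbow[-1] the last entry,
# so the lookup cluster-1 is equivalent to (cluster - 1) modulo (k + 1)
index_for_label = {label: (label - 1) % (k + 1) for label in range(k + 1)}
print(index_for_label)
```

Because the mapping is a permutation of the indices, no two labels share a color, although the color order does not follow the label order.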

# END PART 3