This Notebook will be used to scrape the Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

First import the necessary libraries. I will use Pandas and wikipdia libraries to get the Postal Codes data.

In [1]:
import pandas as pd
import wikipedia as wp

Now we will get the HTML page and load the into the dataframe

In [2]:
#Get the html source and load to dataframe
html = wp.page("List_of_postal_codes_of_Canada:_M").html()
wikiDF = pd.read_html(html)[0] #This will load the table to dataframe

wikiDF = wikiDF.iloc[1:] # get all rows except the first one since it has table headings

#Define Column Names and assign the to dataframe
columnNames = ['Postal Code','Borough','Neighborhood']
wikiDF.columns = columnNames
wikiDF.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


Start cleaning the DataFrame

In [3]:
#Drop rows where Borough = "Not Assigned"
wikiDF = wikiDF[wikiDF["Borough"] != "Not assigned"]
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned
11,M9A,Etobicoke,Islington Avenue
12,M1B,Scarborough,Rouge
13,M1B,Scarborough,Malvern


Let's first Create a function that will help us to make "Neighborhood" = "Borough", if "Neighborhood" = "Not assigned"

In [4]:
#Function to get neighborhood name
def get_Borough_Name(data):
    if data['Neighborhood'] == "Not assigned":
        neighborhood_Name = data['Borough']
    else:
        neighborhood_Name = data['Neighborhood']
        
    return neighborhood_Name

Now let's us update "Not assigned" Neighborhoods.

In [5]:
# Change Neighborhood = Borough if Neighborhood = "Not assigned" by invoking get_Borough_Name function
wikiDF['Neighborhood']=wikiDF[['Borough', 'Neighborhood']].apply(get_Borough_Name, axis=1)
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Queen's Park
11,M9A,Etobicoke,Islington Avenue
12,M1B,Scarborough,Rouge
13,M1B,Scarborough,Malvern


More than one neighborhood can exist in one postal code area. These two rows will be combined into one row with the neighborhoods separated with a comma.

In [6]:
#Combine Neighborhood if same PostCode
wikiDF = wikiDF.groupby(['Postal Code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


Get the number of rows in the datafram

In [7]:
wikiDF.shape

(103, 3)

Now we need to get Latidude and Longitude for the postal codes
Plan was to use "geocoder". However, it was not retrieving coordinates. Hence, used Geospatial_Coordinats.csv file

In [8]:
#Read Geospatial Coordinates file into a dataframe
latlngDF = pd.read_csv("http://cocl.us/Geospatial_data/Geospatial_Coordinates.csv")

In [9]:
latlngDF.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now let's join the two DataFrame (wikiDF and latlngDF) on "Postal Code"

In [10]:
#Join the wikiDF and latlngDF on Postal Code
mergedDF = wikiDF.join(latlngDF.set_index('Postal Code'), on='Postal Code')
mergedDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


Import folium for plotting maps and geopy to get the coordinates of Toronto

In [11]:
import folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

Use geopy library to get the latitude and longitude values of Toronto City. Define an instance of the geocoder with user_agent as toronto_explorer.

In [12]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


Create a map of Toronto with Postal Code, Borough and Neighborhood superimposed on top

In [13]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postalcode, borough, neighborhood in zip(mergedDF['Latitude'], mergedDF['Longitude'], mergedDF['Postal Code'],mergedDF['Borough'], mergedDF['Neighborhood']):
    label = '{}, {}, {}'.format(postalcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Import requests library to handle requsts

In [14]:
import requests

Now we are going to start utilizing the Foursquare API to explore the Postal Codes and Boroughs and 
segment them.

In [15]:
#Define Foursquare Credentials and Version
CLIENT_ID = '24ALNEKCXXGKZP1LR5I4RVXC5PLEBUSVI2H3N5NXHGSDKMFC' # your Foursquare ID
CLIENT_SECRET = '3YYXMPCUOHNXKQGM13QRWBHICR1Z1FEOG41RKXRGNUDU0N5M' # your Foursquare Secret
VERSION = '20180605'
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 24ALNEKCXXGKZP1LR5I4RVXC5PLEBUSVI2H3N5NXHGSDKMFC
CLIENT_SECRET:3YYXMPCUOHNXKQGM13QRWBHICR1Z1FEOG41RKXRGNUDU0N5M


Now, let's get the top 100 venues that are within a radius of 500 meters

In [16]:
radius = 500
limit = 100

Now, Explore Neighborhoods in Toronto

Let's create a function to repeat the same process to all the Postal Codes/Boroughs in Toroto

In [17]:
def getNearbyVenues(codes, names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for code, name, lat, lng in zip(codes, names, latitudes, longitudes):
        print(code, name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}\
            &v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            code,
            name, 
            lat, 
            lng, 
#            v['venue']['code'], 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code',
                   'Borough', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each Postal Code and Borough and create a new dataframe

In [18]:
toronto_venues = getNearbyVenues(codes=mergedDF['Postal Code'], 
                                 names=mergedDF['Borough'],
                                 latitudes=mergedDF['Latitude'],
                                 longitudes=mergedDF['Longitude']
                                  )

M1B Scarborough
M1C Scarborough
M1E Scarborough
M1G Scarborough
M1H Scarborough
M1J Scarborough
M1K Scarborough
M1L Scarborough
M1M Scarborough
M1N Scarborough
M1P Scarborough
M1R Scarborough
M1S Scarborough
M1T Scarborough
M1V Scarborough
M1W Scarborough
M1X Scarborough
M2H North York
M2J North York
M2K North York
M2L North York
M2M North York
M2N North York
M2P North York
M2R North York
M3A North York
M3B North York
M3C North York
M3H North York
M3J North York
M3K North York
M3L North York
M3M North York
M3N North York
M4A North York
M4B East York
M4C East York
M4E East Toronto
M4G East York
M4H East York
M4J East York
M4K East Toronto
M4L East Toronto
M4M East Toronto
M4N Central Toronto
M4P Central Toronto
M4R Central Toronto
M4S Central Toronto
M4T Central Toronto
M4V Central Toronto
M4W Downtown Toronto
M4X Downtown Toronto
M4Y Downtown Toronto
M5A Downtown Toronto
M5B Downtown Toronto
M5C Downtown Toronto
M5E Downtown Toronto
M5G Downtown Toronto
M5H Downtown Toronto
M5J Downtow

In [19]:
print(toronto_venues.shape)
print(toronto_venues.head())

(2259, 8)
  Postal Code      Borough   Latitude  Longitude  \
0         M1B  Scarborough  43.806686 -79.194353   
1         M1C  Scarborough  43.784535 -79.160497   
2         M1C  Scarborough  43.784535 -79.160497   
3         M1E  Scarborough  43.763573 -79.188711   
4         M1E  Scarborough  43.763573 -79.188711   

                             Venue  Venue Latitude  Venue Longitude  \
0                          Wendy's       43.807448       -79.199056   
1            Royal Canadian Legion       43.782533       -79.163085   
2        Affordable Toronto Movers       43.787919       -79.162977   
3  Swiss Chalet Rotisserie & Grill       43.767697       -79.189914   
4                G & G Electronics       43.765309       -79.191537   

         Venue Category  
0  Fast Food Restaurant  
1                   Bar  
2         Moving Target  
3           Pizza Place  
4     Electronics Store  


Let's check how many venues were returned for each Postal Code and Borough

In [20]:

toronto_venues.groupby(['Postal Code', 'Borough']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postal Code,Borough,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
M1B,Scarborough,1,1,1,1,1,1
M1C,Scarborough,2,2,2,2,2,2
M1E,Scarborough,6,6,6,6,6,6
M1G,Scarborough,4,4,4,4,4,4
M1H,Scarborough,7,7,7,7,7,7
M1J,Scarborough,2,2,2,2,2,2
M1K,Scarborough,7,7,7,7,7,7
M1L,Scarborough,10,10,10,10,10,10
M1M,Scarborough,3,3,3,3,3,3
M1N,Scarborough,4,4,4,4,4,4


In [22]:
#Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 282 uniques categories.


Analyze Each Postal Code/Borough

In [23]:
# one hot encoding - get dummies
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

In [24]:
toronto_onehot.head()

Unnamed: 0,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
# add Postal Code and Borough to the dataframe.
borough = toronto_venues['Borough']
postcode = toronto_venues['Postal Code']
toronto_onehot.insert(0, 'Borough', borough)
toronto_onehot.insert(0, 'Postal Code', postcode)

In [26]:
toronto_onehot.head()

Unnamed: 0,Postal Code,Borough,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,Scarborough,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1C,Scarborough,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1C,Scarborough,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1E,Scarborough,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,Scarborough,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
toronto_grouped = toronto_onehot.groupby(['Postal Code','Borough']).mean().reset_index()
print(toronto_grouped.head())
print(toronto_grouped.shape)

  Postal Code      Borough  Accessories Store  Adult Boutique  \
0         M1B  Scarborough                0.0             0.0   
1         M1C  Scarborough                0.0             0.0   
2         M1E  Scarborough                0.0             0.0   
3         M1G  Scarborough                0.0             0.0   
4         M1H  Scarborough                0.0             0.0   

   Afghan Restaurant  Airport  Airport Food Court  Airport Gate  \
0                0.0      0.0                 0.0           0.0   
1                0.0      0.0                 0.0           0.0   
2                0.0      0.0                 0.0           0.0   
3                0.0      0.0                 0.0           0.0   
4                0.0      0.0                 0.0           0.0   

   Airport Lounge  Airport Service     ...       \
0             0.0              0.0     ...        
1             0.0              0.0     ...        
2             0.0              0.0     ...        
3 

Print each Postal Code/Borough along with the top 5 most common venues

In [28]:
num_top_venues = 5

for postcode, rough in zip(toronto_grouped['Postal Code'],toronto_grouped['Borough']):
#for hood,rough in Toronto_grouped['Postal Code','Borough']:
    print("----"+postcode + ", " + rough + "----")
    temp = toronto_grouped[toronto_grouped['Postal Code'] == postcode].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[2:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B, Scarborough----
                  venue  freq
0  Fast Food Restaurant   1.0
1     Accessories Store   0.0
2     Mobile Phone Shop   0.0
3         Moving Target   0.0
4         Movie Theater   0.0


----M1C, Scarborough----
               venue  freq
0      Moving Target   0.5
1                Bar   0.5
2  Accessories Store   0.0
3     Massage Studio   0.0
4             Museum   0.0


----M1E, Scarborough----
                 venue  freq
0       Medical Center  0.17
1   Mexican Restaurant  0.17
2  Rental Car Location  0.17
3    Electronics Store  0.17
4       Breakfast Spot  0.17


----M1G, Scarborough----
                 venue  freq
0          Coffee Shop  0.50
1    Korean Restaurant  0.25
2     Insurance Office  0.25
3    Accessories Store  0.00
4  Monument / Landmark  0.00


----M1H, Scarborough----
                 venue  freq
0     Hakka Restaurant  0.14
1  Fried Chicken Joint  0.14
2      Thai Restaurant  0.14
3                 Bank  0.14
4   Athletics & Sports  0.14


-

           venue  freq
0     Playground  0.25
1   Tennis Court  0.25
2     Restaurant  0.25
3            Gym  0.25
4  Metro Station  0.00


----M4V, Central Toronto----
                 venue  freq
0          Coffee Shop  0.14
1                  Pub  0.14
2  Fried Chicken Joint  0.07
3    Convenience Store  0.07
4           Bagel Shop  0.07


----M4W, Downtown Toronto----
               venue  freq
0               Park  0.50
1         Playground  0.25
2              Trail  0.25
3  Mobile Phone Shop  0.00
4      Moving Target  0.00


----M4X, Downtown Toronto----
         venue  freq
0  Coffee Shop  0.08
1   Restaurant  0.06
2          Pub  0.04
3       Bakery  0.04
4  Pizza Place  0.04


----M4Y, Downtown Toronto----
                 venue  freq
0  Japanese Restaurant  0.07
1          Coffee Shop  0.06
2     Sushi Restaurant  0.06
3              Gay Bar  0.05
4           Restaurant  0.03


----M5A, Downtown Toronto----
         venue  freq
0  Coffee Shop  0.16
1         Park  0.06
2   

               venue  freq
0     Baseball Field   1.0
1  Accessories Store   0.0
2     Massage Studio   0.0
3             Museum   0.0
4      Moving Target   0.0


----M9N, York----
               venue  freq
0               Park   1.0
1  Accessories Store   0.0
2  Mobile Phone Shop   0.0
3      Moving Target   0.0
4      Movie Theater   0.0


----M9P, Etobicoke----
                       venue  freq
0                Pizza Place   0.2
1  Middle Eastern Restaurant   0.2
2         Chinese Restaurant   0.2
3             Sandwich Place   0.2
4                Coffee Shop   0.2


----M9R, Etobicoke----
                        venue  freq
0           Mobile Phone Shop  0.25
1                 Pizza Place  0.25
2                    Bus Line  0.25
3                        Park  0.25
4  Modern European Restaurant  0.00


----M9V, Etobicoke----
                  venue  freq
0           Pizza Place   0.2
1         Grocery Store   0.2
2  Fast Food Restaurant   0.1
3   Fried Chicken Joint   0.1
4    

Let's put top 10 venues for each Postal Code/Borough into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[2:]
    
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [30]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code','Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postalcodes_venues_sorted = pd.DataFrame(columns=columns)
postalcodes_venues_sorted['Postal Code'] = toronto_grouped['Postal Code']
postalcodes_venues_sorted['Borough'] = toronto_grouped['Borough']

for ind in np.arange(toronto_grouped.shape[0]):
    postalcodes_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postalcodes_venues_sorted.head()

Unnamed: 0,Postal Code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Fast Food Restaurant,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
1,M1C,Scarborough,Moving Target,Bar,Fish & Chips Shop,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
2,M1E,Scarborough,Pizza Place,Breakfast Spot,Rental Car Location,Electronics Store,Mexican Restaurant,Medical Center,Event Space,Ethiopian Restaurant,Empanada Restaurant,Discount Store
3,M1G,Scarborough,Coffee Shop,Insurance Office,Korean Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
4,M1H,Scarborough,Athletics & Sports,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Hakka Restaurant,Bakery,Bank,Doner Restaurant,Donut Shop,Drugstore


Let's now cluster Postal Codes/Boroughs

Import KMeans from SKlearn.cluster

In [31]:
from sklearn.cluster import KMeans

We will run k-means to cluster the Postal Codes/Boroughs into 5 clusters.

In [32]:
# set number of clusters
kclusters = 5

#drop Neighborhood column since it is not a numeric field
toronto_grouped_clustering = toronto_grouped.drop(['Postal Code','Borough'], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each Postal Code/Borough.

In [33]:
# add clustering labels
postalcodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
postalcodes_venues_sorted.head()

Unnamed: 0,Cluster Labels,Postal Code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,M1B,Scarborough,Fast Food Restaurant,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
1,0,M1C,Scarborough,Moving Target,Bar,Fish & Chips Shop,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
2,0,M1E,Scarborough,Pizza Place,Breakfast Spot,Rental Car Location,Electronics Store,Mexican Restaurant,Medical Center,Event Space,Ethiopian Restaurant,Empanada Restaurant,Discount Store
3,0,M1G,Scarborough,Coffee Shop,Insurance Office,Korean Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
4,0,M1H,Scarborough,Athletics & Sports,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Hakka Restaurant,Bakery,Bank,Doner Restaurant,Donut Shop,Drugstore


merge mergedDF with postalcodes_venues_sorted to add latitude/longitude for each Postal Code/Borough

In [34]:
toronto_merged = pd.merge(mergedDF, postalcodes_venues_sorted, on=['Postal Code','Borough'], how='inner')
#toronto_merged = mergedDF.join(postalcodes_venues_sorted.set_index(['Postal Code','Borough']), on=['Postal Code','Borough'])
toronto_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,4,Fast Food Restaurant,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0,Moving Target,Bar,Fish & Chips Shop,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0,Pizza Place,Breakfast Spot,Rental Car Location,Electronics Store,Mexican Restaurant,Medical Center,Event Space,Ethiopian Restaurant,Empanada Restaurant,Discount Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0,Coffee Shop,Insurance Office,Korean Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,Athletics & Sports,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Hakka Restaurant,Bakery,Bank,Doner Restaurant,Donut Shop,Drugstore
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,0,Spa,Playground,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029,0,Discount Store,Train Station,Bus Station,Department Store,Chinese Restaurant,Coffee Shop,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577,0,Bus Line,Bakery,Intersection,Bus Station,Soccer Field,Metro Station,Fast Food Restaurant,Park,Creperie,Cuban Restaurant
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476,0,American Restaurant,Motel,Skating Rink,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,0,College Stadium,General Entertainment,Skating Rink,Café,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore


Let's visualize the resulting clusters

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

In [36]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [37]:
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [82]:
# add markers to the map
markers_colors = []
for lat, lon, post, rough, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postal Code'], toronto_merged['Borough'],toronto_merged['Cluster Labels']):
    label = folium.Popup(str(post) + ', ' + rough + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now, let's each cluster and determine the discriminating venue categories that distinguish each cluster.

Cluster 1

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0,1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M1C,Scarborough,0,Moving Target,Bar,Fish & Chips Shop,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio
2,M1E,Scarborough,0,Pizza Place,Breakfast Spot,Rental Car Location,Electronics Store,Mexican Restaurant,Medical Center,Event Space,Ethiopian Restaurant,Empanada Restaurant,Discount Store
3,M1G,Scarborough,0,Coffee Shop,Insurance Office,Korean Restaurant,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
4,M1H,Scarborough,0,Athletics & Sports,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Hakka Restaurant,Bakery,Bank,Doner Restaurant,Donut Shop,Drugstore
5,M1J,Scarborough,0,Spa,Playground,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
6,M1K,Scarborough,0,Discount Store,Train Station,Bus Station,Department Store,Chinese Restaurant,Coffee Shop,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore
7,M1L,Scarborough,0,Bus Line,Bakery,Intersection,Bus Station,Soccer Field,Metro Station,Fast Food Restaurant,Park,Creperie,Cuban Restaurant
8,M1M,Scarborough,0,American Restaurant,Motel,Skating Rink,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
9,M1N,Scarborough,0,College Stadium,General Entertainment,Skating Rink,Café,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
10,M1P,Scarborough,0,Indian Restaurant,Pet Store,Furniture / Home Store,Vietnamese Restaurant,Chinese Restaurant,Latin American Restaurant,Yoga Studio,Dumpling Restaurant,Doner Restaurant,Donut Shop


Cluster 2

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0,1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
88,M8Y,Etobicoke,1,Baseball Field,Fish Market,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Yoga Studio
93,M9M,North York,1,Baseball Field,Fish Market,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Yoga Studio


Cluster 3

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0,1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
60,M5N,Central Toronto,2,Garden,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Discount Store


Cluster 4

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0,1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,M1V,Scarborough,3,Park,Playground,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
20,M2P,North York,3,Park,Bank,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
37,M4J,East York,3,Park,Convenience Store,Coffee Shop,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
47,M4W,Downtown Toronto,3,Park,Playground,Trail,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
94,M9N,York,3,Park,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant


Cluster 5

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0,1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,4,Fast Food Restaurant,Yoga Studio,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
