In [1]:
import pandas as pd

In [2]:
import requests

In [3]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Using the pandas read_html function to extract the relevant data table from the Wikipedia page above

In [4]:
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[0]
print(df)

    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]


In [5]:
df.shape

(287, 3)

Creating a new dataframe df1 which ignores all the Boroughs marked 'Not assigned'

In [6]:
df1=df[df.Borough!='Not assigned']

In [7]:
df1.shape

(210, 3)

The new dataframe has 210 rows, down from 287, which indicates that 77 rows had the Borough field set to 'Not assigned'. This can be verified by counting the rows with a 'Not assigned' Borough:

In [8]:
NA=df[df['Borough']=='Not assigned']
NA.shape

(77, 3)
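
The same cross-check can be done in one step with a boolean mask and its complement. A minimal sketch on a toy frame (the sample rows are made up for illustration and are not the scraped table):

```python
import pandas as pd

# Toy stand-in for the scraped table (values are illustrative only)
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A", "M9Z"],
    "Borough": ["Not assigned", "North York", "North York", "Not assigned"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village", "Not assigned"],
})

unassigned = toy["Borough"] == "Not assigned"   # boolean mask, one entry per row
kept = toy[~unassigned]                         # rows with a real borough
dropped_count = int(unassigned.sum())           # rows that were filtered out

# The two parts must add up to the original row count
assert len(kept) + dropped_count == len(toy)
print(len(toy), len(kept), dropped_count)
```

Summing the mask counts the True entries, so no second filtered dataframe is needed just to count rows.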

In [9]:
df1.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


Replacing the Neighbourhood entries that are 'Not assigned' with the name of the corresponding Borough

In [10]:
import numpy as np

In [11]:
# assign() returns a new dataframe, so nothing is written into a filtered slice of df
df1=df1.assign(Neighbourhood=np.where(df1['Neighbourhood']=='Not assigned',df1['Borough'],df1['Neighbourhood']))


In [12]:
df1.shape

(210, 3)
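
Writing into a dataframe that was produced by filtering another one is what triggers pandas' SettingWithCopyWarning. One common way around it, sketched here on a toy frame (the rows are illustrative only), is to take an explicit copy after filtering and then assign through .loc:

```python
import pandas as pd

# Toy rows; the second one has a borough but no neighbourhood name
toy = pd.DataFrame({
    "Postcode": ["M5A", "M7A", "M9A"],
    "Borough": ["Downtown Toronto", "Queen's Park", "Etobicoke"],
    "Neighbourhood": ["Harbourfront", "Not assigned", "Islington Avenue"],
})

# .copy() makes the filtered frame independent of its parent
filled = toy[toy["Borough"] != "Not assigned"].copy()

missing = filled["Neighbourhood"] == "Not assigned"
# .loc with a row mask and a column label writes in place;
# the right-hand side is aligned on the index
filled.loc[missing, "Neighbourhood"] = filled.loc[missing, "Borough"]
print(filled)
```

The row count is unchanged by this step, since only cell values are replaced.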

Creating a new dataframe df2 that aggregates all the neighbourhoods sharing the same postcode

In [13]:
df2=df1.groupby(['Postcode','Borough'])['Neighbourhood'].unique().to_frame(name='Neighbourhood').reset_index()

In [14]:
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]


Checking the shape of dataframe df2

In [15]:
df2.shape

(103, 3)
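
The aggregation above stores each group's neighbourhoods as an array. If a plain comma-separated string per postcode is preferred (for display or for map popups), a possible variant is to aggregate with str.join. A small sketch on toy rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern", "Woburn"],
})

joined = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
       .agg(", ".join)      # concatenate the names within each group
       .reset_index()       # turn the group keys back into columns
)
print(joined)
```

Unlike unique(), join keeps repeated names, so duplicates would have to be dropped first if the source table contains any.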

Reading the CSV file containing the coordinate data

In [16]:
Co=pd.read_csv('Toronto_SRN_mod.csv')
Co.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [17]:
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]


To merge the two dataframes on the common postcode column, that column must have an identical name in both dataframes

Renaming the column Postal Code in the coordinates dataframe to Postcode, in line with df2

In [18]:
Co.rename(columns={'Postal Code':'Postcode'}, inplace=True)

In [19]:
Co.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the two dataframes

In [20]:
df3=pd.merge(df2,Co,on='Postcode')

In [21]:
df3.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"[Rouge, Malvern]",43.806686,-79.194353
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]",43.784535,-79.160497
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]",43.763573,-79.188711
3,M1G,Scarborough,[Woburn],43.770992,-79.216917
4,M1H,Scarborough,[Cedarbrae],43.773136,-79.239476


In [22]:
df3.shape

(103, 5)
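
The merged frame has the same 103 rows as df2, so every postcode found a coordinate. When that is not guaranteed, pd.merge can report the mismatches itself. A sketch with toy frames (the names areas and coords are placeholders for this example):

```python
import pandas as pd

areas = pd.DataFrame({"Postcode": ["M1B", "M1C", "M1E"],
                      "Borough": ["Scarborough"] * 3})
coords = pd.DataFrame({"Postcode": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535],
                       "Longitude": [-79.194353, -79.160497]})

merged = pd.merge(
    areas, coords,
    on="Postcode",          # explicit join key
    how="left",             # keep every row of `areas`
    validate="one_to_one",  # raise if a postcode repeats on either side
    indicator=True,         # adds a `_merge` column describing each row
)
# Postcodes that have no coordinates in `coords`
unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

With the default inner join used above, such rows would be dropped silently, which is why comparing the shapes before and after the merge is a useful check.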

In [24]:
len(df3['Borough'].unique())

11

In [25]:
conda install -c conda-forge folium

In [26]:
df3.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"[Rouge, Malvern]",43.806686,-79.194353
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]",43.784535,-79.160497
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]",43.763573,-79.188711
3,M1G,Scarborough,[Woburn],43.770992,-79.216917
4,M1H,Scarborough,[Cedarbrae],43.773136,-79.239476


Generating a new dataframe that contains only the Boroughs whose name includes 'Toronto'

In [47]:
df4=df3[df3['Borough'].str.contains('Toronto')].reset_index()

In [48]:
df4.head()

Unnamed: 0,index,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,37,M4E,East Toronto,[The Beaches],43.676357,-79.293031
1,41,M4K,East Toronto,"[The Danforth West, Riverdale]",43.679557,-79.352188
2,42,M4L,East Toronto,"[The Beaches West, India Bazaar]",43.668999,-79.315572
3,43,M4M,East Toronto,[Studio District],43.659526,-79.340923
4,44,M4N,Central Toronto,[Lawrence Park],43.72802,-79.38879


In [30]:
from sklearn.cluster import KMeans

In [33]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

In [34]:
address = 'Toronto'

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [35]:
import folium

In [49]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, names in zip(df4['Latitude'], df4['Longitude'], df4['Neighbourhood']):
    # each Neighbourhood entry is an array of names, so join it into one popup string
    label = folium.Popup(', '.join(names), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)
    
map_Toronto

In [39]:
CLIENT_ID = '1GTOB151OXYG0KP0THG4BS5I1GEB0DLNMG5WZ04NJBXBKXDX' # your Foursquare ID
CLIENT_SECRET = 'KH2DA45DVKE1SAKITIGO0LW3URRMDJTXNJXH4FAXH53GRHQD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1GTOB151OXYG0KP0THG4BS5I1GEB0DLNMG5WZ04NJBXBKXDX
CLIENT_SECRET:KH2DA45DVKE1SAKITIGO0LW3URRMDJTXNJXH4FAXH53GRHQD
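
Credentials written into a notebook end up in every shared copy and in its printed output. A possible alternative, sketched below, reads them from environment variables and masks the secret when printing. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helpers load_credentials and mask are assumptions made for this example, not part of any Foursquare tooling:

```python
import os

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The two variable names are hypothetical choices for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("missing environment variable {}".format(missing)) from None

def mask(secret, visible=4):
    """Replace all but the last `visible` characters of a secret with '*'."""
    hidden = max(len(secret) - visible, 0)
    return "*" * hidden + secret[hidden:]

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "example-id",
            "FOURSQUARE_CLIENT_SECRET": "example-secret"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_SECRET: " + mask(client_secret))
```

In the notebook itself, load_credentials() would be called without arguments so that it reads os.environ.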


In [40]:
import json

In [74]:
df4.loc[0,'Borough']

'East Toronto'

In [75]:
Borough_latitude =df4.loc[0, 'Latitude']
Borough_longitude = df4.loc[0, 'Longitude'] 

Borough_name = df4.loc[0, 'Borough'] 

print('Latitude and longitude values of {} are {}, {}.'.format(Borough_name, 
                                                               Borough_latitude, 
                                                               Borough_longitude))

Latitude and longitude values of East Toronto are 43.67635739999999, -79.2930312.


In [76]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Borough_latitude, 
    Borough_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=1GTOB151OXYG0KP0THG4BS5I1GEB0DLNMG5WZ04NJBXBKXDX&client_secret=KH2DA45DVKE1SAKITIGO0LW3URRMDJTXNJXH4FAXH53GRHQD&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'
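
The same request URL can also be assembled from named parameters, which avoids counting positional {} placeholders. A sketch using only the standard library; build_explore_url is a helper name invented for this example, and the ID and SECRET values are placeholders:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in `ll` readable; other reserved characters are escaped
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

demo_url = build_explore_url("ID", "SECRET", "20180605", 43.6763574, -79.2930312, 500, 100)
print(demo_url)
```

requests can do the same encoding itself if the dictionary is passed through its params argument instead of being formatted into the URL.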

In [77]:
results = requests.get(url).json()

In [78]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [79]:
from pandas import json_normalize  # top-level function since pandas 1.0; older versions import it from pandas.io.json

In [80]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Glen Manor Ravine,Trail,43.676821,-79.293942
1,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,Glen Stewart Park,Park,43.675278,-79.294647
3,Glen Stewart Ravine,Other Great Outdoors,43.6763,-79.294784
4,Grover Pub and Grub,Pub,43.679181,-79.297215


In [81]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


In [82]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
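
The list comprehension in the function above indexes categories[0] directly, which fails for a venue that comes back without any category. A defensive variant of the per-venue extraction could look like the sketch below. venue_row is a hypothetical helper, and the sample items only imitate the response shape that the function above relies on:

```python
def venue_row(name, lat, lng, item):
    """Flatten one item of the explore response into a 7-tuple.

    `item` is assumed to look like
    {"venue": {"name": ..., "location": {"lat": ..., "lng": ...}, "categories": [...]}}
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Fall back to None when the venue has no category at all
    category = categories[0]["name"] if categories else None
    return (name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category)

# Hand-written sample items, not real API output
sample_items = [
    {"venue": {"name": "Glen Stewart Park",
               "location": {"lat": 43.675278, "lng": -79.294647},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unlabelled venue",
               "location": {"lat": 43.676, "lng": -79.294},
               "categories": []}},
]
rows = [venue_row("East Toronto", 43.676357, -79.293031, item) for item in sample_items]
print(rows)
```

Rows with a None category can then be dropped or kept deliberately before the one hot encoding step.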

In [83]:
Toronto_venues = getNearbyVenues(names=df4['Borough'],
                                   latitudes=df4['Latitude'],
                                   longitudes=df4['Longitude']
                                  )



East Toronto
East Toronto
East Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
West Toronto
West Toronto
West Toronto
West Toronto
West Toronto
West Toronto
Downtown Toronto
East Toronto


In [84]:
print(Toronto_venues.shape)
Toronto_venues.head()

(1704, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,East Toronto,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,East Toronto,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,East Toronto,43.676357,-79.293031,Glen Stewart Park,43.675278,-79.294647,Park
3,East Toronto,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
4,East Toronto,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub


In [85]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 232 unique categories.


In [87]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [88]:
Toronto_onehot.shape

(1704, 232)
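
The encoded frame has 232 columns, the same as the number of unique categories, even though a Neighborhood column was assigned to it. One possible explanation, stated here as an assumption rather than something the output confirms, is that one venue category is itself named Neighborhood, so the assignment overwrote that indicator column instead of adding a new one. A sketch of an encoding that guards against such a name clash, on toy rows:

```python
import pandas as pd

# Toy rows; one venue category deliberately clashes with the key column name
venues = pd.DataFrame({
    "Neighborhood": ["East Toronto", "East Toronto", "West Toronto"],
    "Venue Category": ["Park", "Neighborhood", "Bar"],
})

onehot = pd.get_dummies(venues["Venue Category"])
# Rename any indicator column that would clash with the key column
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})
# insert() places the key column first and raises if the name already exists
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

print(onehot.shape)
```

Because insert() puts the key column at position 0, no separate column reordering step is needed.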

In [90]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,Central Toronto,0.009346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018692,...,0.009346,0.009346,0.0,0.009346,0.0,0.0,0.009346,0.0,0.0,0.0
1,Downtown Toronto,0.002295,0.000765,0.000765,0.000765,0.000765,0.00153,0.002295,0.00153,0.009946,...,0.000765,0.000765,0.002295,0.011477,0.00153,0.000765,0.005356,0.006886,0.000765,0.00153
2,East Toronto,0.023622,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023622,...,0.0,0.015748,0.0,0.0,0.0,0.0,0.0,0.007874,0.0,0.0
3,West Toronto,0.02454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01227,0.0,0.0,0.01227,0.006135,0.0,0.0


In [91]:
Toronto_grouped.shape

(4, 232)

In [92]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0  Sandwich Place  0.07
1     Coffee Shop  0.07
2            Park  0.06
3            Café  0.06
4     Pizza Place  0.05


----Downtown Toronto----
         venue  freq
0  Coffee Shop  0.10
1         Café  0.05
2   Restaurant  0.03
3        Hotel  0.03
4       Bakery  0.02


----East Toronto----
                venue  freq
0    Greek Restaurant  0.07
1         Coffee Shop  0.06
2  Italian Restaurant  0.05
3             Brewery  0.04
4      Ice Cream Shop  0.04


----West Toronto----
         venue  freq
0          Bar  0.09
1         Café  0.07
2  Coffee Shop  0.05
3       Bakery  0.04
4   Restaurant  0.04
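
The top venues per group can also be read off without transposing each row. A compact sketch using Series.nlargest on a toy frequency table (the groups A and B and their frequencies are made up):

```python
import pandas as pd

# Toy frequency table: one row per group, one column per venue category
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bar": [0.09, 0.01],
    "Café": [0.07, 0.05],
    "Coffee Shop": [0.05, 0.10],
}).set_index("Neighborhood")

num_top = 2
# For each row, keep the labels of the `num_top` largest frequencies
top = {hood: row.nlargest(num_top).index.tolist() for hood, row in grouped.iterrows()}
print(top)
```

Setting Neighborhood as the index keeps the label out of the numeric columns, so no row has to be skipped by position.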




In [93]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [110]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
2,East Toronto,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
3,West Toronto,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
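
The column labels above rely on a three-element suffix list plus an exception for everything else, which works up to 20 but would label position 21 as 21th. A small helper that covers the general case is sketched below; ordinal is a name chosen for this example:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the remaining teens always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Borough"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```

For the ten columns used here the result is identical to the labels produced above.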


In [111]:
# set number of clusters
kclusters = 4

Toronto_grouped_clustering = Toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 2, 1])

In [112]:
df4.head()

Unnamed: 0,index,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,37,M4E,East Toronto,[The Beaches],43.676357,-79.293031
1,41,M4K,East Toronto,"[The Danforth West, Riverdale]",43.679557,-79.352188
2,42,M4L,East Toronto,"[The Beaches West, India Bazaar]",43.668999,-79.315572
3,43,M4M,East Toronto,[Studio District],43.659526,-79.340923
4,44,M4N,Central Toronto,[Lawrence Park],43.72802,-79.38879


In [113]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
2,East Toronto,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
3,West Toronto,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore


In [114]:
Toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,Central Toronto,0.009346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018692,...,0.009346,0.009346,0.0,0.009346,0.0,0.0,0.009346,0.0,0.0,0.0
1,Downtown Toronto,0.002295,0.000765,0.000765,0.000765,0.000765,0.00153,0.002295,0.00153,0.009946,...,0.000765,0.000765,0.002295,0.011477,0.00153,0.000765,0.005356,0.006886,0.000765,0.00153
2,East Toronto,0.023622,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023622,...,0.0,0.015748,0.0,0.0,0.0,0.0,0.0,0.007874,0.0,0.0
3,West Toronto,0.02454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01227,0.0,0.0,0.01227,0.006135,0.0,0.0


In [115]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = df4

# join the cluster labels and top venues onto the Toronto dataframe, matching on Borough
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Borough'), on='Borough')

Toronto_merged.head() # check the last columns!

Unnamed: 0,index,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,37,M4E,East Toronto,[The Beaches],43.676357,-79.293031,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
1,41,M4K,East Toronto,"[The Danforth West, Riverdale]",43.679557,-79.352188,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
2,42,M4L,East Toronto,"[The Beaches West, India Bazaar]",43.668999,-79.315572,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
3,43,M4M,East Toronto,[Studio District],43.659526,-79.340923,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
4,44,M4N,Central Toronto,[Lawrence Park],43.72802,-79.38879,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym


In [117]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [121]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Borough'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Exploring the clusters

Cluster 1

In [122]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4N,-79.38879,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
5,M4P,-79.390197,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
6,M4R,-79.405678,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
7,M4S,-79.38879,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
8,M4T,-79.38316,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
9,M4V,-79.400049,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
22,M5N,-79.416936,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
23,M5P,-79.411307,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym
24,M5R,-79.405678,0,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Dessert Shop,Sushi Restaurant,Pub,Restaurant,Gym


Cluster 2

In [123]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,M6H,-79.442259,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
32,M6J,-79.41975,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
33,M6K,-79.428191,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
34,M6P,-79.464763,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
35,M6R,-79.456325,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore
36,M6S,-79.48445,1,Bar,Café,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Yoga Studio,Pizza Place,Asian Restaurant,Bookstore


Cluster 3

In [124]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,-79.293031,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
1,M4K,-79.352188,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
2,M4L,-79.315572,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
3,M4M,-79.340923,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio
38,M7Y,-79.321558,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Brewery,Ice Cream Shop,Park,Café,Pub,Sandwich Place,Yoga Studio


Cluster 4

In [125]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,M4W,-79.377529,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
11,M4X,-79.367675,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
12,M4Y,-79.38316,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
13,M5A,-79.360636,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
14,M5B,-79.378937,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
15,M5C,-79.375418,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
16,M5E,-79.373306,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
17,M5G,-79.387383,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
18,M5H,-79.384568,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
19,M5J,-79.381752,3,Coffee Shop,Café,Restaurant,Hotel,Bakery,Japanese Restaurant,Italian Restaurant,Bar,Park,Beer Bar
