In the block below, both pandas and numpy libraries are loaded into the jupyter notebook

In [1]:
import pandas as pd
import numpy as np

In the block below, the dataframe was read from the provided Wikipedia website, and any Borough's with the name 'Not assigned' were ommitted.

In [2]:
tables=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M',header=0)
df1=tables[0]
df2=df1[df1['Borough']!='Not assigned']
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In the block below, neighborhoods with common Borough's and Postcodes were combined and seperated by a comma,
using the aggregate join command

In [3]:
df3=df2.groupby(['Postcode','Borough']).agg(','.join)

In the block below, any Neighbourhood with the 'Not Assigned' ID were repaced with the Borough name

In [4]:
print(df3[df3['Neighbourhood']=='Not assigned'])

df4=df3.reset_index()
df4.loc[df4['Neighbourhood']=='Not assigned','Neighbourhood']=df4.loc[df4['Neighbourhood']=='Not assigned']['Borough']
df4[df4['Postcode']=='M7A']

                      Neighbourhood
Postcode Borough                   
M7A      Queen's Park  Not assigned


Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Queen's Park,Queen's Park


In the block below, the shape command was used to print the number of rows in the dataframe.

In [5]:
df4.shape[0]

103

I was unable to use the Geocoder package, after mulitple attempts.  Accordingly, I used the provided 'Geospatial_Coordinates.csv' file, which I read directly into a dataframe.

In [6]:
geodf=pd.read_csv('Geospatial_Coordinates.csv')
geodf.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In the block below, I merged the dataframes using the postal code in both dataframes.

In [7]:
dfAll=pd.merge(df4,geodf,left_on='Postcode',right_on='Postal Code')
dfAll.drop(columns='Postal Code',inplace=True)
dfAll.rename(columns={'Postcode':'PostalCode'},inplace=True)
dfAll.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


# Create a map of toronto with neighbourhoods superimposed on top

In [8]:
import folium

tlat=43.6532
tlong=-79.3832
torontoMap=folium.Map(location=[tlat,tlong],zoom_start=10.7)

for lat,long,borough,neighborhood in zip(dfAll['Latitude'],dfAll['Longitude'],dfAll['Borough'],dfAll['Neighbourhood']):
    label='{}, {}'.format(neighborhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fill_opacity=0.7,
                       parse_html=False).add_to(torontoMap)
    
torontoMap

Here we find the boroughs that contain the word Toronto

In [9]:
torontoData=dfAll.loc[dfAll.Borough.str.contains('Toronto')].reset_index(drop=True)
torontoData.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [10]:
print('The geographical coordinates of Toronto are {}, {}'.format(tlat,tlong))

The geographical coordinates of Toronto are 43.6532, -79.3832


Create a visualization of the Boroughs that have Toronto in the name

In [11]:
scarboroughMap=folium.Map(location=[tlat,tlong],zoom_start=12)
for lat,long,label in zip(torontoData['Latitude'],torontoData['Longitude'],torontoData['Neighbourhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fil_opacity=0.7,
                       parse_html=False).add_to(scarboroughMap)
scarboroughMap

Define Foursquare Credentials and Version

In [12]:
clientID='UOGBR5LSFAH4AXKJEFNRXGYJ4QDQLJIUR0FKQGEWWZKCUP3X'
clientSecret='Q4CTC2LZKLDLYUB15Q0LV0Q1K5I3EN1YANID25U1KBVABSZM'
version='20180605'
print('Your credentials:')
print('Client ID:  '+ clientID)
print('Client Secret:  '+clientSecret)

Your credentials:
Client ID:  UOGBR5LSFAH4AXKJEFNRXGYJ4QDQLJIUR0FKQGEWWZKCUP3X
Client Secret:  Q4CTC2LZKLDLYUB15Q0LV0Q1K5I3EN1YANID25U1KBVABSZM


Here we explore the first neighborhood in our dataframe

In [13]:
nName=torontoData.loc[0,'Neighbourhood']
nLat=torontoData.loc[0,'Latitude']
nLong=torontoData.loc[0,'Longitude']
print('Latitude and longitude values of {} are {}, {}.'.format(nName,nLat,nLong))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


Here we obtain the top 100 venues that are located within Toronto Boroughs with Toronto in the name within a radius of 500 meters

In [14]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(clientID,
                                                                                                                           clientSecret,
                                                                                                                           tlat,
                                                                                                                           tlong,
                                                                                                                           version,
                                                                                                                           radius,
                                                                                                                           LIMIT)

Here we send the get request and examine the results

In [15]:
import requests
import json
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c59b675f594df20f80bd484'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 129,
  'suggestedBounds': {'ne': {'lat': 43.6577000045, 'lng': -79.37699210971401},
   'sw': {'lat': 43.648699995499996, 'lng': -79.389407890286}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          '

Using the get_category_type function from the IBM Foursquare lab.  The function extracts the category of the venue

In [16]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe

In [17]:
from pandas.io.json import json_normalize
venues=results['response']['groups'][0]['items']
venuesNearBy=json_normalize(venues)

# filter columns
filtered_columns=['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venuesNearBy=venuesNearBy.loc[:, filtered_columns]

# clean columns
venuesNearBy.columns = [col.split(".")[-1] for col in venuesNearBy.columns]

venuesNearBy.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,"[{'id': '4f2a25ac4b909258e854f55f', 'name': 'N...",43.653232,-79.385296
1,Nathan Phillips Square,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",43.65227,-79.383516
2,Eggspectation Bell Trinity Square,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",43.653144,-79.38198
3,Indigo,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",43.653515,-79.380696
4,Old City Hall,"[{'id': '4bf58dd8d48988d12d941735', 'name': 'M...",43.652009,-79.381744


Here we check to make sure Foursquare returned 100 venues

In [18]:
print('{} venues were returned by Foursquare.'.format(venuesNearBy.shape[0]))

100 venues were returned by Foursquare.


# Explore Neighborhoods in Toronto.
Here we create a function to repeat the same process to all the neighborhoods in Toronto

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            clientID, 
            clientSecret, 
            version, 
            tlat, 
            tlong, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new dataframe called torontoVenues.

In [20]:
torontoVenues=getNearbyVenues(names=torontoData['Neighbourhood'],
                                   latitudes=torontoData['Latitude'],
                                   longitudes=torontoData['Longitude']
                                  )

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

Here we check the size of the resulting dataframe

In [21]:
print(torontoVenues.shape)
torontoVenues.head()

(3800, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Downtown Toronto,43.653232,-79.385296,Neighborhood
1,The Beaches,43.676357,-79.293031,Nathan Phillips Square,43.65227,-79.383516,Plaza
2,The Beaches,43.676357,-79.293031,Eggspectation Bell Trinity Square,43.653144,-79.38198,Breakfast Spot
3,The Beaches,43.676357,-79.293031,Indigo,43.653515,-79.380696,Bookstore
4,The Beaches,43.676357,-79.293031,Old City Hall,43.652009,-79.381744,Monument / Landmark


Here we check how many venues were returned for each neighborhood

In [22]:
torontoVenues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"Brockton,Exhibition Place,Parkdale Village",100,100,100,100,100,100
Business Reply Mail Processing Centre 969 Eastern,100,100,100,100,100,100
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",100,100,100,100,100,100
"Cabbagetown,St. James Town",100,100,100,100,100,100
Central Bay Street,100,100,100,100,100,100
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100


Here we find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} uniques categories.'.format(len(torontoVenues['Venue Category'].unique())))

There are 63 uniques categories.


# Analyze Each Neighborhood

In [24]:
# one hot encoding
torontoOnehot=pd.get_dummies(torontoVenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
torontoOnehot['Neighborhood'] = torontoVenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [torontoOnehot.columns[-1]] + list(torontoOnehot.columns[:-1])
torontoOnehot=torontoOnehot[fixed_columns]

torontoOnehot.head()

Unnamed: 0,Women's Store,Accessories Store,American Restaurant,Art Museum,Asian Restaurant,Bakery,Bank,Bar,Beer Bar,Bookstore,...,Shoe Store,Shopping Mall,Smoothie Shop,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Here we examine the new dataframe size.

In [25]:
torontoOnehot.shape

(3800, 63)

Here we group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [26]:
torontoGrouped=torontoOnehot.groupby('Neighborhood').mean().reset_index()
torontoGrouped.head()

Unnamed: 0,Neighborhood,Women's Store,Accessories Store,American Restaurant,Art Museum,Asian Restaurant,Bakery,Bank,Bar,Beer Bar,...,Shoe Store,Shopping Mall,Smoothie Shop,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,"Adelaide,King,Richmond",0.01,0.01,0.05,0.01,0.02,0.02,0.01,0.01,0.01,...,0.01,0.01,0.01,0.03,0.01,0.01,0.03,0.02,0.01,0.02
1,Berczy Park,0.01,0.01,0.05,0.01,0.02,0.02,0.01,0.01,0.01,...,0.01,0.01,0.01,0.03,0.01,0.01,0.03,0.02,0.01,0.02
2,"Brockton,Exhibition Place,Parkdale Village",0.01,0.01,0.05,0.01,0.02,0.02,0.01,0.01,0.01,...,0.01,0.01,0.01,0.03,0.01,0.01,0.03,0.02,0.01,0.02
3,Business Reply Mail Processing Centre 969 Eastern,0.01,0.01,0.05,0.01,0.02,0.02,0.01,0.01,0.01,...,0.01,0.01,0.01,0.03,0.01,0.01,0.03,0.02,0.01,0.02
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.01,0.01,0.05,0.01,0.02,0.02,0.01,0.01,0.01,...,0.01,0.01,0.01,0.03,0.01,0.01,0.03,0.02,0.01,0.02


Here we conrim the new size

In [27]:
torontoGrouped.shape

(38, 63)

Here we print each neighborhood along with the top 5 most common venues

In [28]:
num_top_venues = 5

for hood in torontoGrouped['Neighborhood']:
    print("----"+hood+"----")
    temp = torontoGrouped[torontoGrouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                 venue  freq
0       Clothing Store  0.07
1  American Restaurant  0.05
2          Coffee Shop  0.04
3                Plaza  0.03
4       Breakfast Spot  0.03


----Berczy Park----
                 venue  freq
0       Clothing Store  0.07
1  American Restaurant  0.05
2          Coffee Shop  0.04
3                Plaza  0.03
4       Breakfast Spot  0.03


----Brockton,Exhibition Place,Parkdale Village----
                 venue  freq
0       Clothing Store  0.07
1  American Restaurant  0.05
2          Coffee Shop  0.04
3                Plaza  0.03
4       Breakfast Spot  0.03


----Business Reply Mail Processing Centre 969 Eastern----
                 venue  freq
0       Clothing Store  0.07
1  American Restaurant  0.05
2          Coffee Shop  0.04
3                Plaza  0.03
4       Breakfast Spot  0.03


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                 venue  fre

Here we put that into a pandas dataframe

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now lets create the new dataframe and display the top 10 venues for ech neighbourhood

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted=pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood']=torontoGrouped['Neighborhood']

for ind in np.arange(torontoGrouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:]=return_most_common_venues(torontoGrouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
1,Berczy Park,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
2,"Brockton,Exhibition Place,Parkdale Village",Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
3,Business Reply Mail Processing Centre 969 Eastern,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
5,"Cabbagetown,St. James Town",Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
6,Central Bay Street,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
7,"Chinatown,Grange Park,Kensington Market",Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
8,Christie,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
9,Church and Wellesley,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court


# Cluster Neighborhoods
Here we run k-means to cluster the neighborhood into 5 clusters.

In [31]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

torontoGroupedClustering=torontoGrouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans=KMeans(n_clusters=kclusters, random_state=0).fit(torontoGroupedClustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Here we create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [32]:
torontoMerged=torontoData
torontoMerged.rename(columns={'Neighbourhood':'Neighborhood'},inplace=True)

# add clustering labels
torontoMerged['Cluster Labels']=kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
torontoMerged=torontoMerged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

torontoMerged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court


Finally, let's visualize the resulting clusters

In [33]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters=folium.Map(location=[tlat,tlong], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array=cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(torontoMerged['Latitude'],torontoMerged['Longitude'],torontoMerged['Neighborhood'],torontoMerged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine Clusters
Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.

## Cluster 1

In [34]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 0,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
1,East Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
2,East Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
3,East Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
4,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
5,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
6,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
7,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
8,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court
9,Central Toronto,0,Clothing Store,American Restaurant,Coffee Shop,Breakfast Spot,Tea Room,Steakhouse,Plaza,Café,Vegetarian / Vegan Restaurant,Food Court


## Cluster 2

In [35]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 1,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Cluster 3

In [36]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 2,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Cluster 4

In [37]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 3,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Cluster 5

In [38]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 4,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
