# Week 3 Clustering Neighborhoods Project pt.1

Hello. This particular project is all about clustering neighborhoods in Toronto using the website of https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M where webscraping is utilized.

However, the first step is to import the appropriate libraries for this project first.

In [1]:
import numpy as np 
import pandas as pd 
from bs4 import BeautifulSoup 
import requests 

Next, we prepare our web scraping code by utilizing BeautifulSoup.

In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(url).content
content = BeautifulSoup(requests.get(url).content, 'lxml')

## Project Rules

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.

Only process the cells that have an assigned borough. 

Ignore cells with a borough that is Not assigned.

More than one neighborhood can exist in one postal code area. 

For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.

In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [3]:
# Here we actually webscrape the data from the wikipedia page
table = content.find('table')
td = table.find_all('td')
postcode = []
borough = []
neighbourhood = []

# create a list with the scraped data
for i in range(0, len(td), 3):
    postcode.append(td[i].text.strip())
    borough.append(td[i+1].text.strip())
    neighbourhood.append(td[i+2].text.strip())

In [4]:
# create the actual DataFrame with the lists previously scraped and give the columns appropriate names  
df_codes = pd.DataFrame(data=[postcode, borough, neighbourhood]).transpose()
df_codes.columns = ['Postal Code', 'Borough', 'Neighborhood']

### Cleaning pt.1

The next step requires us to follow some of the project rules requirements. In this case, it would be ignoring boroughs with the 'Not assigned' value. Also, if a cell has a borough but a 'Not assigned' neighborhood value, then the neighborhood value will be the same as the borough value for that particular row.

In [5]:
# Ignore cells with a borough that is Not assigned.
df_codes['Borough'].replace('Not assigned', np.nan, inplace=True)
df_codes.dropna(subset=['Borough'], inplace=True)

# Also, if a cell has a borough but a 'Not assigned' neighborhood value, 
# then the neighborhood value will be the same as the borough value for that particular row.
df_codes['Neighborhood'].replace('Not assigned', "Queen's Park", inplace=True)

### Cleaning pt. 2

Next we do the last constraint which is combining neighborhoods if they exist in one postal code.

In [6]:
# combining the neighborhoods into one line separated by a comma.
df_codes = df_codes.groupby(['Postal Code', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
df_codes.columns = ['Postal Code', 'Borough', 'Neighborhood']
df_codes.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [7]:
#last step is to use the .shape function
df_codes.shape

(103, 3)

# Week 3 Clustering Neighborhoods Project pt.2

The next section of the project requires us to utilize a provided csv file to load tthe longitude and latitude coordinates of each of the neighborhoods.

We combine the location data with our neighborhoods DataFrame.

In [8]:
# First we load the provided csv into Pandas and check the newly created DataFrame
df_latlong = pd.read_csv('Geospatial_Coordinates.csv')
df_latlong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
# we merge the location DataFrame with the code DataFrame and join on the 'Postal Code' column. 
# The type of join is inner join which is default, but can be specificed with how='inner' if desired
neighborhoods_df = pd.merge(df_codes, df_latlong, on=['Postal Code'])
neighborhoods_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


# Week 3 Clustering Neighborhoods Project pt.3

The last part of the project is to perform the actual clustering analysis. The project requirements says that I can use boroughs that only have the word Toronto in them so I will just reduce our neighborhoods_df down to toronto_df first!

In [10]:
toronto_df = neighborhoods_df[neighborhoods_df['Borough'].str.contains('Toronto')].reset_index(drop = True)
toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [14]:
import json
from pandas.io.json import json_normalize
import folium 
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
print('Libraries imported.')

Libraries imported.


In [15]:
address = 'Toronto, CA'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

latitude=43.653963
longitude=-79.387207
print('The Geograpical Coordinate of Toronto is {}, {}.'.format(latitude, longitude))

  This is separate from the ipykernel package so we can avoid doing imports until


The Geograpical Coordinate of Toronto is 43.653963, -79.387207.


In [16]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

In [17]:
CLIENT_ID = 'I30QD1IX2UJNXY1JIC2CRQUHYFE2PSXTDWAYB4D5V5L5VMVM' # your Foursquare ID
CLIENT_SECRET = 'AVJ40PUXLEZAXFSYQT1VXMJBAUBV431N05BZZKKJXCOL0ICO' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: I30QD1IX2UJNXY1JIC2CRQUHYFE2PSXTDWAYB4D5V5L5VMVM
CLIENT_SECRET:AVJ40PUXLEZAXFSYQT1VXMJBAUBV431N05BZZKKJXCOL0ICO


In [19]:
toronto_df.loc[1, 'Neighborhood']

'The Danforth West, Riverdale'

In [20]:
hood_latitude = toronto_df.loc[1, 'Latitude'] # neighborhood latitude value
hood_longitude = toronto_df.loc[1, 'Longitude'] # neighborhood longitude value

hood_name = toronto_df.loc[1, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(hood_name, 
                                                               hood_latitude, 
                                                               hood_longitude))

Latitude and longitude values of The Danforth West, Riverdale are 43.6795571, -79.352188.


### Now, let's get the top 100 venues that are in Danforth West, Riverdale within a radius of 500 meters.

First, let's create the GET request URL. Name your URL url.

In [22]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    hood_latitude, 
    hood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=I30QD1IX2UJNXY1JIC2CRQUHYFE2PSXTDWAYB4D5V5L5VMVM&client_secret=AVJ40PUXLEZAXFSYQT1VXMJBAUBV431N05BZZKKJXCOL0ICO&v=20180604&ll=43.6795571,-79.352188&radius=500&limit=100'

In [23]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c3e84d7351e3d773f975c9d'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Greektown',
  'headerFullLocation': 'Greektown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 42,
  'suggestedBounds': {'ne': {'lat': 43.6840571045, 'lng': -79.34597738331301},
   'sw': {'lat': 43.675057095499994, 'lng': -79.35839861668698}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bce4183ef10952197da8386',
       'name': 'Pantheon',
       'location': {'address': '407 Danforth Ave.',
        'crossStreet': 'at Chester Ave.',
        'lat': 43.67762124481265,
        'lng': -79.35143390043564,
        'labeledLatLngs': [{'label': 'di

In [24]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [25]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Pantheon,Greek Restaurant,43.677621,-79.351434
1,Dolce Gelato,Ice Cream Shop,43.677773,-79.351187
2,MenEssentials,Cosmetics Shop,43.67782,-79.351265
3,Messini Authentic Gyros,Greek Restaurant,43.677827,-79.350569
4,Mezes,Greek Restaurant,43.677962,-79.350196


In [26]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

42 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Toronto
Let's create a function to repeat the same process to all the neighborhoods in Toronto

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [30]:
toronto_values = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )
 

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

In [31]:
print(toronto_values.shape)
toronto_values.head()

(2698, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Downtown Toronto,43.653232,-79.385296,Neighborhood
1,The Beaches,43.676357,-79.293031,Textile Museum of Canada,43.654396,-79.3865,Art Museum
2,The Beaches,43.676357,-79.293031,Sansotei Ramen 三草亭,43.655157,-79.386501,Ramen Restaurant
3,The Beaches,43.676357,-79.293031,Japango,43.655268,-79.385165,Sushi Restaurant
4,The Beaches,43.676357,-79.293031,Cafe Plenty,43.654571,-79.38945,Café


In [32]:
toronto_values.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",71,71,71,71,71,71
Berczy Park,71,71,71,71,71,71
"Brockton, Exhibition Place, Parkdale Village",71,71,71,71,71,71
Business Reply Mail Processing Centre 969 Eastern,71,71,71,71,71,71
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",71,71,71,71,71,71
"Cabbagetown, St. James Town",71,71,71,71,71,71
Central Bay Street,71,71,71,71,71,71
"Chinatown, Grange Park, Kensington Market",71,71,71,71,71,71
Christie,71,71,71,71,71,71
Church and Wellesley,71,71,71,71,71,71


In [33]:
print('There are {} uniques categories.'.format(len(toronto_values['Venue Category'].unique())))


There are 52 uniques categories.


In [34]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_values[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_values['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Vegetarian / Vegan Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Breakfast Spot,Bubble Tea Shop,...,Salon / Barbershop,Seafood Restaurant,Smoke Shop,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Toy / Game Store,University
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Vegetarian / Vegan Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Breakfast Spot,...,Salon / Barbershop,Seafood Restaurant,Smoke Shop,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Toy / Game Store,University
0,"Adelaide, King, Richmond",0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
1,Berczy Park,0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
2,"Brockton, Exhibition Place, Parkdale Village",0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
3,Business Reply Mail Processing Centre 969 Eastern,0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
5,"Cabbagetown, St. James Town",0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
6,Central Bay Street,0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
7,"Chinatown, Grange Park, Kensington Market",0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
8,Christie,0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085
9,Church and Wellesley,0.014085,0.028169,0.070423,0.014085,0.014085,0.014085,0.014085,0.028169,0.042254,...,0.014085,0.014085,0.014085,0.014085,0.028169,0.014085,0.014085,0.014085,0.014085,0.014085


In [36]:
toronto_grouped.shape

(38, 52)

In [37]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Art Gallery  0.07
1          Coffee Shop  0.06
2                 Café  0.06
3  Japanese Restaurant  0.04
4       Breakfast Spot  0.04


----Berczy Park----
                 venue  freq
0          Art Gallery  0.07
1          Coffee Shop  0.06
2                 Café  0.06
3  Japanese Restaurant  0.04
4       Breakfast Spot  0.04


----Brockton, Exhibition Place, Parkdale Village----
                 venue  freq
0          Art Gallery  0.07
1          Coffee Shop  0.06
2                 Café  0.06
3  Japanese Restaurant  0.04
4       Breakfast Spot  0.04


----Business Reply Mail Processing Centre 969 Eastern----
                 venue  freq
0          Art Gallery  0.07
1          Coffee Shop  0.06
2                 Café  0.06
3  Japanese Restaurant  0.04
4       Breakfast Spot  0.04


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                 

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
1,Berczy Park,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
2,"Brockton, Exhibition Place, Parkdale Village",Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
5,"Cabbagetown, St. James Town",Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
6,Central Bay Street,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
7,"Chinatown, Grange Park, Kensington Market",Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
8,Christie,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
9,Church and Wellesley,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant


In [40]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]


  return_n_iter=True)


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [42]:
toronto_merged = toronto_df

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Art Gallery,Coffee Shop,Café,Japanese Restaurant,Breakfast Spot,Bar,Exhibit,Chinese Restaurant,Sushi Restaurant,American Restaurant


In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters