# Segmenting and Clustering Neighborhoods in Toronto

1. Start by creating a new Notebook for this assignment.
1. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

In [8]:
!conda install -c conda-forge beautifulsoup4 --yes 

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following packages will be UPDATED:

    beautifulsoup4: 4.6.0-py35h442a8c9_1 --> 4.6.3-py35_0 conda-forge

beautifulsoup4 100% |################################| Time: 0:00:00  41.76 MB/s


In [120]:
import urllib.request as url
from bs4 import BeautifulSoup as bs
import requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import pandas as pd
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


- use requests and beautifulsoup4 to download and parse the page

In [3]:
w_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
w_page = requests.get(w_url).text
soup = bs(w_page,'lxml')
#print(soup.prettify())

- parse the elements of the table and create a first dataframe

In [4]:

w_class='wikitable sortable'# jquery-tablesorter

rows = []
for w_row in soup.find('table',{'class':w_class}).find('tbody').find_all('tr'):
    row = [x.text.replace('\n','') for x in w_row.find_all('td')]
    if (len(row)==3):
        #print(row)
        rows.append(row)


df_postcode = pd.DataFrame(rows,columns = ['PostalCode','Borough','Neighborhood'])
print(df_postcode.shape)
df_postcode.head(10)

(289, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned
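The `len(row)==3` filter above works because header rows use `th` cells, so they yield an empty list of `td` values and are skipped. The sketch below shows this on a small hand-written HTML snippet. The snippet is a hypothetical stand-in for the Wikipedia table, not the real page, and it uses the standard-library `html.parser` backend instead of `lxml`.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, hand-written stand-in for the Wikipedia table (not the real page).
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")  # stdlib parser, no lxml needed
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    # get_text(strip=True) drops the trailing newline that Wikipedia cells carry
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:  # the header row has only <th> cells, so it gives [] and is skipped
        rows.append(cells)

sample_df = pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighborhood"])
print(sample_df)
```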


3. To create the above dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [5]:
df_filtered = df_postcode[df_postcode['Borough']!='Not assigned']
df_filtered.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [6]:
df_filtered[df_filtered['Borough']=='Queen\'s Park']

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Not assigned


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

In [7]:
def check_neighbourhood(row):
    if row['Neighborhood'] == 'Not assigned':
        row['Neighborhood'] = row['Borough']
    return row
df_filtered = df_filtered.apply(check_neighbourhood,axis=1) # apply returns a new dataframe, so keep the result
df_filtered[df_filtered['Borough']=='Queen\'s Park']

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Queen's Park
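The same rule can also be written without a row-wise `apply`. This is a minimal sketch on a toy dataframe (two rows copied from the table above), using `Series.mask` as one possible vectorized alternative.

```python
import pandas as pd

# Two rows taken from the table above, used as toy data.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Not assigned"],
})

# Where the condition is True, mask() substitutes the aligned Borough value.
toy["Neighborhood"] = toy["Neighborhood"].mask(
    toy["Neighborhood"] == "Not assigned", toy["Borough"]
)
print(toy)
```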


- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

In [8]:
df_grouped = pd.DataFrame(df_filtered.groupby(['PostalCode','Borough'])['Neighborhood'].apply(list)).reset_index()
df_grouped.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]
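The cell above keeps the neighborhoods of each postal code as a Python list. If a comma-separated string is preferred, as the assignment text describes, the aggregation could look like the sketch below. The rows are toy data copied from the earlier table.

```python
import pandas as pd

# Toy rows copied from the scraped table shown earlier.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M6A", "M6A", "M7A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto",
                "North York", "North York", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Regent Park",
                     "Lawrence Heights", "Lawrence Manor", "Queen's Park"],
})

# Join all neighborhood names that share a postal code into one string.
combined = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```

With strings instead of lists, the later `', '.join(x)` calls on the `Neighborhood` column would no longer be needed.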


# Second part - Geographical data

- create a dataframe from the csv file

In [9]:
!wget -q -O 'Geospatial_data.csv' https://cocl.us/Geospatial_data

In [10]:
df_geo = pd.read_csv('Geospatial_data.csv')
df_geo.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


- join the two dataframes

In [11]:
df_joined = df_grouped.set_index('PostalCode').join(df_geo.set_index('Postal Code')).reset_index()
df_joined.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"[Rouge, Malvern]",43.806686,-79.194353
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]",43.784535,-79.160497
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]",43.763573,-79.188711
3,M1G,Scarborough,[Woburn],43.770992,-79.216917
4,M1H,Scarborough,[Cedarbrae],43.773136,-79.239476
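An equivalent way to combine the two tables is `DataFrame.merge` with explicit key columns. The sketch below uses a few rows from the outputs above as toy data and adds two optional safety checks: `validate` to confirm that each postal code appears once on each side, and a count of rows left without coordinates.

```python
import pandas as pd

# Toy data copied from the outputs above.
codes = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})
geo = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

merged = codes.merge(
    geo,
    how="left",                 # keep every postal code from the left table
    left_on="PostalCode",
    right_on="Postal Code",
    validate="one_to_one",      # raise if a key is duplicated on either side
).drop(columns="Postal Code")   # the right-hand key is redundant after the merge

missing = merged["Latitude"].isna().sum()  # postal codes with no coordinates
print(merged)
print("rows without coordinates:", missing)
```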


In [12]:
neighborhoods = df_joined
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


In [13]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


In [15]:
address = 'Toronto'

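geolocator = Nominatim(user_agent='toronto_explorer') # recent geopy releases require a user_agent; this name is only an example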
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address,latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [18]:
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  58.76 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  36.49 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  42.79 MB/s
folium-0.5.0-p 100% |################################| Time: 0:00:00  52.20 MB/s


In [19]:

import folium # map rendering library

In [20]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [21]:
CLIENT_ID = 'UYWPJMENVNHLUYMJSJLNVUHQYD5FLJKVSIEP1HE1SX25I2OO' # your Foursquare ID
CLIENT_SECRET = 'RX3MFP4E1AOKEMCGMHINC1FCVY4Z3GO2G1VZE2FHPGW0FLT2' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 30

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: UYWPJMENVNHLUYMJSJLNVUHQYD5FLJKVSIEP1HE1SX25I2OO
CLIENT_SECRET:RX3MFP4E1AOKEMCGMHINC1FCVY4Z3GO2G1VZE2FHPGW0FLT2


In [22]:
neighborhoods.loc[0, 'Neighborhood']

['Rouge', 'Malvern']

In [23]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of ['Rouge', 'Malvern'] are 43.806686299999996, -79.19435340000001.


In [24]:
fs_search_url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,neighborhood_latitude,neighborhood_longitude)
results = requests.get(fs_search_url).json()

In [169]:
#results

In [26]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the 'venue.categories' key used by explore results
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [27]:
venues = results['response']['venues']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng,categories.1
0,Kaycan - SCARBOROUGH,"[{'primary': True, 'name': 'Building', 'shortN...",43.806768,-79.197875,Building
1,Frito Lay,"[{'primary': True, 'name': 'Factory', 'shortNa...",43.803844,-79.194841,Factory
2,Alvin Curling Public School,"[{'primary': True, 'name': 'Elementary School'...",43.808683,-79.190103,Elementary School
3,Shell,"[{'primary': True, 'name': 'Gas Station', 'sho...",43.803227,-79.192414,Gas Station
4,Cascades (Metro Waste),"[{'primary': True, 'name': 'Building', 'shortN...",43.807494,-79.195073,Building
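Each `categories` cell above holds a list of dictionaries, and only the first entry's `name` is kept. The sketch below isolates that step on hypothetical records shaped like the output above. The `first_category` helper and the "Unlabelled venue" row are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical records shaped like the flattened 'categories' field shown above.
venues = pd.DataFrame({
    "name": ["Frito Lay", "Shell", "Unlabelled venue"],
    "categories": [
        [{"primary": True, "name": "Factory"}],
        [{"primary": True, "name": "Gas Station"}],
        [],  # a venue with no category at all
    ],
})

def first_category(categories):
    """Return the name of the first category, or None when the list is empty."""
    return categories[0]["name"] if categories else None

venues["category"] = venues["categories"].apply(first_category)
print(venues[["name", "category"]])
```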


In [28]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

30 venues were returned by Foursquare.


In [64]:
def getNearbyVenues(neighborhoods):
    
    radius=500
    venues_list=[]
    for (index,(postal_code,borough,neighborhood,lat,lng)) in neighborhoods.iterrows():
        print(borough)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            borough, 
            neighborhood,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                             'Neighborhood',
                              'Borough Latitude', 
                              'Borough Longitude', 
                              'Venue', 
                              'Venue Latitude', 
                              'Venue Longitude', 
                              'Venue Category']
    
    return(nearby_venues)
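The request URL in `getNearbyVenues` is assembled by string formatting. As a sketch of an alternative, the query string can be built from a dictionary with the standard library, which also takes care of escaping. The `build_explore_url` helper and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=30):
    """Build the explore URL for one location (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes each value (the comma in 'll' becomes %2C)
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials; coordinates of The Beaches from the table above.
url = build_explore_url(43.676357, -79.293031, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

Keeping the parameters in a dictionary also makes it easier to keep real credentials out of the notebook, for example by reading them from environment variables.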

In [62]:
#for (index,(postal_code,borough,neighborhood,latitude,longitude)) in neighborhoods[neighborhoods['Borough'].str.contains('Toronto')].iterrows():
#    print (borough)
    


In [65]:
toronto_venues = getNearbyVenues(neighborhoods[neighborhoods['Borough'].str.contains('Toronto')])

East Toronto
East Toronto
East Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
West Toronto
West Toronto
West Toronto
West Toronto
West Toronto
West Toronto
East Toronto


In [66]:
toronto_venues.head()

Unnamed: 0,Borough,Neighborhood,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,East Toronto,[The Beaches],43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,East Toronto,[The Beaches],43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,East Toronto,[The Beaches],43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
3,East Toronto,[The Beaches],43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,East Toronto,"[The Danforth West, Riverdale]",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [67]:
print(toronto_venues.shape)

(831, 8)


In [68]:
toronto_venues.groupby('Borough').count()

Borough,Neighborhood,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Toronto,108,108,108,108,108,108,108
Downtown Toronto,482,482,482,482,482,482,482
East Toronto,104,104,104,104,104,104,104
West Toronto,137,137,137,137,137,137,137


In [69]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 192 unique categories.


In [107]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'].apply(lambda x: ', '.join(x))
toronto_onehot['Borough'] = toronto_venues['Borough'] 

toronto_onehot = toronto_onehot.set_index('Neighborhood')
toronto_onehot.head()

Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio,Borough
The Beaches,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,East Toronto
The Beaches,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,East Toronto
The Beaches,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,East Toronto
The Beaches,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,East Toronto
"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,East Toronto


In [108]:
toronto_onehot.shape

(831, 192)

In [109]:
toronto_grouped = toronto_onehot.reset_index().groupby('Neighborhood').mean().reset_index()
toronto_grouped.head(5)

Unnamed: 0,Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.076923,0.076923,0.076923,0.153846,0.153846,0.153846,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [110]:
toronto_grouped.shape

(38, 192)
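Averaging the one-hot columns per neighborhood turns the 0/1 indicators into the share of venues in each category. The toy example below (hypothetical venues, not Foursquare data) makes the arithmetic visible: four venues with two coffee shops give a frequency of 0.5.

```python
import pandas as pd

# Hypothetical venues, only to illustrate the arithmetic.
toy_venues = pd.DataFrame({
    "Neighborhood": ["The Beaches"] * 4 + ["Studio District"] * 2,
    "Venue Category": ["Pub", "Coffee Shop", "Coffee Shop", "Trail",
                       "Bakery", "Coffee Shop"],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column is the fraction of venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very different venue counts remain comparable.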

In [111]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0           Steakhouse  0.10
1                 Café  0.07
2                Hotel  0.07
3  American Restaurant  0.07
4     Asian Restaurant  0.07


----Berczy Park----
                venue  freq
0        Cocktail Bar  0.10
1      Farmers Market  0.07
2  Seafood Restaurant  0.07
3              Bakery  0.07
4                Café  0.07


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.11
1            Café  0.11
2     Coffee Shop  0.11
3    Climbing Gym  0.05
4             Bar  0.05


----Business Reply Mail Processing Centre 969 Eastern----
           venue  freq
0    Yoga Studio  0.05
1    Pizza Place  0.05
2        Brewery  0.05
3     Skate Park  0.05
4  Burrito Place  0.05


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.15
1   Airport Service  0.15
2  Airport 

In [112]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [131]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st', 'nd' or 'rd' suffix left, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(5)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Steakhouse,American Restaurant,Hotel,Asian Restaurant,Café,Greek Restaurant,Opera House,Bar,Plaza,Speakeasy
1,Berczy Park,Cocktail Bar,Seafood Restaurant,Café,Bakery,Farmers Market,Bistro,Basketball Stadium,Beer Bar,Jazz Club,Belgian Restaurant
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot,Grocery Store,Italian Restaurant,Performing Arts Venue,Pet Store,Nightclub,Climbing Gym,Caribbean Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Fast Food Restaurant,Comic Shop,Park,Pizza Place,Moving Target,Butcher,Burrito Place,Recording Studio,Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Boat or Ferry,Sculpture Garden,Boutique,Airport Gate,Airport,Airport Food Court


In [132]:
neighborhoods_venues_sorted.shape

(38, 11)
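The ranking step can be checked on a single hand-made row. In this sketch the row values are hypothetical and `top_categories` is an illustrative stand-in for `return_most_common_venues`: it skips the leading `Neighborhood` entry, sorts the frequencies in descending order and keeps the first `n` labels.

```python
import pandas as pd

def top_categories(row, n):
    """Return the n category names with the highest frequency in one row."""
    scores = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    return scores.sort_values(ascending=False).index[:n].tolist()

# Hypothetical frequency row for one neighborhood.
row = pd.Series({
    "Neighborhood": "Toy Hood",
    "Bar": 0.05,
    "Coffee Shop": 0.40,
    "Park": 0.25,
    "Pub": 0.30,
})

top3 = top_categories(row, 3)
print(top3)  # ['Coffee Shop', 'Pub', 'Park']
```

When several categories share the same frequency (often 0.0 in sparse neighborhoods), their relative order depends on the sort and carries no meaning.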

# Cluster Neighborhoods

In [123]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 0, 3, 3, 0, 3, 3, 0], dtype=int32)

In [125]:
kmeans.labels_.shape

(38,)
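`kmeans.labels_` holds one label per input row, in the same order as the rows passed to `fit`. A small sketch with hypothetical frequency vectors shows that behaviour. The `n_init` value is set explicitly here as an assumption, since its default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency vectors: rows are neighborhoods, columns are venue categories.
X = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_  # labels[i] is the cluster of row i of X

print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful.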

In [145]:
neighborhoods[neighborhoods['Borough'].str.contains('Toronto')].shape

(38, 5)

In [154]:
toronto_merged = neighborhoods[neighborhoods['Borough'].str.contains('Toronto')].copy() # copy to avoid SettingWithCopyWarning
toronto_merged['Neighborhood'] = toronto_merged['Neighborhood'].apply(lambda x: ', '.join(x))

# add clustering labels: kmeans.labels_ follows the row order of toronto_grouped,
# so attach them to neighborhoods_venues_sorted (same order) before joining
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_

# join with neighborhoods_venues_sorted to add the top venues and cluster label for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood',lsuffix ='_l')

toronto_merged.head() # check the last columns!


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,Coffee Shop,Other Great Outdoors,Pub,Deli / Bodega,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,Discount Store,0
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Yoga Studio,Brewery,Bookstore,Restaurant,Juice Bar,Spa,Diner,0
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,Park,Sandwich Place,Sushi Restaurant,Food & Drink Shop,Light Rail Station,Brewery,Burger Joint,Burrito Place,Fast Food Restaurant,Fish & Chips Shop,0
43,M4M,East Toronto,Studio District,43.659526,-79.340923,Café,Coffee Shop,American Restaurant,Italian Restaurant,Bakery,Cheese Shop,Seafood Restaurant,Bookstore,Juice Bar,Fish Market,3
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,Bus Line,Park,Dim Sum Restaurant,Swim School,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,2


In [155]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
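In the map cell, `x` and `ys` are only used for their length, so the palette could also be built directly from the number of clusters. This is a sketch of that simplification, assuming the same `rainbow` colormap. The `color_for` mapping is a hypothetical name.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample the colormap at kclusters evenly spaced points and convert RGBA to hex.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# One colour per cluster label (hypothetical mapping name).
color_for = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_for)
```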

# Examine Clusters

## Cluster 0 : Drunk zone (Pub, Coffee and bar)

In [159]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
37,East Toronto,Coffee Shop,Other Great Outdoors,Pub,Deli / Bodega,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,Discount Store,0
41,East Toronto,Greek Restaurant,Italian Restaurant,Ice Cream Shop,Yoga Studio,Brewery,Bookstore,Restaurant,Juice Bar,Spa,Diner,0
42,East Toronto,Park,Sandwich Place,Sushi Restaurant,Food & Drink Shop,Light Rail Station,Brewery,Burger Joint,Burrito Place,Fast Food Restaurant,Fish & Chips Shop,0
45,Central Toronto,Hotel,Park,Gym,Breakfast Spot,Clothing Store,Food & Drink Shop,Sandwich Place,Burger Joint,Dim Sum Restaurant,Eastern European Restaurant,0
46,Central Toronto,Coffee Shop,Sporting Goods Shop,Park,Clothing Store,Chinese Restaurant,Miscellaneous Shop,Dessert Shop,Rental Car Location,Mexican Restaurant,Salon / Barbershop,0
47,Central Toronto,Sandwich Place,Dessert Shop,Pizza Place,Seafood Restaurant,Italian Restaurant,Café,Sushi Restaurant,Coffee Shop,Indian Restaurant,Park,0
49,Central Toronto,Coffee Shop,Pub,American Restaurant,Bagel Shop,Sports Bar,Supermarket,Sushi Restaurant,Fried Chicken Joint,Light Rail Station,Pizza Place,0
52,Downtown Toronto,Gay Bar,Burger Joint,Adult Boutique,Coffee Shop,Salon / Barbershop,Bubble Tea Shop,Restaurant,Ramen Restaurant,Pub,Pizza Place,0
57,Downtown Toronto,Coffee Shop,Italian Restaurant,Spa,Café,Art Museum,Sandwich Place,Seafood Restaurant,Ramen Restaurant,Bubble Tea Shop,Bar,0
65,Central Toronto,Coffee Shop,Sandwich Place,Café,Pizza Place,BBQ Joint,Park,Vegetarian / Vegan Restaurant,Indian Restaurant,Pharmacy,Cosmetics Shop,0


## Cluster 1 : Family zone (Parks and playgrounds)

In [161]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
48,Central Toronto,Park,Playground,Tennis Court,Yoga Studio,Deli / Bodega,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,1
50,Downtown Toronto,Park,Playground,Trail,Yoga Studio,Deli / Bodega,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,1
64,Central Toronto,Park,Trail,Jewelry Store,Sushi Restaurant,Yoga Studio,Dessert Shop,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,1


## Cluster 2 : Far far away zone (bus line comes first)

In [163]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
44,Central Toronto,Bus Line,Park,Dim Sum Restaurant,Swim School,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,2


## Cluster 3 : Hungry zone (Cafe, restaurants, chocolate and bakery)

In [164]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
43,East Toronto,Café,Coffee Shop,American Restaurant,Italian Restaurant,Bakery,Cheese Shop,Seafood Restaurant,Bookstore,Juice Bar,Fish Market,3
51,Downtown Toronto,Restaurant,Coffee Shop,Italian Restaurant,Café,Gastropub,Market,General Entertainment,Diner,Bank,Bakery,3
53,Downtown Toronto,Coffee Shop,Bakery,Park,Breakfast Spot,Café,Pub,Mexican Restaurant,Performing Arts Venue,Restaurant,Chocolate Shop,3
54,Downtown Toronto,Café,Coffee Shop,Clothing Store,Shopping Mall,Burrito Place,Japanese Restaurant,Sandwich Place,Diner,Ramen Restaurant,Burger Joint,3
55,Downtown Toronto,Gastropub,Italian Restaurant,Restaurant,Hotel,Japanese Restaurant,Coffee Shop,Gym,Creperie,Café,Poke Place,3
56,Downtown Toronto,Cocktail Bar,Seafood Restaurant,Café,Bakery,Farmers Market,Bistro,Basketball Stadium,Beer Bar,Jazz Club,Belgian Restaurant,3
58,Downtown Toronto,Steakhouse,American Restaurant,Hotel,Asian Restaurant,Café,Greek Restaurant,Opera House,Bar,Plaza,Speakeasy,3
59,Downtown Toronto,Hotel,Park,Café,Brewery,Sporting Goods Shop,Ice Cream Shop,Basketball Stadium,Skating Rink,Italian Restaurant,Japanese Restaurant,3
60,Downtown Toronto,Coffee Shop,Café,Deli / Bodega,Restaurant,Fried Chicken Joint,Pub,Concert Hall,Japanese Restaurant,Bar,Bakery,3
61,Downtown Toronto,Café,Hotel,Restaurant,Coffee Shop,Gastropub,Deli / Bodega,Museum,Salad Place,Japanese Restaurant,Pub,3


## Cluster 4 : Equilibrium zone (it could have belonged to any of the preceding clusters)

In [167]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
63,Central Toronto,Garden,Home Service,Yoga Studio,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dog Run,Discount Store,4
