
 Notebook to explore and cluster the neighborhoods in Toronto.

### **1. Web Scraping**


Import the required libraries.


In [148]:
import pandas as pd #data analysis and manipulation tool
import numpy as np #numerical computation tool
import matplotlib.pyplot as plt #python plotting library


Next, store the address of the Wikipedia page we are going to scrape in a variable named ```url```.

In [149]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

Pandas ```read_html``` parses the HTML tables on a page into a list of dataframes. It works well on many Wikipedia pages because their tables are not complicated.


In [150]:
table1 = pd.read_html(url, match='Borough') # use the match parameter to return only the specific table we need
print(f'Total tables in the url: {len(table1)}')

Total tables in the url: 1


In [151]:
df = table1[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


1. The dataframe should consist of three columns: 
    - PostalCode, 
    - Borough, and 
    - Neighborhood

2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    - More than one neighborhood can exist in one postal code area. 
        - For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. 
        - These two rows will be combined into one row with the neighborhoods separated with a comma.
    - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

3. Clean the Notebook and add Markdown cells to explain the work and any assumptions made.
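The scraped table shown above already lists the combined neighbourhoods for each postal code, so the next cell only filters on the borough. If the source table still had one row per neighbourhood, the remaining requirements could be handled roughly as follows. This is a minimal sketch on a small hand-made frame (`raw` and `clean` are hypothetical names, and the sample rows are made up to exercise each rule):

```python
import pandas as pd

# Hypothetical sample with the same columns as the scraped table.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows without an assigned borough.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. A "Not assigned" neighbourhood falls back to the borough name.
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3. Combine neighbourhoods that share a postal code into one comma-separated row.
clean = (
    clean.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)

# 4. Rename to the column names the assignment asks for.
clean = clean.rename(columns={"Postal Code": "PostalCode", "Neighbourhood": "Neighborhood"})
print(clean)
```

The notebook itself keeps the original column names (`Postal Code`, `Neighbourhood`), so step 4 is optional there.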

In [152]:
# Create a boolean mask for the rows whose borough is not assigned.
not_assigned = df['Borough'] == 'Not assigned'

# Filter out the "Not assigned" boroughs.
df_BA = df[~not_assigned]
df_BA.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In the last cell of this section, use the ```.shape``` attribute to print the number of rows in the dataframe.

In [153]:
print(f'The dataframe has {df_BA.shape[0]} rows')


The dataframe has 103 rows


### **2.  Get the geographical coordinates of the neighborhoods**

Install the ```wget``` module to download files from a URL.

In [154]:
!pip install wget

import wget



Download the file from the given link.

In [155]:
geo_url = r'https://cocl.us/Geospatial_data'
filename = wget.download(geo_url)

In [156]:
df_geo = pd.read_csv(filename)
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [157]:
df_BA.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [158]:
df_final = pd.merge(df_BA, df_geo, on="Postal Code")
df_final.head(11)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [159]:
df_final.shape

(103, 5)
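`df_BA` has 103 rows and so does `df_final`, which suggests the default inner join did not drop any postal code. One way to make that check explicit is sketched below on two tiny hypothetical frames (`left` and `right` stand in for `df_BA` and `df_geo`); `how`, `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical miniature versions of df_BA and df_geo.
left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every row of `left`,
# validate raises if a postal code repeats on either side,
# indicator adds a `_merge` column recording where each row came from.
merged = pd.merge(left, right, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Postal codes for which no coordinates were found.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that every postal code received coordinates.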

### **3. Explore and cluster the neighborhoods in Toronto.**

In [160]:
print('The dataframe has {} boroughs'.format(
        len(df_final['Borough'].unique())
    )
)

The dataframe has 10 boroughs


**Use the geopy library to get the latitude and longitude values of Toronto City**

In [161]:
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

address = 'Toronto , Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.
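`geolocator.geocode` returns `None` when Nominatim finds no match, in which case `location.latitude` would raise an `AttributeError`. A hedged sketch of a small guard follows. It is written against any geocoding callable so it can be tried without network access; `coordinates_or_raise` and `fake_geocode` are hypothetical names, not part of geopy.

```python
from types import SimpleNamespace

def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude

# Stand-in for geolocator.geocode that knows a single address.
def fake_geocode(address):
    known = {
        "Toronto, Ontario": SimpleNamespace(latitude=43.6534817, longitude=-79.3839347),
    }
    return known.get(address)

print(coordinates_or_raise(fake_geocode, "Toronto, Ontario"))
```

In the notebook this would be called as `coordinates_or_raise(geolocator.geocode, address)`.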


**Create a map of Toronto with neighborhoods on top**

In [162]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_final['Latitude'], df_final['Longitude'], df_final['Borough'], df_final['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**We segment and cluster only the neighborhoods whose borough name contains "East Toronto"**

In [163]:
east_toronto_data = df_final[df_final['Borough'].str.contains('East Toronto')].reset_index(drop=True)
east_toronto_data.head(6)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558


In [164]:
address = 'East Toronto, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of East Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of East Toronto are 43.626243, -79.396962.


In [165]:
# create map of East Toronto using latitude and longitude values
map_east_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(east_toronto_data['Latitude'], east_toronto_data['Longitude'], east_toronto_data['Borough'], east_toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_east_toronto)  
    
map_east_toronto

**Define Foursquare Credentials and Version**

In [166]:
#@title Hidden Cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [167]:
east_toronto_data.loc[0, 'Neighbourhood']

'The Beaches'

**Get the neighbourhood's latitude and longitude**

In [168]:
neighbourhood_latitude = east_toronto_data.loc[0, 'Latitude'] # neighbourhood latitude value
neighbourhood_longitude = east_toronto_data.loc[0, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = east_toronto_data.loc[0, 'Neighbourhood'] # neighbourhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


**Now, let's get the top 100 venues within a radius of 500 meters of The Beaches**

In [169]:
LIMIT = 100 
radius = 500 

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6004ab1914532e7aaf7a8f75'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bd461bc77b29c74a07d9282-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/hikingtrail_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d159941735',
         'name': 'Trail',
         'pluralName': 'Trails',
         'primary': True,
         'shortName': 'Trail'}],
       'id': '4bd461bc77b29c74a07d9282',
       'location': {'address': 'Glen Manor',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'Queen St.',
        'distance': 89,
        'formattedAddress': ['Glen Manor (Queen St.)', 'Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67682
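The request URL in the cell above is assembled with `str.format`. As an alternative sketch, the standard library's `urllib.parse.urlencode` can build the same kind of query string from a dictionary and takes care of escaping. `build_explore_url` is a hypothetical helper, and the credentials passed to it below are dummy strings.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return a Foursquare v2 explore URL for the given point (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude and longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

example_url = build_explore_url("ID", "SECRET", "20180605", 43.676357, -79.293031, 500, 100)
print(example_url)
```

Note that `urlencode` percent-encodes the comma inside `ll` as `%2C`, which is a standard URL escape.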

In [170]:
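# function that extracts the category of the venue
def get_category_type(row):
    # the column is called 'categories' or 'venue.categories' depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may come back without any category
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']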

**Clean the JSON and structure it into a pandas dataframe**

In [171]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Glen Manor Ravine,Trail,43.676821,-79.293942
1,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,Grover Pub and Grub,Pub,43.679181,-79.297215
3,Upper Beaches,Neighborhood,43.680563,-79.292869


In [172]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


**Function to repeat the same process for all the neighborhoods in East Toronto**

In [173]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
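`getNearbyVenues` indexes `v['venue']['categories'][0]`, which would raise an `IndexError` for a venue that comes back without any category. A minimal sketch of a more defensive row builder is shown below. `venue_rows` and the sample `items` list are hypothetical and only mimic the shape of the response shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hypothetical items: the second venue has an empty category list.
items = [
    {"venue": {"name": "Glen Manor Ravine",
               "location": {"lat": 43.676821, "lng": -79.293942},
               "categories": [{"name": "Trail"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.68, "lng": -79.29},
               "categories": []}},
]
rows = venue_rows("The Beaches", 43.676357, -79.293031, items)
print(rows)
```

The tuples have the same seven fields, in the same order, as the columns assigned at the end of `getNearbyVenues`.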

In [174]:
east_toronto_venues = getNearbyVenues(names=east_toronto_data['Neighbourhood'],
                                   latitudes=east_toronto_data['Latitude'],
                                   longitudes=east_toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto


In [175]:
print(east_toronto_venues.shape)
east_toronto_venues.head()

(118, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


**Number of venues returned for each neighborhood**

In [176]:
east_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",16,16,16,16,16,16
"India Bazaar, The Beaches West",19,19,19,19,19,19
Studio District,37,37,37,37,37,37
The Beaches,4,4,4,4,4,4
"The Danforth West, Riverdale",42,42,42,42,42,42


**Unique categories curated from all the returned venues**

In [177]:
print('There are {} unique categories.'.format(len(east_toronto_venues['Venue Category'].unique())))

There are 66 unique categories.


**Analyze Each Neighborhood**

In [178]:
# one hot encoding
east_toronto_onehot = pd.get_dummies(east_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
east_toronto_onehot['Neighborhood'] = east_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [east_toronto_onehot.columns[-1]] + list(east_toronto_onehot.columns[:-1])
east_toronto_onehot = east_toronto_onehot[fixed_columns]

east_toronto_onehot.head()

Unnamed: 0,Yoga Studio,American Restaurant,Bakery,Bank,Bar,Bookstore,Brewery,Bubble Tea Shop,Burrito Place,Café,Caribbean Restaurant,Cheese Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Coworking Space,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food & Drink Shop,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Juice Bar,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Middle Eastern Restaurant,Movie Theater,Neighborhood,Park,Pet Store,Pharmacy,Pizza Place,Pub,Recording Studio,Restaurant,Sandwich Place,Seafood Restaurant,Skate Park,Spa,Stationery Store,Steakhouse,Sushi Restaurant,Thai Restaurant,Tibetan Restaurant,Trail
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,The Beaches,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [179]:
east_toronto_onehot.shape

(118, 66)
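The shape is (118, 66) even though a `Neighborhood` column was added to 66 category columns. One of the Foursquare venue categories is itself called `Neighborhood` (see "Upper Beaches" above), so the assignment overwrites that dummy column instead of appending a new one, and the reordering step then moves `Yoga Studio` to the front. The sketch below reproduces the name collision on a hypothetical three-row frame and shows one possible way around it, storing the label under a different, hypothetical column name `Area`.

```python
import pandas as pd

# Hypothetical venues; one of them has the category "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["The Beaches", "The Beaches", "Studio District"],
    "Venue Category": ["Trail", "Neighborhood", "Bakery"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
n_categories = onehot.shape[1]  # one dummy column per category, one of them named "Neighborhood"

# Assigning under the name "Neighborhood" would replace the dummy column,
# so the neighbourhood label is inserted at position 0 under another name.
onehot.insert(0, "Area", venues["Neighborhood"])

print(onehot.shape)
print(list(onehot.columns))
```

With this approach the later steps would group by `Area` rather than `Neighborhood`, and the "Neighborhood" venue category would keep its own frequency column.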

**Group rows by neighborhood, taking the mean of the frequency of occurrence of each category**

In [180]:
east_toronto_grouped = east_toronto_onehot.groupby('Neighborhood').mean().reset_index()
east_toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,American Restaurant,Bakery,Bank,Bar,Bookstore,Brewery,Bubble Tea Shop,Burrito Place,Café,Caribbean Restaurant,Cheese Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Coworking Space,Dessert Shop,Diner,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food & Drink Shop,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Juice Bar,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Middle Eastern Restaurant,Movie Theater,Park,Pet Store,Pharmacy,Pizza Place,Pub,Recording Studio,Restaurant,Sandwich Place,Seafood Restaurant,Skate Park,Spa,Stationery Store,Steakhouse,Sushi Restaurant,Thai Restaurant,Tibetan Restaurant,Trail
0,"Business reply mail Processing Centre, South C...",0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"India Bazaar, The Beaches West",0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.052632,0.052632,0.0,0.052632,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0
2,Studio District,0.027027,0.054054,0.054054,0.027027,0.027027,0.027027,0.054054,0.0,0.0,0.054054,0.0,0.027027,0.027027,0.081081,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.027027,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.027027,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0
3,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25
4,"The Danforth West, Riverdale",0.02381,0.02381,0.02381,0.0,0.0,0.02381,0.02381,0.02381,0.0,0.02381,0.02381,0.0,0.0,0.095238,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.047619,0.0,0.0,0.0,0.0,0.190476,0.02381,0.0,0.0,0.0,0.047619,0.02381,0.071429,0.02381,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.02381


**Print each neighborhood along with the top 5 most common venues**

In [181]:
num_top_venues = 5

for hood in east_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = east_toronto_grouped[east_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0  Light Rail Station  0.12
1         Yoga Studio  0.06
2          Skate Park  0.06
3      Farmers Market  0.06
4                Park  0.06


----India Bazaar, The Beaches West----
                  venue  freq
0  Fast Food Restaurant  0.11
1        Sandwich Place  0.05
2     Food & Drink Shop  0.05
3          Liquor Store  0.05
4         Movie Theater  0.05


----Studio District----
                 venue  freq
0          Coffee Shop  0.08
1               Bakery  0.05
2            Gastropub  0.05
3  American Restaurant  0.05
4              Brewery  0.05


----The Beaches----
               venue  freq
0              Trail  0.25
1  Health Food Store  0.25
2                Pub  0.25
3               Park  0.00
4                Gym  0.00


----The Danforth West, Riverdale----
                    venue  freq
0        Greek Restaurant  0.19
1             Coffee Shop  0.10

**Put this into a pandas dataframe**

In [182]:
#function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [183]:
#create the new dataframe and display the top 10 venues for each neighborhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = east_toronto_grouped['Neighborhood']

for ind in np.arange(east_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(east_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Business reply mail Processing Centre, South C...",Light Rail Station,Yoga Studio,Fast Food Restaurant,Park,Gym / Fitness Center,Farmers Market,Pizza Place,Burrito Place,Restaurant,Recording Studio
1,"India Bazaar, The Beaches West",Fast Food Restaurant,Pizza Place,Pub,Italian Restaurant,Gym,Liquor Store,Food & Drink Shop,Burrito Place,Movie Theater,Park
2,Studio District,Coffee Shop,American Restaurant,Bakery,Gastropub,Brewery,Café,Gay Bar,Convenience Store,Ice Cream Shop,Gym / Fitness Center
3,The Beaches,Trail,Pub,Health Food Store,Fast Food Restaurant,Cosmetics Shop,Coworking Space,Dessert Shop,Diner,Farmers Market,Fish & Chips Shop
4,"The Danforth West, Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Dessert Shop,Indian Restaurant,Grocery Store,Tibetan Restaurant,Fruit & Vegetable Store
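The `try`/`except` in the cell above yields correct column labels for the ten venues used here, but the same logic would label a 21st column as `21th`. If more columns were ever needed, a small helper along these lines could be used instead; `ordinal` and `column_names` are hypothetical names, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

column_names = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(column_names[:4])
```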


**Cluster Neighborhoods**

In [184]:
#Run k-means to cluster the neighborhoods into 5 clusters

# set number of clusters
kclusters = 5

east_toronto_grouped_clustering = east_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(east_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 1, 4, 2, 0], dtype=int32)
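With five neighbourhoods and `kclusters = 5`, every neighbourhood ends up in a cluster of its own, as the five distinct labels above show. A minimal sketch on made-up 2-D points (not the notebook's data) illustrates the effect and what a smaller `k` does. `n_init=10` is passed explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up points: two tight pairs and one isolated point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])

# k equal to the number of samples: each point becomes its own cluster.
labels_k5 = KMeans(n_clusters=5, n_init=10, random_state=0).fit(points).labels_

# k = 3: each tight pair shares a label and the isolated point stands alone.
labels_k3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points).labels_

print(len(set(labels_k5)), labels_k3)
```

A value of `kclusters` below the number of neighbourhoods would be needed for the clustering to group any of them together.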

In [185]:
#east_toronto_merged.rename(columns={"Neighbourhood": "Neighborhood"})
#east_toronto_merged.head()

In [186]:
#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# copy east_toronto_data with the column renamed, leaving the original dataframe untouched
east_toronto_merged = east_toronto_data.rename(columns={"Neighbourhood": "Neighborhood"})

# join the sorted venues onto the East Toronto data so each neighborhood keeps its latitude/longitude
east_toronto_merged = east_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

east_toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Trail,Pub,Health Food Store,Fast Food Restaurant,Cosmetics Shop,Coworking Space,Dessert Shop,Diner,Farmers Market,Fish & Chips Shop
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Dessert Shop,Indian Restaurant,Grocery Store,Tibetan Restaurant,Fruit & Vegetable Store
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,1,Fast Food Restaurant,Pizza Place,Pub,Italian Restaurant,Gym,Liquor Store,Food & Drink Shop,Burrito Place,Movie Theater,Park
3,M4M,East Toronto,Studio District,43.659526,-79.340923,4,Coffee Shop,American Restaurant,Bakery,Gastropub,Brewery,Café,Gay Bar,Convenience Store,Ice Cream Shop,Gym / Fitness Center
4,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,3,Light Rail Station,Yoga Studio,Fast Food Restaurant,Park,Gym / Fitness Center,Farmers Market,Pizza Place,Burrito Place,Restaurant,Recording Studio


In [188]:
#visualize the resulting clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(east_toronto_merged['Latitude'], east_toronto_merged['Longitude'], east_toronto_merged['Neighborhood'], east_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Examine Clusters**

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.

In [189]:
#Cluster 1

east_toronto_merged.loc[east_toronto_merged['Cluster Labels'] == 0, east_toronto_merged.columns[[1] + list(range(5, east_toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Dessert Shop,Indian Restaurant,Grocery Store,Tibetan Restaurant,Fruit & Vegetable Store


In [190]:
#Cluster 2

east_toronto_merged.loc[east_toronto_merged['Cluster Labels'] == 1, east_toronto_merged.columns[[1] + list(range(5, east_toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,East Toronto,1,Fast Food Restaurant,Pizza Place,Pub,Italian Restaurant,Gym,Liquor Store,Food & Drink Shop,Burrito Place,Movie Theater,Park


In [191]:
#Cluster 3

east_toronto_merged.loc[east_toronto_merged['Cluster Labels'] == 2, east_toronto_merged.columns[[1] + list(range(5, east_toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,2,Trail,Pub,Health Food Store,Fast Food Restaurant,Cosmetics Shop,Coworking Space,Dessert Shop,Diner,Farmers Market,Fish & Chips Shop


In [192]:
#Cluster 4

east_toronto_merged.loc[east_toronto_merged['Cluster Labels'] == 3, east_toronto_merged.columns[[1] + list(range(5, east_toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,East Toronto,3,Light Rail Station,Yoga Studio,Fast Food Restaurant,Park,Gym / Fitness Center,Farmers Market,Pizza Place,Burrito Place,Restaurant,Recording Studio


In [193]:
#Cluster 5

east_toronto_merged.loc[east_toronto_merged['Cluster Labels'] == 4, east_toronto_merged.columns[[1] + list(range(5, east_toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,East Toronto,4,Coffee Shop,American Restaurant,Bakery,Gastropub,Brewery,Café,Gay Bar,Convenience Store,Ice Cream Shop,Gym / Fitness Center
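
The five cells above repeat the same selection with a different label each time. As a sketch, the same inspection could be written as a single loop over `groupby`; the small `merged` frame below is a hypothetical stand-in for `east_toronto_merged` with made-up cluster labels.

```python
import pandas as pd

# Hypothetical stand-in for east_toronto_merged with only a few columns.
merged = pd.DataFrame({
    "Neighborhood": ["The Beaches", "Studio District", "Riverdale"],
    "Cluster Labels": [1, 0, 1],
    "1st Most Common Venue": ["Trail", "Coffee Shop", "Greek Restaurant"],
})

# groupby yields (label, sub-frame) pairs in ascending label order.
cluster_sizes = {}
for label, members in merged.groupby("Cluster Labels"):
    cluster_sizes[int(label)] = len(members)
    print(f"Cluster {label + 1}")
    print(members.drop(columns="Cluster Labels").to_string(index=False))
```

Keeping the `Neighborhood` column in the printed rows makes it easier to see which neighbourhood each cluster describes.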
