# Coursera Peer-graded Assignment
## Segment and Cluster Toronto Neighborhoods

### Step 1: Get postal code data from wikipedia, clean the data and get into a pandas dataframe

# CODE FOR STEP 3 OF THE ASSIGNMENT.....SCROLL DOWN TO THE CODE

Import libraries to be used 


In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup   # using BeautifulSoup to parse html for data needed for this lab
from urllib.request import urlopen   # library to read url web page
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Read the wikipedia web page that contains a table of Toronto posstal codes
Parse the page with BeautifulSoup

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' #URL of the wikipedia page to read
f = urlopen(url)   # open the url
wiki_html = f.read()  # read the data from the page
soup = BeautifulSoup(wiki_html, 'html.parser')  # parse data with BeautifulSoup

Extract the table of postal codes and put into a pandas data frame

In [3]:
table = soup.find_all('table')[0]   #  Extract just the postal code table from the html page
dfs = pd.read_html(str(table))    #pandas gets a list of dataframes for the postal codes
df = dfs[0]   # Get postal codes data from the list into a dataframe

Now clean up the data frame 
drop any row that does not have a Borough name assigned.
replace any Neighbourhood that is not assigned with thet the Borough name

In [11]:
df_x = df[df.Borough != 'Not assigned']    # Get rid of the Borough's that are not assigned

df_y = df_x[['Postcode','Borough','Neighbourhood']]


In [10]:

# Get rid of the Borough's that are not assigned

df_x = df[df.Borough != 'Not assigned'].copy()

# replace any not assigned Neighbourhoods with Borough

df_x['Neighbourhood'].loc[df_x['Neighbourhood'] == 'Not assigned'] = df_x['Borough']

df_x.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Finally, group the data by postal code, combine neighbourhoods within postal codes separating them with commas

In [12]:

post_codes = df_x.groupby(['Postcode','Borough'],as_index=False)['Neighbourhood'].apply(','.join).reset_index()  # group postal codes, combine neighborhoods 
post_codes.rename(columns={0:'Neighbourhood'}, inplace=True)

Display the shape of the final data frame

In [13]:
post_codes.shape    #  display the shape of the resulting data frame


(103, 3)

In [14]:
post_codes.head()


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


# Coursera Peer-graded Assignment
## Segment and Cluster Toronto Neighborhoods

### Step 2: Get latitude and longitude for postal codes

In [15]:
import geocoder

In [16]:
lat_lng_coords = None
latitude = []    # create a list for the latitude
longitude = []  # create a list for the longitude

for index, row in post_codes.iterrows():
#    print(row['Postcode'])
    while(lat_lng_coords == None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(row['Postcode']))
        lat_lng_coords = g.latlng
    latitude.append(lat_lng_coords[0]) 
    longitude.append(lat_lng_coords[1])
    lat_lng_coords = None

post_codes['Latitude'] = latitude
post_codes['Longitude'] = longitude
print('coordinates have been loaded')



coordinates have been loaded


In [17]:
post_codes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


# STEP 3 STARTS HERE

# Coursera Peer-graded Assignment
## Segment and Cluster Toronto Neighborhoods

# Step 3: Segment and cluster neighborhoods

####. Import libraries for map rendering and k-means clustering and foursquare venue information

In [18]:
#  Import libraries for clustering and map visualization
import folium   # map rendering library
from sklearn.cluster import KMeans   #  library to cluster via k-means
import requests # library to handle requests

#  ----------------------DEFINE FUNCTIONS----------------------------------------------------
####  Define functions that will be used in this segment of the exercise|

In [19]:
#
# function that extracts the category of the venue
#

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [20]:
#
#  function to get nearby venues in all the neighborhoods
#

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
#
#  function to sort the venues in descending order.
#

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

####   END OF FUNCTIONS
# -------------------------------END OF FUNCTION DEF's---------------------------------------

####  get coordinates for Toronto, Ontario, Canada to set map center

In [22]:
address = 'Toronto, Ontario'
lat_lng_coords = None

while(lat_lng_coords == None):
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(row['Postcode']))
    lat_lng_coords = g.latlng
    tor_lat = lat_lng_coords[0] 
    tor_lng = lat_lng_coords[1]
    
print('The geograpical coordinate of Toronto are {}, {}.'.format(tor_lat, tor_lng))

The geograpical coordinate of Toronto are 43.71174000000008, -79.57918134599998.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [24]:
#
# create map of Toronto using latitude and longitude values obtained above
#

map_toronto = folium.Map(location=[tor_lat, tor_lng], zoom_start=10)

#
# add markers to map for neighborhoods/postal codes
#

for lat, lng, borough, neighborhood in zip(post_codes['Latitude'], post_codes['Longitude'], post_codes['Borough'], post_codes['Neighbourhood']):
    label = '{}, {}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto



#### Now utilize Foursquare API to explore the neighborhoods.

#### Define Foursquare Credentials and Version

In [25]:
CLIENT_ID = 'KZTP21MDEVOQIJCJPYIIM2LUL1BYATR4CZGVUHVI1N44NHNU' # your Foursquare ID
CLIENT_SECRET = 'AFXBPNNG1TC0BODY5KBSUWC4LL113MGGS5GSZEQSCHHUFYMH' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KZTP21MDEVOQIJCJPYIIM2LUL1BYATR4CZGVUHVI1N44NHNU
CLIENT_SECRET:AFXBPNNG1TC0BODY5KBSUWC4LL113MGGS5GSZEQSCHHUFYMH


#### explore the one neighborhood in our dataframe to make sure everything is set up correctly.

In [26]:
post_codes.loc[10, 'Postcode']
neighborhood_latitude = post_codes.loc[10, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = post_codes.loc[10, 'Longitude'] # neighborhood longitude value

neighborhood_name =post_codes.loc[10, 'Postcode'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of M1P are 43.759975000000054, -79.26897402899993.


#### Now, let's get the top 100 venues that are in that neighborhood within a radius of 500 meters.

First create the GET request URL for foursquare. 

In [27]:

limit = 100   #Limit to 100 venues
radius = 500  # 500 meters from center point
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    limit)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=KZTP21MDEVOQIJCJPYIIM2LUL1BYATR4CZGVUHVI1N44NHNU&client_secret=AFXBPNNG1TC0BODY5KBSUWC4LL113MGGS5GSZEQSCHHUFYMH&v=20180605&ll=43.759975000000054,-79.26897402899993&radius=500&limit=100'

#### Get results from foursquare,  in for form of a json file

In [28]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d45b85f6e46500038d98521'},
  'headerLocation': 'Dorset Park',
  'headerFullLocation': 'Dorset Park, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.764475004500056,
    'lng': -79.26275507085488},
   'sw': {'lat': 43.75547499550005, 'lng': -79.27519298714499}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bc88e208b7c9c743b5538cf',
       'name': 'Calabria Bakery',
       'location': {'address': '1772 Midland Avenue',
        'lat': 43.7616699583205,
        'lng': -79.26950674263456,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.7616699583205,
          'lng': -79.26950674263456}],
        'distance': 193,
        'cc': 'CA',
        'city'


####   Now that that worked,   put into a pandas dataframe and get just the columns of interest



In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Calabria Bakery,Bakery,43.76167,-79.269507
1,Aurora Fine China & Crystal,Gift Shop,43.757156,-79.267361
2,Shiro Sushi,Japanese Restaurant,43.756228,-79.266965


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


#### Now that that the initial setup works,  let's get venues for all postal codes/neighborhoods

In [31]:
toronto_venues = getNearbyVenues(names=post_codes['Postcode'],
                                   latitudes=post_codes['Latitude'],
                                   longitudes=post_codes['Longitude']
                                  )

M1B
M1C
M1E
M1G
M1H
M1J
M1K
M1L
M1M
M1N
M1P
M1R
M1S
M1T
M1V
M1W
M1X
M2H
M2J
M2K
M2L
M2M
M2N
M2P
M2R
M3A
M3B
M3C
M3H
M3J
M3K
M3L
M3M
M3N
M4A
M4B
M4C
M4E
M4G
M4H
M4J
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5M
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6A
M6B
M6C
M6E
M6G
M6H
M6J
M6K
M6L
M6M
M6N
M6P
M6R
M6S
M7A
M7R
M7Y
M8V
M8W
M8X
M8Y
M8Z
M9A
M9B
M9C
M9L
M9M
M9N
M9P
M9R
M9V
M9W


In [32]:
toronto_venues.head()

Unnamed: 0,Postcode,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M1B,43.811525,-79.195517,Canadian Appliance Source Whitby,43.808353,-79.191331,Home Service
1,M1C,43.78573,-79.15875,Royal Canadian Legion,43.782533,-79.163085,Bar
2,M1E,43.76569,-79.175256,Homestead Roofing Repair,43.76514,-79.178663,Construction & Landscaping
3,M1E,43.76569,-79.175256,Heron Park Community Centre,43.768867,-79.176958,Gym / Fitness Center
4,M1E,43.76569,-79.175256,Heron Park,43.769327,-79.177201,Park


In [33]:
# How many venues were returned by neighborhoor

toronto_venues.groupby('Postcode').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M1B,1,1,1,1,1,1
M1C,1,1,1,1,1,1
M1E,4,4,4,4,4,4
M1G,5,5,5,5,5,5
M1H,1,1,1,1,1,1
M1J,4,4,4,4,4,4
M1K,6,6,6,6,6,6
M1L,11,11,11,11,11,11
M1M,9,9,9,9,9,9
M1N,6,6,6,6,6,6


In [34]:
# How many unique catagories were returned

print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 261 uniques categories.


## Analyze Each Neighborhood

#### encode the venue types to set up for analysis

In [35]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Postcode'] = toronto_venues['Postcode'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(10)

Unnamed: 0,Postcode,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,M1G,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,M1G,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,M1G,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,M1G,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
#. Check what the encoded dataframe looks like.

toronto_onehot.shape

(2470, 262)

In [40]:
#. group all venues by postal code

toronto_grouped = toronto_onehot.groupby('Postcode').sum().reset_index()
toronto_grouped

Unnamed: 0,Postcode,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1G,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1H,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,M1J,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
6,M1K,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,M1L,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,M1M,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,M1N,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
toronto_grouped.shape

(102, 262)

In [42]:
#   print top 5 venues for each postal code 

num_top_venues = 5

for hood in toronto_grouped['Postcode']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Postcode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B----
                       venue  freq
0               Home Service   1.0
1   Mediterranean Restaurant   0.0
2              Metro Station   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----M1C----
                       venue  freq
0                        Bar   1.0
1               Neighborhood   0.0
2              Metro Station   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----M1E----
                        venue  freq
0  Construction & Landscaping   1.0
1                    Bus Stop   1.0
2                        Park   1.0
3        Gym / Fitness Center   1.0
4           Afghan Restaurant   0.0


----M1G----
                venue  freq
0  Mexican Restaurant   1.0
1                Park   1.0
2         Coffee Shop   1.0
3    Business Service   1.0
4   Korean Restaurant   1.0


----M1H----
                       venue  freq
0                 Playground   1.0
1   Mediterranean Restaurant   0.0
2              Metro St

                venue  freq
0        Dessert Shop   3.0
1      Sandwich Place   2.0
2         Pizza Place   2.0
3  Italian Restaurant   2.0
4         Coffee Shop   2.0


----M4T----
               venue  freq
0  Convenience Store   1.0
1                Gym   1.0
2       Tennis Court   1.0
3         Restaurant   1.0
4         Playground   1.0


----M4V----
                 venue  freq
0   Light Rail Station   2.0
1          Coffee Shop   2.0
2          Supermarket   1.0
3         Liquor Store   1.0
4  Monument / Landmark   0.0


----M4W----
                       venue  freq
0                 Playground   1.0
1                       Park   1.0
2                       Bank   1.0
3                   Building   1.0
4  Middle Eastern Restaurant   0.0


----M4X----
         venue  freq
0  Coffee Shop   4.0
1  Pizza Place   2.0
2       Bakery   2.0
3         Café   2.0
4   Restaurant   2.0


----M4Y----
                 venue  freq
0          Coffee Shop   8.0
1  Japanese Restaurant   5.0
2  

            venue  freq
0        Pharmacy   2.0
1  Baseball Field   1.0
2            Park   1.0
3            Bank   1.0
4   Grocery Store   1.0


----M9B----
                venue  freq
0         Pizza Place   2.0
1      Sandwich Place   1.0
2                Bank   1.0
3  Chinese Restaurant   1.0
4            Tea Room   1.0


----M9C----
                venue  freq
0  College Rec Center   1.0
1   Fish & Chips Shop   1.0
2       Grocery Store   1.0
3                Bank   1.0
4        Carpet Store   1.0


----M9L----
                             venue  freq
0                      Auto Garage   1.0
1       Construction & Landscaping   1.0
2              Rental Car Location   1.0
3           Furniture / Home Store   1.0
4  Molecular Gastronomy Restaurant   0.0


----M9M----
                 venue  freq
0          Coffee Shop   2.0
1            Nightclub   1.0
2                 Park   1.0
3    Afghan Restaurant   0.0
4  Monument / Landmark   0.0


----M9N----
                 venue  freq
0

#### Let's put that into a *pandas* dataframe

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [68]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postcode'] = toronto_grouped['Postcode']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Home Service,Flower Shop,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,M1C,Bar,Yoga Studio,Flea Market,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm
2,M1E,Construction & Landscaping,Park,Bus Stop,Gym / Fitness Center,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Fish Market,Donut Shop
3,M1G,Park,Korean Restaurant,Mexican Restaurant,Business Service,Coffee Shop,Yoga Studio,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
4,M1H,Playground,Yoga Studio,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market


## 4. Cluster Neighborhoods

create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [69]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Postcode', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [70]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = post_codes

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Postcode'), on='Postcode')

# Some neighborhoods did not return any venues,  drop those neighborhoods from the clustering
# Because some had NaN for cluster labels, column was float and needs to be integer so convert to int after dropping NaN rows

toronto_merged.dropna(subset=['Cluster Labels'],inplace=True)
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype('Int32')
toronto_merged['Cluster Labels'].dtype
toronto_merged.head()


toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517,1,Home Service,Flower Shop,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875,1,Bar,Yoga Studio,Flea Market,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256,1,Construction & Landscaping,Park,Bus Stop,Gym / Fitness Center,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Fish Market,Donut Shop
3,M1G,Scarborough,Woburn,43.768359,-79.21759,1,Park,Korean Restaurant,Mexican Restaurant,Business Service,Coffee Shop,Yoga Studio,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944,1,Playground,Yoga Studio,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market


In [71]:
# create map
map_clusters = folium.Map(location=[tor_lat, tor_lng], zoom_start=11)
# toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].dtype=pd.Int64Dtype()


# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [72]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,M4M,0,Café,Italian Restaurant,Diner,Bakery,Sushi Restaurant,Brewery,American Restaurant,Gastropub,Bar,Arts & Crafts Store
51,M4X,0,Coffee Shop,Restaurant,Market,Bakery,Italian Restaurant,Pizza Place,Café,Liquor Store,Chinese Restaurant,Pub
52,M4Y,0,Coffee Shop,Japanese Restaurant,Dance Studio,Restaurant,Gay Bar,Sushi Restaurant,Men's Store,Bubble Tea Shop,Hotel,Pub
53,M5A,0,Coffee Shop,Restaurant,Breakfast Spot,Food Truck,Electronics Store,Italian Restaurant,Bakery,Pub,Thai Restaurant,Health Food Store
56,M5E,0,Coffee Shop,Cocktail Bar,Restaurant,Bakery,Farmers Market,Hotel,Seafood Restaurant,Breakfast Spot,Steakhouse,Cheese Shop
66,M5S,0,Café,Restaurant,Coffee Shop,Bakery,Gym,Japanese Restaurant,Bar,Bookstore,Yoga Studio,Pizza Place
67,M5T,0,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Mexican Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Ice Cream Shop,Coffee Shop,Ramen Restaurant
68,M5V,0,Coffee Shop,Italian Restaurant,Restaurant,Café,Bar,Pub,Park,Speakeasy,Gym / Fitness Center,Sandwich Place
77,M6J,0,Coffee Shop,Bar,Restaurant,Cocktail Bar,Asian Restaurant,Pizza Place,French Restaurant,Bakery,New American Restaurant,Vietnamese Restaurant
78,M6K,0,Coffee Shop,Café,Furniture / Home Store,Restaurant,Sandwich Place,Italian Restaurant,Bar,Art Gallery,Vegetarian / Vegan Restaurant,Beer Bar


In [73]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,1,Home Service,Flower Shop,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,M1C,1,Bar,Yoga Studio,Flea Market,Fish Market,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Farmers Market,Farm
2,M1E,1,Construction & Landscaping,Park,Bus Stop,Gym / Fitness Center,Fish & Chips Shop,Fireworks Store,Field,Fast Food Restaurant,Fish Market,Donut Shop
3,M1G,1,Park,Korean Restaurant,Mexican Restaurant,Business Service,Coffee Shop,Yoga Studio,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
4,M1H,1,Playground,Yoga Studio,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market
5,M1J,1,Indian Restaurant,Grocery Store,Train Station,Restaurant,Yoga Studio,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
6,M1K,1,Discount Store,Hobby Shop,Department Store,Coffee Shop,Convenience Store,Hardware Store,Harbor / Marina,Fireworks Store,Field,Fast Food Restaurant
7,M1L,1,Bus Line,Bakery,Coffee Shop,Intersection,Soccer Field,Bus Station,Metro Station,Farm,Ethiopian Restaurant,Event Space
8,M1M,1,Fast Food Restaurant,Pharmacy,Coffee Shop,Discount Store,Furniture / Home Store,Sandwich Place,Liquor Store,Pizza Place,Falafel Restaurant,Event Space
9,M1N,1,General Entertainment,College Stadium,Skating Rink,Gym Pool,Gym,Park,Fast Food Restaurant,Farmers Market,Field,Doctor's Office


In [74]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,M5B,2,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Café,Fast Food Restaurant,Tea Room,Lingerie Store,Diner,Japanese Restaurant
55,M5C,2,Coffee Shop,Café,Restaurant,Italian Restaurant,Hotel,Breakfast Spot,Seafood Restaurant,Clothing Store,Gastropub,Cocktail Bar
57,M5G,2,Coffee Shop,Clothing Store,Cosmetics Shop,Tea Room,Middle Eastern Restaurant,Plaza,Sushi Restaurant,Fast Food Restaurant,Sandwich Place,Hotel
58,M5H,2,Coffee Shop,Café,Hotel,Asian Restaurant,Bar,Burger Joint,Restaurant,Gastropub,Deli / Bodega,Steakhouse
60,M5K,2,Coffee Shop,Café,Hotel,Restaurant,Italian Restaurant,American Restaurant,Bakery,Gastropub,Bar,Gym
61,M5L,2,Coffee Shop,Hotel,Restaurant,Café,American Restaurant,Gym,Italian Restaurant,Japanese Restaurant,Deli / Bodega,Beer Bar
69,M5W,2,Coffee Shop,Bar,Café,Steakhouse,Hotel,Japanese Restaurant,Pizza Place,Sushi Restaurant,Pub,Italian Restaurant
70,M5X,2,Coffee Shop,Café,Hotel,American Restaurant,Asian Restaurant,Bar,Bakery,Gastropub,Steakhouse,Restaurant
85,M7A,2,Coffee Shop,Sandwich Place,Café,Indian Restaurant,Bubble Tea Shop,Italian Restaurant,Smoothie Shop,Art Gallery,Bookstore,Juice Bar
86,M7R,2,Coffee Shop,Bar,Café,Steakhouse,Hotel,Japanese Restaurant,Pizza Place,Sushi Restaurant,Pub,Italian Restaurant
