<h1 align=center> Coursera Capstone Project</h1>
<h2 align='center' style="font-size: 18px"> By: Kyle McLester</h2>
<hr>

<body>This notebook is to demonstrate how to convert addresses into equivalent latitude and longitude values. Also, it will implement the Foursquare API to explore neighborhoods in Toronto, Canada. The neighborhoods will then be clustered using k-means based on the postalcode and borough information.</body>
<hr>

<h2> Part 1</h2><h3> Import necessary libraries </h3>

In [1]:
import pandas as pd # library to handle data in a vectorized manner

import numpy as np # library for data analysis
#pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

import json # library to hand JSON files

from geopy.geocoders import Nominatim # convert an address into lattitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans # library to handle k means clustering algorithm

import folium # library for map rendering

from bs4 import BeautifulSoup # library to help in web scraping

print('Libraries imported successfully')

Libraries imported successfully


<h3> Create Soup Object and Retrieve Data </h3>

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_data = requests.get(url)
print('Retrieved HTML data')

Retrieved HTML data


In [3]:
soup = BeautifulSoup(html_data.text, 'html5lib')
print('Created BeautifulSoup object')

Created BeautifulSoup object


In [4]:
tag_object = soup.title
tag_object

<title>List of postal codes of Canada: M - Wikipedia</title>

In [5]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

<h3> Create Pandas DataFrame for table_contents </h3>

In [6]:
toronto_data = pd.DataFrame(table_contents)
toronto_data['Borough'] = toronto_data['Borough'].replace({'Downtown A PO Boxes25 The Esplande': 'Downtown Toronto Stn A',
                                       'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
                                       'EtobicokeNorthWest':'Etobicoke Northwest', 'East YorkEast Toronto': 'East York/East Toronto',
                                       'MississaugaCanada Post Gateway Proecessing Centre':'Mississauga'})

In [7]:
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [8]:
toronto_data.shape

(103, 3)

<h2>Part 2</h2><h3> Convert Address to Latitude and Longitude </h3>

In [9]:
geo_coords = pd.read_csv('Geospatial_Coordinates.csv')
geo_df = geo_coords.set_index(geo_coords['Postal Code'])
geo_df.head()

Unnamed: 0_level_0,Postal Code,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
M1B,M1B,43.806686,-79.194353
M1C,M1C,43.784535,-79.160497
M1E,M1E,43.763573,-79.188711
M1G,M1G,43.770992,-79.216917
M1H,M1H,43.773136,-79.239476


In [10]:
list_latitudes = []
list_longitudes = []
for codes in toronto_data['PostalCode']:
    single_lat = geo_df.loc[codes]['Latitude']
    single_long = geo_df.loc[codes]['Longitude']
    list_longitudes.append(single_long)
    list_latitudes.append(single_lat)
    
# Another way to merge these dataframes is: merged_df = pd.merge(toronto_data, geo_df, how='inner', on='PostalCode')

In [11]:
toronto_data['Latitude'] = list_latitudes
toronto_data['Longitude'] = list_longitudes

In [12]:
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494


<h2>Part 3</h2><h3>Render a map of Toronto, CA with neighborhoods superimposed on top</h3>

In [13]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<h3> Render a map of the borough: North York </h3>

In [15]:
borough_data = toronto_data[toronto_data['Borough'] == 'North York'].reset_index(drop=True)
borough_data.head()

# if you wanted to show only the buroughs with 'toronto', toronto = final_df[final_df['Borough'].str.contains("Toronto")].reset)index(drop=True)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills North,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073


In [16]:
address = 'North York, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of North York are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of North York are 43.7543263, -79.44911696639593.


In [17]:
# create map of north_york using latitude and longitude values
map_north_york = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(borough_data['Latitude'], borough_data['Longitude'], borough_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_north_york)  
    
map_north_york

<h3> Now we are going to use the Foursquare API </h3>

In [18]:
borough_data.loc[0, 'Neighborhood']

'Parkwoods'

In [19]:
neighborhood_latitude = borough_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = borough_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = borough_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [20]:
CLIENT_ID = 'GOFVXW5JYKENRUZNP1ACD5PIKHHH5P2PLJATSK4NIUQDW441' # your Foursquare ID
CLIENT_SECRET = '3V2MUZBQNVB2YHJJY1XBKAG11CUKFQOGKA1X1LJOSFFFZ4LK' # your Foursquare Secret
ACCESS_TOKEN = '1J1IEP12ZIYAXVRSVUMFKIAMHAOXH1VUD4QSTFJIYHDW3YOW' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [21]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=GOFVXW5JYKENRUZNP1ACD5PIKHHH5P2PLJATSK4NIUQDW441&client_secret=3V2MUZBQNVB2YHJJY1XBKAG11CUKFQOGKA1X1LJOSFFFZ4LK&v=20180604&ll=43.7532586,-79.3296565&radius=500&limit=100'

In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '606120f05baaec38e3738731'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

<h3> Extract category values from venues </h3>

In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.reindex(columns = filtered_columns)
#nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  nearby_venues = json_normalize(venues) # flatten JSON


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114


In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

2 venues were returned by Foursquare.


<h3> Get all of the nearby venues </h3>

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [27]:
north_york_venues = getNearbyVenues(names=borough_data['Neighborhood'],
                                   latitudes=borough_data['Latitude'],
                                   longitudes=borough_data['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Manor, Lawrence Heights
Don Mills North
Glencairn
Don Mills South
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
Downsview East
York Mills, Silver Hills
Downsview West
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Willowdale, Newtonbrook
Downsview Central
Bedford Park, Lawrence Manor East
Humberlea, Emery
Willowdale South
Downsview Northwest
York Mills West
Willowdale West


In [28]:
print(north_york_venues.shape)
north_york_venues.head()

(252, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [29]:
north_york_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25,25,25
Don Mills North,5,5,5,5,5,5
Don Mills South,22,22,22,22,22,22
Downsview Central,4,4,4,4,4,4
Downsview East,3,3,3,3,3,3
Downsview Northwest,5,5,5,5,5,5
Downsview West,6,6,6,6,6,6
"Fairview, Henry Farm, Oriole",64,64,64,64,64,64


In [30]:
print('There are {} uniques categories.'.format(len(north_york_venues['Venue Category'].unique())))

There are 98 uniques categories.


In [31]:
north_york_onehot = pd.get_dummies(north_york_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
north_york_onehot['Neighborhood'] = north_york_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [north_york_onehot.columns[-1]] + list(north_york_onehot.columns[:-1])
north_york_onehot = north_york_onehot[fixed_columns]

north_york_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Sporting Goods Shop,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
north_york_onehot.shape

(252, 99)

In [33]:
north_york_grouped = north_york_onehot.groupby('Neighborhood').mean().reset_index()
north_york_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Sporting Goods Shop,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,...,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0
3,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills South,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,...,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Downsview East,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Downsview Northwest,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Downsview West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Fairview, Henry Farm, Oriole",0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.015625,0.03125,...,0.015625,0.0,0.0,0.015625,0.0,0.015625,0.03125,0.0,0.015625,0.0


In [34]:
north_york_grouped.shape

(24, 99)

In [35]:
num_top_venues = 5

for hood in north_york_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = north_york_grouped[north_york_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bathurst Manor, Wilson Heights, Downsview North----
           venue  freq
0    Coffee Shop  0.10
1           Bank  0.10
2    Bridal Shop  0.05
3  Shopping Mall  0.05
4  Deli / Bodega  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1                 Bank  0.25
2  Japanese Restaurant  0.25
3   Chinese Restaurant  0.25
4    Accessories Store  0.00


----Bedford Park, Lawrence Manor East----
                     venue  freq
0           Sandwich Place  0.08
1              Coffee Shop  0.08
2       Italian Restaurant  0.08
3     Fast Food Restaurant  0.04
4  Comfort Food Restaurant  0.04


----Don Mills North----
                  venue  freq
0  Caribbean Restaurant   0.2
1    Athletics & Sports   0.2
2   Japanese Restaurant   0.2
3                  Café   0.2
4                   Gym   0.2


----Don Mills South----
             venue  freq
0      Coffee Shop  0.09
1              Gym  0.09
2       Restaurant  0.09
3  Bubble Tea Shop  0.05
4      

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = north_york_grouped['Neighborhood']

for ind in np.arange(north_york_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_york_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Shopping Mall,Pizza Place,Park,Bridal Shop,Mobile Phone Shop,Ice Cream Shop,Restaurant,Grocery Store
1,Bayview Village,Chinese Restaurant,Bank,Japanese Restaurant,Café,Vietnamese Restaurant,Deli / Bodega,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant
2,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Pub,Grocery Store,Juice Bar,Liquor Store,Butcher,Indian Restaurant,Breakfast Spot
3,Don Mills North,Café,Gym,Athletics & Sports,Japanese Restaurant,Caribbean Restaurant,Department Store,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant
4,Don Mills South,Restaurant,Coffee Shop,Gym,Grocery Store,Clothing Store,Italian Restaurant,Bubble Tea Shop,Bike Shop,Beer Store,Dim Sum Restaurant


<h3>Cluster Neighborhoods</h3>

In [38]:
# set number of clusters
kclusters = 5

north_york_grouped_clustering = north_york_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_york_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

north_york_merged = borough_data

# merge north_york_grouped with borough_data to add latitude/longitude for each neighborhood
north_york_merged = north_york_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

north_york_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,Park,Food & Drink Shop,Deli / Bodega,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
1,M4A,North York,Victoria Village,43.725882,-79.315572,3,Intersection,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Vietnamese Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Comfort Food Restaurant
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4,Clothing Store,Furniture / Home Store,Accessories Store,Boutique,Coffee Shop,Vietnamese Restaurant,Tea Room,Cosmetics Shop,Café,Caribbean Restaurant
3,M3B,North York,Don Mills North,43.745906,-79.352188,0,Café,Gym,Athletics & Sports,Japanese Restaurant,Caribbean Restaurant,Department Store,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant
4,M6B,North York,Glencairn,43.709577,-79.445073,0,Pizza Place,Sushi Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Deli / Bodega,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop


In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(north_york_merged['Latitude'], north_york_merged['Longitude'], north_york_merged['Neighborhood'], north_york_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3> Examining each cluster and determine the discriminating venue categories that distinguish each cluster </h3>
<h3> Cluster 1 </h3>

In [41]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 0, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,North York,0,Café,Gym,Athletics & Sports,Japanese Restaurant,Caribbean Restaurant,Department Store,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant
4,North York,0,Pizza Place,Sushi Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Deli / Bodega,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop
5,North York,0,Restaurant,Coffee Shop,Gym,Grocery Store,Clothing Store,Italian Restaurant,Bubble Tea Shop,Bike Shop,Beer Store,Dim Sum Restaurant
6,North York,0,Golf Course,Mediterranean Restaurant,Fast Food Restaurant,Athletics & Sports,Dog Run,Pool,Vietnamese Restaurant,Cosmetics Shop,Chinese Restaurant,Chocolate Shop
7,North York,0,Bank,Coffee Shop,Shopping Mall,Pizza Place,Park,Bridal Shop,Mobile Phone Shop,Ice Cream Shop,Restaurant,Grocery Store
8,North York,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Juice Bar,Toy / Game Store,Mobile Phone Shop,Bank,Video Game Store
9,North York,0,Furniture / Home Store,Falafel Restaurant,Massage Studio,Coffee Shop,Caribbean Restaurant,Bar,Vietnamese Restaurant,Electronics Store,Dog Run,Chinese Restaurant
10,North York,0,Chinese Restaurant,Bank,Japanese Restaurant,Café,Vietnamese Restaurant,Deli / Bodega,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant
11,North York,0,Park,Airport,Business Service,Department Store,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
13,North York,0,Grocery Store,Hotel,Park,Bank,Shopping Mall,Department Store,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop


<h3> Cluster 2</h3>

In [42]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 1, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1,Park,Food & Drink Shop,Deli / Bodega,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
12,North York,1,Park,Butcher,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
16,North York,1,Park,Butcher,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
22,North York,1,Park,Convenience Store,Butcher,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping


<h3>Cluster 3</h3>

In [43]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 2, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,North York,2,Baseball Field,Vietnamese Restaurant,Dim Sum Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store


<h3> Cluster 4</h3>

In [44]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 3, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,3,Intersection,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Vietnamese Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Comfort Food Restaurant
15,North York,3,Intersection,Pizza Place,Furniture / Home Store,Food Court,Cosmetics Shop,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Food Truck,Clothing Store


<h3> Cluster 5</h3>

In [45]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 4, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,North York,4,Clothing Store,Furniture / Home Store,Accessories Store,Boutique,Coffee Shop,Vietnamese Restaurant,Tea Room,Cosmetics Shop,Café,Caribbean Restaurant


In [111]:
options_col = north_york_onehot.columns
biased_pref = pd.DataFrame(columns = ['Venue','Weights'])
biased_pref['Venue']=options_col
biased_pref['Weights'] = 0.0

biased_pref.set_index('Venue',inplace=True)

selection = ['Airport', 'Trail', 'Clothing Store', 'American Restaurant', 'Video Game Store']
weights = [1.0, 0.8, 0.6, 0.4, 0.2]

iter = 0
for select in selection:
    biased_pref.loc[select]['Weights'] = weights[iter]
    iter+=1
        

biased_pref

Unnamed: 0_level_0,Weights
Venue,Unnamed: 1_level_1
Neighborhood,0.0
Accessories Store,0.0
Airport,1.0
American Restaurant,0.4
Art Gallery,0.0
...,...
Theater,0.0
Toy / Game Store,0.0
Trail,0.8
Video Game Store,0.2
