### Exploring and clustering the neighborhoods in Toronto, Canada.

In this notebook, I explore Toronto city data to find similar neighborhoods in 'Scarborough, CA'.<br>
The results can help someone looking for a new home pick the kind of area that meets all their needs.

<p> Installing and importing necessary libraries. </p>

In [1]:
!pip install BeautifulSoup4
import requests
from bs4 import BeautifulSoup  # To work with a HTML page
import pandas as pd
import numpy as np



Extracting content from the given URL and storing it using BeautifulSoup.

In [2]:
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r = requests.get(URL) 
# print(r.content) 
soup = BeautifulSoup(r.content, 'html5lib') 
# print(soup.prettify()) 

Dictionary to store the content of a page so that it can be directly converted to Dataframe for an easy manipulation.

In [3]:
from collections import defaultdict
dic = defaultdict(list)

Extracting data from a table tag with help of BeautifulSoup and storing it in a dictionary

In [4]:
table = soup.table
rows = table.find_all('tr')
keys = ['Postal Code', 'Borough', 'Neighborhood']
for row in rows:  # 'row' avoids shadowing the response object 'r'
    cols = row.find_all('td')  # header rows contain only <th>, so cols is empty
    for key, col in zip(keys, cols):
        dic[key].append(col.text.strip())  # strip the trailing newline
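
The extraction step above can be tried offline on a tiny table. This is a minimal sketch, assuming a hand-written HTML snippet (hypothetical, not the real Wikipedia page) and the built-in `html.parser` backend; `get_text(strip=True)` is used here in place of slicing off the last character.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal-code table, for illustration only.
html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>
</td></tr>
  <tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

soup_toy = BeautifulSoup(html, "html.parser")
keys = ["Postal Code", "Borough", "Neighborhood"]
records = []
for tr in soup_toy.table.find_all("tr"):
    # Header rows hold <th> cells only, so they yield an empty list here.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == len(keys):
        records.append(dict(zip(keys, cells)))

print(records)
```

A list of dictionaries like `records` can be passed straight to `pd.DataFrame`.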

Converting Dictionary to Dataframe.

In [5]:
data = pd.DataFrame.from_dict(dic)
data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Finding the rows whose Borough is 'Not assigned'.

In [6]:
to_remove = data[data['Borough'] == 'Not assigned'].index

Dropping the rows whose Borough is 'Not assigned'.

In [7]:
data.drop(to_remove, inplace = True)
data

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


Resetting index.

In [8]:
data.reset_index(inplace = True)
data.head()

Unnamed: 0,index,Postal Code,Borough,Neighborhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,5,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Dropping the unnecessary index column. This call is not in place, so it only displays the result; 'data' keeps the column until cell 18.

In [9]:
data.drop('index', axis = 1)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
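
The filter, drop and index reset in cells 6 to 9 can probably be collapsed into one expression. A minimal sketch on a toy frame (rows copied from the tables above; the name `toy` is only for illustration), assuming that `reset_index(drop=True)` is acceptable because the old index is not needed:

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": ["", "", "Parkwoods", "Victoria Village"],
})

# Keep assigned boroughs and renumber rows without creating an 'index' column.
cleaned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(cleaned)
```

With `drop=True` the old index is discarded, so no extra column has to be removed afterwards.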


Shape of the final DataFrame.

In [10]:
data.shape

(103, 4)

Installing geocoder to get the latitude and longitude for each postal code in the 'data' DataFrame.

In [11]:
!pip install geocoder
import geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


The cell below did not work because the API was taking too much time.

In [12]:
# latitude = []
# longitude = []
# for pcode in data['Postal Code']:
#     lat_lng_coords = None
#     while(lat_lng_coords is None):
#       response = geocoder.google('{}, Toronto, Ontario'.format(pcode))
#       lat_lng_coords = response.latlng

#     latitude.append(lat_lng_coords[0])
#     longitude.append(lat_lng_coords[1])
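
The `while` loop above never gives up when the service keeps returning nothing. One possible way around that is to cap the number of attempts. A sketch only: `geocode_with_retries` and `flaky_lookup` are hypothetical names, and `flaky_lookup` merely stands in for a real geocoding call.

```python
import time

def geocode_with_retries(lookup, query, max_attempts=3, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # caller decides what to do with a missing result

# Hypothetical stand-in for a geocoder that fails twice, then succeeds.
calls = {"n": 0}
def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [43.806686, -79.194353]

result = geocode_with_retries(flaky_lookup, "M1B, Toronto, Ontario")
never = geocode_with_retries(lambda q: None, "M1B, Toronto, Ontario", max_attempts=2)
print(result, never)
```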

So, I read the latitudes and longitudes from a CSV file.

In [13]:
lat_long = pd.read_csv('http://cocl.us/Geospatial_data')
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Sorting data and removing any leading or trailing whitespace in column names.

In [14]:
data.columns = data.columns.str.strip()
data.sort_values('Postal Code', ascending = True, axis = 0, inplace = True)
data.head()

Unnamed: 0,index,Postal Code,Borough,Neighborhood
6,9,M1B,Scarborough,"Malvern, Rouge"
12,18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
18,27,M1E,Scarborough,"Guildwood, Morningside, West Hill"
22,36,M1G,Scarborough,Woburn
26,45,M1H,Scarborough,Cedarbrae


Sorting lat_long in order to merge it with data so that each postal code gets its correct latitude and longitude.

In [15]:
lat_long.columns = lat_long.columns.str.strip()
lat_long.sort_values('Postal Code', ascending = True, axis = 0, inplace = True)
lat_long.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


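Adding Latitude and Longitude columns to the 'data' DataFrame. Note that this assignment aligns rows by index label, not by 'Postal Code', so the coordinates shown differ from lat_long (M1B is 43.806686, -79.194353 there).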

In [16]:
data['Latitude'] = lat_long['Latitude']
data['Longitude'] = lat_long['Longitude']
data.head()

Unnamed: 0,index,Postal Code,Borough,Neighborhood,Latitude,Longitude
6,9,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
12,18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029
18,27,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556
22,36,M1G,Scarborough,Woburn,43.77012,-79.408493
26,45,M1H,Scarborough,Cedarbrae,43.745906,-79.352188
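
Column assignment in pandas pairs rows by index label rather than by 'Postal Code'. A hedged sketch with two toy frames (coordinates copied from `lat_long.head()` above; the frame names are hypothetical) shows the difference and one way to join on the key instead:

```python
import pandas as pd

data_toy = pd.DataFrame(
    {"Postal Code": ["M1B", "M1C"], "Borough": ["Scarborough", "Scarborough"]},
    index=[6, 12],  # non-contiguous index, as in the sorted 'data' frame
)
lat_long_toy = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Direct assignment looks up index labels 6 and 12, which lat_long_toy lacks.
by_index = data_toy.copy()
by_index["Latitude"] = lat_long_toy["Latitude"]

# Merging on the key column pairs each postal code with its own coordinates.
merged = data_toy.merge(lat_long_toy, on="Postal Code", how="left")
print(by_index)
print(merged)
```

With a merge, sorting either frame beforehand is not required, and the result comes back with a fresh 0-based index.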


In [17]:
data.reset_index(inplace = True)
data.head()

Unnamed: 0,level_0,index,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,6,9,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
1,12,18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029
2,18,27,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556
3,22,36,M1G,Scarborough,Woburn,43.77012,-79.408493
4,26,45,M1H,Scarborough,Cedarbrae,43.745906,-79.352188


Removing extra columns. It may not look exactly like the table given in the assignment because it is sorted.

In [18]:
data.drop(['level_0', 'index'], axis = 1, inplace = True)
data.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556
3,M1G,Scarborough,Woburn,43.77012,-79.408493
4,M1H,Scarborough,Cedarbrae,43.745906,-79.352188
5,M1J,Scarborough,Scarborough Village,43.728496,-79.495697
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.70906,-79.363452
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.72802,-79.38879
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.667967,-79.367675
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.650571,-79.384568


Extracting Boroughs which contain 'Scarborough' in their name.

In [19]:
temp = data['Borough'].str.contains('Scarborough')
temp.value_counts()

False    86
True     17
Name: Borough, dtype: int64

Storing the Boroughs which contain 'Scarborough'.

In [20]:
data_scar = data[temp].reset_index(drop=True)
data_scar.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556
3,M1G,Scarborough,Woburn,43.77012,-79.408493
4,M1H,Scarborough,Cedarbrae,43.745906,-79.352188


Importing libraries for getting latitude and longitude, reading json, plotting and for KMeans

In [21]:
import json
# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


To plot interactive maps

In [22]:
!pip install folium # comment this line out if folium is already installed
import folium # map rendering library

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


Getting latitude and longitude of 'Scarborough, CA'

In [23]:
address = 'Scarborough, CA'
geolocater = Nominatim(user_agent = "scar_explorer")
location = geolocater.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Scarborough are {}, {}'.format(latitude, longitude))

Coordinates of Scarborough are 43.773077, -79.257774


Plotting a map centred on the coordinates of 'Scarborough, CA' and labelling the Neighborhoods.

In [79]:
map_scar = folium.Map(location = [latitude, longitude], zoom_start = 12)

for lat, lang, label in zip(data_scar['Latitude'], data_scar['Longitude'], data_scar['Neighborhood']):
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lang],
        radius=5,
        popup=label,
        color='Blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scar)
    
map_scar    

The following cell should contain your Foursquare ID and SECRET

In [49]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [26]:
data_scar.loc[0, 'Neighborhood'].split(',')[0]

'Malvern'

In [27]:
neigh_lat = data_scar.loc[0, 'Latitude']
neigh_long = data_scar.loc[0, 'Longitude']
neigh_name = data_scar.loc[0, 'Neighborhood'].split(',')[0]

print('The latitude and longitude of {} are {}, {}.'.format(neigh_name, neigh_lat, neigh_long))

The latitude and longitude of Malvern are 43.7279292, -79.26202940000002.


Creating URL for getting data from Foursquare API.
It contains ID, SECRET, latitude, longitude, version, radius and limit of output to get.

In [28]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, neigh_lat, neigh_long, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=43.7279292,-79.26202940000002&v=20180605&radius=500&limit=100'
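
The same URL can be assembled from a dictionary of parameters, which keeps each value next to its name. A minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper and the credentials are placeholders:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Build a venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma between latitude and longitude unescaped.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

demo_url = build_explore_url("YOUR_ID", "YOUR_SECRET", 43.7279292, -79.2620294)
print(demo_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, so the string does not have to be built by hand.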

Creating a request for the above URL and storing output in JSON format.

In [29]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed129376001fe001b7ff902'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Toronto',
  'headerFullLocation': 'Toronto',
  'headerLocationGranularity': 'city',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 43.7324292045, 'lng': -79.25581377000155},
   'sw': {'lat': 43.723429195499996, 'lng': -79.26824502999848}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b6a37ccf964a520a5cd2be3',
       'name': 'Giant Tiger',
       'location': {'address': '682 Kennedy Road',
        'crossStreet': 'Eglinton Ave. E.',
        'lat': 43.72744662939136,
        'lng': -79.26624035854763,
        'labeledLatLngs': [{'label': 'display',
          '

Function to get categories of venues from above output.

In [30]:
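def get_category_type(row):
    """Return the name of the first category of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:  # normalized rows use the flattened column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']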

Getting nearby venues for a neighborhood.

In [31]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Giant Tiger,Department Store,43.727447,-79.26624
1,Tim Hortons,Coffee Shop,43.726895,-79.266157
2,Bros. CONVENIENCE,Convenience Store,43.727781,-79.265708
3,Dollarama,Discount Store,43.727092,-79.265784
4,Tandy Leather,Hobby Shop,43.726974,-79.266513


Finding nearby venues for every neighborhood in Scarborough.

In [32]:
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    venues = []
    for name, lat, long in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            LIMIT)
        
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues.append([
            (name,
            lat,
            long,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues for item in venue_list])
        
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues) 
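
The list comprehension in the function above indexes `categories[0]`, which would raise an error for a venue with an empty category list. A hedged sketch of the flattening step alone, run on hand-made items shaped like the response shown earlier (`venue_rows` and the second sample venue are hypothetical):

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical items; the second one has no category on purpose.
sample_items = [
    {"venue": {"name": "Giant Tiger",
               "location": {"lat": 43.727447, "lng": -79.26624},
               "categories": [{"name": "Department Store"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.7270, "lng": -79.2660},
               "categories": []}},
]

rows = venue_rows("Malvern, Rouge", 43.727929, -79.262029, sample_items)
print(rows)
```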
        

Getting nearby places for all neighborhoods in Scarborough.

In [33]:
scar_venues = getNearbyVenues(names = data_scar['Neighborhood'],
               latitudes = data_scar['Latitude'],
               longitudes = data_scar['Longitude'])

In [34]:
print(scar_venues.shape)
scar_venues.head()

(418, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.727929,-79.262029,Giant Tiger,43.727447,-79.26624,Department Store
1,"Malvern, Rouge",43.727929,-79.262029,Tim Hortons,43.726895,-79.266157,Coffee Shop
2,"Malvern, Rouge",43.727929,-79.262029,Bros. CONVENIENCE,43.727781,-79.265708,Convenience Store
3,"Malvern, Rouge",43.727929,-79.262029,Dollarama,43.727092,-79.265784,Discount Store
4,"Malvern, Rouge",43.727929,-79.262029,Tandy Leather,43.726974,-79.266513,Hobby Shop


In [35]:
scar_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,23,23,23,23,23,23
"Birch Cliff, Cliffside West",91,91,91,91,91,91
Cedarbrae,6,6,6,6,6,6
"Clarks Corners, Tam O'Shanter, Sullivan",24,24,24,24,24,24
"Cliffside, Cliffcrest, Scarborough Village West",43,43,43,43,43,43
"Dorset Park, Wexford Heights, Scarborough Town Centre",23,23,23,23,23,23
"Golden Mile, Clairlea, Oakridge",3,3,3,3,3,3
"Guildwood, Morningside, West Hill",65,65,65,65,65,65
"Kennedy Park, Ionview, East Birchmount Park",33,33,33,33,33,33
"Malvern, Rouge",7,7,7,7,7,7


Encoding all the categories and creating a new DataFrame which contains these categories and the Neighborhood column from the scar_venues DataFrame.

In [36]:
scar_venues_onehot = pd.get_dummies(scar_venues[['Venue Category']], prefix = "", prefix_sep = "")
scar_venues_onehot['Neighborhood'] = scar_venues['Neighborhood']
col = scar_venues_onehot.columns.tolist()
ind = col.index('Neighborhood')
# move 'Neighborhood' to the front, wherever it currently sits
fixed_columns = [col[ind]] + col[:ind] + col[ind + 1:]
scar_venues_onehot = scar_venues_onehot[fixed_columns]
scar_venues_onehot.head()
# print(len(col))
# print(scar_venues_onehot.shape)


Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,...,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Grouping data on 'Neighborhood' and getting mean for each neighborhood.

In [37]:
scar_grouped = scar_venues_onehot.groupby('Neighborhood').mean().reset_index()
scar_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,...,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Birch Cliff, Cliffside West",0.0,0.021978,0.010989,0.010989,0.0,0.010989,0.0,0.0,0.0,...,0.0,0.032967,0.010989,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0
2,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Clarks Corners, Tam O'Shanter, Sullivan",0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,...,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Cliffside, Cliffcrest, Scarborough Village West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Dorset Park, Wexford Heights, Scarborough Town...",0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,...,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0
6,"Golden Mile, Clairlea, Oakridge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Guildwood, Morningside, West Hill",0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,...,0.030769,0.0,0.015385,0.030769,0.0,0.015385,0.0,0.0,0.061538,0.0
8,"Kennedy Park, Ionview, East Birchmount Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Malvern, Rouge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
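
Taking the mean of one-hot columns per neighborhood gives the share of venues in each category. A minimal sketch on a toy frame (neighborhood names 'A' and 'B' and their venues are made up for illustration):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Gym", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of 0/1 indicators = fraction of the neighborhood's venues in that category.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, which is why the values can be read as frequencies.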


In [38]:
temp = scar_grouped[scar_grouped['Neighborhood'] == 'Agincourt'].T.reset_index()
temp.columns = ['venue','freq']
temp = temp.iloc[1:]
temp

Unnamed: 0,venue,freq
1,Accessories Store,0
2,American Restaurant,0
3,Art Gallery,0
4,Art Museum,0
5,Arts & Crafts Store,0
6,Asian Restaurant,0
7,Athletics & Sports,0
8,BBQ Joint,0
9,Bagel Shop,0
10,Bakery,0.0434783


Finding out top 5 places for each neighborhood based on frequency.

In [39]:
num_top_venues = 5

for hd in scar_grouped['Neighborhood']:
    print("----"+ hd + "----")
    temp = scar_grouped[scar_grouped['Neighborhood'] == hd].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq' : 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))

----Agincourt----
                venue  freq
0                Café  0.13
1      Breakfast Spot  0.09
2         Coffee Shop  0.09
3           Nightclub  0.09
4  Italian Restaurant  0.04
----Birch Cliff, Cliffside West----
           venue  freq
0    Coffee Shop  0.10
1           Café  0.05
2     Restaurant  0.04
3            Gym  0.03
4  Deli / Bodega  0.03
----Cedarbrae----
                  venue  freq
0  Gym / Fitness Center  0.17
1    Athletics & Sports  0.17
2   Japanese Restaurant  0.17
3                  Café  0.17
4  Caribbean Restaurant  0.17
----Clarks Corners, Tam O'Shanter, Sullivan----
                 venue  freq
0                 Café  0.08
1  Arts & Crafts Store  0.08
2   Mexican Restaurant  0.08
3      Thai Restaurant  0.08
4                 Park  0.04
----Cliffside, Cliffcrest, Scarborough Village West----
                venue  freq
0         Coffee Shop  0.07
1                 Pub  0.05
2                Café  0.05
3  Italian Restaurant  0.05
4         Pizza Place  0

Function that returns specified number of top places for each neighborhood. 

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
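
A quick check of the function on a single toy row (frequencies are hypothetical and chosen without ties) shows what it returns:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequency row for one neighborhood.
toy_row = pd.Series({
    "Neighborhood": "Agincourt",
    "Gym": 0.2,
    "Coffee Shop": 0.5,
    "Park": 0.3,
})

top2 = return_most_common_venues(toy_row, 2)
print(list(top2))
```

When several categories share the same frequency, their relative order is, as far as I can tell, not guaranteed by the default sort, which may be why tied venues appear in a different order in different cells.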

Creating dataframe which stores Top 10 places for each neighborhood using <br>
return_most_common_venues().

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
        

neigh_venues_sorted = pd.DataFrame(columns = columns)
neigh_venues_sorted['Neighborhood'] = scar_grouped['Neighborhood']

for ind in np.arange(scar_grouped.shape[0]):
    neigh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scar_grouped.iloc[ind, :], num_top_venues)

neigh_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Café,Breakfast Spot,Nightclub,Coffee Shop,Intersection,Pet Store,Convenience Store,Climbing Gym,Restaurant,Burrito Place
1,"Birch Cliff, Cliffside West",Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Thai Restaurant,Deli / Bodega,Gym,Bookstore,Pizza Place
2,Cedarbrae,Gym / Fitness Center,Caribbean Restaurant,Baseball Field,Café,Athletics & Sports,Japanese Restaurant,Cupcake Shop,Creperie,Deli / Bodega,Cosmetics Shop
3,"Clarks Corners, Tam O'Shanter, Sullivan",Mexican Restaurant,Café,Arts & Crafts Store,Thai Restaurant,Italian Restaurant,Diner,Liquor Store,Fast Food Restaurant,Cajun / Creole Restaurant,Speakeasy
4,"Cliffside, Cliffcrest, Scarborough Village West",Coffee Shop,Restaurant,Café,Pub,Pizza Place,Italian Restaurant,Chinese Restaurant,Bakery,Pharmacy,Market


Running KMeans to find clusters of neighborhoods with similar venue-category frequencies.

In [71]:
clusters = 2

scar_clustering = scar_grouped.drop('Neighborhood', axis = 1)

kmeans = KMeans(n_clusters = clusters, random_state = 4).fit(scar_clustering)

kmeans.labels_[0:10]


array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1], dtype=int32)
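
KMeans groups rows whose frequency vectors are close to each other. A small sketch on made-up vectors (three hypothetical categories per row; `n_init` is set explicitly here as an assumption, it is not in the cell above):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category frequencies: two food-heavy rows, two park-heavy rows.
toy_freq = np.array([
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.0, 1.0],
])

model = KMeans(n_clusters=2, n_init=10, random_state=4).fit(toy_freq)
labels = model.labels_
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label carries meaning.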

In [72]:
neigh_venues_sorted['Cluster Labels'] = kmeans.labels_

scar_merged = data_scar

scar_merged = scar_merged.join(neigh_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scar_merged.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029,Discount Store,Hobby Shop,Bus Station,Department Store,Coffee Shop,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Event Space,1
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7942,-79.262029,Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Yoga Studio,Dessert Shop,Donut Shop,Distribution Center,Discount Store,Diner,1
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.778517,-79.346556,Clothing Store,Coffee Shop,Women's Store,Fast Food Restaurant,Japanese Restaurant,Shoe Store,Bank,Toy / Game Store,Tea Room,Restaurant,1
3,M1G,Scarborough,Woburn,43.77012,-79.408493,Ramen Restaurant,Coffee Shop,Restaurant,Sandwich Place,Café,Sushi Restaurant,Pizza Place,Ice Cream Shop,Shopping Mall,Bubble Tea Shop,1
4,M1H,Scarborough,Cedarbrae,43.745906,-79.352188,Gym / Fitness Center,Caribbean Restaurant,Baseball Field,Café,Athletics & Sports,Japanese Restaurant,Cupcake Shop,Creperie,Deli / Bodega,Cosmetics Shop,1


<p> Plotting created clusters on a map using folium. </p>

In [73]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(clusters)
ys = [i + x + (i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scar_merged['Latitude'], scar_merged['Longitude'], scar_merged['Neighborhood'], scar_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The following 2 cells contain the 2 clusters that were created using KMeans.<br>
You can find the observations at the end of the notebook.
<br>
<br>

In [76]:
scar_merged.loc[scar_merged['Cluster Labels'] == 0, scar_merged.columns[[2] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
7,"Golden Mile, Clairlea, Oakridge",Bus Line,Park,Swim School,Deli / Bodega,Distribution Center,Discount Store,Diner,Dessert Shop,Department Store,Yoga Studio,0
15,"Steeles West, L'Amoreaux West",River,Park,Yoga Studio,Cupcake Shop,Distribution Center,Discount Store,Diner,Dessert Shop,Department Store,Deli / Bodega,0


In [77]:
scar_merged.loc[scar_merged['Cluster Labels'] == 1, scar_merged.columns[[2] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,"Malvern, Rouge",Discount Store,Hobby Shop,Bus Station,Department Store,Coffee Shop,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Event Space,1
1,"Rouge Hill, Port Union, Highland Creek",Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Yoga Studio,Dessert Shop,Donut Shop,Distribution Center,Discount Store,Diner,1
2,"Guildwood, Morningside, West Hill",Clothing Store,Coffee Shop,Women's Store,Fast Food Restaurant,Japanese Restaurant,Shoe Store,Bank,Toy / Game Store,Tea Room,Restaurant,1
3,Woburn,Ramen Restaurant,Coffee Shop,Restaurant,Sandwich Place,Café,Sushi Restaurant,Pizza Place,Ice Cream Shop,Shopping Mall,Bubble Tea Shop,1
4,Cedarbrae,Gym / Fitness Center,Caribbean Restaurant,Baseball Field,Café,Athletics & Sports,Japanese Restaurant,Cupcake Shop,Creperie,Deli / Bodega,Cosmetics Shop,1
5,Scarborough Village,Home Service,Food Truck,Baseball Field,Korean Restaurant,Yoga Studio,Dessert Shop,Donut Shop,Distribution Center,Discount Store,Diner,1
6,"Kennedy Park, Ionview, East Birchmount Park",Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Bank,Sports Bar,Sandwich Place,Liquor Store,Brewery,Shopping Mall,1
8,"Cliffside, Cliffcrest, Scarborough Village West",Coffee Shop,Restaurant,Café,Pub,Pizza Place,Italian Restaurant,Chinese Restaurant,Bakery,Pharmacy,Market,1
9,"Birch Cliff, Cliffside West",Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Thai Restaurant,Deli / Bodega,Gym,Bookstore,Pizza Place,1
10,"Dorset Park, Wexford Heights, Scarborough Town...",Café,Sandwich Place,Coffee Shop,Flower Shop,Pharmacy,Pizza Place,Pub,Cosmetics Shop,Middle Eastern Restaurant,Burger Joint,1


<br><br><br>
Observations:<br> 
Cluster 0: Because the dataset is small, this cluster has only 2 rows, but they share many similar places such as parks, distribution centers, yoga studios and so on.<br>
Cluster 1: It contains neighborhoods which have coffee shops, cafés and restaurants in their Top 10 places.

So, if you are fond of trying different food, the neighborhoods in cluster 1 may suit your needs best.<br>
But if you want to spend your time doing yoga or wandering in a park, you can choose the neighborhoods in cluster 0.