# Segmenting and Clustering Neighborhoods in Toronto

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files


# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


## Obtaining the list from Wikipedia

In [2]:
toronto_wiki = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
print(toronto_wiki.shape)
toronto_wiki.head()

(180, 3)


|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


## Instructions
The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
   * Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
   
   * More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

   * If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
   * Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
   * In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
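The three processing rules can be sketched in pandas as a single function. This is a minimal sketch, not the code used below. The function name `clean_postal_codes` and the sample rows (including the code `M9Z` and its borough) are hypothetical. The `groupby` step only matters if a postal code appears on more than one row. The current Wikipedia table already lists M5A once, with both neighborhoods, as the output above shows.

```python
import pandas as pd

def clean_postal_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three assignment rules to a Postal Code / Borough / Neighborhood table."""
    # Rule 1: ignore rows whose borough is 'Not assigned'
    out = df[df['Borough'] != 'Not assigned'].copy()

    # Rule 3: a 'Not assigned' neighborhood takes the name of its borough
    not_assigned = out['Neighborhood'] == 'Not assigned'
    out.loc[not_assigned, 'Neighborhood'] = out.loc[not_assigned, 'Borough']

    # Rule 2: combine rows that share a postal code, joining neighborhoods with ', '
    return (out.groupby(['Postal Code', 'Borough'], sort=False)['Neighborhood']
               .agg(', '.join)
               .reset_index())

# Hypothetical sample rows (not copied from the real Wikipedia table)
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M5A', 'M9Z'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', 'Hypothetical Borough'],
    'Neighborhood': ['Not assigned', 'Regent Park', 'Harbourfront', 'Not assigned'],
})
cleaned = clean_postal_codes(sample)
print(cleaned)
```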

### Processing data

In [3]:
# Drop rows whose Borough is 'Not assigned'
toronto_wiki.drop(toronto_wiki[toronto_wiki['Borough'] == 'Not assigned'].index, axis=0, inplace=True)

# Give any neighborhood that is 'Not assigned' the name of its borough
not_assigned = toronto_wiki['Neighborhood'] == 'Not assigned'
toronto_wiki.loc[not_assigned, 'Neighborhood'] = toronto_wiki.loc[not_assigned, 'Borough']

# Reset the index
toronto_wiki.reset_index(drop=True, inplace=True)

# Shape of the new dataframe
print(toronto_wiki.shape)
toronto_wiki.head(11)

(103, 3)


|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |


# <span style="color:blue">First Commit</span> 

Now that you have built a dataframe of the postal code of each neighborhood, along with the borough and neighborhood names, you need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for the latitude and longitude of a postal code may return None, and the same call made again may return the coordinates. To make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code.

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
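The while-loop retry described above can be sketched as a bounded loop, so that a postal code that never resolves cannot hang the notebook. This is a sketch under assumptions. `geocode_with_retry` is a hypothetical helper, and `geocode` is any callable that returns `[latitude, longitude]` or `None`. With the Geocoder package it might wrap a provider call such as `geocoder.arcgis(query).latlng`, but check which providers need an API key before relying on that.

```python
import time
from typing import Callable, Optional, Sequence

def geocode_with_retry(postal_code: str,
                       geocode: Callable[[str], Optional[Sequence[float]]],
                       max_attempts: int = 10,
                       delay: float = 1.0) -> Optional[Sequence[float]]:
    """Call `geocode` until it returns coordinates or the attempts run out."""
    query = '{}, Toronto, Ontario'.format(postal_code)
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords  # e.g. [latitude, longitude]
        if attempt < max_attempts - 1:
            time.sleep(delay)  # wait briefly before asking again
    return None  # give up; the CSV file is the fallback

# Usage with a fake geocoder that fails twice, then succeeds
calls = []
def flaky_geocoder(query):
    calls.append(query)
    return None if len(calls) < 3 else [43.65426, -79.360636]

print(geocode_with_retry('M5A', flaky_geocoder, delay=0))
```

Capping the attempts is a deliberate departure from an unbounded `while` loop. A `None` result can then be filled from the CSV file instead.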

## Obtaining Geospatial coordinates

In [4]:
!wget -q -O 'geospatial_data.csv' http://cocl.us/Geospatial_data
geo_data=pd.read_csv('geospatial_data.csv')
geo_data.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


## Joining two data frames

In [5]:
Toronto = pd.merge(toronto_wiki, geo_data, on='Postal Code', how='inner')
print(Toronto.shape)
Toronto.head()

(103, 5)


|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
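Because `how='inner'` silently drops any postal code missing from the CSV file, it is worth confirming that nothing was lost. Here, 103 rows went in and 103 came out. Below is a minimal sketch of an explicit check using the `indicator` and `validate` options of `pd.merge`. The helper `merge_coordinates` and the postal code `M0Z` are hypothetical.

```python
import pandas as pd

def merge_coordinates(codes: pd.DataFrame, coords: pd.DataFrame, key: str = 'Postal Code'):
    """Left-join coordinates onto postal codes and report codes that found no match."""
    merged = pd.merge(codes, coords, on=key, how='left',
                      indicator=True,            # adds a '_merge' column: left_only / right_only / both
                      validate='one_to_one')     # raises if either side repeats a postal code
    unmatched = merged.loc[merged['_merge'] == 'left_only', key].tolist()
    return merged.drop(columns='_merge'), unmatched

codes = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M0Z'],
                      'Borough': ['North York', 'North York', 'Hypothetical']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})
merged, unmatched = merge_coordinates(codes, coords)
print(unmatched)  # ['M0Z']
```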


# <span style="color:blue">Second Commit</span> 

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did on the New York City data. It is up to you.

Just make sure:

   * To add enough Markdown cells to explain what you decided to do and to report any observations you make.
   * To generate maps to visualize your neighborhoods and how they cluster together. 

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

## Choosing a new dataframe

In [6]:
# I will work with the borough that has the most data, i.e. the biggest borough (most postal codes)
Toronto['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East Toronto         5
East York            5
Mississauga          1
Name: Borough, dtype: int64

In [7]:
# Create new dataframe of the Borough North York

ny_data = Toronto[Toronto['Borough'] == 'North York'].reset_index(drop=True)
ny_data.drop(columns=['Borough'], inplace=True)
print(ny_data.shape)
ny_data.head()

(24, 4)


|   | Postal Code | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | M3A | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | Victoria Village | 43.725882 | -79.315572 |
| 2 | M6A | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 3 | M3B | Don Mills | 43.745906 | -79.352188 |
| 4 | M6B | Glencairn | 43.709577 | -79.445073 |


In [8]:
ny_data['Postal Code'].nunique()

24

## Create a map of North York

In [9]:
# North York latitude and longitude (map center)
latitude = 43.761539
longitude = -79.411079

# create map of North York using latitude and longitude values
map_northyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label, post in zip(ny_data['Latitude'], ny_data['Longitude'], ny_data['Neighborhood'], ny_data['Postal Code']):
    label = '{}, {}'.format(post, label)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_northyork)  
    
map_northyork
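The map center above is hard-coded. An alternative, sketched here as an assumption rather than the approach this notebook uses, is to center the map on the mean of the postal-code coordinates already in `ny_data`. The helper name `map_center` is hypothetical.

```python
import pandas as pd

def map_center(df: pd.DataFrame) -> list:
    """Return [latitude, longitude] at the mean of the points in df."""
    return [df['Latitude'].mean(), df['Longitude'].mean()]

# First three North York postal codes from the table above
points = pd.DataFrame({'Latitude': [43.753259, 43.725882, 43.718518],
                       'Longitude': [-79.329656, -79.315572, -79.464763]})
center = map_center(points)
print(center)
# In the notebook this could become: folium.Map(location=map_center(ny_data), zoom_start=10)
```

A plain average of latitudes and longitudes is a reasonable center for an area as small as one borough. It would not be reasonable for points spread around the globe.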

## Foursquare Credentials

In [10]:
CLIENT_ID = '32TMXOPLKYMNKCHQB5FX0UWRG2TAVTPM13K0GLVZYLJZBEUG' # your Foursquare ID
CLIENT_SECRET = 'P5IYLFARAC1JI0DZNUWNZJ2R3MA0CR0CPK5PR0HMTHI0PL1T' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 32TMXOPLKYMNKCHQB5FX0UWRG2TAVTPM13K0GLVZYLJZBEUG
CLIENT_SECRET: P5IYLFARAC1JI0DZNUWNZJ2R3MA0CR0CPK5PR0HMTHI0PL1T
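Printing the secret, as above, stores it in the saved notebook. One common pattern is to read the credentials from environment variables and print only a masked version. This is a sketch: the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical.

```python
import os

def load_credentials(id_var='FOURSQUARE_CLIENT_ID', secret_var='FOURSQUARE_CLIENT_SECRET'):
    """Read Foursquare credentials from environment variables (variable names are hypothetical)."""
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        raise RuntimeError('Set {} and {} before running this cell'.format(id_var, secret_var))
    return client_id, client_secret

def mask(value, visible=4):
    """Replace all but the last `visible` characters with '*'."""
    hidden = max(len(value) - visible, 0)
    return '*' * hidden + value[hidden:]

# Dummy values for the example only; setdefault keeps any real values already set
os.environ.setdefault('FOURSQUARE_CLIENT_ID', 'dummy-id-1234')
os.environ.setdefault('FOURSQUARE_CLIENT_SECRET', 'dummy-secret-5678')

client_id, client_secret = load_credentials()
print('CLIENT_SECRET: ' + mask(client_secret))
```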


## Top 50 venues within 1 km of the center of North York

In [11]:
LIMIT = 50
radius = 1000

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=32TMXOPLKYMNKCHQB5FX0UWRG2TAVTPM13K0GLVZYLJZBEUG&client_secret=P5IYLFARAC1JI0DZNUWNZJ2R3MA0CR0CPK5PR0HMTHI0PL1T&v=20180605&ll=43.761539,-79.411079&radius=1000&limit=50'
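Building the query string with `str.format` works, but `urllib.parse.urlencode` handles the escaping and keeps the parameters readable. This sketch uses placeholder credentials (`MY_ID`, `MY_SECRET`); `explore_url` is a hypothetical helper. Passing the same dictionary as `params=` to `requests.get` should produce an equivalent request. The comma in `ll` is percent-encoded as `%2C`, which the server should decode back to a comma.

```python
from urllib.parse import urlencode

def explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=50):
    """Build a Foursquare v2 'venues/explore' URL with proper query-string encoding."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

print(explore_url('MY_ID', 'MY_SECRET', '20180605', 43.761539, -79.411079))
```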

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee19ef3c6a68d348cafd8f1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Willowdale',
  'headerFullLocation': 'Willowdale, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 95,
  'suggestedBounds': {'ne': {'lat': 43.77053900900001,
    'lng': -79.39864075856647},
   'sw': {'lat': 43.75253899099999, 'lng': -79.42351724143353}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '55665a0b498ec5589987b1f7',
       'name': 'Kinka Izakaya',
       'location': {'address': '4775 Yonge Street, Unit #114',
        'crossStreet': 'Sheppard Ave.',
        'lat': 43.76016102214242,
        'lng': -79.40982686116466,
        'labeledL

In [13]:
# Function that extracts the category of a venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [14]:

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print(nearby_venues.shape)
nearby_venues.head()

(50, 4)


|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Kinka Izakaya | Japanese Restaurant | 43.760161 | -79.409827 |
| 1 | EatBKK | Thai Restaurant | 43.75932 | -79.410454 |
| 2 | Sushi Moto Sake & Wine Bar | Sushi Restaurant | 43.763902 | -79.411559 |
| 3 | Longo’s Sheppard Centre | Grocery Store | 43.762221 | -79.410762 |
| 4 | Pizzaiolo | Pizza Place | 43.764289 | -79.41178 |
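To see what `json_normalize` and the category step do, here is a sketch on two hand-made items shaped like `results['response']['groups'][0]['items']`. The second item, with an empty category list, is hypothetical. It shows why `get_category_type` needs its `None` branch. Nested keys become dotted column names, while lists such as `categories` are kept as-is for the category step to unpack.

```python
import pandas as pd

# Two hand-made items shaped like response['groups'][0]['items'] (values are illustrative)
items = [
    {'venue': {'name': 'Kinka Izakaya',
               'categories': [{'name': 'Japanese Restaurant'}],
               'location': {'lat': 43.760161, 'lng': -79.409827}}},
    {'venue': {'name': 'Unnamed Spot',
               'categories': [],
               'location': {'lat': 43.76, 'lng': -79.41}}},
]

flat = pd.json_normalize(items)  # nested keys become 'venue.name', 'venue.location.lat', ...
flat = flat[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']].copy()

# Keep the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(lambda cats: cats[0]['name'] if cats else None)
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```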


## Now I will retrieve the venues for each postal code

In [15]:
# This function repeats the search for each neighborhood (postal code)

def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'PC Latitude', 
                  'PC Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
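`getNearbyVenues` reads `v['venue']['categories'][0]['name']` directly. A single venue without categories would therefore raise an `IndexError` and stop the whole loop. The sketch below is a more defensive version of the row-building step. `items_to_rows`, `COLUMNS`, and the sample responses are hypothetical, and "Mystery Venue" is invented to exercise the empty-category branch. The sketch also extends one flat list instead of building a list of lists and flattening it afterwards.

```python
import pandas as pd

COLUMNS = ['Postal Code', 'PC Latitude', 'PC Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def items_to_rows(name, lat, lng, items):
    """Turn one 'explore' response's items into rows; empty categories become None."""
    rows = []
    for v in items:
        venue = v['venue']
        categories = venue.get('categories') or []
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name'] if categories else None))
    return rows

# Hypothetical items for two postal codes (the second venue has no categories)
responses = {
    'M3A': (43.753259, -79.329656,
            [{'venue': {'name': 'Brookbanks Park', 'categories': [{'name': 'Park'}],
                        'location': {'lat': 43.751976, 'lng': -79.33214}}}]),
    'M4A': (43.725882, -79.315572,
            [{'venue': {'name': 'Mystery Venue', 'categories': [],
                        'location': {'lat': 43.72, 'lng': -79.31}}}]),
}

rows = []
for code, (lat, lng, items) in responses.items():
    rows.extend(items_to_rows(code, lat, lng, items))  # flatten as we go
venues = pd.DataFrame(rows, columns=COLUMNS)
print(venues)
```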

In [16]:
#Venues for North York for each postal code

northyork_venues = getNearbyVenues(names=ny_data['Postal Code'],
                                   latitudes=ny_data['Latitude'],
                                   longitudes=ny_data['Longitude']
                                  )


M3A
M4A
M6A
M3B
M6B
M3C
M2H
M3H
M2J
M3J
M2K
M3K
M2L
M3L
M6L
M9L
M2M
M3M
M5M
M9M
M2N
M3N
M2P
M2R


In [17]:
#Dataframe with venues
print(northyork_venues['Postal Code'].nunique())
print(northyork_venues.shape)
northyork_venues.head()

24
(589, 7)


|   | Postal Code | PC Latitude | PC Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | M3A | 43.753259 | -79.329656 | Allwyn's Bakery | 43.75984 | -79.324719 | Caribbean Restaurant |
| 1 | M3A | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 2 | M3A | 43.753259 | -79.329656 | Tim Hortons | 43.760668 | -79.326368 | Café |
| 3 | M3A | 43.753259 | -79.329656 | A&W | 43.760643 | -79.326865 | Fast Food Restaurant |
| 4 | M3A | 43.753259 | -79.329656 | Bruno's valu-mart | 43.746143 | -79.32463 | Grocery Store |


### Counting how many venues each postal code has

In [18]:
northyork_venues.groupby('Postal Code').count()


| Postal Code | PC Latitude | PC Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| M2H | 19 | 19 | 19 | 19 | 19 | 19 |
| M2J | 44 | 44 | 44 | 44 | 44 | 44 |
| M2K | 16 | 16 | 16 | 16 | 16 | 16 |
| M2L | 4 | 4 | 4 | 4 | 4 | 4 |
| M2M | 30 | 30 | 30 | 30 | 30 | 30 |
| M2N | 50 | 50 | 50 | 50 | 50 | 50 |
| M2P | 22 | 22 | 22 | 22 | 22 | 22 |
| M2R | 13 | 13 | 13 | 13 | 13 | 13 |
| M3A | 29 | 29 | 29 | 29 | 29 | 29 |
| M3B | 30 | 30 | 30 | 30 | 30 | 30 |
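Every column in the table above repeats the same number, so `groupby(...).size()` gives the venue count per postal code in a single column. Postal codes whose count equals `LIMIT` (M2N reaches 50 here) may have more venues than the API returned. Below is a minimal sketch on hypothetical rows, with a deliberately small `LIMIT`.

```python
import pandas as pd

LIMIT = 3  # small cap for the example; the notebook uses 50

venues = pd.DataFrame({'Postal Code': ['M2L', 'M2L', 'M2N', 'M2N', 'M2N'],
                       'Venue Category': ['Park', 'Pool', 'Café', 'Bank', 'Park']})
counts = venues.groupby('Postal Code').size()      # one number per postal code
capped = counts[counts >= LIMIT].index.tolist()    # codes that hit the request limit
print(counts)
print(capped)
```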


## Analyzing each postal code

In [19]:
# one hot encoding
northyork_onehot = pd.get_dummies(northyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add Postal Code column back to dataframe
northyork_onehot['Postal Code'] = northyork_venues['Postal Code'] 

# move the Postal Code column to the first column
fixed_columns = [northyork_onehot.columns[-1]] + list(northyork_onehot.columns[:-1])
northyork_onehot = northyork_onehot[fixed_columns]

northyork_onehot.head()

,Postal Code,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Store,Bike Shop,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Community Center,Convenience Store,Cosmetics Shop,Creperie,Deli / Bodega,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store,Event Space,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Food & Drink Shop,Food Court,Food Truck,Frame Store,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Liquor Store,Lounge,Massage Studio,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Moving Target,New American Restaurant,Office,Other Repair Shop,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M3A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M3A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M3A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Grouping the data by postal code and taking the mean frequency of each venue category

In [20]:
northyork_grouped = northyork_onehot.groupby('Postal Code').mean().reset_index()
northyork_grouped

,Postal Code,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Store,Bike Shop,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Community Center,Convenience Store,Cosmetics Shop,Creperie,Deli / Bodega,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store,Event Space,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Food & Drink Shop,Food Court,Food Truck,Frame Store,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Liquor Store,Lounge,Massage Studio,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Moving Target,New American Restaurant,Office,Other Repair Shop,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Road,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,M2H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.105263,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.105263,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M2J,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.113636,0.113636,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.022727,0.045455,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0
2,M2K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M2L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M2M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M2N,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.04,0.0,0.02,0.0,0.08,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0,0.02,0.02,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02
6,M2P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M2R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.153846,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M3A,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.034483,0.034483,0.0,0.034483,0.0,0.034483,0.0,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M3B,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.066667,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
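Together, the one-hot and mean steps turn raw venue rows into per-postal-code category frequencies. Below is a small sketch with hypothetical rows. The four M2L rows mirror the M2L result above: three parks and one pool give 0.75 and 0.25. Passing `dtype=int` keeps the dummy columns numeric, since pandas 2.x returns booleans from `get_dummies` by default.

```python
import pandas as pd

venues = pd.DataFrame({'Postal Code': ['M2L', 'M2L', 'M2L', 'M2L', 'M2R'],
                       'Venue Category': ['Park', 'Park', 'Park', 'Pool', 'Pharmacy']})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)  # one 0/1 column per category
onehot.insert(0, 'Postal Code', venues['Postal Code'])
freq = onehot.groupby('Postal Code').mean().reset_index()      # share of each category per code
print(freq)
```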


## The 3 most common venues for each postal code

In [21]:
num_top_venues = 3

for post in northyork_grouped['Postal Code']:
    print("----"+post+"----")
    temp = northyork_grouped[northyork_grouped['Postal Code'] == post].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M2H----
         venue  freq
0         Park  0.11
1     Pharmacy  0.11
2  Coffee Shop  0.11


----M2J----
            venue  freq
0  Clothing Store  0.11
1     Coffee Shop  0.11
2          Bakery  0.05


----M2K----
                 venue  freq
0                 Bank  0.12
1  Japanese Restaurant  0.12
2        Grocery Store  0.12


----M2L----
               venue  freq
0               Park  0.75
1               Pool  0.25
2  Accessories Store  0.00


----M2M----
               venue  freq
0  Korean Restaurant  0.13
1               Café  0.10
2        Pizza Place  0.07


----M2N----
               venue  freq
0   Ramen Restaurant  0.08
1  Korean Restaurant  0.08
2   Sushi Restaurant  0.06


----M2P----
         venue  freq
0         Park  0.18
1   Restaurant  0.14
2  Coffee Shop  0.09


----M2R----
         venue  freq
0     Pharmacy  0.15
1  Pizza Place  0.08
2     Bus Line  0.08


----M3A----
           venue  freq
0           Park  0.10
1       Pharmacy  0.07
2  Shopping Mall  0

### Transforming this information to a dataframe

In [22]:
# this function sorts a row's venue frequencies in descending order and returns the top categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [23]:
num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
post_venues_sorted = pd.DataFrame(columns=columns)
post_venues_sorted['Postal Code'] = northyork_grouped['Postal Code']

for ind in np.arange(northyork_grouped.shape[0]):
    post_venues_sorted.iloc[ind, 1:] = return_most_common_venues(northyork_grouped.iloc[ind, :], num_top_venues)

post_venues_sorted

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,M2H,Pharmacy,Park,Coffee Shop,Grocery Store,Recreation Center,Sandwich Place,Korean Restaurant
1,M2J,Coffee Shop,Clothing Store,Sandwich Place,Restaurant,Japanese Restaurant,Bank,Bakery
2,M2K,Bank,Grocery Store,Gas Station,Japanese Restaurant,Skating Rink,Shopping Mall,Park
3,M2L,Park,Pool,Dog Run,Fireworks Store,Fast Food Restaurant,Falafel Restaurant,Fabric Shop
4,M2M,Korean Restaurant,Café,Pizza Place,Coffee Shop,Middle Eastern Restaurant,Hot Dog Joint,Supermarket
5,M2N,Ramen Restaurant,Korean Restaurant,Sushi Restaurant,Coffee Shop,Pizza Place,Bubble Tea Shop,Middle Eastern Restaurant
6,M2P,Park,Restaurant,Coffee Shop,Bubble Tea Shop,French Restaurant,Pet Store,Convenience Store
7,M2R,Pharmacy,Bank,Pizza Place,Convenience Store,Grocery Store,Park,Baby Store
8,M3A,Park,Pharmacy,Convenience Store,Shopping Mall,Bus Stop,Discount Store,Laundry Service
9,M3B,Japanese Restaurant,Burger Joint,Pizza Place,Coffee Shop,Bank,Paper / Office Supplies Store,Salad Place
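The `indicators` list with its fallback handles the first three ranks, which is enough for `num_top_venues = 7`. With more venues, every rank past 3 would get 'th', so 21 would come out as '21th'. A minimal sketch of a general rule; `ordinal` is a hypothetical helper, not part of the notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 (and 111, 112, ...) are irregular and always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 7
columns = ['Postal Code'] + ['{} Most Common Venue'.format(ordinal(i))
                             for i in range(1, num_top_venues + 1)]
print(columns)
```

This builds the same eight column names as the cell above for `num_top_venues = 7`, without relying on an exception for control flow.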


## Clustering the postal codes

In [24]:
# set number of clusters
kclusters = 8

# k-means needs only the numeric venue frequencies, so drop the postal code column
northyork_grouped_clustering = northyork_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(northyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 5, 0, 2, 5, 5, 3, 3, 0, 5], dtype=int32)
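`kclusters = 8` is chosen by hand, and several of the clusters examined below contain a single postal code. One common way to sanity-check k is to compare silhouette scores (which range from -1 to 1, higher meaning tighter, better-separated clusters) across candidate values. The sketch below uses random stand-in data as an assumption; in the notebook you would pass `northyork_grouped_clustering` instead. `n_init=10` is set explicitly so the behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for northyork_grouped_clustering:
# 24 rows of venue-category frequencies, each row summing to 1
rng = np.random.default_rng(0)
X = rng.random((24, 10))
X = X / X.sum(axis=1, keepdims=True)

scores = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    scores[k] = silhouette_score(X, km.labels_)

for k, s in scores.items():
    print('k={}: silhouette={:.3f}'.format(k, s))

best_k = max(scores, key=scores.get)
print('highest silhouette at k =', best_k)
```

The silhouette score is only a heuristic; on so few rows it is worth reading alongside the cluster tables rather than in place of them.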

### Creating a new dataframe that includes the new clusters

In [25]:
# add clustering labels
post_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

northyork_merged = ny_data

# join post_venues_sorted onto ny_data to add latitude/longitude for each postal code
northyork_merged = northyork_merged.join(post_venues_sorted.set_index('Postal Code'), on='Postal Code')

northyork_merged # check the last columns!

,Postal Code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,M3A,Parkwoods,43.753259,-79.329656,0,Park,Pharmacy,Convenience Store,Shopping Mall,Bus Stop,Discount Store,Laundry Service
1,M4A,Victoria Village,43.725882,-79.315572,6,Coffee Shop,Hockey Arena,French Restaurant,Playground,Park,Café,Men's Store
2,M6A,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,5,Clothing Store,Furniture / Home Store,Coffee Shop,Fast Food Restaurant,Dessert Shop,Restaurant,Vietnamese Restaurant
3,M3B,Don Mills,43.745906,-79.352188,5,Japanese Restaurant,Burger Joint,Pizza Place,Coffee Shop,Bank,Paper / Office Supplies Store,Salad Place
4,M6B,Glencairn,43.709577,-79.445073,0,Grocery Store,Fast Food Restaurant,Pizza Place,Gas Station,Coffee Shop,Japanese Restaurant,Pet Store
5,M3C,Don Mills,43.7259,-79.340923,5,Restaurant,Gym,Coffee Shop,Japanese Restaurant,Asian Restaurant,Beer Store,Supermarket
6,M2H,Hillcrest Village,43.803762,-79.363452,3,Pharmacy,Park,Coffee Shop,Grocery Store,Recreation Center,Sandwich Place,Korean Restaurant
7,M3H,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,3,Coffee Shop,Park,Bank,Diner,Shopping Mall,Sandwich Place,Dog Run
8,M2J,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,5,Coffee Shop,Clothing Store,Sandwich Place,Restaurant,Japanese Restaurant,Bank,Bakery
9,M3J,"Northwood Park, York University",43.76798,-79.487262,5,Pizza Place,Coffee Shop,Furniture / Home Store,Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Fast Food Restaurant


In [26]:
northyork_merged.dtypes

Postal Code               object
Neighborhood              object
Latitude                 float64
Longitude                float64
Cluster Labels             int32
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
6th Most Common Venue     object
7th Most Common Venue     object
dtype: object
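`Cluster Labels` is `int32` here because every postal code in `ny_data` found a match in `post_venues_sorted`. `join(..., on='Postal Code')` is a left join, so a postal code without venues would get `NaN`, pandas would upcast the column to `float64`, and using those labels as list indices in the map would raise a `TypeError`. A minimal sketch on hypothetical miniature frames:

```python
import pandas as pd

# Hypothetical miniature versions of ny_data and post_venues_sorted
ny = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M6A'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Lawrence Manor'],
})
venues = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],  # M6A deliberately has no venue row
    'Cluster Labels': [0, 6],
    '1st Most Common Venue': ['Park', 'Coffee Shop'],
})

# Left join: match ny['Postal Code'] against the index of venues
merged = ny.join(venues.set_index('Postal Code'), on='Postal Code')
print(merged.dtypes['Cluster Labels'])  # float64: the missing match became NaN

# Drop unmatched postal codes and restore an integer label column
clean = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(clean)
```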


## Visualizing the clusters

In [28]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per postal code, colored by its cluster label (0 to kclusters-1)
for lat, lon, poi, cluster in zip(northyork_merged['Latitude'],
                                  northyork_merged['Longitude'],
                                  northyork_merged['Postal Code'],
                                  northyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
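Cluster labels run from 0 to `kclusters - 1`, so they can index the color list directly. Indexing with `cluster-1` instead still gives distinct colors, but Python's negative indexing sends label 0 to the last entry. A minimal, folium-free sketch of the mapping, using the first ten labels printed by `kmeans.labels_[0:10]` above; `cluster_palette` is a hypothetical helper name:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(k):
    """Return k evenly spaced hex colors from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, k))
    return [colors.rgb2hex(c) for c in rgba]

palette = cluster_palette(8)

# First ten labels from the k-means run above
labels = np.array([3, 5, 0, 2, 5, 5, 3, 3, 0, 5], dtype=np.int32)

# Index with the label itself: label 0 -> palette[0], label 7 -> palette[7]
marker_colors = [palette[label] for label in labels]
print(marker_colors[:3])
```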

## Examining each cluster

Cluster 1 (label 0)

In [31]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 0, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Parkwoods,Shopping Mall,Bus Stop,Discount Store,Laundry Service
4,Glencairn,Gas Station,Coffee Shop,Japanese Restaurant,Pet Store
10,Bayview Village,Japanese Restaurant,Skating Rink,Shopping Mall,Park
21,Downsview,Pharmacy,Pizza Place,Fast Food Restaurant,Gas Station


Cluster 2 (label 1)

In [32]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 1, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
17,Downsview,Yoga Studio,Electronics Store,Fireworks Store,Fast Food Restaurant


Cluster 3 (label 2)

In [33]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 2, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
12,"York Mills, Silver Hills",Fireworks Store,Fast Food Restaurant,Falafel Restaurant,Fabric Shop


Cluster 4 (label 3)

In [34]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 3, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
6,Hillcrest Village,Grocery Store,Recreation Center,Sandwich Place,Korean Restaurant
7,"Bathurst Manor, Wilson Heights, Downsview North",Diner,Shopping Mall,Sandwich Place,Dog Run
13,Downsview,Grocery Store,Coffee Shop,Shopping Mall,Moving Target
14,"North Park, Maple Leaf Park, Upwood Park",Gas Station,Chinese Restaurant,Park,Bakery
22,York Mills West,Bubble Tea Shop,French Restaurant,Pet Store,Convenience Store
23,"Willowdale, Willowdale West",Convenience Store,Grocery Store,Park,Baby Store


Cluster 5 (label 4)

In [35]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 4, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
19,"Humberlea, Emery",Intersection,Discount Store,Golf Course,Bakery


Cluster 6 (label 5)

In [36]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 5, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
2,"Lawrence Manor, Lawrence Heights",Fast Food Restaurant,Dessert Shop,Restaurant,Vietnamese Restaurant
3,Don Mills,Coffee Shop,Bank,Paper / Office Supplies Store,Salad Place
5,Don Mills,Japanese Restaurant,Asian Restaurant,Beer Store,Supermarket
8,"Fairview, Henry Farm, Oriole",Restaurant,Japanese Restaurant,Bank,Bakery
9,"Northwood Park, York University",Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Fast Food Restaurant
11,Downsview,Italian Restaurant,Park,Other Repair Shop,Chinese Restaurant
16,"Willowdale, Newtonbrook",Coffee Shop,Middle Eastern Restaurant,Hot Dog Joint,Supermarket
18,"Bedford Park, Lawrence Manor East",Restaurant,Bank,Park,Pharmacy
20,"Willowdale, Willowdale East",Coffee Shop,Pizza Place,Bubble Tea Shop,Middle Eastern Restaurant


Cluster 7 (label 6)

In [37]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 6, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
1,Victoria Village,Playground,Park,Café,Men's Store


Cluster 8 (label 7)

In [38]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 7, northyork_merged.columns[[1] + list(range(8, northyork_merged.shape[1]))]]

,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
15,Humber Summit,Pharmacy,Arts & Crafts Store,Park,Italian Restaurant


# <span style="color:blue">Third Commit</span> 