# Geospatial Analysis and Exploration

### Phuc (Peter) Luong
**References:**
For this assignment, I use the following guides:
* [How to scrape wiki data and input to Python](https://simpleanalytical.com/how-to-web-scrape-wikipedia-python-urllib-beautiful-soup-pandas)
* [Exploring Neighboorhoods using API and Python (from IBM through Couresera)](https://simpleanalytical.com/how-to-web-scrape-wikipedia-python-urllib-beautiful-soup-pandas)

# PART 1

## Scraping data from Wiki Tables

The first step is to retrieve data of postal codes from Wikipedia 

In [1]:
import urllib.request
link_p = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(link_p)

In [2]:
from bs4 import BeautifulSoup

In [3]:
soup = BeautifulSoup(page, "html.parser")

In [4]:
# Finding all tables: 
# all_tables=soup.find_all("table")

In [5]:
table=soup.find('table', class_='wikitable sortable')
# table 

In [6]:
## looping to fill up data from the wiki table, reading text
A=[]
B=[]
C=[]
for row in table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))

In [7]:
import pandas as pd

In [8]:
## Postal Code + Borough + Neighborhood: column headers

In [9]:
df_wiki=pd.DataFrame(A,columns=['Postal Code'])
df_wiki['Borough']=B
df_wiki['Neighborhood']=C

After I create a new dataframe, for some reasons, the text appears with "\n" (space break) at the end. So I modify the data by dropping the "\n"

In [10]:
df_wiki = df_wiki.replace('\n','', regex=True)

In [11]:
df_wiki.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [None]:
# note that there are only cases where both Borough and Neighborhood are "Not assigned", not either or.

In [12]:
df_wiki= df_wiki[df_wiki.Borough != 'Not assigned']
df_wiki= df_wiki[df_wiki.Neighborhood != 'Not assigned']

In [13]:
df_wiki.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [15]:
df_wiki.shape

(103, 3)

With the address information including the postal codes, we can find the coordinates (longitude and lattitude) for geospatial analysis using geocoding packages from Python (and Google). However, this package is very unreliable. So for a shorter version, I used the "Geospatial_Coordinates" csv file provided through Coursera, combining it with the previous neighborhood data. 

# PART 2

In [16]:
df_gc = pd.read_csv("Geospatial_Coordinates.csv")
df_gc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [17]:
df = pd.merge(df_wiki, df_gc, on='Postal Code')
df.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


# PART 3

## Exploring Neighborhoods of North York in Toronto

Before diving into our analysis, let's load up the relevant packages

In [19]:
import numpy as np 
import pandas as pd 

# to display all columns and rows of data
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# this is for API json files
import json

#!pip install geopy # uncomment this line if you haven't installed the package
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values (unreliable)
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

# for K-means clustering
from sklearn.cluster import KMeans

# for displaying map
import folium 

First, I filtered my Toronto data to look at North York borough only

In [43]:
df_ny = df[df['Borough']=='North York'].reset_index(drop=True)
df_ny

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073
5,M3C,North York,Don Mills,43.7259,-79.340923
6,M2H,North York,Hillcrest Village,43.803762,-79.363452
7,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
9,M3J,North York,"Northwood Park, York University",43.76798,-79.487262


Then I input my API credentials to fetch location data

In [44]:
CLIENT_ID = 'O21VRXRAQKNHZ4RNQQ3ALNR1PTYARC4XTRI4OHLOPHNC31B4' # your Foursquare ID
CLIENT_SECRET = '0W3HDXRJTDPU3ABNLVO20JR1BLMOMHPQH2ZGT3GJQSTD11PL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Next, I use API to fetch nearby location data for North York (within 500 meters)

In [45]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now I am looking at the top 100 venues (LIMIT=100)

In [46]:
## 
LIMIT = 100
northyork_venues = getNearbyVenues(names=df_ny['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Manor, Lawrence Heights
Don Mills
Glencairn
Don Mills
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
Downsview
York Mills, Silver Hills
Downsview
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Willowdale, Newtonbrook
Downsview
Bedford Park, Lawrence Manor East
Humberlea, Emery
Willowdale, Willowdale East
Downsview
York Mills West
Willowdale, Willowdale West


In [47]:
northyork_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,TTC stop #8380,43.752672,-79.326351,Bus Stop
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena


In [48]:
northyork_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Bathurst Manor, Wilson Heights, Downsview North",5,5,5,5,5,5
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",9,9,9,9,9,9
Don Mills,14,14,14,14,14,14
Downsview,38,38,38,38,38,38
"Fairview, Henry Farm, Oriole",12,12,12,12,12,12
Glencairn,35,35,35,35,35,35
Hillcrest Village,1,1,1,1,1,1
Humber Summit,86,86,86,86,86,86
"Humberlea, Emery",4,4,4,4,4,4


In [26]:
print('There are {} uniques categories.'.format(len(northyork_venues['Venue Category'].unique())))

There are 151 uniques categories.


Next, I am creating a "dummy" tables with all the venue categories as dummies (etc, =1 if the data record falls in a specific venue category)

In [104]:
# one hot encoding
northyork_onehot = pd.get_dummies(northyork_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
northyork_onehot['Neighborhood'] = northyork_venues['Neighborhood'] 

In [77]:
# this is purely for displaying data
# move neighborhood column to the first column
# I used the "pop" method by first removing the data
first_col = northyork_onehot.pop('Neighborhood')    

# and then inserting again as the first column
northyork_onehot.insert(0, 'Neighborhood', first_col)

In [78]:
northyork_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Café,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Curling Ice,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Korean Restaurant,Lake,Lingerie Store,Liquor Store,Lounge,Medical Center,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Moroccan Restaurant,Movie Theater,Moving Target,Museum,Music Venue,New American Restaurant,Nightclub,Office,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [79]:
northyork_onehot.shape

(467, 151)

Next, I am grouping data by percentage of certain venues by neighborhoods

In [None]:
northyork_grouped = northyork_onehot.groupby('Neighborhood').mean().reset_index()
northyork_grouped

In [81]:
num_top_venues = 5

for hood in northyork_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = northyork_grouped[northyork_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bathurst Manor, Wilson Heights, Downsview North----
                  venue  freq
0   Japanese Restaurant   0.2
1                   Gym   0.2
2                  Café   0.2
3  Caribbean Restaurant   0.2
4        Baseball Field   0.2


----Bayview Village----
                 venue  freq
0          Pizza Place  0.25
1                  Pub  0.25
2     Asian Restaurant  0.25
3  Japanese Restaurant  0.25
4          Music Venue  0.00


----Bedford Park, Lawrence Manor East----
                 venue  freq
0   Mexican Restaurant  0.11
1       Medical Center  0.11
2    Electronics Store  0.11
3         Intersection  0.11
4  Rental Car Location  0.11


----Don Mills----
                    venue  freq
0          Clothing Store  0.36
1  Furniture / Home Store  0.14
2       Accessories Store  0.07
3             Coffee Shop  0.07
4             Event Space  0.07


----Downsview----
              venue  freq
0        Beer Store  0.08
1       Coffee Shop  0.08
2              Park  0.08
3        R

In [82]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [83]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = northyork_grouped['Neighborhood']

for ind in np.arange(northyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(northyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Café,Gym,Japanese Restaurant,Baseball Field,Caribbean Restaurant,Dessert Shop,Eastern European Restaurant,Distribution Center,Discount Store,Diner
1,Bayview Village,Pizza Place,Asian Restaurant,Japanese Restaurant,Pub,Yoga Studio,Department Store,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
2,"Bedford Park, Lawrence Manor East",Electronics Store,Bank,Medical Center,Mexican Restaurant,Intersection,Restaurant,Rental Car Location,Breakfast Spot,Moving Target,Curling Ice
3,Don Mills,Clothing Store,Furniture / Home Store,Accessories Store,Boutique,Event Space,Miscellaneous Shop,Coffee Shop,Women's Store,Vietnamese Restaurant,Diner
4,Downsview,Park,Beer Store,Coffee Shop,Café,Asian Restaurant,Restaurant,Gym,Sandwich Place,Japanese Restaurant,Bubble Tea Shop


## Clustering Neighbors

In [84]:
# set number of clusters
kclusters = 5

northyork_grouped_clustering = northyork_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(northyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

In [85]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

northyork_merged = northyork_venues

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
northyork_merged = northyork_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

northyork_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park,4,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
1,Parkwoods,43.753259,-79.329656,TTC stop #8380,43.752672,-79.326351,Bus Stop,4,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop,4,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
3,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop,4,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena,0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Curling Ice,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop


In [86]:
# create map
map_clusters = folium.Map(location=[43.7615, -79.4111], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(northyork_merged['Venue Latitude'], northyork_merged['Venue Longitude'], northyork_merged['Neighborhood'], northyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Breakdown of Clusters

### Cluster 1

In [96]:
northyork_merged.shape[1]

18

In [99]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 0, northyork_merged.columns[[0] + list(range(8, northyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Victoria Village,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Curling Ice,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
5,Victoria Village,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Curling Ice,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
6,Victoria Village,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Curling Ice,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
7,Victoria Village,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Curling Ice,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
8,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant
9,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant
10,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant
11,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant
12,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant
13,"Lawrence Manor, Lawrence Heights",Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Café,Electronics Store,Performing Arts Venue,Mexican Restaurant


In [100]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 1, northyork_merged.columns[[0] + list(range(8, northyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
101,Hillcrest Village,Fast Food Restaurant,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Department Store


In [101]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 2, northyork_merged.columns[[0] + list(range(8, northyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
224,"York Mills, Silver Hills",History Museum,Bar,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
225,"York Mills, Silver Hills",History Museum,Bar,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant


In [102]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 3, northyork_merged.columns[[0] + list(range(8, northyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
430,York Mills West,Coffee Shop,Insurance Office,Korean Restaurant,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
431,York Mills West,Coffee Shop,Insurance Office,Korean Restaurant,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
432,York Mills West,Coffee Shop,Insurance Office,Korean Restaurant,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
433,York Mills West,Coffee Shop,Insurance Office,Korean Restaurant,Dessert Shop,Electronics Store,Eastern European Restaurant,Distribution Center,Discount Store,Diner,Dim Sum Restaurant


In [103]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 4, northyork_merged.columns[[0] + list(range(8, northyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
1,Parkwoods,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
2,Parkwoods,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
3,Parkwoods,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
249,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center
250,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center
251,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center
252,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center
253,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center
254,"North Park, Maple Leaf Park, Upwood Park",Pharmacy,Athletics & Sports,Curling Ice,Bus Stop,Skating Rink,Beer Store,Park,Fried Chicken Joint,Distribution Center,College Rec Center


So form my analysis, it seems that one of the most common places in Northyork is the coffee shops since they appear as the most or second most common venue of more than 3 clusterings. 