# Part 1

We will need the pandas library for building the dataframe, the wikipedia library to download the Wikipedia page for us (easier than trying to scrape a raw URL), and numpy for its arange function.

In [1]:
import pandas as pd
import wikipedia as wp
import numpy as np

In [2]:
# For reference, wiki page is https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

The code below grabs the wiki page's HTML as a raw string and has pandas parse every table in it. Through experimentation, the table we needed turned out to be the first one on the page, hence the [0] after the read_html call. We then called df.head to make sure we were getting the right thing.

In [8]:
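from io import StringIO

# html() returns the page's HTML as a string; wrap it in StringIO because
# newer pandas versions deprecate passing literal HTML to read_html
wiki_page = wp.page("List_of_postal_codes_of_Canada:_M").html()
df = pd.read_html(StringIO(wiki_page), header=0)[0]  # first table on the page
df.head(10)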

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Downtown Toronto | Queen's Park |
| 8 | M8A | Not assigned | Not assigned |
| 9 | M9A | Queen's Park | Not assigned |


The code below drops every row in the dataframe whose Borough is 'Not assigned'.

In [9]:
dropped_rows = df[df['Borough'] == 'Not assigned'].index
df.drop(dropped_rows, inplace=True)

In [10]:
df.head(10)

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Downtown Toronto | Queen's Park |
| 9 | M9A | Queen's Park | Not assigned |
| 10 | M1B | Scarborough | Rouge |
| 11 | M1B | Scarborough | Malvern |
| 13 | M3B | North York | Don Mills North |
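The same filter can also be written as a boolean mask that keeps the rows we want rather than dropping the ones we don't. A minimal sketch on a few rows copied from the table above (pandas only, no Wikipedia access needed):

```python
import pandas as pd

# A few rows shaped like the Wikipedia table (values copied from the output above)
sample = pd.DataFrame({
    'Postcode': ['M1A', 'M2A', 'M3A', 'M9A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Not assigned'],
})

# Keep only rows whose Borough is assigned; reset_index gives a clean 0..n-1 index
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Resetting the index is optional here, since the combining loop below uses positional iloc lookups either way.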


The code below is slightly complicated, but in essence it merges every group of rows that share a postal code into a single row listing all of that postal code's neighborhoods.

Assumptions made were as follows:

1) Rows that share a postal code sit next to each other in the dataframe (the table is ordered M1A, M2A, M3A and so on, with any repeats of a code adjacent)

2) All rows with the same postal code have the same Borough (glancing through the wiki page, this seemed to be the case)

In addition to combining rows into one postal code, if the neighborhood was not assigned, the code uses the borough name as the neighborhood name.

Caveats: a while loop could have been used in place of "for y in range(2, 100)", but the for loop was a little simpler, and no postal code has anywhere near 100 rows. (A postal code with 100 or more rows would not be combined correctly.)

In [14]:
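data = []
i = 0
while i < len(df):
    a1 = df.iloc[i]['Postcode']      # postal code of the current row
    a2 = df.iloc[i]['Borough']       # assumed to be the same for every row with this postcode
    a3 = df.iloc[i]['Neighborhood']
    if a3 == 'Not assigned':
        a3 = a2                      # fall back to the borough name
    if i == len(df) - 1:             # last row: nothing left to compare against, keep it as-is
        data.append([a1, a2, a3])
        break
    n = i+1
    b1 = df.iloc[n]['Postcode']
    if a1 == b1:
        # Two or more consecutive rows share this postcode: collect every neighbourhood
        b3 = df.iloc[n]['Neighborhood']
        if b3 == 'Not assigned':
            b3 = a2
        text = [a3,b3]
        for y in range(2,100):
            if i+y == len(df):       # ran off the end of the dataframe
                data.append([a1, a2, str(text)])
                break
            c1 = df.iloc[i+y]['Postcode']
            if a1 == c1:             # yet another row with the same postcode
                c3 = df.iloc[i+y]['Neighborhood']
                if c3 == 'Not assigned':
                    c3 = a2
                text.append(c3)
                n +=1
            else:                    # next postcode reached: store the combined row
                data.append([a1, a2, str(text)])
                break
        i = n+1                      # jump past every row that was just combined
    else:
        # Unique postcode: keep the row as-is
        data.append([a1,a2,a3])
        i = n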

In [15]:
combined_postcode_df = pd.DataFrame(data, columns = ['Postcode', 'Borough', 'Neighbourhood'])

In [16]:
combined_postcode_df.head(10)

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | ['Lawrence Heights', 'Lawrence Manor'] |
| 4 | M7A | Downtown Toronto | Queen's Park |
| 5 | M9A | Queen's Park | Queen's Park |
| 6 | M1B | Scarborough | ['Rouge', 'Malvern'] |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | ['Woodbine Gardens', 'Parkview Hill'] |
| 9 | M5B | Downtown Toronto | ['Ryerson', 'Garden District'] |


In [17]:
combined_postcode_df.shape

(103, 3)
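For comparison, the combining step can be done without a hand-written loop using pandas' groupby. This is a sketch on a handful of rows copied from the outputs above, not the code we actually ran. Unlike the loop, it does not need rows with the same postal code to be adjacent, and it joins neighbourhoods with commas (e.g. 'Lawrence Heights, Lawrence Manor') instead of storing the string form of a Python list. It still relies on assumption 2 (one borough per postal code):

```python
import pandas as pd

# Rows shaped like the filtered wiki table (values taken from the outputs above)
rows = pd.DataFrame({
    'Postcode': ['M3A', 'M6A', 'M6A', 'M9A', 'M1B', 'M1B'],
    'Borough': ['North York', 'North York', 'North York', "Queen's Park", 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Parkwoods', 'Lawrence Heights', 'Lawrence Manor', 'Not assigned', 'Rouge', 'Malvern'],
})

# Fall back to the borough name where no neighbourhood is assigned
hoods = rows['Neighborhood'].where(rows['Neighborhood'] != 'Not assigned', rows['Borough'])

# One row per (postcode, borough); sort=False keeps the order in which postcodes first appear
combined = (
    rows.assign(Neighborhood=hoods)
        .groupby(['Postcode', 'Borough'], sort=False)['Neighborhood']
        .agg(', '.join)
        .reset_index()
        .rename(columns={'Neighborhood': 'Neighbourhood'})
)
print(combined)
```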

# Part 2

Import the geocoder module, even though it ends up not working.

In [18]:
import geocoder

Define a function that gets the latitude and longitude of a postal code, retrying until the geocoder returns a result. It just keeps running in a loop, though...

In [19]:
def lat_and_longer (code):
    coords = None
    while (coords is None):
        geo = geocoder.google(str(code)+", "+"Toronto, Ontario")
        coords = geo.latlng
    latitude = coords[0]
    longitude = coords[1]
    return latitude, longitude
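One way to keep a lookup like lat_and_longer from spinning forever is to cap the number of attempts and back off between them. The sketch below is generic: `geocode_with_retries` is a hypothetical helper of our own, and `lookup` stands in for whatever geocoding call you use (for example `lambda q: geocoder.google(q).latlng`, which would still need a working API key):

```python
import time

def geocode_with_retries(lookup, query, max_attempts=5, delay=1.0):
    """Call lookup(query) until it returns something other than None.

    Gives up after max_attempts tries and returns None instead of looping forever.
    `lookup` is any function that returns a (lat, lng) pair or None.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay * (2 ** attempt))  # exponential backoff between tries
    return None


# Usage with a stand-in lookup that always fails, like the geocoder did here
print(geocode_with_retries(lambda q: None, 'M3A, Toronto, Ontario', max_attempts=3, delay=0))
```

The caller then has to handle a None result (for example by skipping that postal code), which is more honest than hanging the notebook.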

The code below does not work, because lat_and_longer never returns: the geocoder keeps giving back None, so it is simply not getting answers from Google (most likely because Google's geocoding service requires an API key). MapQuest also requires an API key, so in the interest of preserving our sanity, we elected to proceed using the CSV of coordinates provided.

In [20]:

"""
newdata = []
i = 0
while i != len(combined_postcode_df):
    a1 = combined_postcode_df.iloc[i]['Postcode']
    print(a1, i)
    a2 = combined_postcode_df.iloc[i]['Borough']
    a3 = combined_postcode_df.iloc[i]['Neighbourhood']
    a4, a5 = lat_and_longer(a1)
    print(a4, a5)
    newdata.append([a1, a2, a3, a4, a5])
    i += 1
"""


Load the CSV into a dataframe and check its layout.

In [21]:
latlongdb = pd.read_csv("Geospatial_Coordinates.csv")

In [22]:
latlongdb.head(10)

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| 5 | M1J | 43.744734 | -79.239476 |
| 6 | M1K | 43.727929 | -79.262029 |
| 7 | M1L | 43.711112 | -79.284577 |
| 8 | M1M | 43.716316 | -79.239476 |
| 9 | M1N | 43.692657 | -79.264848 |


The code below takes the latitude and longitude from the CSV dataframe and attaches them to the postcode, borough, and neighbourhood from the other dataframe.

Assumptions made:

1) Every postal code in combined_postcode_df also appears in the CSV, and vice versa (so both dataframes have the same number of rows).

In [23]:
newdata = []
# Walk through the combined postal-code table and look up each code's coordinates in the CSV
for i in range(len(combined_postcode_df)):
    a1 = combined_postcode_df.iloc[i]['Postcode']
    a2 = combined_postcode_df.iloc[i]['Borough']
    a3 = combined_postcode_df.iloc[i]['Neighbourhood']
    location = latlongdb.loc[latlongdb['Postal Code'] == a1]   # matching CSV row
    a4 = location.iloc[0]['Latitude']
    a5 = location.iloc[0]['Longitude']
    newdata.append([a1, a2, a3, a4, a5])

And now we stuff the newdata list into a new dataframe.

In [24]:
final_df = pd.DataFrame(newdata, columns = ['Postal Code', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude'])

In [25]:
final_df.head(10)

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | ['Lawrence Heights', 'Lawrence Manor'] | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |
| 5 | M9A | Queen's Park | Queen's Park | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | ['Rouge', 'Malvern'] | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 |
| 8 | M4B | East York | ['Woodbine Gardens', 'Parkview Hill'] | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | ['Ryerson', 'Garden District'] | 43.657162 | -79.378937 |
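The coordinate lookup is really a join on the postal code, so pandas' merge can do it in one call. A sketch on small stand-in frames (values copied from the outputs above, not the full data). With how='left', a postal code missing from the CSV would show up as NaN coordinates instead of raising an IndexError, and validate='one_to_one' raises if either side has duplicate codes:

```python
import pandas as pd

# Small stand-ins for combined_postcode_df and latlongdb
postcodes = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M1B'],
    'Borough': ['North York', 'North York', 'Scarborough'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Rouge, Malvern'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M3A', 'M4A'],
    'Latitude': [43.806686, 43.753259, 43.725882],
    'Longitude': [-79.194353, -79.329656, -79.315572],
})

# Align the key names, left-join so every postcode is kept, then restore the CSV's column name
merged = (
    postcodes.merge(coords.rename(columns={'Postal Code': 'Postcode'}),
                    on='Postcode', how='left', validate='one_to_one')
             .rename(columns={'Postcode': 'Postal Code'})
)

# Postal codes that had no match in the CSV (empty if assumption 1 holds)
missing = merged[merged['Latitude'].isna()]
print(merged)
```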


# Part 3

In [26]:
from geopy.geocoders import Nominatim
import folium
import json
from sklearn.cluster import KMeans
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors

Below we get the geographical coordinates of Toronto itself, to use as the center of our maps.

In [27]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toto_explorer")
locationT = geolocator.geocode(address)
latitudeT = locationT.latitude
longitudeT = locationT.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitudeT, longitudeT))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Now to create a map!

In [28]:
# create map of Toronto using latitude and longitude values
map_canada = folium.Map(location=[latitudeT, longitudeT], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(final_df['Latitude'], final_df['Longitude'], final_df['Borough'], final_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_canada)  
    
map_canada

In [29]:
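CLIENT_ID = 'YOUR_CLIENT_ID'  # for security reasons, nothing listed here; put in your own key to run this
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # Foursquare API version date (the value used in the request below)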

In [30]:
neighborhood_latitude = final_df.loc[0, 'Latitude'] 
neighborhood_longitude = final_df.loc[0, 'Longitude'] 

neighborhood_name = final_df.loc[0, 'Neighbourhood'] 

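Now to get the Foursquare venue data, first for a single neighbourhood (Parkwoods, the first row of final_df) to check the request works, and then tidy up the results.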

In [31]:

LIMIT = 100 

radius = 250

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=43.7532586,-79.3296565&radius=250&limit=100
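Building the query string with format works, but a dictionary of parameters is easier to read and gets URL-encoded for us (requests.get(url, params=...) does the same encoding when it sends the request). A sketch using only the standard library and placeholder credentials; `explore_url` is a hypothetical helper name, and the parameter names are the ones used in the URL above:

```python
from urllib.parse import urlencode

def explore_url(client_id, client_secret, version, lat, lng, radius=250, limit=100):
    """Build a Foursquare v2 'explore' URL like the one above, with proper encoding."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; avoid printing real ones into a shared notebook
url = explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605', 43.7532586, -79.3296565)
print(url)
```

Note that urlencode writes the comma in ll as %2C, which the server decodes back to a comma.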


In [32]:
results = requests.get(url).json()
#results

In [33]:
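def get_category_type(row):
    # The column is 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']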

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Brookbanks Park | Park | 43.751976 | -79.33214 |
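To see what json_normalize and get_category_type are working with, here is the same flattening applied to a hand-made response with the nesting the code above indexes into (results['response']['groups'][0]['items']). The Brookbanks Park values come from the output above; the second venue ('Unnamed Lot', with no categories) is made up to show the None case:

```python
import pandas as pd

sample_results = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 43.751976, 'lng': -79.33214}}},
    {'venue': {'name': 'Unnamed Lot',          # hypothetical venue with no categories
               'categories': [],
               'location': {'lat': 43.75, 'lng': -79.33}}},
]}]}}

items = sample_results['response']['groups'][0]['items']
flat = pd.json_normalize(items)  # nested keys become dotted columns, e.g. 'venue.location.lat'

# First category name, or None when the venue has no categories
flat['venue.categories'] = flat['venue.categories'].apply(lambda cats: cats[0]['name'] if cats else None)

flat = flat[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']].copy()
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```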


In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

There we go, the helper is ready. Now we run it for every neighbourhood in final_df.

In [36]:
canada_venues = getNearbyVenues(names=final_df['Neighbourhood'],
                                   latitudes=final_df['Latitude'],
                                   longitudes=final_df['Longitude'])

Parkwoods
Victoria Village
Harbourfront
['Lawrence Heights', 'Lawrence Manor']
Queen's Park
Queen's Park
['Rouge', 'Malvern']
Don Mills North
['Woodbine Gardens', 'Parkview Hill']
['Ryerson', 'Garden District']
Glencairn
['Cloverdale', 'Islington', 'Martin Grove', 'Princess Gardens', 'West Deane Park']
['Highland Creek', 'Rouge Hill', 'Port Union']
['Flemingdon Park', 'Don Mills South']
Woodbine Heights
St. James Town
Humewood-Cedarvale
['Bloordale Gardens', 'Eringate', 'Markland Wood', 'Old Burnhamthorpe']
['Guildwood', 'Morningside', 'West Hill']
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
['Bathurst Manor', 'Downsview North', 'Wilson Heights']
Thorncliffe Park
['Adelaide', 'King', 'Richmond']
['Dovercourt Village', 'Dufferin']
Scarborough Village
['Fairview', 'Henry Farm', 'Oriole']
['Northwood Park', 'York University']
East Toronto
['Harbourfront East', 'Toronto Islands', 'Union Station']
['Little Portugal', 'Tr

In [37]:
canada_venues.head(70)

| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.332140 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |
| 5 | Victoria Village | 43.725882 | -79.315572 | Eglinton Ave E & Sloane Ave/Bermondsey Rd | 43.726086 | -79.313620 | Intersection |
| 6 | Victoria Village | 43.725882 | -79.315572 | Pizza Nova | 43.725824 | -79.312860 | Pizza Place |
| 7 | Harbourfront | 43.654260 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 8 | Harbourfront | 43.654260 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 9 | Harbourfront | 43.654260 | -79.360636 | Cooper Koo Family YMCA | 43.653191 | -79.357947 | Gym / Fitness Center |


In [38]:
canada_venues.shape

(2224, 7)

Some one-hot encoding to turn the venue categories into numbers that can be analyzed, followed by moving the Neighbourhood column to the front.

In [39]:

canada_onehot = pd.get_dummies(canada_venues[['Venue Category']], prefix="", prefix_sep="")


canada_onehot['Neighbourhood'] = canada_venues['Neighbourhood'] 


fixed_columns = [canada_onehot.columns[-1]] + list(canada_onehot.columns[:-1])
canada_onehot = canada_onehot[fixed_columns]

canada_onehot.head()

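| | Neighbourhood | Accessories Store | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |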


In [40]:
canada_grouped = canada_onehot.groupby('Neighbourhood').mean().reset_index()
canada_grouped

| | Neighbourhood | Accessories Store | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 1 | Bayview Village | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2 | Berczy Park | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.017544 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 3 | Business Reply Mail Processing Centre 969 Eastern | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.062500 |
| 4 | Caledonia-Fairbanks | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 5 | Canada Post Gateway Processing Centre | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.090909 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 6 | Cedarbrae | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 7 | Central Bay Street | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.012346 | 0.000000 | ... | 0.012346 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.012346 | 0.000000 | 0.000000 | 0.000000 | 0.012346 |
| 8 | Christie | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 9 | Church and Wellesley | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.011905 | 0.000000 | ... | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.011905 | 0.011905 | 0.000000 | 0.023810 |
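What one-hot encoding followed by groupby(...).mean() produces is, for each neighbourhood, the fraction of its venues that fall into each category. A toy version, with the Woburn rows chosen to reproduce the Woburn frequencies in the top-venue listing that follows (Coffee Shop 0.50, Korean Restaurant 0.25, Pharmacy 0.25); the Parkwoods rows are the two venues from the earlier output:

```python
import pandas as pd

# Toy venue list in the same long format as canada_venues (one row per venue)
venues = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Parkwoods', 'Woburn', 'Woburn', 'Woburn', 'Woburn'],
    'Venue Category': ['Park', 'Food & Drink Shop', 'Coffee Shop', 'Coffee Shop',
                       'Korean Restaurant', 'Pharmacy'],
})

# One indicator column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Mean of each indicator column = share of the neighbourhood's venues in that category
freq = onehot.groupby('Neighbourhood').mean().reset_index()
print(freq)
```

Each row of freq sums to 1, which is why neighbourhoods with very few venues (like Agincourt's four) end up with large, evenly split frequencies.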


In [41]:
num_top_venues = 5

for hood in canada_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = canada_grouped[canada_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0             Breakfast Spot  0.25
1                     Lounge  0.25
2  Latin American Restaurant  0.25
3               Skating Rink  0.25
4          Accessories Store  0.00


----Bayview Village----
                 venue  freq
0                 Bank  0.25
1  Japanese Restaurant  0.25
2   Chinese Restaurant  0.25
3                 Café  0.25
4    Accessories Store  0.00


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2            Beer Bar  0.04
3      Farmers Market  0.04
4  Seafood Restaurant  0.04


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0         Yoga Studio  0.06
1       Auto Workshop  0.06
2       Garden Center  0.06
3              Garden  0.06
4  Light Rail Station  0.06


----Caledonia-Fairbanks----
                       venue  freq
0                       Park  0.50
1                     Market  0.25
2       Fast Food R

                 venue  freq
0     Ramen Restaurant  0.09
1                 Café  0.06
2  Japanese Restaurant  0.06
3     Sushi Restaurant  0.06
4           Restaurant  0.06


----Willowdale West----
            venue  freq
0        Pharmacy   0.2
1   Grocery Store   0.2
2  Discount Store   0.2
3     Coffee Shop   0.2
4     Pizza Place   0.2


----Woburn----
               venue  freq
0        Coffee Shop  0.50
1  Korean Restaurant  0.25
2           Pharmacy  0.25
3  Accessories Store  0.00
4              Motel  0.00


----Woodbine Heights----
            venue  freq
0        Pharmacy  0.11
1  Cosmetics Shop  0.11
2            Park  0.11
3        Bus Stop  0.11
4     Curling Ice  0.11


----York Mills West----
                       venue  freq
0                       Bank  0.33
1          Convenience Store  0.33
2                       Park  0.33
3          Accessories Store  0.00
4  Middle Eastern Restaurant  0.00


----['Adelaide', 'King', 'Richmond']----
          venue  freq
0   C

                 venue  freq
0       Baseball Field   1.0
1    Accessories Store   0.0
2   Miscellaneous Shop   0.0
3                Motel   0.0
4  Monument / Landmark   0.0


----['Kingsview Village', 'Martin Grove Gardens', 'Richview Gardens', 'St. Phillips']----
                venue  freq
0      Sandwich Place  0.25
1   Mobile Phone Shop  0.25
2            Bus Line  0.25
3         Pizza Place  0.25
4  Miscellaneous Shop  0.00


----['Kingsway Park South West', 'Mimico NW', 'The Queensway West', 'Royal York South West', 'South of Bloor']----
                  venue  freq
0        Hardware Store  0.08
1         Grocery Store  0.08
2        Discount Store  0.08
3        Sandwich Place  0.08
4  Fast Food Restaurant  0.08


----['Lawrence Heights', 'Lawrence Manor']----
                    venue  freq
0          Clothing Store  0.23
1       Accessories Store  0.15
2  Furniture / Home Store  0.15
3   Vietnamese Restaurant  0.08
4             Coffee Shop  0.08


----['Little Portugal', 'T

In [42]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [43]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = canada_grouped['Neighbourhood']

for ind in np.arange(canada_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(canada_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Lounge | Breakfast Spot | Latin American Restaurant | Skating Rink | Eastern European Restaurant | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 1 | Bayview Village | Café | Bank | Chinese Restaurant | Japanese Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Yoga Studio |
| 2 | Berczy Park | Coffee Shop | Cocktail Bar | Steakhouse | Beer Bar | Bakery | Seafood Restaurant | Cheese Shop | Café | Farmers Market | Hotel |
| 3 | Business Reply Mail Processing Centre 969 Eastern | Yoga Studio | Auto Workshop | Park | Comic Shop | Pizza Place | Restaurant | Burrito Place | Brewery | Skate Park | Smoke Shop |
| 4 | Caledonia-Fairbanks | Park | Fast Food Restaurant | Market | Eastern European Restaurant | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Empanada Restaurant |
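The ordinal column names and the top-N selection can also be written with a small helper and Series.nlargest. In this sketch, `ordinal` is a hypothetical helper of our own and the data is a toy row built from the Woburn frequencies. The helper also handles 11th to 13th and 21st correctly; the st/nd/rd list above only covers 1 to 3, so it would print '21th' if we ever asked for more than 20 venues:

```python
import pandas as pd

def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Toy frequency table in the shape of canada_grouped
grouped = pd.DataFrame({
    'Neighbourhood': ['Woburn'],
    'Coffee Shop': [0.5], 'Korean Restaurant': [0.25], 'Pharmacy': [0.25], 'Park': [0.0],
})

top_n = 3
freqs = grouped.set_index('Neighbourhood')
labels = [f'{ordinal(i + 1)} Most Common Venue' for i in range(top_n)]

# For each neighbourhood, the names of its top_n most frequent categories (ties keep column order)
top = freqs.apply(lambda row: pd.Series(list(row.nlargest(top_n).index), index=labels),
                  axis=1).reset_index()
print(top)
```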


Now we run the k-means algorithm to cluster the neighbourhoods by their venue frequencies, and then print a color-coded map of all neighbourhoods with their cluster type.

In [44]:
# set number of clusters
kclusters = 5

canada_grouped_clustering = canada_grouped.drop(columns='Neighbourhood')  # keep only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(canada_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 3, 1, 3, 3, 3, 3, 3], dtype=int32)
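KMeans here is k-means clustering (not k-nearest-neighbours). To show what it does under the hood, here is a bare-bones NumPy sketch of Lloyd's algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until nothing moves. `lloyd_kmeans` is our own illustrative function; scikit-learn's KMeans adds k-means++ initialisation, so its labels can differ from this sketch:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm): returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; leave it in place if it has none
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy "venue frequency" rows: two park-heavy and two coffee-heavy neighbourhoods
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary: only which rows share a label matters, which is why cluster "3" above carries no meaning on its own.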

In [45]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

canada_merged = final_df

# join the cluster labels and top venues onto final_df, which already has latitude/longitude for each neighbourhood
canada_merged = canada_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

canada_merged.head() # check the last columns!

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 1.0 | Food & Drink Shop | Park | Yoga Studio | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Eastern European Restaurant | Electronics Store |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 3.0 | Portuguese Restaurant | Intersection | Pizza Place | Coffee Shop | Hockey Arena | Dumpling Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 | 3.0 | Coffee Shop | Park | Bakery | Pub | Breakfast Spot | Café | Mexican Restaurant | Shoe Store | Brewery | Restaurant |
| 3 | M6A | North York | ['Lawrence Heights', 'Lawrence Manor'] | 43.718518 | -79.464763 | 3.0 | Clothing Store | Accessories Store | Furniture / Home Store | Arts & Crafts Store | Miscellaneous Shop | Coffee Shop | Boutique | Women's Store | Vietnamese Restaurant | Airport Terminal |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 3.0 | Coffee Shop | Park | Gym | Sushi Restaurant | Beer Bar | Smoothie Shop | Burger Joint | Sandwich Place | Burrito Place | Café |


In [46]:
canada_merged.dropna(subset = ['Cluster Labels'], inplace=True)

In [47]:
canada_merged.shape

(100, 16)

In [48]:
# create map
map_clusters = folium.Map(location=[latitudeT, longitudeT], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(canada_merged['Latitude'], canada_merged['Longitude'], canada_merged['Neighbourhood'], canada_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

To get a good feel for how the neighbourhoods were clustered, we look at the rows in each cluster to see whether they share a common theme.

In [49]:
canada_merged.loc[canada_merged['Cluster Labels'] == 0].groupby('1st Most Common Venue').count()
#canada_merged.loc[canada_merged['Cluster Labels'] == 0, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]

#Random cluster, no discernible pattern



| 1st Most Common Venue | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Playground | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Tennis Court | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


In [50]:
#canada_merged.loc[canada_merged['Cluster Labels'] == 1].groupby('1st Most Common Venue').count()
canada_merged.loc[canada_merged['Cluster Labels'] == 1, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]


#Public spaces cluster, containing our ideal neighbourhoods for living, as they contain Parks and other open spaces

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 1.0 | Food & Drink Shop | Park | Yoga Studio | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Eastern European Restaurant | Electronics Store |
| 21 | York | 1.0 | Park | Fast Food Restaurant | Market | Eastern European Restaurant | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Empanada Restaurant |
| 35 | East York | 1.0 | Park | Coffee Shop | Convenience Store | Yoga Studio | Eastern European Restaurant | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 40 | North York | 1.0 | Airport | Park | Yoga Studio | Eastern European Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 64 | York | 1.0 | Park | Yoga Studio | Eastern European Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Electronics Store |
| 66 | North York | 1.0 | Park | Bank | Convenience Store | Yoga Studio | Eastern European Restaurant | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 85 | Scarborough | 1.0 | Bakery | Park | Playground | Eastern European Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 91 | Downtown Toronto | 1.0 | Park | Trail | Playground | Yoga Studio | Diner | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore |
| 98 | Etobicoke | 1.0 | Park | River | Pool | Yoga Studio | Drugstore | Diner | Discount Store | Dog Run | Doner Restaurant | Donut Shop |


In [51]:
canada_merged.loc[canada_merged['Cluster Labels'] == 2].groupby('1st Most Common Venue').count()
#canada_merged.loc[canada_merged['Cluster Labels'] == 2, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]


#Another cluster with no discernible pattern

| 1st Most Common Venue | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseball Field | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Food Truck | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Paper / Office Supplies Store | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


In [52]:
#canada_merged.loc[canada_merged['Cluster Labels'] == 3].groupby('1st Most Common Venue').count()
canada_merged.loc[canada_merged['Cluster Labels'] == 3, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]


#The mall cluster, contains venues you might find in a mall setting

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North York | 3.0 | Portuguese Restaurant | Intersection | Pizza Place | Coffee Shop | Hockey Arena | Dumpling Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop |
| 2 | Downtown Toronto | 3.0 | Coffee Shop | Park | Bakery | Pub | Breakfast Spot | Café | Mexican Restaurant | Shoe Store | Brewery | Restaurant |
| 3 | North York | 3.0 | Clothing Store | Accessories Store | Furniture / Home Store | Arts & Crafts Store | Miscellaneous Shop | Coffee Shop | Boutique | Women's Store | Vietnamese Restaurant | Airport Terminal |
| 4 | Downtown Toronto | 3.0 | Coffee Shop | Park | Gym | Sushi Restaurant | Beer Bar | Smoothie Shop | Burger Joint | Sandwich Place | Burrito Place | Café |
| 5 | Queen's Park | 3.0 | Coffee Shop | Park | Gym | Sushi Restaurant | Beer Bar | Smoothie Shop | Burger Joint | Sandwich Place | Burrito Place | Café |
| 7 | North York | 3.0 | Gym / Fitness Center | Japanese Restaurant | Caribbean Restaurant | Baseball Field | Café | Ethiopian Restaurant | Empanada Restaurant | Electronics Store | Eastern European Restaurant | Discount Store |
| 8 | East York | 3.0 | Fast Food Restaurant | Pizza Place | Gastropub | Intersection | Bank | Breakfast Spot | Athletics & Sports | Pharmacy | Pet Store | Gym / Fitness Center |
| 9 | Downtown Toronto | 3.0 | Coffee Shop | Clothing Store | Café | Cosmetics Shop | Middle Eastern Restaurant | Theater | Diner | Pizza Place | Bubble Tea Shop | Sporting Goods Shop |
| 10 | North York | 3.0 | Park | Pub | Pizza Place | Japanese Restaurant | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant |
| 12 | Scarborough | 3.0 | Moving Target | Bar | History Museum | Yoga Studio | Eastern European Restaurant | Doner Restaurant | Donut Shop | Drugstore | Dumpling Restaurant | Electronics Store |


In [53]:
#canada_merged.loc[canada_merged['Cluster Labels'] == 4].groupby('1st Most Common Venue').count()
canada_merged.loc[canada_merged['Cluster Labels'] == 4, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]


#Another cluster with no discernible theme

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Scarborough | 4.0 | Fast Food Restaurant | Yoga Studio | Diner | Farmers Market | Falafel Restaurant | Event Space | Ethiopian Restaurant | Empanada Restaurant | Electronics Store | Eastern European Restaurant |
| 56 | York | 4.0 | Fast Food Restaurant | Restaurant | Sandwich Place | Dim Sum Restaurant | Diner | Discount Store | Dog Run | Doner Restaurant | Donut Shop | Yoga Studio |


In [54]:
canada_merged.loc[canada_merged['Cluster Labels'] == 5, canada_merged.columns[[1] + list(range(5, canada_merged.shape[1]))]]

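#With kclusters = 5, k-means labels the clusters 0 to 4, so this query for label 5 comes back empty. We probably could have used fewer clusters, but it still works!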

| | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
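Instead of one cell per cluster, the cluster sizes and a count of each cluster's top venues can be printed together with value_counts and crosstab. A sketch on a few rows copied from the cluster tables above (a stand-in for canada_merged, not the full data):

```python
import pandas as pd

# Stand-in for canada_merged: cluster label and top venue per neighbourhood
clustered = pd.DataFrame({
    'Cluster Labels': [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 4.0],
    '1st Most Common Venue': ['Food & Drink Shop', 'Park', 'Park', 'Coffee Shop',
                              'Coffee Shop', 'Clothing Store', 'Fast Food Restaurant'],
})
labels = clustered['Cluster Labels'].astype(int)  # labels became floats after the join; make them ints

# How many neighbourhoods fall into each cluster
sizes = labels.value_counts().sort_index()
print(sizes)

# Rows: top venue; columns: cluster; cells: number of neighbourhoods
summary = pd.crosstab(clustered['1st Most Common Venue'], labels)
print(summary)
```

Reading down a column of summary shows which kinds of venue dominate a cluster, which is the "common theme" the per-cluster cells above look for by eye.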
