# Canada Postal Codes Dataframe Assignment

## Applied Data Science Capstone Project - Coursera: Week 3 assignment

### Nishant Vemulakonda

### Importing Libraries

In [1]:
import pandas as pd # library for data analsysis
import numpy as np

from bs4 import BeautifulSoup as bs # Library for Web scraping
import urllib

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
!pip install folium
import folium as Folium

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported')

Requirement not upgraded as not directly required: folium in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages
Requirement not upgraded as not directly required: six in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: jinja2 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: branca>=0.3.0 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: numpy in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: requests in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from folium)
Requirement not upgraded as not directly required: MarkupSafe>=0.23 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from jinja2->folium)
Requirement not upgraded as not directly required: chardet<3.1.0,>=

## Part 1: Web scraping for Toronto neighborhood and build a clean dataframe

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma.

If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [2]:
# Get data from Wiki page using BeautifulSoup
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

soup = bs(urllib.request.urlopen(URL), 'html.parser')
print(soup.prettify()[:500])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_po


### Scrape the HTML
The data we want is in a table, with 3 columns PostalCode, Borough and Neighbourhood.

In [3]:
#Extracting column names (Table headers) for pandas dataframe
columns = [str(col.string.replace("\n","")) for col in soup.table.find_all('th')] 
print(columns)

# Extracting row data for pandas dataframe
row_text = [str(bs(str(row).replace("\n",""), "lxml").text) for row in soup.table.find_all('td')]
print(row_text[:20])

['Postcode', 'Borough', 'Neighbourhood']
['M1A', 'Not assigned', 'Not assigned', 'M2A', 'Not assigned', 'Not assigned', 'M3A', 'North York', 'Parkwoods', 'M4A', 'North York', 'Victoria Village', 'M5A', 'Downtown Toronto', 'Harbourfront', 'M5A', 'Downtown Toronto', 'Regent Park', 'M6A', 'North York']


In [4]:
postalcodes = row_text[0::3]
boroughs = row_text[1::3]
neighbourhoods =  row_text[2::3]
# Checking length of each list, i.e. number of rows
print("Length: {},{},{}".format(len(postalcodes),len(boroughs),len(neighbourhoods)))

Length: 289,289,289


### Tranform the data into a *pandas* dataframe
We will put the 3 lists created above to a pandas dataframe with corresponding columns name.

In [5]:
df = pd.DataFrame(columns=columns)
df.rename(columns={"Postcode":"Postalcode"},inplace=True)
# Looping thorugh lists created above to insert data into dataframe
df.Postalcode = postalcodes
df.Borough =  boroughs
df.Neighbourhood = neighbourhoods
print ("df dimensions:",df.shape)

# Applying filters to get relevant data 

# Filter mask to remove "not assigned" Boroughs
# Removing not assigned rows There are some postal codes which are not belongs to any borough.

filter_mask = df.Borough == "Not assigned"

#Filtering dataframe
df_filtered = df[~filter_mask]
print ("df_filtered dimensions:", df_filtered.shape)

df dimensions: (289, 3)
df_filtered dimensions: (212, 3)


### Deal with Not assigned Neighborhood
For M7A Queen's Park, there is no neighborhood assigned.
We will replace the 'Not assigned' with the value of the corresponding Borough

In [6]:
#Replacing Neighborhoods with "not assigned" with Borough value:
df_filtered.Neighbourhood[df_filtered.Neighbourhood == 'Not assigned'] = df_filtered.Borough
print ("df_filtered dimensions:",df_filtered.shape)
# df_filtered.Neighbourhood.to_csv('data.csv') #validating data

df_filtered dimensions: (212, 3)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)


In [7]:
df_final = df_filtered.groupby(['Postalcode','Borough']).Neighbourhood.agg(lambda val: ','.join(val)).reset_index()
df_final.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### DateFrame cleaned
#### The final dataframe has 103 rows

In [8]:
print (df_final.shape)

(103, 3)


## Part 2: Getting coordinates and add to the Toronto DataFrame

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

Note: Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

The csv file format has:
3 columns: Postal Code, Latitude and Longitude
103 rows: corresponding to 103 postal codes in our toronto dataframe

In [9]:
geo_coordinates_data = pd.read_csv('http://cocl.us/Geospatial_data')
print(geo_coordinates_data.shape)
geo_coordinates_data.head()

(103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Now, We will merge the two dataframes.
To get the correct result, we will need to perform an inner join on the dataframes using Postal Code column as the join key.

In [10]:
df_with_coordinates = pd.merge(df_final, geo_coordinates_data, left_on="Postalcode",right_on="Postal Code", how='inner')
df_with_coordinates.drop("Postal Code", axis=1, inplace=True)

df_with_coordinates.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [11]:
df_with_coordinates.shape

(103, 5)

In [12]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(df_with_coordinates['Borough'].nunique(),df_with_coordinates.shape[0]))

The dataframe has 11 boroughs and 103 neighborhoods.


### Defining Foursquare credentials

In [13]:
CLIENT_ID     = 'FN0MURDA1W45WUJEJRFFSPJYFUHKA402D5YXDYQNNTEAIUSG' # Foursquare ID
CLIENT_SECRET = 'NYALEUEKN4A5AZWWGE0RF0EF5J25TM3BCW103ZG5SDU0ICXR' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: FN0MURDA1W45WUJEJRFFSPJYFUHKA402D5YXDYQNNTEAIUSG
CLIENT_SECRET:NYALEUEKN4A5AZWWGE0RF0EF5J25TM3BCW103ZG5SDU0ICXR


## Part 3: Explore and cluster the neighborhoods in Toronto

In [14]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="foursquare")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto are 43.653963, -79.387207.


### Exploring Toronto city
### Using Folium to create a Map of Toronto with Boroughs markers on top

In [15]:
# create map of New York using latitude and longitude values
toronto_map = Folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, long, borough, neighborhood in zip(df_with_coordinates['Latitude'], df_with_coordinates['Longitude'], df_with_coordinates['Borough'], df_with_coordinates['Neighbourhood']):
    Folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=Folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True), #displays the popup for each point on map 
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

### Exploring Downtown Toronto

In [16]:
address = 'Downtown Toronto, CA'

geolocator = Nominatim(user_agent="foursquare")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Downtown Toronto are 43.655115, -79.380219.


In [17]:
Neigh = df_with_coordinates[df_with_coordinates.Borough == "Downtown Toronto"].Neighbourhood

# create map of East Toronto using latitude and longitude values
downtown_toronto_map = Folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, long, neighborhood in zip(df_with_coordinates['Latitude'], df_with_coordinates['Longitude'], Neigh):
    Folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=Folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True), #displays the popup for each point on map 
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6,
        parse_html=False).add_to(downtown_toronto_map)  
    
downtown_toronto_map

### Exploring Downtown Toronto neighborhoods

In [18]:
Downtown_Toronto = df_with_coordinates[df_with_coordinates.Borough == "Downtown Toronto"].reset_index(drop=True)
Downtown_Toronto.loc[0, 'Neighbourhood']

'Rosedale'

In [19]:
n_latitude = Downtown_Toronto.loc[0, 'Latitude'] # neighborhood latitude value
n_longitude = Downtown_Toronto.loc[0, 'Longitude'] # neighborhood longitude value
n_name = Downtown_Toronto.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(n_name, n_latitude, n_longitude))

Latitude and longitude values of Rosedale are 43.6795626, -79.37752940000001.


#### Top 100 venues that are in Rosedale in a radius of 500 meters.

In [20]:
#Building URL
URL = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, n_latitude, n_longitude, VERSION, 500, 100)
URL

'https://api.foursquare.com/v2/venues/explore?client_id=FN0MURDA1W45WUJEJRFFSPJYFUHKA402D5YXDYQNNTEAIUSG&client_secret=NYALEUEKN4A5AZWWGE0RF0EF5J25TM3BCW103ZG5SDU0ICXR&ll=43.6795626,-79.37752940000001&v=20180605&radius=500&limit=100'

In [21]:
results = requests.get(URL).json()
results['response']['groups']

[{'items': [{'reasons': {'count': 0,
     'items': [{'reasonName': 'globalInteractionReason',
       'summary': 'This spot is popular',
       'type': 'general'}]},
    'referralId': 'e-0-4aff2d47f964a520743522e3-0',
    'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/playground_',
        'suffix': '.png'},
       'id': '4bf58dd8d48988d1e7941735',
       'name': 'Playground',
       'pluralName': 'Playgrounds',
       'primary': True,
       'shortName': 'Playground'}],
     'id': '4aff2d47f964a520743522e3',
     'location': {'address': '38 Scholfield Ave.',
      'cc': 'CA',
      'city': 'Toronto',
      'country': 'Canada',
      'crossStreet': 'at Edgar Ave.',
      'distance': 327,
      'formattedAddress': ['38 Scholfield Ave. (at Edgar Ave.)',
       'Toronto ON',
       'Canada'],
      'labeledLatLngs': [{'label': 'display',
        'lat': 43.68232820227814,
        'lng': -79.37893434347683}],
      'lat': 43.68232820227814

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [23]:
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# # filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Rosedale Park,Playground,43.682328,-79.378934
1,Whitney Park,Park,43.682036,-79.373788
2,Alex Murray Parkette,Park,43.6783,-79.382773
3,Milkman's Lane,Trail,43.676352,-79.373842


In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


### Exploring Toronto neighborhoods only 

#### We will only explore boroughs that have Toronto in their names.

In [25]:
# Creating boolean Filter for Boroughs with "toronto".
Toronto_boroughs = df_with_coordinates.Borough.map(lambda B: str(B).lower().find("toronto") > 0)

# Appyling boolean Filter
df_Toronto_Boroughs = df_with_coordinates[Toronto_boroughs].reset_index(drop=True)
df_Toronto_Boroughs.shape
df_Toronto_Boroughs.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [26]:
map_toronto_boroughs = Folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, long, borough, neigh in zip(df_Toronto_Boroughs['Latitude'], df_Toronto_Boroughs['Longitude'], df_Toronto_Boroughs['Borough'], df_Toronto_Boroughs['Neighbourhood']):
    label = "{} {}".format(borough, neigh)
    popup = Folium.Popup(label, parse_html=True)
    Folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_boroughs)
    
map_toronto_boroughs

In [27]:
radius = 500
LIMIT = 100

venues_list = []

for lat, long, post, borough, neighborhood in zip( df_Toronto_Boroughs.Latitude, df_Toronto_Boroughs.Longitude, df_Toronto_Boroughs.Postalcode, 
                                                  df_Toronto_Boroughs.Borough, df_Toronto_Boroughs.Neighbourhood):
         
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            100)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each venue
        for venue in results:
            venues_list.append((
                post, 
                borough,
                neighborhood,
                lat, 
                long, 
                venue['venue']['name'], 
                venue['venue']['location']['lat'], 
                venue['venue']['location']['lng'],  
                venue['venue']['categories'][0]['name']))

In [28]:
# type your answer here
Toronto_venues = pd.DataFrame(venues_list)
Toronto_venues.columns = ['Postalcode', 'Borough', 'Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
print(Toronto_venues.shape)

(1692, 9)


## Analyze Each Neighborhood

In [29]:
Toronto_venues.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


#### Check count of venues returned for each postal code

In [30]:
Toronto_venues.groupby(['Postalcode', 'Borough', 'Neighbourhood'])['Venue Name'].count().reset_index()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Venue Name
0,M4E,East Toronto,The Beaches,4
1,M4K,East Toronto,"The Danforth West,Riverdale",43
2,M4L,East Toronto,"The Beaches West,India Bazaar",20
3,M4M,East Toronto,Studio District,40
4,M4N,Central Toronto,Lawrence Park,3
5,M4P,Central Toronto,Davisville North,8
6,M4R,Central Toronto,North Toronto West,21
7,M4S,Central Toronto,Davisville,35
8,M4T,Central Toronto,"Moore Park,Summerhill East",2
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",14


#### Check number of unique venues categories returned

In [31]:
len(Toronto_venues['Venue Category'].unique())

233

In [32]:
# one hot encoding
Toronto_Boroughs_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood, Borough, Postalcode columns back to dataframe
Toronto_Boroughs_onehot['Postalcode'] = Toronto_venues['Postalcode'] 
Toronto_Boroughs_onehot['Borough'] = Toronto_venues['Borough']
Toronto_Boroughs_onehot['Neighbourhood'] = Toronto_venues['Neighbourhood'] 

# # move neighborhood column to the first column
fixed_columns = list(Toronto_Boroughs_onehot.columns[-3:]) + list(Toronto_Boroughs_onehot.columns[:-3])
Toronto_Boroughs_onehot = Toronto_Boroughs_onehot[fixed_columns]

print(Toronto_Boroughs_onehot.shape)
Toronto_Boroughs_onehot.head()

(1692, 236)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M4K,East Toronto,"The Danforth West,Riverdale",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
Toronto_Boroughs_groupd = Toronto_Boroughs_onehot.groupby(['Postalcode', 'Borough', 'Neighbourhood']).mean().reset_index()
print(Toronto_Boroughs_groupd.shape)
Toronto_Boroughs_groupd.head()

(38, 236)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West,Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093023,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232558,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.069767,0.023256,0.0,0.0,0.046512,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256
2,M4L,East Toronto,"The Beaches West,India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.075,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Get 10 most common venue categories in each area

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
# num_top_venues = 10  # Top 10 venues
# indicators = ['st', 'nd', 'rd']

# # create columns according to number of top venues
# columns = ['Neighbourhood']
# for ind in np.arange(num_top_venues):
#     try:
#         columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
#     except:
#         columns.append('{}th Most Common Venue'.format(ind+1))

# # create a new dataframe
# Top_Neighborhood_Venues_Toronto = pd.DataFrame(columns=columns)
# Top_Neighborhood_Venues_Toronto['Neighbourhood'] = Toronto_Boroughs_groupd['Neighbourhood']

# for ind in np.arange(Toronto_Boroughs_groupd.shape[0]):
#     Top_Neighborhood_Venues_Toronto.iloc[ind, 1:] = return_most_common_venues(Toronto_Boroughs_groupd.iloc[ind, :], num_top_venues)

# Top_Neighborhood_Venues_Toronto.head(10)

In [36]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
Columns = ['Postalcode', 'Borough', 'Neighbourhood']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
        
columns = Columns+freqColumns

# create a new dataframe
Top_Neighborhood_Venues_Toronto = pd.DataFrame(columns=columns)
Top_Neighborhood_Venues_Toronto['Postalcode'] = Toronto_Boroughs_groupd['Postalcode']
Top_Neighborhood_Venues_Toronto['Borough'] = Toronto_Boroughs_groupd['Borough']
Top_Neighborhood_Venues_Toronto['Neighbourhood'] = Toronto_Boroughs_groupd['Neighbourhood']

for ind in np.arange(Toronto_Boroughs_groupd.shape[0]):
    row_categories = Toronto_Boroughs_groupd.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    Top_Neighborhood_Venues_Toronto.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

Top_Neighborhood_Venues_Toronto.sort_values(freqColumns, inplace=True)
Top_Neighborhood_Venues_Toronto

Unnamed: 0,Postalcode,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Sculpture Garden,Airport,Airport Food Court,Airport Gate,Boutique,Plane,Harbor / Marina
31,M6H,West Toronto,"Dovercourt Village,Dufferin",Bakery,Pharmacy,Discount Store,Supermarket,Gym / Fitness Center,Pool,Music Venue,Café,Middle Eastern Restaurant,Brewery
32,M6J,West Toronto,"Little Portugal,Trinity",Bar,Coffee Shop,Asian Restaurant,Cocktail Bar,Restaurant,Vietnamese Restaurant,Café,Men's Store,Boutique,Bakery
35,M6R,West Toronto,"Parkdale,Roncesvalles",Breakfast Spot,Gift Shop,Italian Restaurant,Dessert Shop,Burger Joint,Eastern European Restaurant,Bar,Bank,Movie Theater,Dog Run
26,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",Café,Bar,Chinese Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Coffee Shop,Bakery,Caribbean Restaurant,Mexican Restaurant,Gaming Cafe
25,M5S,Downtown Toronto,"Harbord,University of Toronto",Café,Bookstore,Restaurant,Bar,Japanese Restaurant,Bakery,Coffee Shop,College Arts Building,Dessert Shop,Sandwich Place
3,M4M,East Toronto,Studio District,Café,Coffee Shop,Bakery,Gastropub,American Restaurant,Italian Restaurant,Yoga Studio,Cheese Shop,Fish Market,Juice Bar
19,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Pizza Place,Scenic Lookout,Restaurant,Brewery,History Museum
13,M5A,Downtown Toronto,"Harbourfront,Regent Park",Coffee Shop,Café,Bakery,Park,Pub,Breakfast Spot,Theater,Mexican Restaurant,Beer Store,Bank
33,M6K,West Toronto,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Café,Breakfast Spot,Gym,Furniture / Home Store,Convenience Store,Climbing Gym,Caribbean Restaurant,Burrito Place,Pet Store


In [37]:
Top_Neighborhood_Venues_Toronto.shape

(38, 13)

### Cluster Neighborhoods (Using K-Means Algorithm)

In [38]:
# set number of clusters
num_clusters = 3

Toronto_Neighborhoods_Clustered = Toronto_Boroughs_groupd.drop(['Postalcode', 'Borough', 'Neighbourhood'], axis=1)

# run k-means clustering
kmeans_model = KMeans(n_clusters=num_clusters, random_state=0).fit(Toronto_Neighborhoods_Clustered)

# check cluster labels generated for each row in the dataframe
print(kmeans_model.labels_[:20])

[2 2 2 2 1 2 2 2 0 2 1 2 2 2 2 2 2 2 2 2]


In [39]:
# add clustering labels
Top_Neighborhood_Venues_Toronto.insert(0, 'Clusters', kmeans_model.labels_) 

Toronto_Merged_data = df_Toronto_Boroughs.drop(['Borough', 'Neighbourhood'], axis =1)

Toronto_Merged_data = Toronto_Merged_data.join(Top_Neighborhood_Venues_Toronto.set_index('Postalcode'), on='Postalcode')
Toronto_Merged_data.sort_values(['Clusters'] + freqColumns, inplace=True)

Toronto_Merged_data.head()

Unnamed: 0,Postalcode,Latitude,Longitude,Clusters,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,M5A,43.65426,-79.360636,0,Downtown Toronto,"Harbourfront,Regent Park",Coffee Shop,Café,Bakery,Park,Pub,Breakfast Spot,Theater,Mexican Restaurant,Beer Store,Bank
26,M5T,43.653206,-79.400049,1,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",Café,Bar,Chinese Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Coffee Shop,Bakery,Caribbean Restaurant,Mexican Restaurant,Gaming Cafe
20,M5K,43.647177,-79.381576,1,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Italian Restaurant,Gastropub,Seafood Restaurant,Deli / Bodega,Lounge
22,M5N,43.711695,-79.416936,1,Central Toronto,Roselawn,Garden,Health & Beauty Service,Comic Shop,Concert Hall,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
27,M5V,43.628947,-79.39442,2,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Sculpture Garden,Airport,Airport Food Court,Airport Gate,Boutique,Plane,Harbor / Marina


### Visualize the resulting clusters on Map

In [40]:
# create map
map_clusters = Folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(num_clusters)
ys = [i + x + (i*x)**2 for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, long, bor, neighborhood, cluster in zip(Toronto_Merged_data['Latitude'], Toronto_Merged_data['Longitude'], 
    Toronto_Merged_data['Borough'], Toronto_Merged_data['Neighbourhood'], Toronto_Merged_data['Clusters']):
    label = Folium.Popup(str(neighborhood) + ' Cluster: ' + str(cluster), parse_html=True)
    Folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Upon observing the result, we can group the clusters as follows:

Cluster 0: Living area (with mostly park, trail, school; and some small businesses)

Cluster 1: Roselawn - Central Toronto (nothing here except a garden)

Cluster 2: Business area (with lots of business venues)