# Toronto Neighborhood Segmentation

## Part 1: Toronto's Neighborhoods
In this part we retrieve basic information about the Toronto neighborhoods whose postal codes start with *M* and summarize it in a pandas DataFrame.
The data are scraped from the Wikipedia page: [https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M).

Steps:
- Downloading the Webpage Using Requests Library
- Parsing Webpage HTML Using BeautifulSoup
- Extracting Data and Building DataFrame

In [1]:
#import necessary packages
import pandas as pd 
import requests
from bs4 import BeautifulSoup

In [2]:
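#download the Wikipedia page using requests
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the page could not be downloaded
html_data = response.text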

In [3]:
#parse the HTML data using BeautifulSoup (with the html5lib parser)
soup = BeautifulSoup(html_data,"html5lib")

In [4]:
#get the page title
soup.title

<title>List of postal codes of Canada: M - Wikipedia</title>

Use BeautifulSoup to extract the table with the neighborhood data and store it in a dataframe.
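The row-by-row, column-by-column extraction can be illustrated offline with the standard library's `html.parser` instead of BeautifulSoup. This is only a sketch: the HTML snippet is hypothetical and merely mimics a table whose first row holds `<th>` headers and whose data cells end in `"\n"`; the class name `TableRowParser` is made up for the example.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # strip() drops the trailing "\n" that the notebook removes later with replace()
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows without <td> cells (the header) are skipped
                self.rows.append(self._row)
            self._row = None


# Hypothetical snippet shaped like the scraped table
html = """
<table><tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A\n</td><td>Not assigned\n</td><td>Not assigned\n</td></tr>
<tr><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
</tbody></table>
"""

parser = TableRowParser()
parser.feed(html)
records = [{"PostalCode": r[0], "Borough": r[1], "Neighborhood": r[2]} for r in parser.rows]
print(records)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`, which is the same pattern the notebook cell below uses.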

In [5]:
# extract the first table body on the page and read it row by row, column by column
rows = []
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    if col:  # rows without <td> cells (the header) are skipped
        rows.append({
            "PostalCode": col[0].text,
            "Borough": col[1].text,
            "Neighborhood": col[2].text})

# build the dataframe in one go (DataFrame.append was removed in pandas 2.0)
toronto_neighborhoods = pd.DataFrame(rows, columns=[
    "PostalCode",
    "Borough",
    "Neighborhood"])

toronto_neighborhoods.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A\n | Not assigned\n | Not assigned\n |
| 1 | M2A\n | Not assigned\n | Not assigned\n |
| 2 | M3A\n | North York\n | Parkwoods\n |
| 3 | M4A\n | North York\n | Victoria Village\n |
| 4 | M5A\n | Downtown Toronto\n | Regent Park, Harbourfront\n |


Clean the dataframe:
- remove "\n"
- remove unassigned postal codes (rows with Borough="Not assigned")
- group neighborhoods with the same postal code in the same row
- replace Neighborhood cells containing "Not assigned" with the name of the corresponding Borough
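The four cleaning rules above can be sketched in plain Python on a few rows. This is illustrative only: `raw_rows` is a hypothetical sample (the postal code `M9X` and "Hypothetical Borough" are made up), and the function name `clean` is not part of the notebook.

```python
raw_rows = [  # hypothetical scraped rows, still carrying trailing newlines
    ("M1A\n", "Not assigned\n", "Not assigned\n"),
    ("M5A\n", "Downtown Toronto\n", "Regent Park\n"),
    ("M5A\n", "Downtown Toronto\n", "Harbourfront\n"),
    ("M9X\n", "Hypothetical Borough\n", "Not assigned\n"),
]


def clean(rows):
    """Apply the cleaning rules to (postcode, borough, neighborhood) tuples."""
    grouped = {}  # postal code -> {"Borough": ..., "Neighborhoods": [...]}; dicts keep insertion order
    for row in rows:
        postcode, borough, neighborhood = (field.replace("\n", "") for field in row)  # remove "\n"
        if borough == "Not assigned":  # drop unassigned postal codes
            continue
        if neighborhood == "Not assigned":  # fall back to the borough name
            neighborhood = borough
        entry = grouped.setdefault(postcode, {"Borough": borough, "Neighborhoods": []})
        entry["Neighborhoods"].append(neighborhood)  # group neighborhoods sharing a postal code
    return [
        {"PostalCode": code, "Borough": e["Borough"], "Neighborhood": ", ".join(e["Neighborhoods"])}
        for code, e in grouped.items()
    ]


cleaned = clean(raw_rows)
```

Grouping with `", ".join(...)` produces the same "Regent Park, Harbourfront" style that the scraped table already uses.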

In [6]:
#remove "\n"
toronto_neighborhoods=toronto_neighborhoods.replace(to_replace=r'\n', value='', regex=True)

In [7]:
#filter out unassigned postal codes
mask = toronto_neighborhoods['Borough']=="Not assigned"
toronto_neighborhoods = toronto_neighborhoods[~mask]
toronto_neighborhoods.reset_index(drop=True,inplace=True)
toronto_neighborhoods.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [8]:
# Check whether there are rows with duplicated postal code
duplicateDFRow = toronto_neighborhoods[toronto_neighborhoods.duplicated(['PostalCode'])]
print(duplicateDFRow)

Empty DataFrame
Columns: [PostalCode, Borough, Neighborhood]
Index: []


There are no rows with duplicated postal codes.

In [9]:
#Check whether there are cells with an unassigned Neighborhood field
mask_nb = toronto_neighborhoods['Neighborhood']=="Not assigned"
df_nb = toronto_neighborhoods[mask_nb]
df_nb.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|


There are no rows with "Not assigned" Neighborhood field.

In [10]:
#print out the shape of the dataframe
print(f'Number of rows (unique postal codes with assigned borough) in the toronto_neighborhoods dataframe: {toronto_neighborhoods.shape[0]}.')

Number of rows (unique postal codes with assigned borough) in the toronto_neighborhoods dataframe: 103.


### Assumption:
For the remainder of the project we assume that we need to downselect the postal codes shown in the picture below, i.e., only 12 rows:

![](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1615593600000&hmac=unqSqgkLjy999x2SlSPGTtwyQY3V-RE76_R0fAdH2IY)

Source: [https://www.coursera.org/learn/applied-data-science-capstone/peer/I1bDq/segmenting-and-clustering-neighborhoods-in-toronto/submit](https://www.coursera.org/learn/applied-data-science-capstone/peer/I1bDq/segmenting-and-clustering-neighborhoods-in-toronto/submit)
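Later cells refer to rows by position (for example row 7 is M9V), so the downselection must keep the assignment's order rather than the order of the scraped table. A minimal sketch of an order-preserving selection with a dictionary lookup; the rows in `scraped` are a small hypothetical sample.

```python
wanted = ["M5G", "M2H", "M4B"]  # a subset of the assignment's list, in its order

# Rows as they might come out of the scraped table: different order, extra rows
scraped = [
    {"PostalCode": "M3A", "Borough": "North York"},
    {"PostalCode": "M4B", "Borough": "East York"},
    {"PostalCode": "M5G", "Borough": "Downtown Toronto"},
    {"PostalCode": "M2H", "Borough": "North York"},
]

by_code = {row["PostalCode"]: row for row in scraped}            # index rows by postal code
selected = [by_code[code] for code in wanted if code in by_code]  # follows the order of `wanted`
```

In pandas the same idea is `df.set_index('PostalCode').loc[postalcodes].reset_index()`, whereas a boolean filter such as `df[df['PostalCode'].isin(postalcodes)]` would keep the table's own order.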

In [11]:
#select the exact same rows as shown in the assignment, in the same order, and put them in a new dataframe
postalcodes = ['M5G','M2H','M4B','M1J','M4G','M4M','M1R','M9V','M9L','M5V','M1B','M5A']
toronto_neighborhoods_xs = (toronto_neighborhoods
                            .set_index('PostalCode')
                            .loc[postalcodes]      # label lookup keeps the order of the list
                            .reset_index())        # PostalCode becomes a regular column again
toronto_neighborhoods_xs.head(12)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M5G | Downtown Toronto | Central Bay Street |
| 1 | M2H | North York | Hillcrest Village |
| 2 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 3 | M1J | Scarborough | Scarborough Village |
| 4 | M4G | East York | Leaside |
| 5 | M4M | East Toronto | Studio District |
| 6 | M1R | Scarborough | Wexford, Maryvale |
| 7 | M9V | Etobicoke | South Steeles, Silverstone, Humbergate, Jamest... |
| 8 | M9L | North York | Humber Summit |
| 9 | M5V | Downtown Toronto | CN Tower, King and Spadina, Railway Lands, Har... |


## Part 2: Add latitude and longitude to the Toronto Neighborhoods DataFrame

We will retrieve latitude and longitude for each postal code from the following csv file: [http://cocl.us/Geospatial_data](http://cocl.us/Geospatial_data).

In [12]:
# Download the csv with the geospatial data
#use -L to follow redirects (https://www.unix.com/shell-programming-and-scripting/263133-how-get-content-webpage-curl-vs-wget.html)
!curl -o Geospatial_data.csv -L http://cocl.us/Geospatial_data/Geospatial_Coordinates.csv 



In [13]:
# Read the data into a dataframe
geo_data = pd.read_csv("Geospatial_data.csv",delimiter=",")
geo_data.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [14]:
#we will perform an inner join on the postal code; merging with on= requires the key column to have the same name in both dataframes
geo_data.rename(columns={'Postal Code':'PostalCode'}, errors="raise", inplace=True) #inplace=True modifies geo_data itself

In [15]:
#perform the merge
toronto_neighborhoods_xs=toronto_neighborhoods_xs.merge(geo_data,how="inner",on="PostalCode")
toronto_neighborhoods_xs.head(12)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 |
| 1 | M2H | North York | Hillcrest Village | 43.803762 | -79.363452 |
| 2 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 3 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 4 | M4G | East York | Leaside | 43.70906 | -79.363452 |
| 5 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 6 | M1R | Scarborough | Wexford, Maryvale | 43.750072 | -79.295849 |
| 7 | M9V | Etobicoke | South Steeles, Silverstone, Humbergate, Jamest... | 43.739416 | -79.588437 |
| 8 | M9L | North York | Humber Summit | 43.756303 | -79.565963 |
| 9 | M5V | Downtown Toronto | CN Tower, King and Spadina, Railway Lands, Har... | 43.628947 | -79.39442 |


The dataframe above contains the same [downselected postal codes][1] as shown in the assignment.
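An inner join keeps only the postal codes present in both tables, and pandas documents that `merge(how="inner")` preserves the order of the left keys, which is why the assignment's order survives the merge. A minimal plain-Python sketch of the same join; the `M0Z` row is hypothetical and has no coordinates, so it is dropped.

```python
neighborhoods = [
    {"PostalCode": "M5G", "Neighborhood": "Central Bay Street"},
    {"PostalCode": "M2H", "Neighborhood": "Hillcrest Village"},
    {"PostalCode": "M0Z", "Neighborhood": "Hypothetical area without coordinates"},
]
geo = [
    {"Postal Code": "M2H", "Latitude": 43.803762, "Longitude": -79.363452},
    {"Postal Code": "M5G", "Latitude": 43.657952, "Longitude": -79.387383},
    {"Postal Code": "M1B", "Latitude": 43.806686, "Longitude": -79.194353},
]

# look-up table keyed by postal code (plays the role of the renamed join column)
coords = {row["Postal Code"]: (row["Latitude"], row["Longitude"]) for row in geo}

# inner join: iterate over the left table so its order is kept, skip keys missing on the right
merged = [
    {**row, "Latitude": coords[row["PostalCode"]][0], "Longitude": coords[row["PostalCode"]][1]}
    for row in neighborhoods
    if row["PostalCode"] in coords
]
```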

## Part 3: Exploring and segmenting neighborhoods in Toronto
In this part we explore and segment the [downselected postal areas of Toronto][1]. Segmentation is based on the top venues listed on FourSquare for each postal code area.
Steps:
- Look at the geographical distribution of the postal code areas on a map of the Toronto metropolitan area
- Set up a request to the FourSquare API and test it on a single postal code
- Retrieve the top 100 venues for each postal code area and determine the 10 most frequent venue categories
- Segment the postal code areas by venue category using k-means clustering
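The segmentation step relies on k-means. scikit-learn's `KMeans` (imported below) adds k-means++ initialisation and multiple restarts; the plain-Python version here is only a sketch of Lloyd's algorithm, run on made-up two-category frequency vectors (share of "Coffee Shop", share of "Park") for four hypothetical areas.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical category-frequency vectors for four areas
points = [(0.12, 0.01), (0.10, 0.02), (0.01, 0.09), (0.02, 0.11)]
labels, centroids = kmeans(points, k=2)
```

The two coffee-heavy areas end up in one cluster and the two park-heavy areas in the other; the cluster numbers themselves are arbitrary.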


In [16]:
#Libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
# JSON -> DataFrame conversion uses pd.json_normalize (importing it from pandas.io.json is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### Geographical distribution of downselected postal code areas

In [17]:
# geographical coordinates of Toronto
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [36]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker with a "Neighborhood, Borough" popup for each postal code area
for lat, lng, borough, neighborhood in zip(
        toronto_neighborhoods_xs['Latitude'],
        toronto_neighborhoods_xs['Longitude'],
        toronto_neighborhoods_xs['Borough'],
        toronto_neighborhoods_xs['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

The [downselected postal code areas][1] are spread across Toronto's metropolitan area. It will be interesting to see whether distinct clusters emerge after the segmentation.

### Setup and test FourSquare API

Set up FourSquare API

In [40]:
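# Define credentials (read from environment variables so that secrets are not stored in the notebook;
# the variable names below are placeholders, use your own)
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
ACCESS_TOKEN = os.environ.get('FOURSQUARE_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN') # your FourSquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues requested per postal code area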

Make a test request for postal code area M9V (row 7) and retrieve the top 3 venues.

In [22]:
# postal area longitude and latitude
neighborhood_latitude = toronto_neighborhoods_xs.loc[7, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_neighborhoods_xs.loc[7, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_neighborhoods_xs.loc[7, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens are 43.739416399999996, -79.5884369.


In [23]:
# define query
radius = 500
limit = 3
url='https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    VERSION, radius, 
    limit)
url

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=43.739416399999996,-79.5884369&v=20180605&radius=500&limit=3'
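Building the query string with `str.format` works, but `urllib.parse.urlencode` percent-encodes every value and keeps parameter names next to their values. A sketch under these assumptions: the helper name `build_explore_url` is hypothetical, and the credentials are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, version="20180605", radius=500, limit=3):
    """Return the explore URL with every query parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # "lat,lng"; the comma is encoded as %2C
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 43.7394164, -79.5884369)
```

With `requests` the same dictionary can be passed directly as `requests.get(BASE_URL, params=params)`, which also avoids printing the secret as part of a hand-built URL.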

In [24]:
# make the query and examine the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '604b4f8dccb5ce3c75f42476'},
  'headerLocation': 'Rexdale',
  'headerFullLocation': 'Rexdale, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 10,
  'suggestedBounds': {'ne': {'lat': 43.7439164045, 'lng': -79.58222007762089},
   'sw': {'lat': 43.734916395499994, 'lng': -79.59465372237912}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4be58dc4cf200f479154133c',
       'name': 'Shoppers Drug Mart',
       'location': {'address': '1530 Albion Rd',
        'crossStreet': 'Albion Mall',
        'lat': 43.741685,
        'lng': -79.584487,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.741685,
          'lng': -79.584487}],
        'distance': 405,
        'postalCode': 'M9V 1B4',
        'c

The top listings were retrieved from FourSquare. The relevant information sits under `response` → `groups[0]` → `items`.
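The path into the response can be sketched on a trimmed dictionary with the same nesting. Only the Shoppers Drug Mart entry comes from the output above; `Example Venue` is hypothetical and has an empty category list, to show why the later code guards against missing categories.

```python
# A trimmed, partly hypothetical response with the same nesting as the real one above
results = {
    "meta": {"code": 200},
    "response": {
        "groups": [{
            "type": "Recommended Places",
            "items": [
                {"venue": {"name": "Shoppers Drug Mart",
                           "location": {"lat": 43.741685, "lng": -79.584487},
                           "categories": [{"name": "Pharmacy"}]}},
                {"venue": {"name": "Example Venue",  # hypothetical venue without a category
                           "location": {"lat": 43.74, "lng": -79.58},
                           "categories": []}},
            ],
        }],
    },
}

items = results["response"]["groups"][0]["items"]  # list of recommended venues

# (name, primary category) per venue; None when the venue has no category
summary = [
    (item["venue"]["name"],
     item["venue"]["categories"][0]["name"] if item["venue"]["categories"] else None)
    for item in items
]
```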

In [29]:
venues = results['response']['groups'][0]['items'] 
venues

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '4be58dc4cf200f479154133c',
   'name': 'Shoppers Drug Mart',
   'location': {'address': '1530 Albion Rd',
    'crossStreet': 'Albion Mall',
    'lat': 43.741685,
    'lng': -79.584487,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.741685,
      'lng': -79.584487}],
    'distance': 405,
    'postalCode': 'M9V 1B4',
    'cc': 'CA',
    'city': 'Etobicoke',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['1530 Albion Rd (Albion Mall)',
     'Etobicoke ON M9V 1B4',
     'Canada']},
   'categories': [{'id': '4bf58dd8d48988d10f951735',
     'name': 'Pharmacy',
     'pluralName': 'Pharmacies',
     'shortName': 'Pharmacy',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/pharmacy_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'groups': []}},
  

Convert the JSON list of venues to a pandas DataFrame.
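`pd.json_normalize` turns nested dictionaries into dotted column names (for example `venue.location.lat`) and leaves lists as single cell values, which is why `venue.categories` still holds a list in the table below. A simplified sketch of that naming scheme; `flatten` is a hypothetical helper that ignores the extra options (`record_path`, `max_level`, ...) of the real function.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys; lists are kept as they are."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))  # recurse into nested dicts
        else:
            flat[full_key] = value
    return flat


item = {
    "reasons": {"count": 0, "items": [{"summary": "This spot is popular"}]},
    "venue": {"name": "Shoppers Drug Mart",
              "location": {"lat": 43.741685, "lng": -79.584487},
              "categories": [{"name": "Pharmacy"}]},
}
row = flatten(item)
```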

In [31]:
# json to df
venues_df = pd.json_normalize(venues)
venues_df.head()

|   | referralId | reasons.count | reasons.items | venue.id | venue.name | venue.location.address | venue.location.crossStreet | venue.location.lat | venue.location.lng | venue.location.labeledLatLngs | venue.location.distance | venue.location.postalCode | venue.location.cc | venue.location.city | venue.location.state | venue.location.country | venue.location.formattedAddress | venue.categories | venue.photos.count | venue.photos.groups |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e-0-4be58dc4cf200f479154133c-0 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4be58dc4cf200f479154133c | Shoppers Drug Mart | 1530 Albion Rd | Albion Mall | 43.741685 | -79.584487 | [{'label': 'display', 'lat': 43.741685, 'lng':... | 405 | M9V 1B4 | CA | Etobicoke | ON | Canada | [1530 Albion Rd (Albion Mall), Etobicoke ON M9... | [{'id': '4bf58dd8d48988d10f951735', 'name': 'P... | 0 | [] |
| 1 | e-0-4be70e26cf200f47e334153c-1 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4be70e26cf200f47e334153c | Popeyes Louisiana Kitchen | 80-1530 Albion Rd | at Kipling Ave. (Albion Centre) | 43.741209 | -79.584332 | [{'label': 'display', 'lat': 43.74120870478487... | 385 | M9V 1B4 | CA | Etobicoke | ON | Canada | [80-1530 Albion Rd (at Kipling Ave. (Albion Ce... | [{'id': '4d4ae6fc7a7b7dea34424761', 'name': 'F... | 0 | [] |
| 2 | e-0-4c633939e1621b8d48842553-2 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4c633939e1621b8d48842553 | Subway | 6210 Finch Ave West, Store 103 | at Albion Rd. | 43.742645 | -79.589643 | [{'label': 'display', 'lat': 43.74264512142215... | 372 | M9V 0A1 | CA | Toronto | ON | Canada | [6210 Finch Ave West, Store 103 (at Albion Rd.... | [{'id': '4bf58dd8d48988d1c5941735', 'name': 'S... | 0 | [] |


### Extract the 10 most frequent venue categories in each postal code area

#### Retrieve top 100 venues for each postal code area

Define a function to retrieve the top 100 venues within a 1500 m radius of each postal code area.
We use a larger radius than in the test request because the downselected postal codes are relatively far apart.
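How far apart the areas are can be checked with the haversine great-circle formula. A sketch using the centre coordinates of M5G and M4M from the merged dataframe above and a mean Earth radius of 6,371 km (an approximation); the helper name `haversine_m` is hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# M5G (Central Bay Street) and M4M (Studio District)
d = haversine_m(43.657952, -79.387383, 43.659526, -79.340923)
print(f"{d:.0f} m")
```

The two centres are roughly 3.7 km apart, so two 1500 m search circles around them (which would need centres less than 3000 m apart to overlap) do not share venues.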

In [55]:
def getNearbyVenues(postalcodes, names, latitudes, longitudes, radius=1500):

    venues_list = []
    for postalcode, name, lat, lng in zip(postalcodes, names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        # (a venue without a category gets None instead of raising an IndexError)
        venues_list.extend([(
            postalcode,
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # one row per venue
    nearby_venues = pd.DataFrame(venues_list, columns=[
                  'PostalCode',
                  'Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category'])

    return nearby_venues

Make the request and store the venues in a dataframe

In [56]:
toronto_venues = getNearbyVenues(
    toronto_neighborhoods_xs['PostalCode'], 
    toronto_neighborhoods_xs['Neighborhood'],
    toronto_neighborhoods_xs['Latitude'],
    toronto_neighborhoods_xs['Longitude'])

Central Bay Street
Hillcrest Village
Parkview Hill, Woodbine Gardens
Scarborough Village
Leaside
Studio District
Wexford, Maryvale
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens
Humber Summit
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Malvern, Rouge
Regent Park, Harbourfront


In [57]:
toronto_venues.head()

|   | PostalCode | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | M5G | Central Bay Street | 43.657952 | -79.387383 | Hailed Coffee | 43.658833 | -79.383684 | Coffee Shop |
| 1 | M5G | Central Bay Street | 43.657952 | -79.387383 | NEO COFFEE BAR | 43.66013 | -79.38583 | Coffee Shop |
| 2 | M5G | Central Bay Street | 43.657952 | -79.387383 | College Park Area | 43.659453 | -79.383785 | Park |
| 3 | M5G | Central Bay Street | 43.657952 | -79.387383 | Mercatto | 43.660391 | -79.387664 | Italian Restaurant |
| 4 | M5G | Central Bay Street | 43.657952 | -79.387383 | Banh Mi Boys | 43.659292 | -79.381949 | Sandwich Place |


In [58]:
print(f'{toronto_venues.shape[0]} venues were found.')

702 venues were found.


If every postal code area returned the full 100 venues, the dataframe would contain 12 (postal code areas) x 100 (top venues per area) = 1200 venues, considerably more than the 702 venues that were found. This indicates that some postal code areas have fewer than 100 venues within the 1500 m search radius.
Let's see how many venues were found per postal code area.
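The arithmetic above, plus the per-area count that the next cell does with `groupby(...).count()`, can be sketched with `collections.Counter`. The totals are the ones reported above; `venue_rows` is a hypothetical three-row sample standing in for `toronto_venues`.

```python
from collections import Counter

LIMIT = 100    # venues requested per postal code area, as in the notebook
N_AREAS = 12   # downselected postal code areas
FOUND = 702    # venues actually returned (output above)

max_possible = N_AREAS * LIMIT   # 12 x 100 = 1200
coverage = FOUND / max_possible  # fraction of the maximum that was returned

# Per-area counting on hypothetical (postal code, venue) pairs
venue_rows = [("M5G", "Hailed Coffee"), ("M5G", "NEO COFFEE BAR"), ("M1B", "Example Venue")]
counts = Counter(code for code, _ in venue_rows)
below_limit = sorted(code for code, n in counts.items() if n < LIMIT)  # areas that returned < LIMIT
```

In this run `coverage` is 0.585, i.e. a little under 60% of the maximum.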

In [59]:
#venues grouped by postal code area sorted in descending order of venues
toronto_venues.groupby(['PostalCode']).count().sort_values('Venue',ascending=False)

| PostalCode | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| M4M | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| M5A | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| M5G | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| M4G | 76 | 76 | 76 | 76 | 76 | 76 | 76 |
| M5V | 68 | 68 | 68 | 68 | 68 | 68 | 68 |
| M2H | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| M1R | 53 | 53 | 53 | 53 | 53 | 53 | 53 |
| M1J | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| M4B | 34 | 34 | 34 | 34 | 34 | 34 | 34 |
| M1B | 32 | 32 | 32 | 32 | 32 | 32 | 32 |


Roughly one third of the postal code areas return 35 venues or fewer. Let's see how many unique venue categories there are.

In [66]:
print('There are {} unique venue categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 180 unique venue categories.


#### Determine the 10 most frequent venue categories per postal code area

Create a one-hot encoding of the category of each venue.
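One-hot encoding turns each venue's category into a row with a single 1 in the column of that category and 0 elsewhere, with one column per distinct category in alphabetical order (as in the `get_dummies` output below). A plain-Python sketch on the categories of the first five venues shown earlier:

```python
# Venue categories of the first five rows of toronto_venues shown above
categories = ["Coffee Shop", "Coffee Shop", "Park", "Italian Restaurant", "Sandwich Place"]

columns = sorted(set(categories))  # one column per distinct category, alphabetical
onehot = [[1 if category == column else 0 for column in columns] for category in categories]
```

A category literally named "Neighborhood" would become a column with the same name as the Neighborhood identifier column, which is why the cell below renames it before inserting the identifiers.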

In [122]:
# one hot encoding (dtype=int gives 0/1 columns instead of booleans in recent pandas)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

#one of the venue categories is named "Neighborhood"; rename it to avoid a clash with the Neighborhood column added below
if 'Neighborhood' in toronto_onehot.columns:
    toronto_onehot.rename(columns={'Neighborhood':'Neighborhood_cat'}, inplace=True)

#add the PostalCode and Neighborhood columns at the front
cols = ['Neighborhood', 'PostalCode']
for col in cols:
    toronto_onehot.insert(loc=0, column=col, value=toronto_venues[col])

toronto_onehot.head()

,PostalCode,Neighborhood,African Restaurant,Airport,Airport Lounge,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Beer Bar,Beer Store,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Stop,Café,Campground,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Course,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Bar,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Market,Martial Arts School,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Moving Target,Music Venue,Neighborhood_cat,New American Restaurant,Nudist Beach,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pool Hall,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Climbing Spot,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,M5G,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M5G,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M5G,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M5G,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M5G,Central Bay Street,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group the rows by postal code area and take the mean of each category column, i.e. the frequency of occurrence of each category in that area.
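Averaging an area's one-hot rows gives, for each category, the number of venues in that category divided by the number of venues in the area; sorting those frequencies gives the most frequent categories. A plain-Python sketch on hypothetical (postal code, category) pairs; `top_categories` is a made-up helper name.

```python
from collections import Counter, defaultdict

# Hypothetical (postal code, venue category) pairs
rows = [
    ("M5G", "Coffee Shop"), ("M5G", "Coffee Shop"), ("M5G", "Park"), ("M5G", "Italian Restaurant"),
    ("M1B", "Fast Food Restaurant"), ("M1B", "Pharmacy"),
]

by_area = defaultdict(Counter)  # postal code -> category counts
for code, category in rows:
    by_area[code][category] += 1

# mean of the one-hot columns = category count / number of venues in the area
frequencies = {
    code: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for code, counter in by_area.items()
}


def top_categories(counter, n=10):
    """Return up to n most frequent categories; ties keep first-seen order."""
    return [cat for cat, _ in counter.most_common(n)]
```

On `toronto_group` the analogous pandas step would sort each row with `sort_values(ascending=False)` and keep the first 10 index labels.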

In [126]:
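#drop the text column Neighborhood before averaging (pandas 2.x raises a TypeError on non-numeric columns)
toronto_group = toronto_onehot.drop(columns='Neighborhood').groupby('PostalCode').mean().reset_index()
toronto_group.head(12)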

,PostalCode,African Restaurant,Airport,Airport Lounge,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Beer Bar,Beer Store,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Stop,Café,Campground,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Course,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Bar,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Market,Martial Arts School,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Moving Target,Music Venue,Neighborhood_cat,New American Restaurant,Nudist Beach,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pool Hall,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Climbing Spot,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,M1B,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.03125,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.3125
1,M1J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.028571,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.085714,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0
2,M1R,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.056604,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.037736,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.075472,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.056604,0.018868,0.0,0.018868,0.0,0.0,0.0,0.037736,0.0,0.037736,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0
3,M2H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.037037,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.018519,0.037037,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.092593,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.074074,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.018519,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.088235,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.013158,0.052632,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.065789,0.026316,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.0,0.039474,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.013158,0.0
6,M4M,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.02,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.12,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.01,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0
7,M5A,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.13,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
8,M5G,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.05,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0
9,M5V,0.0,0.014706,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.073529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.044118,0.014706,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.117647,0.014706,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.014706,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0
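Each row above gives, for one postal code, the share of its venues that fall into each category, so every row sums to 1. A table with this layout can be produced by one-hot encoding the venue categories and averaging the 0/1 columns per postal code. The sketch below shows that idea on a tiny hypothetical venue list. The column names `PostalCode` and `Venue Category` and the toy data are assumptions for illustration, not necessarily the exact steps used earlier in this notebook.

```python
import pandas as pd

# Hypothetical toy input (assumed layout): one row per venue, with the postal
# code it belongs to and its category.
venues = pd.DataFrame({
    "PostalCode": ["M1J", "M1J", "M1J", "M1J", "M2H", "M2H"],
    "Venue Category": ["Pizza Place", "Park", "Pizza Place", "Bank", "Park", "Bank"],
})

# One-hot encode the categories: one 0/1 column per distinct category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "PostalCode", venues["PostalCode"])

# The mean of the 0/1 columns per postal code is the share of each category.
grouped = onehot.groupby("PostalCode").mean().reset_index()
print(grouped)
```

For `M1J`, two of the four toy venues are pizza places, so its `Pizza Place` share is 0.5.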


Next, let's find the 10 most frequent venue categories for each postal code.

In [None]:
# Return the names of the num_top_venues most frequent venue categories in a dataframe row.
# The result is a NumPy array of category names, not a Series.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the PostalCode entry, keep the category frequencies
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # the sorted Series' index holds the venue category names
    return row_categories_sorted.index.values[0:num_top_venues]
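A minimal sketch of how `return_most_common_venues` can be applied to every postal code to build a "top venues" table. The frequency table `grouped` below is a small hypothetical stand-in with the same layout as the one above (`PostalCode` followed by one column per category), `num_top_venues` is lowered from 10 to 2 to keep the output short, and the column names such as `1st Most Common Venue` are illustrative choices. The function is repeated so the block runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the PostalCode entry, sort the
    # category frequencies in descending order and return the top category names.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequency table: PostalCode plus one share column per category.
grouped = pd.DataFrame({
    "PostalCode": ["M1J", "M2H"],
    "Bank": [0.25, 0.5],
    "Park": [0.15, 0.3],
    "Pizza Place": [0.6, 0.2],
})

num_top_venues = 2  # 10 in the notebook; 2 keeps the toy example small

# Illustrative column names: "1st Most Common Venue", "2nd Most Common Venue", ...
suffixes = ["st", "nd", "rd"]
columns = ["PostalCode"] + [
    f"{i + 1}{suffixes[i] if i < 3 else 'th'} Most Common Venue"
    for i in range(num_top_venues)
]

# One record per postal code: the code followed by its top category names.
records = [
    [row["PostalCode"], *return_most_common_venues(row, num_top_venues)]
    for _, row in grouped.iterrows()
]
top_venues = pd.DataFrame(records, columns=columns)
print(top_venues)
```

Building a list of records and creating the dataframe once avoids growing a dataframe row by row, which is slow and relies on `DataFrame.append`, removed in pandas 2.0.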

## References:
[1]: [https://www.coursera.org/learn/applied-data-science-capstone/peer/I1bDq/segmenting-and-clustering-neighborhoods-in-toronto/submit](https://www.coursera.org/learn/applied-data-science-capstone/peer/I1bDq/segmenting-and-clustering-neighborhoods-in-toronto/submit).