## Segmenting and Clustering Neighborhoods in Toronto

This notebook was written for the Coursera IBM Data Science capstone project. It analyzes and clusters neighborhoods in Toronto.

### Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests 
from bs4 import BeautifulSoup # used to parse data from website
import lxml

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Step 1: Parse table from website

We can define a function that downloads a page and parses the first table it contains.

In [2]:
def parse_url_table(url):
    
    # download and parse the page
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'lxml')
    
    # take the first table on the page
    table = soup.find_all("table")[0]
    
    # find column names
    col_names = []
    th_tags = table.find_all('th')
    for th in th_tags:
        col_names.append(th.get_text().rstrip("\n"))
    
    # read table content row by row, skipping rows without data cells
    rows = []
    for row in table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) > 0:
            rows.append([col.get_text().rstrip("\n") for col in cols])
    
    # build the DataFrame once; DataFrame.append was removed in pandas 2.0
    df = pd.DataFrame(rows, columns=col_names)
    
    return df
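
To try the parsing logic without a network connection, the same steps can be run against a small inline HTML table. This is only a minimal sketch: the HTML snippet and the helper name `parse_html_table` are made up for illustration, and it uses Python's built-in `html.parser` backend instead of `lxml`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

def parse_html_table(html):
    """Parse the first <table> in an HTML string into a DataFrame."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find_all('table')[0]

    # header cells give the column names
    col_names = [th.get_text(strip=True) for th in table.find_all('th')]

    # collect the text of each data row, skipping rows without <td> cells
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all('td')
        if cells:
            rows.append([td.get_text(strip=True) for td in cells])

    return pd.DataFrame(rows, columns=col_names)

sample_df = parse_html_table(SAMPLE_HTML)
print(sample_df)
```

Collecting the rows in a plain list and constructing the DataFrame once at the end avoids growing a DataFrame row by row.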

Get table from the Wikipedia page

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = parse_url_table(url)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Step 2: Clean the table

Define a function to clean the table

In [4]:
def clean_table(df):
    
    # drop rows with 'Not assigned' Borough; copy so later edits don't touch a view
    df = df[df['Borough'] != 'Not assigned'].copy()
    
    # where Neighborhood is missing or 'Not assigned', use the Borough name instead
    missing = df['Neighborhood'].isna() | (df['Neighborhood'] == 'Not assigned')
    df.loc[missing, 'Neighborhood'] = df.loc[missing, 'Borough']
    
    # clean Neighborhood, change ' / ' to ', '
    df['Neighborhood'] = df['Neighborhood'].str.replace(' / ', ', ', regex=False)
    
    df = df.reset_index(drop=True)
    
    return df
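
The cleaning steps can be checked on a few made-up rows. The sketch below uses toy data (not the real table) and assumes the placeholder for a missing neighborhood is the string 'Not assigned'. It writes through a single `.loc` call with a boolean mask because assigning into `df[mask].loc[...]` would modify a temporary copy and leave `df` unchanged.

```python
import pandas as pd

# Toy data (made up) with unassigned entries and one ' / ' separator
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Regent Park / Harbourfront', 'Not assigned'],
})

# keep only rows with an assigned borough; .copy() avoids working on a view
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# a single .loc call with a boolean mask writes into the DataFrame itself
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']

# vectorized string replacement, treated as a literal rather than a regex
toy['Neighborhood'] = toy['Neighborhood'].str.replace(' / ', ', ', regex=False)

toy = toy.reset_index(drop=True)
print(toy)
```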

Clean the pandas DataFrame

In [5]:
df = clean_table(df)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Let's check the shape of the table.

In [6]:
df.shape

(103, 3)

### Step 3: Get Geographic Coordinates for boroughs

Let's define a function to get the geographic coordinates for any given postal code.

In [7]:
# download geographic coordinates from the link
geo_code = pd.read_csv('http://cocl.us/Geospatial_data')

def get_geo_post(postal_code):
    # select the row for this postal code once and return scalar coordinates
    match = geo_code.loc[geo_code['Postal Code'] == postal_code]
    latitude = match['Latitude'].iloc[0]
    longitude = match['Longitude'].iloc[0]
    
    return latitude, longitude

Use the above function to get the lat/lon for the postal code in each row.

In [8]:
# add two new columns to the table
df['Latitude'] = np.nan
df['Longitude'] = np.nan

# get geographic coordinate for each postal code (row)
for idx in range(len(df.index)):
    postal_code = df.iloc[idx,0]        # get postal code for each borough
    df.iloc[idx,3], df.iloc[idx,4] = get_geo_post(postal_code)  # fill in the lat/lon

# check lat/lon in the table
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
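
The same result can also be obtained without a Python loop by joining the two tables on the postal code. The following is a sketch on toy stand-ins for `df` and `geo_code` (the names `toy_df` and `toy_geo` are made up; the column names are the ones used above), not a drop-in replacement for the cell above.

```python
import pandas as pd

# Toy stand-ins for df and geo_code, with values copied from the table above
toy_df = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
    'Neighborhood': ['Parkwoods', 'Victoria Village'],
})
toy_geo = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A', 'M5A'],
    'Latitude': [43.725882, 43.753259, 43.654260],
    'Longitude': [-79.315572, -79.329656, -79.360636],
})

# a left join keeps every row of toy_df and attaches the matching coordinates
merged = toy_df.merge(toy_geo, on='Postal Code', how='left')
print(merged)
```

With `how='left'`, a postal code that is missing from the coordinate file would end up with `NaN` coordinates instead of raising an error.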


### Step 4: Select boroughs in Toronto only and plot the boroughs in a map

Let's use geopy to get the geographical coordinates of Toronto first.

In [9]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


This analysis focuses on the neighborhoods in the Toronto area, so let's first select the rows whose borough name contains 'Toronto'.

In [10]:
# create a boolean list to select the rows whose borough contains 'Toronto'
selected_list = ['Toronto' in name for name in df['Borough']]
# reset the index on the new DataFrame instead of modifying a slice in place
tor_neighborhoods = df[selected_list].reset_index(drop=True)
tor_neighborhoods

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
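
The row selection can also be written with the vectorized string method `Series.str.contains`. This is a small sketch on made-up rows that should behave like the list comprehension above for string-valued boroughs.

```python
import pandas as pd

# Toy rows (made up) with a mix of boroughs
toy = pd.DataFrame({
    'Borough': ['North York', 'Downtown Toronto', 'Etobicoke', 'East Toronto'],
    'Neighborhood': ['Parkwoods', 'Christie', 'Islington Avenue', 'The Beaches'],
})

# boolean mask: True where the borough name contains the substring 'Toronto'
is_toronto = toy['Borough'].str.contains('Toronto', regex=False)
toronto_only = toy[is_toronto].reset_index(drop=True)
print(toronto_only)
```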


##### Create a map of Toronto with neighborhoods superimposed on top.

In [11]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(tor_neighborhoods['Latitude'], tor_neighborhoods['Longitude'], tor_neighborhoods['Borough'], tor_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Step 5: Explore the venues in the first borough using the Foursquare API 

#### Define Foursquare Credentials and Version

In [12]:
# read Foursquare credentials from a JSON file

with open('Foursquare_credentials.json') as file:
    foursquare_id = json.load(file)
    CLIENT_ID = foursquare_id['CLIENT_ID']
    CLIENT_SECRET = foursquare_id['CLIENT_SECRET']
    VERSION = foursquare_id['VERSION']
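
The cell above expects a JSON file with the keys `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`. The sketch below shows one possible layout for that file and a loader that checks the keys. The values are placeholders, the date-style version string is an assumption, and `load_credentials` is a hypothetical helper rather than part of the notebook.

```python
import json
import os
import tempfile

# Placeholder values only; real credentials come from a Foursquare developer account
example_credentials = {
    'CLIENT_ID': 'your-client-id',
    'CLIENT_SECRET': 'your-client-secret',
    'VERSION': '20180605',  # assumed date-style version string
}

def load_credentials(path):
    """Load the credentials file and check that the expected keys are present."""
    with open(path) as file:
        creds = json.load(file)
    missing = [key for key in ('CLIENT_ID', 'CLIENT_SECRET', 'VERSION') if key not in creds]
    if missing:
        raise KeyError('Missing keys in credentials file: {}'.format(missing))
    return creds

# write the example to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Foursquare_credentials.json')
    with open(path, 'w') as file:
        json.dump(example_credentials, file)
    creds = load_credentials(path)

print(sorted(creds))
```

Keeping the credentials in a separate file that is excluded from version control avoids publishing them together with the notebook.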


#### Information about the first neighborhood in Toronto

In [13]:
neighborhood_latitude = tor_neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = tor_neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = tor_neighborhoods.loc[0, 'Neighborhood'] +', in '+ tor_neighborhoods.loc[0, 'Borough'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront, in Downtown Toronto are 43.6542599, -79.3606359.


#### Get the top 100 venues within 300 meters of the first neighborhood.

In [237]:
# Set up the parameters and URL for the request
LIMIT = 100
radius = 300
url='https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,neighborhood_latitude,neighborhood_longitude,radius,LIMIT)
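
As an alternative to formatting the query string by hand, the parameters can be encoded with the standard library. This is a sketch only: `build_explore_url` is a hypothetical helper, and the credentials passed to it below are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL with percent-encoded query parameters."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return base + '?' + urlencode(params)

# placeholder credentials, for illustration only
example_url = build_explore_url('ID', 'SECRET', '20180605', 43.6542599, -79.3606359, 300, 100)
print(example_url)
```

`urlencode` percent-encodes the comma in the `ll` parameter as `%2C`, so any special characters in the values are escaped consistently.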

Retrieve the results from Foursquare.

In [238]:
results = requests.get(url).json()

Define a function to get the category type for each venue.

In [239]:
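def get_category_type(row):
    # the categories may be stored under either key, depending on the input
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue without categories has no category name
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']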
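
A quick check of the function on made-up rows shows the three cases it handles. The function is repeated here so the example runs on its own; plain dictionaries stand in for DataFrame rows.

```python
def get_category_type(row):
    # same logic as the function above, repeated so the example is self-contained
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# made-up rows in the two shapes the function accepts, plus an empty one
row_plain = {'categories': [{'name': 'Bakery'}, {'name': 'Café'}]}
row_nested = {'venue.categories': [{'name': 'Coffee Shop'}]}
row_empty = {'venue.categories': []}

print(get_category_type(row_plain))   # Bakery
print(get_category_type(row_nested))  # Coffee Shop
print(get_category_type(row_empty))   # None
```

Only the first category of each venue is kept, so a venue with several categories is represented by its first one.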

Clean the json and structure it into a pandas dataframe.

In [240]:
venues = results['response']['groups'][0]['items']

# flatten JSON
nearby_venues = json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# number of venues that are returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

# print first 5 rows
nearby_venues.head()

16 venues were returned by Foursquare.



Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149
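
To see where the dotted column names come from, the flattening step can be reproduced on a small hand-written payload. The items below only mimic the fields used above (with values taken from the output table) and are not a full API response; the sketch assumes pandas 1.0 or later, where `pd.json_normalize` is available.

```python
import pandas as pd

# made-up items shaped like the fields used above
items = [
    {'venue': {'name': 'Roselle Desserts',
               'categories': [{'name': 'Bakery'}],
               'location': {'lat': 43.653447, 'lng': -79.362017}}},
    {'venue': {'name': 'Tandem Coffee',
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 43.653559, 'lng': -79.361809}}},
]

# nested dictionaries become dotted column names; lists are kept as-is
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# keep only the last part of each dotted name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(sorted(flat.columns))
```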


### Step 6: Explore Neighborhoods in Downtown Toronto Using the Foursquare API

#### Create a function to repeat the previous step for all selected Toronto neighborhoods

In [241]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue;
        # a venue without categories gets None instead of raising an IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-neighborhood lists into one DataFrame
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 
                 'Neighborhood Latitude', 
                 'Neighborhood Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])
    
    return nearby_venues

#### Run the above function on the selected Toronto neighborhoods.

In [242]:
neighborhoods = [ineighborhood + ' in ' + iborough
                 for ineighborhood, iborough in zip(tor_neighborhoods['Neighborhood'], tor_neighborhoods['Borough'])]
toronto_venues = getNearbyVenues(names=neighborhoods,
                                 latitudes=tor_neighborhoods['Latitude'],
                                 longitudes=tor_neighborhoods['Longitude'],
                                 radius=radius)

Regent Park, Harbourfront in Downtown Toronto
Queen's Park, Ontario Provincial Government in Downtown Toronto
Garden District, Ryerson in Downtown Toronto
St. James Town in Downtown Toronto
The Beaches in East Toronto
Berczy Park in Downtown Toronto
Central Bay Street in Downtown Toronto
Christie in Downtown Toronto
Richmond, Adelaide, King in Downtown Toronto
Dufferin, Dovercourt Village in West Toronto
Harbourfront East, Union Station, Toronto Islands in Downtown Toronto
Little Portugal, Trinity in West Toronto
The Danforth West, Riverdale in East Toronto
Toronto Dominion Centre, Design Exchange in Downtown Toronto
Brockton, Parkdale Village, Exhibition Place in West Toronto
India Bazaar, The Beaches West in East Toronto
Commerce Court, Victoria Hotel in Downtown Toronto
Studio District in East Toronto
Lawrence Park in Central Toronto
Roselawn in Central Toronto
Davisville North in Central Toronto
Forest Hill North & West, Forest Hill Road Park in Central Toronto
High Park, The Junct

In [243]:
# check the results
print(toronto_venues.shape)
toronto_venues.head()

(900, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront in Downtown Toronto",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront in Downtown Toronto",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront in Downtown Toronto",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront in Downtown Toronto",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront in Downtown Toronto",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


Check how many venues were returned for each neighborhood

In [244]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park in Downtown Toronto,7,7,7,7,7,7
"Brockton, Parkdale Village, Exhibition Place in West Toronto",14,14,14,14,14,14
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto in East Toronto",7,7,7,7,7,7
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport in Downtown Toronto",5,5,5,5,5,5
Central Bay Street in Downtown Toronto,33,33,33,33,33,33
Christie in Downtown Toronto,6,6,6,6,6,6
Church and Wellesley in Downtown Toronto,52,52,52,52,52,52
"Commerce Court, Victoria Hotel in Downtown Toronto",75,75,75,75,75,75
Davisville North in Central Toronto,4,4,4,4,4,4
Davisville in Central Toronto,23,23,23,23,23,23
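
Because `count()` repeats the same number in every column, a more compact view is `groupby(...).size()`, which returns one value per neighborhood. The sketch below uses made-up rows and placeholder venue names.

```python
import pandas as pd

# toy venue table (made up) with the same column names as toronto_venues
toy_venues = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Berczy Park'],
    'Venue': ['Venue A', 'Venue B', 'Venue C'],
    'Venue Category': ['Café', 'Grocery Store', 'Beer Bar'],
})

# one number per neighborhood, largest first
venue_counts = toy_venues.groupby('Neighborhood').size().sort_values(ascending=False)
print(venue_counts)
```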


#### Find out how many unique categories can be curated from all the returned venues

In [245]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 191 unique categories.


In [246]:
toronto_venues['Venue Category'].unique()

array(['Bakery', 'Coffee Shop', 'Distribution Center', 'Spa',
       'Breakfast Spot', 'Gym / Fitness Center', 'Food Truck',
       'History Museum', 'Furniture / Home Store', 'Sandwich Place',
       'Light Rail Station', 'Bus Stop', 'Theater', 'Park',
       'Italian Restaurant', 'Sushi Restaurant', 'Thai Restaurant',
       'Bubble Tea Shop', 'Café', 'Japanese Restaurant', 'Comic Shop',
       'Clothing Store', 'Tea Room', 'Burrito Place', 'Plaza',
       'Music Venue', 'Pizza Place', 'Ramen Restaurant', 'Diner',
       'College Rec Center', 'Sporting Goods Shop', 'Mexican Restaurant',
       'Art Gallery', 'Electronics Store', 'Steakhouse', 'Burger Joint',
       'Middle Eastern Restaurant', 'Tanning Salon', 'Beer Bar', 'Lake',
       'New American Restaurant', 'Bookstore', 'Ethiopian Restaurant',
       'Fast Food Restaurant', 'Other Great Outdoors', 'Hookah Bar',
       'Restaurant', 'Shoe Store', 'Vietnamese Restaurant',
       'Video Game Store', 'Movie Theater', 'Pub', 'Greek 

### Step 7: Analyze each Neighborhood

In [247]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
toronto_onehot = toronto_onehot.reindex(columns=['Neighborhood'] + [icol for icol in toronto_onehot.columns if icol != 'Neighborhood'])
toronto_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Camera Store,Campground,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health & Beauty Service,History Museum,Hobby Shop,Home Service,Hong Kong Restaurant,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,New American Restaurant,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pub,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoothie Shop,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront in Downtown Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront in Downtown Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront in Downtown Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront in Downtown Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront in Downtown Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And examine the new dataframe size.

In [248]:
toronto_onehot.shape

(900, 191)
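
The one-hot step is easier to follow on a tiny table. This sketch uses made-up venues; `dtype=int` is passed so the indicator columns show 0/1 regardless of the pandas version, and `DataFrame.insert` is used here as an alternative way to place the label column first.

```python
import pandas as pd

# toy venue table (made up)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Berczy Park'],
    'Venue Category': ['Café', 'Grocery Store', 'Café'],
})

# one indicator column per category; empty prefix keeps the bare category names
onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# put the neighborhood label in the first column
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])
print(onehot)
```

Each row has exactly one 1 among the category columns. Note that `insert` raises an error if a column with that name already exists, which would surface a name clash that a plain column assignment would silently overwrite.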

#### Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [249]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Adult Boutique,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Camera Store,Campground,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health & Beauty Service,History Museum,Hobby Shop,Home Service,Hong Kong Restaurant,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,New American Restaurant,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pub,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoothie Shop,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Berczy Park in Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place i...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.2,0.2,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street in Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.363636,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Christie in Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley in Downtown Toronto,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.019231,0.0,0.038462,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.038462,0.019231,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.019231,0.038462,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.019231,0.0,0.019231,0.019231,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.019231
7,"Commerce Court, Victoria Hotel in Downtown Tor...",0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.026667,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.013333,0.013333,0.013333,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.146667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.053333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.013333,0.0,0.013333,0.026667,0.0,0.0,0.0,0.026667,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053333,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.026667,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.04,0.026667,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0
8,Davisville North in Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville in Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Check the new size

In [250]:
toronto_grouped.shape

(37, 191)

#### Print the top 5 most common venues for each neighborhood

In [251]:
num_top_venues = 5

for ineighbor in toronto_grouped['Neighborhood']:
    print("----"+ineighbor+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == ineighbor].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park in Downtown Toronto----
            venue  freq
0  Clothing Store  0.14
1    Concert Hall  0.14
2  Breakfast Spot  0.14
3   Grocery Store  0.14
4    Liquor Store  0.14


----Brockton, Parkdale Village, Exhibition Place in West Toronto----
            venue  freq
0            Café  0.14
1  Sandwich Place  0.14
2             Gym  0.14
3       Pet Store  0.07
4   Grocery Store  0.07


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto in East Toronto----
                  venue  freq
0               Brewery  0.14
1                  Park  0.14
2  Fast Food Restaurant  0.14
3    Light Rail Station  0.14
4        Farmers Market  0.14


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport in Downtown Toronto----
                venue  freq
0         Coffee Shop   0.2
1        Airport Gate   0.2
2      Airport Lounge   0.2
3    Airport Terminal   0.2
4  Airport Food Court   0.2


---

#### Merge top 10 most common venues in each neighborhood into a dataframe

Create a function that sorts the venue categories of one neighborhood by frequency, in descending order, and returns the top ones.

In [252]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
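
As a quick check of the function above, we can run it on a one-row toy dataframe. This is only a sketch: the dataframe `toy` and its frequencies are made up for illustration and are not taken from `toronto_grouped`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), keep only the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical stand-in for one row of toronto_grouped (values are made up)
toy = pd.DataFrame({
    'Neighborhood': ['Toy Neighborhood'],
    'Bakery': [0.1],
    'Café': [0.5],
    'Park': [0.3],
    'Gym': [0.0],
})

top3 = return_most_common_venues(toy.iloc[0, :], 3)
print(list(top3))
```

Note that when several categories share the same frequency, the default sort does not guarantee their relative order, so tied venues can appear in a different order from one printout to another.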

Sort venues in each neighborhood.

In [253]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' entry left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
toronto_neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park in Downtown Toronto,Concert Hall,Clothing Store,Breakfast Spot,Beer Bar,Restaurant,Liquor Store,Grocery Store,Cupcake Shop,Fast Food Restaurant,Costume Shop
1,"Brockton, Parkdale Village, Exhibition Place i...",Gym,Sandwich Place,Café,Pet Store,Breakfast Spot,Japanese Restaurant,Park,Coffee Shop,Restaurant,Grocery Store
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Fast Food Restaurant,Farmers Market,Garden,Brewery,Park,Cupcake Shop,Dance Studio,Fish & Chips Shop
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,Coffee Shop,Yoga Studio,Distribution Center,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
4,Central Bay Street in Downtown Toronto,Coffee Shop,Sandwich Place,Café,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Fried Chicken Joint,Smoothie Shop,Spa,Bookstore
5,Christie in Downtown Toronto,Grocery Store,Gym / Fitness Center,Café,Coffee Shop,Candy Store,Yoga Studio,Discount Store,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
6,Church and Wellesley in Downtown Toronto,Gay Bar,Coffee Shop,Dessert Shop,Japanese Restaurant,Burger Joint,Yoga Studio,Ice Cream Shop,Italian Restaurant,Juice Bar,Martial Arts Dojo
7,"Commerce Court, Victoria Hotel in Downtown Tor...",Coffee Shop,Café,Hotel,Deli / Bodega,Restaurant,Salad Place,Bakery,Gym,Gastropub,Gluten-free Restaurant
8,Davisville North in Central Toronto,Breakfast Spot,Flower Shop,Pool,Convenience Store,Greek Restaurant,Gourmet Shop,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
9,Davisville in Central Toronto,Dessert Shop,Coffee Shop,Café,Italian Restaurant,Pizza Place,Gas Station,Diner,Indian Restaurant,Thai Restaurant,Sushi Restaurant
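
The column names above can also be built without a try/except. The sketch below uses a small helper, `ordinal`, which is a hypothetical name introduced here and not part of the notebook; unlike the indicator list, it also handles numbers such as 11 or 21 in case `num_top_venues` is raised.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th, ... are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10

# same column layout as in the cell above
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```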


### Step 8: Cluster Neighborhoods

Use the k-means algorithm to cluster the neighborhoods into 7 clusters.

In [297]:
# set number of clusters
kclusters = 7

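# keep only the numeric venue frequencies (a positional axis argument is no longer accepted by recent pandas)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')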

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 0, 5, 0, 0, 0, 5, 0, 5, 0], dtype=int32)
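
Before looking at the clusters one by one, it can help to count how many neighborhoods fall under each label. The sketch below only uses the ten labels printed above, so the counts are partial; in the notebook we would pass the full `kmeans.labels_` instead of `first_labels` (a name introduced here for illustration).

```python
import numpy as np

kclusters = 7

# first ten labels, copied from the output above
first_labels = np.array([5, 0, 5, 0, 0, 0, 5, 0, 5, 0])

# minlength makes sure labels with no members still show up with a count of 0
sizes = np.bincount(first_labels, minlength=kclusters)

for label, size in enumerate(sizes):
    print(f'label {label}: {size} neighborhood(s)')
```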

Merge the cluster labels with the top 10 most common venues for all neighborhoods.

In [298]:
# add clustering labels
neighborhoods_venues_cluster = toronto_neighborhoods_venues_sorted.copy()
neighborhoods_venues_cluster.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = tor_neighborhoods.copy()
toronto_merged.drop(columns=['Borough','Neighborhood'],inplace=True)
toronto_merged.insert(0, 'Neighborhood', neighborhoods)

# join the cluster labels and top venues onto the postal code and coordinates of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_cluster.set_index('Neighborhood'), on='Neighborhood', how='inner')

toronto_merged

Unnamed: 0,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront in Downtown Toronto",M5A,43.65426,-79.360636,5,Furniture / Home Store,Spa,Theater,Coffee Shop,Distribution Center,Sandwich Place,Bus Stop,Light Rail Station,Breakfast Spot,Food Truck
1,"Queen's Park, Ontario Provincial Government in...",M7A,43.662301,-79.389494,0,Coffee Shop,Italian Restaurant,Park,Sandwich Place,Café,Japanese Restaurant,Sushi Restaurant,Thai Restaurant,Bubble Tea Shop,Dog Run
2,"Garden District, Ryerson in Downtown Toronto",M5B,43.657162,-79.378937,0,Coffee Shop,Middle Eastern Restaurant,Café,Tea Room,Bar,Clothing Store,Hotel,Sandwich Place,Bubble Tea Shop,Restaurant
3,St. James Town in Downtown Toronto,M5C,43.651494,-79.375418,0,Gastropub,Restaurant,Coffee Shop,Japanese Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Church,Food Truck,BBQ Joint,Poke Place
4,The Beaches in East Toronto,M4E,43.676357,-79.293031,5,Park,Playground,Trail,Spa,Dessert Shop,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop
5,Berczy Park in Downtown Toronto,M5E,43.644771,-79.373306,5,Concert Hall,Clothing Store,Breakfast Spot,Beer Bar,Restaurant,Liquor Store,Grocery Store,Cupcake Shop,Fast Food Restaurant,Costume Shop
6,Central Bay Street in Downtown Toronto,M5G,43.657952,-79.387383,0,Coffee Shop,Sandwich Place,Café,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Fried Chicken Joint,Smoothie Shop,Spa,Bookstore
7,Christie in Downtown Toronto,M6G,43.669542,-79.422564,0,Grocery Store,Gym / Fitness Center,Café,Coffee Shop,Candy Store,Yoga Studio,Discount Store,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
8,"Richmond, Adelaide, King in Downtown Toronto",M5H,43.650571,-79.384568,5,Coffee Shop,Steakhouse,Asian Restaurant,Café,Sushi Restaurant,Pizza Place,Thai Restaurant,Japanese Restaurant,Bar,Salad Place
9,"Dufferin, Dovercourt Village in West Toronto",M6H,43.669005,-79.442259,5,Pharmacy,Bakery,Grocery Store,Music Venue,Middle Eastern Restaurant,Bar,Bank,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
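
The join above uses how='inner', so a neighborhood that has no row in the venue table is dropped from the result. Here is a minimal sketch of that behavior on toy data; the frames `coords` and `labels` and all their values are made up for illustration.

```python
import pandas as pd

# hypothetical coordinates table with three neighborhoods
coords = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.65, 43.66, 43.67],
    'Longitude': [-79.36, -79.39, -79.38],
})

# hypothetical cluster/venue table: neighborhood 'B' returned no venues
labels = pd.DataFrame({
    'Cluster Labels': [5, 0],
    'Neighborhood': ['A', 'C'],
    '1st Most Common Venue': ['Park', 'Coffee Shop'],
})

# same pattern as the cell above: match the 'Neighborhood' column against the other frame's index
merged = coords.join(labels.set_index('Neighborhood'), on='Neighborhood', how='inner')
print(merged)
```

With how='left' instead, neighborhood 'B' would be kept, with missing values in the cluster and venue columns.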


Visualize the clustering results on the map.

In [299]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index by the label itself so label 0 gets the first color
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
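
The color scheme only needs one hex color per cluster label. As a sketch, we can build the same list directly from `kclusters` and print which color each label receives (this is a simplified variant of the cell above, not a replacement for the map code).

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 7

# sample kclusters evenly spaced colors from the rainbow colormap and convert them to hex
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

for label, hex_color in enumerate(rainbow):
    print(label, hex_color)
```

Indexing this list with the label itself gives label 0 the first color. Indexing with the label minus one would still give every cluster its own color, but label 0 would wrap around to the last entry.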

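### Step 9: Examine Each Cluster

Examine the most common venues in each cluster. The headings below number the clusters from 1, so Cluster 1 holds the rows whose Cluster Labels value is 0, Cluster 2 the rows with value 1, and so on.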

#### Cluster 1

In [300]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government in...",Coffee Shop,Italian Restaurant,Park,Sandwich Place,Café,Japanese Restaurant,Sushi Restaurant,Thai Restaurant,Bubble Tea Shop,Dog Run
2,"Garden District, Ryerson in Downtown Toronto",Coffee Shop,Middle Eastern Restaurant,Café,Tea Room,Bar,Clothing Store,Hotel,Sandwich Place,Bubble Tea Shop,Restaurant
3,St. James Town in Downtown Toronto,Gastropub,Restaurant,Coffee Shop,Japanese Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Church,Food Truck,BBQ Joint,Poke Place
6,Central Bay Street in Downtown Toronto,Coffee Shop,Sandwich Place,Café,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Fried Chicken Joint,Smoothie Shop,Spa,Bookstore
7,Christie in Downtown Toronto,Grocery Store,Gym / Fitness Center,Café,Coffee Shop,Candy Store,Yoga Studio,Discount Store,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
10,"Harbourfront East, Union Station, Toronto Isla...",Coffee Shop,Café,Plaza,Pizza Place,Hotel,Salad Place,Italian Restaurant,Boat or Ferry,Sports Bar,Bank
13,"Toronto Dominion Centre, Design Exchange in Do...",Coffee Shop,Restaurant,Deli / Bodega,Salad Place,American Restaurant,Café,Bakery,Gluten-free Restaurant,Gym,Gym / Fitness Center
14,"Brockton, Parkdale Village, Exhibition Place i...",Gym,Sandwich Place,Café,Pet Store,Breakfast Spot,Japanese Restaurant,Park,Coffee Shop,Restaurant,Grocery Store
16,"Commerce Court, Victoria Hotel in Downtown Tor...",Coffee Shop,Café,Hotel,Deli / Bodega,Restaurant,Salad Place,Bakery,Gym,Gastropub,Gluten-free Restaurant
17,Studio District in East Toronto,Coffee Shop,Café,Pet Store,Bank,Convenience Store,Comfort Food Restaurant,Diner,Clothing Store,Cheese Shop,Sandwich Place


#### Cluster 2

In [301]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Rosedale in Downtown Toronto,Campground,Yoga Studio,Concert Hall,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop


#### Cluster 3

In [302]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Roselawn in Central Toronto,Health & Beauty Service,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dog Run


#### Cluster 4

In [303]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Lawrence Park in Central Toronto,Photography Studio,Gym / Fitness Center,Airport Food Court,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant


#### Cluster 5

In [304]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,"North Toronto West, Lawrence Park in Central ...",Sushi Restaurant,Yoga Studio,Fish & Chips Shop,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dog Run


#### Cluster 6

In [305]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront in Downtown Toronto",Furniture / Home Store,Spa,Theater,Coffee Shop,Distribution Center,Sandwich Place,Bus Stop,Light Rail Station,Breakfast Spot,Food Truck
4,The Beaches in East Toronto,Park,Playground,Trail,Spa,Dessert Shop,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop
5,Berczy Park in Downtown Toronto,Concert Hall,Clothing Store,Breakfast Spot,Beer Bar,Restaurant,Liquor Store,Grocery Store,Cupcake Shop,Fast Food Restaurant,Costume Shop
8,"Richmond, Adelaide, King in Downtown Toronto",Coffee Shop,Steakhouse,Asian Restaurant,Café,Sushi Restaurant,Pizza Place,Thai Restaurant,Japanese Restaurant,Bar,Salad Place
9,"Dufferin, Dovercourt Village in West Toronto",Pharmacy,Bakery,Grocery Store,Music Venue,Middle Eastern Restaurant,Bar,Bank,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
11,"Little Portugal, Trinity in West Toronto",Bar,Vietnamese Restaurant,Asian Restaurant,Yoga Studio,Men's Store,Record Shop,Japanese Restaurant,Brewery,New American Restaurant,Mac & Cheese Joint
12,"The Danforth West, Riverdale in East Toronto",Greek Restaurant,Ice Cream Shop,Restaurant,Bookstore,Indian Restaurant,Fruit & Vegetable Store,Italian Restaurant,Japanese Restaurant,Juice Bar,Dessert Shop
15,"India Bazaar, The Beaches West in East Toronto",Fast Food Restaurant,Pub,Hotel,Burrito Place,Fish & Chips Shop,Restaurant,Board Shop,Italian Restaurant,Sushi Restaurant,Intersection
20,Davisville North in Central Toronto,Breakfast Spot,Flower Shop,Pool,Convenience Store,Greek Restaurant,Gourmet Shop,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
24,"The Annex, North Midtown, Yorkville in Central...",Sandwich Place,Grocery Store,Asian Restaurant,Middle Eastern Restaurant,Donut Shop,Café,Burger Joint,Liquor Store,Indian Restaurant,Park


#### Cluster 7

In [306]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,"Moore Park, Summerhill East in Central Toronto",Intersection,Park,Yoga Studio,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dog Run
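
The seven cells above repeat the same selection with a different label. As a sketch, the selection can be wrapped in a helper and run in a loop. The helper name `cluster_view` and the toy dataframe are assumptions made for illustration; in the notebook we would pass `toronto_merged` instead of `toy`.

```python
import pandas as pd

def cluster_view(merged, label,
                 drop=('Postal Code', 'Latitude', 'Longitude', 'Cluster Labels')):
    """Return the rows of one cluster, keeping the neighborhood and venue columns."""
    rows = merged[merged['Cluster Labels'] == label]
    # drop by column name rather than by position, so the column order does not matter
    return rows.drop(columns=[c for c in drop if c in rows.columns])

# hypothetical stand-in for toronto_merged (all values are made up)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Postal Code': ['X1A', 'X1B', 'X1C'],
    'Latitude': [43.65, 43.66, 43.67],
    'Longitude': [-79.36, -79.39, -79.38],
    'Cluster Labels': [0, 5, 0],
    '1st Most Common Venue': ['Coffee Shop', 'Park', 'Café'],
})

for label in sorted(toy['Cluster Labels'].unique()):
    print(f'Cluster {label + 1}')
    print(cluster_view(toy, label))
    print()
```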
