<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. Also, you will use the Foursquare API to explore neighborhoods in Toronto. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. You will use the *k*-means clustering algorithm to complete this task. Finally, you will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Toronto</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (also exposed as pd.json_normalize in pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
# import folium # map rendering library

# print('Libraries imported.')

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.3.1               |             py_0          25 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

In [3]:
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

The Toronto neighborhood file was obtained from the following dataset page: https://portal0.cf.opendata.inter.sandbox-toronto.ca/dataset/neighbourhoods/

In [14]:
import urllib
import json
import pprint


In [37]:
!wget -O toronto_data.csv https://ckan0.cf.opendata.inter.sandbox-toronto.ca/download_resource/1d02b0f0-d735-4469-8f71-ea6d96b319e4
print('Data downloaded!')

--2019-05-12 07:10:29--  https://ckan0.cf.opendata.inter.sandbox-toronto.ca/download_resource/1d02b0f0-d735-4469-8f71-ea6d96b319e4
Resolving ckan0.cf.opendata.inter.sandbox-toronto.ca (ckan0.cf.opendata.inter.sandbox-toronto.ca)... 13.249.79.76, 13.249.79.114, 13.249.79.25, ...
Connecting to ckan0.cf.opendata.inter.sandbox-toronto.ca (ckan0.cf.opendata.inter.sandbox-toronto.ca)|13.249.79.76|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘toronto_data.csv’

    [ <=>                                   ] 1,675,733   --.-K/s   in 0.07s   

2019-05-12 07:10:31 (23.9 MB/s) - ‘toronto_data.csv’ saved [1675733]

Data downloaded!


In [38]:
neighborhoods = pd.read_csv('toronto_data.csv')
neighborhoods.head()

Unnamed: 0,_id,AREA_ID,AREA_ATTR_ID,PARENT_AREA_ID,AREA_SHORT_CODE,AREA_LONG_CODE,AREA_NAME,AREA_DESC,X,Y,LONGITUDE,LATITUDE,OBJECTID,Shape__Area,Shape__Length,geometry
0,981,25886861,25926662,49885,94,94,Wychwood (94),Wychwood (94),,,-79.425515,43.676919,16491505,3217960.0,7515.779658,POLYGON ((-79.43591570873059 43.68015339477487...
1,982,25886820,25926663,49885,100,100,Yonge-Eglinton (100),Yonge-Eglinton (100),,,-79.40359,43.704689,16491521,3160334.0,7872.021074,POLYGON ((-79.41095783825973 43.70408282301482...
2,983,25886834,25926664,49885,97,97,Yonge-St.Clair (97),Yonge-St.Clair (97),,,-79.397871,43.687859,16491537,2222464.0,8130.411276,POLYGON ((-79.39119482591805 43.68108112277795...
3,984,25886593,25926665,49885,27,27,York University Heights (27),York University Heights (27),,,-79.488883,43.765736,16491553,25418210.0,25632.335242,POLYGON ((-79.50528791818931 43.75987349878096...
4,985,25886688,25926666,49885,31,31,Yorkdale-Glen Park (31),Yorkdale-Glen Park (31),,,-79.457108,43.714672,16491569,11566690.0,13953.408098,"POLYGON ((-79.4396873322608 43.70560981891119,..."


In [39]:
neighborhoods.drop(['X','Y','OBJECTID','_id','AREA_ID','AREA_ATTR_ID','PARENT_AREA_ID','AREA_LONG_CODE','AREA_DESC','Shape__Area','Shape__Length','geometry'], axis = 1,inplace=True)
neighborhoods.head()

Unnamed: 0,AREA_SHORT_CODE,AREA_NAME,LONGITUDE,LATITUDE
0,94,Wychwood (94),-79.425515,43.676919
1,100,Yonge-Eglinton (100),-79.40359,43.704689
2,97,Yonge-St.Clair (97),-79.397871,43.687859
3,27,York University Heights (27),-79.488883,43.765736
4,31,Yorkdale-Glen Park (31),-79.457108,43.714672


In [41]:
print('The dataframe has {} NAMES and {} neighborhoods.'.format(
        len(neighborhoods['AREA_NAME'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 140 NAMES and 140 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>YYZ_explorer</em>, as shown below.

In [34]:
address = 'Toronto,ON'

geolocator = Nominatim(user_agent="YYZ_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
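
The geocoding step above assumes that a match is always found. geopy's `geocode` returns `None` when it cannot resolve an address, so a small guard can be useful. The following is a minimal sketch, not part of the original lab: the helper name `coordinates_or_default` and the fallback value are assumptions introduced for illustration.

```python
from typing import Callable, Optional, Tuple


def coordinates_or_default(
    geocode: Callable[[str], Optional[object]],
    address: str,
    default: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (latitude, longitude) for `address`, or `default` if no match is found.

    `geocode` is any callable with the same shape as `geolocator.geocode`:
    it takes an address string and returns an object with `.latitude` and
    `.longitude` attributes, or None when nothing matches.
    """
    location = geocode(address)
    if location is None:  # no match for this address
        return default
    return (location.latitude, location.longitude)
```

In the notebook this could be called as `coordinates_or_default(geolocator.geocode, 'Toronto,ON', (43.653963, -79.387207))`, where the fallback pair is simply the value printed above.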


#### Create a map of Toronto with neighborhoods superimposed on top.

In [44]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, area_short_code, neighborhood in zip(neighborhoods['LATITUDE'], neighborhoods['LONGITUDE'], neighborhoods['AREA_SHORT_CODE'], neighborhoods['AREA_NAME']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Folium** is a great visualization library. Feel free to zoom into the map above and click on each circle marker to reveal the name of the neighborhood.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [45]:
CLIENT_ID = 'IJ5W52TJAI3JAMNFGMYAMDCW0NP2MGKG240YVJYI4EQ2F12I' # your Foursquare ID
CLIENT_SECRET = 'VSVBTV3D4VIROFHDAZFJPQHC303OBSBLFTQZNMU030GCPXPV' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IJ5W52TJAI3JAMNFGMYAMDCW0NP2MGKG240YVJYI4EQ2F12I
CLIENT_SECRET:VSVBTV3D4VIROFHDAZFJPQHC303OBSBLFTQZNMU030GCPXPV
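
Hard-coding and printing credentials makes them easy to leak when a notebook is shared. One common alternative is to read them from environment variables. The sketch below is an illustration only: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical and do not come from the original lab or from Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names are assumptions chosen for this example.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret
```

With the variables set in the shell that launches the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two literal assignments.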


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [47]:
neighborhoods.loc[0, 'AREA_NAME']

'Wychwood (94)'

Get the neighborhood's latitude and longitude values.

In [48]:
neighborhood_latitude = neighborhoods.loc[0, 'LATITUDE'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'LONGITUDE'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'AREA_NAME'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Wychwood (94) are 43.6769192679, -79.425514947.


#### Now, let's get the top 100 venues that are in Wychwood within a radius of 500 meters.

In [52]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

First, let's create the GET request URL. Name your URL **url**.

In [53]:
# type your answer here
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=IJ5W52TJAI3JAMNFGMYAMDCW0NP2MGKG240YVJYI4EQ2F12I&client_secret=VSVBTV3D4VIROFHDAZFJPQHC303OBSBLFTQZNMU030GCPXPV&v=20180604&ll=43.6769192679,-79.425514947&radius=500&limit=100'
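
The URL above is assembled with string formatting. As an alternative sketch, the same query string can be built from a dictionary with the standard library's `urlencode`, which also takes care of escaping. The helper name `build_explore_url` is an assumption made for this example, and the placeholder credentials are not real.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from the same values used in the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Round trip: decoding the query string recovers the original values.
example_url = build_explore_url("ID", "SECRET", "20180604", 43.5, -79.5, 500, 100)
example_query = parse_qs(urlparse(example_url).query)
```

`urlencode` escapes the comma in `ll`, which is why the example decodes the query with `parse_qs` rather than comparing raw strings.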

Send the GET request and examine the results.

In [54]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd7ca18dd57972412348a03'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b86e89df964a52051a531e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/food_farmersmarket_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1fa941735',
         'name': 'Farmers Market',
         'pluralName': 'Farmers Markets',
         'primary': True,
         'shortName': "Farmer's Market"}],
       'id': '4b86e89df964a52051a531e3',
       'location': {'address': '601 Christie Street',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'St Clair Avenue West',
        'distance': 369,
        'formattedAddress': ['601 Christie Street (St Clair Avenue West)',
         'Toronto ON M6G 4C7',


From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [55]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [56]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wychwood Barns Farmers' Market,Farmers Market,43.68001,-79.423849
1,Wychwood Barns,Event Space,43.680028,-79.42381
2,Hillcrest Park,Park,43.676012,-79.424787
3,Makerfaire Toronto,Public Art,43.680004,-79.423805


And how many venues were returned by Foursquare?

In [57]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.
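
To get a feel for what the 500-meter radius means, the distance between the neighborhood's coordinates and a returned venue can be estimated with the haversine formula. This is a sketch under the assumption of a spherical Earth with a radius of 6,371 km; it is not necessarily how Foursquare computes its `distance` field. The coordinates are the Wychwood values and the first venue row shown above.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Wychwood (94) coordinates versus Wychwood Barns Farmers' Market
distance = haversine_m(43.6769192679, -79.425514947, 43.68001, -79.423849)
print('Approximate distance: {:.0f} m'.format(distance))
```

The result should land close to the `'distance': 369` value reported for that venue in the JSON response, and well inside the 500-meter radius.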


<a id='item2'></a>

## 2. Explore Neighborhoods in Toronto

#### Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [59]:
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
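
The function above indexes straight into `['response']['groups'][0]['items']` and `['categories'][0]`, which raises an error if a response has no groups or a venue has no category. The sketch below shows one defensive way to pull the same seven fields out of a single response. The helper name `extract_venue_rows` and the sample payload are assumptions made for illustration; the payload only mimics the shape of the JSON shown earlier.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Return one tuple per venue, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when the venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload with one categorized and one uncategorized venue
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Hillcrest Park",
                   "location": {"lat": 43.676012, "lng": -79.424787},
                   "categories": [{"name": "Park"}]}},
        {"venue": {"name": "Unlabelled Spot",
                   "location": {"lat": 43.676, "lng": -79.425},
                   "categories": []}},
    ]}]}
}
sample_rows = extract_venue_rows("Wychwood (94)", 43.676919, -79.425515, sample_payload)
```

Inside `getNearbyVenues`, the list comprehension could be swapped for `venues_list.append(extract_venue_rows(name, lat, lng, requests.get(url).json()))`, leaving the rest of the function unchanged.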

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [60]:
# type your answer here

toronto_venues = getNearbyVenues(names=neighborhoods['AREA_NAME'],
                                   latitudes=neighborhoods['LATITUDE'],
                                   longitudes=neighborhoods['LONGITUDE']
                                  )



Wychwood (94)
Yonge-Eglinton (100)
Yonge-St.Clair (97)
York University Heights (27)
Yorkdale-Glen Park (31)
Lambton Baby Point (114)
Lansing-Westgate (38)
Lawrence Park North (105)
Lawrence Park South (103)
Leaside-Bennington (56)
Little Portugal (84)
Long Branch (19)
Malvern (132)
Maple Leaf (29)
Markland Wood (12)
Milliken (130)
Mimico (includes Humber Bay Shores) (17)
Morningside (135)
Moss Park (73)
Mount Dennis (115)
Mount Olive-Silverstone-Jamestown (2)
Mount Pleasant East (99)
Mount Pleasant West (104)
New Toronto (18)
Newtonbrook East (50)
Newtonbrook West (36)
Niagara (82)
North Riverdale (68)
North St.James Town (74)
O'Connor-Parkview (54)
Oakridge (121)
Oakwood Village (107)
Old East York (58)
Palmerston-Little Italy (80)
Parkwoods-Donalda (45)
Pelmo Park-Humberlea (23)
Playter Estates-Danforth (67)
Pleasant View (46)
Princess-Rosethorn (10)
Regent Park (72)
Rexdale-Kipling (4)
Rockcliffe-Smythe (111)
Roncesvalles (86)
Rosedale-Moore Park (98)
Rouge (131)
Runnymede-Bloor Wes

#### Let's check the size of the resulting dataframe

In [61]:
print(toronto_venues.shape)
toronto_venues.head()

(821, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wychwood (94),43.676919,-79.425515,Hillcrest Park,43.676012,-79.424787,Park
1,Wychwood (94),43.676919,-79.425515,Annabelle Pasta Bar,43.675445,-79.423341,Italian Restaurant
2,Wychwood (94),43.676919,-79.425515,Bob Coffee Bar,43.675376,-79.423268,Coffee Shop
3,Wychwood (94),43.676919,-79.425515,Wychwood Barns Community Gallery,43.679386,-79.424254,Art Gallery
4,Yonge-Eglinton (100),43.704689,-79.40359,Boom Breakfast & Co.,43.705748,-79.403482,Breakfast Spot


Let's check how many venues were returned for each neighborhood

In [62]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt North (129),1,1,1,1,1,1
Agincourt South-Malvern West (128),10,10,10,10,10,10
Alderwood (20),4,4,4,4,4,4
Annex (95),1,1,1,1,1,1
Banbury-Don Mills (42),4,4,4,4,4,4
Bathurst Manor (34),1,1,1,1,1,1
Bay Street Corridor (76),44,44,44,44,44,44
Bayview Village (52),1,1,1,1,1,1
Bayview Woods-Steeles (49),1,1,1,1,1,1
Bedford Park-Nortown (39),9,9,9,9,9,9


#### Let's find out how many unique categories can be curated from all the returned venues

In [63]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 192 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [64]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Amphitheater,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Store,Big Box Store,Bike Shop,Bike Trail,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Gift Shop,Golf Course,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hardware Store,Historic Site,Hobby Shop,Hockey Arena,Home Service,Hostel,Hotel,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Nail Salon,Nightlife Spot,Noodle House,Organic Grocery,Other Great Outdoors,Other Repair Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Pub,Ramen Restaurant,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Wychwood (94),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Wychwood (94),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Wychwood (94),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Wychwood (94),0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Yonge-Eglinton (100),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [65]:
toronto_onehot.shape

(821, 193)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [66]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,American Restaurant,Amphitheater,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Store,Big Box Store,Bike Shop,Bike Trail,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Gift Shop,Golf Course,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hardware Store,Historic Site,Hobby Shop,Hockey Arena,Home Service,Hostel,Hotel,Housing Development,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Nail Salon,Nightlife Spot,Noodle House,Organic Grocery,Other Great Outdoors,Other Repair Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Pub,Ramen Restaurant,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Agincourt North (129),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt South-Malvern West (128),0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alderwood (20),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Annex (95),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Banbury-Don Mills (42),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bathurst Manor (34),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bay Street Corridor (76),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.204545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,0.0,0.0,0.022727,0.0,0.0,0.068182,0.0,0.0,0.022727,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bayview Village (52),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bayview Woods-Steeles (49),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bedford Park-Nortown (39),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [67]:
toronto_grouped.shape

(121, 193)

#### Let's print each neighborhood along with the top 5 most common venues

In [68]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt North (129)----
                       venue  freq
0                       Park   1.0
1        American Restaurant   0.0
2            Organic Grocery   0.0
3  Middle Eastern Restaurant   0.0
4         Miscellaneous Shop   0.0


----Agincourt South-Malvern West (128)----
                 venue  freq
0  American Restaurant   0.1
1            Pool Hall   0.1
2          Pizza Place   0.1
3     Asian Restaurant   0.1
4      Motorcycle Shop   0.1


----Alderwood (20)----
                        venue  freq
0          Burmese Restaurant  0.25
1                  Playground  0.25
2  Construction & Landscaping  0.25
3           Electronics Store  0.25
4        Other Great Outdoors  0.00


----Annex (95)----
                       venue  freq
0           Fraternity House   1.0
1  Indian Chinese Restaurant   0.0
2  Middle Eastern Restaurant   0.0
3         Miscellaneous Shop   0.0
4          Mobile Phone Shop   0.0


----Banbury-Don Mills (42)----
                  venue  freq
...

                       venue  freq
0              Grocery Store  0.17
1       Pakistani Restaurant  0.17
2       Fast Food Restaurant  0.17
3       Caribbean Restaurant  0.17
4  Middle Eastern Restaurant  0.17


----Forest Hill North (102)----
                 venue  freq
0             Pharmacy   0.5
1             Bus Stop   0.5
2  American Restaurant   0.0
3      Organic Grocery   0.0
4   Miscellaneous Shop   0.0


----Glenfield-Jane Heights (25)----
                       venue  freq
0                       Pool   1.0
1               Amphitheater   0.0
2  Middle Eastern Restaurant   0.0
3         Miscellaneous Shop   0.0
4          Mobile Phone Shop   0.0


----Greenwood-Coxwell (65)----
                  venue  freq
0     Indian Restaurant  0.41
1         Grocery Store  0.09
2         Indie Theater  0.05
3                   Bar  0.05
4  Pakistani Restaurant  0.05


----Guildwood (140)----
                  venue  freq
0                 Hotel   0.5
1                  Park   0.5
...

                  venue  freq
0           Pizza Place  0.25
1             BBQ Joint  0.25
2           Coffee Shop  0.25
3  Caribbean Restaurant  0.25
4   American Restaurant  0.00


----Palmerston-Little Italy (80)----
                 venue  freq
0         Dessert Shop   0.5
1          Coffee Shop   0.5
2  American Restaurant   0.0
3    Other Repair Shop   0.0
4    Mobile Phone Shop   0.0


----Parkwoods-Donalda (45)----
                       venue  freq
0       Fast Food Restaurant   1.0
1        American Restaurant   0.0
2  Indian Chinese Restaurant   0.0
3         Miscellaneous Shop   0.0
4          Mobile Phone Shop   0.0


----Pelmo Park-Humberlea (23)----
                    venue  freq
0  Furniture / Home Store  0.25
1               Gift Shop  0.25
2                  Bakery  0.25
3       Other Repair Shop  0.25
4     American Restaurant  0.00


----Playter Estates-Danforth (67)----
               venue  freq
0     Cosmetics Shop  0.25
1        Art Gallery  0.25
...

#### Let's put that into a *pandas* dataframe

First, let's write a function that sorts a neighborhood's venue categories in descending order of frequency and returns the most common ones.

In [69]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
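
As a quick sanity check, the function can be tried on a small hand-made row. This is only a sketch: the neighborhood name, category names, and frequencies below are made up for illustration, and only the function itself comes from the notebook.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: a neighborhood name followed by venue-category frequencies
toy_row = pd.Series(
    {'Neighborhood': 'Toy Hood', 'Park': 0.5, 'Cafe': 0.3, 'Zoo': 0.0, 'Bakery': 0.2}
)

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))  # ['Park', 'Cafe', 'Bakery']
```

Note that categories with equal frequency, including 0.0, come back in whatever order the sort leaves them. That is likely why a neighborhood with a single venue still gets ten "most common" categories in the table that follows: the trailing entries are zero-frequency padding rather than real venues.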

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [70]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd fall outside `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt North (129),Park,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
1,Agincourt South-Malvern West (128),American Restaurant,Motorcycle Shop,Pizza Place,Malay Restaurant,Asian Restaurant,Pool Hall,Chinese Restaurant,Mediterranean Restaurant,Restaurant,Seafood Restaurant
2,Alderwood (20),Electronics Store,Burmese Restaurant,Playground,Construction & Landscaping,Cupcake Shop,Deli / Bodega,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant
3,Annex (95),Fraternity House,Zoo,Food Truck,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
4,Banbury-Don Mills (42),Bubble Tea Shop,Sandwich Place,Spa,Cantonese Restaurant,Dessert Shop,Falafel Restaurant,Deli / Bodega,Food Court,Food & Drink Shop,Flower Shop


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 8 clusters, the value assigned to `kclusters` in the next cell.

In [71]:
# set number of clusters
kclusters = 8

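# k-means needs numeric features only, so drop the name column
# (the positional `axis` argument, drop('Neighborhood', 1), is rejected by pandas 2.0 and later)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')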

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 1, 1, 4, 1, 1, 1, 1], dtype=int32)
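
To see what the clustering step does, here is a minimal sketch on a toy frequency table. The four neighborhoods and two categories are hypothetical, and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions; the notebook itself does not pass it.

```python
import pandas as pd
from sklearn.cluster import KMeans

# hypothetical frequency table: rows are neighborhoods, columns are venue categories
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Park':         [1.0, 0.9, 0.0, 0.1],
    'Coffee Shop':  [0.0, 0.1, 1.0, 0.9],
})

# same preparation as above: keep only the numeric frequency columns
toy_features = toy_grouped.drop(columns='Neighborhood')

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_features)
toy_labels = toy_kmeans.labels_

# the label numbers themselves are arbitrary; what matters is which rows share a label
print(dict(zip(toy_grouped['Neighborhood'], toy_labels)))
```

Here the park-heavy neighborhoods A and B end up together, and the coffee-heavy C and D form the other cluster. Which of the two groups is called 0 and which is called 1 carries no meaning.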

In [72]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt North (129),Park,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
1,Agincourt South-Malvern West (128),American Restaurant,Motorcycle Shop,Pizza Place,Malay Restaurant,Asian Restaurant,Pool Hall,Chinese Restaurant,Mediterranean Restaurant,Restaurant,Seafood Restaurant
2,Alderwood (20),Electronics Store,Burmese Restaurant,Playground,Construction & Landscaping,Cupcake Shop,Deli / Bodega,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant
3,Annex (95),Fraternity House,Zoo,Food Truck,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
4,Banbury-Don Mills (42),Bubble Tea Shop,Sandwich Place,Spa,Cantonese Restaurant,Dessert Shop,Falafel Restaurant,Deli / Bodega,Food Court,Food & Drink Shop,Flower Shop


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [None]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [74]:
# start from the neighborhood table, which holds latitude/longitude for each neighborhood
toronto_merged = neighborhoods

# add the cluster labels and top venues, matching 'Neighborhood' against AREA_NAME
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='AREA_NAME')

toronto_merged.head() # check the last columns!

Unnamed: 0,AREA_SHORT_CODE,AREA_NAME,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,94,Wychwood (94),-79.425515,43.676919,1.0,Art Gallery,Park,Coffee Shop,Italian Restaurant,Zoo Exhibit,Event Space,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant
1,100,Yonge-Eglinton (100),-79.40359,43.704689,1.0,Pizza Place,Burger Joint,Skating Rink,Breakfast Spot,Arts & Crafts Store,Gym / Fitness Center,Japanese Restaurant,Gym,Coffee Shop,Event Space
2,97,Yonge-St.Clair (97),-79.397871,43.687859,1.0,Coffee Shop,Pub,American Restaurant,Sushi Restaurant,Fried Chicken Joint,Convenience Store,Cantonese Restaurant,Café,Sandwich Place,Sports Bar
3,27,York University Heights (27),-79.488883,43.765736,1.0,Massage Studio,Falafel Restaurant,Furniture / Home Store,Coffee Shop,Japanese Restaurant,Bar,Food Truck,Food Court,Food & Drink Shop,Flower Shop
4,31,Yorkdale-Glen Park (31),-79.457108,43.714672,1.0,Fast Food Restaurant,Construction & Landscaping,Sandwich Place,Paper / Office Supplies Store,Coffee Shop,Gym,Bank,Men's Store,Restaurant,Flower Shop


In [75]:
toronto_merged = toronto_merged.dropna()
toronto_merged = toronto_merged.astype({"Cluster Labels": int})
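
The `dropna` and `astype` calls are needed because the join leaves gaps. The sketch below, on hypothetical stand-ins for `neighborhoods` and `neighborhoods_venues_sorted`, shows the mechanism: a neighborhood with no match gets `NaN`, which forces the label column to floating point (hence values such as `1.0` in the output above).

```python
import pandas as pd

# hypothetical stand-ins for `neighborhoods` and `neighborhoods_venues_sorted`
toy_neighborhoods = pd.DataFrame({
    'AREA_NAME': ['A (1)', 'B (2)', 'C (3)'],
    'LATITUDE': [43.70, 43.65, 43.68],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [0, 1],
    'Neighborhood': ['A (1)', 'C (3)'],  # assume 'B (2)' returned no venues
})

toy_joined = toy_neighborhoods.join(toy_sorted.set_index('Neighborhood'), on='AREA_NAME')
print(toy_joined['Cluster Labels'].tolist())  # the unmatched row holds NaN, the others are floats

# drop the unmatched rows, then restore integer labels
toy_merged = toy_joined.dropna().astype({'Cluster Labels': int})
print(toy_merged['Cluster Labels'].tolist())  # [0, 1]
```

Integer labels matter in the next step, where each label is used as a list index to pick a marker color.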

Finally, let's visualize the resulting clusters.

In [76]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['LATITUDE'], toronto_merged['LONGITUDE'], toronto_merged['AREA_NAME'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
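
The color scheme can be inspected on its own, without drawing a map. This sketch assumes the same `kclusters` value as the clustering cell and uses only Matplotlib calls that the notebook already imports.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 8  # assumed to match the value used in the clustering cell

# sample kclusters evenly spaced points of the rainbow colormap (one RGBA row each)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# convert each RGBA row into a hex string, the form passed to Folium as a marker color
rainbow = [colors.rgb2hex(c) for c in colors_array]

for label, hex_color in enumerate(rainbow):
    print(label, hex_color)
```

Because the list has exactly one entry per cluster, a label `c` in `range(kclusters)` can index it directly as `rainbow[c]`.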

<a id='item5'></a>

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
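
One possible way to find those discriminating categories is to average the venue frequencies within each cluster and look at the largest values. The sketch below does this on a hypothetical four-row table; in the notebook, the equivalent input would be the frequency table with its cluster labels attached, which is an assumption about how you choose to combine them.

```python
import pandas as pd

# hypothetical frequency table with a cluster label already attached to each row
toy = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1],
    'Park':           [1.0, 0.8, 0.0, 0.1],
    'Coffee Shop':    [0.0, 0.1, 0.6, 0.5],
    'Pizza Place':    [0.0, 0.0, 0.4, 0.4],
})

# mean frequency of every category within each cluster
cluster_profile = toy.groupby('Cluster Labels').mean()

# for each cluster, list the two categories with the highest mean frequency
top_per_cluster = {
    label: row.nlargest(2).index.tolist()
    for label, row in cluster_profile.iterrows()
}
print(top_per_cluster)
# {0: ['Park', 'Coffee Shop'], 1: ['Coffee Shop', 'Pizza Place']}
```

A profile like this suggests names such as "parks" for the first toy cluster and "coffee and pizza" for the second. The per-cluster tables below support the same kind of reading by eye.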

#### Cluster 1

In [81]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,AREA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Mount Pleasant West (104),Sandwich Place,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market


#### Cluster 2

In [82]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,AREA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wychwood (94),Art Gallery,Park,Coffee Shop,Italian Restaurant,Zoo Exhibit,Event Space,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant
1,Yonge-Eglinton (100),Pizza Place,Burger Joint,Skating Rink,Breakfast Spot,Arts & Crafts Store,Gym / Fitness Center,Japanese Restaurant,Gym,Coffee Shop,Event Space
2,Yonge-St.Clair (97),Coffee Shop,Pub,American Restaurant,Sushi Restaurant,Fried Chicken Joint,Convenience Store,Cantonese Restaurant,Café,Sandwich Place,Sports Bar
3,York University Heights (27),Massage Studio,Falafel Restaurant,Furniture / Home Store,Coffee Shop,Japanese Restaurant,Bar,Food Truck,Food Court,Food & Drink Shop,Flower Shop
4,Yorkdale-Glen Park (31),Fast Food Restaurant,Construction & Landscaping,Sandwich Place,Paper / Office Supplies Store,Coffee Shop,Gym,Bank,Men's Store,Restaurant,Flower Shop
7,Lawrence Park North (105),Bakery,Sushi Restaurant,Italian Restaurant,Coffee Shop,Japanese Restaurant,Tea Room,Bank,Burger Joint,Asian Restaurant,Lingerie Store
8,Lawrence Park South (103),Sushi Restaurant,Zoo Exhibit,Fountain,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
9,Leaside-Bennington (56),Convenience Store,Park,Japanese Restaurant,Sandwich Place,Electronics Store,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant
10,Little Portugal (84),Restaurant,Bar,Café,Breakfast Spot,Grocery Store,Thai Restaurant,Sandwich Place,Cocktail Bar,Vegetarian / Vegan Restaurant,Coffee Shop
11,Long Branch (19),Wings Joint,Coffee Shop,Beer Store,Greek Restaurant,Zoo Exhibit,Event Space,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant


#### Cluster 3

In [83]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,AREA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Lambton Baby Point (114),Park,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
42,Roncesvalles (86),Coffee Shop,Park,Recreation Center,Zoo Exhibit,Ethiopian Restaurant,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant
54,Taylor-Massey (61),Park,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
62,West Hill (136),Park,Gym / Fitness Center,Zoo Exhibit,Electronics Store,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
64,Westminster-Branson (35),Park,Gym / Fitness Center,Zoo Exhibit,Electronics Store,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
74,Agincourt North (129),Park,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
85,Bendale (127),Park,Greek Restaurant,Zoo Exhibit,Electronics Store,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
88,Blake-Jones (69),Park,Burger Joint,Zoo Exhibit,Event Space,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant
95,Casa Loma (96),Park,Lake,Tennis Court,Dog Run,Zoo Exhibit,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant
121,Guildwood (140),Park,Hotel,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant


#### Cluster 4

In [84]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,AREA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
119,Glenfield-Jane Heights (25),Pool,Convenience Store,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
138,Kingsway South (15),Pool,Convenience Store,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant


#### Cluster 5

In [85]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,AREA_NAME,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Maple Leaf (29),Convenience Store,Zoo,Food Truck,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
40,Rexdale-Kipling (4),Convenience Store,Flower Shop,Zoo,Food Truck,Food Court,Food & Drink Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
79,Bathurst Manor (34),Convenience Store,Zoo,Food Truck,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
114,Eringate-Centennial-West Deane (11),Convenience Store,Zoo,Food Truck,Food Court,Food & Drink Shop,Flower Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market


<hr>

Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).