# Segmenting and Clustering Neighborhoods in Toronto

## 1. Install and import packages to use

Install the BeautifulSoup package

In [None]:
!conda install -c anaconda beautifulsoup4

In [None]:
!conda install -c anaconda lxml

In [None]:
!conda install -c conda-forge geopy --yes # skip this line if geopy is already installed

In [29]:
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    ------------------------------------------------------------
                       

In [128]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

## 2. Web scraping Toronto info from the Wikipedia page

In [3]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

page_html = BeautifulSoup(source,'lxml')

In [4]:
toronto_postal_codes_html = page_html.find('table', class_='wikitable sortable')
print(toronto_postal_codes_html.prettify())

<table class="wikitable sortable">
 <tbody>
  <tr>
   <th>
    Postal code
   </th>
   <th>
    Borough
   </th>
   <th>
    Neighborhood
   </th>
  </tr>
  <tr>
   <td>
    M1A
   </td>
   <td>
    Not assigned
   </td>
   <td>
   </td>
  </tr>
  <tr>
   <td>
    M2A
   </td>
   <td>
    Not assigned
   </td>
   <td>
   </td>
  </tr>
  <tr>
   <td>
    M3A
   </td>
   <td>
    North York
   </td>
   <td>
    Parkwoods
   </td>
  </tr>
  <tr>
   <td>
    M4A
   </td>
   <td>
    North York
   </td>
   <td>
    Victoria Village
   </td>
  </tr>
  <tr>
   <td>
    M5A
   </td>
   <td>
    Downtown Toronto
   </td>
   <td>
    Regent Park / Harbourfront
   </td>
  </tr>
  <tr>
   <td>
    M6A
   </td>
   <td>
    North York
   </td>
   <td>
    Lawrence Manor / Lawrence Heights
   </td>
  </tr>
  <tr>
   <td>
    M7A
   </td>
   <td>
    Downtown Toronto
   </td>
   <td>
    Queen's Park / Ontario Provincial Government
   </td>
  </tr>
  <tr>
   <td>
    M8A
   </td>
   <td>
    Not assig

## 3. Analyze and format Toronto info

In [5]:
df = pd.read_html(str(toronto_postal_codes_html))[0]

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

In [6]:
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [7]:
filter_assigned_borough = df['Borough'] != 'Not assigned'
df_filtered = df[filter_assigned_borough]
df_filtered.shape

(103, 3)

Check whether any rows share the same postal code

In [8]:
duplicatePostalCodesDF = df_filtered[df_filtered.duplicated(['Postal code'])]
print("Duplicate Rows based on Postal Code column are:", duplicatePostalCodesDF, sep='\n')

Duplicate Rows based on Postal Code column are:
Empty DataFrame
Columns: [Postal code, Borough, Neighborhood]
Index: []


As we can see, there are no duplicated postal codes in the current version of the Wikipedia page, but we still include the code that joins the Neighborhood values of rows sharing the same postal code

In [9]:
df_filtered = df_filtered.copy()  # work on an explicit copy to avoid SettingWithCopyWarning
df_filtered['Neighborhood'] = df_filtered.groupby(['Postal code','Borough'])['Neighborhood'].transform(lambda x: '/'.join(x))
# drop_duplicates returns a new dataframe, so assign the result back
df_filtered = df_filtered[['Postal code','Borough','Neighborhood']].drop_duplicates()

df_filtered.head()


Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
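
Because the scraped table contains no duplicated postal codes, the join step never has anything to combine here. A minimal sketch on made-up rows (the postal codes, boroughs, and neighborhood names below are hypothetical and only for illustration) shows what the `groupby`/`transform` step would do if duplicates existed:

```python
import pandas as pd

# Hypothetical rows: postal code M0X appears twice with different neighborhoods.
toy = pd.DataFrame({
    'Postal code': ['M0X', 'M0X', 'M0Y'],
    'Borough': ['Borough A', 'Borough A', 'Borough B'],
    'Neighborhood': ['Alpha', 'Beta', 'Gamma'],
})

# Join all neighborhoods that share a postal code and borough ...
toy['Neighborhood'] = (
    toy.groupby(['Postal code', 'Borough'])['Neighborhood']
       .transform(lambda x: ' / '.join(x))
)

# ... then keep one row per postal code. drop_duplicates returns a new
# dataframe, so its result has to be assigned to take effect.
toy_combined = toy.drop_duplicates().reset_index(drop=True)
print(toy_combined)
```

After the join, both M0X rows carry the same combined string, so `drop_duplicates` leaves a single row for that postal code.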


In [10]:
df_filtered.shape

(103, 3)

We also check whether any Neighborhood has the value Not assigned; there are none either.

In [11]:
df_filtered[df_filtered["Neighborhood"] == 'Not assigned'].head()

Unnamed: 0,Postal code,Borough,Neighborhood


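As before, we still include (commented out) the code that would format such values. Assuming the intent is for an unassigned neighborhood to take its borough's name, it would be:

In [19]:
#df_filtered["Neighborhood"] = np.where(df_filtered["Neighborhood"] == 'Not assigned', df_filtered["Borough"], df_filtered["Neighborhood"])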
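
Since no neighborhood in the real data is `Not assigned`, the effect of this formatting step is easier to see on a toy dataframe. The sketch below assumes the intended rule is that an unassigned neighborhood takes the name of its borough; the rows are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first neighborhood is unassigned, the second is not.
toy = pd.DataFrame({
    'Borough': ['Borough A', 'Borough B'],
    'Neighborhood': ['Not assigned', 'Gamma'],
})

# np.where(condition, value_if_true, value_if_false):
# use the borough where the neighborhood is unassigned, keep it otherwise.
toy['Neighborhood'] = np.where(
    toy['Neighborhood'] == 'Not assigned',
    toy['Borough'],
    toy['Neighborhood'],
)
print(toy)
```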

Rename "Postal code" column to "PostalCode" 

In [12]:
# rename the 'Postal code' column to 'PostalCode'
df_filtered = df_filtered.rename(columns={'Postal code': 'PostalCode'})
df_filtered.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


Next we check the dataframe dimensions. In the Neighborhood column, postal codes that cover more than one neighborhood list the names separated by '/'.

In [13]:
df_filtered.shape

(103, 3)

## 4. Adding location to the dataframe

Download geolocation information

In [14]:
geo_location_df = pd.read_csv("http://cocl.us/Geospatial_data")

In [15]:
geo_location_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Rename "Postal Code" to "PostalCode"

In [16]:
geo_location_df = geo_location_df.rename(columns={'Postal Code' : 'PostalCode'})
geo_location_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Join Toronto dataframe with Geolocation dataframe to retrieve Latitude and Longitude info

In [18]:
df_toronto = df_filtered.join(geo_location_df.set_index('PostalCode'), on='PostalCode')
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
5,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
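
A left join silently leaves `NaN` coordinates for any postal code missing from the geolocation file. One way to check for that, sketched here on hypothetical toy data rather than the real dataframes, is `DataFrame.merge` with `indicator=True`:

```python
import pandas as pd

# Hypothetical data: M0Y has no entry in the coordinates table.
left = pd.DataFrame({'PostalCode': ['M0X', 'M0Y'],
                     'Borough': ['Borough A', 'Borough B']})
coords = pd.DataFrame({'PostalCode': ['M0X'],
                       'Latitude': [43.0],
                       'Longitude': [-79.0]})

# indicator=True adds a '_merge' column saying where each row was found.
merged = left.merge(coords, on='PostalCode', how='left', indicator=True)

# Rows marked 'left_only' had no matching coordinates.
missing = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list would confirm that every postal code received a latitude and longitude.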


## 5. Explore Toronto dataframe

Explore the different Boroughs in Toronto

In [22]:
pd.value_counts(df_toronto['Borough'].values, sort=False)

East Toronto         5
Etobicoke           12
East York            5
Mississauga          1
Downtown Toronto    19
Scarborough         17
North York          24
York                 5
Central Toronto      9
West Toronto         6
dtype: int64

In [26]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [27]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
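
`geolocator.geocode` returns `None` when it finds no match, in which case reading `location.latitude` fails with an `AttributeError`. A small guard can fall back to known coordinates. In this sketch the helper name `coordinates_or_default` is hypothetical, and the geocoder is passed in as a function so the logic can be tried without network access:

```python
from types import SimpleNamespace

# Fallback taken from the coordinates printed by the cell above.
TORONTO_FALLBACK = (43.6534817, -79.3839347)

def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Stand-ins for geolocator.geocode, used only to exercise both branches.
def fake_found(address):
    return SimpleNamespace(latitude=1.0, longitude=2.0)

def fake_not_found(address):
    return None

print(coordinates_or_default(fake_found, 'Toronto, Ontario', TORONTO_FALLBACK))
print(coordinates_or_default(fake_not_found, 'Nowhere', TORONTO_FALLBACK))
```

In the notebook the call would be `coordinates_or_default(geolocator.geocode, address, TORONTO_FALLBACK)`.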


In [30]:
import folium # map rendering library

In [31]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [36]:
#df_toronto_neighborhoods = df_toronto[df_toronto['Borough'] == 'Central Toronto'].reset_index(drop=True)
df_toronto_neighborhoods = df_toronto[df_toronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
df_toronto_neighborhoods.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,Richmond / Adelaide / King,43.650571,-79.384568
8,M5J,Downtown Toronto,Harbourfront East / Union Station / Toronto Is...,43.640816,-79.381752
9,M5K,Downtown Toronto,Toronto Dominion Centre / Design Exchange,43.647177,-79.381576


In [39]:
# create map of Toronto using latitude and longitude values
map_toronto_neighborhoods = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(df_toronto_neighborhoods['Latitude'], df_toronto_neighborhoods['Longitude'], df_toronto_neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_neighborhoods)  
    
map_toronto_neighborhoods

## 6. Retrieve Foursquare venues for Neighborhood Central Bay Street

In [62]:
location = df_toronto_neighborhoods[df_toronto_neighborhoods['Neighborhood'] == 'Central Bay Street'][['Latitude','Longitude']].reset_index(drop=True)
location

Unnamed: 0,Latitude,Longitude
0,43.657952,-79.387383


In [66]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (never publish real credentials)
VERSION = '20180605' # Foursquare API version

radius = 500
LIMIT = 50

latitude = location.loc[0, 'Latitude'] #43.657952
longitude = location.loc[0, 'Longitude'] # -79.387383

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&ll=43.6579524,-79.3873826&v=20180605&radius=500&limit=50'
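
The request URL can also be assembled from a dictionary of parameters, which keeps the credentials out of the notebook source and avoids hand-written `&` separators. This is a sketch only: the helper `build_explore_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions, not something the Foursquare API or this notebook defines.

```python
import os
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical environment variable names; placeholders are used if unset.
client_id = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=500, limit=50):
    """Build the venues/explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

example_url = build_explore_url(client_id, client_secret, 43.657952, -79.387383)

# Decode the query string again to confirm the parameters survived encoding.
decoded = parse_qs(urlparse(example_url).query)
print(decoded['ll'], decoded['radius'], decoded['limit'])
```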

In [67]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ea183b5da9e14001b72cc1b'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 65,
  'suggestedBounds': {'ne': {'lat': 43.6624524045, 'lng': -79.38117421839567},
   'sw': {'lat': 43.6534523955, 'lng': -79.39359098160432}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '537d4d6d498ec171ba22e7fe',
       'name': "Jimmy's Coffee",
       'location': {'address': '82 Gerrard Street W',
        'crossStreet': 'Gerrard & LaPlante',
        'lat': 43.65842123574496,
        'lng': -79.38561319551111,
        'label

In [68]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [74]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,Jimmy's Coffee,Coffee Shop,43.658421,-79.385613
1,Tim Hortons,Coffee Shop,43.65857,-79.385123
2,Neo Coffee Bar,Coffee Shop,43.66014,-79.38587
3,Hailed Coffee,Coffee Shop,43.658833,-79.383684
4,Somethin' 2 Talk About,Middle Eastern Restaurant,43.658395,-79.385338
5,The Queen and Beaver Public House,Gastropub,43.657472,-79.383524
6,The Elm Tree Restaurant,Modern European Restaurant,43.657397,-79.383761
7,Mercatto,Italian Restaurant,43.660391,-79.387664
8,Japango,Sushi Restaurant,43.655268,-79.385165
9,College Park Area,Park,43.659453,-79.383785
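
The dotted column names filtered above come from flattening the nested JSON. A minimal sketch with one hand-written item, shaped like the response shown earlier and reduced to the fields used here, illustrates this. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`:

```python
import pandas as pd

# One item in the shape of results['response']['groups'][0]['items'],
# trimmed to the fields this notebook uses.
items = [
    {'venue': {'name': "Jimmy's Coffee",
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 43.658421, 'lng': -79.385613}}},
]

# Nested dictionaries become dotted column names; lists are kept as they are.
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# Keeping only the last part of each dotted name gives the short column names.
short_names = [col.split('.')[-1] for col in flat.columns]
print(sorted(short_names))
```

The `venue.categories` column still holds a list per row, which is why a helper such as `get_category_type` is needed to pull out the first category name.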


In [75]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.


## 7. Explore Neighborhoods in Downtown Toronto

In [77]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
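
In `getNearbyVenues`, the expression `v['venue']['categories'][0]['name']` raises an `IndexError` if a venue comes back with an empty category list, a case the earlier `get_category_type` helper already guards against. The sketch below separates the row-building step into a hypothetical helper, `venue_rows`, that tolerates missing categories and can be tried without calling the API:

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into row tuples, using None for a missing category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample items; the second venue is hypothetical and has no category.
sample_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]

rows = venue_rows('Regent Park / Harbourfront', 43.65426, -79.360636, sample_items)
for row in rows:
    print(row)
```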

In [100]:
downtown_neighborhoods_venues = getNearbyVenues(names=df_toronto_neighborhoods['Neighborhood'],
                                   latitudes=df_toronto_neighborhoods['Latitude'],
                                   longitudes=df_toronto_neighborhoods['Longitude']
                                  )

Regent Park / Harbourfront
Queen's Park / Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Rosedale
Stn A PO Boxes
St. James Town / Cabbagetown
First Canadian Place / Underground city
Church and Wellesley


In [89]:
print(downtown_neighborhoods_venues.shape)
downtown_neighborhoods_venues.head(20)

(801, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park / Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park / Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park / Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Regent Park / Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Regent Park / Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
5,Regent Park / Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
6,Regent Park / Harbourfront,43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park
7,Regent Park / Harbourfront,43.65426,-79.360636,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
8,Regent Park / Harbourfront,43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site
9,Regent Park / Harbourfront,43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub


In [101]:
downtown_neighborhoods_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,50,50,50,50,50,50
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,17,17,17,17,17,17
Central Bay Street,50,50,50,50,50,50
Christie,18,18,18,18,18,18
Church and Wellesley,50,50,50,50,50,50
Commerce Court / Victoria Hotel,50,50,50,50,50,50
First Canadian Place / Underground city,50,50,50,50,50,50
"Garden District, Ryerson",50,50,50,50,50,50
Harbourfront East / Union Station / Toronto Islands,50,50,50,50,50,50
Kensington Market / Chinatown / Grange Park,50,50,50,50,50,50


In [86]:
print('There are {} unique categories.'.format(len(downtown_neighborhoods_venues['Venue Category'].unique())))

There are 182 unique categories.


## 8. Analyze Each Neighborhood 

In [142]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_neighborhoods_venues[['Venue Category']], prefix="", prefix_sep="")

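# the venue categories can include one literally named 'Neighborhood'; drop that dummy column
# (if present) so it does not clash with the neighborhood name column added below
downtown_onehot.drop(columns=['Neighborhood'], inplace=True, errors='ignore')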

# add neighborhood column back to dataframe
downtown_onehot = pd.concat([pd.Series(downtown_neighborhoods_venues.loc[:,'Neighborhood'], index=downtown_onehot.index, name='Neighborhood'), downtown_onehot], axis=1)

downtown_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [143]:
downtown_onehot.shape

(801, 182)

In [144]:
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
1,CN Tower / King and Spadina / Railway Lands / ...,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04
5,Commerce Court / Victoria Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0
6,First Canadian Place / Underground city,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Harbourfront East / Union Station / Toronto Is...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
9,Kensington Market / Chinatown / Grange Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.02,0.0


In [145]:
downtown_grouped.shape

(19, 182)
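
Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues falling in each category. A toy sketch (the neighborhoods `A` and `B` and their venues are made up) shows the idea, with `pd.crosstab(..., normalize='index')` as an equivalent cross-check:

```python
import pandas as pd

# Hypothetical venues: four in neighborhood A, two in neighborhood B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Café', 'Park', 'Park'],
})

# One-hot encode the category, then put the neighborhood name back in front.
onehot = pd.get_dummies(venues['Venue Category'])
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the fraction of rows where it is 1.
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)

# The same shares computed directly as row-normalized counts.
shares = pd.crosstab(venues['Neighborhood'], venues['Venue Category'], normalize='index')
print(shares)
```

Here half of the venues in `A` are coffee shops, so its `Coffee Shop` value is 0.5, while every venue in `B` is a park.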

In [116]:
num_top_venues = 5

for hood in downtown_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0         Coffee Shop  0.08
1        Cocktail Bar  0.04
2          Restaurant  0.04
3  Seafood Restaurant  0.04
4              Bakery  0.04


----CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport----
                venue  freq
0     Airport Service  0.18
1      Airport Lounge  0.12
2    Airport Terminal  0.12
3             Airport  0.06
4  Airport Food Court  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.20
1                Café  0.06
2  Italian Restaurant  0.06
3      Sandwich Place  0.04
4        Burger Joint  0.04


----Christie----
                venue  freq
0       Grocery Store  0.22
1                Café  0.17
2                Park  0.11
3  Athletics & Sports  0.06
4         Gas Station  0.06


----Church and Wellesley----
              venue  freq
0  Sushi Restaurant  0.06
1         Gastropub  0.04
2      Burger Joint  0.04
3 

In [117]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
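
To see what this function returns, it helps to run the same idea on a single toy row. The sketch below uses `Series.nlargest` as an alternative way to pick the top categories; the row values are hypothetical. Categories with equal frequency may come out in a different order depending on the sorting method, so the example uses distinct values.

```python
import pandas as pd

# Hypothetical row in the layout of downtown_grouped: name first, then frequencies.
row = pd.Series({'Neighborhood': 'A',
                 'Coffee Shop': 0.5,
                 'Park': 0.3,
                 'Café': 0.2})

# Skip the name, make sure the values are numeric, and take the two largest.
top = row.iloc[1:].astype(float).nlargest(2).index.tolist()
print(top)
```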

In [146]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Cocktail Bar,Beer Bar,Cheese Shop,Restaurant,Café,Farmers Market,Seafood Restaurant,Breakfast Spot
1,CN Tower / King and Spadina / Railway Lands / ...,Airport Service,Airport Lounge,Airport Terminal,Airport,Boat or Ferry,Boutique,Sculpture Garden,Bar,Plane,Harbor / Marina
2,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Burger Joint,Bubble Tea Shop,Ice Cream Shop,French Restaurant,Ramen Restaurant,Diner
3,Christie,Grocery Store,Café,Park,Coffee Shop,Candy Store,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Diner
4,Church and Wellesley,Sushi Restaurant,Japanese Restaurant,Coffee Shop,Restaurant,Gastropub,Burger Joint,Yoga Studio,Men's Store,Café,Pizza Place


## 9. Cluster Neighborhoods

In [147]:
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 4, 3, 0, 0, 0, 0, 0, 0], dtype=int32)
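
The notebook fixes `kclusters = 5` without comparing alternatives. One common way to compare values of k is the silhouette score. The sketch below runs on synthetic two-dimensional data, not on `downtown_grouped_clustering`, so it only illustrates the procedure and says nothing about the best k for the Toronto data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([center + 0.5 * rng.randn(20, 2) for center in centers])

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

With only 19 neighborhoods, the same loop over `downtown_grouped_clustering` would need k to stay well below 19, because the silhouette score is undefined when every point is its own cluster.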

In [148]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = df_toronto_neighborhoods

# join the cluster labels and top venues onto the neighborhood dataframe, which holds latitude/longitude
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,4,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Theater,Café,Yoga Studio,Cosmetics Shop,Health Food Store
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,4,Coffee Shop,Sushi Restaurant,Diner,Japanese Restaurant,Hobby Shop,Sandwich Place,Beer Bar,Distribution Center,Discount Store,Italian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Coffee Shop,Café,Restaurant,Clothing Store,Cosmetics Shop,Italian Restaurant,Theater,Ramen Restaurant,Bookstore,Diner
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Gastropub,Coffee Shop,Creperie,Seafood Restaurant,Farmers Market,Hotel,Cosmetics Shop,Cheese Shop,Park
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Bakery,Cocktail Bar,Beer Bar,Cheese Shop,Restaurant,Café,Farmers Market,Seafood Restaurant,Breakfast Spot


In [156]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
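
The color setup above builds one color per cluster and then indexes it with `cluster-1`, so label 0 ends up with the last color through negative indexing. A more direct variant, sketched here with a hypothetical helper `cluster_palette`, creates exactly one hex color per label so that `palette[cluster]` can be used as is:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(n_clusters):
    """Return one hex color per cluster label, spread over the rainbow colormap."""
    rgba_values = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.to_hex(rgba) for rgba in rgba_values]

palette = cluster_palette(5)
print(palette)
```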

## 10. Examine the Clusters 

In [155]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Cosmetics Shop,Italian Restaurant,Theater,Ramen Restaurant,Bookstore,Diner
3,Downtown Toronto,0,Café,Gastropub,Coffee Shop,Creperie,Seafood Restaurant,Farmers Market,Hotel,Cosmetics Shop,Cheese Shop,Park
4,Downtown Toronto,0,Coffee Shop,Bakery,Cocktail Bar,Beer Bar,Cheese Shop,Restaurant,Café,Farmers Market,Seafood Restaurant,Breakfast Spot
7,Downtown Toronto,0,Coffee Shop,American Restaurant,Café,Restaurant,Pizza Place,Seafood Restaurant,Steakhouse,Burrito Place,New American Restaurant,Bakery
8,Downtown Toronto,0,Coffee Shop,Aquarium,Café,Bar,Plaza,Hotel,Park,New American Restaurant,Chinese Restaurant,Salad Place
9,Downtown Toronto,0,Café,Coffee Shop,Seafood Restaurant,Restaurant,Japanese Restaurant,Hotel,Beer Bar,Sandwich Place,Shopping Mall,Bakery
10,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Seafood Restaurant,Deli / Bodega,Japanese Restaurant,Beer Bar,Gym,American Restaurant
11,Downtown Toronto,0,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Bookstore,Bar,Bakery,Beer Store,Beer Bar,Sandwich Place
12,Downtown Toronto,0,Café,Mexican Restaurant,Coffee Shop,Bakery,Vietnamese Restaurant,Gaming Cafe,Vegetarian / Vegan Restaurant,Dessert Shop,Grocery Store,Caribbean Restaurant
15,Downtown Toronto,0,Café,Creperie,Cocktail Bar,Seafood Restaurant,Beer Bar,Farmers Market,Restaurant,Coffee Shop,Japanese Restaurant,Bakery


In [151]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,1,Park,Trail,Playground,Yoga Studio,Convenience Store,Discount Store,Diner,Dessert Shop,Department Store,Deli / Bodega


In [152]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,2,Airport Service,Airport Lounge,Airport Terminal,Airport,Boat or Ferry,Boutique,Sculpture Garden,Bar,Plane,Harbor / Marina


In [153]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,3,Grocery Store,Café,Park,Coffee Shop,Candy Store,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Diner


In [154]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,4,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Theater,Café,Yoga Studio,Cosmetics Shop,Health Food Store
1,Downtown Toronto,4,Coffee Shop,Sushi Restaurant,Diner,Japanese Restaurant,Hobby Shop,Sandwich Place,Beer Bar,Distribution Center,Discount Store,Italian Restaurant
5,Downtown Toronto,4,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Burger Joint,Bubble Tea Shop,Ice Cream Shop,French Restaurant,Ramen Restaurant,Diner
