# Capstone Week 3 Peer Reviewed Submission

## _Problem 1: Setting up neighborhood dataframe from Wikipedia_

### Installing/Importing all necessary tools

In [1]:
pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading https://files.pythonhosted.org/packages/cb/a1/c698cf319e9cfed6b17376281bd0efc6bfc8465698f54170ef60a485ab5d/beautifulsoup4-4.8.2-py3-none-any.whl (106kB)
Collecting soupsieve>=1.2 (from BeautifulSoup4)
  Downloading https://files.pythonhosted.org/packages/05/cf/ea245e52f55823f19992447b008bcbb7f78efc5960d77f6c34b5b45b36dd/soupsieve-2.0-py2.py3-none-any.whl
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.8.2 soupsieve-2.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/dd/ba/a0e6866057fc0bbd17192925c1d63a3b85cf522965de9bc02364d08e5b84/lxml-4.5.0-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install Requests

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install geopy

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/53/fc/3d1b47e8e82ea12c25203929efb1b964918a77067a874b2c7631e2ec35ec/geopy-1.21.0-py2.py3-none-any.whl (104kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.21.0
Note: you may need to restart the kernel to use updated packages.


In [7]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Web Scraping using Beautiful Soup

In [8]:
URL = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(URL.text, 'lxml')

In [9]:
print(soup.title.text)

List of postal codes of Canada: M - Wikipedia


### Getting data from table in Wikipedia

In [10]:
# data = soup.table (works only if the page contains a single table)

In [11]:
table = soup.find('table', {'class':'wikitable'})

table_rows = table.find_all('tr')

NewData = []

for tr in table_rows:
    td = []
    for tabledata in tr.find_all('td'):
        td.append(tabledata.text.strip())
    NewData.append(td)
    # note: the header row comes out empty because only td cells are collected, not th

NewData[0:7]

[[],
 ['M1A', 'Not assigned', ''],
 ['M2A', 'Not assigned', ''],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Regent Park / Harbourfront'],
 ['M6A', 'North York', 'Lawrence Manor / Lawrence Heights']]
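
The first entry above is an empty list because the header row is built from `th` cells, which the loop does not read. The following is a minimal sketch, run on a small inline HTML table rather than the live Wikipedia page, of reading the header and the body rows separately. The sample HTML and the `parse_table` helper are assumptions made for illustration, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the structure of the Wikipedia table.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

def parse_table(html):
    """Return (header, rows) for the first 'wikitable' found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable"})
    header, rows = [], []
    for tr in table.find_all("tr"):
        ths = [th.text.strip() for th in tr.find_all("th")]
        tds = [td.text.strip() for td in tr.find_all("td")]
        if ths and not tds:
            header = ths          # header row: th cells only
        elif tds:
            rows.append(tds)      # data row: keep the td cells
    return header, rows

header, rows = parse_table(SAMPLE_HTML)
print(header)
print(rows)
```

With the header captured this way, the column names could be passed straight to `pd.DataFrame` and there would be no empty first row to drop.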

### Creating New Table

In [12]:
# assign column names
columns_name = ['Postal Code', 'Borough', 'Neighborhood']

#set up table
NewTable = pd.DataFrame(NewData, columns = columns_name)

#drop first empty row
NewTable = NewTable.drop(NewTable.index[0])

#drop not-assigned boroughs
NewTable = NewTable.drop(NewTable[NewTable.Borough == 'Not assigned'].index)

#replace neighborhood with borough when neighborhood is empty
NewTable['Neighborhood'] = NewTable['Neighborhood'].mask(NewTable['Neighborhood'] == '', NewTable['Borough'])

NewTable

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Regent Park / Harbourfront
6,M6A,North York,Lawrence Manor / Lawrence Heights
7,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
161,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
166,M4Y,Downtown Toronto,Church and Wellesley
169,M7Y,East Toronto,Business reply mail Processing CentrE
170,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...
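
The two cleaning rules used here (drop unassigned boroughs, fall back to the borough name when the neighborhood is empty) can be checked on a tiny frame. This is a sketch on invented rows, not the scraped data; the third row in particular is hypothetical and exists only to exercise the fallback.

```python
import pandas as pd

# Toy rows; the values are illustrative, not taken from the scraped table.
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["", "Regent Park / Harbourfront", ""],
})

# Keep only rows whose borough is assigned.
cleaned = toy[toy["Borough"] != "Not assigned"].copy()

# Where the neighborhood is empty, reuse the borough name.
empty = cleaned["Neighborhood"] == ""
cleaned.loc[empty, "Neighborhood"] = cleaned.loc[empty, "Borough"]

cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

Using a boolean mask with `.loc` keeps the replacement confined to the `Neighborhood` column, so other columns that happen to contain empty strings are left alone.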


### New Table Shape

In [13]:
NewTable.shape

(103, 3)

## _Problem 2: Latitude and Longitude Coordinates of Neighborhoods_

In [14]:
CoordinatesURL = 'http://cocl.us/Geospatial_data'

df_coordinates = pd.read_csv(CoordinatesURL)
df_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### DataFrame Merging

In [15]:
df_complete = NewTable.merge(df_coordinates, on = 'Postal Code')
df_complete

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509
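
An inner merge silently drops postal codes that have no coordinates. The sketch below, on toy frames, shows one way to make such losses visible with the `how`, `validate`, and `indicator` parameters of `DataFrame.merge`. The postal code `M9Z` is made up so that one row has no match.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],   # 'M9Z' is a hypothetical unmatched code
    "Borough": ["North York", "North York", "Nowhere"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Left merge keeps every neighborhood row; validate raises if a key repeats.
checked = left.merge(
    right, on="Postal Code", how="left", validate="one_to_one", indicator=True
)

# Rows that found no coordinates are flagged as 'left_only'.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If `unmatched` is empty, the left merge and the inner merge used in the notebook give the same rows.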


## _Problem 3: Explore and Cluster Neighborhoods in Toronto_

### Foursquare API

In [16]:
CLIENT_ID = 'W30LZUZUDPKFJP1ZXHLSNHQQZHBOVX4TKXZHPO41BXBCHCHR' # your Foursquare ID
CLIENT_SECRET = 'YSBZOBJFTMGSSA2KXJWNMO5IIPVLTWG2QPJGCQSZPEWTW1RV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

### Explore Toronto Venues & Neighborhoods

In [19]:
#Get Venue function
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
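
The function above indexes straight into the JSON response, so a venue with an empty `categories` list or a response without `groups` would raise an exception. The following is a minimal sketch of building the query string and extracting venues defensively, using only the standard library. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the payload shape is assumed to match what `getNearbyVenues` already reads.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"),
                     location.get("lng"), category))
    return rows

# Usage with placeholder credentials and a hand-written payload.
url = build_explore_url("ID", "SECRET", "20180605", 43.65, -79.38)
fake = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe X", "location": {"lat": 1.0, "lng": 2.0},
               "categories": []}}]}]}}
print(extract_venues(fake))
```

Keeping the parsing separate from the HTTP call also makes it possible to test the extraction logic without spending API quota.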

In [20]:
#Toronto Venues using the defined function above

toronto_venues = getNearbyVenues(names=df_complete['Neighborhood'],
                                   latitudes=df_complete['Latitude'],
                                   longitudes=df_complete['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do

In [24]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,649 Variety,43.754513,-79.331942,Convenience Store
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [25]:
toronto_venues.shape

(2197, 7)

In [26]:
#count venues by neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
Alderwood / Long Branch,8,8,8,8,8,8
Bathurst Manor / Wilson Heights / Downsview North,20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
Bedford Park / Lawrence Manor East,24,24,24,24,24,24
...,...,...,...,...,...,...
Willowdale,40,40,40,40,40,40
Woburn,3,3,3,3,3,3
Woodbine Heights,9,9,9,9,9,9
York Mills / Silver Hills,2,2,2,2,2,2


In [27]:
#unique categories from all venues
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 270 unique categories.


### Analyze Toronto Neighborhoods

In [28]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
# dataframe size
toronto_onehot.shape

(2197, 270)

In [31]:
# group by neighborhood
trt_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
trt_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,Willowdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.025,0.0,0.0,0.0,0.0,0.0
89,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
90,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.111111,0.000,0.0,0.0,0.0,0.0,0.0
91,York Mills / Silver Hills,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
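
The `toronto_onehot` header shown earlier starts with `Yoga Studio` rather than `Neighborhood`, and its width equals the number of unique categories. That suggests one venue category is itself named `Neighborhood` and was overwritten when the label column was added. The sketch below, on toy data, shows one way to avoid such a collision by giving the label column a distinct name, and confirms that the group mean of one-hot columns is the per-neighborhood category frequency. The column name `Neighborhood Name` is an assumption chosen for illustration.

```python
import pandas as pd

# Toy venues; one category is deliberately called 'Neighborhood'.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B"],
    "Venue Category": ["Park", "Cafe", "Neighborhood", "Park"],
})

# One-hot encode the categories (columns come out sorted alphabetically).
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# insert() raises if the name already exists, so a clash cannot go unnoticed.
onehot.insert(0, "Neighborhood Name", venues["Neighborhood"])

# The mean of 0/1 indicators per group is the share of venues in each category.
freq = onehot.groupby("Neighborhood Name").mean().reset_index()
print(freq)
```

Here the `Neighborhood` category survives as its own indicator column next to the label column.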


In [37]:
# Top 5 Venues in Each Neighborhood by Frequency

num_top_venues = 5

for hood in trt_grouped['Neighborhood']:
    temp = trt_grouped[trt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})

In [38]:
# Convert into dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = trt_grouped['Neighborhood']

for ind in np.arange(trt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(trt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Skating Rink,Latin American Restaurant,Breakfast Spot,Clothing Store,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
1,Alderwood / Long Branch,Pizza Place,Gym,Sandwich Place,Skating Rink,Pharmacy,Coffee Shop,Pub,Dim Sum Restaurant,Diner,Discount Store
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Chinese Restaurant,Bridal Shop,Sandwich Place,Restaurant,Diner,Supermarket,Middle Eastern Restaurant,Sushi Restaurant
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store
4,Bedford Park / Lawrence Manor East,Coffee Shop,Sandwich Place,Restaurant,Italian Restaurant,Pizza Place,Juice Bar,Fast Food Restaurant,Butcher,Café,Indian Restaurant
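
The column names above rely on an index lookup that falls back to `th` once the short suffix list runs out, which is correct up to 20 but would label position 21 as `21th`. A small sketch of a general ordinal helper follows; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```

For ten columns the result matches the header produced in the notebook, so the helper only matters if `num_top_venues` grows past 20.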


### Cluster Neighborhoods

In [40]:
# Clustering into 5 clusters of neighborhoods - k-means method

# set number of clusters
kclusters = 5

trt_grouped_clustering = trt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(trt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
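
The first ten labels all fall in the same cluster, and several clusters examined later in this notebook contain a single neighborhood. One common way to judge whether a given `k` is reasonable is to compare cluster sizes and silhouette scores across several values. The sketch below does this on synthetic blobs, not on the Toronto data; the blob centers and the range of `k` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic groups stand in for real feature vectors.
X, _ = make_blobs(
    n_samples=150,
    centers=[(0, 0), (10, 10), (-10, 10)],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    sizes = np.bincount(labels)  # number of points assigned to each cluster
    print(k, round(scores[k], 3), sizes.tolist())

# The k with the highest silhouette score is a reasonable starting candidate.
best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

On the notebook's data the same loop could be run over `trt_grouped_clustering`; very small clusters at the chosen `k` would be a hint to try a different value.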

In [48]:
# new dataframe with clusters + top 10 venues in each neighborhood

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

trt_merged = df_complete

# join the sorted venues onto df_complete so each neighborhood keeps its latitude/longitude
trt_merged = trt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

trt_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,0.0,Park,Convenience Store,Food & Drink Shop,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,1.0,Hockey Arena,Intersection,Coffee Shop,Portuguese Restaurant,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,1.0,1.0,Coffee Shop,Park,Pub,Theater,Bakery,Breakfast Spot,Restaurant,Café,Mexican Restaurant,Electronics Store
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,1.0,1.0,Clothing Store,Furniture / Home Store,Coffee Shop,Miscellaneous Shop,Event Space,Women's Store,Boutique,Vietnamese Restaurant,Accessories Store,Diner
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1.0,1.0,Coffee Shop,Yoga Studio,Bank,Beer Bar,Boutique,Burger Joint,Burrito Place,Restaurant,Café,College Auditorium


In [50]:
trt_merged=trt_merged.dropna()

In [53]:
# visualization in map

address = 'Toronto, CA'

geolocator = Nominatim(user_agent="trt_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

trt_merged['Cluster_Labels'] = trt_merged.Cluster_Labels.astype(int)

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(trt_merged['Latitude'], trt_merged['Longitude'], trt_merged['Neighborhood'], trt_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Examine Clusters - Venues

In [54]:
# Cluster 1
trt_merged.loc[trt_merged['Cluster_Labels'] == 0, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,0.0,Park,Convenience Store,Food & Drink Shop,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
21,York,0,0.0,Park,Women's Store,Market,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant
35,East York,0,0.0,Park,Intersection,Convenience Store,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
66,North York,0,0.0,Park,Bank,Convenience Store,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
85,Scarborough,0,0.0,Park,Playground,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
91,Downtown Toronto,0,0.0,Park,Playground,Trail,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
98,Etobicoke,0,0.0,Park,Smoke Shop,River,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop


In [55]:
# Cluster 2
trt_merged.loc[trt_merged['Cluster_Labels'] == 1, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,1.0,Hockey Arena,Intersection,Coffee Shop,Portuguese Restaurant,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
2,Downtown Toronto,1,1.0,Coffee Shop,Park,Pub,Theater,Bakery,Breakfast Spot,Restaurant,Café,Mexican Restaurant,Electronics Store
3,North York,1,1.0,Clothing Store,Furniture / Home Store,Coffee Shop,Miscellaneous Shop,Event Space,Women's Store,Boutique,Vietnamese Restaurant,Accessories Store,Diner
4,Downtown Toronto,1,1.0,Coffee Shop,Yoga Studio,Bank,Beer Bar,Boutique,Burger Joint,Burrito Place,Restaurant,Café,College Auditorium
6,Scarborough,1,1.0,Fast Food Restaurant,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,1,1.0,Coffee Shop,Pub,Park,Bakery,Pizza Place,Italian Restaurant,Chinese Restaurant,Restaurant,Café,Liquor Store
97,Downtown Toronto,1,1.0,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Steakhouse,Gym,Asian Restaurant,Japanese Restaurant,Seafood Restaurant
99,Downtown Toronto,1,1.0,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Yoga Studio,Men's Store,Café,Pizza Place,Pub
100,East Toronto,1,1.0,Light Rail Station,Garden Center,Brewery,Spa,Farmers Market,Fast Food Restaurant,Burrito Place,Restaurant,Recording Studio,Auto Workshop


In [56]:
# Cluster 3
trt_merged.loc[trt_merged['Cluster_Labels'] == 2, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
101,Etobicoke,2,2.0,Construction & Landscaping,Deli / Bodega,Baseball Field,Women's Store,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


In [57]:
# Cluster 4
trt_merged.loc[trt_merged['Cluster_Labels'] == 3, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Central Toronto,3,3.0,Tennis Court,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Farmers Market


In [58]:
# Cluster 5
trt_merged.loc[trt_merged['Cluster_Labels'] == 4, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,4,4.0,Baseball Field,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
