# IBM Data Science Professional Certification
## Specialization Capstone

This Jupyter Notebook contains the code written and executed to complete the Capstone Project for the IBM Data Science Specialization on Coursera.org.

In [78]:
import numpy as np
import pandas as pd

print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Assignment 1: Obtaining Toronto Neighborhood Data

### Scraping Wikipedia for the Data on Toronto Neighborhoods

The data is scraped using pandas' _read_html()_ function, which returns all the tables on the web page as a list of DataFrame objects.

In [79]:
address = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
dfs = pd.read_html(address)

# Now, we iterate over the tables returned and check for the desired table.
for i, df in enumerate(dfs):
    print('The length of DataFrame', i + 1, ' is ', len(df))
    print(df.head())
    print('End of DataFrame')
    print()

The length of DataFrame 1  is  180
  Postal code           Borough                Neighborhood
0         M1A      Not assigned                         NaN
1         M2A      Not assigned                         NaN
2         M3A        North York                   Parkwoods
3         M4A        North York            Victoria Village
4         M5A  Downtown Toronto  Regent Park / Harbourfront
End of DataFrame

The length of DataFrame 2  is  4
                                                  0   \
0                                                NaN   
1  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
2                                                 NL   
3                                                  A   

                                                  1   \
0                              Canadian postal codes   
1  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
2                                                 NS   
3                                                

DataFrame 1 is the desired table for the project. It is retrieved and saved as __toronto_data__.

In [80]:
toronto_data = dfs[0]
toronto_data.head(10)

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge
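
Selecting the table by position (`dfs[0]`) depends on the page layout at the time of scraping. A minimal sketch of selecting it by column names instead, assuming the desired table is the only one carrying the three columns shown above; `pick_table` and the two stand-in tables are hypothetical and only illustrate the idea.

```python
import pandas as pd

EXPECTED_COLUMNS = {'Postal code', 'Borough', 'Neighborhood'}

def pick_table(tables, expected=EXPECTED_COLUMNS):
    """Return the first DataFrame whose columns include every expected name."""
    for table in tables:
        # Header-less tables have integer column labels, so compare as strings.
        if expected.issubset({str(col) for col in table.columns}):
            return table
    raise ValueError(f'No table with columns {sorted(expected)} found')

# Two stand-in tables mimicking the kind of list read_html() returns.
tables = [
    pd.DataFrame({0: ['NL', 'A'], 1: ['NS', 'B']}),
    pd.DataFrame({'Postal code': ['M3A'],
                  'Borough': ['North York'],
                  'Neighborhood': ['Parkwoods']}),
]
toronto_table = pick_table(tables)
print(toronto_table)
```

With the real page, the call would be `pick_table(dfs)`.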


### Check For Rows with Borough = 'Not assigned'.

Rows whose Borough is 'Not assigned' are identified and removed from the data.

In [81]:
from collections import Counter
c = Counter(toronto_data['Borough'])
print('There are ', c['Not assigned'], ' "Not assigned" values under the Borough column.')
print('The table has ', sum(c.values()), ' rows.')
print('Hence, after removing the rows containing \'Not assigned\' under the Borough column, we expect ', sum(c.values()) - c['Not assigned'], ' rows.')

There are  77  "Not assigned" values under the Borough column.
The table has  180  rows.
Hence, after removing the rows containing 'Not assigned' under the Borough column, we expect  103  rows.


Now, we remove the rows in which the Borough = 'Not assigned'.

In [82]:
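# Filter on the Borough column explicitly; dropna() only works here because these rows also have a NaN Neighborhood.
toronto_data = toronto_data[toronto_data['Borough'] != 'Not assigned'].reset_index(drop = True)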
toronto_data

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


### Check for duplicated postal codes.

Next, the postal codes are checked for duplicates. Duplicated postal codes, if any, would be combined into a single row.

In [83]:
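# duplicated() returns booleans, so summing them counts the duplicates (the string key 'True' would never match).
n_duplicates = toronto_data.duplicated('Postal code').sum()
print('There are ', n_duplicates, ' duplicated postal codes in the table.')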

There are  0  duplicated postal codes in the table.


### Check for Neighborhoods with values 'Not assigned'.

Next, we check for neighborhoods whose values are not assigned. If any are found, we set their values to the name of the Borough.

In [84]:
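# The comparison yields booleans, so summing them counts the matches (the string key 'True' would never match).
n_unassigned = (toronto_data['Neighborhood'] == 'Not assigned').sum()
print('There are ', n_unassigned, ' neighborhoods that have their Neighborhood = "Not assigned".')

There are  0  neighborhoods that have their Neighborhood = "Not assigned".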


Hence, the data has been cleaned, and we can move on to the acquisition of the location data.
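
The three cleaning rules used in this assignment can be gathered into one function. This is a minimal sketch on a small made-up table, not the Wikipedia data; `clean_postal_table` is a hypothetical helper, and joining duplicated postal codes with ' / ' is an assumption modeled on the separator visible in the table above.

```python
import numpy as np
import pandas as pd

def clean_postal_table(df):
    """Apply the three cleaning rules to a postal-code table."""
    # 1. Drop rows whose Borough is 'Not assigned'.
    out = df[df['Borough'] != 'Not assigned'].copy()
    # 2. Where Neighborhood is missing or 'Not assigned', fall back to the Borough name.
    missing = out['Neighborhood'].isna() | (out['Neighborhood'] == 'Not assigned')
    out.loc[missing, 'Neighborhood'] = out.loc[missing, 'Borough']
    # 3. Combine duplicated postal codes into one row, joining the neighborhood names.
    out = (out.groupby(['Postal code', 'Borough'], sort=False)['Neighborhood']
              .agg(' / '.join)
              .reset_index())
    return out

# Made-up sample covering each rule once.
sample = pd.DataFrame({
    'Postal code': ['M1A', 'M3A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto',
                'Downtown Toronto', "Queen's Park"],
    'Neighborhood': [np.nan, 'Parkwoods', 'Regent Park',
                     'Harbourfront', 'Not assigned'],
})
cleaned = clean_postal_table(sample)
print(cleaned)
```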

## Assignment 2: Obtaining the Geographical Locations of the Neighborhoods

First, we import the necessary libraries; then we obtain the geographical coordinates of Toronto, Canada.

In [85]:
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

from IPython.display import Image, HTML

from geopy.geocoders import Nominatim
import folium

In [153]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
tor_latitude = location.latitude
tor_longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(tor_latitude, tor_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


The coordinates of Toronto have been saved as __tor_latitude__ and __tor_longitude__ for the latitude and the longitude respectively.

Now, we obtain the coordinates of each neighborhood in the DataFrame __toronto_data__ and collect them in a DataFrame named __geo_coord__.

In [178]:
lat = []
long = []

# Create the geocoder once instead of once per neighborhood.
geolocator = Nominatim(user_agent="ny_explorer")

for neigh, bor in zip(toronto_data['Neighborhood'], toronto_data['Borough']):
    address = f'{neigh}, {bor}'

    # The geocoding service often returns nothing, so each address is tried up to 10 times.
    # Note: a GeocoderTimedOut exception is not caught here and stops the loop.
    location, count = None, 0
    while location is None and count < 10:
        location, count = geolocator.geocode(address), count + 1

    # If no location was found after 10 attempts, NaN is stored for the latitude and longitude.
    if location is not None:
        latitude, longitude = location.latitude, location.longitude
    else:
        latitude, longitude = np.nan, np.nan

    lat.append(latitude)
    long.append(longitude)
    # 'Done' is printed to show the progress of the loop.
    print('Done')

lat, long

Done
Done
Done
... ("Done" is printed 101 times in total)


GeocoderTimedOut: Service timed out

The loop was interrupted by a service time-out. Let's view what was obtained so far as a DataFrame before turning to the alternative provided.
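
A time-out is raised as an exception, so a retry loop that only checks for an empty result does not absorb it. Below is a minimal sketch of a retry helper that also catches exceptions. `geocode_with_retry` is a hypothetical helper; the geocoder is passed in as a plain function so the example runs without network access. With geopy, one would presumably pass `geolocator.geocode` and `retry_on=(GeocoderTimedOut, GeocoderUnavailable)` from `geopy.exc`; that pairing is an assumption and was not run in this notebook.

```python
import time
from collections import namedtuple

def geocode_with_retry(geocode, address, attempts=3, wait_seconds=1.0, retry_on=(Exception,)):
    """Call geocode(address) up to `attempts` times; return (lat, lon) or (nan, nan)."""
    for attempt in range(attempts):
        try:
            location = geocode(address)
        except retry_on:
            location = None  # treat a time-out like an empty answer and retry
        if location is not None:
            return location.latitude, location.longitude
        if attempt < attempts - 1:
            time.sleep(wait_seconds)  # pause before the next attempt
    return float('nan'), float('nan')

# Stand-in geocoder that fails twice with a time-out, then succeeds.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
calls = {'n': 0}

def flaky_geocode(address):
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('Service timed out')
    return FakeLocation(43.6534817, -79.3839347)

coords = geocode_with_retry(flaky_geocode, 'Toronto, Canada', attempts=5, wait_seconds=0)
print(coords, 'after', calls['n'], 'calls')
```

Returning NaN instead of raising keeps the output lists aligned with the rows of the input table, at the cost of hiding which addresses failed and why.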

In [188]:
geo_coord = pd.DataFrame({'Latitude': lat, 'Longitude': long})
geo_coord.head(10)

Unnamed: 0,Latitude,Longitude
0,43.7588,-79.320197
1,43.732658,-79.311189
2,,
3,,
4,,
5,43.623657,-79.514873
6,43.809196,-79.221701
7,43.775347,-79.345944
8,,
9,,


Since several NaN values were reported for the Latitude and Longitude, and the loop was interrupted by the service failure, let's retrieve the data from the CSV file provided instead.

In [190]:
retrieved_coord = pd.read_csv('Geospatial_Coordinates.csv')
retrieved_coord.columns = ['Postal code', 'Latitude', 'Longitude']
retrieved_coord.head(30)

Unnamed: 0,Postal code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


The retrieved data, __retrieved_coord__ would be merged with the initial neighborhood data, __toronto_data__, to obtain the final DataFrame required.

In [191]:
# Left merge to retain all the rows of the toronto_data DataFrame.
final_df = toronto_data.merge(retrieved_coord, how = 'left', on='Postal code')
final_df

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509


In [198]:
final_df.shape

(103, 5)
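
A left merge keeps every row of the left table, so the shape alone does not show whether every postal code actually received coordinates. A minimal sketch of that check on made-up data, using the `indicator` option of `merge`; the postal code 'M9Z' is a hypothetical unmatched entry.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    'Postal code': ['M3A', 'M4A', 'M9Z'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Made-up place'],
})
coordinates = pd.DataFrame({
    'Postal code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# indicator=True adds a '_merge' column saying where each row was found.
merged = neighborhoods.merge(coordinates, how='left', on='Postal code', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal code'].tolist()
print('Postal codes without coordinates:', unmatched)
```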

This concludes the second assignment.

## Assignment 3: Clustering of the Neighborhoods Using k-Means Clustering

This assignment follows the approach of a previously completed lab on similar data.

### Visualizing the City of Toronto on the map and superimposing the neighborhoods.

The Toronto geographical point is highlighted in red while the neighborhoods are highlighted in blue.

In [202]:
# Creating the map of Toronto Using Folium.

map_toronto = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)

# add markers to map
for lat, lng, bor, neigh in zip(final_df['Latitude'], final_df['Longitude'], final_df['Borough'], final_df['Neighborhood']):
    label = '{}, {}'.format(neigh, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto) 

# Add a marker to show the Toronto central point
folium.CircleMarker(
        [tor_latitude, tor_longitude],
        radius=7,
        popup='Toronto',
        color='red',
        fill=True,
        fill_color='black',
        fill_opacity=0.8,
        parse_html=False).add_to(map_toronto)
    
map_toronto

### Initializing the Foursquare API Credentials

In [203]:
CLIENT_ID = 'JAMTHWMLQGSSK4APE5UVTZNFNWZKZOU3YOJYYBALEVQJZFNL' # your Foursquare ID
CLIENT_SECRET = 'TVE4TOA2P1MSCEAR0121SI4PBPA5A3RW2NWQH00GIEKN3PTT' # your Foursquare Secret
VERSION = '20200417' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JAMTHWMLQGSSK4APE5UVTZNFNWZKZOU3YOJYYBALEVQJZFNL
CLIENT_SECRET:TVE4TOA2P1MSCEAR0121SI4PBPA5A3RW2NWQH00GIEKN3PTT
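
Credentials written into a notebook are visible to anyone who reads it. A minimal sketch of an alternative, assuming the values are stored in two environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helper functions and the demo values are all hypothetical.

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Read the Foursquare credentials from environment variables."""
    try:
        return environ['FOURSQUARE_CLIENT_ID'], environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Environment variable {missing} is not set') from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demonstration with a stand-in environment; real code would rely on os.environ.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-1234',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret-5678'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask(client_id))
print('CLIENT_SECRET:', mask(client_secret))
```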


Creating the URL required to explore the venues around the Toronto central point within a radius of 500 meters.
A limit of 100 results is set.

In [204]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, tor_latitude, tor_longitude, radius, LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?client_id=JAMTHWMLQGSSK4APE5UVTZNFNWZKZOU3YOJYYBALEVQJZFNL&client_secret=TVE4TOA2P1MSCEAR0121SI4PBPA5A3RW2NWQH00GIEKN3PTT&v=20200417&ll=43.6534817,-79.3839347&radius=500&limit=100'
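
The same URL can be assembled from a dictionary of parameters, which keeps each parameter next to its name and takes care of escaping. A minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper and the credentials in the example are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the 'll' value readable instead of escaping it.
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

example_url = build_explore_url('ID', 'SECRET', '20200417', 43.6534817, -79.3839347)
print(example_url)
```

The `requests` library accepts such a dictionary directly through the `params` argument of `requests.get`, which avoids building the string by hand.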

### Obtaining the JSON File for the Locations around Toronto Central Point

Obtain the JSON response from the Foursquare API using the __requests__ library.

In [207]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e9aaf3b9fcb92001bd4cbba'},
 'response': {'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 70,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng': -79.38529600606677}],
        'distance': 113,
        'cc': 'CA',
        'city': 'Toronto',
        's

Let's retrieve the details of the first listed venue by indexing down to the required values.

In [218]:
get_ = results['response']['groups'][0]['items'][0]['venue']
print(get_)
print('ID: ', get_['id'])
print('Name: ', get_['name'])
print('Latitude, Longitude: ', get_['location']['lat'], get_['location']['lng'])


{'id': '5227bb01498e17bf485e6202', 'name': 'Downtown Toronto', 'location': {'lat': 43.65323167517444, 'lng': -79.38529600606677, 'labeledLatLngs': [{'label': 'display', 'lat': 43.65323167517444, 'lng': -79.38529600606677}], 'distance': 113, 'cc': 'CA', 'city': 'Toronto', 'state': 'ON', 'country': 'Canada', 'formattedAddress': ['Toronto ON', 'Canada']}, 'categories': [{'id': '4f2a25ac4b909258e854f55f', 'name': 'Neighborhood', 'pluralName': 'Neighborhoods', 'shortName': 'Neighborhood', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/neighborhood_', 'suffix': '.png'}, 'primary': True}], 'photos': {'count': 0, 'groups': []}}
ID:  5227bb01498e17bf485e6202
Name:  Downtown Toronto
Latitude, Longitude:  43.65323167517444 -79.38529600606677


Now, we use _json_normalize()_ to obtain a DataFrame that shows the details of the returned venues.

In [223]:
venues = json_normalize(results['response']['groups'][0]['items'])
venues.head(5)

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.cc,...,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.address,venue.location.crossStreet,venue.location.postalCode,venue.venuePage.id,venue.location.neighborhood
0,e-0-5227bb01498e17bf485e6202-0,0,"[{'summary': 'This spot is popular', 'type': '...",5227bb01498e17bf485e6202,Downtown Toronto,43.653232,-79.385296,"[{'label': 'display', 'lat': 43.65323167517444...",113,CA,...,Canada,"[Toronto ON, Canada]","[{'id': '4f2a25ac4b909258e854f55f', 'name': 'N...",0,[],,,,,
1,e-0-4ad4c05ef964a520a6f620e3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4ad4c05ef964a520a6f620e3,Nathan Phillips Square,43.65227,-79.383516,"[{'label': 'display', 'lat': 43.65227047322295...",138,CA,...,Canada,"[100 Queen St W (at Bay St), Toronto ON M5H 2N...","[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",0,[],100 Queen St W,at Bay St,M5H 2N1,,
2,e-0-537773d1498e74a75bb75c1e-2,0,"[{'summary': 'This spot is popular', 'type': '...",537773d1498e74a75bb75c1e,Eggspectation Bell Trinity Square,43.653144,-79.38198,"[{'label': 'display', 'lat': 43.65314383888587...",161,CA,...,Canada,"[483 Bay Street (Albert Street), Toronto ON M5...","[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",0,[],483 Bay Street,Albert Street,M5G 2C9,97507838.0,
3,e-0-4ae7b27df964a52068ad21e3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4ae7b27df964a52068ad21e3,Japango,43.655268,-79.385165,"[{'label': 'display', 'lat': 43.65526771691681...",222,CA,...,Canada,"[122 Elizabeth St. (at Dundas St. W), Toronto ...","[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",0,[],122 Elizabeth St.,at Dundas St. W,M5G 1P5,,
4,e-0-4b2a6eb8f964a52012a924e3-4,0,"[{'summary': 'This spot is popular', 'type': '...",4b2a6eb8f964a52012a924e3,Indigo,43.653515,-79.380696,"[{'label': 'display', 'lat': 43.65351471121164...",260,CA,...,Canada,"[220 Yonge St, Toronto ON M5B 2H1, Canada]","[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",0,[],220 Yonge St,,M5B 2H1,,Downtown Yonge


### Cleaning the Data Obtained from FourSquare.

Using the helper function below, we extract the name of the first category from each row's venue.categories entry.

In [236]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, we keep the columns that hold location details and append the categories to the table. This gives a general insight into the data obtained and the features therein.

In [237]:
location_columns = [col for col in venues.columns if col.startswith('venue.location')]
# .copy() gives an independent DataFrame, so adding a column below does not raise SettingWithCopyWarning.
venues_filtered = venues[['venue.id', 'venue.name'] + location_columns].copy()
venues_filtered.columns = ['id', 'name'] + ['.'.join(col.split('.')[2:]) for col in location_columns]
venues_filtered['categories'] = venues.apply(get_category_type, axis = 1)
venues_filtered.head(5)


Unnamed: 0,id,name,lat,lng,labeledLatLngs,distance,cc,city,state,country,formattedAddress,address,crossStreet,postalCode,neighborhood,categories
0,5227bb01498e17bf485e6202,Downtown Toronto,43.653232,-79.385296,"[{'label': 'display', 'lat': 43.65323167517444...",113,CA,Toronto,ON,Canada,"[Toronto ON, Canada]",,,,,Neighborhood
1,4ad4c05ef964a520a6f620e3,Nathan Phillips Square,43.65227,-79.383516,"[{'label': 'display', 'lat': 43.65227047322295...",138,CA,Toronto,ON,Canada,"[100 Queen St W (at Bay St), Toronto ON M5H 2N...",100 Queen St W,at Bay St,M5H 2N1,,Plaza
2,537773d1498e74a75bb75c1e,Eggspectation Bell Trinity Square,43.653144,-79.38198,"[{'label': 'display', 'lat': 43.65314383888587...",161,CA,Toronto,ON,Canada,"[483 Bay Street (Albert Street), Toronto ON M5...",483 Bay Street,Albert Street,M5G 2C9,,Breakfast Spot
3,4ae7b27df964a52068ad21e3,Japango,43.655268,-79.385165,"[{'label': 'display', 'lat': 43.65526771691681...",222,CA,Toronto,ON,Canada,"[122 Elizabeth St. (at Dundas St. W), Toronto ...",122 Elizabeth St.,at Dundas St. W,M5G 1P5,,Sushi Restaurant
4,4b2a6eb8f964a52012a924e3,Indigo,43.653515,-79.380696,"[{'label': 'display', 'lat': 43.65351471121164...",260,CA,Toronto,ON,Canada,"[220 Yonge St, Toronto ON M5B 2H1, Canada]",220 Yonge St,,M5B 2H1,Downtown Yonge,Bookstore


Now, we select the data needed for our analysis.

In [238]:
nearby_venues = venues_filtered[['name', 'categories', 'lat', 'lng']]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198
3,Japango,Sushi Restaurant,43.655268,-79.385165
4,Indigo,Bookstore,43.653515,-79.380696


In [240]:
print('We have ', nearby_venues.shape[0], ' venues returned.')

We have  70  venues returned.


### Obtaining all venues around Toronto by Repeating above steps for all neighborhoods retrieved.

An overview of the neighborhood data:

In [242]:
final_df.head(3)

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636


Code to retrieve the venues around these neighborhoods:

In [285]:
def getNearByVenues(neighborhood, latitude, longitude, radius = 500, LIMIT = 100):
    
    # Initialize the list of the venues
    venues_list = []
    
    for neigh, lat, lng in zip(neighborhood, latitude, longitude):
        print(neigh)
        
        # Initialize the FourSquare API URL.
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        
        # Make the get request:
        result = requests.get(url).json()['response']['groups'][0]['items']
        
        # Retrieve the necessary data.
        venues_list.append([(
            neigh,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in result])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
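
Indexing `categories[0]` raises an `IndexError` for a venue that comes back without any category. A minimal sketch of a row builder that stores `None` in that case; `venue_rows` is a hypothetical helper, and the second item in the example is made up to show the empty-category case.

```python
def venue_rows(neighborhood, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use the first category name when present, otherwise None.
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.332140},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Venue without category',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]
rows = venue_rows('Parkwoods', 43.753259, -79.329656, items)
for row in rows:
    print(row)
```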

In [348]:
toronto_venues = getNearByVenues(neighborhood=final_df['Neighborhood'], longitude=final_df['Longitude'], latitude=final_df['Latitude'])
toronto_venues

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,GTA Restoration,43.753396,-79.333477,Fireworks Store
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
2119,Mimico NW / The Queensway West / South of Bloo...,43.628841,-79.520999,RONA,43.629393,-79.518320,Hardware Store
2120,Mimico NW / The Queensway West / South of Bloo...,43.628841,-79.520999,Once Upon A Child,43.631075,-79.518290,Kids Store
2121,Mimico NW / The Queensway West / South of Bloo...,43.628841,-79.520999,Value Village,43.631269,-79.518238,Thrift / Vintage Store
2122,Mimico NW / The Queensway West / South of Bloo...,43.628841,-79.520999,Kingsway Boxing Club,43.627254,-79.526684,Gym


#### Analysis Result Obtained From All Neighborhoods

Let's check the number of results obtained from all neighborhoods.

In [349]:
print('The number of rows generated from all neighborhoods in Toronto is: ', toronto_venues.shape[0])

The number of rows generated from all neighborhoods in Toronto is:  2124


Let's check the number of results obtained per neighborhood.

In [350]:
neigh_result = toronto_venues.groupby('Neighborhood').count()[['Venue']]
neigh_result.columns = ['Counts of Venue Returned']
neigh_result

Unnamed: 0_level_0,Counts of Venue Returned
Neighborhood,Unnamed: 1_level_1
Agincourt,4
Alderwood / Long Branch,10
Bathurst Manor / Wilson Heights / Downsview North,19
Bayview Village,4
Bedford Park / Lawrence Manor East,26
...,...
Willowdale,40
Woburn,4
Woodbine Heights,8
York Mills / Silver Hills,2


Let's count the unique categories in the result.

In [351]:
print('There are ', len(toronto_venues['Venue Category'].unique()), ' unique categories returned.')

There are  268  unique categories returned.


## Analysing Each Neighborhood and Preparing Data for Clustering

One-hot encoding of the __Venue Category__ column prepares the data for k-means clustering.

In [352]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot = pd.DataFrame(toronto_onehot)

# add neighborhood column back to dataframe
toronto_onehot['Neighborhoods'] = pd.Series(toronto_venues['Neighborhood'])

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(10)

Unnamed: 0,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Regent Park / Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


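Now, we group the table by the __Neighborhoods__ column and take the mean of each feature. Since every row is a one-hot vector, the mean is the share of a neighborhood's venues that fall in each category, which puts neighborhoods with different venue counts on the same scale.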

In [353]:
neigh_grouped = toronto_onehot.groupby('Neighborhoods').mean().reset_index()
neigh_grouped.head(10)

Unnamed: 0,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Birch Cliff / Cliffside West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Brockton / Parkdale Village / Exhibition Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Business reply mail Processing CentrE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,CN Tower / King and Spadina / Railway Lands / ...,0.0,0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
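
To see why the mean of one-hot columns acts as a relative frequency, here is a minimal sketch on a made-up table of five venues (toy data, not the Foursquare results).

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Parkwoods',
                     'Victoria Village', 'Victoria Village'],
    'Venue Category': ['Park', 'Park', 'Fireworks Store',
                       'Coffee Shop', 'Hockey Arena'],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhoods', venues['Neighborhood'])

# The mean of a 0/1 column within a group is the share of venues in that category.
frequencies = onehot.groupby('Neighborhoods').mean()
print(frequencies.round(2))
```

Each row of `frequencies` sums to 1, so a neighborhood with 3 venues and one with 100 venues become directly comparable.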


### Listing the Ten Most Common Venue Categories per Neighborhood

We define the function used to sort the categories of each row in the DataFrame.

In [354]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now, let's sort the DataFrame __neigh_grouped__ and outline the ten most common venues.

In [441]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = neigh_grouped['Neighborhoods']

for ind in np.arange(neigh_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(neigh_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Breakfast Spot,Latin American Restaurant,Skating Rink,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
1,Alderwood / Long Branch,Pizza Place,Gym,Pharmacy,Sandwich Place,Pub,Athletics & Sports,Pool,Skating Rink,Coffee Shop,Convenience Store
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Fried Chicken Joint,Bridal Shop,Sandwich Place,Restaurant,Diner,Ice Cream Shop,Supermarket,Sushi Restaurant
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
4,Bedford Park / Lawrence Manor East,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Comfort Food Restaurant,Thai Restaurant,Juice Bar,Fast Food Restaurant
5,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Italian Restaurant,Restaurant,Cheese Shop,Café,Beer Bar
6,Birch Cliff / Cliffside West,College Stadium,General Entertainment,Skating Rink,Café,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
7,Brockton / Parkdale Village / Exhibition Place,Café,Breakfast Spot,Coffee Shop,Bakery,Office,Convenience Store,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant
8,Business reply mail Processing CentrE,Light Rail Station,Garden Center,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market,Fast Food Restaurant,Spa
9,CN Tower / King and Spadina / Railway Lands / ...,Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Rental Car Location,Coffee Shop,Boat or Ferry,Boutique,Bar,Plane
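
The column labels rely on a three-item suffix list, which is sufficient for the ten columns used here. A minimal sketch of a general ordinal helper, in case a different number of columns is needed; `ordinal` is a hypothetical function name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns)
```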


## K-Means Clustering

This unsupervised machine learning algorithm is used to group the neighborhoods into clusters based on the **_venue categories_** found in them.

Now, we fit the data in the model and obtain the labels for the neighborhoods.

In [442]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

neigh_clustering = neigh_grouped.drop(columns='Neighborhoods')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, n_init = 12, random_state=200).fit(neigh_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 1,
       0, 2, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       2, 0, 3, 0, 0, 1, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 3,
       0, 2, 0, 2, 4, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 2, 3,
       2, 0, 0, 0, 3, 3])
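
The fit-and-label step can be tried on a small table to see what the labels mean. The sketch below is only an illustration: `toy_grouped` and its values are hypothetical stand-ins for `neigh_grouped`, with two venue categories instead of the full set. It also counts how many rows fall in each cluster, which is a quick way to spot a very uneven split like the one in the output above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-frequency table: one row per neighborhood,
# one column per venue category (values are shares of venues).
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.9, 0.8, 0.1, 0.0],
    'Park': [0.1, 0.2, 0.9, 1.0],
})

# Drop the name column so that only numeric features reach k-means.
toy_features = toy_grouped.drop(columns='Neighborhood')

toy_kmeans = KMeans(n_clusters=2, n_init=12, random_state=200).fit(toy_features)

# Number of neighborhoods assigned to each cluster label.
cluster_sizes = np.bincount(toy_kmeans.labels_, minlength=2)
print(toy_kmeans.labels_, cluster_sizes)
```

The label numbers themselves are arbitrary; only the grouping matters. Here the two coffee-heavy rows share one label and the two park-heavy rows share the other.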

### Adding the Clustering Label to the DataFrame

Now, we add a new column, __Cluster Labels__, to the sorted venues DataFrame and join it to __final_df__, the DataFrame that holds the location data of each neighborhood.

In [443]:
# add clustering labels as the first column
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# left-join the venue summary onto final_df, which holds the latitude/longitude of each neighborhood
neigh_merged = final_df.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neigh_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Park,Fireworks Store,Food & Drink Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Farmers Market,Dumpling Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Grocery Store,Coffee Shop,Nail Salon,Portuguese Restaurant,Hockey Arena,Electronics Store,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dim Sum Restaurant
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,0.0,Coffee Shop,Bakery,Pub,Park,Theater,Café,Breakfast Spot,Restaurant,Chocolate Shop,Distribution Center
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,0.0,Clothing Store,Women's Store,Accessories Store,Coffee Shop,Event Space,Shoe Store,Miscellaneous Shop,Furniture / Home Store,Vietnamese Restaurant,Boutique
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Diner,Burrito Place,Bank,Bar,Beer Bar,Spa,Italian Restaurant,Burger Joint


Now, let's check the cluster labels inserted.

In [444]:
print(list(neigh_merged['Cluster Labels']))

[3.0, 0.0, 0.0, 0.0, 0.0, nan, 1.0, 0.0, 2.0, 0.0, 0.0, nan, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.0, nan, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 2.0, 3.0, 2.0, 3.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 2.0, 2.0, 3.0, 0.0, 2.0, 0.0, nan, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]


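This is an anomaly: a row whose cluster label is __nan__ cannot be assigned a color on the map. These rows are most likely neighborhoods with no entry in the venues table, so the join left their venue columns empty. Since there are only a few of them, they are dropped from the DataFrame.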
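
A left join keeps every row of the left table and fills the right-hand columns with NaN wherever no match exists, which is one way such gaps can appear. The sketch below uses hypothetical stand-in tables (`toy_locations` and `toy_venues` are not part of this notebook). It also shows two optional refinements, offered as assumptions rather than as the notebook's method: restricting `dropna` to the label column, and casting the labels back to integers afterwards.

```python
import pandas as pd

# Hypothetical location table (left side of the join).
toy_locations = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.75, 43.72, 43.65],
})

# Hypothetical venue summary; neighborhood 'B' has no row here.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [3, 0],
})

# 'B' finds no match, so its label becomes NaN and the column is upcast to float.
toy_merged = toy_locations.join(toy_venues.set_index('Neighborhood'), on='Neighborhood')

# Drop only the rows without a label, then restore integer labels.
toy_clean = (toy_merged
             .dropna(subset=['Cluster Labels'])
             .reset_index(drop=True)
             .astype({'Cluster Labels': int}))
print(toy_clean)
```

Limiting `dropna` to one column avoids discarding rows that happen to have a gap in an unrelated column.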

In [445]:
neigh_merged = neigh_merged.dropna(axis = 0).reset_index(drop=True)
neigh_merged

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Park,Fireworks Store,Food & Drink Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Farmers Market,Dumpling Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Grocery Store,Coffee Shop,Nail Salon,Portuguese Restaurant,Hockey Arena,Electronics Store,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dim Sum Restaurant
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636,0.0,Coffee Shop,Bakery,Pub,Park,Theater,Café,Breakfast Spot,Restaurant,Chocolate Shop,Distribution Center
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,0.0,Clothing Store,Women's Store,Accessories Store,Coffee Shop,Event Space,Shoe Store,Miscellaneous Shop,Furniture / Home Store,Vietnamese Restaurant,Boutique
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,Diner,Burrito Place,Bank,Bar,Beer Bar,Spa,Italian Restaurant,Burger Joint
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944,3.0,Park,Pool,River,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
95,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,0.0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Hotel,Burger Joint,Pub,Mediterranean Restaurant,Men's Store
96,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558,0.0,Light Rail Station,Garden Center,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market,Fast Food Restaurant,Spa
97,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509,0.0,Locksmith,Breakfast Spot,Construction & Landscaping,Baseball Field,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


Now that the __nan__ values have been dropped, let's recheck the __Cluster Labels__ column for other visible faults.

In [446]:
print(list(neigh_merged['Cluster Labels']))

[3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 2.0, 3.0, 2.0, 3.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 2.0, 2.0, 3.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]


None! We are good to go and can now plot the clusters.

## Map of Toronto Depicting the Clustering of Neighborhoods

First, we import necessary modules.

In [447]:
from matplotlib.pyplot import cm
from matplotlib import colors

Now, we create the map required.

In [448]:
# create map
map_clusters = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(neigh_merged['Latitude'], neigh_merged['Longitude'], neigh_merged['Neighborhood'], neigh_merged['Cluster Labels']):
    cluster = int(cluster)  # labels became floats when the join introduced NaN
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so they index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
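
The marker colors depend on a lookup from cluster label to hex color. As a minimal sketch of that lookup on its own (the names `palette` and `cluster_colors` are illustrative and do not appear in the notebook), the mapping can be built and inspected before any marker is drawn:

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5  # same number of clusters as used for k-means

# One evenly spaced rainbow color per cluster, as hex strings that folium accepts.
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Explicit label -> color lookup, so each label's color can be checked by eye.
cluster_colors = dict(enumerate(palette))

for label, hex_color in cluster_colors.items():
    print(label, hex_color)
```

Printing the mapping makes it easy to confirm that every label gets its own color and to build a legend for the map.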

## Neighborhood Classification By Clusters

The tables below list the neighborhoods in each cluster, so that the venue profiles of neighborhoods sharing a cluster can be compared.

In [449]:
neigh_merged.loc[neigh_merged['Cluster Labels'] == 0, neigh_merged.columns[[1] + list(range(5, neigh_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Grocery Store,Coffee Shop,Nail Salon,Portuguese Restaurant,Hockey Arena,Electronics Store,Empanada Restaurant,Eastern European Restaurant,Dumpling Restaurant,Dim Sum Restaurant
2,Downtown Toronto,0.0,Coffee Shop,Bakery,Pub,Park,Theater,Café,Breakfast Spot,Restaurant,Chocolate Shop,Distribution Center
3,North York,0.0,Clothing Store,Women's Store,Accessories Store,Coffee Shop,Event Space,Shoe Store,Miscellaneous Shop,Furniture / Home Store,Vietnamese Restaurant,Boutique
4,Downtown Toronto,0.0,Coffee Shop,Sushi Restaurant,Diner,Burrito Place,Bank,Bar,Beer Bar,Spa,Italian Restaurant,Burger Joint
6,North York,0.0,Asian Restaurant,Coffee Shop,Gym,Beer Store,Restaurant,Japanese Restaurant,Caribbean Restaurant,Supermarket,Baseball Field,Sporting Goods Shop
...,...,...,...,...,...,...,...,...,...,...,...,...
93,Downtown Toronto,0.0,Coffee Shop,Café,Restaurant,Hotel,Asian Restaurant,Deli / Bodega,Bar,Steakhouse,American Restaurant,Seafood Restaurant
95,Downtown Toronto,0.0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Hotel,Burger Joint,Pub,Mediterranean Restaurant,Men's Store
96,East Toronto,0.0,Light Rail Station,Garden Center,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market,Fast Food Restaurant,Spa
97,Etobicoke,0.0,Locksmith,Breakfast Spot,Construction & Landscaping,Baseball Field,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


In [450]:
neigh_merged.loc[neigh_merged['Cluster Labels'] == 1, neigh_merged.columns[[1] + list(range(5, neigh_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,1.0,Fast Food Restaurant,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
53,York,1.0,Fast Food Restaurant,Discount Store,Sandwich Place,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,Diner


In [451]:
neigh_merged.loc[neigh_merged['Cluster Labels'] == 2, neigh_merged.columns[[1] + list(range(5, neigh_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,East York,2.0,Pizza Place,Gym / Fitness Center,Gastropub,Bus Line,Café,Intersection,Bank,Fast Food Restaurant,Athletics & Sports,Breakfast Spot
10,Scarborough,2.0,Bar,Golf Course,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
27,East York,2.0,Indian Restaurant,Gas Station,Supermarket,Middle Eastern Restaurant,Bank,Discount Store,Liquor Store,Restaurant,Pizza Place,Pharmacy
29,West Toronto,2.0,Bakery,Pharmacy,Pizza Place,Brewery,Café,Middle Eastern Restaurant,Supermarket,Bar,Bank,Athletics & Sports
48,North York,2.0,Pizza Place,Empanada Restaurant,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
60,York,2.0,Grocery Store,Caribbean Restaurant,Pizza Place,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
62,Scarborough,2.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run
64,Central Toronto,2.0,Park,Hotel,Food & Drink Shop,Sandwich Place,Department Store,Breakfast Spot,Gym,Convenience Store,Distribution Center,Event Space
67,Etobicoke,2.0,Pizza Place,Discount Store,Intersection,Middle Eastern Restaurant,Chinese Restaurant,Coffee Shop,Sandwich Place,Doner Restaurant,Distribution Center,Dog Run
68,Scarborough,2.0,Middle Eastern Restaurant,Breakfast Spot,Sandwich Place,Auto Garage,Bakery,Shopping Mall,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant


In [452]:
neigh_merged.loc[neigh_merged['Cluster Labels'] == 3, neigh_merged.columns[[1] + list(range(5, neigh_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,3.0,Park,Fireworks Store,Food & Drink Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Farmers Market,Dumpling Restaurant
19,York,3.0,Park,Women's Store,Pool,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
33,East York,3.0,Park,Convenience Store,Metro Station,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
43,North York,3.0,Park,Cafeteria,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Fast Food Restaurant,Dumpling Restaurant
47,North York,3.0,Park,Bakery,Construction & Landscaping,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
58,Central Toronto,3.0,Park,Swim School,Bus Line,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Diner
61,York,3.0,Park,Convenience Store,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
63,North York,3.0,Park,Bank,Bar,Convenience Store,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
82,Scarborough,3.0,Park,Playground,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
88,Downtown Toronto,3.0,Park,Trail,Playground,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore


In [453]:
neigh_merged.loc[neigh_merged['Cluster Labels'] == 4, neigh_merged.columns[[1] + list(range(5, neigh_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Scarborough,4.0,Playground,Yoga Studio,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
