# Analyzing Neighborhoods in <font color=lightblue>Downtown Toronto</font> VS. <font color=Pink>Downtown Vancouver</font>

##  Project objective: Explore the similarities between Downtown Toronto and Downtown Vancouver by analyzing the popular venues in the neighbourhoods

#### Importing Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#!conda install -c conda-forge geopy --yes

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!pip install bs4
#!pip install plotly
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#!pip install lxml
#!pip install html5lib

print('Libraries imported.')

Libraries imported.


# <font color=lightblue>Part 1: Analyzing Neighbourhoods in Downtown Toronto</font>

# Wiki Link URL

In [2]:
my_url1=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text   #Wiki page for Toronto Postal Codes

# 1. Create Soup Object from URL

In [3]:
soup1 = BeautifulSoup(my_url1,"html5lib")

In [4]:
tag_object1=soup1.title
print("tag object:",tag_object1)

tag object: <title>List of postal codes of Canada: M - Wikipedia</title>


In [5]:
table_contents=[]
table=soup1.find('table')
table

<table cellpadding="2" cellspacing="0" rules="all" style="width:100%; border-collapse:collapse; border:1px solid #ccc;">

<tbody><tr>
<td style="width:11%; vertical-align:top; color:#ccc;">
<p><b>M1A</b><br/><span style="font-size:85%;"><i>Not assigned</i></span>
</p>
</td>
<td style="width:11%; vertical-align:top; color:#ccc;">
<p><b>M2A</b><br/><span style="font-size:85%;"><i>Not assigned</i></span>
</p>
</td>
<td style="width:11%; vertical-align:top;">
<p><b>M3A</b><br/><span style="font-size:85%;"><a href="/wiki/North_York" title="North York">North York</a><br/>(<a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>)</span>
</p>
</td>
<td style="width:11%; vertical-align:top;">
<p><b>M4A</b><br/><span style="font-size:85%;"><a href="/wiki/North_York" title="North York">North York</a><br/>(<a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>)</span>
</p>
</td>
<td style="width:11%; vertical-align:top;">
<p><b>M5A</b><br/><span style="font-size:85%;"><a h

In [6]:
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)
        
print(table_contents)


[{'PostalCode': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'}, {'PostalCode': 'M4A', 'Borough': 'North York', 'Neighborhood': 'Victoria Village'}, {'PostalCode': 'M5A', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Regent Park, Harbourfront'}, {'PostalCode': 'M6A', 'Borough': 'North York', 'Neighborhood': 'Lawrence Manor, Lawrence Heights'}, {'PostalCode': 'M7A', 'Borough': "Queen's Park", 'Neighborhood': 'Ontario Provincial Government'}, {'PostalCode': 'M9A', 'Borough': 'Etobicoke', 'Neighborhood': 'Islington Avenue'}, {'PostalCode': 'M1B', 'Borough': 'Scarborough', 'Neighborhood': 'Malvern, Rouge'}, {'PostalCode': 'M3B', 'Borough': 'North York', 'Neighborhood': 'Don Mills North'}, {'PostalCode': 'M4B', 'Borough': 'East York', 'Neighborhood': 'Parkview Hill, Woodbine Gardens'}, {'PostalCode': 'M5B', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Garden District, Ryerson'}, {'PostalCode': 'M6B', 'Borough': 'North York', 'Neighborhood': 'Glencairn'}, {'PostalCode': 'M9

# Generating the table with PostalCode, Borough, and Neighborhood.

In [7]:
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


## Shape of the table:

In [8]:
df.shape

(103, 3)

## 2. Import Coordinates data frame from csv file and join to original table on postal code to get Latitude and Longitude columns

In [9]:
data_cds = pd.read_csv('https://cocl.us/Geospatial_data') 
data_cds.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


## Dataframe with Coordinates:

In [10]:
df_final = pd.merge(df, data_cds, left_on='PostalCode', right_on='Postal Code', how='left').drop('Postal Code', axis=1)
df_final.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


# 3. Clustering and Mapping

In [11]:
#pip install folium

In [12]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [13]:
trt_dt_data = df_final[df_final['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
trt_dt_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564
6,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
8,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817


## Visualize the Toronto Neighbourhood

In [14]:
#pip install Nominatim
from geopy.geocoders import Nominatim

In [15]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

# create map of Manhattan using latitude and longitude values
map_trt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(trt_dt_data['Latitude'], trt_dt_data['Longitude'], trt_dt_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_trt)  
    
map_trt

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


## Define Foursquare Credentials and Version

In [16]:

CLIENT_ID = 'GL3GBNB0DRIQ3Z5QJE54ZHPPKZARCKPRBAKRH2NDF0C4DLP2' # your Foursquare ID
CLIENT_SECRET = 'FS1GSVHKHW4NKIWWUYC4XQISQ0GEOBHVHCP0XR5W2PCBG3FR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)




Your credentails:
CLIENT_ID: GL3GBNB0DRIQ3Z5QJE54ZHPPKZARCKPRBAKRH2NDF0C4DLP2
CLIENT_SECRET:FS1GSVHKHW4NKIWWUYC4XQISQ0GEOBHVHCP0XR5W2PCBG3FR


In [17]:


trt_dt_data.loc[0, 'Neighborhood']


neighborhood_latitude = trt_dt_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = trt_dt_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = trt_dt_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))




Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


## Let's get the top 100 venues that are in Regent Park, Harbourfront within a radius of 500 meters

In [18]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url


results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '606e74f7a465a5165e55c7a1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 44,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '53b8466a498e83df908c3f21',
       'name': 'Tandem Coffee',
       'location': {'address': '368 King St E',
        'crossStreet': 'at Trinity St',
        'lat': 43.65355870959944,
        'lng': -79.36180945913513,
        'labeledLatLngs': [{'label': 'display',
 

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON


In [20]:

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))


44 venues were returned by Foursquare.


## Explore Neighborhoods in Toronto

In [21]:


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)



trt_venues = getNearbyVenues(names=trt_dt_data['Neighborhood'],
                                   latitudes=trt_dt_data['Latitude'],
                                   longitudes=trt_dt_data['Longitude']
                                  )


print(trt_venues.shape)
trt_venues.head()


Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
(1095, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


## Analyze Each Neighborhood

In [22]:


trt_venues.groupby('Neighborhood').count()

print('There are {} uniques categories.'.format(len(trt_venues['Venue Category'].unique())))


# one hot encoding
trt_onehot = pd.get_dummies(trt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
trt_onehot['Neighborhood'] = trt_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [trt_onehot.columns[-1]] + list(trt_onehot.columns[:-1])
trt_onehot = trt_onehot[fixed_columns]

trt_onehot.head()




There are 205 uniques categories.


Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Knitting Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [23]:

trt_onehot.shape



(1095, 205)

In [24]:

trt_grouped = trt_onehot.groupby('Neighborhood').mean().reset_index()
trt_grouped

trt_grouped.shape



(17, 205)

## Print each neighborhood along with the top 5 most common venues

In [25]:

num_top_venues = 5

for hood in trt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = trt_grouped[trt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    
    


----Berczy Park----
            venue  freq
0     Coffee Shop  0.07
1          Bakery  0.05
2    Cocktail Bar  0.05
3  Farmers Market  0.03
4        Pharmacy  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0    Airport Lounge  0.12
1   Airport Service  0.12
2  Airport Terminal  0.12
3   Harbor / Marina  0.06
4               Bar  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.17
1                Café  0.06
2      Sandwich Place  0.06
3  Italian Restaurant  0.05
4        Burger Joint  0.03


----Christie----
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3    Candy Store  0.06
4     Baby Store  0.06


----Church and Wellesley----
                  venue  freq
0           Coffee Shop  0.08
1      Sushi Restaurant  0.07
2   Japanese Restaurant  0.07
3            Restaurant  0.04
4  Fast Food Restaurant  0.0

In [26]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


## Create the new dataframe and display the top 10 venues for each neighborhood

In [27]:


num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = trt_grouped['Neighborhood']

for ind in np.arange(trt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(trt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Farmers Market,Restaurant,Seafood Restaurant,Liquor Store
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Coffee Shop,Plane,Sculpture Garden,Boat or Ferry,Rental Car Location,Bar,Boutique
2,Central Bay Street,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Burger Joint,Salad Place,Bubble Tea Shop,Thai Restaurant,Portuguese Restaurant,Poke Place
3,Christie,Grocery Store,Café,Park,Italian Restaurant,Athletics & Sports,Restaurant,Candy Store,Baby Store,Nightclub,Coffee Shop
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Yoga Studio,Men's Store,Mediterranean Restaurant,Hotel,Gay Bar,Fast Food Restaurant


## Run k-means to cluster the neighborhood into 5 clusters

In [28]:
# set number of clusters
kclusters = 5

trt_grouped_clustering = trt_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(trt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

trt_merged = trt_dt_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
trt_merged = trt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

trt_merged.head() # check the last columns!



Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Café,Theater,Gym / Fitness Center,Event Space,Performing Arts Venue
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Bubble Tea Shop,Café,Middle Eastern Restaurant,Cosmetics Shop,Hotel,Ramen Restaurant,Bookstore,Burger Joint
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,Farmers Market,Park,Restaurant,American Restaurant,Italian Restaurant,Creperie
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Bakery,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Farmers Market,Restaurant,Seafood Restaurant,Liquor Store
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Burger Joint,Salad Place,Bubble Tea Shop,Thai Restaurant,Portuguese Restaurant,Poke Place


## Visualize the resulting clusters

In [29]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(trt_merged['Latitude'], trt_merged['Longitude'], trt_merged['Neighborhood'], trt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine the Five Clusters

In [30]:
trt_merged.loc[trt_merged['Cluster Labels'] == 0, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Pub,Bakery,Park,Breakfast Spot,Café,Theater,Gym / Fitness Center,Event Space,Performing Arts Venue
1,Downtown Toronto,0,Clothing Store,Coffee Shop,Bubble Tea Shop,Café,Middle Eastern Restaurant,Cosmetics Shop,Hotel,Ramen Restaurant,Bookstore,Burger Joint
2,Downtown Toronto,0,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,Farmers Market,Park,Restaurant,American Restaurant,Italian Restaurant,Creperie
3,Downtown Toronto,0,Coffee Shop,Bakery,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Farmers Market,Restaurant,Seafood Restaurant,Liquor Store
4,Downtown Toronto,0,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Burger Joint,Salad Place,Bubble Tea Shop,Thai Restaurant,Portuguese Restaurant,Poke Place
6,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Deli / Bodega,Clothing Store,Gym,Thai Restaurant,Hotel,Cosmetics Shop,Sushi Restaurant
7,Downtown Toronto,0,Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Sporting Goods Shop,Scenic Lookout,Brewery,Fried Chicken Joint,Restaurant
8,Downtown Toronto,0,Coffee Shop,Hotel,Café,Seafood Restaurant,Italian Restaurant,Restaurant,Japanese Restaurant,Salad Place,Sporting Goods Shop,Breakfast Spot
9,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Hotel,Italian Restaurant,Gym,Seafood Restaurant,Deli / Bodega,Japanese Restaurant,American Restaurant
11,Downtown Toronto,0,Café,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Park,Arts & Crafts Store,Gaming Cafe,Bakery,Bar


In [31]:
trt_merged.loc[trt_merged['Cluster Labels'] == 1, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,1,Park,Playground,Trail,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


In [32]:
trt_merged.loc[trt_merged['Cluster Labels'] == 2, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,2,Grocery Store,Café,Park,Italian Restaurant,Athletics & Sports,Restaurant,Candy Store,Baby Store,Nightclub,Coffee Shop


In [33]:
trt_merged.loc[trt_merged['Cluster Labels'] == 3, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,3,Airport Lounge,Airport Service,Airport Terminal,Coffee Shop,Plane,Sculpture Garden,Boat or Ferry,Rental Car Location,Bar,Boutique


In [34]:
trt_merged.loc[trt_merged['Cluster Labels'] == 4, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,4,Café,Bookstore,Bar,Italian Restaurant,Japanese Restaurant,Sandwich Place,Bakery,Yoga Studio,Beer Bar,Beer Store


# <font color=pink>Part 2: Analyzing Neighbourhoods in Downtown Vancouver</font>

In [35]:
my_url2=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V').text   #Wiki page for Vancouver Postal Codes

In [36]:
soup2 = BeautifulSoup(my_url2,"html5lib")

In [37]:
tag_object2=soup2.title
print("tag object:",tag_object2)

tag object: <title>List of postal codes of Canada: V - Wikipedia</title>


In [38]:
table_contents2=[]
table2=soup2.find('table')
table2

<table class="wikitable sortable">
<tbody><tr>
<td valign="top" width="11.1%"><b>V1A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Kimberley,_British_Columbia" title="Kimberley, British Columbia">Kimberley</a></span>
</td>
<td valign="top" width="11.1%"><b>V2A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Penticton" title="Penticton">Penticton</a></span>
</td>
<td valign="top" width="11.1%"><b>V3A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Langley,_British_Columbia_(district_municipality)" title="Langley, British Columbia (district municipality)">Langley Township</a><br/>(Langley City)</span>
</td>
<td valign="top" width="11.1%"><b>V4A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Surrey,_British_Columbia" title="Surrey, British Columbia">Surrey</a><br/>Southwest</span>
</td>
<td valign="top" width="11.1%"><b>V5A</b><br/><span style="font-size: smaller; line-height

In [39]:
for row in table2.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.text[:3]
        cell['City'] = (row.span.text).split('(')[0]
        table_contents2.append(cell)
        
print(table_contents2)


[{'PostalCode': 'V1A', 'City': 'Kimberley'}, {'PostalCode': 'V2A', 'City': 'Penticton'}, {'PostalCode': 'V3A', 'City': 'Langley Township'}, {'PostalCode': 'V4A', 'City': 'SurreySouthwest'}, {'PostalCode': 'V5A', 'City': 'Burnaby'}, {'PostalCode': 'V6A', 'City': 'Vancouver'}, {'PostalCode': 'V7A', 'City': 'RichmondSouth'}, {'PostalCode': 'V8A', 'City': 'Powell River'}, {'PostalCode': 'V9A', 'City': 'Victoria'}, {'PostalCode': 'V1B', 'City': 'VernonEast'}, {'PostalCode': 'V2B', 'City': 'KamloopsNorthwest'}, {'PostalCode': 'V3B', 'City': 'Port CoquitlamCentral'}, {'PostalCode': 'V4B', 'City': 'White Rock'}, {'PostalCode': 'V5B', 'City': 'Burnaby'}, {'PostalCode': 'V6B', 'City': 'Vancouver'}, {'PostalCode': 'V7B', 'City': 'Richmond'}, {'PostalCode': 'V8B', 'City': 'Squamish'}, {'PostalCode': 'V9B', 'City': 'Victoria'}, {'PostalCode': 'V1C', 'City': 'Cranbrook'}, {'PostalCode': 'V2C', 'City': 'KamloopsCentral and Southeast'}, {'PostalCode': 'V3C', 'City': 'Port CoquitlamSouth'}, {'PostalCod

In [40]:
df2=pd.DataFrame(table_contents2)
df2

Unnamed: 0,PostalCode,City
0,V1A,Kimberley
1,V2A,Penticton
2,V3A,Langley Township
3,V4A,SurreySouthwest
4,V5A,Burnaby
5,V6A,Vancouver
6,V7A,RichmondSouth
7,V8A,Powell River
8,V9A,Victoria
9,V1B,VernonEast


In [41]:
df_van = df2.loc[df2['City'] == 'Vancouver'] #only choose the postal codes for Vancouver
df_van = df_van.head(20)
df_van

Unnamed: 0,PostalCode,City
5,V6A,Vancouver
14,V6B,Vancouver
23,V6C,Vancouver
32,V6E,Vancouver
41,V6G,Vancouver
49,V6H,Vancouver
56,V6J,Vancouver
64,V5K,Vancouver
65,V6K,Vancouver
73,V5L,Vancouver


## Shape of the table:

In [42]:
df_van.shape

(20, 2)

In [43]:
#pip install geocoder

## 2. Get coordinates based on postal codes using Geopy

In [44]:
import geocoder # import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


ModuleNotFoundError: No module named 'geocoder'

## Dataframe with Coordinates:

In [46]:
lat = []
lon=[]
for i in df_van['PostalCode']:
    address = '{}'.format(i)
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    lat.append('{}'.format(latitude))
    lon.append('{}'.format(longitude))
df_van['Latitude'] = lat
df_van['Longitude'] = lon
df_van

Unnamed: 0,PostalCode,City,Latitude,Longitude
5,V6A,Vancouver,49.27377017128552,-123.09947872733052
14,V6B,Vancouver,49.27811172378708,-123.11959791158984
23,V6C,Vancouver,49.287716588954346,-123.11519411838532
32,V6E,Vancouver,49.28801282544443,-123.12108056609158
41,V6G,Vancouver,49.3002702,-123.13779663860902
49,V6H,Vancouver,49.25680006013691,-123.1331282550357
56,V6J,Vancouver,49.26091372148124,-123.14577875870242
64,V5K,Vancouver,49.28171754656246,-123.0400063294856
65,V6K,Vancouver,49.26895274770836,-123.1650191687676
73,V5L,Vancouver,49.280200918758624,-123.06656328873324


# 3. Clustering and Mapping

In [47]:
Vancouver_dt_data = df_van.reset_index(drop=True)
Vancouver_dt_data

Unnamed: 0,PostalCode,City,Latitude,Longitude
0,V6A,Vancouver,49.27377017128552,-123.09947872733052
1,V6B,Vancouver,49.27811172378708,-123.11959791158984
2,V6C,Vancouver,49.287716588954346,-123.11519411838532
3,V6E,Vancouver,49.28801282544443,-123.12108056609158
4,V6G,Vancouver,49.3002702,-123.13779663860902
5,V6H,Vancouver,49.25680006013691,-123.1331282550357
6,V6J,Vancouver,49.26091372148124,-123.14577875870242
7,V5K,Vancouver,49.28171754656246,-123.0400063294856
8,V6K,Vancouver,49.26895274770836,-123.1650191687676
9,V5L,Vancouver,49.280200918758624,-123.06656328873324


## Visualize the Vancouver Neighbourhood

In [55]:
address = 'Vancouver, BC'

geolocator = Nominatim(user_agent="Vancouver_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Vancouver are {}, {}.'.format(latitude, longitude))

# create map of Manhattan using latitude and longitude values
map_vcv = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Vancouver_dt_data['Latitude'], Vancouver_dt_data['Longitude'], Vancouver_dt_data['PostalCode']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vcv)  
    
map_vcv

The geograpical coordinate of Vancouver are 49.2608724, -123.1139529.


## Explore nearby venues of downtown Vancouver

In [56]:
vcv_venues = getNearbyVenues(names=Vancouver_dt_data['PostalCode'],
                                   latitudes=Vancouver_dt_data['Latitude'],
                                   longitudes=Vancouver_dt_data['Longitude']
                                  )


print(vcv_venues.shape)
vcv_venues.head()

V6A
V6B
V6C
V6E
V6G
V6H
V6J
V5K
V6K
V5L
V6L
V5M
V6M
V5N
V6N
V5P
V6P
V5R
V6R
V5S
(529, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,V6A,49.27377017128552,-123.09947872733052,Creekside Park,49.274641,-123.102701,Park
1,V6A,49.27377017128552,-123.09947872733052,Boxcar,49.276613,-123.100076,Bar
2,V6A,49.27377017128552,-123.09947872733052,Pizzeria Farina,49.276636,-123.099967,Pizza Place
3,V6A,49.27377017128552,-123.09947872733052,Torafuku,49.275951,-123.099814,Asian Restaurant
4,V6A,49.27377017128552,-123.09947872733052,Science World at TELUS World of Science,49.27331,-123.103501,Science Museum


## Analyze Each Neighborhood

In [59]:
vcv_venues.groupby('Neighborhood').count()

print('There are {} uniques categories.'.format(len(vcv_venues['Venue Category'].unique())))


# one hot encoding
vcv_onehot = pd.get_dummies(vcv_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vcv_onehot['Neighborhood'] = vcv_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vcv_onehot.columns[-1]] + list(vcv_onehot.columns[:-1])
vcv_onehot = vcv_onehot[fixed_columns]

vcv_onehot.head(15)

There are 169 uniques categories.


Unnamed: 0,Neighborhood,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bakery,Bank,Bar,Baseball Field,Beach,Beer Garden,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Cruise Ship,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Inn,Insurance Office,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Korean Restaurant,Latin American Restaurant,Leather Goods Store,Lebanese Restaurant,Liquor Store,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Music Store,Music Venue,New American Restaurant,Noodle House,Office,Other Great Outdoors,Outdoor Sculpture,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Post Office,Print Shop,Pub,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Smoke Shop,Smoothie Shop,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,V6A,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,V6A,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
6,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,V6A,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,V6A,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [60]:

vcv_onehot.shape



(529, 170)

In [61]:

vcv_grouped = vcv_onehot.groupby('Neighborhood').mean().reset_index()
vcv_grouped

vcv_grouped.shape



(20, 170)

## Print each neighborhood along with the top 5 most common venues

In [62]:

num_top_venues = 5

for hood in vcv_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = vcv_grouped[vcv_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
    
    


----V5K----
                          venue  freq
0                   Event Space  0.08
1                    Theme Park  0.08
2  Theme Park Ride / Attraction  0.08
3              Sushi Restaurant  0.04
4                   Beer Garden  0.04


----V5L----
                venue  freq
0  Chinese Restaurant  0.10
1              Bakery  0.10
2             Brewery  0.10
3         Coffee Shop  0.07
4                Café  0.07


----V5M----
          venue  freq
0      Bus Stop  0.09
1   Pizza Place  0.09
2   Coffee Shop  0.09
3    Restaurant  0.05
4  Liquor Store  0.05


----V5N----
                  venue  freq
0           Coffee Shop  0.14
1      Sushi Restaurant  0.12
2           Pizza Place  0.07
3  Ethiopian Restaurant  0.05
4         Grocery Store  0.05


----V5P----
             venue  freq
0      Pizza Place  0.14
1      Post Office  0.05
2     Liquor Store  0.05
3     Noodle House  0.05
4  Motorcycle Shop  0.05


----V5R----
               venue  freq
0                Bar  0.17
1     

## Create the new dataframe and display the top 10 venues for each neighborhood

In [67]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted2['Neighborhood'] = vcv_grouped['Neighborhood']

for ind in np.arange(trt_grouped.shape[0]):
    neighborhoods_venues_sorted2.iloc[ind, 1:] = return_most_common_venues(vcv_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted2.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V5K,Event Space,Theme Park Ride / Attraction,Theme Park,Bakery,Bus Line,Market,Soccer Field,Park,Farm,Stadium
1,V5L,Chinese Restaurant,Brewery,Bakery,Coffee Shop,Café,Theater,Convenience Store,Thai Restaurant,Electronics Store,Deli / Bodega
2,V5M,Pizza Place,Coffee Shop,Bus Stop,Office,Mexican Restaurant,Metro Station,Pet Store,Chinese Restaurant,Liquor Store,Sandwich Place
3,V5N,Coffee Shop,Sushi Restaurant,Pizza Place,Sandwich Place,Chinese Restaurant,Burger Joint,Grocery Store,Ethiopian Restaurant,Music Venue,Record Shop
4,V5P,Pizza Place,Convenience Store,Bank,Park,Post Office,Noodle House,Restaurant,Sandwich Place,Motorcycle Shop,Middle Eastern Restaurant


## Run k-means to cluster the neighborhood into 5 clusters

In [90]:
# set number of clusters
kclusters = 5

vcv_grouped_clustering = vcv_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vcv_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
neighborhoods_venues_sorted2.insert(0, 'Cluster Labels', kmeans.labels_)

#vcv_merged = Vancouver_dt_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
#vcv_merged = vcv_merged.join(neighborhoods_venues_sorted2.set_index('Neighborhood'), on='Neighborhood')

#vcv_merged.head() # check the last columns!

neighborhoods_venues_sorted2.head()



Unnamed: 0,Cluster Labels11,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,0,0,V5K,Event Space,Theme Park Ride / Attraction,Theme Park,Bakery,Bus Line,Market,Soccer Field,Park,Farm,Stadium
1,0,0,0,V5L,Chinese Restaurant,Brewery,Bakery,Coffee Shop,Café,Theater,Convenience Store,Thai Restaurant,Electronics Store,Deli / Bodega
2,0,0,0,V5M,Pizza Place,Coffee Shop,Bus Stop,Office,Mexican Restaurant,Metro Station,Pet Store,Chinese Restaurant,Liquor Store,Sandwich Place
3,0,0,0,V5N,Coffee Shop,Sushi Restaurant,Pizza Place,Sandwich Place,Chinese Restaurant,Burger Joint,Grocery Store,Ethiopian Restaurant,Music Venue,Record Shop
4,0,0,0,V5P,Pizza Place,Convenience Store,Bank,Park,Post Office,Noodle House,Restaurant,Sandwich Place,Motorcycle Shop,Middle Eastern Restaurant


In [100]:
#Join two tables
vcv_merged = Vancouver_dt_data.join(neighborhoods_venues_sorted2)
vcv_merged

Unnamed: 0,PostalCode,City,Latitude,Longitude,Cluster Labels11,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,V6A,Vancouver,49.27377017128552,-123.09947872733052,0,0,0,V5K,Event Space,Theme Park Ride / Attraction,Theme Park,Bakery,Bus Line,Market,Soccer Field,Park,Farm,Stadium
1,V6B,Vancouver,49.27811172378708,-123.11959791158984,0,0,0,V5L,Chinese Restaurant,Brewery,Bakery,Coffee Shop,Café,Theater,Convenience Store,Thai Restaurant,Electronics Store,Deli / Bodega
2,V6C,Vancouver,49.287716588954346,-123.11519411838532,0,0,0,V5M,Pizza Place,Coffee Shop,Bus Stop,Office,Mexican Restaurant,Metro Station,Pet Store,Chinese Restaurant,Liquor Store,Sandwich Place
3,V6E,Vancouver,49.28801282544443,-123.12108056609158,0,0,0,V5N,Coffee Shop,Sushi Restaurant,Pizza Place,Sandwich Place,Chinese Restaurant,Burger Joint,Grocery Store,Ethiopian Restaurant,Music Venue,Record Shop
4,V6G,Vancouver,49.3002702,-123.13779663860902,0,0,0,V5P,Pizza Place,Convenience Store,Bank,Park,Post Office,Noodle House,Restaurant,Sandwich Place,Motorcycle Shop,Middle Eastern Restaurant
5,V6H,Vancouver,49.25680006013691,-123.1331282550357,4,4,4,V5R,Bar,Hotel,Bus Stop,Fish & Chips Shop,Asian Restaurant,Park,Zoo Exhibit,Fish Market,Financial or Legal Service,Fast Food Restaurant
6,V6J,Vancouver,49.26091372148124,-123.14577875870242,4,4,4,V5S,Print Shop,Park,Bus Stop,Farmers Market,Burger Joint,Donut Shop,Electronics Store,Ethiopian Restaurant,Dive Bar,Event Space
7,V5K,Vancouver,49.28171754656246,-123.0400063294856,0,0,0,V6A,Dessert Shop,Park,Bar,Korean Restaurant,Pizza Place,Community Center,Coffee Shop,Music Store,Circus,Sandwich Place
8,V6K,Vancouver,49.26895274770836,-123.1650191687676,0,0,0,V6B,Hotel,Japanese Restaurant,Restaurant,Seafood Restaurant,Italian Restaurant,Coffee Shop,Spa,Cosmetics Shop,Concert Hall,Mexican Restaurant
9,V5L,Vancouver,49.280200918758624,-123.06656328873324,0,0,0,V6C,Boat or Ferry,Hotel,Restaurant,Coffee Shop,Café,Hotel Bar,Plaza,Vegetarian / Vegan Restaurant,Cruise Ship,Food Truck


## Visualize the resulting clusters

In [92]:

# create map
map_clusters2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vcv_merged['Latitude'], vcv_merged['Longitude'], vcv_merged['Neighborhood'], vcv_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters2)
       
map_clusters2

## Examine the Five Clusters

In [93]:
vcv_merged.loc[vcv_merged['Cluster Labels'] == 0, vcv_merged.columns[[1] + list(range(5, vcv_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Vancouver,0,0,V5K,Event Space,Theme Park Ride / Attraction,Theme Park,Bakery,Bus Line,Market,Soccer Field,Park,Farm,Stadium
1,Vancouver,0,0,V5L,Chinese Restaurant,Brewery,Bakery,Coffee Shop,Café,Theater,Convenience Store,Thai Restaurant,Electronics Store,Deli / Bodega
2,Vancouver,0,0,V5M,Pizza Place,Coffee Shop,Bus Stop,Office,Mexican Restaurant,Metro Station,Pet Store,Chinese Restaurant,Liquor Store,Sandwich Place
3,Vancouver,0,0,V5N,Coffee Shop,Sushi Restaurant,Pizza Place,Sandwich Place,Chinese Restaurant,Burger Joint,Grocery Store,Ethiopian Restaurant,Music Venue,Record Shop
4,Vancouver,0,0,V5P,Pizza Place,Convenience Store,Bank,Park,Post Office,Noodle House,Restaurant,Sandwich Place,Motorcycle Shop,Middle Eastern Restaurant
7,Vancouver,0,0,V6A,Dessert Shop,Park,Bar,Korean Restaurant,Pizza Place,Community Center,Coffee Shop,Music Store,Circus,Sandwich Place
8,Vancouver,0,0,V6B,Hotel,Japanese Restaurant,Restaurant,Seafood Restaurant,Italian Restaurant,Coffee Shop,Spa,Cosmetics Shop,Concert Hall,Mexican Restaurant
9,Vancouver,0,0,V6C,Boat or Ferry,Hotel,Restaurant,Coffee Shop,Café,Hotel Bar,Plaza,Vegetarian / Vegan Restaurant,Cruise Ship,Food Truck
10,Vancouver,0,0,V6E,Hotel,American Restaurant,Restaurant,Park,Steakhouse,Miscellaneous Shop,Dessert Shop,Ice Cream Shop,Seafood Restaurant,Sandwich Place
11,Vancouver,0,0,V6G,Aquarium,Trail,Theme Park Ride / Attraction,Zoo Exhibit,Outdoor Sculpture,Playground,Café,Exhibit,Music Venue,Event Space


In [95]:
vcv_merged.loc[vcv_merged['Cluster Labels'] == 1, vcv_merged.columns[[1] + list(range(5, vcv_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Vancouver,1,1,V6L,Italian Restaurant,Caribbean Restaurant,Baseball Field,Zoo Exhibit,Farm,Food & Drink Shop,Fish Market,Fish & Chips Shop,Financial or Legal Service,Fast Food Restaurant


In [97]:
vcv_merged.loc[vcv_merged['Cluster Labels'] == 2, vcv_merged.columns[[1] + list(range(5, vcv_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Vancouver,2,2,V6N,,,,,,,,,,


In [98]:
vcv_merged.loc[vcv_merged['Cluster Labels'] == 3, vcv_merged.columns[[1] + list(range(5, vcv_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Vancouver,3,3,V6R,,,,,,,,,,


In [99]:
vcv_merged.loc[vcv_merged['Cluster Labels'] == 4, vcv_merged.columns[[1] + list(range(5, vcv_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels1,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Vancouver,4,4,V5R,Bar,Hotel,Bus Stop,Fish & Chips Shop,Asian Restaurant,Park,Zoo Exhibit,Fish Market,Financial or Legal Service,Fast Food Restaurant
6,Vancouver,4,4,V5S,Print Shop,Park,Bus Stop,Farmers Market,Burger Joint,Donut Shop,Electronics Store,Ethiopian Restaurant,Dive Bar,Event Space


In [115]:
print(vcv_merged['1st Most Common Venue'].value_counts(normalize=True))
print(trt_merged['1st Most Common Venue'].value_counts(normalize=True))

Coffee Shop           0.176471
Hotel                 0.117647
Pizza Place           0.117647
Italian Restaurant    0.058824
Chinese Restaurant    0.058824
Aquarium              0.058824
Print Shop            0.058824
Dessert Shop          0.058824
Event Space           0.058824
Boat or Ferry         0.058824
Bar                   0.058824
Bus Stop              0.058824
Breakfast Spot        0.058824
Name: 1st Most Common Venue, dtype: float64
Coffee Shop       0.647059
Café              0.117647
Airport Lounge    0.058824
Grocery Store     0.058824
Park              0.058824
Clothing Store    0.058824
Name: 1st Most Common Venue, dtype: float64


## Analysis Conclusion: 
- Similarity - When comparing <font color=lightblue>downtown Toronto</font> with <font color=pink>downtown Vancouver</font>, the No.1 most common venue is <font color=brown>Coffee Shop</font>  for both districts (65% vs. 18%). 

- Dissimilarity - In <font color=pink>Downtown Vancouver</font>, Hotel (12%) and Pizza Place (12%) are among the most common venues other than <font color=brown>Coffee Shop</font>. In <font color=lightblue>downtown Toronto</font>, Cafe (12%) is among the most common venues other than <font color=brown>Coffee Shop</font>.