# GUSTAVO SOUSA
## CAPSTONE PROJECT

The moment of choosing and defining the place to begin a new business activity is critical. An infinity of variables – economic, cultural, social – must be considered and carefully assessed.

I decided to apply this study case considering two extremely popular kinds of venues: coffee shops and transportation stations. I exercised the imagination of a situation where, beyond the benefit of a huge flow of people nearby a station, our fictional entrepreneur is intending to use these venues on a two-month advertising campaign with flyers and promotional pamphlets.

The coffee shop franchising intends to open two new venues: one in New York and the other in Toronto. The best neighborhoods in each city to satisfy the business needs and objectives will be analyzed and assessed.
This research is also appliable in similar problems, considering crowded places and venues that can make use of this to gather clients.


In [1]:
import urllib.request
import pandas as pd
import numpy as np
import json
import requests
from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes

import folium # map rendering library
from bs4 import BeautifulSoup

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


#### Let's begin importing the New York city neighbornhoods data

In [2]:
ny_url = 'https://cocl.us/new_york_dataset'
ny_filename = 'newyork_data.json'
urllib.request.urlretrieve(ny_url, ny_filename)
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [5]:
ny_neigh_data = newyork_data['features']

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ny_neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
for data in ny_neigh_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_neighborhoods = ny_neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [8]:
ny_neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


In [10]:
print('The NY dataframe has {} boroughs and {} neighborhoods.'.format(
        len(ny_neighborhoods['Borough'].unique()),
        ny_neighborhoods.shape[0]
    )
)

The NY dataframe has 5 boroughs and 306 neighborhoods.


#### Now we will gather the data from Toronto. In this case, we will need two phases. First, gather the neighborhoods' list with postal codes, and then cross this data with a dataset with postal codes and coordinates.

In [11]:
tor_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [12]:
webpage = urllib.request.urlopen(tor_url)

In [13]:
bsoup = BeautifulSoup(webpage, "lxml")
table = bsoup.find('table', class_='wikitable')
table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park / Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor / Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park / Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern / Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3B
</td>
<td>North York
</td>
<td>Don Mills
</td></tr>
<tr>
<td>M4B
</td>
<td>Ea

In [14]:
# list A for postal codes
A=[]
# list B for boroughs
B=[]
# list C for neighborhoods
C=[]

for row in table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))

In [30]:
df=pd.DataFrame(A,columns=['Postal Code'])
df['Borough']=B
df['Neighborhood']=C
df = df.sort_values(by='Postal Code')
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
9,M1B,Scarborough,Malvern / Rouge
18,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
27,M1E,Scarborough,Guildwood / Morningside / West Hill
36,M1G,Scarborough,Woburn
45,M1H,Scarborough,Cedarbrae
54,M1J,Scarborough,Scarborough Village
63,M1K,Scarborough,Kennedy Park / Ionview / East Birchmount Park
72,M1L,Scarborough,Golden Mile / Clairlea / Oakridge
81,M1M,Scarborough,Cliffside / Cliffcrest / Scarborough Village W...


In [31]:
filtered_df = df[~df['Borough'].str.match('Not assigned')]
filtered_df = filtered_df.reset_index(drop=True)
filtered_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,Kennedy Park / Ionview / East Birchmount Park
7,M1L,Scarborough,Golden Mile / Clairlea / Oakridge
8,M1M,Scarborough,Cliffside / Cliffcrest / Scarborough Village W...
9,M1N,Scarborough,Birch Cliff / Cliffside West


In [32]:
# Let's get coordinates from Toronto's postal codes, with a previously gathered CSV file.
# This will avoid an excessive repetitive number of accesses to the geospatioal API. 

urllib.request.urlretrieve('http://cocl.us/Geospatial_data', 'toronto_postal_coord.csv')
print('Data downloaded!')

coord_df = pd.read_csv("toronto_postal_coord.csv")
coord_df

Data downloaded!


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [33]:
coord_df.drop('Postal Code', axis=1, inplace=True)
coord_df.shape

(103, 2)

In [34]:
tor_neighborhoods = pd.concat([filtered_df, coord_df],axis=1)
tor_neighborhoods

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,Kennedy Park / Ionview / East Birchmount Park,43.727929,-79.262029
7,M1L,Scarborough,Golden Mile / Clairlea / Oakridge,43.711112,-79.284577
8,M1M,Scarborough,Cliffside / Cliffcrest / Scarborough Village W...,43.716316,-79.239476
9,M1N,Scarborough,Birch Cliff / Cliffside West,43.692657,-79.264848


In [35]:
print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(
        len(tor_neighborhoods['Borough'].unique()),
        tor_neighborhoods.shape[0]
    )
)

The Toronto dataframe has 10 boroughs and 103 neighborhoods.


#### The next step is to find out the main coordinates of each city

In [36]:
address = 'New York City, NY'
agent = 'ny_explorer'
geolocator = Nominatim(user_agent=agent)
location = geolocator.geocode(address)
ny_latitude = location.latitude
ny_longitude = location.longitude
print('The geograpical coordinates of New York City are {}, {}.'.format(ny_latitude, ny_longitude))

The geograpical coordinates of New York City are 40.7127281, -74.0060152.


In [37]:
address = 'Toronto, Ontario'
agent = 'toronto_explorer'
geolocator = Nominatim(user_agent=agent)
location = geolocator.geocode(address)
tor_latitude = location.latitude
tor_longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(tor_latitude, tor_longitude))

The geograpical coordinates of Toronto are 43.6534817, -79.3839347.


#### Now let's see the initial neighborhoods map for each city
#### Let's start with New York

In [147]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(ny_neighborhoods['Latitude'], ny_neighborhoods['Longitude'], ny_neighborhoods['Borough'], ny_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

#### Now let's plot the Toronto map

In [148]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(tor_neighborhoods['Latitude'], tor_neighborhoods['Longitude'], tor_neighborhoods['Borough'], tor_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#2ee74d',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Now let's prepare the credentials to access Foursquase data, and define the function that will let us gather the venues data based on the coordinates of each city.
#### These venues will help us to make assumptions based on their distributions

In [48]:
#@hidden_cell
CLIENT_ID = 'M31K3N4AGGRPU0J2DVHR21UROCBTU1MKW35AA4L4UAGNXCZN' # your Foursquare ID
CLIENT_SECRET = 'G4SEHH4M0RTAEDUGC5WFCIMKJR3F5EKHGV22P3XYWXO22GX4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [49]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now we will generate the venues data frames. Beginning with New York, and then Toronto.

In [56]:
LIMIT=50
ny_venues = getNearbyVenues(names=ny_neighborhoods['Neighborhood'],
                                   latitudes=ny_neighborhoods['Latitude'],
                                   longitudes=ny_neighborhoods['Longitude']
                                  )
print(ny_venues.shape)

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [57]:
ny_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
2,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.8447,Pharmacy
3,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
5,Wakefield,40.894705,-73.847201,Shell,40.894187,-73.845862,Gas Station
6,Wakefield,40.894705,-73.847201,Subway,40.890468,-73.849152,Sandwich Place
7,Wakefield,40.894705,-73.847201,Louis Pizza,40.898399,-73.84881,Pizza Place
8,Wakefield,40.894705,-73.847201,Koss Quick Wash,40.891281,-73.849904,Laundromat
9,Co-op City,40.874294,-73.829939,Capri II Pizza,40.876374,-73.82994,Pizza Place


In [58]:
ny_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,32,32,32,32,32,32
Annadale,14,14,14,14,14,14
Arden Heights,4,4,4,4,4,4
Arlington,7,7,7,7,7,7
Arrochar,20,20,20,20,20,20
Arverne,17,17,17,17,17,17
Astoria,50,50,50,50,50,50
Astoria Heights,14,14,14,14,14,14
Auburndale,18,18,18,18,18,18
Bath Beach,49,49,49,49,49,49


In [53]:
toronto_venues = getNearbyVenues(names=tor_neighborhoods['Neighborhood'],
                                   latitudes=tor_neighborhoods['Latitude'],
                                   longitudes=tor_neighborhoods['Longitude']
                                  )
print(toronto_venues.shape)

Malvern / Rouge

Rouge Hill / Port Union / Highland Creek

Guildwood / Morningside / West Hill

Woburn

Cedarbrae

Scarborough Village

Kennedy Park / Ionview / East Birchmount Park

Golden Mile / Clairlea / Oakridge

Cliffside / Cliffcrest / Scarborough Village West

Birch Cliff / Cliffside West

Dorset Park / Wexford Heights / Scarborough Town Centre

Wexford / Maryvale

Agincourt

Clarks Corners / Tam O'Shanter / Sullivan

Milliken / Agincourt North / Steeles East / L'Amoreaux East

Steeles West / L'Amoreaux West

Upper Rouge

Hillcrest Village

Fairview / Henry Farm / Oriole

Bayview Village

York Mills / Silver Hills

Willowdale / Newtonbrook

Willowdale

York Mills West

Willowdale

Parkwoods

Don Mills

Don Mills

Bathurst Manor / Wilson Heights / Downsview North

Northwood Park / York University

Downsview

Downsview

Downsview

Downsview

Victoria Village

Parkview Hill / Woodbine Gardens

Woodbine Heights

The Beaches

Leaside

Thorncliffe Park

East Toronto

The Danforth Wes

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Malvern / Rouge,43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,Malvern / Rouge,43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
4,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum
5,Guildwood / Morningside / West Hill,43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
6,Guildwood / Morningside / West Hill,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
7,Guildwood / Morningside / West Hill,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
8,Guildwood / Morningside / West Hill,43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
9,Guildwood / Morningside / West Hill,43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center


In [54]:
toronto_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Malvern / Rouge,43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,Malvern / Rouge,43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
4,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum
5,Guildwood / Morningside / West Hill,43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
6,Guildwood / Morningside / West Hill,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
7,Guildwood / Morningside / West Hill,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
8,Guildwood / Morningside / West Hill,43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
9,Guildwood / Morningside / West Hill,43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center


In [143]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
Alderwood / Long Branch,10,10,10,10,10,10
Bathurst Manor / Wilson Heights / Downsview North,19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
Bedford Park / Lawrence Manor East,24,24,24,24,24,24
Berczy Park,50,50,50,50,50,50
Birch Cliff / Cliffside West,4,4,4,4,4,4
Brockton / Parkdale Village / Exhibition Place,23,23,23,23,23,23
Business reply mail Processing CentrE,18,18,18,18,18,18
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,17,17,17,17,17,17


#### Now let's generate a numeric '0,1' table to get rid off categorical variables. This encoding/transformation process is also commonly known as 'one hot'.
#### Let's proceed with each city venues dataset transformation, beginning with New York.

In [63]:
# NY one hot encoding
ny_onehot = pd.get_dummies(ny_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ny_onehot['Neighborhood'] = ny_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ny_onehot.columns[-1]] + list(ny_onehot.columns[:-1])
ny_onehot = ny_onehot[fixed_columns]

ny_onehot.head(10)

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [64]:
# Toronto one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(10)

Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Now let's group the top venues of each neighborhood, for both cities

In [68]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [69]:
ny_grouped = ny_onehot.groupby('Neighborhood').mean().reset_index()
ny_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allerton,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.031250,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
1,Annadale,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.071429,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
2,Arden Heights,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
3,Arlington,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.142857,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
4,Arrochar,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
5,Arverne,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.058824,0.000000,0.000000
6,Astoria,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.020000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
7,Astoria Heights,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
8,Auburndale,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.055556,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.000000
9,Bath Beach,0.000000,0.00,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.0,0.0,0.0,0.00,0.000000,0.000000,0.000000,0.000000,0.020408


In [84]:
num_top_venues = 10

for hood in ny_grouped['Neighborhood']:
    print("---- "+hood+" ----")
    temp = ny_grouped[ny_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Allerton ----
                 venue  freq
0          Pizza Place  0.16
1        Deli / Bodega  0.06
2   Chinese Restaurant  0.06
3          Supermarket  0.06
4     Department Store  0.06
5               Bakery  0.06
6          Gas Station  0.03
7  Fried Chicken Joint  0.03
8   Spanish Restaurant  0.03
9                  Spa  0.03


---- Annadale ----
                 venue  freq
0          Pizza Place  0.21
1               Bakery  0.14
2           Restaurant  0.07
3         Dance Studio  0.07
4                 Park  0.07
5                Diner  0.07
6  American Restaurant  0.07
7        Train Station  0.07
8     Sushi Restaurant  0.07
9           Bagel Shop  0.07


---- Arden Heights ----
                   venue  freq
0            Coffee Shop  0.25
1            Pizza Place  0.25
2               Bus Stop  0.25
3               Pharmacy  0.25
4            Yoga Studio  0.00
5      Paella Restaurant  0.00
6            Pet Service  0.00
7               Pet Café  0.00
8    Peruvian Res

                  venue  freq
0       Harbor / Marina  0.33
1        Baseball Field  0.17
2            Playground  0.17
3            Donut Shop  0.17
4    Athletics & Sports  0.17
5  Pakistani Restaurant  0.00
6             Pet Store  0.00
7           Pet Service  0.00
8              Pet Café  0.00
9   Peruvian Restaurant  0.00


---- Blissville ----
                 venue  freq
0          Bus Station  0.12
1           Donut Shop  0.12
2          Art Gallery  0.06
3  Sporting Goods Shop  0.06
4        Movie Theater  0.06
5       Clothing Store  0.06
6         Skating Rink  0.06
7                  Bar  0.06
8       Mattress Store  0.06
9                Hotel  0.06


---- Bloomfield ----
                   venue  freq
0      Recreation Center   0.2
1             Theme Park   0.2
2         Discount Store   0.2
3               Bus Stop   0.2
4                   Park   0.2
5            Yoga Studio   0.0
6           Outlet Store   0.0
7               Pet Café   0.0
8    Peruvian Restaurant  

                 venue  freq
0        Big Box Store  0.06
1       Cosmetics Shop  0.06
2  Japanese Restaurant  0.03
3               Bakery  0.03
4      Supplement Shop  0.03
5         Burger Joint  0.03
6  Sporting Goods Shop  0.03
7       Clothing Store  0.03
8          Coffee Shop  0.03
9           Shoe Store  0.03


---- Chelsea ----
                 venue  freq
0          Coffee Shop  0.07
1   Italian Restaurant  0.05
2  American Restaurant  0.05
3       Ice Cream Shop  0.05
4   Seafood Restaurant  0.05
5              Theater  0.04
6               Market  0.04
7               Bakery  0.04
8         Cycle Studio  0.04
9         Cupcake Shop  0.04


---- Chinatown ----
                 venue  freq
0                  Spa  0.06
1   Chinese Restaurant  0.06
2  American Restaurant  0.06
3     Greek Restaurant  0.04
4       Ice Cream Shop  0.04
5     Asian Restaurant  0.04
6   Salon / Barbershop  0.04
7     Malay Restaurant  0.04
8       Sandwich Place  0.04
9             Boutique  0.04



                venue  freq
0         Pizza Place  0.13
1          Bagel Shop  0.09
2  Italian Restaurant  0.09
3            Pharmacy  0.09
4    Basketball Court  0.04
5      Sandwich Place  0.04
6         Flower Shop  0.04
7                 Bar  0.04
8                Bank  0.04
9       Tattoo Parlor  0.04


---- Douglaston ----
                venue  freq
0       Deli / Bodega  0.10
1                Bank  0.10
2      Shipping Store  0.05
3    Sushi Restaurant  0.05
4            Pharmacy  0.05
5         Pizza Place  0.05
6  Chinese Restaurant  0.05
7              Bakery  0.05
8                 Spa  0.05
9   Convenience Store  0.05


---- Downtown ----
                   venue  freq
0         Sandwich Place  0.06
1            Coffee Shop  0.06
2          Grocery Store  0.04
3  Performing Arts Venue  0.04
4                    Bar  0.04
5           Burger Joint  0.04
6     Chinese Restaurant  0.04
7               Creperie  0.02
8          Deli / Bodega  0.02
9       Department Store  0.02

                   venue  freq
0            Bus Station   0.5
1                  Plaza   0.5
2            Yoga Studio   0.0
3           Outlet Store   0.0
4            Pet Service   0.0
5               Pet Café   0.0
6    Peruvian Restaurant   0.0
7  Performing Arts Venue   0.0
8       Pedestrian Plaza   0.0
9                   Park   0.0


---- Financial District ----
                  venue  freq
0           Coffee Shop  0.10
1                 Hotel  0.06
2  Gym / Fitness Center  0.06
3           Salad Place  0.04
4                   Bar  0.04
5   American Restaurant  0.04
6   Monument / Landmark  0.04
7    Falafel Restaurant  0.04
8   Japanese Restaurant  0.02
9        Ice Cream Shop  0.02


---- Flatbush ----
                  venue  freq
0         Deli / Bodega  0.09
1           Coffee Shop  0.09
2    Chinese Restaurant  0.09
3  Caribbean Restaurant  0.09
4    Mexican Restaurant  0.09
5                  Bank  0.09
6             Juice Bar  0.04
7                Lounge  0.04
8      

                 venue  freq
0                  Bar  0.14
1          Pizza Place  0.14
2   Italian Restaurant  0.09
3  Japanese Restaurant  0.05
4   Spanish Restaurant  0.05
5   Mexican Restaurant  0.05
6   Chinese Restaurant  0.05
7        Train Station  0.05
8        Grocery Store  0.05
9   Falafel Restaurant  0.05


---- Greenpoint ----
               venue  freq
0       Cocktail Bar  0.08
1        Coffee Shop  0.08
2        Yoga Studio  0.06
3               Café  0.04
4   Sushi Restaurant  0.04
5                Spa  0.04
6             Bakery  0.04
7  French Restaurant  0.04
8        Record Shop  0.04
9        Pizza Place  0.04


---- Greenridge ----
                        venue  freq
0                       Diner  0.17
1                  Bagel Shop  0.17
2                 Pizza Place  0.17
3  Construction & Landscaping  0.17
4              Shipping Store  0.17
5                      Lawyer  0.17
6                 Yoga Studio  0.00
7        Pakistani Restaurant  0.00
8             

                             venue  freq
0                         Pharmacy  0.11
1                Indian Restaurant  0.07
2                       Donut Shop  0.07
3             Fast Food Restaurant  0.07
4              Fried Chicken Joint  0.07
5                    Moving Target  0.07
6  Southern / Soul Food Restaurant  0.04
7                        Pet Store  0.04
8                      Coffee Shop  0.04
9                    Shopping Mall  0.04


---- Kensington ----
                venue  freq
0       Grocery Store  0.08
1     Thai Restaurant  0.08
2      Ice Cream Shop  0.05
3         Pizza Place  0.05
4                 Gym  0.03
5              Bakery  0.03
6  Mexican Restaurant  0.03
7            Pharmacy  0.03
8   Mobile Phone Shop  0.03
9         Coffee Shop  0.03


---- Kew Gardens ----
                venue  freq
0  Chinese Restaurant  0.07
1         Pizza Place  0.04
2      Cosmetics Shop  0.04
3           Pet Store  0.04
4                Bank  0.04
5                 Bar  0.0

                   venue  freq
0          Deli / Bodega  0.33
1     Italian Restaurant  0.33
2               Bus Stop  0.17
3      Other Repair Shop  0.17
4            Yoga Studio  0.00
5           Outlet Store  0.00
6            Pet Service  0.00
7               Pet Café  0.00
8    Peruvian Restaurant  0.00
9  Performing Arts Venue  0.00


---- Maspeth ----
                venue  freq
0         Pizza Place  0.09
1               Diner  0.09
2                Bank  0.06
3   Mobile Phone Shop  0.06
4       Deli / Bodega  0.06
5       Grocery Store  0.06
6  Chinese Restaurant  0.06
7          Sports Bar  0.03
8          Taco Place  0.03
9        Gourmet Shop  0.03


---- Melrose ----
                           venue  freq
0                    Pizza Place  0.15
1                       Pharmacy  0.11
2                 Sandwich Place  0.07
3                 Discount Store  0.07
4                  Grocery Store  0.07
5               Department Store  0.04
6                    Supermarket  0.04

                venue  freq
0         Pizza Place  0.12
1  Italian Restaurant  0.08
2                Bank  0.08
3   Electronics Store  0.04
4      Sandwich Place  0.04
5                Pool  0.04
6          Donut Shop  0.04
7         Coffee Shop  0.04
8         Social Club  0.04
9          Bagel Shop  0.04


---- North Side ----
                           venue  freq
0                    Coffee Shop  0.08
1                    Yoga Studio  0.06
2            American Restaurant  0.06
3                    Pizza Place  0.06
4                         Bakery  0.04
5                  Jewelry Store  0.04
6                      Juice Bar  0.04
7                       Wine Bar  0.04
8  Vegetarian / Vegan Restaurant  0.04
9             Seafood Restaurant  0.02


---- Norwood ----
                venue  freq
0         Pizza Place  0.16
1                Park  0.12
2         Bus Station  0.09
3                Bank  0.09
4       Deli / Bodega  0.06
5  Chinese Restaurant  0.06
6            Pharmacy  0

                venue  freq
0  Chinese Restaurant  0.14
1                Bank  0.09
2              Bakery  0.09
3         Pizza Place  0.05
4    Asian Restaurant  0.05
5                Café  0.05
6  Frozen Yogurt Shop  0.05
7                Park  0.05
8          Hobby Shop  0.05
9          Playground  0.05


---- Queensbridge ----
                  venue  freq
0                 Hotel  0.22
1        Sandwich Place  0.11
2    Athletics & Sports  0.06
3            Hotel Pool  0.06
4           Beer Garden  0.06
5        Scenic Lookout  0.06
6              Platform  0.06
7    Spanish Restaurant  0.06
8  Gym / Fitness Center  0.06
9                  Park  0.06


---- Randall Manor ----
                  venue  freq
0              Bus Stop   0.4
1         Deli / Bodega   0.2
2            Bagel Shop   0.2
3           Pizza Place   0.2
4           Yoga Studio   0.0
5  Pakistani Restaurant   0.0
6             Pet Store   0.0
7           Pet Service   0.0
8              Pet Café   0.0
9   Peruvia

                      venue  freq
0        Italian Restaurant  0.12
1  Mediterranean Restaurant  0.06
2         French Restaurant  0.04
3            Clothing Store  0.04
4               Coffee Shop  0.04
5            Pilates Studio  0.02
6               Supermarket  0.02
7        Falafel Restaurant  0.02
8               Salad Place  0.02
9        Salon / Barbershop  0.02


---- Somerville ----
                           venue  freq
0                           Park   1.0
1                    Yoga Studio   0.0
2                    Outlet Mall   0.0
3                    Pet Service   0.0
4                       Pet Café   0.0
5            Peruvian Restaurant   0.0
6          Performing Arts Venue   0.0
7               Pedestrian Plaza   0.0
8  Paper / Office Supplies Store   0.0
9           Pakistani Restaurant   0.0


---- Soundview ----
                 venue  freq
0   Chinese Restaurant  0.20
1        Grocery Store  0.13
2         Liquor Store  0.07
3       Breakfast Spot  0.07
4      

                 venue  freq
0                 Park  0.06
1             Wine Bar  0.06
2                 Café  0.06
3                  Spa  0.04
4          Men's Store  0.04
5           Playground  0.04
6     Greek Restaurant  0.04
7       Scenic Lookout  0.04
8                Hotel  0.04
9  American Restaurant  0.04


---- Tudor City ----
                   venue  freq
0                   Park  0.06
1                   Café  0.06
2        Thai Restaurant  0.04
3       Sushi Restaurant  0.04
4          Deli / Bodega  0.04
5     Mexican Restaurant  0.04
6  Vietnamese Restaurant  0.04
7      Convenience Store  0.02
8           Burger Joint  0.02
9      French Restaurant  0.02


---- Turtle Bay ----
                 venue  freq
0    French Restaurant  0.06
1          Coffee Shop  0.06
2             Wine Bar  0.06
3                 Park  0.06
4     Greek Restaurant  0.04
5          Karaoke Bar  0.04
6                  Spa  0.04
7  American Restaurant  0.04
8           Boxing Gym  0.02
9   

             venue  freq
0    Deli / Bodega  0.17
1         Pharmacy  0.08
2             Bank  0.08
3      Supermarket  0.04
4       Nail Salon  0.04
5  Thai Restaurant  0.04
6             Park  0.04
7       Hookah Bar  0.04
8       Bagel Shop  0.04
9      Pizza Place  0.04


---- Woodlawn ----
                 venue  freq
0          Pizza Place  0.16
1        Deli / Bodega  0.08
2                  Bar  0.08
3           Playground  0.08
4  Rental Car Location  0.04
5    Convenience Store  0.04
6                 Park  0.04
7                Trail  0.04
8        Train Station  0.04
9    Food & Drink Shop  0.04


---- Woodrow ----
                venue  freq
0            Pharmacy  0.11
1           Juice Bar  0.06
2    Sushi Restaurant  0.06
3  Mexican Restaurant  0.06
4         Coffee Shop  0.06
5  Chinese Restaurant  0.06
6                Bank  0.06
7              Bakery  0.06
8         Pizza Place  0.06
9       Grocery Store  0.06


---- Woodside ----
                       venue  freq
0

In [97]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
ny_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
ny_neighborhoods_venues_sorted['Neighborhood'] = ny_grouped['Neighborhood']

for ind in np.arange(ny_grouped.shape[0]):
    ny_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

ny_neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Deli / Bodega,Chinese Restaurant,Department Store,Supermarket,Bakery,Gas Station,Breakfast Spot,Spanish Restaurant,Spa
1,Annadale,Pizza Place,Bakery,Park,Diner,Sushi Restaurant,Bagel Shop,Train Station,Restaurant,American Restaurant,Sports Bar
2,Arden Heights,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Women's Store,Field,Event Space,Exhibit,Eye Doctor,Factory
3,Arlington,Bus Stop,Deli / Bodega,Intersection,American Restaurant,Coffee Shop,Grocery Store,Women's Store,Filipino Restaurant,Exhibit,Eye Doctor
4,Arrochar,Deli / Bodega,Italian Restaurant,Bus Stop,Pizza Place,Nail Salon,Sporting Goods Shop,Supermarket,Middle Eastern Restaurant,Outdoors & Recreation,Mediterranean Restaurant
5,Arverne,Surf Spot,Metro Station,Sandwich Place,Bus Stop,Donut Shop,Coffee Shop,Pizza Place,Playground,Board Shop,Beach
6,Astoria,Bar,Seafood Restaurant,Greek Restaurant,Pub,Mediterranean Restaurant,Dessert Shop,Ice Cream Shop,Gym,Gourmet Shop,Bakery
7,Astoria Heights,Italian Restaurant,Burger Joint,Bowling Alley,Bakery,Liquor Store,Hostel,Playground,Plaza,Pizza Place,Shopping Mall
8,Auburndale,Athletics & Sports,Toy / Game Store,Train,Miscellaneous Shop,Mobile Phone Shop,Pharmacy,Pet Store,Hookah Bar,Korean Restaurant,Supermarket
9,Bath Beach,Pharmacy,Chinese Restaurant,Bubble Tea Shop,Sushi Restaurant,Fast Food Restaurant,Asian Restaurant,Italian Restaurant,Gas Station,Ice Cream Shop,Rental Car Location


In [95]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
1,Alderwood / Long Branch,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
2,Bathurst Manor / Wilson Heights / Downsview No...,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.052632,0.000000,0.00,0.00000,0.000000,0.0,0.00
3,Bayview Village,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
4,Bedford Park / Lawrence Manor East,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.041667,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
5,Berczy Park,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.020000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
6,Birch Cliff / Cliffside West,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
7,Brockton / Parkdale Village / Exhibition Place,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
8,Business reply mail Processing CentrE,0.055556,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00
9,CN Tower / King and Spadina / Railway Lands / ...,0.000000,0.000,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.00000,0.000000,0.0,0.00


In [96]:
for hood in toronto_grouped['Neighborhood']:
    print("---- "+hood+" ----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Agincourt
 ----
                             venue  freq
0                   Clothing Store   0.2
1                     Skating Rink   0.2
2                           Lounge   0.2
3                   Breakfast Spot   0.2
4        Latin American Restaurant   0.2
5               Miscellaneous Shop   0.0
6              Monument / Landmark   0.0
7  Molecular Gastronomy Restaurant   0.0
8       Modern European Restaurant   0.0
9                Mobile Phone Shop   0.0


---- Alderwood / Long Branch
 ----
                venue  freq
0         Pizza Place   0.2
1                 Gym   0.1
2  Athletics & Sports   0.1
3         Coffee Shop   0.1
4                 Pub   0.1
5        Dance Studio   0.1
6      Sandwich Place   0.1
7        Skating Rink   0.1
8            Pharmacy   0.1
9        Home Service   0.0


---- Bathurst Manor / Wilson Heights / Downsview North
 ----
                 venue  freq
0                 Bank  0.11
1          Coffee Shop  0.11
2        Shopping Mall  0.05
3   

                  venue  freq
0                   Gym  0.07
1   Japanese Restaurant  0.07
2            Beer Store  0.07
3           Coffee Shop  0.07
4            Restaurant  0.07
5      Asian Restaurant  0.07
6  Gym / Fitness Center  0.04
7    Athletics & Sports  0.04
8    Chinese Restaurant  0.04
9  Caribbean Restaurant  0.04


---- Dorset Park / Wexford Heights / Scarborough Town Centre
 ----
                             venue  freq
0                Indian Restaurant  0.29
1                        Pet Store  0.14
2            Vietnamese Restaurant  0.14
3                          Brewery  0.14
4               Chinese Restaurant  0.14
5           Thrift / Vintage Store  0.14
6                Mobile Phone Shop  0.00
7                            Motel  0.00
8              Monument / Landmark  0.00
9  Molecular Gastronomy Restaurant  0.00


---- Downsview
 ----
                venue  freq
0                Park  0.13
1       Grocery Store  0.13
2      Discount Store  0.07
3      Baseball

                             venue  freq
0                   Discount Store  0.33
1                 Department Store  0.17
2                    Train Station  0.17
3                      Coffee Shop  0.17
4                       Hobby Shop  0.17
5                            Motel  0.00
6              Monument / Landmark  0.00
7  Molecular Gastronomy Restaurant  0.00
8       Modern European Restaurant  0.00
9                Mobile Phone Shop  0.00


---- Kensington Market / Chinatown / Grange Park
 ----
                           venue  freq
0                           Café  0.10
1                    Coffee Shop  0.08
2             Mexican Restaurant  0.06
3                         Bakery  0.06
4          Vietnamese Restaurant  0.04
5                    Gaming Cafe  0.04
6  Vegetarian / Vegan Restaurant  0.04
7                   Dessert Shop  0.04
8                  Poutine Place  0.02
9                           Park  0.02


---- Kingsview Village / St. Phillips / Martin Grove Gardens 

                  venue  freq
0           Pizza Place  0.18
1             Gastropub  0.09
2        Breakfast Spot  0.09
3                  Bank  0.09
4          Intersection  0.09
5    Athletics & Sports  0.09
6  Fast Food Restaurant  0.09
7              Pharmacy  0.09
8  Gym / Fitness Center  0.09
9             Pet Store  0.09


---- Parkwoods
 ----
                      venue  freq
0                      Park   0.5
1         Food & Drink Shop   0.5
2               Yoga Studio   0.0
3                     Motel   0.0
4            Massage Studio   0.0
5            Medical Center   0.0
6  Mediterranean Restaurant   0.0
7               Men's Store   0.0
8             Metro Station   0.0
9        Mexican Restaurant   0.0


---- Queen's Park / Ontario Provincial Government
 ----
                 venue  freq
0          Coffee Shop  0.26
1     Sushi Restaurant  0.08
2                Diner  0.05
3          Yoga Studio  0.03
4  Distribution Center  0.03
5         Burger Joint  0.03
6        Bur

                             venue  freq
0                            River  0.25
1                       Smoke Shop  0.25
2                             Park  0.25
3                             Pool  0.25
4        Middle Eastern Restaurant  0.00
5              Monument / Landmark  0.00
6  Molecular Gastronomy Restaurant  0.00
7       Modern European Restaurant  0.00
8                Mobile Phone Shop  0.00
9               Miscellaneous Shop  0.00


---- Thorncliffe Park
 ----
                  venue  freq
0     Indian Restaurant  0.10
1           Yoga Studio  0.05
2                  Bank  0.05
3           Coffee Shop  0.05
4        Discount Store  0.05
5  Fast Food Restaurant  0.05
6           Gas Station  0.05
7         Grocery Store  0.05
8   Housing Development  0.05
9          Liquor Store  0.05


---- Toronto Dominion Centre / Design Exchange
 ----
                 venue  freq
0                 Café  0.14
1          Coffee Shop  0.12
2   Seafood Restaurant  0.06
3        Deli / Bo

In [101]:
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
tor_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
tor_neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    tor_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

tor_neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Pizza Place,Deli / Bodega,Chinese Restaurant,Department Store,Supermarket,Bakery,Gas Station,Breakfast Spot,Spanish Restaurant,Spa
1,Alderwood / Long Branch,Pizza Place,Bakery,Park,Diner,Sushi Restaurant,Bagel Shop,Train Station,Restaurant,American Restaurant,Sports Bar
2,Bathurst Manor / Wilson Heights / Downsview No...,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Women's Store,Field,Event Space,Exhibit,Eye Doctor,Factory
3,Bayview Village,Bus Stop,Deli / Bodega,Intersection,American Restaurant,Coffee Shop,Grocery Store,Women's Store,Filipino Restaurant,Exhibit,Eye Doctor
4,Bedford Park / Lawrence Manor East,Deli / Bodega,Italian Restaurant,Bus Stop,Pizza Place,Nail Salon,Sporting Goods Shop,Supermarket,Middle Eastern Restaurant,Outdoors & Recreation,Mediterranean Restaurant
5,Berczy Park,Surf Spot,Metro Station,Sandwich Place,Bus Stop,Donut Shop,Coffee Shop,Pizza Place,Playground,Board Shop,Beach
6,Birch Cliff / Cliffside West,Bar,Seafood Restaurant,Greek Restaurant,Pub,Mediterranean Restaurant,Dessert Shop,Ice Cream Shop,Gym,Gourmet Shop,Bakery
7,Brockton / Parkdale Village / Exhibition Place,Italian Restaurant,Burger Joint,Bowling Alley,Bakery,Liquor Store,Hostel,Playground,Plaza,Pizza Place,Shopping Mall
8,Business reply mail Processing CentrE,Athletics & Sports,Toy / Game Store,Train,Miscellaneous Shop,Mobile Phone Shop,Pharmacy,Pet Store,Hookah Bar,Korean Restaurant,Supermarket
9,CN Tower / King and Spadina / Railway Lands / ...,Pharmacy,Chinese Restaurant,Bubble Tea Shop,Sushi Restaurant,Fast Food Restaurant,Asian Restaurant,Italian Restaurant,Gas Station,Ice Cream Shop,Rental Car Location


### Now let's finally apply our intended business rule, that motivated this study case. We want to filter these datasets to final ones, under some conditions:
### a) we want to list neighborhoods that have, among the ten most common venues, at least one category of these subset: BUS STATION, BUS STOP, METRO STATION AND TRAIN STATION. If the neighborhood don't meet the condition, it is out of the intended dataset;
### b) considering the a) condition was met, we will look for the venue categories including the keywords 'cafe', 'coffee', 'donut', 'bakery', among the ten most. If any of the keywords is found, we have a neighborhood on our final dataset.
### Our business rule is designed to support the neighborhood to open a coffee shop-like venue. Therefore, consider advertising campaigns on transportation stations and possibly opening this venue near these popular venues. For that, it is also needed to consider if the neighborhood has many other venues of the same category or similar ones.


#### Let's begin our analysis with New York

In [104]:
ny_wanted_list=[]
for name, first, second, third, fourth, fifth, sixth, seventh, eighth, nineth, tenth in zip(ny_neighborhoods_venues_sorted['Neighborhood'],
                                                                                           ny_neighborhoods_venues_sorted['1st Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['2nd Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['3rd Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['4th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['5th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['6th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['7th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['8th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['9th Most Common Venue'],
                                                                                           ny_neighborhoods_venues_sorted['10th Most Common Venue']):
    ranked_station = False
    ranked_coffee = False
    coffee_matchers = ['bakery','café','cafe','coffee','donut']
    station_matchers = ['bus stop','bus station','metro station','train station']
    for station in station_matchers:
        if station in (first.lower(),second.lower(),third.lower(),fourth.lower(),fifth.lower(),sixth.lower(),seventh.lower(),eighth.lower(),nineth.lower(),tenth.lower()):
            ranked_station = True
            for match in coffee_matchers:
                if match in (first.lower(),second.lower(),third.lower(),fourth.lower(),fifth.lower(),sixth.lower(),seventh.lower(),eighth.lower(),nineth.lower(),tenth.lower()):
                    ranked_coffee = True
                    ny_wanted_list.append([name, first, second, third,fourth, fifth, sixth, seventh, eighth, nineth, tenth])
ny_potential_neigh = pd.DataFrame(ny_wanted_list, columns=['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue','6th Most Common Venue','7th Most Common Venue','8th Most Common Venue','9th Most Common Venue','10th Most Common Venue'])

ny_potential_neigh
            

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Annadale,Pizza Place,Bakery,Park,Diner,Sushi Restaurant,Bagel Shop,Train Station,Restaurant,American Restaurant,Sports Bar
1,Claremont Village,Grocery Store,Chinese Restaurant,Pizza Place,Bus Station,Food,Liquor Store,Caribbean Restaurant,Gym,Bakery,Discount Store
2,Eastchester,Caribbean Restaurant,Deli / Bodega,Diner,Bowling Alley,Donut Shop,Metro Station,Bakery,Seafood Restaurant,Fast Food Restaurant,Pizza Place
3,Grasmere,Bus Stop,Bakery,Deli / Bodega,Restaurant,Park,Grocery Store,Pharmacy,Bank,Bagel Shop,Home Service
4,Gravesend,Italian Restaurant,Lounge,Pizza Place,Bus Station,Bakery,Furniture / Home Store,Donut Shop,Sporting Goods Shop,Record Shop,Men's Store
5,Manhattan Beach,Bus Stop,Café,Sandwich Place,Beach,Food,Ice Cream Shop,Playground,Women's Store,Exhibit,Eye Doctor
6,Mott Haven,Gym,Spanish Restaurant,Donut Shop,Pizza Place,Latin American Restaurant,Metro Station,Burger Joint,Bakery,Peruvian Restaurant,Electronics Store
7,South Jamaica,Bus Station,Vegetarian / Vegan Restaurant,Bakery,Supermarket,Caribbean Restaurant,Grocery Store,Sandwich Place,Field,Exhibit,Eye Doctor
8,Tompkinsville,Thrift / Vintage Store,Brewery,Rock Club,Bus Stop,Spanish Restaurant,Supermarket,Café,Caribbean Restaurant,Mexican Restaurant,Gastropub


#### Let's see them on the map

In [115]:
ny_list_potential_neigh=ny_potential_neigh['Neighborhood'].to_list()
ny_list_potential_neigh

['Annadale',
 'Claremont Village',
 'Eastchester',
 'Grasmere',
 'Gravesend',
 'Manhattan Beach',
 'Mott Haven',
 'South Jamaica',
 'Tompkinsville']

In [114]:
ny_list_potential_neigh

['Annadale',
 'Claremont Village',
 'Eastchester',
 'Grasmere',
 'Gravesend',
 'Manhattan Beach',
 'Mott Haven',
 'South Jamaica',
 'Tompkinsville']

In [116]:
ny_grouped_potential = ny_grouped[ny_grouped['Neighborhood'].isin(ny_list_potential_neigh)]

In [125]:
ny_grouped_potential_coord = ny_neighborhoods[ny_neighborhoods['Neighborhood'].isin(ny_list_potential_neigh)]

In [117]:
ny_grouped_potential

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54,Claremont Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
83,Eastchester,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
117,Grasmere,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
118,Gravesend,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
161,Manhattan Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
182,Mott Haven,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
251,South Jamaica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
268,Tompkinsville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [126]:
ny_grouped_potential_coord

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
2,Bronx,Eastchester,40.887556,-73.827806
21,Bronx,Mott Haven,40.806239,-73.9161
50,Brooklyn,Gravesend,40.59526,-73.973471
77,Brooklyn,Manhattan Beach,40.577914,-73.943537
164,Queens,South Jamaica,40.696911,-73.790426
215,Staten Island,Annadale,40.538114,-74.178549
218,Staten Island,Tompkinsville,40.637316,-74.080554
229,Staten Island,Grasmere,40.598268,-74.076674
267,Bronx,Claremont Village,40.831428,-73.901199


In [118]:
# set number of clusters
kclusters = 3
ny_clustering = ny_grouped_potential.drop('Neighborhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ny_clustering)

In [119]:
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 2, 0, 2, 1, 2])

In [128]:
# add clustering labels
ny_potential_neigh.insert(0, 'Cluster Labels', kmeans.labels_)

# merge ny_grouped with NY data to add latitude/longitude for each neighborhood
ny_merged = ny_grouped_potential_coord.join(ny_potential_neigh.set_index('Neighborhood'), on='Neighborhood')

In [129]:
ny_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Bronx,Eastchester,40.887556,-73.827806,2,Caribbean Restaurant,Deli / Bodega,Diner,Bowling Alley,Donut Shop,Metro Station,Bakery,Seafood Restaurant,Fast Food Restaurant,Pizza Place
21,Bronx,Mott Haven,40.806239,-73.9161,2,Gym,Spanish Restaurant,Donut Shop,Pizza Place,Latin American Restaurant,Metro Station,Burger Joint,Bakery,Peruvian Restaurant,Electronics Store
50,Brooklyn,Gravesend,40.59526,-73.973471,2,Italian Restaurant,Lounge,Pizza Place,Bus Station,Bakery,Furniture / Home Store,Donut Shop,Sporting Goods Shop,Record Shop,Men's Store
77,Brooklyn,Manhattan Beach,40.577914,-73.943537,0,Bus Stop,Café,Sandwich Place,Beach,Food,Ice Cream Shop,Playground,Women's Store,Exhibit,Eye Doctor
164,Queens,South Jamaica,40.696911,-73.790426,1,Bus Station,Vegetarian / Vegan Restaurant,Bakery,Supermarket,Caribbean Restaurant,Grocery Store,Sandwich Place,Field,Exhibit,Eye Doctor
215,Staten Island,Annadale,40.538114,-74.178549,2,Pizza Place,Bakery,Park,Diner,Sushi Restaurant,Bagel Shop,Train Station,Restaurant,American Restaurant,Sports Bar
218,Staten Island,Tompkinsville,40.637316,-74.080554,2,Thrift / Vintage Store,Brewery,Rock Club,Bus Stop,Spanish Restaurant,Supermarket,Café,Caribbean Restaurant,Mexican Restaurant,Gastropub
229,Staten Island,Grasmere,40.598268,-74.076674,2,Bus Stop,Bakery,Deli / Bodega,Restaurant,Park,Grocery Store,Pharmacy,Bank,Bagel Shop,Home Service
267,Bronx,Claremont Village,40.831428,-73.901199,2,Grocery Store,Chinese Restaurant,Pizza Place,Bus Station,Food,Liquor Store,Caribbean Restaurant,Gym,Bakery,Discount Store


In [149]:
# create map
map_ny_clusters = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ny_merged['Latitude'], ny_merged['Longitude'], ny_merged['Neighborhood'], ny_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_ny_clusters)
       
map_ny_clusters

#### This analysis can tell us on different perspectives how we can approach the decision of opening a new venue.

#### Now, let's do same with Toronto

In [132]:
tor_wanted_list=[]
for name, first, second, third, fourth, fifth, sixth, seventh, eighth, nineth, tenth in zip(tor_neighborhoods_venues_sorted['Neighborhood'],
                                                                                           tor_neighborhoods_venues_sorted['1st Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['2nd Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['3rd Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['4th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['5th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['6th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['7th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['8th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['9th Most Common Venue'],
                                                                                           tor_neighborhoods_venues_sorted['10th Most Common Venue']):
    ranked_station = False
    ranked_coffee = False
    coffee_matchers = ['bakery','café','cafe','coffee','donut']
    station_matchers = ['bus stop','bus station','metro station','train station']
    for station in station_matchers:
        if station in (first.lower(),second.lower(),third.lower(),fourth.lower(),fifth.lower(),sixth.lower(),seventh.lower(),eighth.lower(),nineth.lower(),tenth.lower()):
            ranked_station = True
            for match in coffee_matchers:
                if match in (first.lower(),second.lower(),third.lower(),fourth.lower(),fifth.lower(),sixth.lower(),seventh.lower(),eighth.lower(),nineth.lower(),tenth.lower()):
                    ranked_coffee = True
                    tor_wanted_list.append([name, first, second, third,fourth, fifth, sixth, seventh, eighth, nineth, tenth])
tor_potential_neigh = pd.DataFrame(tor_wanted_list, columns=['Neighborhood','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue','6th Most Common Venue','7th Most Common Venue','8th Most Common Venue','9th Most Common Venue','10th Most Common Venue'])

tor_potential_neigh
            

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alderwood / Long Branch,Pizza Place,Bakery,Park,Diner,Sushi Restaurant,Bagel Shop,Train Station,Restaurant,American Restaurant,Sports Bar
1,North Park / Maple Leaf Park / Upwood Park,Grocery Store,Chinese Restaurant,Pizza Place,Bus Station,Food,Liquor Store,Caribbean Restaurant,Gym,Bakery,Discount Store
2,Toronto Dominion Centre / Design Exchange,Caribbean Restaurant,Deli / Bodega,Diner,Bowling Alley,Donut Shop,Metro Station,Bakery,Seafood Restaurant,Fast Food Restaurant,Pizza Place


In [134]:
tor_list_potential_neigh=tor_potential_neigh['Neighborhood'].to_list()
tor_list_potential_neigh

['Alderwood / Long Branch\n',
 'North Park / Maple Leaf Park / Upwood Park\n',
 'Toronto Dominion Centre / Design Exchange\n']

In [135]:
tor_grouped_potential = toronto_grouped[toronto_grouped['Neighborhood'].isin(tor_list_potential_neigh)]

In [136]:
tor_grouped_potential_coord = tor_neighborhoods[tor_neighborhoods['Neighborhood'].isin(tor_list_potential_neigh)]

In [137]:
tor_grouped_potential_coord

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
60,M5K,Downtown Toronto,Toronto Dominion Centre / Design Exchange,43.647177,-79.381576
79,M6L,North York,North Park / Maple Leaf Park / Upwood Park,43.713756,-79.490074
89,M8W,Etobicoke,Alderwood / Long Branch,43.602414,-79.543484


#### We understand that, for only three neighborhoods on our final dataset, there is no need for clustering the map

In [150]:
map_toronto_grouped = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(tor_grouped_potential_coord['Latitude'], tor_grouped_potential_coord['Longitude'], tor_grouped_potential_coord['Borough'], tor_grouped_potential_coord['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#2ee74d',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_grouped)  
    
map_toronto_grouped