# Title: finding the best neighborhood (The Battle of Neighborhoods)

## Introduction 
In today's dynamic world, it is common that people find a new job and have to move to a new city or neighborhood. Let's say a person got a job offer from a big company with great career prospects in another city or maybe another country. If this person accepts the job offer, then he must move to a new location. Probably this person would prefer to move to a location that is similar to the place he lives currently in. In this way, he can continue to follow his hobbies and habits and can integrate easier and faster. He has access to venues of his interest in his current neighborhood like gym, swimming pool, cinema, theater, amusement park, restaurants, coffee shops, etc. in the new location, too. To this end, here we want to provide a possibility to find out what are the similar neighborhoods in the new city that are similar to the current neighborhood.


First let's import(install) the required libraries:

In [34]:
!pip install folium



In [35]:
import json, requests
import numpy as np
import pandas as pd
import io
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
from geopy.geocoders import Nominatim
import folium 
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans

## Getting data of the currect location (New York)

Let's assume our current location is 'Manhattan' in 'New York'. First we need to load the data:

In [36]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [37]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

All the relevant data is in the _features_ key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [38]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list:

In [39]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

The next step is transforming this data of nested Python dictionaries into a _pandas_ dataframe. To this end, first we create an empty dataframe.

In [40]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Next, we will loop through the data and fill the dataframe one row at a time.

In [41]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Let's first check the resulting dataframe.

In [42]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Now, we will extract the related information about Manhattan form the dataframe.

In [43]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's visualize Manhattan and the neighborhoods in it.

In [44]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


We assume one person is living in Midtown, Manhattan in the New York city. Now he wants to move to Toronto city to start a new job and he wants to find the most similar neighbourhood to his current location in Toronto city.


In [45]:
current_neighborhood = manhattan_data[manhattan_data['Neighborhood']== 'Midtown']
current_neighborhood

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
15,Manhattan,Midtown,40.754691,-73.981669


Next, we are going to start utilizing the Foursquare API to explore the venues in our selcted neighborhood.

In [46]:
CLIENT_ID = 'TSBCBDH5U15453ZMKOYCEQLAXCT40QNYQIMXIQZEL1ZS4Q1S' # your Foursquare ID
CLIENT_SECRET = '2TRNKVNLSZUZLLVG1MIZOFRTRQ1RUVXNZF1I2CSE2P4CBB0L' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TSBCBDH5U15453ZMKOYCEQLAXCT40QNYQIMXIQZEL1ZS4Q1S
CLIENT_SECRET:2TRNKVNLSZUZLLVG1MIZOFRTRQ1RUVXNZF1I2CSE2P4CBB0L


Now, let's get the top 100 venues that are in Midtown within a radius of 500 meters.

In [47]:
current_neighborhood.reset_index(inplace=True)
current_neighborhood

Unnamed: 0,index,Borough,Neighborhood,Latitude,Longitude
0,15,Manhattan,Midtown,40.754691,-73.981669


In [48]:
neighborhood_latitude = current_neighborhood.loc[0, 'Latitude']
neighborhood_longitude = current_neighborhood.loc[0, 'Longitude']
print("neighborhood latitude: " + str(neighborhood_latitude))
print("neighborhood_longitude: " + str(neighborhood_longitude))

neighborhood latitude: 40.75469110270623
neighborhood_longitude: -73.98166882730304


In [49]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

Send the GET request and examine the resutls

In [50]:
results = requests.get(url).json()
results

{'meta': {'code': 500,
  'errorType': 'server_error',
  'errorDetail': 'Foursquare servers are experiencing problems. Please retry and check status.foursquare.com for updates.'},
 'response': {}}

First, we define a function to find all the venues in the neighborhoods

In [51]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now we create a new dataframe called _midtown_venues_.

In [52]:
midtown_venues = getNearbyVenues(names=current_neighborhood['Neighborhood'],
                                   latitudes=current_neighborhood['Latitude'],
                                   longitudes=current_neighborhood['Longitude'], radius=500);
midtown_venues.head(100)
print("number of venues in midtown: " + str(midtown_venues.shape[0]))
midtown_venues.head()

Midtown
number of venues in midtown: 100


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Midtown,40.754691,-73.981669,Bryant Park,40.753621,-73.983265,Park
1,Midtown,40.754691,-73.981669,New York Public Library Terrace,40.753017,-73.98148,Plaza
2,Midtown,40.754691,-73.981669,Nat Sherman Townhouse,40.753283,-73.980358,Smoke Shop
3,Midtown,40.754691,-73.981669,Joanna Vargas Skin Care,40.753136,-73.980721,Spa
4,Midtown,40.754691,-73.981669,sweetgreen,40.75464,-73.983102,Salad Place


## Getting data of the destination (Toronto)

first we scrape the list of neighborhoods from wikipedia

In [53]:
data = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
toronto_neighbourhood = data[0]
toronto_neighbourhood

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [54]:
# get the index of columns where borough is not assigned
index = toronto_neighbourhood[ toronto_neighbourhood['Borough'] == 'Not assigned' ].index
toronto_neighbourhood.drop(index, inplace=True)

In [55]:
#Resetting indexing the dataframe:
toronto_neighbourhood.reset_index(inplace=True,drop=True)
toronto_neighbourhood.head(10)
print("Here is the shape of the dataframe:" +str(toronto_neighbourhood.shape) )

Here is the shape of the dataframe:(103, 3)


Getting latitute and longitude of each neighbourhood and merging with the dataframe

In [56]:
url="https://cocl.us/Geospatial_data"
s=requests.get(url).content
location=pd.read_csv(io.StringIO(s.decode('utf-8')))
location.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


merging location with neighbourhood dataframe:

In [57]:
toronto_neighbourhood = toronto_neighbourhood.merge(location, left_on='Postal Code', right_on='Postal Code')
toronto_neighbourhood.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


We assume that the person who is moving from New York to Toronto has found a job in Downtown Toronto. Therefore we will select only the neighborhoods in downtown Toronto. As a result, we will slice the original dataframe and create a new dataframe of the downtown data.

In [58]:
downtown_data = toronto_neighbourhood[toronto_neighbourhood['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


Let's get the geographical coordinates of Dowontown:

In [59]:
address = 'Downtown Toronto, Toronto, Ontario'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of downtown are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of downtown are 43.6563221, -79.3809161.


Let's visualize downtown and the neighborhoods in it.

In [60]:
# create map of downtown using latitude and longitude values
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

Next, we are going to start utilizing the Foursquare API to find the venues in the neighborhoods

In [61]:
CLIENT_ID = 'TSBCBDH5U15453ZMKOYCEQLAXCT40QNYQIMXIQZEL1ZS4Q1S' 
CLIENT_SECRET = '2TRNKVNLSZUZLLVG1MIZOFRTRQ1RUVXNZF1I2CSE2P4CBB0L' 
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TSBCBDH5U15453ZMKOYCEQLAXCT40QNYQIMXIQZEL1ZS4Q1S
CLIENT_SECRET:2TRNKVNLSZUZLLVG1MIZOFRTRQ1RUVXNZF1I2CSE2P4CBB0L


Now we create a new dataframe called _downtown_venues_.

In [62]:
downtown_venues = getNearbyVenues(names=downtown_data['Neighbourhood'],
                                   latitudes=downtown_data['Latitude'],
                                   longitudes=downtown_data['Longitude'], radius=500)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


we check the size of the resulting dataframe

In [63]:
print("Number of venues in downtown: " + str(downtown_venues.shape[0]))
downtown_venues.head()

Number of venues in downtown: 1248


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


Let's check how many venues were returned for each neighborhood

In [64]:
downtown_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,55,55,55,55,55,55
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,68,68,68,68,68,68
Christie,16,16,16,16,16,16
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",74,74,74,74,74,74


## Methodology

Now we can start analyzing the data. First, we will do onehot encoding for downtown venues

In [65]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")
first_col = downtown_onehot.pop('Neighborhood')
downtown_onehot.insert(0, 'Neighborhood', downtown_venues['Neighborhood'] )
downtown_onehot.head(10)

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, we do One-hot encoding for Midtown venues in New York:

In [66]:
# one hot encoding
midtown_onehot = pd.get_dummies(midtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
midtown_onehot['Neighborhood'] = midtown_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [midtown_onehot.columns[-1]] + list(midtown_onehot.columns[:-1])
midtown_onehot = midtown_onehot[fixed_columns]

midtown_onehot.head(10)

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Bakery,Bar,Bookstore,Boutique,Boxing Gym,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cycle Studio,Deli / Bodega,Discount Store,Donut Shop,Fast Food Restaurant,Food Stand,Food Truck,French Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Health & Beauty Service,Historic Site,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Martial Arts School,Mediterranean Restaurant,Miscellaneous Shop,Optical Shop,Park,Pharmacy,Pilates Studio,Pizza Place,Plaza,Salad Place,Salon / Barbershop,Sandwich Place,Smoke Shop,South American Restaurant,Spa,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Theater,Train Station,Video Game Store,Vietnamese Restaurant
0,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Midtown,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
9,Midtown,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Common Venue Categories

Now we will search for common venue categories between the current neighborhood and the destiation neighborhoods
Our analysis we be only based on common vanue categories and other venue categories will be dropped.

In [67]:
#print('Leaving {} columns of New York city data\nLeaving {} columns of Toronto city data'
#      .format(list(midtown_onehot.columns[0:4]), list(downtown_onehot.columns[0:5])))
common = 0
diff_in_NY = 0
dif_ven = []
com_ven = ['Neighborhood']

for i in midtown_onehot.columns[1:]:
    if i in downtown_onehot.columns[1:]:
        common += 1
        com_ven.append(i)
    else:
        diff_in_NY += 1
        dif_ven.append(i)

print('\nNumber of common venue categories in both data are       :{}\n\
Number of different venue categories in New York city are: {}'.format(common, diff_in_NY,))
print('List of common venue cathegories in midtwon, Manhattan are : {}'.format(com_ven))


Number of common venue categories in both data are       :49
Number of different venue categories in New York city are: 9
List of common venue cathegories in midtwon, Manhattan are : ['Neighborhood', 'American Restaurant', 'Art Gallery', 'Bakery', 'Bar', 'Bookstore', 'Boutique', 'Café', 'Chinese Restaurant', 'Clothing Store', 'Coffee Shop', 'Concert Hall', 'Cosmetics Shop', 'Deli / Bodega', 'Discount Store', 'Donut Shop', 'Fast Food Restaurant', 'Food Truck', 'French Restaurant', 'Gourmet Shop', 'Grocery Store', 'Gym', 'Gym / Fitness Center', 'Health & Beauty Service', 'Historic Site', 'Hotel', 'Indian Restaurant', 'Italian Restaurant', 'Japanese Restaurant', 'Martial Arts School', 'Mediterranean Restaurant', 'Miscellaneous Shop', 'Optical Shop', 'Park', 'Pharmacy', 'Pizza Place', 'Plaza', 'Salad Place', 'Salon / Barbershop', 'Sandwich Place', 'Smoke Shop', 'Spa', 'Sporting Goods Shop', 'Steakhouse', 'Sushi Restaurant', 'Tailor Shop', 'Theater', 'Train Station', 'Video Game Store', 'V

In [68]:
midtown_onehot.shape

(100, 59)

In [69]:
print('Before reomoving non-common venues shape of midtwon, Manhattan: {}, and shape of downtown Toronto is: {}'
      .format(midtown_onehot.shape, downtown_onehot.shape))
midtown_onehot = midtown_onehot.loc[:, com_ven]
downtown_onehot = downtown_onehot.loc[:, com_ven]
print('After reomoving non-common venues shape of midtwon, Manhattan: {}, and shape of downtown Toronto is: {}'
      .format(midtown_onehot.shape, downtown_onehot.shape))

Before reomoving non-common venues shape of midtwon, Manhattan: (100, 59), and shape of downtown Toronto is: (1248, 212)
After reomoving non-common venues shape of midtwon, Manhattan: (100, 50), and shape of downtown Toronto is: (1248, 50)


### Calculating Mean Frequency of Occurrence

Next, let's group rows by neighborhood and calculate the mean frequency of occurrence for each venue category

In [38]:
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Bakery,Bar,Bookstore,Boutique,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Cosmetics Shop,Deli / Bodega,Discount Store,Donut Shop,Fast Food Restaurant,Food Truck,French Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Lounge,Martial Arts School,Mediterranean Restaurant,Miscellaneous Shop,Optical Shop,Park,Pharmacy,Pizza Place,Plaza,Salad Place,Salon / Barbershop,Sandwich Place,Smoke Shop,Spa,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tailor Shop,Theater,Train Station,Video Game Store,Vietnamese Restaurant
0,Berczy Park,0.0,0.018182,0.036364,0.0,0.0,0.0,0.018182,0.0,0.0,0.090909,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.018182,0.018182,0.018182,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.014706,0.0,0.058824,0.0,0.0,0.176471,0.0,0.0,0.0,0.014706,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.014706,0.044118,0.029412,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.0,0.029412,0.0,0.044118,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013333,0.0,0.0,0.0,0.013333,0.0,0.026667,0.013333,0.013333,0.093333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.026667,0.013333,0.0,0.053333,0.0,0.013333,0.026667,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.053333,0.0,0.013333,0.0,0.0,0.013333
5,"Commerce Court, Victoria Hotel",0.04,0.01,0.01,0.0,0.01,0.0,0.06,0.0,0.0,0.13,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.06,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0
6,"First Canadian Place, Underground city",0.03,0.01,0.01,0.02,0.01,0.0,0.07,0.0,0.0,0.11,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.04,0.0,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.03,0.01,0.01,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.01,0.0,0.0
7,"Garden District, Ryerson",0.0,0.01,0.01,0.0,0.02,0.0,0.04,0.01,0.09,0.09,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.02,0.03,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.02,0.0,0.01,0.01
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.01,0.02,0.02,0.0,0.0,0.04,0.01,0.0,0.13,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.01,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.027027,0.054054,0.0,0.0,0.054054,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040541


In [39]:
midtown_grouped = midtown_onehot.groupby('Neighborhood').mean().reset_index()
midtown_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Bakery,Bar,Bookstore,Boutique,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Cosmetics Shop,Deli / Bodega,Discount Store,Donut Shop,Fast Food Restaurant,Food Truck,French Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Lounge,Martial Arts School,Mediterranean Restaurant,Miscellaneous Shop,Optical Shop,Park,Pharmacy,Pizza Place,Plaza,Salad Place,Salon / Barbershop,Sandwich Place,Smoke Shop,Spa,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tailor Shop,Theater,Train Station,Video Game Store,Vietnamese Restaurant
0,Midtown,0.01,0.01,0.05,0.01,0.03,0.01,0.02,0.01,0.05,0.05,0.01,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.01,0.03,0.01,0.02,0.04,0.04,0.02,0.02,0.04,0.01,0.01,0.01


Next, let's write a function to sort the venues in descending order:

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood in the Downtown Toronto.

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(50)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Hotel,Café,French Restaurant,Italian Restaurant,Japanese Restaurant,Concert Hall,Park,Pharmacy
1,"CN Tower, King and Spadina, Railway Lands, Har...",Bar,Boutique,Coffee Shop,Vietnamese Restaurant,Discount Store,Health & Beauty Service,Gym / Fitness Center,Gym,Grocery Store,Gourmet Shop
2,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Japanese Restaurant,Salad Place,Donut Shop,Park,Indian Restaurant,Hotel
3,Christie,Grocery Store,Café,Park,Italian Restaurant,Coffee Shop,Vietnamese Restaurant,Discount Store,Gym / Fitness Center,Gym,Gourmet Shop
4,Church and Wellesley,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Hotel,Café,Mediterranean Restaurant,Park,Bookstore,Chinese Restaurant,Clothing Store
5,"Commerce Court, Victoria Hotel",Coffee Shop,Hotel,Café,Gym,American Restaurant,Deli / Bodega,Japanese Restaurant,Sushi Restaurant,Park,Fast Food Restaurant
6,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Gym,Japanese Restaurant,Salad Place,Deli / Bodega,American Restaurant,Steakhouse,Sushi Restaurant
7,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Bookstore,Theater,Fast Food Restaurant,Pizza Place,Italian Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Hotel,Café,Bakery,Italian Restaurant,Pizza Place,Plaza,Bar,Park,Sporting Goods Shop
9,"Kensington Market, Chinatown, Grange Park",Café,Coffee Shop,Bar,Vietnamese Restaurant,Pizza Place,Park,Grocery Store,Bakery,Donut Shop,Pharmacy


Now, we do display the top 10 venues for midtown neighborhood: 

In [44]:
midtown_venues_sorted= return_most_common_venues(midtown_grouped.iloc[0, :], num_top_venues)
midtown_venues_sorted

array(['Hotel', 'Coffee Shop', 'Bakery', 'Clothing Store', 'Theater',
       'Steakhouse', 'Sporting Goods Shop', 'Sandwich Place', 'Bookstore',
       'Pizza Place'], dtype=object)

## Similarity Measurement

Now we can search for most similar neighborhoods by using cosine similarity.

In [45]:
downtown_array = downtown_grouped.iloc[:,1:].to_numpy()
midtown_array = midtown_grouped.iloc[:,1:].to_numpy()
similarity_ind = np.matmul(midtown_array, np.transpose(downtown_array))
similarity_ind

array([[0.01236364, 0.004375  , 0.01558824, 0.01125   , 0.01346667,
        0.0165    , 0.0165    , 0.0176    , 0.0152    , 0.0077027 ,
        0.01757576, 0.01863636, 0.0153    , 0.005     , 0.01164706,
        0.01208333, 0.01385417, 0.0203    , 0.01617647]])

In [46]:

print(np.shape(similarity_ind))
best_inx = np.argsort(-similarity_ind)[0] 
best_neighborhood = downtown_grouped.iloc[best_inx[0], :]

(1, 19)


## Results

As a results, here in our case the best match is Toronto Dominion Centre, Design Exchange

In [46]:
best_neighborhood

Neighborhood                Toronto Dominion Centre, Design Exchange
American Restaurant                                             0.03
Art Gallery                                                     0.01
Bakery                                                          0.02
Bar                                                             0.02
Bookstore                                                          0
Boutique                                                           0
Café                                                            0.05
Chinese Restaurant                                              0.01
Clothing Store                                                  0.01
Coffee Shop                                                     0.14
Concert Hall                                                    0.02
Cosmetics Shop                                                     0
Deli / Bodega                                                   0.02
Discount Store                    

Here we can see the most common venue categories in the neighborhood with the highest similarity 

In [58]:
neighborhoods_venues_sorted.head(50)
index = neighborhoods_venues_sorted.index[neighborhoods_venues_sorted['Neighborhood'] == 'Toronto Dominion Centre, Design Exchange']
neighborhoods_venues_sorted.iloc[index]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Hotel,Café,Salad Place,Japanese Restaurant,American Restaurant,Steakhouse,Bakery,Bar,Concert Hall


## Conclusion

In this notebook, we introduced a problem and a possible solution for that. The search space and the parameters can be modified if required. Besides, there are several similarity measures that can be used like Euclidean distance or Jaccard similarity.

I hope you enjoyed exploring this notebook :)