# Comparing Neighborhoods in New York City and Toronto

# 1 - Introduction

Moving to a different country always makes it difficult to find the right neighborhood to live in.
This problem can be reduced if we compare neighborhoods across cities and draw up a list of
the best candidates.
To analyze this problem we will treat three different cases, each corresponding to a person who
lives in a different neighborhood in Toronto.
We will use Foursquare to analyze the venues in each of the three Toronto neighborhoods, and then build a list of candidate neighborhoods in New York City.


# 2 - Data

In order to understand the distribution of venues in New York City and Toronto, and start to search for good areas to live, we will use data from Foursquare. 
We will use the Foursquare API to retrieve relevant data for New York City and Toronto and organize it into pandas DataFrames.

We will also use geolocation data for Toronto and New York City, available from previous modules of this Capstone Project.

## We begin by importing the libraries required in this project

In [1]:
import pandas as pd
import numpy as np
import json

#Geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

#Importing wikipedia to read the page
import wikipedia as wp

print('Libraries imported successfully!')

Libraries imported successfully!


## Importing and Preparing the New York Dataset

In [14]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
newyork = neighborhoods.copy()
newyork.head()

Data downloaded!
The dataframe has 5 boroughs and 306 neighborhoods.


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


## Importing and Preparing the Toronto Dataset

In [11]:
html = wp.page("List_of_postal_codes_of_Canada:_M").html().encode("UTF-8")
df = pd.read_html(html)[0]

# work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
table = df[df['Borough'] != 'Not assigned'].copy()

# merge the neighbourhoods that share a postcode into one comma-separated entry
table['Neighbourhood'] = table.groupby('Postcode')['Neighbourhood'].transform(lambda neigh: ', '.join(neigh))

table = table.drop_duplicates()

# where the neighbourhood is not assigned, fall back to the borough name
mask = table['Neighbourhood'] == 'Not assigned'
table.loc[mask, 'Neighbourhood'] = table.loc[mask, 'Borough']

print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(table['Borough'].unique()),
        table.shape[0]
    )
)
table.head()

The dataframe has 10 boroughs and 103 neighborhoods.


Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,"Lawrence Heights, Lawrence Manor"
7,M7A,Downtown Toronto,Queen's Park


## We still need the latitude and longitude for each neighborhood in Toronto.

In [13]:
geo_df = pd.read_csv("Geospatial_Coordinates.csv")
geo_df.columns = ["Postcode", "Latitude", "Longitude"]
toronto = table.join(geo_df.set_index('Postcode'),on='Postcode')
toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
5,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
7,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494


## Geolocation for New York City

In [8]:
address_NY = 'New York City, NY'

geolocator_NY = Nominatim(user_agent="ny_explorer")
location_NY = geolocator_NY.geocode(address_NY)
latitude_NY = location_NY.latitude
longitude_NY = location_NY.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude_NY, longitude_NY))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


## Geolocation for Toronto

In [213]:
address_TO = 'Toronto, CN'

geolocator_TO = Nominatim(user_agent="toronto_explorer")
location_TO = geolocator_TO.geocode(address_TO)
latitude_TO = location_TO.latitude
longitude_TO = location_TO.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_TO, longitude_TO))

The geographical coordinates of Toronto are 43.6425637, -79.38708718320467.


## Setting the Foursquare API

In [90]:
### Setting the API
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (credential redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (credential redacted)
VERSION = '20200226' # Foursquare API version
LIMIT = 300
radius=500
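
API credentials can also be kept out of the notebook entirely. The sketch below reads them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `credentials_available` are assumptions made for this example, not part of the Foursquare API.

```python
import os

# Hypothetical variable names; any names work as long as they are
# exported in the shell before the notebook is started.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")


def credentials_available(client_id, client_secret):
    """Return True only if both credentials are non-empty strings."""
    return bool(client_id) and bool(client_secret)


if not credentials_available(CLIENT_ID, CLIENT_SECRET):
    print("Foursquare credentials are not set; API calls will fail.")
```

With this approach the notebook can be shared or committed without exposing the ID and secret.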

## Defining a function to collect the data using the Foursquare API

In [91]:
#Defining a function to make the process automatic

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

## Getting the data for Toronto

In [92]:
toronto_venues = getNearbyVenues(names=toronto['Neighbourhood'],
                                   latitudes=toronto['Latitude'],
                                   longitudes=toronto['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront
Lawrence Heights, Lawrence Manor
Queen's Park
Islington Avenue
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The Danforth West,

## Getting the data for New York City

In [93]:
newyork_venues = getNearbyVenues(names=newyork['Neighborhood'],
                                   latitudes=newyork['Latitude'],
                                   longitudes=newyork['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

KeyError: 'groups'
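
The `KeyError: 'groups'` shown here is what the lookup `requests.get(url).json()["response"]['groups'][0]['items']` raises when the decoded response has no `groups` entry. The notebook does not show which response triggered it. A minimal sketch of more defensive parsing follows, assuming the same nesting as in `getNearbyVenues`; the helpers `extract_items` and `first_category` and both payloads are hypothetical and written by hand for illustration.

```python
def extract_items(payload):
    """Return the venue items from an explore-style payload, or [] if absent.

    `payload` is the decoded JSON dict; the nesting
    response -> groups -> [0] -> items mirrors the lookup in getNearbyVenues.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])


def first_category(venue, default="Unknown"):
    """Return the first category name of a venue dict, or `default`."""
    categories = venue.get("categories", [])
    return categories[0]["name"] if categories else default


# Hand-written payloads for illustration only, not real API responses.
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "categories": [{"name": "Café"}]}}
]}]}}
empty_payload = {"response": {}}

print(len(extract_items(ok_payload)))
print(len(extract_items(empty_payload)))
```

Returning an empty list lets the loop continue with the next neighborhood instead of stopping partway through the city, at the cost of silently skipping the neighborhoods that failed; logging the skipped names would make that visible.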

In [215]:
print("There are {} venues in Toronto.".format(toronto_venues.shape[0]))
print("There are {} venues in New York.".format(newyork_venues.shape[0]))

There are 2225 venues in Toronto.
There are 10278 venues in New York.


### All the data was loaded and pre-processed into dataframes. We can proceed with the analysis.

****

# 3 - Methodology

## Visualizing the maps of New York and Toronto, together with their neighborhoods

## MESSAGE TO GRADERS!
### If you cannot view the maps, it may be because you are viewing the Jupyter Notebook directly on GitHub.
### This is a known problem: GitHub's direct view does NOT render the maps in a Jupyter Notebook.
### To view the maps properly, open the notebook through nbviewer:
https://nbviewer.jupyter.org/

### Copy the GitHub URL of the notebook into the main field and the maps will be rendered properly.

## Map of Toronto

In [224]:
map_toronto = folium.Map(location=[latitude_TO+0.04, longitude_TO], zoom_start=10.5)

# add markers to map
for lat, lng, borough, neighbourhood in zip(toronto['Latitude'], toronto['Longitude'], toronto['Borough'], toronto['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Map of New York

In [225]:
map_newyork = folium.Map(location=[latitude_NY, longitude_NY], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(newyork['Latitude'], newyork['Longitude'], newyork['Borough'], newyork['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### We have to organize the data. Let's group the venues by neighborhood and take a look at how many venues per neighborhood we have in our dataframes.

### Toronto

In [96]:
toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,5,5,5,5,5,5
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",3,3,3,3,3,3
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",10,10,10,10,10,10
"Alderwood, Long Branch",9,9,9,9,9,9
...,...,...,...,...,...,...
Willowdale West,7,7,7,7,7,7
Woburn,3,3,3,3,3,3
"Woodbine Gardens, Parkview Hill",11,11,11,11,11,11
Woodbine Heights,9,9,9,9,9,9


### New York

In [97]:
newyork_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,32,32,32,32,32,32
Annadale,11,11,11,11,11,11
Arden Heights,5,5,5,5,5,5
Arlington,6,6,6,6,6,6
Arrochar,21,21,21,21,21,21
...,...,...,...,...,...,...
Woodhaven,24,24,24,24,24,24
Woodlawn,25,25,25,25,25,25
Woodrow,19,19,19,19,19,19
Woodside,76,76,76,76,76,76


### One important piece of information is the number of unique venue categories in our dataframes.

In [37]:
print('There are {} unique categories in Toronto.'.format(len(toronto_venues['Venue Category'].unique())))

There are 267 unique categories in Toronto.


In [38]:
print('There are {} unique categories in New York City.'.format(len(newyork_venues['Venue Category'].unique())))

There are 429 unique categories in New York City.


### New York, being a bigger and more populous city, has many more unique venue categories (429 versus 267).

****

### Now it is time to start analyzing the data. 
### We will create a new dataframe, listing all the unique categories for each neighborhood.
### Our intention is to obtain a list of most frequent venues per neighborhood. 
### We will then use this information to characterize the neighborhoods.
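
The idea can be illustrated on a toy venue list before applying it to the real data. This is a minimal sketch with made-up neighbourhoods and categories: each venue becomes a row of 0/1 indicators, and the per-neighbourhood mean of each indicator is the share of venues in that category.

```python
import pandas as pd

# Toy venue list; neighbourhood and category values are made up.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bar", "Park", "Park"],
})

# One column per category, one row per venue (1.0 where the venue matches).
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of each indicator column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("Neighbourhood").mean()

# Categories for neighbourhood "A", most frequent first.
top_a = toy_grouped.loc["A"].sort_values(ascending=False)
print(top_a.head(2))
```

In neighbourhood "A", two of the four venues are cafés, so the `Café` share is 0.5 and it ranks first. Categories with equal shares have no meaningful order, which is worth keeping in mind when reading the lower ranks of the top-10 tables.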

### First let's do it for Toronto

In [40]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print("Shape of the dataframe:", toronto_onehot.shape)

toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()

#Function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
toronto_neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_neighborhoods_venues_sorted

Shape of the dataframe: (2225, 268)


Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Restaurant,Thai Restaurant,Café,Bar,Steakhouse,Cosmetics Shop,Sushi Restaurant,Breakfast Spot,Burger Joint
1,Agincourt,Latin American Restaurant,Lounge,Skating Rink,Breakfast Spot,Clothing Store,Yoga Studio,Doner Restaurant,Diner,Discount Store,Distribution Center
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Bakery,Playground,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Beer Store,Japanese Restaurant,Fried Chicken Joint,Pharmacy,Pizza Place,Fast Food Restaurant,Discount Store,Sandwich Place,Dim Sum Restaurant
4,"Alderwood, Long Branch",Pizza Place,Gym,Coffee Shop,Skating Rink,Pharmacy,Pub,Sandwich Place,Pool,Dim Sum Restaurant,Deli / Bodega
...,...,...,...,...,...,...,...,...,...,...,...
94,Willowdale West,Grocery Store,Pizza Place,Discount Store,Home Service,Coffee Shop,Butcher,Pharmacy,Dumpling Restaurant,Drugstore,Department Store
95,Woburn,Coffee Shop,Korean Restaurant,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant
96,"Woodbine Gardens, Parkview Hill",Pizza Place,Bank,Café,Gym / Fitness Center,Pharmacy,Fast Food Restaurant,Gastropub,Intersection,Athletics & Sports,Bus Line
97,Woodbine Heights,Skating Rink,Curling Ice,Park,Pharmacy,Video Store,Beer Store,Asian Restaurant,Cosmetics Shop,Dog Run,Dim Sum Restaurant


### Now let's do it for New York City

In [41]:
# one hot encoding
newyork_onehot = pd.get_dummies(newyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
newyork_onehot['Neighbourhood'] = newyork_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns2 = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns2]

print("Shape of the dataframe:", newyork_onehot.shape)

newyork_grouped = newyork_onehot.groupby('Neighbourhood').mean().reset_index()

#Function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns2 = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns2.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd all take the 'th' suffix
        columns2.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
newyork_neighborhoods_venues_sorted = pd.DataFrame(columns=columns2)
newyork_neighborhoods_venues_sorted['Neighbourhood'] = newyork_grouped['Neighbourhood']

for ind in np.arange(newyork_grouped.shape[0]):
    newyork_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newyork_grouped.iloc[ind, :], num_top_venues)

newyork_neighborhoods_venues_sorted

Shape of the dataframe: (10278, 430)


Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Bakery,Deli / Bodega,Cosmetics Shop,Supermarket,Mexican Restaurant,Fast Food Restaurant,Bus Station,Martial Arts Dojo,Electronics Store
1,Annadale,Pizza Place,Dance Studio,Liquor Store,American Restaurant,Sports Bar,Train Station,Restaurant,Bakery,Diner,Farm
2,Arden Heights,Lawyer,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Yoga Studio,Event Space,Exhibit,Factory,Falafel Restaurant
3,Arlington,Bus Stop,Deli / Bodega,Coffee Shop,Boat or Ferry,Grocery Store,Yoga Studio,Fish & Chips Shop,Factory,Falafel Restaurant,Farm
4,Arrochar,Italian Restaurant,Pizza Place,Deli / Bodega,Bus Stop,Food Truck,Supermarket,Taco Place,Outdoors & Recreation,Middle Eastern Restaurant,Mediterranean Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
296,Woodhaven,Deli / Bodega,Park,Bank,Pharmacy,Nail Salon,Pizza Place,Chinese Restaurant,Gift Shop,Bagel Shop,Sandwich Place
297,Woodlawn,Pizza Place,Deli / Bodega,Pub,Playground,Food Truck,American Restaurant,Rental Car Location,Park,Grocery Store,Bar
298,Woodrow,Pharmacy,Sushi Restaurant,Donut Shop,Chinese Restaurant,Miscellaneous Shop,Coffee Shop,Grocery Store,Bank,Bakery,Bagel Shop
299,Woodside,Grocery Store,Thai Restaurant,Latin American Restaurant,Filipino Restaurant,Bakery,American Restaurant,Pub,Pizza Place,Bar,Donut Shop


****

### Our next step is to select 3 different neighborhoods in Toronto and try to find similar neighborhoods in New York. 

### I used the following website to select the neighbourhoods:
https://torontolife.com/neighbourhood-rankings/

### Looking at the map of Toronto, the neighborhoods selected were: 

1) Runnymede - 1st place 

2) Cabbagetown, St. James Town - 14th place

3) Little Portugal - 34th place

### Let's take another look at the map and locate our selected neighborhoods.

In [101]:
col_names=['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']
selec = ['Runnymede, Swansea', 'Cabbagetown, St. James Town','Little Portugal, Trinity']
# collect the selected rows in the order given by `selec`
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
selected_TO = pd.concat([toronto[toronto['Neighbourhood'] == name] for name in selec], ignore_index=True)[col_names]

map_toronto = folium.Map(location=[latitude_TO, longitude_TO], zoom_start=12)

# add markers to map
for lat, lng, borough, neighbourhood in zip(selected_TO['Latitude'], selected_TO['Longitude'], selected_TO['Borough'], selected_TO['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [79]:
selec = ['Runnymede, Swansea', 'Cabbagetown, St. James Town','Little Portugal, Trinity']
# collect the selected rows in the order given by `selec`
selected_TO_grouped = pd.concat([toronto_grouped[toronto_grouped['Neighbourhood'] == name] for name in selec], ignore_index=True)
selected_TO_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Runnymede, Swansea",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316
1,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Little Portugal, Trinity",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.017857,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.017857


### Let's analyze each one separately.

# Runnymede, Swansea

## Considered the #1 neighbourhood in Toronto.

### The information extracted from the web says that:
In 1970, a handful of Bloor West Village business owners, wary of losing customers to suburban shopping malls, banded together to create Canada’s first BIA. Outside their shops, they hung string lights, planted flowers and laid the groundwork for what has, almost 50 years later, become Toronto’s top neighbourhood. For residents, it’s the perfect Goldilocks district: bustling but cloistered from downtown, hip but not as precious as Roncesvalles, classy but not as pricy as Baby Point. It scores well in virtually every metric: it’s safe and accessible (Jane and Runnymede stations are nearby), and its charming old homes are relatively affordable (you can still get one for less than $1 million). The main drag is more quaint than happening, but trendy Bloor and Dundas West bars and restaurants are just a short walk away, as are a number of highly ranked schools and, of course, the sprawling High Park.


### Its 10 most common venues are:

In [153]:
toronto_neighborhoods_venues_sorted[toronto_neighborhoods_venues_sorted['Neighbourhood'] == selec[0]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,"Runnymede, Swansea",Café,Coffee Shop,Restaurant,Sushi Restaurant,Pizza Place,Italian Restaurant,Gastropub,Pub,Post Office,Latin American Restaurant


### Let's take a look again at the map for this neighbourhood and locate the venues.

In [137]:
selected_TO_venue = toronto_venues[toronto_venues['Neighbourhood'] == selec[0]]

# centre the map on the neighbourhood coordinates (identical on every row, so the first row is used)
map_runnymede = folium.Map(location=[selected_TO_venue['Neighbourhood Latitude'].iloc[0], selected_TO_venue['Neighbourhood Longitude'].iloc[0]], zoom_start=16)

# add markers to map
for lat, lng, name, category in zip(selected_TO_venue['Venue Latitude'], selected_TO_venue['Venue Longitude'], selected_TO_venue['Venue'], selected_TO_venue['Venue Category']):
    label = '{}, {}'.format(name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_runnymede)  
    
map_runnymede

### Most of the venues are concentrated along a single avenue.

# Cabbagetown, St. James Town

## Considered the #14 neighbourhood in Toronto.

### The information extracted from the web says that:
From the street, Cabbagetown looks like a living museum of Victorian Toronto, with many homes appearing exactly as they did in the 19th century. For a low-rise district, it has a high concentration of restaurants, from ordinary pubs to top-notch spots like Kingyo Izakaya and F’Amelia. Abundant green space is a plus: follow any street far enough east and you’ll wind up at Riverdale Farm, where the resident cows and chickens have been delighting children since the 1970s.

### Its 10 most common venues are:

In [139]:
toronto_neighborhoods_venues_sorted[toronto_neighborhoods_venues_sorted['Neighbourhood'] == selec[1]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,"Cabbagetown, St. James Town",Park,Coffee Shop,Restaurant,Bakery,Italian Restaurant,Pizza Place,Pub,Café,Caribbean Restaurant,Pet Store


### Let's take a look again at the map for this neighbourhood and locate the venues.


In [145]:
selected_TO_venue = toronto_venues[toronto_venues['Neighbourhood'] == selec[1]]

# centre the map on the neighbourhood coordinates (identical on every row, so the first row is used)
map_cabbagetown = folium.Map(location=[selected_TO_venue['Neighbourhood Latitude'].iloc[0], selected_TO_venue['Neighbourhood Longitude'].iloc[0]], zoom_start=15.5)

# add markers to map
for lat, lng, name, category in zip(selected_TO_venue['Venue Latitude'], selected_TO_venue['Venue Longitude'], selected_TO_venue['Venue'], selected_TO_venue['Venue Category']):
    label = '{}, {}'.format(name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cabbagetown)  
    
map_cabbagetown

### Again, most of the venues are concentrated along one avenue, but here they are more dispersed.
### The parks around this area are very attractive, and we will have to find something similar in New York.

# Little Portugal, Trinity

## Considered the #34 neighbourhood in Toronto.

### The information extracted from the web says that:
Given the name, it’s no surprise that this neighbourhood’s most common native tongue, after English, is Portuguese. The influx of Iberian immigrants has been replaced by a new cohort: yuppies. They’re attracted by the juxtaposition of quiet residential streets with ultra-cool bars (The Lockhart, Uncle Mikey’s, the Drake Hotel). Little Portugal’s popularity among the about-to-have-children set has inflated housing prices by 10 per cent in the past year, and although property taxes might soon be going up, at least the locals will be saving on gas: two-thirds of the residents walk, bike or commute to work.

### Its 10 most common venues are:

In [149]:
toronto_neighborhoods_venues_sorted[toronto_neighborhoods_venues_sorted['Neighbourhood'] == selec[2]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,"Little Portugal, Trinity",Bar,Coffee Shop,Asian Restaurant,Restaurant,Café,Pizza Place,Bakery,Men's Store,Wine Bar,Vietnamese Restaurant


In [152]:
selected_TO_venue = toronto_venues[toronto_venues['Neighbourhood'] == selec[2]]

# centre the map on the neighbourhood coordinates (identical on every row, so the first row is used)
map_littleportugal = folium.Map(location=[selected_TO_venue['Neighbourhood Latitude'].iloc[0], selected_TO_venue['Neighbourhood Longitude'].iloc[0]], zoom_start=15.5)

# add markers to map
for lat, lng, name, category in zip(selected_TO_venue['Venue Latitude'], selected_TO_venue['Venue Longitude'], selected_TO_venue['Venue'], selected_TO_venue['Venue Category']):
    label = '{}, {}'.format(name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_littleportugal)  
    
map_littleportugal

### For Little Portugal we can see that the venues are distributed along 3 main avenues, with a big park in the vicinity.
### The #1 most common venue is Bar.

****

# 4 - Results

## Now we need to analyze the data for New York, taking into account the information we had for our selected neighbourhoods in Toronto.


# Neighborhood similar to Runnymede.

### The #1 venue in Runnymede was Café, the #2 was Coffee Shop and #3 was Restaurant.
### We will select the neighborhoods in New York City with Café as #1 venue.

In [216]:
newyork_neighborhoods_venues_sorted[newyork_neighborhoods_venues_sorted['1st Most Common Venue'] == 'Café']

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
162,Manhattan Beach,Café,Beach,Harbor / Marina,Bus Stop,Pizza Place,Food,Sandwich Place,Playground,Ice Cream Shop,Yoga Studio
218,Prospect Lefferts Gardens,Café,Bakery,Pizza Place,Caribbean Restaurant,Sushi Restaurant,Liquor Store,Deli / Bodega,Wine Shop,Indian Restaurant,Ice Cream Shop
273,Tudor City,Café,Park,Mexican Restaurant,Coffee Shop,Pizza Place,Deli / Bodega,Diner,Gym,Greek Restaurant,Dog Run
283,Washington Heights,Café,Bakery,Grocery Store,Chinese Restaurant,Mobile Phone Shop,Deli / Bodega,Mexican Restaurant,Bank,Gym,New American Restaurant


### We have 4 options, but none has Coffee Shop as the #2 venue. Tudor City is the only one with Coffee Shop in its top 10 (at #4), so it looks like the best candidate.
### Let's select Tudor City!
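
The choice above was made by reading the table. As a minimal sketch, assuming we only care about where Coffee Shop shows up among the ranked columns, the same check can be scripted. The names `cafe_first` and `category_rank` are illustrative and not part of the notebook; the rows are copied from the table above and trimmed to the first five ranked columns for brevity.

```python
import pandas as pd

# Ranked columns kept for this sketch (the notebook frame has ten of them).
rank_cols = ['1st Most Common Venue', '2nd Most Common Venue',
             '3rd Most Common Venue', '4th Most Common Venue',
             '5th Most Common Venue']

# Rows copied from the Café-first table above, trimmed to five ranked columns.
cafe_first = pd.DataFrame(
    [
        ['Manhattan Beach', 'Café', 'Beach', 'Harbor / Marina', 'Bus Stop', 'Pizza Place'],
        ['Prospect Lefferts Gardens', 'Café', 'Bakery', 'Pizza Place', 'Caribbean Restaurant', 'Sushi Restaurant'],
        ['Tudor City', 'Café', 'Park', 'Mexican Restaurant', 'Coffee Shop', 'Pizza Place'],
        ['Washington Heights', 'Café', 'Bakery', 'Grocery Store', 'Chinese Restaurant', 'Mobile Phone Shop'],
    ],
    columns=['Neighbourhood'] + rank_cols,
)

def category_rank(row, category, cols=rank_cols):
    """Return the 1-based rank at which `category` appears in `row`, or None if absent."""
    for position, col in enumerate(cols, start=1):
        if row[col] == category:
            return position
    return None

# Map each neighborhood to the rank of Coffee Shop (None when it is not listed).
ranks = {row['Neighbourhood']: category_rank(row, 'Coffee Shop')
         for _, row in cafe_first.iterrows()}
print(ranks)
```

On the full `newyork_neighborhoods_venues_sorted` frame the same helper could be run over all ten ranked columns by passing a longer `cols` list.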

In [218]:
ny_selec=newyork_venues[newyork_venues['Neighbourhood'] == 'Tudor City'].reset_index()
ny_selec.head()

Unnamed: 0,index,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,9407,Tudor City,40.746917,-73.971219,Tudor City Park South,40.748766,-73.970775,Park
1,9408,Tudor City,40.746917,-73.971219,Sai Gon Dep,40.747701,-73.973788,Vietnamese Restaurant
2,9409,Tudor City,40.746917,-73.971219,mang'Oh yoga,40.747446,-73.972614,Yoga Studio
3,9410,Tudor City,40.746917,-73.971219,Tudor City Steps,40.748352,-73.970866,Trail
4,9411,Tudor City,40.746917,-73.971219,Tudor City Park North,40.749325,-73.970524,Park


### Let's take a look at the map for this neighborhood.

In [164]:
map_tudor = folium.Map(location=[ny_selec['Neighbourhood Latitude'][0],ny_selec['Neighbourhood Longitude'][0]], zoom_start=15.5)

# add markers to map
for lat, lng, name, categorie in zip(ny_selec['Venue Latitude'], ny_selec['Venue Longitude'], ny_selec['Venue'], ny_selec['Venue Category']):
    label = '{}, {}'.format(name, categorie)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tudor)  
    
map_tudor

****

# Neighborhood similar to Cabbagetown.

### The #1 venue in Cabbagetown was Park, the #2 was Coffee Shop and #3 was Restaurant.
### We will select the neighborhoods in New York City with Park as #1 venue.

In [220]:
newyork_neighborhoods_venues_sorted[newyork_neighborhoods_venues_sorted['1st Most Common Venue'] == 'Park']

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Bayswater,Park,Playground,Yoga Studio,Filipino Restaurant,Event Service,Event Space,Exhibit,Factory,Falafel Restaurant,Farm
55,Clason Point,Park,Bus Stop,Pool,Boat or Ferry,Grocery Store,South American Restaurant,Yoga Studio,Filipino Restaurant,Exhibit,Factory
108,Fulton Ferry,Park,American Restaurant,Scenic Lookout,Coffee Shop,Boat or Ferry,Roof Deck,Playground,Café,Ice Cream Shop,Bakery
179,Morningside Heights,Park,Coffee Shop,American Restaurant,Bookstore,Deli / Bodega,Sandwich Place,Burger Joint,New American Restaurant,College Cafeteria,Salad Place
249,Somerville,Park,Yoga Studio,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
256,Spuyten Duyvil,Park,Bank,Bus Stop,Thai Restaurant,Tennis Stadium,Pharmacy,Scenic Lookout,Tennis Court,Pizza Place,Farmers Market
262,Stuyvesant Town,Park,Bar,Heliport,Coffee Shop,Gas Station,Boat or Ferry,Farmers Market,Gym / Fitness Center,Harbor / Marina,Baseball Field
268,Todt Hill,Park,Yoga Studio,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
289,Westerleigh,Park,Arcade,Convenience Store,Spanish Restaurant,Yoga Studio,Film Studio,Event Space,Exhibit,Factory,Falafel Restaurant


### We have 9 options with Park as #1 venue, and one of these options also has Coffee Shop as #2.
### Let's select Morningside Heights!
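
The two conditions used here (Park as #1 and Coffee Shop as #2) can be combined into one boolean mask. The sketch below assumes a frame with the same column names as `newyork_neighborhoods_venues_sorted`; `park_first` and `match_top_two` are hypothetical names, and the values are copied from the first two ranked columns of the table above.

```python
import pandas as pd

# First two ranked columns of the Park-first table above.
park_first = pd.DataFrame({
    'Neighbourhood': ['Bayswater', 'Clason Point', 'Fulton Ferry',
                      'Morningside Heights', 'Somerville', 'Spuyten Duyvil',
                      'Stuyvesant Town', 'Todt Hill', 'Westerleigh'],
    '1st Most Common Venue': ['Park'] * 9,
    '2nd Most Common Venue': ['Playground', 'Bus Stop', 'American Restaurant',
                              'Coffee Shop', 'Yoga Studio', 'Bank',
                              'Bar', 'Yoga Studio', 'Arcade'],
})

def match_top_two(df, first, second):
    """Return the neighborhoods whose #1 and #2 venues equal `first` and `second`."""
    mask = (df['1st Most Common Venue'] == first) & (df['2nd Most Common Venue'] == second)
    return df.loc[mask, 'Neighbourhood'].tolist()

matches = match_top_two(park_first, 'Park', 'Coffee Shop')
print(matches)
```

Filtering on both columns at once avoids scanning the table by eye, which matters more when the #1 filter returns many rows.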

In [167]:
ny_selec=newyork_venues[newyork_venues['Neighbourhood'] == 'Morningside Heights'].reset_index()
ny_selec.head()

Unnamed: 0,index,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,5484,Morningside Heights,40.808,-73.963896,Alma Mater Statue,40.807726,-73.962252,Outdoor Sculpture
1,5485,Morningside Heights,40.808,-73.963896,Book Culture,40.806629,-73.96494,Bookstore
2,5486,Morningside Heights,40.808,-73.963896,Columbia Greenmarket,40.807195,-73.964335,Farmers Market
3,5487,Morningside Heights,40.808,-73.963896,Shake Shack,40.807933,-73.964013,Burger Joint
4,5488,Morningside Heights,40.808,-73.963896,Arts and Crafts Beer Parlor,40.806689,-73.961094,Pub


### Let's see Morningside Heights up close.

In [168]:
map_morning = folium.Map(location=[ny_selec['Neighbourhood Latitude'][0],ny_selec['Neighbourhood Longitude'][0]], zoom_start=15.5)

# add markers to map
for lat, lng, name, categorie in zip(ny_selec['Venue Latitude'], ny_selec['Venue Longitude'], ny_selec['Venue'], ny_selec['Venue Category']):
    label = '{}, {}'.format(name, categorie)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_morning)  
    
map_morning

****

# Neighborhood similar to Little Portugal.

### The #1 venue in Little Portugal was Bar, the #2 was Coffee Shop and #3 was Asian Restaurant, with Restaurant as #4.
### We will select the neighborhoods in New York City with Bar as #1 venue.

In [169]:
newyork_neighborhoods_venues_sorted[newyork_neighborhoods_venues_sorted['1st Most Common Venue'] == 'Bar']

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Astoria,Bar,Middle Eastern Restaurant,Greek Restaurant,Hookah Bar,Seafood Restaurant,Pizza Place,Mediterranean Restaurant,Bubble Tea Shop,Gym / Fitness Center,Gym
14,Bayside,Bar,Pizza Place,Indian Restaurant,American Restaurant,Sushi Restaurant,Spa,Bakery,Italian Restaurant,Chinese Restaurant,Cosmetics Shop
27,Boerum Hill,Bar,Dance Studio,Coffee Shop,Sandwich Place,Furniture / Home Store,Arts & Crafts Store,French Restaurant,Bakery,Yoga Studio,Grocery Store
39,Bushwick,Bar,Deli / Bodega,Mexican Restaurant,Coffee Shop,Thrift / Vintage Store,Bakery,Pizza Place,Discount Store,Vegetarian / Vegan Restaurant,Café
47,Central Harlem,Bar,French Restaurant,African Restaurant,American Restaurant,Chinese Restaurant,Cosmetics Shop,Seafood Restaurant,Fried Chicken Joint,Market,Beer Bar
60,Cobble Hill,Bar,Pizza Place,Playground,Coffee Shop,Cocktail Bar,Deli / Bodega,Yoga Studio,Ice Cream Shop,Italian Restaurant,Japanese Restaurant
81,East Village,Bar,Ice Cream Shop,Chinese Restaurant,Mexican Restaurant,Wine Bar,Pizza Place,Vegetarian / Vegan Restaurant,Speakeasy,Korean Restaurant,Japanese Restaurant
82,East Williamsburg,Bar,Deli / Bodega,Cocktail Bar,Coffee Shop,Bakery,Music Venue,Mexican Restaurant,Concert Hall,Gym / Fitness Center,Donut Shop
119,Great Kills,Bar,Italian Restaurant,Pizza Place,Sandwich Place,Train Station,Liquor Store,Pharmacy,Grocery Store,Spanish Restaurant,Cosmetics Shop
120,Greenpoint,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Café,Sushi Restaurant,Boutique,Mexican Restaurant,French Restaurant,Yoga Studio


### We have many options with Bar as #1 venue!
### But only one has Coffee Shop as #2.
### Let's take a look at Williamsburg.

In [222]:
ny_selec=newyork_venues[newyork_venues['Neighbourhood'] == 'Williamsburg'].reset_index()
ny_selec.head()

Unnamed: 0,index,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1666,Williamsburg,40.707144,-73.958115,Trophy Bar,40.707768,-73.955782,Bar
1,1667,Williamsburg,40.707144,-73.958115,Dotory,40.70773,-73.955779,Korean Restaurant
2,1668,Williamsburg,40.707144,-73.958115,Blink Fitness,40.708756,-73.958248,Gym
3,1669,Williamsburg,40.707144,-73.958115,Duff's,40.708774,-73.957716,Bar
4,1670,Williamsburg,40.707144,-73.958115,Mexico 2000,40.707552,-73.955052,Taco Place


In [223]:
map_william = folium.Map(location=[ny_selec['Neighbourhood Latitude'][0],ny_selec['Neighbourhood Longitude'][0]], zoom_start=15)

# add markers to map
for lat, lng, name, categorie in zip(ny_selec['Venue Latitude'], ny_selec['Venue Longitude'], ny_selec['Venue'], ny_selec['Venue Category']):
    label = '{}, {}'.format(name, categorie)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_william)  
    
map_william

### Williamsburg is very close to large avenues and there is no park nearby.
### If we look at the map for Little Portugal we will see that there was a very nice park in its vicinity.
### Looking at our options in New York again, there are some neighborhoods with Coffee Shop as #3.
### I selected the neighborhood Boerum Hill to take a look.
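
The park check above was done visually on the map. A rough sketch of the same check on the venue table is to count how many venues of a given category fall in a neighborhood. The helper `category_count` is a hypothetical name, and the sample holds only the five Williamsburg rows printed by `ny_selec.head()` above, so a real check would need the full `newyork_venues` frame.

```python
import pandas as pd

# The five Williamsburg rows shown by ny_selec.head() above (not the full venue list).
sample = pd.DataFrame(
    [
        ('Williamsburg', 'Trophy Bar', 'Bar'),
        ('Williamsburg', 'Dotory', 'Korean Restaurant'),
        ('Williamsburg', 'Blink Fitness', 'Gym'),
        ('Williamsburg', "Duff's", 'Bar'),
        ('Williamsburg', 'Mexico 2000', 'Taco Place'),
    ],
    columns=['Neighbourhood', 'Venue', 'Venue Category'],
)

def category_count(venues, neighbourhood, category):
    """Count the venues of `category` listed for `neighbourhood`."""
    in_area = venues[venues['Neighbourhood'] == neighbourhood]
    return int((in_area['Venue Category'] == category).sum())

parks = category_count(sample, 'Williamsburg', 'Park')
bars = category_count(sample, 'Williamsburg', 'Bar')
print(parks, bars)
```

A count of zero only says that Foursquare returned no venue tagged Park within the search radius, so the map remains a useful second check.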

In [209]:
ny_selec=newyork_venues[newyork_venues['Neighbourhood'] == 'Boerum Hill'].reset_index()
ny_selec.head()

Unnamed: 0,index,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,2755,Boerum Hill,40.685683,-73.983748,Rucola,40.685659,-73.985769,Italian Restaurant
1,2756,Boerum Hill,40.685683,-73.983748,Robert,40.686559,-73.985128,Cocktail Bar
2,2757,Boerum Hill,40.685683,-73.983748,Rice & Miso,40.684633,-73.983768,Japanese Restaurant
3,2758,Boerum Hill,40.685683,-73.983748,Bedouin Tent,40.686936,-73.984469,Middle Eastern Restaurant
4,2759,Boerum Hill,40.685683,-73.983748,Taiki,40.6845,-73.983783,Sushi Restaurant


In [212]:
map_boerum = folium.Map(location=[ny_selec['Neighbourhood Latitude'][0],ny_selec['Neighbourhood Longitude'][0]], zoom_start=15)

# add markers to map
for lat, lng, name, categorie in zip(ny_selec['Venue Latitude'], ny_selec['Venue Longitude'], ny_selec['Venue'], ny_selec['Venue Category']):
    label = '{}, {}'.format(name, categorie)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boerum)  
    
map_boerum

### Boerum Hill looks like a better match than Williamsburg, based on the data that we have available.
### Let's stick with it.

## Summarizing the results:

### Tudor City is the neighborhood in New York City similar to Runnymede in Toronto.
### Morningside Heights is the neighborhood in New York City similar to Cabbagetown in Toronto.
### Boerum Hill is the neighborhood in New York City similar to Little Portugal in Toronto.
****

# 5 - Discussion

The methodology applied here is very simple compared to what is really necessary to select a new neighborhood in a different city.
However, it is a start. We would need more information, such as rental and sale prices, public transportation, schools, etc.
Unfortunately, Foursquare does not provide that information.

This project can be improved over time by adding more constraints to the selection of similar neighborhoods to live in.
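
One possible first step in that direction is to replace the manual comparison of #1 and #2 venues with a numeric score. The sketch below is an assumption and not the method used in this project: `overlap_score` is a hypothetical helper that rewards categories shared by two ranked lists, with a larger reward when their ranks are close. The lists are taken from the Results section (the top four venues of Little Portugal and the first five ranked columns of three Bar-first neighborhoods).

```python
def overlap_score(reference, candidate):
    """Sum 1 / (1 + |rank difference|) over categories present in both ranked lists."""
    score = 0.0
    for ref_rank, category in enumerate(reference):
        if category in candidate:
            cand_rank = candidate.index(category)
            score += 1.0 / (1 + abs(ref_rank - cand_rank))
    return score

# Top venues of Little Portugal as stated in the Results section.
little_portugal = ['Bar', 'Coffee Shop', 'Asian Restaurant', 'Restaurant']

# First five ranked venues of three Bar-first neighborhoods from the New York table.
candidates = {
    'Boerum Hill': ['Bar', 'Dance Studio', 'Coffee Shop', 'Sandwich Place', 'Furniture / Home Store'],
    'Greenpoint': ['Bar', 'Pizza Place', 'Coffee Shop', 'Cocktail Bar', 'Café'],
    'Astoria': ['Bar', 'Middle Eastern Restaurant', 'Greek Restaurant', 'Hookah Bar', 'Seafood Restaurant'],
}

scores = {name: overlap_score(little_portugal, top) for name, top in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

Boerum Hill and Greenpoint tie under this score, which illustrates why extra constraints such as nearby parks, prices, or transportation would be needed to separate them.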

# 6 - Conclusion

In conclusion, the Foursquare API is a powerful tool to help us solve problems regarding the selection of venues in different locations.
Its combination with an API that could retrieve real estate data about sales and rental prices would be very interesting.

In the present case, visualizing the data with Folium also helped a lot in deciding among the candidate neighborhoods.