# 1. Introduction/Business Problem

Toronto, London and New York are among the world's most famous tourist destinations. Each is diverse and multicultural, and each is the financial hub of its country. We want to explore how similar or dissimilar they are from a tourist's point of view, in terms of food, accommodation, sights and more.
The tourism industry matters both for the benefits it brings directly and for its role as a commercial activity that creates demand and growth in many other industries. Tourism not only drives economic activity but also generates employment and revenue and plays a significant role in development. Many countries, such as Turkey, France and Italy, depend heavily on the tourism industry to cover their expenses.

Knowing what makes tourists choose a travel destination is crucial for anyone working in the travel business, so understanding consumer behavior is essential for anyone who relies on tourists and tourism. In this project I will focus on venues such as restaurants, hotels, parks, cafes and cinemas in London, Toronto and New York, and cluster the neighborhoods of these cities in order to understand their similarities and differences. The target audience is therefore tourists and travel agencies. Tourists can explore the neighborhoods of each city and decide which city they prefer to visit; if they have already visited and enjoyed one of these cities, they can pick a similar city for their next trip. Travel agencies can likewise recommend destinations based on their customers' experiences and on the similarities and dissimilarities between cities.


# 2. Data

This project will analyze venues in the cities of Toronto, New York and London.
The data described below will be used for this analysis.

## 2.1 Boroughs and neighborhoods
### 2.1.1 London:
London has 32 boroughs in total. To explore, analyze and segment its neighborhoods, we need the latitude and longitude of each neighborhood and borough.
This dataset is freely available on the web. I used this file: https://skgrange.github.io/www/data/london_sport.json

### 2.1.2 New York:
New York has a total of 5 boroughs and 306 neighborhoods. To segment and explore the neighborhoods, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, together with the latitude and longitude coordinates of each neighborhood.
Luckily, this dataset is freely available on the web. Here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

### 2.1.3 Toronto:
For Toronto I used the Wikipedia table of postal codes to get the postal code and borough of each neighborhood (link to the Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M ). For the latitude and longitude of each neighborhood I used a CSV file available at: http://cocl.us/Geospatial_data

## 2.2 Foursquare API
To explore and cluster the neighborhoods, we need to search for venues in each of them. The Foursquare API (accessed via the requests library in Python) provides venue information for each neighborhood.
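
As a rough illustration of what such a request looks like, the sketch below assembles an explore URL using only the standard library. The endpoint is the one used in the venue search function of this notebook; the helper name `build_explore_url`, the credential strings and the version date are placeholders assumed for this example, not real values.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return a Foursquare explore URL for one neighborhood center."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                      # API version date, YYYYMMDD
        "ll": "{},{}".format(lat, lng),    # latitude,longitude of the center
        "radius": radius,                  # search radius in meters
        "limit": limit,                    # maximum number of venues returned
    }
    # urlencode escapes special characters (the comma in "ll" becomes %2C)
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials and an assumed version date, for illustration only.
example_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                                "20180605", 43.753259, -79.329656)
print(example_url)
```

Building the query from a dictionary keeps the parameters readable and avoids mistakes in hand-formatted URL strings.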

## 2.3 Dataframes


Let's install and import all the dependencies that we will need.

In [1]:
import pandas as pd
import numpy as np
!pip install lxml
import json # library to handle JSON files
!pip install geopy
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported.


#### Define Foursquare Credentials and Version

In [2]:
# The code was removed by Watson Studio for sharing.
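
Since the credentials cell is hidden, here is a minimal sketch of one way the three names used by the request code (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`) could be defined without writing secrets into the notebook. The environment variable names and the default version date are assumptions made for this example; they are not taken from the original cell.

```python
import os

# Hypothetical environment variable names; the original cell is not shown.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
# Assumed API version date in YYYYMMDD form, used only as a fallback.
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API calls will fail.")
```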

In [3]:
radius=500
LIMIT=100

### 2.3.1 London Dataframe:

In [4]:
import urllib.request
url='https://skgrange.github.io/www/data/london_sport.json'

# download the GeoJSON file that describes the London boroughs
with urllib.request.urlopen(url) as json_data:
    londn_data = json.loads(json_data.read().decode())

# use the first five coordinate entries of each borough as sample points
rows = []
for data in londn_data['features']:
    for i in range(0,5):
        # name each point after its borough plus an index, e.g. Bromley0
        neighborhood = str(data['properties']['name']) + str(i)

        # GeoJSON stores each position as [longitude, latitude]
        neighborhood_latlon = data['geometry']['coordinates'][0][i]
        neighborhood_lat = neighborhood_latlon[1]
        neighborhood_lon = neighborhood_latlon[0]

        rows.append({'Neighborhood': neighborhood,
                     'Latitude': neighborhood_lat,
                     'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
london_data = pd.DataFrame(rows, columns=['Neighborhood', 'Latitude', 'Longitude'])
london_data.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Bromley0 | 51.442884 | 0.031639 |
| 1 | Bromley1 | 51.440465 | 0.041526 |
| 2 | Bromley2 | 51.423211 | 0.063333 |
| 3 | Bromley3 | 51.431508 | 0.076946 |
| 4 | Bromley4 | 51.413598 | 0.109226 |
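
The London loop relies on the GeoJSON convention that each position is stored as [longitude, latitude], so index 1 is the latitude and index 0 is the longitude. The sketch below shows this on a small hand-made feature. The name `Demo` and the coordinates are invented for illustration and do not come from the London file, and the example assumes a Polygon geometry, where `coordinates[0]` is the outer ring.

```python
# A hand-made GeoJSON-like feature (hypothetical values, Polygon geometry assumed).
feature = {
    "properties": {"name": "Demo"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.03, 51.44], [0.04, 51.45], [0.06, 51.42], [0.03, 51.44]]],
    },
}

outer_ring = feature["geometry"]["coordinates"][0]  # first ring = outer boundary
points = []
for i, (lon, lat) in enumerate(outer_ring[:3]):
    # label each sampled boundary point as <name><index>, swapping to (lat, lon)
    points.append((feature["properties"]["name"] + str(i), lat, lon))

print(points)
```

Note that points sampled this way lie on the borough boundary rather than at its center, so they are only a rough stand-in for neighborhood locations.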


#### Create a map of London with neighborhoods superimposed on top

In [5]:
# central London lies west of the prime meridian, so its longitude is negative
map_london = folium.Map(location=[51.5074, -0.1278], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(london_data['Latitude'], london_data['Longitude'], london_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

#### Now we will explore each neighborhood and get the top 100 venues within a radius of 500 meters of each one.

In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
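
The function above assumes that every response contains at least one group and that every venue has at least one category; if either is missing, the indexing raises an error. The sketch below is one possible defensive variant of the parsing step only, tested on a hand-made payload instead of a live API call. The helper name `extract_venues` and the venue "Unlabeled Place" are hypothetical and used only for illustration.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore response into one tuple per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to None when a venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-made payload mimicking the structure accessed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Horn Park",
               "location": {"lat": 51.442686, "lng": 0.026776},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unlabeled Place",  # invented venue without a category
               "location": {"lat": 51.44, "lng": 0.03},
               "categories": []}},
]}]}}

rows = extract_venues("Bromley0", 51.442884, 0.031639, sample)
print(rows)
```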

In [7]:
london_venues = getNearbyVenues(names=london_data['Neighborhood'],
                                   latitudes=london_data['Latitude'],
                                   longitudes=london_data['Longitude']
                                  ) 
london_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bromley0 | 51.442884 | 0.031639 | Horn Park | 51.442686 | 0.026776 | Park |
| 1 | Bromley0 | 51.442884 | 0.031639 | Mottingham Farm Riding Centre | 51.439569 | 0.035346 | Stables |
| 2 | Bromley1 | 51.440465 | 0.041526 | RJ Landscapes | 51.441462 | 0.041531 | Construction & Landscaping |
| 3 | Bromley1 | 51.440465 | 0.041526 | U-Keep Building & Construction Ltd | 51.442014 | 0.043359 | Construction & Landscaping |
| 4 | Bromley1 | 51.440465 | 0.041526 | Eric Liddell Sports Centre | 51.438059 | 0.039675 | Gym / Fitness Center |


#### Examining the final dataframe of London

In [8]:
london_venues['City']='London'
london_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | City |
|---|---|---|---|---|---|---|---|---|
| 0 | Bromley0 | 51.442884 | 0.031639 | Horn Park | 51.442686 | 0.026776 | Park | London |
| 1 | Bromley0 | 51.442884 | 0.031639 | Mottingham Farm Riding Centre | 51.439569 | 0.035346 | Stables | London |
| 2 | Bromley1 | 51.440465 | 0.041526 | RJ Landscapes | 51.441462 | 0.041531 | Construction & Landscaping | London |
| 3 | Bromley1 | 51.440465 | 0.041526 | U-Keep Building & Construction Ltd | 51.442014 | 0.043359 | Construction & Landscaping | London |
| 4 | Bromley1 | 51.440465 | 0.041526 | Eric Liddell Sports Centre | 51.438059 | 0.039675 | Gym / Fitness Center | London |


### 2.3.2 Toronto Dataframe

In [9]:
# read the postal code table from Wikipedia (first table on the page)
d=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df=d[0]
# drop postal codes whose borough is 'Not assigned'
df.replace('Not assigned',np.nan,inplace=True)
df.dropna(subset=['Borough'],axis=0,inplace=True)
df=df.reset_index(drop=True)
# add latitude and longitude by joining on the postal code
csvfile='http://cocl.us/Geospatial_data'
dff=pd.read_csv(csvfile)
toronto_data = pd.merge(df, dff, on='Postal Code')
toronto_data=toronto_data.rename(columns={"Neighbourhood": "Neighborhood"})
toronto_data


| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
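
The Toronto dataframe is produced by joining the postal code table with the coordinates file on the shared `Postal Code` column. The toy example below illustrates how that merge behaves. The two real postal codes echo values from the output above, while the `M9Z` row and its borough are invented to show that codes without a match are dropped by the default inner join.

```python
import pandas as pd

# Two tiny hand-made tables standing in for the Wikipedia table and the CSV file.
postal = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],   # M9Z is a hypothetical code
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Default inner join: postal codes missing from either table are dropped.
merged = pd.merge(postal, coords, on="Postal Code")
print(merged)
```

If keeping unmatched postal codes matters, passing `how="left"` to `pd.merge` would retain them with missing coordinates.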


#### Create a map of Toronto with neighborhoods superimposed on top

In [10]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
tlocation = geolocator.geocode(address)
tlatitude = tlocation.latitude
tlongitude = tlocation.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(tlatitude, tlongitude))
map_toronto = folium.Map(location=[tlatitude, tlongitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.


#### Now we will explore each neighborhood and get the top 100 venues within a radius of 500 meters of each one using the Foursquare API.

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [12]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  ) 
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |


#### Examining the final dataframe of Toronto

In [13]:
toronto_venues['City']='Toronto'
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | City |
|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park | Toronto |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop | Toronto |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena | Toronto |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop | Toronto |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant | Toronto |


### 2.3.3 New York Dataframe

In [14]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork = json.load(json_data)

neighborhoods_data = newyork['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    # GeoJSON point coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
newyork_data = pd.DataFrame(rows, columns=column_names)
newyork_data.head()

Data downloaded!


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


#### Create a map of New York with neighborhoods superimposed on top

In [15]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
nlocation = geolocator.geocode(address)
nlatitude = nlocation.latitude
nlongitude = nlocation.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(nlatitude, nlongitude))
map_newyork = folium.Map(location=[nlatitude, nlongitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(newyork_data['Latitude'], newyork_data['Longitude'], newyork_data['Borough'], newyork_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Now we will explore each neighborhood and get the top 100 venues within a radius of 500 meters of each one using the Foursquare API.

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues_newyork = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues_newyork.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues_newyork)

In [18]:
newyork_venues = getNearbyVenues(names=newyork_data['Neighborhood'],
                                   latitudes=newyork_data['Latitude'],
                                   longitudes=newyork_data['Longitude']
                                  ) 
newyork_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |


#### Examining the final dataframe of New York

In [20]:
newyork_venues["City"]='Newyork'
newyork_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | City |
|---|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop | Newyork |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy | Newyork |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy | Newyork |
| 3 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop | Newyork |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop | Newyork |
