# 1. Introduction/Business Problem

Toronto, London and New York are famous tourist destinations. They are diverse in many ways, and all three are multicultural cities as well as the financial hubs of their respective countries. We want to explore how similar or dissimilar they are from a tourist's point of view with regard to food, accommodation, attractions and more.
The tourism industry is important both for the benefits it brings directly and for its role as a commercial activity that creates demand and growth in many other industries. Tourism not only contributes to economic activity but also generates employment and revenue and plays a significant role in development. Many countries, such as Turkey, France and Italy, depend heavily on the tourism industry to cover their expenses.

Knowing what makes tourists choose a travel destination is crucial information for anyone working in the travel business. For anyone who relies on tourists and tourism, understanding consumer behavior is therefore essential. In this project I will focus on venues such as restaurants, hotels, parks, cafes and cinemas in London, Toronto and New York, and cluster the neighborhoods of these cities in order to understand the similarities and differences between them. The target audience is therefore tourists and travel agencies. Tourists can explore the neighborhoods in each city and decide which city they prefer to visit; if they have already been to one of these cities and enjoyed it, they can pick a similar city for their next trip. Travel agencies can likewise recommend destinations to their customers based on the customers' past experience and on the similarities and dissimilarities between the cities.


# 2. Data

This project analyzes venues in the cities of Toronto, New York and London.
The data described below is used for this analysis.

## 2.1  Boroughs and neighborhoods
### 2.1.1 London:
London has 32 boroughs in total. To explore, analyze and segment its neighborhoods, we need the longitude and latitude of each neighborhood and borough.
This dataset is freely available on the web. I used this file: https://skgrange.github.io/www/data/london_sport.json

### 2.1.2 New York:
New York has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we will essentially need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough as well as the latitude and longitude coordinates of each neighborhood.
Luckily, this dataset exists for free on the web. Here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

### 2.1.3 Toronto:
For Toronto I used the Wikipedia table that lists the postal code and borough of each neighborhood (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M ). For the longitude and latitude of each neighborhood I used a CSV file available at: http://cocl.us/Geospatial_data

## 2.2 Foursquare API
In order to explore and cluster the neighborhoods, we need to search for venues in each of them. The Foursquare API (accessed via the requests library in Python) provides venue information for each neighborhood.
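As a rough illustration of such a venue lookup, here is a minimal sketch. It assumes the legacy Foursquare v2 `venues/explore` endpoint and a response layout of `response -> groups -> items -> venue`; the helper names, the default radius and limit, the version date and the sample payload are all hypothetical, and the credentials are placeholders that would have to be supplied. Nothing here is taken from the project's own code.

```python
# Assumed legacy v2 endpoint; check the current Foursquare documentation before relying on it.
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_params(lat, lon, client_id, client_secret,
                         version='20180605', radius=500, limit=100):
    """Build the query parameters for one neighborhood (hypothetical helper)."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,              # API version date (assumed value)
        'll': f'{lat},{lon}',      # "latitude,longitude" of the neighborhood
        'radius': radius,          # search radius in meters
        'limit': limit,            # maximum number of venues to return
    }


def parse_venues(payload):
    """Extract name, coordinates and first category from an explore response."""
    items = payload['response']['groups'][0]['items']
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or [{}]
        venues.append({
            'Venue': venue['name'],
            'Venue Latitude': venue['location']['lat'],
            'Venue Longitude': venue['location']['lng'],
            'Venue Category': categories[0].get('name'),
        })
    return venues


def fetch_venues(lat, lon, client_id, client_secret):
    """Call the API for one location; requires network access and valid credentials."""
    import requests  # imported lazily so the helpers above work without it
    params = build_explore_params(lat, lon, client_id, client_secret)
    response = requests.get(EXPLORE_URL, params=params, timeout=10)
    response.raise_for_status()
    return parse_venues(response.json())


# Hand-written payload with a made-up venue, used only to exercise the parser offline.
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Cafe',
                   'location': {'lat': 43.7533, 'lng': -79.3297},
                   'categories': [{'name': 'Café'}]}},
    ]}]}
}
print(parse_venues(sample_payload))
```

Keeping the parsing separate from the HTTP call makes it possible to check the dataframe-building logic without spending API quota.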

## 2.3 Example of Dataframes
For each city we need a dataframe that contains the neighborhood name, latitude and longitude.


Let's import all the dependencies that we will need.

In [1]:
import pandas as pd
import numpy as np
!pip install lxml
import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
print('Libraries imported.')

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/bd/78/56a7c88a57d0d14945472535d0df9fb4bbad7d34ede658ec7961635c790e/lxml-4.6.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.6.2
Libraries imported.


### 2.3.1 London Dataframe:

In [2]:
with open('london_sport.json') as json_data:
    london_json = json.load(json_data)

# Collect one row per sampled point, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
rows = []
for data in london_json['features']:
    # Take the first five coordinates listed in each borough's geometry
    # and label them <borough name>0 ... <borough name>4.
    for i in range(5):
        neighborhood = f"{data['properties']['name']}{i}"
        # GeoJSON stores coordinates as [longitude, latitude]
        neighborhood_latlon = data['geometry']['coordinates'][0][i]
        rows.append({'Neighborhood': neighborhood,
                     'Latitude': neighborhood_latlon[1],
                     'Longitude': neighborhood_latlon[0]})

london_data = pd.DataFrame(rows, columns=['Neighborhood', 'Latitude', 'Longitude'])
london_data.head()

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Bromley0 | 51.442884 | 0.031639 |
| 1 | Bromley1 | 51.440465 | 0.041526 |
| 2 | Bromley2 | 51.423211 | 0.063333 |
| 3 | Bromley3 | 51.431508 | 0.076946 |
| 4 | Bromley4 | 51.413598 | 0.109226 |
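To make the extraction step easier to follow without the downloaded file, the sketch below applies the same idea to a single hand-built feature. The function name `sample_boundary_points` is hypothetical, the feature is a toy that reuses the five Bromley points from the output above (the repeated closing point is my addition), and it assumes a Polygon-style geometry whose first element is a list of `[longitude, latitude]` pairs.

```python
def sample_boundary_points(feature, n_points=5):
    """Return up to n_points labelled rows from the first coordinate list of a feature."""
    name = feature['properties']['name']
    ring = feature['geometry']['coordinates'][0]
    rows = []
    for i, point in enumerate(ring[:n_points]):
        rows.append({
            'Neighborhood': f'{name}{i}',  # e.g. Bromley0, Bromley1, ...
            'Latitude': point[1],          # GeoJSON order is [longitude, latitude]
            'Longitude': point[0],
        })
    return rows


# Toy feature: assumed Polygon layout, not a copy of the real file.
toy_feature = {
    'properties': {'name': 'Bromley'},
    'geometry': {
        'type': 'Polygon',
        'coordinates': [[
            [0.031639, 51.442884],
            [0.041526, 51.440465],
            [0.063333, 51.423211],
            [0.076946, 51.431508],
            [0.109226, 51.413598],
            [0.031639, 51.442884],
        ]],
    },
}

for row in sample_boundary_points(toy_feature):
    print(row)
```

Unlike indexing with a fixed `range(5)`, slicing with `ring[:n_points]` simply returns fewer rows when a geometry lists fewer than five points.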


### 2.3.2 Toronto Dataframe

In [5]:
# Read the postal-code table from Wikipedia (read_html returns a list of tables)
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df = tables[0]

# Drop postal codes that have no borough assigned
df = df.replace('Not assigned', np.nan)
df = df.dropna(subset=['Borough']).reset_index(drop=True)

# Attach latitude and longitude by postal code
geo = pd.read_csv('http://cocl.us/Geospatial_data')
toronto_data = pd.merge(df, geo, on='Postal Code')
toronto_data = toronto_data.rename(columns={'Neighbourhood': 'Neighborhood'})
toronto_data[['Neighborhood', 'Latitude', 'Longitude']].head()


|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 |
| 1 | Victoria Village | 43.725882 | -79.315572 |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
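The cleaning and merge steps can be tried offline on a few rows. The sketch below is only an illustration: the postal codes and borough labels are made up, while the two neighborhood names and their coordinates are copied from the output above. It assumes the same column names as the Toronto cell (`Postal Code`, `Borough`, `Neighbourhood`).

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the Wikipedia postal-code table.
postal = pd.DataFrame({
    'Postal Code': ['X1A', 'X2A', 'X3A'],
    'Borough': ['Not assigned', 'Borough A', 'Borough A'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Rows standing in for the geospatial CSV (hypothetical postal codes).
coords = pd.DataFrame({
    'Postal Code': ['X2A', 'X3A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Treat 'Not assigned' as missing and drop rows without a borough.
cleaned = (postal.replace('Not assigned', np.nan)
                 .dropna(subset=['Borough'])
                 .reset_index(drop=True))

# An inner merge keeps only postal codes present in both tables.
merged = (cleaned.merge(coords, on='Postal Code', how='inner')
                 .rename(columns={'Neighbourhood': 'Neighborhood'}))

print(merged[['Neighborhood', 'Latitude', 'Longitude']])
```

Because the merge is an inner join, a postal code missing from either table silently disappears, so comparing `len(cleaned)` with `len(merged)` is a cheap sanity check.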


### 2.3.3 New York Dataframe

In [6]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork = json.load(json_data)

neighborhoods_data = newyork['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
rows = []
for data in neighborhoods_data:
    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    rows.append({'Borough': data['properties']['borough'],
                 'Neighborhood': data['properties']['name'],
                 'Latitude': neighborhood_latlon[1],
                 'Longitude': neighborhood_latlon[0]})

newyork_data = pd.DataFrame(rows, columns=column_names)
newyork_data.head()

Data downloaded!


|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
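The imports include `json_normalize`, which offers a loop-free way to flatten this kind of feature list. The sketch below is an alternative under stated assumptions, not the approach used above: it requires pandas 1.0 or later for `pd.json_normalize`, and the two features are hand-built from the first two rows of the output rather than read from the downloaded file, whose features may carry additional properties.

```python
import pandas as pd

# Hand-built features mirroring the assumed layout of newyork['features'].
features = [
    {'type': 'Feature',
     'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]}},
    {'type': 'Feature',
     'properties': {'borough': 'Bronx', 'name': 'Co-op City'},
     'geometry': {'type': 'Point', 'coordinates': [-73.829939, 40.874294]}},
]

# Nested dictionaries become dotted column names such as 'properties.name';
# the coordinate list stays as a single object column.
flat = pd.json_normalize(features)

ny = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    'Latitude': flat['geometry.coordinates'].apply(lambda c: c[1]),
    'Longitude': flat['geometry.coordinates'].apply(lambda c: c[0]),
})
print(ny)
```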
