# Capstone Project - The Battle of Neighborhoods
## PROBLEM & BACKGROUND
Toronto and New York are among the best-known cities in the world. Both are diverse, multicultural, and the financial hubs of their respective countries. We want to explore how similar or dissimilar they are from a tourist's point of view in terms of food, accommodation, sights, and more.
Tourism is one of the pillars of the economy, and people most often visit countries that are rich in heritage and sufficiently developed from a foreign visitor's perspective, for example by offering a friendly environment. Every city is unique in its own way and offers something new. Location information for almost any place in the world is now at our fingertips, which makes cities easier to explore. Tourists therefore choose destinations on the basis of the available information, and a comparison between two cities helps them pick the places that match their preferences.

## DATA DESCRIPTION
For this problem, we use the Foursquare API to explore the two cities in terms of their neighborhoods. The data include information about the places around each neighborhood, such as restaurants, hotels, coffee shops, parks, theaters, art galleries, and museums. We selected one borough from each city and analyze its neighborhoods: Manhattan from New York and Downtown Toronto from Toronto. We use a machine learning technique, clustering, to segment the neighborhoods into groups with similar venues on the basis of each neighborhood's data. The venue categories are ranked by how frequently they occur in a neighborhood, which we treat as an indication of foot traffic (activity). This helps to locate tourist areas and hubs, and we can then judge the similarity or dissimilarity between the two cities on that basis.
## METHODOLOGY
We selected one borough from each city to explore its neighborhoods. The data exploration, analysis, and visualization are done in the same way for both boroughs, but separately.

We start with data processing and exploration.

In [72]:
!pip install geopy
!pip install folium
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
from sklearn.cluster import KMeans
import folium 
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors

print(">>>>>>> Installed!")

>>>>>>> Installed!


### Get datasource

In [73]:
List_url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(List_url).text
soup = BeautifulSoup(source, 'xml')
table=soup.find ('table')
column_names=['Postalcode','Borough','Neighborhood']
df = pd.DataFrame(columns=column_names)
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data
        
df=df[df['Borough']!='Not assigned']

df.head()
df.shape

(103, 3)

### Data preprocessing and cleaning

In [74]:
df1 = df[df.Borough != 'Not assigned']

# Combining the neighbourhoods with same Postalcode
df2 = df1.groupby(['Postalcode','Borough'], sort=False).agg(', '.join)
df2.reset_index(inplace=True)

# Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough
df2['Neighborhood'] = np.where(df2['Neighborhood'] == 'Not assigned',df2['Borough'], df2['Neighborhood'])

df2.shape
df2.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
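The grouping and fallback steps above can be checked on a small table. The following is a minimal sketch, assuming a hand-made DataFrame; `toy` and its rows are illustrative and are not taken from the scraped data.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; not taken from the Wikipedia table.
toy = pd.DataFrame({
    'Postalcode': ['M5A', 'M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Not assigned'],
})

# Join all neighbourhood names that share a postal code and borough.
combined = (
    toy.groupby(['Postalcode', 'Borough'], sort=False)['Neighborhood']
       .agg(', '.join)
       .reset_index()
)

# Fall back to the borough name where no neighbourhood was assigned.
combined['Neighborhood'] = np.where(
    combined['Neighborhood'] == 'Not assigned',
    combined['Borough'],
    combined['Neighborhood'],
)

print(combined)
```

Passing `sort=False` keeps the postal codes in their original order, which makes the result easier to compare with the source table.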


### Importing the CSV file containing the latitudes and longitudes for various neighbourhoods in Canada

In [75]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [76]:
lat_lon.rename(columns={'Postal Code':'Postalcode'},inplace=True)
df3 = pd.merge(df2,lat_lon,on='Postalcode')
df3.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
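`pd.merge` performs an inner join by default, so any postal code without coordinates is dropped silently. The following sketch, which uses made-up frames (`areas`, `coords`) rather than the notebook's data, shows one way to list such rows before relying on the merged result.

```python
import pandas as pd

# Illustrative frames; the postal codes and coordinates are made up.
areas = pd.DataFrame({
    'Postalcode': ['M1A', 'M2B', 'M9Z'],
    'Neighborhood': ['Alpha', 'Beta', 'Gamma'],
})
coords = pd.DataFrame({
    'Postalcode': ['M1A', 'M2B'],
    'Latitude': [43.70, 43.75],
    'Longitude': [-79.40, -79.35],
})

# A left join keeps every neighbourhood; indicator=True records where each row came from.
checked = pd.merge(areas, coords, on='Postalcode', how='left', indicator=True)

# Rows flagged 'left_only' have no coordinates and would vanish in an inner join.
missing = checked.loc[checked['_merge'] == 'left_only', 'Postalcode'].tolist()
print(missing)
```

An empty `missing` list would indicate that the inner join used above loses no neighbourhoods.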


### From here, the notebook clusters and plots the neighbourhoods whose Borough contains 'Toronto'

In [77]:
df4 = df3[df3['Borough'].str.contains('Toronto',regex=False)]
df4.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [78]:
downtown_toronto_data = df4[df4['Borough'] == 'Downtown Toronto'].reset_index(drop=True)

In [79]:
downtown_toronto_data.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [92]:
# Sorting
# keep only the Downtown Toronto rows and reset the index
downtown_toronto_data = downtown_toronto_data[downtown_toronto_data['Borough'] == 'Downtown Toronto'].reset_index(drop=True)

In [93]:
# drop the 'Postalcode' column; errors='ignore' keeps the cell safe to re-run once the column is gone
downtown_toronto_data=downtown_toronto_data.drop(columns=['Postalcode'], errors='ignore')

In [94]:
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
1,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


Now we move on to Thailand. We select "Bangkok" as the borough and analyze its neighborhoods later.

In [84]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bangkok,Yan Nawa,13.696944,100.543056
1,Bangkok,Watthana,13.742222,100.585833
2,Bangkok,Wang Thonglang,13.7864,100.6087
3,Bangkok,Thung Khru,13.6472,100.4958
4,Bangkok,Thon Buri,13.725,100.485833


In [89]:
# Creating new DataFrame bangkok_data from map_bangkok
bangkok_data = map_bangkok[map_bangkok['Borough'] == 'Bangkok'].reset_index(drop=True)
bangkok_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bangkok,Yan Nawa,13.696944,100.543056
1,Bangkok,Watthana,13.742222,100.585833
2,Bangkok,Wang Thonglang,13.7864,100.6087
3,Bangkok,Thung Khru,13.6472,100.4958
4,Bangkok,Thon Buri,13.725,100.485833


In [95]:
bangkok_data.to_excel("bangkok.xlsx") 

## Foursquare API - Toronto

In [142]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'HRMBKZUASN1NWO005IQK4TGG15UVEY5GCLJCYXHXW0VDP00K' # your Foursquare ID
CLIENT_SECRET = 'JSXFO23NR2OMICQSZRFQYDAZG1GMNRALXXACAFVNF5CGAM4C' # your Foursquare Secret
#VERSION = '20200404'
VERSION = '20180604'
limit = 20
print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:HRMBKZUASN1NWO005IQK4TGG15UVEY5GCLJCYXHXW0VDP00K
CLIENT_SECRET:JSXFO23NR2OMICQSZRFQYDAZG1GMNRALXXACAFVNF5CGAM4C


In [119]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [125]:
# get the geographical coordinates of >>> Bangkok, Thailand
address = ' Bangkok, Thailand'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude_bangkok = location.latitude
longitude_bangkok = location.longitude
print(address,"latitude",latitude_bangkok, "& " "longitude" ,longitude_bangkok)

 Bangkok, Thailand latitude 13.7542529 & longitude 100.493087
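The geocoding cells above read `.latitude` directly from the result, which fails if the lookup returns nothing (geopy's `geocode` returns `None` when no match is found). The following sketch shows a small guard. The helper `coordinates_or_none` and the stand-ins `FakeLocation` and `fake_geocode` are hypothetical names introduced here so that the example runs without network access; in the notebook, `geolocator.geocode` would be passed instead.

```python
def coordinates_or_none(geocode, address):
    """Return (latitude, longitude) for address, or None if the lookup finds nothing."""
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


# Stand-in for a geocoding result so the sketch runs offline (hypothetical class).
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


# Stand-in for geolocator.geocode; it only knows the one address used above.
def fake_geocode(address):
    known = {'Bangkok, Thailand': FakeLocation(13.7542529, 100.493087)}
    return known.get(address.strip())


print(coordinates_or_none(fake_geocode, ' Bangkok, Thailand'))
print(coordinates_or_none(fake_geocode, 'Nowhere'))
```

Checking for `None` before building the map gives a clearer failure than an `AttributeError` raised a few lines later.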


# VISUALIZATION
We visualize the data at several stages. First, we visualize the neighborhoods of the selected borough to confirm that its coordinates are correct. Then, after clustering the neighborhoods, we visualize the clusters in order to name them. Naming the clusters is important because the names identify the kinds of areas or places found in each cluster.
#### (Before Clustering)

### Downtown Toronto

In [122]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto

In [123]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a marker cluster object for the neighborhoods in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

### Bangkok

In [101]:
map_bangkok.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bangkok,Yan Nawa,13.696944,100.543056
1,Bangkok,Watthana,13.742222,100.585833
2,Bangkok,Wang Thonglang,13.7864,100.6087
3,Bangkok,Thung Khru,13.6472,100.4958
4,Bangkok,Thon Buri,13.725,100.485833


In [127]:
# let's visualize Bangkok and the neighborhoods in it.
# create map of Bangkok using latitude and longitude values
# (this rebinds map_bangkok, which held the neighborhood DataFrame until now)
map_bangkok = folium.Map(location=[latitude_bangkok, longitude_bangkok], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bangkok_data['Latitude'], bangkok_data['Longitude'], bangkok_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bangkok)  
    
map_bangkok

In [128]:
map_bangkok = folium.Map(location=[latitude_bangkok, longitude_bangkok], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_bangkok)

# add markers to map
for lat, lng, label in zip(bangkok_data['Latitude'], bangkok_data['Longitude'], bangkok_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_bangkok

# ANALYSIS
We analyze the neighborhoods of both boroughs through one-hot encoding (giving ‘1’ if a venue belongs to a category and ‘0’ if it does not). On the basis of the one-hot encoding, we calculate the mean frequency of occurrence of each category and pick the top ten venue categories for each neighborhood. We treat the most frequent categories as an indication of foot traffic, that is, of the most visited kinds of places.
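The encoding and averaging described above can be illustrated on a toy table. This is a minimal sketch under stated assumptions: the `venues` frame, its neighbourhood names, and its categories are made up for illustration.

```python
import pandas as pd

# Illustrative venues; neighbourhood and category names are made up.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Bar', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it and 0 otherwise.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of each 0/1 column is that category's share of the neighbourhood's venues.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

In this toy case half of the venues in neighbourhood A are cafes, so its `Cafe` frequency is 0.5. The values describe how common a category is among the listed venues; the sketch itself uses no visit counts.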
## Exploring Neighborhoods in Downtown Toronto

In [143]:
# Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
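The function above indexes straight into the response (`["response"]['groups'][0]['items']` and `['categories'][0]`), so a neighbourhood with no results or a venue without a category would raise an error. The following sketch shows a more defensive way to flatten one response. The helper name `extract_venues`, the `'Unknown'` fallback label, and the hand-written `sample` payload are assumptions made for illustration; only the field names are taken from the code above.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples, skipping incomplete items."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if 'name' not in venue or 'lat' not in location or 'lng' not in location:
            continue  # skip venues without a name or coordinates
        # Fall back to a placeholder label when no category is listed (assumed label).
        category = categories[0].get('name', 'Unknown') if categories else 'Unknown'
        rows.append((name, lat, lng, venue['name'],
                     location['lat'], location['lng'], category))
    return rows


# Hand-written payload mimicking the fields that getNearbyVenues reads.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'No Category Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Regent Park / Harbourfront', 43.65426, -79.360636)
print(rows)
print(extract_venues({'response': {}}, 'Empty', 0.0, 0.0))
```

A response with no groups yields an empty list, so the loop over neighbourhoods can continue instead of stopping at the first sparse area.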

In [144]:
# Run the above function on each neighborhood and create a new dataframe called downtown_toronto_venues.
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Regent Park / Harbourfront
Queen's Park / Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst
 Quay / South Niagara / Island airport
Rosedale
Stn A PO Boxes
St. James Town / Cabbagetown
First Canadian Place / Underground city
Church and Wellesley


In [145]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(359, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park / Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park / Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park / Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Regent Park / Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Regent Park / Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


In [146]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,20,20,20,20,20,20
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst\n Quay / South Niagara / Island airport,17,17,17,17,17,17
Central Bay Street,20,20,20,20,20,20
Christie,18,18,18,18,18,18
Church and Wellesley,20,20,20,20,20,20
Commerce Court / Victoria Hotel,20,20,20,20,20,20
First Canadian Place / Underground city,20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
Harbourfront East / Union Station / Toronto Islands,20,20,20,20,20,20
Kensington Market / Chinatown / Grange Park,20,20,20,20,20,20


In [147]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 125 unique categories.


## Analyzing Each Neighborhood

In [148]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,...,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [149]:
# Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [150]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
            venue  freq
0    Cocktail Bar  0.10
1  Farmers Market  0.10
2        Beer Bar  0.10
3          Bistro  0.05
4          Bakery  0.05


----CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst
 Quay / South Niagara / Island airport----
                 venue  freq
0      Airport Service  0.18
1       Airport Lounge  0.12
2     Airport Terminal  0.12
3  Rental Car Location  0.06
4                  Bar  0.06


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.30
1   Italian Restaurant  0.10
2  Japanese Restaurant  0.10
3   Chinese Restaurant  0.05
4                  Bar  0.05


----Christie----
           venue  freq
0  Grocery Store  0.22
1           Café  0.17
2           Park  0.11
3    Candy Store  0.06
4     Restaurant  0.06


----Church and Wellesley----
          venue  freq
0          Park  0.05
1     Juice Bar  0.05
2  Burger Joint  0.05
3     Bookstore  0.05
4         Diner  0.05


----Commerce 

In [151]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [152]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Cocktail Bar,Beer Bar,Farmers Market,Park,Bakery,Coffee Shop,Liquor Store,Bistro,Restaurant,Concert Hall
1,CN Tower / King and Spadina / Railway Lands / ...,Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Bar,Airport,Airport Food Court,Airport Gate,Boutique,Coffee Shop
2,Central Bay Street,Coffee Shop,Italian Restaurant,Japanese Restaurant,Gastropub,Modern European Restaurant,Park,Miscellaneous Shop,Bubble Tea Shop,Sandwich Place,Bar
3,Christie,Grocery Store,Café,Park,Athletics & Sports,Italian Restaurant,Diner,Coffee Shop,Nightclub,Candy Store,Restaurant
4,Church and Wellesley,Park,Ramen Restaurant,Restaurant,Pub,Salon / Barbershop,Bookstore,Beer Bar,Pizza Place,Breakfast Spot,Creperie
5,Commerce Court / Victoria Hotel,Café,Gastropub,Coffee Shop,Restaurant,American Restaurant,Pub,Beer Bar,Bakery,Deli / Bodega,Tea Room
6,First Canadian Place / Underground city,Café,Coffee Shop,Restaurant,Steakhouse,Gluten-free Restaurant,Gym,Gym / Fitness Center,Deli / Bodega,Pizza Place,Pub
7,"Garden District, Ryerson",Café,Coffee Shop,Burger Joint,Steakhouse,Music Venue,Burrito Place,Clothing Store,Plaza,Ramen Restaurant,Comic Shop
8,Harbourfront East / Union Station / Toronto Is...,Café,Plaza,Park,Italian Restaurant,Supermarket,Lake,Japanese Restaurant,Skating Rink,Hotel,Sporting Goods Shop
9,Kensington Market / Chinatown / Grange Park,Café,Vietnamese Restaurant,Mexican Restaurant,Caribbean Restaurant,Farmers Market,Cheese Shop,Organic Grocery,Cocktail Bar,Coffee Shop,Belgian Restaurant
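The column labels above are built from a three-item suffix list with a fallback to "th", which is correct for the ten columns used here but would produce "21th" if more than twenty columns were requested. The following sketch shows an alternative; the helper name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same column labels as above for the top ten venues.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```

For `num_top_venues = 10` this yields the same labels as the cell above.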


## Clustering Neighborhoods

In [153]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 3, 1, 1, 4, 4, 1, 1, 1], dtype=int32)
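The value `kclusters = 5` above is set by hand. One common heuristic for comparing candidate values, which the original notebook does not use, is the silhouette score from scikit-learn. The following sketch runs it on made-up points (`X`) rather than on the venue frequencies, so it only illustrates the mechanics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative data: three tight groups of points, not the venue frequencies.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
offsets = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
X = np.vstack([c + offsets for c in centers])

# Fit k-means for several candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score separates the points most cleanly.
best_k = max(scores, key=scores.get)
print(best_k)
```

With only a few neighbourhoods per borough the scores can be noisy, so the result is best read as a hint alongside the inspection of the clusters, not as a rule.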

In [154]:
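# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
downtown_toronto_merged = downtown_toronto_data

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of downtown_toronto_grouped (sorted by neighborhood name),
# while this assignment is positional, so it is only correct if both frames list the neighborhoods in the same order
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_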

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,2,Coffee Shop,Bakery,Breakfast Spot,Park,Distribution Center,Spa,Pub,Dessert Shop,Historic Site,Farmers Market
1,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1,Coffee Shop,Diner,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Boutique,Burger Joint,Burrito Place,Creperie
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3,Café,Coffee Shop,Burger Joint,Steakhouse,Music Venue,Burrito Place,Clothing Store,Plaza,Ramen Restaurant,Comic Shop
3,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Gastropub,Coffee Shop,Restaurant,Japanese Restaurant,BBQ Joint,Café,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Hotel
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Cocktail Bar,Beer Bar,Farmers Market,Park,Bakery,Coffee Shop,Liquor Store,Bistro,Restaurant,Concert Hall
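Cluster labels come out in the row order of the grouped frame, and `groupby` sorts its keys by default, so that order generally differs from the order of the neighbourhood table. The following sketch, using made-up names and labels, shows one way to attach labels by neighbourhood name instead of by position; `label_table` is a hypothetical helper frame.

```python
import pandas as pd

# Illustrative data; the order mimics an unsorted neighbourhood table.
data = pd.DataFrame({'Neighborhood': ['Rosedale', 'Christie', 'Berczy Park']})

# groupby sorts keys by default, so labels follow alphabetical order.
grouped_names = sorted(data['Neighborhood'])  # ['Berczy Park', 'Christie', 'Rosedale']
labels = [2, 0, 1]                            # made-up labels, one per sorted name

# Pair every label with the name it was computed for.
label_table = pd.DataFrame({'Neighborhood': grouped_names, 'Cluster Labels': labels})

# Merging on the name attaches each label to the right row regardless of order.
merged = data.merge(label_table, on='Neighborhood', how='left')
print(merged)
```

In the notebook, the same idea would pair `downtown_toronto_grouped['Neighborhood']` with `kmeans.labels_` before joining.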


In [156]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
### Cluster 1 (Airport Lounge, Coffee Shop, Cafe, Restaurants & Grocery Store)

In [159]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,CN Tower / King and Spadina / Railway Lands / ...,Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Bar,Airport,Airport Food Court,Airport Gate,Boutique,Coffee Shop


### Cluster 2 (Gastropubs)

In [160]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Queen's Park / Ontario Provincial Government,Coffee Shop,Diner,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Boutique,Burger Joint,Burrito Place,Creperie
3,St. James Town,Gastropub,Coffee Shop,Restaurant,Japanese Restaurant,BBQ Joint,Café,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Hotel
4,Berczy Park,Cocktail Bar,Beer Bar,Farmers Market,Park,Bakery,Coffee Shop,Liquor Store,Bistro,Restaurant,Concert Hall
7,Richmond / Adelaide / King,Asian Restaurant,Steakhouse,Gym / Fitness Center,Plaza,Restaurant,Bar,Hotel,Pizza Place,Seafood Restaurant,Coffee Shop
8,Harbourfront East / Union Station / Toronto Is...,Café,Plaza,Park,Italian Restaurant,Supermarket,Lake,Japanese Restaurant,Skating Rink,Hotel,Sporting Goods Shop
9,Toronto Dominion Centre / Design Exchange,Coffee Shop,Deli / Bodega,Café,Gastropub,Hotel,Japanese Restaurant,Beer Bar,Pub,Bar,Bakery
10,Commerce Court / Victoria Hotel,Café,Gastropub,Coffee Shop,Restaurant,American Restaurant,Pub,Beer Bar,Bakery,Deli / Bodega,Tea Room
12,Kensington Market / Chinatown / Grange Park,Café,Vietnamese Restaurant,Mexican Restaurant,Caribbean Restaurant,Farmers Market,Cheese Shop,Organic Grocery,Cocktail Bar,Coffee Shop,Belgian Restaurant


### Cluster 3 (Cafes)

In [161]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Regent Park / Harbourfront,Coffee Shop,Bakery,Breakfast Spot,Park,Distribution Center,Spa,Pub,Dessert Shop,Historic Site,Farmers Market
15,Stn A PO Boxes,Café,Cocktail Bar,Farmers Market,Bakery,Seafood Restaurant,Park,Concert Hall,Restaurant,Beer Bar,Jazz Club
16,St. James Town / Cabbagetown,Café,Restaurant,Bakery,Gastropub,General Entertainment,Diner,Deli / Bodega,Indian Restaurant,Italian Restaurant,Japanese Restaurant
18,Church and Wellesley,Park,Ramen Restaurant,Restaurant,Pub,Salon / Barbershop,Bookstore,Beer Bar,Pizza Place,Breakfast Spot,Creperie


### Cluster 4 (Coffee Shop, Cafe, Park & Japanese Restaurant)

In [162]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",Café,Coffee Shop,Burger Joint,Steakhouse,Music Venue,Burrito Place,Clothing Store,Plaza,Ramen Restaurant,Comic Shop
11,University of Toronto / Harbord,Bookstore,Restaurant,Bakery,Japanese Restaurant,Yoga Studio,Dessert Shop,Café,College Gym,Comfort Food Restaurant,Beer Bar


### Cluster 5 (Seafood, steakhouse, Hotel & Cafe)

In [163]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Bay Street,Coffee Shop,Italian Restaurant,Japanese Restaurant,Gastropub,Modern European Restaurant,Park,Miscellaneous Shop,Bubble Tea Shop,Sandwich Place,Bar
6,Christie,Grocery Store,Café,Park,Athletics & Sports,Italian Restaurant,Diner,Coffee Shop,Nightclub,Candy Store,Restaurant
14,Rosedale,Park,Trail,Playground,Cocktail Bar,Concert Hall,Comic Shop,Comfort Food Restaurant,College Gym,Coffee Shop,Church
17,First Canadian Place / Underground city,Café,Coffee Shop,Restaurant,Steakhouse,Gluten-free Restaurant,Gym,Gym / Fitness Center,Deli / Bodega,Pizza Place,Pub
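Naming the clusters by eye can be supported with a small summary. The following sketch counts, for each cluster, which category appears most often as the top venue. The `merged` frame here is made up and merely stands in for `downtown_toronto_merged`.

```python
import pandas as pd

# Illustrative rows; cluster labels and venue names are made up.
merged = pd.DataFrame({
    'Cluster Labels': [0, 1, 1, 1, 2, 2],
    '1st Most Common Venue': ['Airport Service', 'Cafe', 'Cafe',
                              'Coffee Shop', 'Park', 'Park'],
})

# For each cluster, take the category that is most often the top venue.
summary = (
    merged.groupby('Cluster Labels')['1st Most Common Venue']
          .agg(lambda s: s.value_counts().index[0])
)
print(summary)
```

When two categories tie within a cluster, the choice between them is arbitrary, so the summary is a starting point for a name and not a replacement for reading the tables above.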


## Exploring Neighborhoods in Bangkok

In [171]:
# Let's create a function to repeat the same process to all the neighborhoods in Bangkok
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# Now write the code to run the above function on each neighborhood and create a new dataframe called bangkok_venues
bangkok_venues = getNearbyVenues(names=bangkok_data['Neighborhood'],
                                   latitudes=bangkok_data['Latitude'],
                                   longitudes=bangkok_data['Longitude'],
                                  )

print('There are {} uniques categories.'.format(len(bangkok_venues['Venue Category'].unique())))

Yan Nawa
Watthana
Wang Thonglang
Thung Khru
Thon Buri
Thawi Watthana
Taling Chan
Suan Luang
Sathon
Saphan Sung
Samphanthawong
Sai Mai
Ratchathewi
Rat Burana
Prawet
Pom Prap Sattru Phai
Phra Nakhon
Phra Khanong
Phaya Thai
Phasi Charoen
Pathum Wan
Nong Khaem
Nong Chok
Min Buri
Lat Phrao
Lat Krabang
Lak Si
Khlong Toei
Khlong San
Khlong Sam Wa
Khan Na Yao
Huai Khwang
Dusit
Don Mueang
Din Daeng
Chom Thong
Chatuchak
Bueng Kum
Bangkok Yai
Bangkok Noi
Bang Sue
Bang Rak
Bang Phlat
Bang Na
Bang Khun Thian
Bang Kho Laem
Bang Khen
Bang Khae
Bang Kapi
Bang Bon
There are 125 uniques categories.
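
The list comprehension in `getNearbyVenues` assumes that every response contains a `groups` entry and that every venue has at least one category; a single incomplete item would raise an error and stop the loop. Below is a minimal sketch of a more defensive parser, assuming the same response layout as the code above. The helper `extract_venues` and the sample payload are hypothetical and made up for illustration only.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into rows, skipping incomplete items.

    `payload` is assumed to follow the layout used above:
    payload["response"]["groups"][0]["items"][i]["venue"].
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Skip venues without a name, coordinates, or category instead of raising.
        if (not categories or "name" not in venue
                or "lat" not in location or "lng" not in location):
            continue
        rows.append((name, lat, lng, venue["name"],
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Made-up payload: one complete venue and one with an empty category list.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Noodle Shop",
                   "location": {"lat": 13.70, "lng": 100.54},
                   "categories": [{"name": "Noodle House"}]}},
        {"venue": {"name": "Uncategorized Place",
                   "location": {"lat": 13.71, "lng": 100.55},
                   "categories": []}},
    ]}]}
}

sample_rows = extract_venues("Yan Nawa", 13.696944, 100.543056, sample_payload)
print(sample_rows)
```

Rows built this way have the same seven fields as the tuples appended to `venues_list`, so they could feed the same `pd.DataFrame` construction.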


In [172]:
bangkok_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bang Bon,7,7,7,7,7,7
Bang Kapi,20,20,20,20,20,20
Bang Khae,13,13,13,13,13,13
Bang Khen,17,17,17,17,17,17
Bang Kho Laem,15,15,15,15,15,15
Bang Khun Thian,20,20,20,20,20,20
Bang Na,7,7,7,7,7,7
Bang Phlat,9,9,9,9,9,9
Bang Rak,20,20,20,20,20,20
Bang Sue,13,13,13,13,13,13


## Analyzing the Neighborhoods - Bangkok

In [173]:
# one hot encoding
bangkok_onehot = pd.get_dummies(bangkok_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bangkok_onehot['Neighborhood'] = bangkok_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bangkok_onehot.columns[-1]] + list(bangkok_onehot.columns[:-1])
bangkok_onehot = bangkok_onehot[fixed_columns]

bangkok_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Asian Restaurant,Auto Garage,BBQ Joint,Badminton Court,Bakery,Bar,Bed & Breakfast,...,Theater,Tonkatsu Restaurant,Tour Provider,Toy / Game Store,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wings Joint
0,Yan Nawa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Yan Nawa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Yan Nawa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Yan Nawa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Yan Nawa,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [174]:
# Group rows by neighborhood and take the mean frequency of each venue category
bangkok_grouped = bangkok_onehot.groupby('Neighborhood').mean().reset_index()
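
Taking the mean of the one-hot columns turns the 0/1 indicators into shares: each value is the fraction of a neighborhood's venues that fall in that category. For example, Bang Bon has 7 venues, so a category that appears once gets 1/7, which rounds to the 0.14 seen in the printed frequencies. The following is a small sketch on made-up toy data (the frame and its values are assumptions, not notebook data).

```python
import pandas as pd

# Toy data (made up): neighborhood A has 4 venues, neighborhood B has 2.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Hotel", "Park", "Park"],
})

# One indicator column per category; dtype=int keeps the values as 0/1 integers.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of the indicators = share of the neighborhood's venues in each category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```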

In [175]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in bangkok_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = bangkok_grouped[bangkok_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bang Bon----
                 venue  freq
0         Dessert Shop  0.14
1      Thai Restaurant  0.14
2  Japanese Restaurant  0.14
3    Convenience Store  0.14
4        Grocery Store  0.14


----Bang Kapi----
                  venue  freq
0          Noodle House  0.15
1       Thai Restaurant  0.10
2     Convenience Store  0.10
3     Hotpot Restaurant  0.05
4  Fast Food Restaurant  0.05


----Bang Khae----
                    venue  freq
0     Japanese Restaurant  0.15
1       Convenience Store  0.15
2       Hotpot Restaurant  0.08
3           Grocery Store  0.08
4  Shabu-Shabu Restaurant  0.08


----Bang Khen----
                 venue  freq
0         Noodle House  0.18
1     Asian Restaurant  0.12
2    Convenience Store  0.12
3          Coffee Shop  0.12
4  American Restaurant  0.06


----Bang Kho Laem----
                venue  freq
0        Noodle House  0.13
1  Chinese Restaurant  0.13
2         Coffee Shop  0.13
3     Thai Restaurant  0.13
4  Seafood Restaurant  0.07


----Bang 

In [177]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bangkok_grouped['Neighborhood']

for ind in np.arange(bangkok_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bangkok_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bang Bon,Shopping Mall,Convenience Store,Dessert Shop,Grocery Store,Japanese Restaurant,Noodle House,Thai Restaurant,Wings Joint,Fast Food Restaurant,Farmers Market
1,Bang Kapi,Noodle House,Convenience Store,Thai Restaurant,Shabu-Shabu Restaurant,Flea Market,Market,Massage Studio,Fast Food Restaurant,Multiplex,Museum
2,Bang Khae,Convenience Store,Japanese Restaurant,Café,Grocery Store,Fast Food Restaurant,Noodle House,Residential Building (Apartment / Condo),Coffee Shop,Shabu-Shabu Restaurant,Shopping Mall
3,Bang Khen,Noodle House,Asian Restaurant,Convenience Store,Coffee Shop,American Restaurant,Som Tum Restaurant,Vietnamese Restaurant,Garden Center,Flower Shop,Other Nightlife
4,Bang Kho Laem,Chinese Restaurant,Noodle House,Coffee Shop,Thai Restaurant,Fast Food Restaurant,Shopping Mall,Supermarket,Seafood Restaurant,Hotpot Restaurant,Convenience Store
5,Bang Khun Thian,Restaurant,Bakery,Thai Restaurant,Hotpot Restaurant,Fried Chicken Joint,Pizza Place,Noodle House,Coffee Shop,Clothing Store,Chinese Restaurant
6,Bang Na,Asian Restaurant,Noodle House,Satay Restaurant,Seafood Restaurant,Café,Fast Food Restaurant,Farmers Market,Electronics Store,Dumpling Restaurant,Wings Joint
7,Bang Phlat,Convenience Store,Seafood Restaurant,Fast Food Restaurant,Dog Run,Cocktail Bar,Café,Clothing Store,Other Repair Shop,Dumpling Restaurant,Farmers Market
8,Bang Rak,Noodle House,Hotel,Chinese Restaurant,Bar,Café,Breakfast Spot,Massage Studio,Seafood Restaurant,Coffee Shop,Hostel
9,Bang Sue,Noodle House,Thai Restaurant,Coffee Shop,Hotpot Restaurant,Seafood Restaurant,Convenience Store,Badminton Court,Bar,Flea Market,Fast Food Restaurant
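
Several categories in a neighborhood often share the same frequency (in Bang Bon, a number of categories sit at 0.14), so the order among tied categories in the table above depends on the sorting algorithm rather than on the data. Here is a minimal sketch of a deterministic alternative that breaks ties alphabetically. The helper `top_venues_with_ties` is hypothetical and not part of the notebook.

```python
import pandas as pd


def top_venues_with_ties(row, num_top_venues):
    """Return the top categories ordered by frequency (descending), then name."""
    freqs = row.iloc[1:].astype(float)  # drop the 'Neighborhood' entry
    table = pd.DataFrame({"venue": freqs.index, "freq": freqs.values})
    table = table.sort_values(["freq", "venue"], ascending=[False, True])
    return table["venue"].head(num_top_venues).tolist()


# Made-up row in the same layout as one row of the grouped frame.
toy_row = pd.Series({"Neighborhood": "A", "Park": 0.25, "Hotel": 0.25,
                     "Bakery": 0.5, "Zoo": 0.0})
print(top_venues_with_ties(toy_row, 3))
```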


# CLUSTERING NEIGHBORHOODS
We now apply the machine learning technique of clustering to segment the neighborhoods into groups with similar venue profiles. This helps us analyze the data from a tourist's perspective, since the tourist places can then be picked out from the cluster in which they are concentrated.

## Bangkok

In [178]:
# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

bangkok_grouped_clustering = bangkok_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bangkok_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 0, 2, 2, 0, 1, 0, 2], dtype=int32)

In [179]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
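bangkok_merged = bangkok_data.copy()  # copy so that bangkok_data itself is not modified

# add clustering labels (assigned by position, so this assumes bangkok_data has the same row order as bangkok_grouped)
bangkok_merged['Cluster Labels'] = kmeans.labels_

# join neighborhoods_venues_sorted onto bangkok_merged to add the top 10 venues for each neighborhood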
bangkok_merged = bangkok_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

bangkok_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bangkok,Yan Nawa,13.696944,100.543056,2,Fast Food Restaurant,Hotpot Restaurant,Café,Brewery,Thai Restaurant,Bike Rental / Bike Share,Golf Driving Range,Japanese Restaurant,Lounge,Convenience Store
1,Bangkok,Watthana,13.742222,100.585833,2,Café,Coffee Shop,Hotel,BBQ Joint,Japanese Restaurant,Chinese Restaurant,Shabu-Shabu Restaurant,Building,Speakeasy,Spa
2,Bangkok,Wang Thonglang,13.7864,100.6087,2,Hotpot Restaurant,Coffee Shop,Noodle House,Fast Food Restaurant,Café,Bowling Alley,Department Store,Convenience Store,Flea Market,Fried Chicken Joint
3,Bangkok,Thung Khru,13.6472,100.4958,0,Coffee Shop,Convenience Store,Noodle House,Japanese Curry Restaurant,Food Court,Café,Market,Stadium,Tea Room,Bus Stop
4,Bangkok,Thon Buri,13.725,100.485833,2,Noodle House,Asian Restaurant,Deli / Bodega,Bus Stop,Market,Boat or Ferry,Fried Chicken Joint,Seafood Restaurant,Thai Restaurant,Train Station
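
A caveat on the label assignment: `kmeans.labels_` follows the row order of `bangkok_grouped`, which `groupby` sorts alphabetically (starting at Bang Bon), while `bangkok_data` starts at Yan Nawa. Assigning the labels by position therefore gives the intended result only if both frames have the same row order and length. The sketch below attaches labels by neighborhood name instead; it uses made-up toy frames, and all names in it are hypothetical stand-ins.

```python
import pandas as pd

# Stand-in for bangkok_data: rows in reverse-alphabetical order.
toy_data = pd.DataFrame({
    "Neighborhood": ["C", "B", "A"],
    "Latitude": [13.3, 13.2, 13.1],
})

# Stand-in for the bangkok_grouped order (alphabetical) and the labels fitted on it.
toy_grouped_names = pd.Series(["A", "B", "C"], name="Neighborhood")
toy_labels = [0, 1, 2]

labels_by_name = pd.DataFrame({
    "Neighborhood": toy_grouped_names,
    "Cluster Labels": toy_labels,
})

# Merge on the name so each label lands on the neighborhood it was fitted for.
toy_merged = toy_data.merge(labels_by_name, on="Neighborhood", how="left")
print(toy_merged)
```

With a left merge, a neighborhood that returned no venues would simply get a missing label rather than shifting every label after it.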


In [181]:
# create map
map_clusters = folium.Map(location=[latitude_bangkok, longitude_bangkok], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bangkok_merged['Latitude'], bangkok_merged['Longitude'], bangkok_merged['Neighborhood'], bangkok_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# EXAMINE CLUSTERS
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
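
One way to support the naming step is to count how often each category appears as the top venue within each cluster. What follows is a minimal sketch on a made-up stand-in for the merged frame (the toy values are assumptions, not notebook results).

```python
import pandas as pd

# Made-up stand-in for the merged frame with only the columns the sketch needs.
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E"],
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": ["Noodle House", "Noodle House", "Hotel",
                              "Coffee Shop", "Coffee Shop"],
})

# Rows are clusters, columns are categories, cells count top-venue occurrences.
top_venue_counts = pd.crosstab(toy_merged["Cluster Labels"],
                               toy_merged["1st Most Common Venue"])
print(top_venue_counts)
```

A category that dominates one row but is rare in the others is a candidate for that cluster's name.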
## Bangkok
### Residential

In [183]:
bangkok_merged.loc[bangkok_merged['Cluster Labels'] == 0, bangkok_merged.columns[[1] + list(range(5, bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Thung Khru,Coffee Shop,Convenience Store,Noodle House,Japanese Curry Restaurant,Food Court,Café,Market,Stadium,Tea Room,Bus Stop
6,Taling Chan,Convenience Store,Soccer Field,Noodle House,Floating Market,Wings Joint,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
8,Sathon,Noodle House,Asian Restaurant,Chinese Restaurant,Dessert Shop,Breakfast Spot,BBQ Joint,Bakery,Farmers Market,Coffee Shop,Fast Food Restaurant
10,Samphanthawong,Art Gallery,Chinese Restaurant,Hotel,Coffee Shop,Restaurant,Hostel,Hotel Bar,Tour Provider,Cocktail Bar,Pool
11,Sai Mai,Deli / Bodega,Convenience Store,Steakhouse,Café,Bar,Noodle House,Thai Restaurant,Electronics Store,Flea Market,Fast Food Restaurant
17,Phra Khanong,Convenience Store,Fast Food Restaurant,Hotel,Italian Restaurant,Café,Pharmacy,Coffee Shop,Residential Building (Apartment / Condo),Japanese Restaurant,Spa
18,Phaya Thai,Sushi Restaurant,Coffee Shop,Som Tum Restaurant,Thai Restaurant,Restaurant,Japanese Restaurant,Fried Chicken Joint,Steakhouse,Supermarket,Food Court
25,Lat Krabang,Thai Restaurant,Hotel,Asian Restaurant,Coffee Shop,Restaurant,Steakhouse,Café,Noodle House,Electronics Store,Flea Market
29,Khlong Sam Wa,Pub,Grocery Store,Thai Restaurant,Coffee Shop,Japanese Restaurant,Pet Store,Convenience Store,Dim Sum Restaurant,Dessert Shop,Flea Market
34,Din Daeng,Convenience Store,Thai Restaurant,Park,Bus Station,Sports Club,Stadium,Hotel,Bakery,Food & Drink Shop,Dim Sum Restaurant


### Commercial Places

In [184]:
bangkok_merged.loc[bangkok_merged['Cluster Labels'] == 1, bangkok_merged.columns[[1] + list(range(5, bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Suan Luang,Noodle House,Asian Restaurant,Thai Restaurant,Café,Shopping Mall,Italian Restaurant,Bakery,Chinese Restaurant,Clothing Store,Coffee Shop
14,Prawet,Convenience Store,Noodle House,Comfort Food Restaurant,Halal Restaurant,Wings Joint,Flower Shop,Dim Sum Restaurant,Diner,Dog Run,Donburi Restaurant
15,Pom Prap Sattru Phai,Noodle House,Asian Restaurant,Chinese Restaurant,Café,American Restaurant,Steakhouse,Dumpling Restaurant,Museum,Mediterranean Restaurant,Market
16,Phra Nakhon,Massage Studio,Hostel,Hotel,Thai Restaurant,Tea Room,Café,Snack Place,Park,Coffee Shop,Beer Bar
32,Dusit,Noodle House,Asian Restaurant,Dessert Shop,Som Tum Restaurant,Convenience Store,Coffee Shop,Market,Farmers Market,Flea Market,Fast Food Restaurant
40,Bang Sue,Noodle House,Thai Restaurant,Coffee Shop,Hotpot Restaurant,Seafood Restaurant,Convenience Store,Badminton Court,Bar,Flea Market,Fast Food Restaurant
44,Bang Khun Thian,Restaurant,Bakery,Thai Restaurant,Hotpot Restaurant,Fried Chicken Joint,Pizza Place,Noodle House,Coffee Shop,Clothing Store,Chinese Restaurant


### Tourist Areas & Hubs

In [185]:
bangkok_merged.loc[bangkok_merged['Cluster Labels'] == 2, bangkok_merged.columns[[1] + list(range(5, bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Yan Nawa,Fast Food Restaurant,Hotpot Restaurant,Café,Brewery,Thai Restaurant,Bike Rental / Bike Share,Golf Driving Range,Japanese Restaurant,Lounge,Convenience Store
1,Watthana,Café,Coffee Shop,Hotel,BBQ Joint,Japanese Restaurant,Chinese Restaurant,Shabu-Shabu Restaurant,Building,Speakeasy,Spa
2,Wang Thonglang,Hotpot Restaurant,Coffee Shop,Noodle House,Fast Food Restaurant,Café,Bowling Alley,Department Store,Convenience Store,Flea Market,Fried Chicken Joint
4,Thon Buri,Noodle House,Asian Restaurant,Deli / Bodega,Bus Stop,Market,Boat or Ferry,Fried Chicken Joint,Seafood Restaurant,Thai Restaurant,Train Station
5,Thawi Watthana,Convenience Store,Thai Restaurant,Chinese Restaurant,Breakfast Spot,Asian Restaurant,Massage Studio,Bus Stop,Steakhouse,Floating Market,Flea Market
9,Saphan Sung,Thai Restaurant,Convenience Store,Stadium,Japanese Restaurant,Wings Joint,Floating Market,Flea Market,Fast Food Restaurant,Farmers Market,Electronics Store
12,Ratchathewi,Hotel,Steakhouse,Hostel,Coffee Shop,Restaurant,Gym / Fitness Center,Thai Restaurant,Chinese Restaurant,Jazz Club,Lounge
13,Rat Burana,Thai Restaurant,Asian Restaurant,Hotpot Restaurant,Department Store,Café,Bistro,Noodle House,Chinese Restaurant,Flea Market,Fast Food Restaurant
19,Phasi Charoen,Coffee Shop,Japanese Restaurant,Pizza Place,Steakhouse,BBQ Joint,Donburi Restaurant,Clothing Store,Noodle House,Shopping Mall,Fast Food Restaurant
20,Pathum Wan,Asian Restaurant,Noodle House,Dessert Shop,Chinese Restaurant,Thai Restaurant,Gym / Fitness Center,Hostel,Gaming Cafe,Coworking Space,Seafood Restaurant


### Center Activity

In [186]:
bangkok_merged.loc[bangkok_merged['Cluster Labels'] == 3, bangkok_merged.columns[[1] + list(range(5, bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Chom Thong,Ice Cream Shop,Convenience Store,Diner,Coffee Shop,Wings Joint,Food & Drink Shop,Dim Sum Restaurant,Dog Run,Donburi Restaurant,Dumpling Restaurant
43,Bang Na,Asian Restaurant,Noodle House,Satay Restaurant,Seafood Restaurant,Café,Fast Food Restaurant,Farmers Market,Electronics Store,Dumpling Restaurant,Wings Joint


### Cultural & Going Out Places

In [191]:
bangkok_merged.loc[bangkok_merged['Cluster Labels'] == 4, bangkok_merged.columns[[1] + list(range(5, bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Lak Si,Coffee Shop,Fast Food Restaurant,Thai Restaurant,Hotpot Restaurant,Supermarket,Convenience Store,Noodle House,Canal,Café,Shopping Mall


# RESULTS
After clustering the neighborhood data, we find that both cities (boroughs) have venues that can be explored and that attract tourists. The neighborhoods are very similar in features such as food places, clubs, museums, and parks. They differ mainly in a few unique places, such as historical sites and monuments.
# Observations & Recommendations
When we compare the tourist places, we observe that the historical place is found only in Downtown Toronto, while the monument or landmark venue is found in the Manhattan neighborhoods. Similarly, an airport facility, harbor, sculpture garden, and boat or ferry services are available in Downtown Toronto, while venues such as nightlife spots, climbing gyms, and museums are present in Bangkok.
As for recommendations, we recommend that the Downtown Toronto neighborhoods be visited first. Tourists have easy travel access thanks to the airport facility, which saves both time and money. The money saved can be used to explore more of the attractive venues.
# Conclusion
The Downtown Toronto and Bangkok neighborhoods have largely similar venues. Every place is unique in its own way, and that holds for both sets of neighborhoods. The dissimilarity lies in a few different venues and facilities, but not to a large extent.