# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera    Kaho Taniguchi

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we look for a good location for a hotel. The project is aimed at stakeholders who are interested in opening a new hotel in Toronto, Canada.

When tourism in Canada recovers after the coronavirus pandemic, Toronto is likely to be one of the most attractive destinations. According to some tourism websites, Toronto is the most popular city in Canada, and around 4,520,000 people visited it every year before the pandemic. In fact, six new luxury hotels opened in Toronto in 2020. For property developers, the location of a hotel is one of the most important factors, because it largely determines whether the hotel will succeed. So the business question is: "If a property developer is going to open a new hotel in Toronto, where would you recommend that they open it?" There are already many hotels in Toronto, so we will try to detect locations that are not already crowded with hotels. We are particularly interested in areas with few hotels in the vicinity.

We will use our data to generate a short list of the most promising neighbourhoods based on these criteria. The advantages of each area will then be stated clearly so that property developers can choose a good location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:

* Number of existing hotels in the neighbourhood
* Latitude and longitude coordinates of those neighbourhoods
* Venue data, particularly data related to hotels

We will use the following data sources to generate the required information:

* Candidate areas will be generated algorithmically, and approximate addresses of the centers of those areas will be obtained using Google Maps API reverse geocoding
* The number and location of hotels in every neighbourhood will be obtained using the Foursquare API (https://foursquare.com/)
* The Python Geocoder package provides latitude and longitude coordinates for neighbourhoods
* List of postal codes of Canada: M (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)

## Methodology <a name="methodology"></a>

* Scrape the list of neighborhoods from the web using the Python requests and BeautifulSoup packages
* Get the latitude and longitude coordinates using the Python Geocoder package
* Pull hotel location data for Toronto via the Foursquare API (https://foursquare.com)
* Create a table of Toronto neighborhoods and their coordinates
* Use K-means clustering to group neighborhoods and find a good place for a new hotel

In [1]:
! pip install bs4
! pip install lxml
! pip install html5lib



In [2]:
from bs4 import BeautifulSoup 
import requests
import lxml
import pandas as pd
import numpy as np
import html5lib

In [3]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [4]:
my_url=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [5]:
soup = BeautifulSoup(my_url,'html5lib')
table = soup.find('table')  # the first table on the page holds the postal codes

Create the dataframe

In [6]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

In [7]:
print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

[{'PostalCode': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'}, {'PostalCode': 'M4A', 'Borough': 'North York', 'Neighborhood': 'Victoria Village'}, {'PostalCode': 'M5A', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Regent Park, Harbourfront'}, {'PostalCode': 'M6A', 'Borough': 'North York', 'Neighborhood': 'Lawrence Manor, Lawrence Heights'}, {'PostalCode': 'M7A', 'Borough': "Queen's Park", 'Neighborhood': 'Ontario Provincial Government'}, {'PostalCode': 'M9A', 'Borough': 'Etobicoke', 'Neighborhood': 'Islington Avenue'}, {'PostalCode': 'M1B', 'Borough': 'Scarborough', 'Neighborhood': 'Malvern, Rouge'}, {'PostalCode': 'M3B', 'Borough': 'North York', 'Neighborhood': 'Don Mills North'}, {'PostalCode': 'M4B', 'Borough': 'East York', 'Neighborhood': 'Parkview Hill, Woodbine Gardens'}, {'PostalCode': 'M5B', 'Borough': 'Downtown Toronto', 'Neighborhood': 'Garden District, Ryerson'}, {'PostalCode': 'M6B', 'Borough': 'North York', 'Neighborhood': 'Glencairn'}, {'PostalCode': 'M9

In [8]:
df.head(15)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [9]:
df.shape

(103, 3)

This builds a dataframe with one row per postal code (103 rows), holding the postal code together with its borough name and neighborhood name.
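
The parsing step in cell 6 chains several string operations on one line. A minimal sketch of the same idea as a small helper is shown below. It assumes each table cell reads like `Borough(Name A / Name B)`, which is inferred from the parsing code and the printed output rather than checked against the live page; `parse_cell` is a hypothetical name.

```python
def parse_cell(span_text):
    """Split 'Borough(Name A / Name B)' into (borough, 'Name A, Name B')."""
    borough, sep, rest = span_text.partition('(')
    if not sep:
        # No parenthesised part: treat the whole text as the borough.
        return borough.strip(), ''
    # Drop the closing parenthesis and turn ' /' separators into commas.
    neighborhoods = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhoods


print(parse_cell('North York(Parkwoods)'))
print(parse_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Keeping the logic in one function makes it easier to test on a few sample strings before running it over the whole table.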

In [10]:
! pip install geocoder



In [11]:
postal_code = df['PostalCode']
postal_code

0      M3A
1      M4A
2      M5A
3      M6A
4      M7A
      ... 
98     M8X
99     M4Y
100    M7Y
101    M8Y
102    M8Z
Name: PostalCode, Length: 103, dtype: object

In [12]:
import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [13]:
# initialize your variable to None
lat_lng_coords = None

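# loop until you get the coordinates
# NOTE: format('Postal Code') inserts the literal text 'Postal Code', not a value
# from df['PostalCode'], so this geocodes a single generic Toronto query; the
# per-postal-code coordinates used later come from the CSV file loaded below.
while(lat_lng_coords is None):
  g = geocoder.arcgis('{}, Toronto, Ontario'.format('Postal Code'))
  lat_lng_coords = g.latlng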

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

print(latitude,longitude )

43.648690000000045 -79.38543999999996
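
The loop above retries without limit, so it would never finish if the geocoder kept returning nothing. A hedged sketch of a bounded retry is shown below. `geocode_with_retry` and `fake_lookup` are hypothetical names, and the coordinates returned by the stand-in are made up; in the notebook the `lookup` argument could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # wait before the next attempt
    return None  # give up instead of looping forever


# Stand-in for a real geocoder: fails twice, then returns made-up coordinates.
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.0, -79.0]


result = geocode_with_retry(fake_lookup, 'M5A, Toronto, Ontario')
print(result, calls['n'])
```

Passing the lookup function in as an argument keeps the retry logic testable without network access.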


In [14]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(", ".join).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [16]:
df_toronto = pd.merge(df, lat_lon, how='left', left_on = 'PostalCode', right_on = 'Postal Code')
# remove the "Postal Code" column
df_toronto.drop("Postal Code", axis=1, inplace=True)
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
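
A left merge silently leaves `NaN` coordinates for any postal code missing from the coordinate file. The sketch below shows one way this could be checked with the `indicator` option of `pd.merge`. The two small frames only mimic the column names of `df` and `lat_lon`; the postal code `M9Z` and the borough `Example` are made up for illustration.

```python
import pandas as pd

# Toy frames with the same column names as df and lat_lon (values illustrative).
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Example']})
right = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

merged = pd.merge(left, right, how='left',
                  left_on='PostalCode', right_on='Postal Code',
                  indicator=True)

# Rows marked 'left_only' had no match in the coordinate table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty list here would suggest that every postal code received coordinates.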


Explore and cluster the neighborhoods in Toronto

In [17]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto city are 43.6534817, -79.3839347.


In [18]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library




In [19]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto

In [20]:
for lat, lng, borough, neighborhood in zip(
        df_toronto['Latitude'], 
        df_toronto['Longitude'], 
        df_toronto['Borough'], 
        df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

In [21]:
df_toronto_denc = df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_toronto_denc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
4,M4M,East Toronto,Studio District,43.659526,-79.340923


In [22]:
map_toronto_denc = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, borough, neighborhood in zip(
        df_toronto_denc['Latitude'], 
        df_toronto_denc['Longitude'], 
        df_toronto_denc['Borough'], 
        df_toronto_denc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_denc)  

map_toronto_denc

In [23]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare client ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare client secret
VERSION = '20210410'

In [24]:
neighborhood_name = df_toronto_denc.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{neighborhood_name}'.")

The first neighborhood's name is 'The Beaches'.


In [25]:
neighborhood_latitude = df_toronto_denc.loc[0, 'Latitude'] 
neighborhood_longitude = df_toronto_denc.loc[0, 'Longitude'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


In [26]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&query=hotel&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

# get the result to a json file
results = requests.get(url).json()
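
The request URL above is assembled with positional string formatting, which is easy to get out of order. As a sketch, the same query string could be built from a dictionary with the standard library. `build_explore_url` is a hypothetical helper, and `'ID'` and `'SECRET'` are placeholders, not real credentials; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100, query='hotel'):
    """Build the explore URL from named parameters instead of positions."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters such as the comma in 'll'.
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


example_url = build_explore_url('ID', 'SECRET', '20210410', 43.676357, -79.293031)
print(parse_qs(urlparse(example_url).query)['ll'])
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` would have a similar effect.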

In [34]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']       

In [35]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

# A query can return no venues; the flattened frame then has no 'venue.*'
# columns, and selecting them directly raises a KeyError.
if nearby_venues.empty:
    nearby_venues = pd.DataFrame(columns=filtered_columns)
else:
    nearby_venues = nearby_venues.loc[:, filtered_columns]
    # filter the category for each row
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&query=hotel&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [37]:
toronto_denc_venues = getNearbyVenues(names=df_toronto_denc['Neighborhood'],
                                   latitudes=df_toronto_denc['Latitude'],
                                   longitudes=df_toronto_denc['Longitude']
                                  )

In [38]:
toronto_denc_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"India Bazaar, The Beaches West",43.668999,-79.315572,Days Inn,43.667145,-79.313179,Hotel
1,Davisville North,43.712751,-79.390197,Best Western Roehampton Hotel & Suites,43.708878,-79.39088,Hotel
2,Davisville North,43.712751,-79.390197,The Ambassador,43.710418,-79.39186,Hotel
3,Church and Wellesley,43.66586,-79.38316,The Anndore House,43.668801,-79.385413,Hotel
4,Church and Wellesley,43.66586,-79.38316,Holiday Inn Toronto Downtown Centre,43.661779,-79.381047,Hotel


In [39]:
toronto_denc_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,4,4,4,4,4,4
Central Bay Street,9,9,9,9,9,9
Church and Wellesley,10,10,10,10,10,10
"Commerce Court, Victoria Hotel",29,29,29,29,29,29
Davisville North,2,2,2,2,2,2
Enclave of M5E,10,10,10,10,10,10
"First Canadian Place, Underground city",27,27,27,27,27,27
"Garden District, Ryerson",15,15,15,15,15,15
"Harbourfront East, Union Station, Toronto Islands",7,7,7,7,7,7
"India Bazaar, The Beaches West",1,1,1,1,1,1


In [40]:
print('There are {} unique categories.'.format(len(toronto_denc_venues['Venue Category'].unique())))

There are 15 unique categories.
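
The venue list contains 15 categories, not only hotels, and neighborhoods for which the query returned nothing do not appear in it at all. Since the goal is to find areas with few hotels, a count that keeps only the `Hotel` category and reports zero for missing neighborhoods may be more informative. The sketch below uses made-up rows in the shape of `toronto_denc_venues`; the neighborhood names `A`, `B`, and `C` are placeholders.

```python
import pandas as pd

# Illustrative rows in the shape of toronto_denc_venues (values are made up).
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Hotel', 'Hotel Pool', 'Hotel', 'Hotel'],
})
all_neighborhoods = ['A', 'B', 'C']  # 'C' returned no venues at all

# Keep only venues categorised as hotels, then count per neighborhood.
hotels_only = venues[venues['Venue Category'] == 'Hotel']
hotel_counts = (hotels_only['Neighborhood']
                .value_counts()
                .reindex(all_neighborhoods, fill_value=0))  # 0 for missing ones
print(hotel_counts.to_dict())
```

In the notebook, `all_neighborhoods` could be taken from `df_toronto_denc['Neighborhood']`.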


In [41]:
toronto_denc_onehot = pd.get_dummies(toronto_denc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_denc_onehot['Neighborhood'] = toronto_denc_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_denc_onehot.columns[-1]] + list(toronto_denc_onehot.columns[:-1])
toronto_denc_onehot = toronto_denc_onehot[fixed_columns]

toronto_denc_onehot.head()

Unnamed: 0,Neighborhood,Bed & Breakfast,Cocktail Bar,Coffee Shop,College Residence Hall,Convention Center,General Travel,Gym / Fitness Center,Hostel,Hotel,Hotel Bar,Hotel Pool,Jazz Club,Motel,Pub,Restaurant
0,"India Bazaar, The Beaches West",0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1,Davisville North,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,Davisville North,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
3,Church and Wellesley,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,Church and Wellesley,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [42]:
toronto_denc_grouped = toronto_denc_onehot.groupby('Neighborhood').mean().reset_index()
toronto_denc_grouped.head()

Unnamed: 0,Neighborhood,Bed & Breakfast,Cocktail Bar,Coffee Shop,College Residence Hall,Convention Center,General Travel,Gym / Fitness Center,Hostel,Hotel,Hotel Bar,Hotel Pool,Jazz Club,Motel,Pub,Restaurant
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.25,0.0,0.0,0.0,0.0
1,Central Bay Street,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.555556,0.0,0.111111,0.0,0.0,0.0,0.0
2,Church and Wellesley,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.7,0.0,0.1,0.0,0.0,0.0,0.0
3,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.724138,0.068966,0.068966,0.0,0.0,0.0,0.068966
4,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [43]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_denc_grouped['Neighborhood']

for ind in np.arange(toronto_denc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_denc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Hotel,Hotel Pool,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
1,Central Bay Street,Hotel,Hotel Pool,Hostel,College Residence Hall,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar
2,Church and Wellesley,Hotel,Hotel Pool,Gym / Fitness Center,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel
3,"Commerce Court, Victoria Hotel",Hotel,Restaurant,Hotel Pool,Hotel Bar,Hostel,General Travel,Pub,Motel,Jazz Club,Gym / Fitness Center
4,Davisville North,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel


In [44]:
kclusters = 5

toronto_denc_grouped_clustering = toronto_denc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_denc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 3, 3, 0, 0, 3, 3, 1, 0])
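
The number of clusters is fixed at 5 above. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below does this on synthetic points, not on the Toronto data, so it only illustrates the technique; the three group centers are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D points forming three well-separated groups (not the Toronto data).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.randn(10, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # k with the highest silhouette score
print(best_k)
```

On the real data, `X` would be `toronto_denc_grouped_clustering`, and k must stay below the number of rows.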

In [45]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_denc_merged = df_toronto_denc

toronto_denc_merged = toronto_denc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_denc_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,,,,,,,,,,,
1,M4J,East York/East Toronto,The Danforth East,43.685347,-79.338106,,,,,,,,,,,
2,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,,,,,,,,,,,
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,0.0,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
4,M4M,East Toronto,Studio District,43.659526,-79.340923,,,,,,,,,,,


In [57]:
toronto_denc_merged=toronto_denc_merged.dropna()
toronto_denc_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,0.0,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
6,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0.0,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
13,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,3.0,Hotel,Hotel Pool,Gym / Fitness Center,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel
14,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,Pub,Hotel,Restaurant,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
15,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3.0,Hotel,Hotel Pool,Coffee Shop,Cocktail Bar,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel
16,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,3.0,Hotel,Hotel Pool,Hostel,Cocktail Bar,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Gym / Fitness Center
17,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,3.0,Hotel,Hotel Pool,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
18,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1.0,Hotel,Hotel Pool,Hostel,College Residence Hall,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar
19,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,1.0,Hotel,Hotel Pool,Restaurant,Jazz Club,Hotel Bar,General Travel,College Residence Hall,Pub,Motel,Hostel
20,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,1.0,Hotel,Hotel Pool,Convention Center,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel,Gym / Fitness Center


## Analysis <a name="analysis"></a>

In [59]:
# to create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
        toronto_denc_merged['Latitude'], 
        toronto_denc_merged['Longitude'], 
        toronto_denc_merged['Neighborhood'], 
        toronto_denc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [60]:
#cluster1
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 0, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,East Toronto,0.0,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
6,Central Toronto,0.0,Hotel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
29,Downtown Toronto Stn A,0.0,Hotel,Hostel,Restaurant,Pub,Motel,Jazz Club,Hotel Pool,Hotel Bar,Gym / Fitness Center,General Travel


In [61]:
#cluster2
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 1, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Downtown Toronto,1.0,Hotel,Hotel Pool,Hostel,College Residence Hall,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar
19,Downtown Toronto,1.0,Hotel,Hotel Pool,Restaurant,Jazz Club,Hotel Bar,General Travel,College Residence Hall,Pub,Motel,Hostel
20,Downtown Toronto,1.0,Hotel,Hotel Pool,Convention Center,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel,Gym / Fitness Center


In [62]:
#cluster3
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 2, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Downtown Toronto,2.0,Hotel,Motel,Hotel Bar,Hostel,Bed & Breakfast,Restaurant,Pub,Jazz Club,Hotel Pool,Gym / Fitness Center


In [63]:
#cluster4
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 3, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,3.0,Hotel,Hotel Pool,Gym / Fitness Center,Bed & Breakfast,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel
15,Downtown Toronto,3.0,Hotel,Hotel Pool,Coffee Shop,Cocktail Bar,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel
16,Downtown Toronto,3.0,Hotel,Hotel Pool,Hostel,Cocktail Bar,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Gym / Fitness Center
17,Downtown Toronto,3.0,Hotel,Hotel Pool,Restaurant,Pub,Motel,Jazz Club,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
21,Downtown Toronto,3.0,Hotel,Restaurant,Hotel Pool,Hotel Bar,General Travel,Pub,Motel,Jazz Club,Hostel,Gym / Fitness Center
22,Downtown Toronto,3.0,Hotel,Restaurant,Hotel Pool,Hotel Bar,Hostel,General Travel,Pub,Motel,Jazz Club,Gym / Fitness Center
30,Downtown Toronto,3.0,Hotel,Restaurant,Hotel Pool,Hotel Bar,General Travel,Pub,Motel,Jazz Club,Hostel,Gym / Fitness Center


In [64]:
#cluster5
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 4, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,4.0,Pub,Hotel,Restaurant,Motel,Jazz Club,Hotel Pool,Hotel Bar,Hostel,Gym / Fitness Center,General Travel
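
The clusters are formed from category frequencies, not from raw hotel counts, so reading a cluster as "crowded" or "sparse" is easier to justify when the counts are tabulated per cluster. A minimal sketch with made-up numbers follows; the `Hotel Count` column is an assumption and does not exist in the notebook's dataframes.

```python
import pandas as pd

# Made-up example: cluster labels and hotel counts per neighborhood.
summary = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 1],
    'Hotel Count': [1, 3, 10, 20],
})

# Number of neighborhoods, mean and maximum hotel count in each cluster.
per_cluster = (summary.groupby('Cluster Labels')['Hotel Count']
               .agg(['count', 'mean', 'max']))
print(per_cluster)
```

A table like this would show directly which clusters contain the neighborhoods with the fewest hotels.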


## Results and Discussion <a name="results"></a>

Our analysis found the following:
* Cluster 4 (green): the most crowded area
* Cluster 1 (red) and Cluster 2 (purple): neighborhoods with a moderate number of hotels
* Cluster 3 and Cluster 5: neighborhoods with quite a low number of hotels

Based on our analysis, the areas grouped as clusters 3 and 5 can be recommended as places to open a new hotel, since they have few existing hotels, while the areas grouped as clusters 1, 2 and 4 already have a number of hotels.

This, of course, does not directly imply that those areas are actually the best locations for a new hotel. The purpose of this analysis was only to identify areas that are not crowded with existing hotels. It is entirely possible that there is a very good reason for the small number of hotels in any of those areas, one that would make them unsuitable for a new hotel regardless of the lack of competition. The recommended areas should therefore be considered only as a starting point for a more detailed analysis, which could eventually identify a location that not only has no nearby competition but also meets all other relevant conditions.

Our analysis shows that although there are many hotels in Toronto, there are still some pockets of low hotel density. The highest concentration of hotels was detected in the east and central areas of Toronto, so we focused our attention on the downtown and west areas, corresponding to the Downtown Toronto borough.

## Conclusion <a name="conclusion"></a>

The objective of this project was to identify areas of Toronto with a small number of hotels, in order to help property developers narrow down the search for a preferred location for a new hotel. By calculating the hotel density distribution with data from the Foursquare API, we identified general boroughs that justify further analysis.

Of course, the final decision on the optimal location for a new hotel will be made by the property developer, based on much more specific characteristics of the neighbourhoods and locations in every recommended area. Additional factors can be taken into consideration through further data science analysis.