# Exploring potential new hotel venues in Brussels

### 1. Introduction

My report is for those planning to open a new hotel in the city of Brussels. I suggest where in this densely populated and highly visited city a new hotel would be best located.

Brussels, officially the Brussels-Capital Region, is a region of Belgium comprising 19 municipalities, including the City of Brussels, which is the capital of Belgium. The Brussels-Capital Region is located in the central portion of the country and is a part of both the French Community of Belgium and the Flemish Community, but is separate from the Flemish Region (in which it forms an enclave) and the Walloon Region. Brussels is the most densely populated and the richest region in Belgium in terms of GDP per capita. 

Unassuming Brussels is the capital of Belgium, Flanders and Europe. The medieval Grand-Place is indeed grand, with many 17th-century buildings and daily flower markets. Reopened in 2006, the Atomium, Brussels' Eiffel Tower, provides great views, inside and out. Architecture fans should visit the Musée Horta, home of Belgian master architect Victor Horta. St. Gery's clubs and bars are packed year-round. Seafood eateries abound in Ste. Catherine. Walk the narrow streets rather than get snarled up in traffic.

### 2. Business Problem

In this report, I focus on where to open a new hotel in a city like Brussels once the decision to open one has been made. Suppose Marriott Hotels wants to open a new luxury hotel: the first and most important decision is its location.
1. On what basis can they decide the new hotel's location?
2. Where are the most visited venues of the city?
3. If highly rated luxury hotels already operate in an area, is it risky to open a new one near them?
4. Out of scope for this project: rent and land values in the neighborhoods, the budget for the interior decoration of the hotel, the budget for opening restaurants in the hotel, etc.

### 3. Data Preparation

Because this report is for those who want to open a new luxury hotel in Brussels, the first requirement is to collect the Brussels postal codes together with the names of the respective neighborhoods. The second requirement is to collect the latitude and longitude of the same neighborhoods and merge the two datasets.

There are 19 municipalities in Brussels with different neighborhoods. We will explore each municipality and their respective neighborhoods to check which neighborhood has the most visited venues and would be perfect to open a new hotel.
We will extract the data from the Wikipedia page below using Beautiful Soup.
https://en.wikipedia.org/wiki/List_of_municipalities_of_the_Brussels-Capital_Region

We will take the latitude and longitude of the neighborhoods from a CSV file (zipcode-belgium.csv), saved first on the local machine and then uploaded to the server.

Once we have the latitude and longitude data, we use the Foursquare location data to count the venues in each neighborhood, which gives an idea of where tourists go when visiting the city. This already indicates the most promising neighborhoods for a hotel. The details are retrieved with the venues `explore` endpoint.
Link to the dataset is: https://developer.foursquare.com/docs/data

#### Use the BeautifulSoup package to transform the table on the Wikipedia page into a pandas dataframe.
#### Import the libraries needed to get the data into the required format.

In [44]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [45]:
url = requests.get('https://en.wikipedia.org/wiki/List_of_municipalities_of_the_Brussels-Capital_Region').text

In [46]:
soup = BeautifulSoup(url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of municipalities of the Brussels-Capital Region - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"Xd4X5ApAICkAADuMvHsAAACI","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_municipalities_of_the_Brussels-Capital_Region","wgTitle":"List of municipalities of the Brussels-Capital Region","wgCurRevisionId":923747618,"wgRevisionId":923747618,"wgArticleId":261746,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wg

#### The required data sits in a table with class 'wikitable sortable', so let's extract only that table

In [47]:
My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

<table class="wikitable sortable">
<tbody><tr>
<th>
</th>
<th>French name
</th>
<th>Dutch name
</th>
<th class="unsortable"><a href="/wiki/Flag" title="Flag">Flag</a>
</th>
<th class="unsortable"><a href="/wiki/Coat_of_arms" title="Coat of arms">CoA</a>
</th>
<th><a href="/wiki/Postal_code" title="Postal code">post<br/><small>code</small></a>
</th>
<th>Population<br/><small>(1/1/2017)</small>
</th>
<th>Area
</th>
<th><a href="/wiki/Population_density" title="Population density">Population density</a><br/><small>(km²)</small>
</th>
<th class="unsortable">Ref.
</th></tr>
<tr>
<td style="text-align:center;">1
</td>
<td style="text-align:left"><a href="/wiki/Anderlecht" title="Anderlecht">Anderlecht</a>
</td>
<td style="text-align:left"><a href="/wiki/Anderlecht" title="Anderlecht">Anderlecht</a>
</td>
<td style="text-align:center;"><a class="image" href="/wiki/File:Flag_of_Anderlecht.svg"><img alt="Flag of Anderlecht.svg" class="thumbborder" data-file-height="400" data-file-width="600" de

In [48]:
tableRows = [[td.get_text() for td in row.find_all('td')] for row in My_table.find_all('tr')[1:]]
print(tableRows)

[['1\n', 'Anderlecht\n', 'Anderlecht\n', '\n', '\n', '1070\n', '118,241\n', '17.717.7\xa0km2 (6.8\xa0sq\xa0mi)\n', '6,680\n'], ['2\n', 'Auderghem\n', 'Oudergem\n', '\n', '\n', '1160\n', '33,313\n', '09.09.0\xa0km2 (3.5\xa0sq\xa0mi)\n', '3,701\n'], ['3\n', 'Berchem-Sainte-Agathe\n', 'Sint-Agatha-Berchem\n', '\n', '\n', '1082\n', '24,701\n', '02.92.9\xa0km2 (1.1\xa0sq\xa0mi)\n', '8,518\n'], ['4\n', 'Bruxelles-Ville*\n', 'Stad Brussel*\n', '\n', '\n', '1000102011201130\n', '176,545\n', '32.632.6\xa0km2 (12.6\xa0sq\xa0mi)\n', '5,415\n'], ['5\n', 'Etterbeek\n', 'Etterbeek\n', '\n', '\n', '1040\n', '47,414\n', '03.13.1\xa0km2 (1.2\xa0sq\xa0mi)\n', '15,295\n'], ['6\n', 'Evere\n', 'Evere\n', '\n', '\n', '1140\n', '40,394\n', '05.05.0\xa0km2 (1.9\xa0sq\xa0mi)\n', '8,079\n'], ['7\n', 'Forest\n', 'Vorst\n', '\n', '\n', '1190\n', '55,746\n', '06.26.2\xa0km2 (2.4\xa0sq\xa0mi)\n', '8,991\n'], ['8\n', 'Ganshoren\n', 'Ganshoren\n', '\n', '\n', '1083\n', '24,596\n', '02.52.5\xa0km2 (1.0\xa0sq\xa0mi)\n'
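The raw cell text above still carries trailing newlines, and the postal code cell for Bruxelles-Ville is fused into a single string. Here is a minimal sketch, run on a made-up two-row HTML fragment rather than the real Wikipedia markup, of how `get_text` with a separator and `strip=True` might avoid both issues. The `<br/>` separators inside the postal code cell and the helper name `parse_rows` are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the table structure; not the real page markup.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>French name</th><th>post code</th></tr>
<tr><td>Anderlecht
</td><td>1070
</td></tr>
<tr><td>Bruxelles-Ville*
</td><td>1000<br/>1020<br/>1120<br/>1130
</td></tr>
</table>
"""

def parse_rows(html):
    """Return one list of cleaned cell strings per data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        # A separator keeps multi-valued cells readable; strip removes newlines.
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        rows.append(cells)
    return rows

sample_rows = parse_rows(SAMPLE_HTML)
print(sample_rows)
```

With cells cleaned at extraction time, the later newline-removal step would become optional.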

#### Let's extract the data into a pandas dataframe

In [49]:
tableHeaders = ["Index","Neighborhood","Dutch name","Flag","CoA","Postal code","Population","Area","Population density"]
df = pd.DataFrame(tableRows, columns=tableHeaders)
df.head(10)

Unnamed: 0,Index,Neighborhood,Dutch name,Flag,CoA,Postal code,Population,Area,Population density
0,1\n,Anderlecht\n,Anderlecht\n,\n,\n,1070\n,"118,241\n",17.717.7 km2 (6.8 sq mi)\n,"6,680\n"
1,2\n,Auderghem\n,Oudergem\n,\n,\n,1160\n,"33,313\n",09.09.0 km2 (3.5 sq mi)\n,"3,701\n"
2,3\n,Berchem-Sainte-Agathe\n,Sint-Agatha-Berchem\n,\n,\n,1082\n,"24,701\n",02.92.9 km2 (1.1 sq mi)\n,"8,518\n"
3,4\n,Bruxelles-Ville*\n,Stad Brussel*\n,\n,\n,1000102011201130\n,"176,545\n",32.632.6 km2 (12.6 sq mi)\n,"5,415\n"
4,5\n,Etterbeek\n,Etterbeek\n,\n,\n,1040\n,"47,414\n",03.13.1 km2 (1.2 sq mi)\n,"15,295\n"
5,6\n,Evere\n,Evere\n,\n,\n,1140\n,"40,394\n",05.05.0 km2 (1.9 sq mi)\n,"8,079\n"
6,7\n,Forest\n,Vorst\n,\n,\n,1190\n,"55,746\n",06.26.2 km2 (2.4 sq mi)\n,"8,991\n"
7,8\n,Ganshoren\n,Ganshoren\n,\n,\n,1083\n,"24,596\n",02.52.5 km2 (1.0 sq mi)\n,"9,838\n"
8,9\n,Ixelles\n,Elsene\n,\n,\n,1050\n,"86,244\n",06.36.3 km2 (2.4 sq mi)\n,"13,690\n"
9,10\n,Jette\n,Jette\n,\n,\n,1090\n,"51,933\n",05.05.0 km2 (1.9 sq mi)\n,"10,387\n"


#### Let's refine and transform the data as required

In [50]:
df = df.replace('\n','', regex=True)
df.head()

Unnamed: 0,Index,Neighborhood,Dutch name,Flag,CoA,Postal code,Population,Area,Population density
0,1,Anderlecht,Anderlecht,,,1070,118241,17.717.7 km2 (6.8 sq mi),6680
1,2,Auderghem,Oudergem,,,1160,33313,09.09.0 km2 (3.5 sq mi),3701
2,3,Berchem-Sainte-Agathe,Sint-Agatha-Berchem,,,1082,24701,02.92.9 km2 (1.1 sq mi),8518
3,4,Bruxelles-Ville*,Stad Brussel*,,,1000102011201130,176545,32.632.6 km2 (12.6 sq mi),5415
4,5,Etterbeek,Etterbeek,,,1040,47414,03.13.1 km2 (1.2 sq mi),15295


#### Let's drop the columns that are not required

In [51]:
df.drop(['Index', 'Dutch name', 'Flag','CoA','Population','Area','Population density'], inplace = True, axis = 1)
df.head()

Unnamed: 0,Neighborhood,Postal code
0,Anderlecht,1070
1,Auderghem,1160
2,Berchem-Sainte-Agathe,1082
3,Bruxelles-Ville*,1000102011201130
4,Etterbeek,1040


In [52]:
df['Postal code'] = df['Postal code'].replace('1000102011201130','1000')
df.head()

Unnamed: 0,Neighborhood,Postal code
0,Anderlecht,1070
1,Auderghem,1160
2,Berchem-Sainte-Agathe,1082
3,Bruxelles-Ville*,1000
4,Etterbeek,1040
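The replacement above hard-codes the fused string for Bruxelles-Ville. As an alternative sketch, assuming every postal code is four digits long and that the first code listed is the one wanted, the leading four characters could be kept instead. The helper name `first_postal_code` is hypothetical.

```python
import pandas as pd

def first_postal_code(codes):
    """Keep the first four characters of each postal code string and cast to int."""
    return codes.str.strip().str[:4].astype(int)

# Toy values: a normal code, the fused Bruxelles-Ville string, and a code with a newline.
sample = pd.Series(["1070", "1000102011201130", "1040\n"], name="Postal code")
cleaned = first_postal_code(sample)
print(cleaned.tolist())
```

This also performs the integer conversion, so it would not depend on the exact fused string that the page happens to produce.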


In [53]:
df['Postal code']= df['Postal code'].astype(int)
df

Unnamed: 0,Neighborhood,Postal code
0,Anderlecht,1070
1,Auderghem,1160
2,Berchem-Sainte-Agathe,1082
3,Bruxelles-Ville*,1000
4,Etterbeek,1040
5,Evere,1140
6,Forest,1190
7,Ganshoren,1083
8,Ixelles,1050
9,Jette,1090


#### Now let's add the CSV file containing latitude and longitude values to the server and then import the data into a pandas dataframe

In [54]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal code,Neighborhood,Latitude,Longitude
0,1000,Bruxelles,50.846557,4.351697
1,1020,Laeken,50.883392,4.348713
2,1030,Schaerbeek,50.867604,4.373712
3,1040,Etterbeek,50.836851,4.38951
4,1050,Ixelles,50.822285,4.381571
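The loading cell was stripped when the notebook was shared, so the exact code is not available. Here is a minimal sketch of what such a step could look like with `pandas.read_csv`, assuming the file has a header row with the four columns shown in the output above. The in-memory sample stands in for `zipcode-belgium.csv`, whose real layout is unknown here, and the helper name `load_postal_codes` is hypothetical.

```python
import io
import pandas as pd

# Stand-in for zipcode-belgium.csv; rows copied from the output shown above.
SAMPLE_CSV = """Postal code,Neighborhood,Latitude,Longitude
1000,Bruxelles,50.846557,4.351697
1020,Laeken,50.883392,4.348713
1030,Schaerbeek,50.867604,4.373712
"""

def load_postal_codes(source):
    """Read postal codes with coordinates from a path or file-like object."""
    frame = pd.read_csv(source, dtype={"Postal code": int})
    return frame[["Postal code", "Neighborhood", "Latitude", "Longitude"]]

df_data_0 = load_postal_codes(io.StringIO(SAMPLE_CSV))
print(df_data_0.dtypes)
```

Reading the postal code as an integer keeps its type consistent with the `Postal code` column of `df`, which matters for the merge key.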


#### Let's merge two dataframes into a single dataframe for further analysis

In [55]:
Brussel_df = pd.merge(df,df_data_0, on='Postal code')
Brussel_df

Unnamed: 0,Neighborhood_x,Postal code,Neighborhood_y,Latitude,Longitude
0,Anderlecht,1070,Anderlecht,50.838141,4.31234
1,Auderghem,1160,Auderghem,50.815657,4.433139
2,Berchem-Sainte-Agathe,1082,Berchem-Sainte-Agathe,50.863984,4.292702
3,Bruxelles-Ville*,1000,Bruxelles,50.846557,4.351697
4,Etterbeek,1040,Etterbeek,50.836851,4.38951
5,Evere,1140,Evere,50.870452,4.40216
6,Forest,1190,Forest,50.809143,4.317751
7,Ganshoren,1083,Ganshoren,50.87124,4.31751
8,Ixelles,1050,Ixelles,50.822285,4.381571
9,Jette,1090,Jette,50.877763,4.32609
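An inner merge silently drops any municipality whose postal code has no match in the coordinate file. The sketch below uses toy frames to show how an outer merge with `indicator=True` could reveal such rows. The toy values and the name `unmatched` are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"Neighborhood": ["Anderlecht", "Etterbeek", "Nowhere"],
                     "Postal code": [1070, 1040, 9999]})
right = pd.DataFrame({"Postal code": [1070, 1040],
                      "Latitude": [50.838141, 50.836851],
                      "Longitude": [4.31234, 4.38951]})

# The outer merge keeps every row; the _merge column records where each row came from.
checked = pd.merge(left, right, on="Postal code", how="outer", indicator=True)
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["Neighborhood", "Postal code", "_merge"]])
```

An empty `unmatched` frame would confirm that all 19 municipalities received coordinates.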


In [56]:
Brussel_df.drop(['Neighborhood_x'], inplace = True, axis = 1)
Brussel_df.head()

Unnamed: 0,Postal code,Neighborhood_y,Latitude,Longitude
0,1070,Anderlecht,50.838141,4.31234
1,1160,Auderghem,50.815657,4.433139
2,1082,Berchem-Sainte-Agathe,50.863984,4.292702
3,1000,Bruxelles,50.846557,4.351697
4,1040,Etterbeek,50.836851,4.38951


In [57]:
Brussel_df.rename(columns={'Neighborhood_y':'Neighborhood'}, inplace=True)
Brussel_df.head()

Unnamed: 0,Postal code,Neighborhood,Latitude,Longitude
0,1070,Anderlecht,50.838141,4.31234
1,1160,Auderghem,50.815657,4.433139
2,1082,Berchem-Sainte-Agathe,50.863984,4.292702
3,1000,Bruxelles,50.846557,4.351697
4,1040,Etterbeek,50.836851,4.38951


### Let's Explore and cluster the neighborhoods in Brussels

#### Use the geopy library to get the latitude and longitude of Brussels

In [59]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [60]:
address = 'Brussel, Belgium'

geolocator = Nominatim(user_agent="Brussel")
location = geolocator.geocode(address)
latitude_Brussel = location.latitude
longitude_Brussel = location.longitude
print('The geographical coordinates of Brussel are {}, {}.'.format(latitude_Brussel, longitude_Brussel))

The geographical coordinates of Brussel are 50.8436709, 4.36743669338796.


In [61]:
map_Brussel = folium.Map(location=[latitude_Brussel, longitude_Brussel], zoom_start=10)

# add markers to map
for lat, lng, Neighbourhood in zip(Brussel_df['Latitude'], Brussel_df['Longitude'], Brussel_df['Neighborhood']):
    label = '{}'.format(Neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Brussel)  
    
map_Brussel

#### Define foursquare credentials and version

In [62]:
# The code was removed by Watson Studio for sharing.
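The credentials cell was also removed for sharing. One common pattern, sketched here under assumptions, is to read the values from environment variables rather than writing them into the notebook. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION`, the helper name, and the placeholder version date are all hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, version); raise if a credential is missing."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    # Placeholder date string; the value used by the original notebook is unknown.
    version = env.get("FOURSQUARE_VERSION", "20180605")
    return client_id, client_secret, version

# Example with a plain dict standing in for the real environment.
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
```

The three returned names match the ones that `getNearbyVenues` reads when it builds the request URL.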

#### Let's check all the neighborhoods in Brussels

In [63]:
radius = 500
LIMIT = 100

In [64]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
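`getNearbyVenues` assumes that every response has a `groups` entry and that every venue has at least one category. Below is a defensive sketch of the row extraction, run against a hand-written dictionary that imitates the response fields read above. The helper name `extract_venue_rows` and the `'Unknown'` fallback are assumptions, not part of the original notebook.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response dict into a list of venue tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a label instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written payload mimicking the fields read by getNearbyVenues.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Friture René",
               "location": {"lat": 50.835846, "lng": 4.311632},
               "categories": [{"name": "Belgian Restaurant"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 50.8359, "lng": 4.3116},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Anderlecht", 50.838141, 4.31234, fake_payload)
empty = extract_venue_rows("Anderlecht", 50.838141, 4.31234, {"response": {}})
print(len(rows), len(empty))
```

Separating the parsing from the HTTP call also lets the extraction be checked without spending API calls.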

In [65]:
Brussel_venues = getNearbyVenues(names=Brussel_df['Neighborhood'],
                                   latitudes=Brussel_df['Latitude'],
                                   longitudes=Brussel_df['Longitude']
                                  )

Anderlecht
Auderghem
Berchem-Sainte-Agathe
Bruxelles
Etterbeek
Evere
Forest
Ganshoren
Ixelles
Jette
Koekelberg
Molenbeek-Saint-Jean
Saint-Gilles
Saint-Josse-Ten-Noode
Schaerbeek
Uccle
Watermael-Boitsfort
Woluwe-Saint-Lambert
Woluwe-Saint-Pierre


#### Let's check the size of resulting dataframe

In [66]:
print(Brussel_venues.shape)
Brussel_venues.head()

(605, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Anderlecht,50.838141,4.31234,Friture René,50.835846,4.311632,Belgian Restaurant
1,Anderlecht,50.838141,4.31234,Snack Mirvan,50.835176,4.308543,Snack Place
2,Anderlecht,50.838141,4.31234,Le Chapeau Blanc,50.835034,4.30779,Restaurant
3,Anderlecht,50.838141,4.31234,Ulysse,50.838612,4.30686,Greek Restaurant
4,Anderlecht,50.838141,4.31234,Bospark / Parc Forestier (Bospark),50.840087,4.310731,Park


In [67]:
Brussel_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Anderlecht,16,16,16,16,16,16
Auderghem,34,34,34,34,34,34
Berchem-Sainte-Agathe,11,11,11,11,11,11
Bruxelles,100,100,100,100,100,100
Etterbeek,45,45,45,45,45,45
Evere,27,27,27,27,27,27
Forest,14,14,14,14,14,14
Ganshoren,26,26,26,26,26,26
Ixelles,30,30,30,30,30,30
Jette,39,39,39,39,39,39
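The counts above hint at where venues concentrate. This short sketch ranks the ten neighborhoods visible in that output by venue count. The numbers are copied from the table, and the remaining nine municipalities are left out because their counts are not shown.

```python
import pandas as pd

# Venue counts copied from the groupby output above (first ten rows only).
venue_counts = pd.Series({
    "Anderlecht": 16, "Auderghem": 34, "Berchem-Sainte-Agathe": 11,
    "Bruxelles": 100, "Etterbeek": 45, "Evere": 27, "Forest": 14,
    "Ganshoren": 26, "Ixelles": 30, "Jette": 39,
}, name="Venues")

ranked = venue_counts.sort_values(ascending=False)
print(ranked.head(3))
```

Bruxelles reaches exactly 100, which equals `LIMIT`, so its true number of venues within the 500 m radius may be higher than the count suggests.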


#### Let's find out how many unique categories can be curated from all the returned venues

In [68]:
print('There are {} unique categories.'.format(len(Brussel_venues['Venue Category'].unique())))

There are 163 unique categories.


#### Analyze each neighborhood

In [69]:
# one hot encoding
Brussel_onehot = pd.get_dummies(Brussel_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Brussel_onehot['Neighborhood'] = Brussel_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Brussel_onehot.columns[-1]] + list(Brussel_onehot.columns[:-1])
Brussel_onehot = Brussel_onehot[fixed_columns]

Brussel_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,...,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Wine Bar,Winery
0,Anderlecht,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Anderlecht,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Anderlecht,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Anderlecht,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Anderlecht,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [70]:
Brussel_onehot.shape

(605, 164)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [71]:
Brussel_grouped = Brussel_onehot.groupby('Neighborhood').mean().reset_index()
Brussel_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,...,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Wine Bar,Winery
0,Anderlecht,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Auderghem,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,...,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berchem-Sainte-Agathe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bruxelles,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
4,Etterbeek,0.0,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.0,0.022222,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Evere,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,...,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0
6,Forest,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ganshoren,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,...,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0
8,Ixelles,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333,0.0
9,Jette,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,...,0.025641,0.102564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [72]:
Brussel_grouped.shape

(19, 164)
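To make the one-hot encoding and the grouped mean concrete, here is a toy sketch with two invented neighborhoods and six invented venues. The mean of each 0/1 column within a neighborhood is the share of that neighborhood's venues that fall in the category. All names prefixed with `toy_` are illustrative.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Park", "Bakery", "Park", "Park"],
})

# One column per category, 1 where the venue belongs to it.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Neighborhood A has two bars out of four venues, so its `Bar` value is 0.5. Because each row holds shares, the clustering step compares the mix of venue types rather than the raw number of venues.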

#### Let's create a new dataframe and display the top 10 venues for each neighborhood

In [73]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
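A toy row shows what the helper returns. The function is repeated so the snippet runs by itself, and the frequencies are invented for illustration.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood label in position 0
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Invented frequencies for a single neighborhood.
toy_row = pd.Series({"Neighborhood": "Bruxelles", "Bar": 0.07,
                     "Chocolate Shop": 0.08, "Plaza": 0.03, "Winery": 0.0})
top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

Categories with a frequency of 0 are still returned when a neighborhood has fewer than `num_top_venues` categories with venues, so trailing entries in a top-10 list may be fillers rather than real venues.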

In [74]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 and above have no special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Brussel_grouped['Neighborhood']

for ind in np.arange(Brussel_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Brussel_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Anderlecht,Grocery Store,Greek Restaurant,Convenience Store,Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Snack Place,Discount Store,Belgian Restaurant,Sandwich Place
1,Auderghem,French Restaurant,Grocery Store,Middle Eastern Restaurant,Sushi Restaurant,Belgian Restaurant,Italian Restaurant,Athletics & Sports,Ice Cream Shop,Nightclub,Cultural Center
2,Berchem-Sainte-Agathe,Greek Restaurant,Burger Joint,Snack Place,Gym,Plaza,Tram Station,Supermarket,French Restaurant,Restaurant,Furniture / Home Store
3,Bruxelles,Chocolate Shop,Bar,Beer Bar,Sandwich Place,Belgian Restaurant,Gastropub,Plaza,Greek Restaurant,Bookstore,Thai Restaurant
4,Etterbeek,Italian Restaurant,Plaza,Pizza Place,Bar,Lounge,Greek Restaurant,French Restaurant,Bus Stop,Sandwich Place,Snack Place


#### Cluster Neighborhoods - Run k-means to cluster the neighborhoods into 7 clusters

In [75]:
# set number of clusters
kclusters = 7

Brussel_grouped_clustering = Brussel_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Brussel_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 4, 2, 2, 2, 5, 2, 2, 2], dtype=int32)
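The value `kclusters = 7` is fixed by hand. One possible way to compare candidate values, sketched here on random stand-in data rather than the real frequency table, is the silhouette score from scikit-learn. The data, the range of `k`, and `n_init=10` are assumptions, and with only 19 rows any such score should be read with caution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Stand-in for the frequency matrix; real data would come from Brussel_grouped_clustering.
features = rng.random((19, 5))

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(features)
    # Higher silhouette values indicate tighter, better separated clusters.
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Running the same loop on the real matrix would show whether 7 clusters is supported by the data or whether a smaller number separates the neighborhoods just as well.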

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [76]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
Brussel_merged = Brussel_df

# merge Brussel_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
Brussel_merged = Brussel_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='right')

Brussel_merged.head() # check the last columns!

Unnamed: 0,Postal code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1070,Anderlecht,50.838141,4.31234,0,Grocery Store,Greek Restaurant,Convenience Store,Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Snack Place,Discount Store,Belgian Restaurant,Sandwich Place
1,1160,Auderghem,50.815657,4.433139,0,French Restaurant,Grocery Store,Middle Eastern Restaurant,Sushi Restaurant,Belgian Restaurant,Italian Restaurant,Athletics & Sports,Ice Cream Shop,Nightclub,Cultural Center
2,1082,Berchem-Sainte-Agathe,50.863984,4.292702,4,Greek Restaurant,Burger Joint,Snack Place,Gym,Plaza,Tram Station,Supermarket,French Restaurant,Restaurant,Furniture / Home Store
3,1000,Bruxelles,50.846557,4.351697,2,Chocolate Shop,Bar,Beer Bar,Sandwich Place,Belgian Restaurant,Gastropub,Plaza,Greek Restaurant,Bookstore,Thai Restaurant
4,1040,Etterbeek,50.836851,4.38951,2,Italian Restaurant,Plaza,Pizza Place,Bar,Lounge,Greek Restaurant,French Restaurant,Bus Stop,Sandwich Place,Snack Place


#### Let's visualize resulting clusters

In [77]:
# create map
map_clusters = folium.Map(location=[latitude_Brussel, longitude_Brussel], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Brussel_merged['Latitude'], Brussel_merged['Longitude'], Brussel_merged['Neighborhood'], Brussel_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [78]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 0, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1070,Grocery Store,Greek Restaurant,Convenience Store,Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Snack Place,Discount Store,Belgian Restaurant,Sandwich Place
1,1160,French Restaurant,Grocery Store,Middle Eastern Restaurant,Sushi Restaurant,Belgian Restaurant,Italian Restaurant,Athletics & Sports,Ice Cream Shop,Nightclub,Cultural Center
17,1200,Park,French Restaurant,Supermarket,Italian Restaurant,Restaurant,Bus Station,Chinese Restaurant,Basketball Court,Climbing Gym,Fried Chicken Joint


In [79]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 1, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,1080,Supermarket,Spa,Dance Studio,Fried Chicken Joint,Snack Place,Winery,Electronics Store,Falafel Restaurant,Factory,Exhibit


In [80]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 2, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,1000,Chocolate Shop,Bar,Beer Bar,Sandwich Place,Belgian Restaurant,Gastropub,Plaza,Greek Restaurant,Bookstore,Thai Restaurant
4,1040,Italian Restaurant,Plaza,Pizza Place,Bar,Lounge,Greek Restaurant,French Restaurant,Bus Stop,Sandwich Place,Snack Place
5,1140,Restaurant,Bakery,Hockey Field,Pool,Sports Bar,Brasserie,Snack Place,Bus Station,Sandwich Place,Plaza
7,1083,Italian Restaurant,Bar,Bus Station,Bus Stop,Café,Sandwich Place,Chinese Restaurant,Flower Shop,Steakhouse,Supermarket
8,1050,Plaza,Bar,French Restaurant,Vietnamese Restaurant,Thai Restaurant,Greek Restaurant,Paper / Office Supplies Store,Grocery Store,Pizza Place,Playground
9,1090,Tram Station,Bar,Gym,Pizza Place,Snack Place,Bus Station,Supermarket,Gastropub,Grocery Store,Gym / Fitness Center
12,1060,Bar,Brasserie,Plaza,Pizza Place,Italian Restaurant,French Restaurant,Sandwich Place,Moroccan Restaurant,Gym / Fitness Center,Grocery Store
13,1210,Sandwich Place,Italian Restaurant,Kebab Restaurant,Restaurant,Supermarket,Pizza Place,Plaza,Middle Eastern Restaurant,Snack Place,Lebanese Restaurant
14,1030,Supermarket,Plaza,Bakery,Tram Station,Gastropub,Italian Restaurant,Middle Eastern Restaurant,Electronics Store,Coffee Shop,French Restaurant
15,1180,Plaza,Cosmetics Shop,French Restaurant,Supermarket,Sandwich Place,Italian Restaurant,Bakery,Food & Drink Shop,Chocolate Shop,Kids Store


In [81]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 3, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,1150,Park,Tram Station,Trail,Sports Club,Winery,Discount Store,Farmers Market,Falafel Restaurant,Factory,Exhibit


In [82]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 4, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,1082,Greek Restaurant,Burger Joint,Snack Place,Gym,Plaza,Tram Station,Supermarket,French Restaurant,Restaurant,Furniture / Home Store


In [83]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 5, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,1190,Supermarket,French Restaurant,Halal Restaurant,Park,Health Food Store,Factory,Athletics & Sports,Plaza,Men's Store,Cafeteria


In [84]:
Brussel_merged.loc[Brussel_merged['Cluster Labels'] == 6, Brussel_merged.columns[[0] + list(range(5, Brussel_merged.shape[1]))]]

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,1081,Gym / Fitness Center,Bar,Convenience Store,Park,Indonesian Restaurant,Bed & Breakfast,Falafel Restaurant,Sandwich Place,French Restaurant,Hostel


### Results

The following are the highlights of the 7 clusters above:
1. The most common venues are concentrated in the 3rd cluster (cluster label 2), which includes the center of Brussels. This makes the choice of the final location much easier.
2. Restaurants, bars and coffee shops are also very popular in the 3rd cluster, especially in 1000 Brussels, 1040 Etterbeek and 1050 Ixelles.
3. Although the clusters differ, bars and restaurants clearly predominate, so a new hotel could be opened in those areas.

### Discussion and Conclusion

It is noticeable that the 3rd cluster is the most viable one in which to build a new luxury hotel. Proximity to a large number of restaurants (lunch and dinner venues for guests), coffee shops and other amenities, as well as access to a station, are also important points to take into account when making the choice.

Municipalities such as 1000 Brussels, 1040 Etterbeek and 1050 Ixelles lie in the center of Brussels and are close to all kinds of venues commonly visited by locals as well as tourists. These neighborhoods could be the best places to open a new luxury hotel in the city.

In conclusion, this project would have produced better results with more data, such as actual land prices in the area and public transportation access, and with the ability to explore more venues through Foursquare (the number of venues returned by free calls is limited).
However, based on the available data, my advice to the Marriott group would be to focus on 1000 Brussels, 1040 Etterbeek and 1050 Ixelles when investing in a new luxury hotel.