# Capstone Project - The Battle of the Neighborhoods

#### Applied Data Science Capstone by IBM/Coursera

# Table of contents
* [Business Problem and Understanding](#Business-Problem-and-Understanding)
* [Data](#Data)
* [Data preparation](#Data-preparation)
* [Methodology](#Methodology)
* [Analysis](#Analysis)
* [Results and discussion](#Results-and-discussion)
* [Limitations and future directions](#Limitations-and-future-directions)
* [Conclusion](#Conclusion)

# Business Problem and Understanding

This data analytics project is about choosing the best location in Rotterdam to set up a cafe. Rotterdam is a bustling student city that already has many cafes, so the owner would like to know where a new one is most likely to succeed. The neighbourhood should ideally be trendy and well visited, but it should not already have so many cafes that competition is too high. The question is therefore: **Which neighbourhood is the best place to set up a cafe, given that Rotterdam is a competitive environment for cafes?**

#### Criteria

The criteria to be taken into account:
1. The cafe should be located within 2.5 km of the neighbourhood centre.
2. The neighbourhood should have positive reviews from tourists or city inhabitants.
3. Preferably close to the city centre, although other locations are possible.
4. Give visibility of the areas of Rotterdam that already have many restaurants and cafes.

#### Stakeholders

The owner of the future cafe.

#### Why is the problem important?

Finding the best location to set up a cafe is crucial because location largely determines how many customers the cafe attracts and, therefore, its profit.

# Data

The dataset used contains 26 neighbourhoods of Rotterdam together with their latitude and longitude. The fields in the dataset include:
1. Neighbourhood number, 
2. Neighbourhood name, 
3. Latitude, 
4. Longitude. 

For the rest, the *Foursquare API* is used to get data on trendy places as well as the number of cafes per neighbourhood. For each venue, the API returns, among other fields, its name, category, address, latitude, longitude, distance, postal code, city, and id; the analysis keeps the name, category, latitude, and longitude.

**Together, these two datasets allow us to group and cluster the venues and to decide where the trending locations are, as well as which neighbourhoods have more or fewer cafes.**

| Neighbourhood Name | Neighbourhood Number | Neighbourhood Code | Latitude | Longitude |
|---|---|---|---|---|

###### Example of table 1 - Neighbourhood data. Source: Rotterdam city website - https://www.rotterdam.nl

| Cafe Name | Category | Address | Latitude | Longitude | Distance | Postal Code | City | ID |
|---|---|---|---|---|---|---|---|---|

###### Example of table 2 - Venues data. Source: Foursquare API

# Data preparation

First, the necessary packages are installed and imported:

In [1]:
# Install packages that are not available in the environment by default
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
!pip install pyproj

import math
import types

import numpy as np
import pandas as pd
import requests
import ibm_boto3
from botocore.client import Config
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values
from IPython.display import Image, HTML
import folium
import pyproj


Then, the data is loaded. The source of this table is the Rotterdam city website (see the [Data section](#Data) for details).

In [6]:
def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in IBM Cloud Object Storage.
# The API key is a placeholder: supply your own credentials before running.
cos_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',
    ibm_auth_endpoint="https://iam.eu-gb.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = cos_client.get_object(Bucket='pythonbasicsfordatascienceproject-donotdelete-pr-tv665solrgtita', Key='Neighbourhoods.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)

df = pd.read_csv(body)

Furthermore, the Foursquare API credentials need to be defined.

In [7]:
CLIENT_ID = '<YOUR_FOURSQUARE_CLIENT_ID>'  # your Foursquare ID (placeholder)
CLIENT_SECRET = '<YOUR_FOURSQUARE_CLIENT_SECRET>'  # your Foursquare Secret (placeholder)
VERSION = '20180604'  # API version date
LIMIT = 30  # maximum number of venues returned per neighbourhood

We query the Foursquare `explore` endpoint around the centre of each neighbourhood, keep only the columns of interest (*name, category, and location data*), and build a dataframe from the results:

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    """Return a dataframe with one row per venue found around each neighbourhood centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # Foursquare v2 'explore' request; no 'query' parameter is sent, so all
        # venue types are returned and the categories are filtered later
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,  # search radius in metres
            'limit': LIMIT,
        }
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        # keep only the venue name, coordinates and primary category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'VenueLatitude',
                             'VenueLongitude',
                             'VenueCategory']

    return nearby_venues
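To check the parsing logic of `getNearbyVenues` without calling the API, the same extraction can be run on a hand-made list shaped like the `response['groups'][0]['items']` list the function reads. The sample venues below are hypothetical, and `flatten_items` is an illustrative helper, not part of the notebook. It also skips a venue whose `categories` list is empty, a case in which `categories[0]` in the cell above would raise an `IndexError`.

```python
def flatten_items(neighbourhood, lat, lng, items):
    """Turn Foursquare 'explore' items into row tuples, using the same tuple
    layout as getNearbyVenues. Items without a category are skipped."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:
            continue  # no primary category to report
        rows.append((
            neighbourhood,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'],
        ))
    return rows


# Hypothetical sample shaped like response['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Example Coffee', 'location': {'lat': 51.9231, 'lng': 4.4712},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 51.9240, 'lng': 4.4700},
               'categories': []}},
]

rows = flatten_items('Stadscentrum', 51.922909, 4.47059, sample_items)
print(rows)
```

Keeping the parsing in a separate function like this makes it testable offline, while the network request stays in the loop.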

In [9]:
Rotterdam_venues = getNearbyVenues(names=df['DistrName'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )




Stadscentrum
Delfshaven
Overschie
Noord
Hillegersberg-Schiebroek
Kralingen-Crooswijk
Feijenoord
IJsselmonde
Pernis
Prins Alexander
Charlois
Hoogvliet
Hoek van Holland
Spaanse Polder
Nieuw Mathenesse
Waalhaven-Eemhaven
Vondelingsplaat
Botlek-Europoort-Maasvlakte
Rotterdam-Noord-West
Rivium
Bedrijventerrein Schieveen
Rozenburg


Let's check the dimensions of the resulting dataframe:

In [10]:
print(Rotterdam_venues.shape)
Rotterdam_venues.head(3)

(614, 7)


| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Stadscentrum | 51.922909 | 4.47059 | Lebkov & Sons Rotterdam | 51.923679 | 4.469122 | Sandwich Place |
| 1 | Stadscentrum | 51.922909 | 4.47059 | Station Rotterdam Centraal | 51.924803 | 4.469556 | Train Station |
| 2 | Stadscentrum | 51.922909 | 4.47059 | Bertmans | 51.920812 | 4.474312 | Vegetarian / Vegan Restaurant |


Now, we need to remove categories that are not of interest for us. First, we will get all the unique values:

In [11]:
Rotterdam_venues['VenueCategory'].unique()

array(['Sandwich Place', 'Train Station', 'Vegetarian / Vegan Restaurant',
       'Hotel', 'Supermarket', 'Concert Hall', 'Beer Garden', 'Café',
       'Ice Cream Shop', 'Grocery Store', 'Hostel',
       'Residential Building (Apartment / Condo)', 'Bubble Tea Shop',
       'Coffee Shop', 'Gastropub', 'Pizza Place', 'Movie Theater',
       'Boutique', 'Ramen Restaurant', 'French Restaurant',
       'Italian Restaurant', 'Bar', 'Pub', 'Wine Bar', 'Museum',
       'Historic Site', 'Indonesian Restaurant', 'Pool Hall', 'Brewery',
       'Deli / Bodega', 'Bougatsa Shop', 'Restaurant',
       'Salon / Barbershop', 'Scenic Lookout', 'Plaza',
       'Chinese Restaurant', 'Health Food Store', 'Park', 'Lounge',
       'Caribbean Restaurant', 'Bistro', 'Breakfast Spot',
       'Indian Restaurant', 'Bakery', 'Airport', 'Fish Market', 'Zoo',
       'Aquarium', 'Zoo Exhibit', 'Gym / Fitness Center', 'Soccer Field',
       'Multiplex', 'Soccer Stadium', 'Exhibit', 'Airport Service',
       'Big Box S

The categories of interest are listed below. They were chosen according to the following criteria:
1. They can be selling tea/coffee/refreshing drinks;
2. They can be offering desserts, baked products.

Therefore, *restaurants, bars, cafes, bakeries, snack places, and sandwich places* are all on the list.

In [12]:
needed = ['Sandwich Place', 'Coffee Shop', 'Bubble Tea Shop','Italian Restaurant', 'Ramen Restaurant', 'Vegetarian / Vegan Restaurant', 'Bar',
       'Bagel Shop', 'Chinese Restaurant', 'Ice Cream Shop','Asian Restaurant',
       'Modern European Restaurant', 'Café', 'Sushi Restaurant', 'French Restaurant','Indonesian Restaurant',
       'Middle Eastern Restaurant', 'Restaurant', 'Snack Place',
       'Mediterranean Restaurant','Steakhouse', 'Bakery',
       'Latin American Restaurant', 'Turkish Restaurant', 'Dessert Shop','Argentinian Restaurant',
       'Indian Restaurant', 'South American Restaurant',
       'Thai Restaurant', 'Southern / Soul Food Restaurant',
       'Moroccan Restaurant', 'Food', 'Hotel Bar', 'Japanese Restaurant', 'Diner',
       'Seafood Restaurant']

A new dataframe is created. It contains **249 venues** of interest, and with this dataset we can start the analysis.

In [26]:
Final = Rotterdam_venues[Rotterdam_venues['VenueCategory'].str.contains('|'.join(needed))]
Final.shape

(249, 7)
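Note that `str.contains` treats the joined list as a regular expression and matches it anywhere inside each category name. A short entry such as `'Bar'` therefore also matches longer names like `'Salon / Barbershop'`, and `'Food'` matches `'Health Food Store'` (both appear among the unique categories above), so the 249 count may include a few venues that are not food or drink places. The sketch below reproduces this with Python's `re.search`, which is what `str.contains` applies to each value with its default `regex=True`, on a short hand-picked list, and compares it with exact membership, which is what `Rotterdam_venues['VenueCategory'].isin(needed)` would do.

```python
import re

# A small subset of the 'needed' list, plus a few categories from the unique values above
needed = ['Bar', 'Café', 'Food', 'Bakery']
categories = ['Bar', 'Wine Bar', 'Salon / Barbershop', 'Health Food Store',
              'Café', 'Bakery', 'Museum']

pattern = re.compile('|'.join(needed))

# Regex substring matching, equivalent to Series.str.contains('|'.join(needed))
substring_matches = [c for c in categories if pattern.search(c)]

# Exact matching, equivalent to Series.isin(needed)
exact_matches = [c for c in categories if c in needed]

print(substring_matches)
print(exact_matches)
```

Which behaviour is preferable depends on the category: `'Wine Bar'` is plausibly of interest, `'Salon / Barbershop'` is not. Exact matching with an explicit list is the more predictable choice.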

This is what the final dataset looks like:

In [14]:
Final.head(3)

| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Stadscentrum | 51.922909 | 4.47059 | Lebkov & Sons Rotterdam | 51.923679 | 4.469122 | Sandwich Place |
| 2 | Stadscentrum | 51.922909 | 4.47059 | Bertmans | 51.920812 | 4.474312 | Vegetarian / Vegan Restaurant |
| 7 | Stadscentrum | 51.922909 | 4.47059 | Op het Dak | 51.925637 | 4.476674 | Café |


# Methodology

In this project we focus on detecting areas of Rotterdam with low restaurant and cafe density, particularly those with few places where you can get a drink and a dessert. We limit our analysis to the 26 city districts.

In the first step we collected the required data, namely the location and type (category) of every venue in the 26 districts, and identified which venues are of interest.

The second step is the calculation and exploration of 'restaurant density' across different areas of Rotterdam: we use heatmaps to identify a few promising areas close to the centre with few restaurants in general, and focus our attention on those areas.

In the third and final step we generate candidate locations within 2 km of the chosen neighbourhood centres, according to basic requirements established in discussion with the stakeholders: low restaurant and cafe density, and proximity to the city centre. We present a map of all such locations, and we also group them with k-means clustering to identify general zones, neighbourhoods, or addresses that should be the starting point for the stakeholders' final 'street level' exploration and search for an optimal venue location.

**Please note.** The aim is not to identify locations without any cafes or restaurants nearby. The aim is to give visibility of cafe-dense areas and to suggest several locations within the busiest and most central neighbourhoods, even if they fall into the "hot" areas of the heatmap.

# Analysis

Now, let's perform some basic exploration of the data. Grouping by neighbourhood shows how many venues of interest each one has: *Delfshaven, Noord, Kralingen-Crooswijk, Hillegersberg-Schiebroek,* and *Hoek van Holland* are the top five. Rotterdam is, indeed, a bustling and lively city: on average, there are 9.6 venues of interest per neighbourhood.

In [27]:
r = Final.groupby('Neighbourhood')['Venue'].count()  # venues of interest per neighbourhood
num = len(Final) / 26  # 249 venues over the 26 districts
print('Average number of venues per neighbourhood:', num)
g = r.to_frame()
g.sort_values('Venue', ascending=False).head(5)

Average number of venues per neighbourhood: 9.576923076923077


| Neighbourhood | Venue |
|---|---|
| Delfshaven | 18 |
| Noord | 17 |
| Kralingen-Crooswijk | 17 |
| Hillegersberg-Schiebroek | 17 |
| Hoek van Holland | 16 |
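The average printed above divides the venue count by a fixed 26 districts, so districts without any matching venue count as zero. Averaging the grouped counts instead only covers neighbourhoods that appear in the results, which gives a higher number when some districts are empty. The sketch below shows both on made-up labels, using `collections.Counter` as a stand-in for the pandas `groupby` (the neighbourhood lists are hypothetical).

```python
from collections import Counter

# Hypothetical: one label per matching venue (stands in for Final['Neighbourhood'])
venue_neighbourhoods = ['Delfshaven', 'Delfshaven', 'Noord', 'Noord', 'Noord', 'Pernis']
# Hypothetical district list (stands in for df['DistrName']); 'Rivium' has no venues
all_districts = ['Delfshaven', 'Noord', 'Pernis', 'Rivium']

counts = Counter(venue_neighbourhoods)  # like Final.groupby('Neighbourhood')['Venue'].count()
top = counts.most_common(2)              # like sort_values(..., ascending=False).head(2)

# Average over every district: districts without venues count as zero
avg_all = len(venue_neighbourhoods) / len(all_districts)
# Average over districts with at least one venue: like r.mean()
avg_nonzero = sum(counts.values()) / len(counts)

print(top, avg_all, avg_nonzero)
```

In pandas, the second figure corresponds to `r.mean()`, and the first to `len(Final) / len(df)` when `df` holds one row per district.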


Now, let's visualize the venues on a map to make it easier to see what kind of data we are dealing with. To check the category of a venue, click on its marker.

In [28]:
center = [51.9225, 4.47917]  # Rotterdam city centre
map_rotter = folium.Map(location=center, zoom_start=13)
folium.Marker(center, popup='Centrum').add_to(map_rotter)

venue_markers = folium.map.FeatureGroup()

# add a small circle marker for every venue of interest
for lat, lng in zip(Final.VenueLatitude, Final.VenueLongitude):
    venue_markers.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=3, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='green',
            fill_opacity=0.6
        )
    )

# add a clickable marker with the venue category as pop-up text
for lat, lng, label in zip(Final.VenueLatitude, Final.VenueLongitude, Final.VenueCategory):
    folium.Marker([lat, lng], popup=label).add_to(map_rotter)

# add the circle markers to the map
map_rotter.add_child(venue_markers)
map_rotter

Now, let's create a density map of the locations and check how dense certain areas are.

In [29]:
heat_df = Final[['VenueLatitude', 'VenueLongitude']].dropna()
heat_data = heat_df.values.tolist()  # list of [lat, lon] pairs for HeatMap

In [30]:
from folium import plugins
from folium.plugins import HeatMap

map_rotter = folium.Map(location=center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_rotter) #cartodbpositron cartodbdark_matter
HeatMap(heat_data).add_to(map_rotter)
folium.Marker(center).add_to(map_rotter)
map_rotter

It looks like there are empty pockets to the **south** and **east** of the city centre. These areas have fewer restaurants and cafes, so competition is lower, which makes **Centrum** and **Kralingen** worth exploring further. The reason to pick the city centre is simple: it is **more densely populated** and popular with tourists. Since there are pockets where a cafe can be situated, we prefer the city centre, as the stakeholders wanted more popular areas.
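The heatmap gives a visual impression of density. A rough numeric counterpart is to bin the venue coordinates into square cells and count the venues per cell; empty or sparse cells near the centre correspond to the pockets described above. This is a sketch under stated assumptions: degrees are converted to metres with a local equirectangular approximation (adequate over a few kilometres), the 500 m cell size is arbitrary, and the venue coordinates are made up.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6371000.0

def to_local_metres(lat, lon, ref_lat, ref_lon):
    """Equirectangular approximation: (east, north) offsets in metres from a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y

def density_grid(points, ref_lat, ref_lon, cell_m=500.0):
    """Count (lat, lon) points per square cell; keys are (column, row) indices
    relative to the reference point."""
    cells = Counter()
    for lat, lon in points:
        x, y = to_local_metres(lat, lon, ref_lat, ref_lon)
        cells[(math.floor(x / cell_m), math.floor(y / cell_m))] += 1
    return cells

center = (51.9225, 4.47917)  # city-centre coordinates used for the maps
venues = [(51.9226, 4.4795), (51.9228, 4.4800), (51.9300, 4.5000)]  # hypothetical venues
grid = density_grid(venues, *center)
print(grid.most_common(1))
```

With the real data, `venues` would be the `heat_data` pairs, and cells with a count of zero or one close to `(0, 0)` would be the low-competition pockets.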

### Stadscentrum and Kralingen-Crooswijk

Rotterdam **Stadscentrum** is famous for sights such as the Market Hall (Markthal), Euromast, Beurstraverse (Koopgoot) with the Beurs-World Trade Center, Lijnbaan, Coolsingel with the city hall and Hofplein, Erasmusbrug, Willemsbrug, various Rotterdam Metro stations, the Grote of Sint-Laurenskerk, the Rotterdam library, and the Cube Houses.

Reviews on Booking.com describe Centrum as: *"This neighbourhood is a great choice for travellers looking for museums, shops, and food."* It is further described as follows: *"Today de Centrum is renowned for its innovative architecture which includes several modern masterpieces such as the Cube House complex and Rotterdam’s state-of-the-art Markthal."*

As for **Kralingen**, this neighbourhood is more popular with students, so it is always full of young and lively people.

CityRotterdam describes the neighbourhood as: *"Kralingen is a green and attractive neighbourhood, traditionally one of the richer areas of Rotterdam. Well-known places are the recreation area Kralingse Bos, the student pubs around Oostplein and the Erasmus University."* And Agoda.com states that: *"Kralingen-Crooswijk supplies the perfect mix of tranquility and entertainment. There are also several impressive landmarks to visit."*

Both neighbourhoods are therefore full of life and could be great candidates for a future cafe.

To understand this further, we need to define several good locations in Kralingen and Centrum where the cafe could be situated. For this, we take two constraints into account:
1. The location should be within 2 km of the neighbourhood centre, which keeps it inside the 2.5 km range of the first criterion (the code below uses a 2,000 m radius).
2. The area must not be far from the city centre.

*Firstly, the data will be generated for Kralingen Neighbourhood.*

In [31]:
def lonlat_to_xy(lon, lat):
    """Project WGS84 longitude/latitude (degrees) to UTM x/y (metres)."""
    proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    """Inverse of lonlat_to_xy: UTM x/y (metres) back to longitude/latitude."""
    proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance between two projected points, in metres."""
    dx = float(x2) - float(x1)
    dy = float(y2) - float(y1)
    return math.sqrt(dx*dx + dy*dy)

Kralingen_center = [51.928263, 4.50344]
Centrum_Center = [51.922909, 4.47059]
Kr_center_x, Kr_center_y = lonlat_to_xy(Kralingen_center[1], Kralingen_center[0])

k = math.sqrt(3) / 2  # row spacing factor for a hexagonal grid
x_step = 200          # spacing between points in a row, in metres
y_step = 200 * k      # spacing between rows, in metres
roi_x_min = Kr_center_x - 2000  # left edge of the grid
roi_y_min = Kr_center_y - 4000  # bottom edge; rows outside the circle are discarded below

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0  # shift every other row
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(Kr_center_x, Kr_center_y, x, y)
        if (d <= 2001):  # keep points within ~2 km of the neighbourhood centre
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate points generated.')

df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys})

df_roi_locations.head(10)

365 candidate points generated.


| | Latitude | Longitude | X | Y |
|---|---|---|---|---|
| 0 | 51.910567 | 4.499584 | -221311.378335 | 5803371.0 |
| 1 | 51.910824 | 4.502443 | -221111.378335 | 5803371.0 |
| 2 | 51.911082 | 4.505302 | -220911.378335 | 5803371.0 |
| 3 | 51.91134 | 4.508161 | -220711.378335 | 5803371.0 |
| 4 | 51.911598 | 4.51102 | -220511.378335 | 5803371.0 |
| 5 | 51.911855 | 4.513879 | -220311.378335 | 5803371.0 |
| 6 | 51.911775 | 4.495649 | -221561.378335 | 5803544.0 |
| 7 | 51.912033 | 4.498508 | -221361.378335 | 5803544.0 |
| 8 | 51.912291 | 4.501367 | -221161.378335 | 5803544.0 |
| 9 | 51.912549 | 4.504226 | -220961.378335 | 5803544.0 |
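The projection helpers use `zone=33`, but UTM zones are 6° wide starting at 180°W, so Rotterdam (about 4.5°E) falls in zone 31, whose WGS 84 EPSG code is 32631. Zone 33 still yields metre coordinates here, which explains the negative X values above: the centre lies about 10.5° west of that zone's 15°E central meridian, with a small extra scale distortion. The standard zone arithmetic, as a small sketch that ignores the Norway and Svalbard exceptions:

```python
import math

def utm_zone(lon):
    """Standard UTM zone number for a longitude in degrees
    (ignores the Norway/Svalbard exceptions)."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

def utm_epsg(lat, lon):
    """EPSG code of the WGS 84 / UTM zone: 326xx north of the equator, 327xx south."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)

def central_meridian(zone):
    """Longitude of a zone's central meridian, in degrees."""
    return -183.0 + 6.0 * zone

print(utm_zone(4.47059), utm_epsg(51.922909, 4.47059), central_meridian(33))
```

In recent pyproj versions, `pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32631', always_xy=True)` is the documented replacement for the deprecated `pyproj.transform` calls used in the cell above.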


*Now, the same data will be generated for Centrum*

In [32]:
# lonlat_to_xy, xy_to_lonlat and calc_xy_distance are defined in the previous cell
Centrum_Center = [51.922909, 4.47059]
C_center_x, C_center_y = lonlat_to_xy(Centrum_Center[1], Centrum_Center[0])

k = math.sqrt(3) / 2  # row spacing factor for a hexagonal grid
x_step = 200
y_step = 200 * k
roi_x_min = C_center_x - 2000
roi_y_min = C_center_y - 4000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(C_center_x, C_center_y, x, y)
        if (d <= 2001):  # keep points within ~2 km of the Centrum centre
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate points generated.')

df_roi1_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                  'Longitude':roi_longitudes,
                                  'X':roi_xs,
                                  'Y':roi_ys})

df_roi1_locations.head(10)

365 candidate points generated.


| | Latitude | Longitude | X | Y |
|---|---|---|---|---|
| 0 | 51.905212 | 4.466748 | -223647.512094 | 5803106.0 |
| 1 | 51.905471 | 4.469606 | -223447.512094 | 5803106.0 |
| 2 | 51.905729 | 4.472464 | -223247.512094 | 5803106.0 |
| 3 | 51.905988 | 4.475322 | -223047.512094 | 5803106.0 |
| 4 | 51.906246 | 4.47818 | -222847.512094 | 5803106.0 |
| 5 | 51.906505 | 4.481039 | -222647.512094 | 5803106.0 |
| 6 | 51.90642 | 4.462813 | -223897.512094 | 5803280.0 |
| 7 | 51.906679 | 4.465671 | -223697.512094 | 5803280.0 |
| 8 | 51.906937 | 4.468529 | -223497.512094 | 5803280.0 |
| 9 | 51.907196 | 4.471387 | -223297.512094 | 5803280.0 |


Now, the generated data points (spaced 200 metres apart) are visualized. Where the two grids overlap, the points lie within 2 km of both neighbourhood centres, so that area satisfies the distance criterion for both neighbourhoods.
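In the grid cells above, points in a row are 200 m apart and every other row is shifted by 50 m. In a regular hexagonal (triangular) lattice, alternate rows are shifted by half the step, which makes every point equidistant from its six nearest neighbours. The sketch below is a hypothetical helper, `hex_grid_in_circle`, that builds such a lattice inside a circle directly in local metres around a centre, using a local equirectangular approximation instead of UTM; the 2,000 m radius and 200 m step mirror the values in the notebook cells.

```python
import math

EARTH_RADIUS_M = 6371000.0

def hex_grid_in_circle(center_lat, center_lon, radius_m=2000.0, step_m=200.0):
    """Return (lat, lon) points of a hexagonal lattice within radius_m of the centre.

    Rows are step_m * sqrt(3) / 2 apart and odd rows are shifted by step_m / 2.
    Offsets in metres are converted to degrees with an equirectangular approximation.
    """
    row_step = step_m * math.sqrt(3) / 2
    metres_per_deg_lat = math.radians(1) * EARTH_RADIUS_M
    metres_per_deg_lon = metres_per_deg_lat * math.cos(math.radians(center_lat))
    n_rows = int(radius_m // row_step)
    n_cols = int(radius_m // step_m) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = i * row_step
        x_offset = step_m / 2 if i % 2 else 0.0  # shift odd rows by half a step
        for j in range(-n_cols, n_cols + 1):
            x = j * step_m + x_offset
            if math.hypot(x, y) <= radius_m:  # keep points inside the circle
                points.append((center_lat + y / metres_per_deg_lat,
                               center_lon + x / metres_per_deg_lon))
    return points

kralingen_center = (51.928263, 4.50344)
candidates = hex_grid_in_circle(*kralingen_center)
print(len(candidates))
```

Generating the grid symmetrically around the centre also avoids computing and discarding the many points outside the circle that the notebook's loops visit.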

In [33]:
good_latitudes = df_roi_locations['Latitude'].values
good_longitudes = df_roi_locations['Longitude'].values
good_latitudes1 = df_roi1_locations['Latitude'].values
good_longitudes1 = df_roi1_locations['Longitude'].values
good_locations1 = [[lat, lon] for lat, lon in zip(good_latitudes1, good_longitudes1)]
good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]
map_rotter = folium.Map(location=center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_rotter)
HeatMap(heat_data).add_to(map_rotter)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='green', fill=True, fill_color='green', fill_opacity=1).add_to(map_rotter)
for lat, lon in zip(good_latitudes1, good_longitudes1):
    folium.CircleMarker([lat, lon], radius=2, color='green', fill=True, fill_color='green', fill_opacity=1).add_to(map_rotter)
map_rotter

Next, let's find the clusters for our analysis. We chose 20 clusters, knowing that some cluster centres will still fall into areas where cafes are already situated. As the stakeholders wanted full visibility of the process, those centres are not removed.

In [35]:
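# combine the candidate points of both neighbourhoods (DataFrame.append was removed in pandas 2.0)
good = pd.concat([df_roi1_locations, df_roi_locations], ignore_index=True)

from sklearn.cluster import KMeans

number_of_clusters = 20

good_xys = good[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, n_init=10, random_state=0).fit(good_xys)

# convert the cluster centres from projected x/y back to (lon, lat)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]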
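`KMeans` assigns each candidate (X, Y) point to the nearest of 20 centres and moves each centre to the mean of its points, repeating until nothing changes. To illustrate what `fit` does, here is a minimal version of Lloyd's algorithm in plain Python. It is a sketch, not scikit-learn's implementation: initialisation simply takes the first `k` points, whereas `KMeans` uses k-means++ seeding and can run several initialisations (`n_init`).

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centres, labels)."""
    centres = [tuple(p) for p in points[:k]]  # naive initialisation: first k points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centre (squared distance)
        labels = [min(range(k), key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2)
                  for x, y in points]
        # Update step: move each centre to the mean of its assigned points
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre in place
        if new_centres == centres:
            break  # converged: assignments no longer move the centres
        centres = new_centres
    return centres, labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centres, labels = kmeans(pts, k=2)
print(centres, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

Because the candidate points form an even grid, the resulting centres spread roughly uniformly over the two circles; the clustering summarizes the candidate area rather than detecting natural groups.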

In [36]:
map_rotter = folium.Map(location=center, zoom_start=14)  # fresh map, so layers are not added twice
folium.TileLayer('cartodbpositron').add_to(map_rotter)
HeatMap(heat_data).add_to(map_rotter)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='green', fill=True, fill_color='green', fill_opacity=1).add_to(map_rotter)
for lat, lon in zip(good_latitudes1, good_longitudes1):
    folium.CircleMarker([lat, lon], radius=2, color='green', fill=True, fill_color='green', fill_opacity=1).add_to(map_rotter)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=300, color='red', fill=True, fill_opacity=0.25).add_to(map_rotter)
map_rotter

Finally, to better understand the suggested locations, let's plot the cluster centres together with the heatmap so that we do not lose sight of the most cafe-dense areas. Some of the cluster centres do fall into cafe-dense areas; however, this is what the stakeholder wanted: visibility.

In [48]:
map_rotter1 = folium.Map(location=center, zoom_start=14)
HeatMap(heat_data).add_to(map_rotter1)
for lon, lat in cluster_centers:
    folium.CircleMarker([lat, lon], radius=5, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_rotter1)
map_rotter1

# Results and discussion

Our analysis shows that although there are many restaurants and cafes in Rotterdam (Foursquare returned 614 venues within 2,500 m of the neighbourhood centres, 249 of them in the categories of interest), there are pockets of low restaurant density fairly close to the city centre. The highest concentration of venues was found west of the city centre and in neighbourhoods such as Delfshaven, Noord, Kralingen-Crooswijk, Hillegersberg-Schiebroek, and Hoek van Holland. We therefore focused on the more eastern part of Rotterdam, specifically the east of Stadscentrum and Kralingen-Crooswijk. Both neighbourhoods are reported to be popular, with tourists and expats or with students, so there is a lot of potential in these areas.

After narrowing down our area of interest, we first created a dense grid of candidate locations (spaced 200 m apart). Since the stakeholder wanted as complete a picture as possible, the locations were not filtered further. Instead, the heatmap was kept on the map so that the stakeholder can check whether any given location lies in a "hot" area.

Those candidate locations were then clustered to create zones of interest that contain the greatest number of candidate locations.

The result is 20 zones containing the largest number of potential new cafe locations. This, of course, does not imply that those zones are actually optimal locations for a new cafe! The purpose of this analysis was only to highlight which areas are full of restaurants and cafes, and then to suggest where new location candidates could be. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also meets all other relevant conditions.

# Limitations and future directions

There are several recommendations that can be made as a result of the analysis:
1. More constraints and factors need to be taken into account in further analysis. Currently, we only checked the number of restaurants within 2.5 km of each neighbourhood centre. Further analysis could explore which areas are trendy at any given moment and add this data to the map.
2. Based on the analysis and initial constraints, 20 recommended location points were generated as a starting point. The project, though, does not separate the suggested areas into 'good' and 'bad' based on the number of cafes nearby, as this was not its aim. However, one could further explore which of these locations are far from other venues.
3. A heatmap of the current restaurants in Rotterdam was visualized. It can help with future explorations or serve as a starting point for analysing other neighbourhoods.
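Point 2 could start from a simple ranking: for each cluster centre, compute the great-circle distance to the nearest known venue and sort the centres from most to least isolated. This is a sketch under stated assumptions: it uses the haversine formula on a spherical Earth of radius 6,371 km, `rank_by_isolation` is a hypothetical helper, and the coordinates are made up stand-ins for `cluster_centers` and `Final[['VenueLatitude', 'VenueLongitude']]`. Note that `cluster_centers` in the notebook holds (lon, lat) pairs, so they would need swapping first.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def rank_by_isolation(candidates, venues):
    """Sort candidate (lat, lon) points by distance to their nearest venue, largest first."""
    scored = []
    for lat, lon in candidates:
        nearest = min(haversine_m(lat, lon, vlat, vlon) for vlat, vlon in venues)
        scored.append(((lat, lon), nearest))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical inputs
candidates = [(51.9230, 4.4710), (51.9150, 4.5000)]
venues = [(51.9237, 4.4691), (51.9208, 4.4743)]

ranking = rank_by_isolation(candidates, venues)
for point, dist in ranking:
    print(point, round(dist))
```

The brute-force nearest-venue search is fine for 20 centres and a few hundred venues; a spatial index would only matter for much larger inputs.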

# Conclusion

In this study, I analyzed which places are best for setting up a cafe in Rotterdam. I identified the popularity of a location and the number of cafes in the neighbourhood as the main determinants of which neighbourhood is best. I then generated candidate locations around the two selected neighbourhood centres and grouped them into 20 zones with k-means clustering. This approach can help entrepreneurs shortlist promising areas for a cafe in the future and, combined with further analysis, support the growth, profitability, and success of new cafes.