# Capstone Project - Venue Analysis of Malmö City

## Applied Data Science Capstone by Luqman

### Table of contents

* Introduction: Business Problem
* Data
* Methodology
* Analysis
* Results and Discussion
* Conclusion

### Introduction: Business Problem 

In this project we will try to find an optimal location for a warehouse. Specifically, this report is targeted at stakeholders interested in opening a rice storage facility in Malmö.
Since there are lots of restaurants in Malmö, we will try to detect locations that are crowded with restaurants. We are also particularly interested in areas with Indian restaurants in the vicinity.
Neighbourhood data for Malmö is not easily available on the internet. We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

### Data 

Based on the definition of our problem, the factors that will influence our decision are:

* number of existing restaurants in the neighborhood (any type of restaurant)
* number of Indian restaurants in the neighborhood, if any

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:

* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using Google Maps API reverse geocoding
* the number of restaurants and their type and location in every neighborhood will be obtained using the Foursquare API
* the coordinates of the Malmö center will be obtained using Google Maps API geocoding.

### Neighborhood Candidates

Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approx. 12x12 kilometers centered on the center of Malmö.

Let's first find the latitude & longitude of the center of Malmö, which we take to be approximately the area of Rosengård, using a specific, well-known address and the Google Maps geocoding API.

In [1]:
# Google Maps API key, read from the environment so the secret is not stored in the notebook.
import os

google_api_key = os.environ["GOOGLE_MAPS_API_KEY"]  # assumed environment variable name
api_key = google_api_key

In [2]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # request failed, empty result list or missing key
        return [None, None]
    
address = 'Rosengård Malmö, Sweden'
Malmo_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, Malmo_center))

Coordinate of Rosengård Malmö, Sweden: [55.584091, 13.0456746]
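
The request above interpolates the address straight into the URL and returns `[None, None]` on any failure. As a minimal sketch of an alternative, the address can be percent-encoded explicitly and the response checked before indexing into it. The helper names `build_geocode_url` and `parse_geocode_response`, the placeholder key and the sample payload are illustrative assumptions, not part of the notebook; the `results[0]['geometry']['location']` path is the one used in `get_coordinates`.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the geocoding URL with the key and address percent-encoded."""
    return GEOCODE_ENDPOINT + '?' + urlencode({'key': api_key, 'address': address})

def parse_geocode_response(payload):
    """Return [lat, lon] from a decoded JSON payload, or [None, None]."""
    if payload.get('status') != 'OK' or not payload.get('results'):
        return [None, None]
    location = payload['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hypothetical payload shaped like a successful response (no network call is made).
sample_payload = {
    'status': 'OK',
    'results': [{'geometry': {'location': {'lat': 55.584091, 'lng': 13.0456746}}}],
}

example_url = build_geocode_url('YOUR_KEY', 'Rosengård Malmö, Sweden')
example_coords = parse_geocode_response(sample_payload)
```

Separating URL construction from response parsing makes both parts checkable without spending API quota.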


Now let's create a grid of area candidates, equally spaced, centered on the city center and within ~6 km of Rosengård. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately we need to create our grid of locations in a 2D Cartesian coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).
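
UTM divides the globe into 6-degree longitude zones, so the projection needs a zone number. The small sketch below shows how the standard zone can be derived from a longitude; the function names are illustrative assumptions, and the special-case zones around Norway and Svalbard are deliberately ignored (they do not apply at Malmö's latitude).

```python
import math

def utm_zone(longitude_deg):
    """Standard 6-degree UTM zone number for a longitude in [-180, 180).

    Ignores the special-case zones around Norway and Svalbard.
    """
    return int(math.floor((longitude_deg + 180.0) / 6.0)) + 1

def utm_central_meridian(zone):
    """Central meridian (degrees east) of a UTM zone."""
    return zone * 6 - 183

malmo_zone = utm_zone(13.0456746)                   # longitude of the Rosengård reference point
malmo_meridian = utm_central_meridian(malmo_zone)   # central meridian of that zone
```

For the Rosengård longitude this gives zone 33 (central meridian 15°E), the zone the conversion functions in this notebook pass to `pyproj`.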

In [3]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Malmo center longitude={}, latitude={}'.format(Malmo_center[1], Malmo_center[0]))
x, y = lonlat_to_xy(Malmo_center[1], Malmo_center[0])
print('Malmo center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Malmo center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Malmo center longitude=13.0456746, latitude=55.584091
Malmo center UTM X=376815.8953909836, Y=6161525.018138531
Malmo center longitude=13.0456746, latitude=55.584090999999994


In [4]:
Malmo_center_x, Malmo_center_y = lonlat_to_xy(Malmo_center[1], Malmo_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = Malmo_center_x - 6000
x_step = 600
y_min = Malmo_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes_M = []
longitudes_M = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Malmo_center_x, Malmo_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes_M.append(lat)
            longitudes_M.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes_M), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
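
The cell above shifts every other row by half the spacing (300 m) and spaces the rows 600·√3/2 ≈ 519.6 m apart, so each center is 600 m from its six nearest neighbours. The following is a simplified, self-contained sketch of that idea, not the notebook's exact grid: it is centered on the origin, the function name `hex_grid` is an assumption, and the demo radius of 1500 m is chosen only to keep the example small.

```python
import math

def hex_grid(center_x, center_y, max_radius, spacing):
    """Return (x, y) points of a hexagonal grid within max_radius of the center."""
    row_step = spacing * math.sqrt(3) / 2      # vertical distance between rows
    n_rows = int(max_radius / row_step)
    n_cols = int(max_radius / spacing) + 1     # one extra column covers the shifted rows
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_step
        x_offset = spacing / 2 if i % 2 else 0.0   # shift every other row by half a step
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_radius:
                points.append((x, y))
    return points

demo_points = hex_grid(0.0, 0.0, 1500.0, 600.0)

# Distance from the central point to its closest neighbour equals the spacing.
nearest_neighbour = min(math.hypot(x, y) for x, y in demo_points if (x, y) != (0.0, 0.0))
```

Compared with a square grid, the hexagonal layout keeps all neighbouring centers equidistant, which is why circles of equal radius tile the area more evenly.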


In [5]:
!pip install folium

import folium

map_Malmo = folium.Map(location=Malmo_center, zoom_start=13)
folium.Marker(Malmo_center, popup='Rosengård').add_to(map_Malmo)
for lat, lon in zip(latitudes_M, longitudes_M):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Malmo) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_Malmo)
    #folium.Marker([lat, lon]).add_to(map_Malmo)
map_Malmo


Now we have the coordinates of the centers of the neighborhoods/areas to be evaluated, equally spaced and within ~6 km of Rosengård.

Let's now use the Google Maps API to get approximate addresses of those locations.

In [6]:
def get_address(api_key, latitude_M, longitude_M, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude_M, longitude_M)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # request failed, empty result list or missing key
        return None

addr = get_address(google_api_key, Malmo_center[0], Malmo_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Malmo_center[0], Malmo_center[1], addr))

Reverse geocoding check
-----------------------
Address of [55.584091, 13.0456746] is: von Rosens väg 60-62, 213 68 Malmö, Sweden


In [7]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes_M, longitudes_M):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Sweden', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [8]:
addresses[150:170]

['Östra Kattarpsvägen, 212 91 Malmö',
 'Kvisslevägen 31, 212 91 Malmö',
 'Kvisslevägen 34, 212 91 Malmö',
 'Kåseholmsgatan 10, 216 22 Limhamn',
 'Barsebäcksgatan 25, 216 20 Malmö',
 'Ärtholmsvägen 140, 216 24 Malmö',
 'Unnamed Road, 215 69 Malmö',
 'Teknikergatan 29, 215 68 Malmö',
 'Eriksfältsgatan 18, 214 32 Malmö',
 'Munkhättegatan 7, 214 55 Malmö',
 'Augustenborgsgatan 19C, 214 47 Malmö',
 'Botildenborgsvägen 22, 213 62 Malmö',
 'Clematisgatan 15, 213 62 Malmö',
 'Västra Skrävlinge Kyrkoväg 22, 212 37 Malmö',
 'Amiralsgatan 101, 213 64 Malmö',
 'Soldatgatan 4, 212 33 Malmö',
 'Husie kyrkoväg 98, 212 38 Malmö',
 'Kvarnbyvägen 36, 212 36 Malmö',
 'Unnamed Road, 212 36 Malmö',
 'E20, 212 36 Malmö']

In [9]:
import pandas as pd

df_locations_M = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes_M,
                             'Longitude': longitudes_M,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations_M.head(10)

| | Address | Distance from center | Latitude | Longitude | X | Y |
|---|---|---|---|---|---|---|
| 0 | Gränskullavägen 9, 218 75 Tygelsjö | 5992.495307 | 55.532304 | 13.01972 | 375015.895391 | 6155809.0 |
| 1 | Vångavägen 1C, 238 41 Oxie | 5840.3767 | 55.532457 | 13.029221 | 375615.895391 | 6155809.0 |
| 2 | Glostorps kyrkoväg 14, 238 41 Oxie | 5747.173218 | 55.53261 | 13.038721 | 376215.895391 | 6155809.0 |
| 3 | Vångavägen 20, 238 41 Oxie | 5715.767665 | 55.532761 | 13.048221 | 376815.895391 | 6155809.0 |
| 4 | Vångavägen 19, 238 41 Oxie | 5747.173218 | 55.532912 | 13.057722 | 377415.895391 | 6155809.0 |
| 5 | Källstorpsvägen 12, 238 43 Oxie | 5840.3767 | 55.533063 | 13.067223 | 378015.895391 | 6155809.0 |
| 6 | Högebjersvägen 1, 238 43 Oxie | 5992.495307 | 55.533212 | 13.076723 | 378615.895391 | 6155809.0 |
| 7 | Trelleborgsvägen, 218 75 Tygelsjö | 5855.766389 | 55.536739 | 13.005234 | 374115.895391 | 6156329.0 |
| 8 | Lockarps kyrkoväg 21, 238 41 Oxie | 5604.462508 | 55.536893 | 13.014735 | 374715.895391 | 6156329.0 |
| 9 | Unnamed Road, 238 41 Oxie | 5408.326913 | 55.537047 | 13.024237 | 375315.895391 | 6156329.0 |
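
The 'Distance from center' column was computed in UTM coordinates. As a rough, independent cross-check, the same distance can be estimated directly from latitude/longitude with the haversine formula. This is only a sketch: it assumes a spherical Earth with a mean radius of 6,371 km, so small deviations from the UTM value (well under 1% here) are expected. The function name `haversine_m` is an assumption; the coordinates are the Rosengård reference point and row 0 of the table.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two lat/lon points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

center = (55.584091, 13.0456746)   # Rosengård reference point
row_0 = (55.532304, 13.01972)      # first candidate in the table
utm_distance = 5992.495307         # 'Distance from center' reported for row 0

check_distance = haversine_m(center[0], center[1], row_0[0], row_0[1])
relative_error = abs(check_distance - utm_distance) / utm_distance
```

A large `relative_error` would point to swapped latitude/longitude arguments or a wrong UTM zone, both easy mistakes to make in the conversion functions.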


In [10]:
df_locations_M.to_pickle('./locationsM.pkl')  

### Foursquare

Now that we have our location candidates, let's use the Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants - coffee shops, pizza places, bakeries etc. are not the customers we are targeting, so we don't care about those. So we will include in our list only venues that have 'restaurant' in the category name, and we'll make sure to detect and include all the subcategories of the specific 'Indian restaurant' category, as we need info on Indian restaurants in the neighborhood.

In [21]:
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID (assumed environment variable name)
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret (assumed environment variable name)
VERSION = '20180604'
categoryId = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

In [22]:
F_latitude = df_locations_M.loc[0, 'Latitude'] # neighborhood latitude value
F_longitude = df_locations_M.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = df_locations_M.loc[0, 'Address'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               F_latitude, 
                                                               F_longitude))

Latitude and longitude values of Gränskullavägen 9, 218 75 Tygelsjö are 55.53230381256916, 13.019720498077456.


In [23]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    categoryId, 
    F_latitude, 
    F_longitude, 
    radius, 
    LIMIT)

In [24]:
Fresults = requests.get(url).json()

In [25]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened frames use the 'venue.categories' column instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [26]:
from pandas.io.json import json_normalize
venues = Fresults['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Vindåkra Gård | French Restaurant | 55.533883 | 13.005621 |


In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            categoryId,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results_M = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results_M])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [28]:
malmo_venues = getNearbyVenues(names=df_locations_M['Address'],
                      
                                   latitudes=df_locations_M['Latitude'],
                                   longitudes=df_locations_M['Longitude']
                                  )

Gränskullavägen 9, 218 75 Tygelsjö
Vångavägen 1C, 238 41 Oxie
Glostorps kyrkoväg 14, 238 41 Oxie
Vångavägen 20, 238 41 Oxie
Vångavägen 19, 238 41 Oxie
Källstorpsvägen 12, 238 43 Oxie
Högebjersvägen 1, 238 43 Oxie
Trelleborgsvägen, 218 75 Tygelsjö
Lockarps kyrkoväg 21, 238 41 Oxie
Unnamed Road, 238 41 Oxie
11, 238 41 Oxie
Sofiedalsvägen, 238 41 Oxie
Sofiedalsvägen, 238 41 Oxie
Vångavägen 33, 238 41 Oxie
Gustaf Pålssons väg 69, 238 43 Oxie
Planetgatan 25, 238 37 Oxie
Lergodsvägen 1Y, 238 40 Oxie
Lockarps kyrkoväg 3, 218 75 Tygelsjö
Herrgårdsvägen 12, 218 75 Tygelsjö
Lockarps kyrkoväg 16, 238 41 Oxie
Lockarps kyrkoväg 29A, 238 41 Oxie
Lockarps kyrkoväg 41, 238 41 Oxie
Lockarpsvägen 30, 238 41 Oxie
Lockarps Bangårdsväg 6, 238 41 Oxie
Lockarps kyrkoväg 96, 238 41 Oxie
Käglingevägen 75, 238 37 Malmö
Kristinas gata 37, 238 37 Oxie
Kristinebergsvägen 59, 238 37 Oxie
Formarevägen 14, 238 31 Oxie
Stenhögagatan 11, 238 31 Oxie
E20, 215 86 Tygelsjö
Grophusgatan 2, 215 86 Malmö
Trelleborgsvägen, 21

In [29]:
print(malmo_venues.shape)
malmo_venues.head()

(72, 7)


| | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Skogholmsgatan 12, 213 76 Malmö | 55.556169 | 13.051818 | Atria Skandinavia Restaurant | 55.556055 | 13.052688 | Restaurant |
| 1 | Hyllie Gårds väg 1, 216 23 Malmö | 55.564111 | 12.965784 | Konditori Katarina Emporia | 55.564636 | 12.966266 | Bakery |
| 2 | Hyllie Boulevard 13B, 215 32 Malmö | 55.564268 | 12.975292 | Percy´s Restaurant & Bar | 55.56483 | 12.975995 | Scandinavian Restaurant |
| 3 | Hyllie Boulevard 13B, 215 32 Malmö | 55.564268 | 12.975292 | Jensens Bøfhus | 55.563446 | 12.974911 | Steakhouse |
| 4 | Hyllie Boulevard 13B, 215 32 Malmö | 55.564268 | 12.975292 | China Box | 55.563696 | 12.976237 | Chinese Restaurant |
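
One fragile spot in `getNearbyVenues` is `v['venue']['categories'][0]['name']`, which raises an `IndexError` for any venue returned with an empty category list. A minimal sketch of a more defensive flattening step is shown below. The function name `flatten_explore_items` and the second sample venue are hypothetical; the item layout mirrors the `response['groups'][0]['items']` structure accessed above.

```python
def flatten_explore_items(district, items):
    """Turn Foursquare 'explore' items into flat rows; the category may be None."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((district,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hypothetical items shaped like response['groups'][0]['items'].
sample_items = [
    {'venue': {'name': 'China Box',
               'location': {'lat': 55.563696, 'lng': 12.976237},
               'categories': [{'name': 'Chinese Restaurant'}]}},
    {'venue': {'name': 'Uncategorised venue',
               'location': {'lat': 55.5640, 'lng': 12.9750},
               'categories': []}},
]

sample_rows = flatten_explore_items('Hyllie Boulevard 13B, 215 32 Malmö', sample_items)
```

Rows with a `None` category can then be dropped or inspected explicitly instead of aborting the whole download loop.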


In [30]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

indian_restaurant_categories = ['4bf58dd8d48988d10f941735']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Sverige', '')
    address = address.replace(', Sweden', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # request failed or response lacks the expected keys
        venues = []
    return venues
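
To make the filtering rules in `is_restaurant` easier to follow, here is a compact, self-contained restatement of the same logic applied to a few sample categories. The function name `classify_venue` and the `placeholder-id-...` strings are assumptions for illustration only; the Indian restaurant category id is the one defined in the cell above.

```python
INDIAN_CATEGORY_IDS = {'4bf58dd8d48988d10f941735'}  # id taken from the cell above
RESTAURANT_WORDS = ('restaurant', 'diner', 'taverna', 'steakhouse')

def classify_venue(categories, specific_ids=INDIAN_CATEGORY_IDS):
    """Return (is_restaurant, is_specific) for a list of (name, id) pairs."""
    is_res = False
    is_specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if any(word in lowered for word in RESTAURANT_WORDS):
            is_res = True
        if 'fast food' in lowered:
            is_res = False          # fast food places are excluded
        if category_id in specific_ids:
            is_specific = True      # a specific category always counts as a restaurant
            is_res = True
    return is_res, is_specific

examples = {
    'indian': classify_venue([('Indian Restaurant', '4bf58dd8d48988d10f941735')]),
    'bakery': classify_venue([('Bakery', 'placeholder-id-bakery')]),
    'fast_food': classify_venue([('Fast Food Restaurant', 'placeholder-id-fast-food')]),
    'steakhouse': classify_venue([('Steakhouse', 'placeholder-id-steakhouse')]),
}
```

Note that the name match is a substring test, so any category whose name contains one of the keywords is kept unless it also contains 'fast food'.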

In [31]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category,CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_350.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:  # no cached files (or unreadable pickles): fall back to the API
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(latitudes_M, longitudes_M)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('indian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(indian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
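
The comment in `get_restaurants` states that a query radius of 350 m is used to get full coverage. This can be checked with a little geometry: on a hexagonal grid with spacing s, the point farthest from every center is the centroid of a triangle formed by three neighbouring centers, at distance s/√3. The sketch below computes that value and also illustrates the dictionary-based de-duplication. The function names and the sample venues are illustrative assumptions.

```python
import math

def coverage_radius(spacing):
    """Largest distance from any point in the plane to its nearest center
    on a hexagonal grid with the given spacing."""
    return spacing / math.sqrt(3)

def dedupe_by_id(venue_lists):
    """Merge per-area venue lists, keeping one entry per venue id."""
    merged = {}
    for venues in venue_lists:
        for venue in venues:
            merged[venue[0]] = venue   # venue[0] is the venue id
    return merged

needed_radius = coverage_radius(600)   # about 346.4 m for 600 m spacing

# Hypothetical overlapping results: venue 'b' is returned by two queries.
area_1 = [('a', 'Venue A'), ('b', 'Venue B')]
area_2 = [('b', 'Venue B'), ('c', 'Venue C')]
unique_venues = dedupe_by_id([area_1, area_2])
```

Since 346.4 m is just below 350 m, queries with radius 350 leave no gaps between circles, whereas a 300 m radius would miss venues near the triangle centroids.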


In [32]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 335
Total number of Indian restaurants: 14
Percentage of Indian restaurants: 4.18%
Average number of restaurants in neighborhood: 0.815934065934


In [33]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4bcedd75cc8cd13a53ebc4cf', 'Vegegården', 55.603915272349944, 13.008453524761617, 'Rörsjögatan 23, 211 37 Malmö', 346, False, 374533.414622216, 6163797.440034788)
('4b783739f964a5200abd2ee3', 'Värnhems Falafel', 55.605787620549904, 13.0245152444612, 'Lundavägen 1, 212 18 Malmö', 159, False, 375551.08007028105, 6163976.856414921)
('4c52dcd67f6e20a1005737ec', "Ma'vera Restaurang & Lounge", 55.61488310618305, 12.972231116558996, 'Barometergatan 58, 211 17 Malmö', 337, False, 372287.4144497439, 6165083.8393701045)
('525d1838498e2407cc5b1265', 'Shawarma Specialisten', 55.59108531786762, 13.009609968712697, 'Ystadgatan 4, Malmö', 247, False, 374565.32610499015, 6162367.816549143)
('4bd17487b221c9b69a88d5d0', 'Di Penco', 55.5955402, 12.9940706, 'Roskildevägen 3, 211 47 Malmö', 288, False, 373600.5000699743, 6162891.674020107)
('57e6e83b498e68dd61b2b526', 'Neighbaren', 55.597179, 13.041396, 'Malmö', 247, False, 376587.3327653375, 6162988.8711742

In [34]:
print('List of Indian restaurants')
print('---------------------------')
for r in list(indian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(indian_restaurants))

List of Indian restaurants
---------------------------
('4b544d54f964a520e8b627e3', 'Restaurang Indian Haweli', 55.592133, 13.0056537, 'Södra förstadsgatan 88, 214 20 Malmö', 215, True, 374319.3867746268, 6162491.542147009)
('4ed8f5cd9911a3e78b6c708b', 'Ariana Restaurang', 55.58931374410876, 13.015261609901609, 'Nobelvägen 73a, Malmö', 152, True, 374915.80628735805, 6162160.505062286)
('5511c40a498ed87a67c6adeb', 'Sájjva', 55.60662192645805, 13.020114147030016, 'Malmö', 341, True, 375276.51357142074, 6164077.5852913065)
('4b5991b7f964a520e78c28e3', 'Masala House', 55.60502200340942, 13.00672769719945, 'Baltzarsgatan 12, 211 36 Malmö', 199, True, 374428.2404529763, 6163923.7018871745)
('4b4f55e8f964a520d00127e3', 'Restaurang Indian Express', 55.59298895376396, 13.007745877506599, 'Bergsgatan 35, 214 22 Malmö', 262, True, 374453.94664092653, 6162582.995029307)
('58013026d67c8ac5eb6abdc1', 'Kontrast', 55.59190673524471, 13.008120966393204, 'Malmö', 262, True, 374474.12506848236, 6162461.9

In [35]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: 
Restaurants around location 102: 
Restaurants around location 103: 
Restaurants around location 104: 
Restaurants around location 105: Baguetteboden, Fosierestaurangen
Restaurants around location 106: Asian & Thai Food
Restaurants around location 107: 
Restaurants around location 108: 
Restaurants around location 109: 
Restaurants around location 110: 


Let's now see all the collected restaurants in our area of interest on a map, and let's also show Indian restaurants in a different color.

In [36]:
map_Malmo = folium.Map(location=Malmo_center, zoom_start=13)
folium.Marker(Malmo_center, popup='Rosengård').add_to(map_Malmo)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_Malmo)
map_Malmo

Looking good. So now we have all the restaurants in the area within a few kilometers of Rosengård, and we know which ones are Indian restaurants! We also know exactly which restaurants are in the vicinity of every neighborhood candidate center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new rice storage facility!

### Methodology 

In this project we will direct our efforts toward detecting areas of Malmö that have high restaurant density, particularly those with Indian restaurants in the vicinity. We will limit our analysis to the area within ~6 km of the city center (Rosengård).

In the first step we collected the required data: the location and type (category) of every restaurant within 6 km of the Malmö center (Rosengård). We have also identified Indian restaurants (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of 'restaurant density' across different areas of Malmö - we will use heatmaps to identify a few promising areas with a high number of restaurants in general (and Indian restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and within those create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will take into consideration the number of restaurants within a radius of 250 meters and the presence of Indian restaurants within a radius of 400 meters. We will present a map of all such locations but also create clusters (using k-means clustering) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
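
Since k-means clustering is named as the grouping technique, a minimal pure-Python sketch of the algorithm (Lloyd's iteration) may help make the idea concrete. This is an illustration only, not the notebook's implementation: in practice a library implementation would normally be used, the function name `kmeans` is an assumption, the starting centroids are naively taken as the first k points, and the demo coordinates are made up.

```python
import math

def kmeans(points, k, max_iterations=50):
    """Plain Lloyd's algorithm on 2D points; returns (centroids, clusters)."""
    centroids = list(points[:k])   # naive deterministic start (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for px, py in points:
            nearest = min(range(k),
                          key=lambda c: math.hypot(px - centroids[c][0], py - centroids[c][1]))
            clusters[nearest].append((px, py))
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])   # keep the centroid of an empty cluster
        if updated == centroids:
            break                              # assignments are stable
        centroids = updated
    return centroids, clusters

# Hypothetical candidate locations (meters) forming two well-separated groups.
demo_locations = [(0, 0), (0, 10), (10, 0), (1000, 1000), (1000, 1010), (1010, 1000)]
demo_centroids, demo_clusters = kmeans(demo_locations, k=2)
```

The resulting centroids play the role of zone centers whose addresses can be handed to stakeholders as starting points.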

### Analysis

Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First let's count the number of restaurants in every area candidate:

In [37]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations_M['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations_M.head(10)

Average number of restaurants in every area with radius=300m: 0.815934065934


| | Address | Distance from center | Latitude | Longitude | X | Y | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | Gränskullavägen 9, 218 75 Tygelsjö | 5992.495307 | 55.532304 | 13.01972 | 375015.895391 | 6155809.0 | 0 |
| 1 | Vångavägen 1C, 238 41 Oxie | 5840.3767 | 55.532457 | 13.029221 | 375615.895391 | 6155809.0 | 0 |
| 2 | Glostorps kyrkoväg 14, 238 41 Oxie | 5747.173218 | 55.53261 | 13.038721 | 376215.895391 | 6155809.0 | 0 |
| 3 | Vångavägen 20, 238 41 Oxie | 5715.767665 | 55.532761 | 13.048221 | 376815.895391 | 6155809.0 | 0 |
| 4 | Vångavägen 19, 238 41 Oxie | 5747.173218 | 55.532912 | 13.057722 | 377415.895391 | 6155809.0 | 0 |
| 5 | Källstorpsvägen 12, 238 43 Oxie | 5840.3767 | 55.533063 | 13.067223 | 378015.895391 | 6155809.0 | 0 |
| 6 | Högebjersvägen 1, 238 43 Oxie | 5992.495307 | 55.533212 | 13.076723 | 378615.895391 | 6155809.0 | 0 |
| 7 | Trelleborgsvägen, 218 75 Tygelsjö | 5855.766389 | 55.536739 | 13.005234 | 374115.895391 | 6156329.0 | 0 |
| 8 | Lockarps kyrkoväg 21, 238 41 Oxie | 5604.462508 | 55.536893 | 13.014735 | 374715.895391 | 6156329.0 | 0 |
| 9 | Unnamed Road, 238 41 Oxie | 5408.326913 | 55.537047 | 13.024237 | 375315.895391 | 6156329.0 | 0 |


OK, now let's calculate the distance to nearest Indian restaurant from every area candidate center (not only those within 300m - we want distance to closest one, regardless of how distant it is).

In [38]:
distances_to_indian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)

df_locations_M['Distance to Indian restaurant'] = distances_to_indian_restaurant
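
The loop above starts from a fixed cap of 10000 m, so a candidate that is farther than that from every Indian restaurant would be recorded as exactly 10000. A minimal alternative sketch without a cap is shown below; it assumes `calc_xy_distance` is the plain Euclidean distance on projected coordinates, and `nearest_distance` and the sample points are hypothetical.

```python
import math

def nearest_distance(area_x, area_y, targets):
    """Distance from (area_x, area_y) to the closest point in targets.

    targets is an iterable of (x, y) pairs; returns math.inf when it is empty.
    """
    return min(
        (math.hypot(area_x - tx, area_y - ty) for tx, ty in targets),
        default=math.inf,
    )

# Hypothetical projected coordinates in meters
indian_xy = [(300.0, 400.0), (6000.0, 8000.0)]
print(nearest_distance(0.0, 0.0, indian_xy))      # 500.0
print(nearest_distance(20000.0, 0.0, indian_xy))  # farther than 10 km, not capped
```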

In [41]:
df_locations_M.head(10)

Unnamed: 0,Address,Distance from center,Latitude,Longitude,X,Y,Restaurants in area,Distance to Indian restaurant
0,"Gränskullavägen 9, 218 75 Tygelsjö",5992.495307,55.532304,13.01972,375015.895391,6155809.0,0,6352.043189
1,"Vångavägen 1C, 238 41 Oxie",5840.3767,55.532457,13.029221,375615.895391,6155809.0,0,6389.722968
2,"Glostorps kyrkoväg 14, 238 41 Oxie",5747.173218,55.53261,13.038721,376215.895391,6155809.0,0,6482.951992
3,"Vångavägen 20, 238 41 Oxie",5715.767665,55.532761,13.048221,376815.895391,6155809.0,0,6629.38711
4,"Vångavägen 19, 238 41 Oxie",5747.173218,55.532912,13.057722,377415.895391,6155809.0,0,6825.604763
5,"Källstorpsvägen 12, 238 43 Oxie",5840.3767,55.533063,13.067223,378015.895391,6155809.0,0,7067.459749
6,"Högebjersvägen 1, 238 43 Oxie",5992.495307,55.533212,13.076723,378615.895391,6155809.0,0,7350.448573
7,"Trelleborgsvägen, 218 75 Tygelsjö",5855.766389,55.536739,13.005234,374115.895391,6156329.0,0,5886.244551
8,"Lockarps kyrkoväg 21, 238 41 Oxie",5604.462508,55.536893,13.014735,374715.895391,6156329.0,0,5835.064853
9,"Unnamed Road, 238 41 Oxie",5408.326913,55.537047,13.024237,375315.895391,6156329.0,0,5845.347617


In [42]:
print('Average distance to closest Indian restaurant from each area center:', df_locations_M['Distance to Indian restaurant'].mean())



Average distance to closest Indian restaurant from each area center: 3721.463037302315


OK, so on average the closest Indian restaurant is roughly 3.7 km from an area center candidate. That's far, so we need to filter our areas carefully.

### Analyzing Each Neighborhood

In [44]:
# one hot encoding
malmo_onehot = pd.get_dummies(malmo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
malmo_onehot['Neighbourhood'] = malmo_venues['District'] 

# move neighborhood column to the first column
fixed_columns = [malmo_onehot.columns[-1]] + list(malmo_onehot.columns[:-1])
malmo_onehot = malmo_onehot[fixed_columns]

malmo_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega,Eastern European Restaurant,...,Pizza Place,Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant
0,"Skogholmsgatan 12, 213 76 Malmö",0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,"Hyllie Gårds väg 1, 216 23 Malmö",0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Hyllie Boulevard 13B, 215 32 Malmö",0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,"Hyllie Boulevard 13B, 215 32 Malmö",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,"Hyllie Boulevard 13B, 215 32 Malmö",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [45]:
malmo_onehot.shape

(72, 31)

Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category.

In [46]:
malmo_grouped = malmo_onehot.groupby('Neighbourhood').mean().reset_index()
malmo_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega,Eastern European Restaurant,...,Pizza Place,Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant
0,"Annelundsgatan 56, 214 44 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"BODEKULLSGÅNGEN 21B, 214 40 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bertrandsgatan 6 U6, 212 14 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Borrgatan 25, 211 24 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Botildenborgsvägen 22, 213 62 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Calle Ljungbecks gata 46, 212 40 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Eriksfältsgatan 18, 214 32 Malmö",0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Fredsgatan 29, 212 12 Malmö",0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Fågelbacksgatan 5-7, 217 44 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Grimsbygatan 24, 211 20 Malmö",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [47]:
malmo_grouped.shape

(38, 31)
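
Because each row of the one-hot table marks exactly one category, the per-neighbourhood mean is simply the share of that neighbourhood's venues falling in each category. A small illustration with made-up data (the addresses and categories below are hypothetical):

```python
import pandas as pd

venues = pd.DataFrame({
    'District': ['A street 1', 'A street 1', 'A street 1', 'B street 2'],
    'Venue Category': ['Café', 'Café', 'Pizza Place', 'Bakery'],
})

# One 0/1 column per category, one row per venue
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, 'Neighbourhood', venues['District'])

# Mean of the 0/1 columns = fraction of venues in each category
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Here 'A street 1' has three venues, two of them cafés, so its 'Café' frequency is 2/3 and its 'Pizza Place' frequency is 1/3.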

In [48]:
num_top_venues = 5

for hood in malmo_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = malmo_grouped[malmo_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Annelundsgatan 56, 214 44 Malmö----
                 venue  freq
0   Falafel Restaurant   1.0
1  American Restaurant   0.0
2        Hot Dog Joint   0.0
3     Tapas Restaurant   0.0
4           Taco Place   0.0


----BODEKULLSGÅNGEN 21B, 214 40 Malmö----
                 venue  freq
0           Restaurant  0.33
1          Pizza Place  0.33
2                 Food  0.33
3  American Restaurant  0.00
4        Hot Dog Joint  0.00


----Bertrandsgatan 6 U6, 212 14 Malmö----
                 venue  freq
0       Sandwich Place   1.0
1  American Restaurant   0.0
2     Asian Restaurant   0.0
3     Tapas Restaurant   0.0
4           Taco Place   0.0


----Borrgatan 25, 211 24 Malmö----
                 venue  freq
0   Italian Restaurant   1.0
1  American Restaurant   0.0
2     Asian Restaurant   0.0
3     Tapas Restaurant   0.0
4           Taco Place   0.0


----Botildenborgsvägen 22, 213 62 Malmö----
                 venue  freq
0           Restaurant   1.0
1  American Restaurant   0.0
2     

##### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
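
Note that `return_most_common_venues` always returns `num_top_venues` names, so for a neighbourhood with only one or two venues the remaining slots are filled by categories whose frequency is zero. If that is not wanted, one possible variant keeps only categories that actually occur. This is a sketch rather than the notebook's method, and `top_nonzero_venues` is a hypothetical name.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0.

    Assumes the first entry of row is the neighbourhood name and the rest
    are numeric frequencies, as in malmo_grouped.
    """
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])

# Hypothetical row in the same layout as malmo_grouped
row = pd.Series({
    'Neighbourhood': 'Example street 1',
    'Bakery': 0.2,
    'Café': 0.3,
    'Pizza Place': 0.5,
    'Thai Restaurant': 0.0,
})
print(top_nonzero_venues(row, 10))
```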

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = malmo_grouped['Neighbourhood']

for ind in np.arange(malmo_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(malmo_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Annelundsgatan 56, 214 44 Malmö",Falafel Restaurant,Thai Restaurant,Tapas Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
1,"BODEKULLSGÅNGEN 21B, 214 40 Malmö",Restaurant,Pizza Place,Food,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café
2,"Bertrandsgatan 6 U6, 212 14 Malmö",Sandwich Place,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
3,"Borrgatan 25, 211 24 Malmö",Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
4,"Botildenborgsvägen 22, 213 62 Malmö",Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega


#### Cluster Neighborhoods

In [54]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

malmo_grouped_clustering = malmo_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(malmo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 1, 3, 2, 4, 2, 0, 2], dtype=int32)
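
To get a feel for how evenly the neighbourhoods are spread across clusters, the labels can be tallied. The sketch below uses only the ten labels printed above as sample input; in the notebook one would presumably pass `kmeans.labels_` instead.

```python
import numpy as np

kclusters = 5
# First ten labels as printed above (sample input for this sketch)
labels = np.array([2, 2, 2, 1, 3, 2, 4, 2, 0, 2], dtype=np.int32)

# counts[i] is the number of neighbourhoods assigned to cluster label i
counts = np.bincount(labels, minlength=kclusters)
for cluster, n in enumerate(counts):
    print('Cluster label', cluster, '->', n, 'neighbourhoods')
```

In this sample, label 2 accounts for six of the ten neighbourhoods, which hints that one cluster may dominate.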

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [93]:
# add clustering labels (guarded so the cell can be re-run without failing on a duplicate column)
if 'Cluster Labels' not in neighbourhoods_venues_sorted.columns:
    neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

malmo_merged = df_locations_M
# join the sorted venues onto the candidate locations by address to add cluster labels and top venues
malmo_merged = malmo_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Address')

malmo_merged.head() # check the last columns!

Unnamed: 0,Address,Distance from center,Latitude,Longitude,X,Y,Restaurants in area,Distance to Indian restaurant,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Gränskullavägen 9, 218 75 Tygelsjö",5992.495307,55.532304,13.01972,375015.895391,6155809.0,0,6352.043189,,,,,,,,,,,
1,"Vångavägen 1C, 238 41 Oxie",5840.3767,55.532457,13.029221,375615.895391,6155809.0,0,6389.722968,,,,,,,,,,,
2,"Glostorps kyrkoväg 14, 238 41 Oxie",5747.173218,55.53261,13.038721,376215.895391,6155809.0,0,6482.951992,,,,,,,,,,,
3,"Vångavägen 20, 238 41 Oxie",5715.767665,55.532761,13.048221,376815.895391,6155809.0,0,6629.38711,,,,,,,,,,,
4,"Vångavägen 19, 238 41 Oxie",5747.173218,55.532912,13.057722,377415.895391,6155809.0,0,6825.604763,,,,,,,,,,,
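
The empty trailing columns above are expected: `join` keeps every row of the left frame, and only addresses that also appear in `neighbourhoods_venues_sorted` receive a cluster label and venue ranking. A toy illustration with hypothetical addresses and values:

```python
import pandas as pd

locations = pd.DataFrame({
    'Address': ['Field road 1', 'Main street 2', 'Harbour lane 3'],
    'Restaurants in area': [0, 4, 1],
})
venues_sorted = pd.DataFrame({
    'Neighbourhood': ['Main street 2', 'Harbour lane 3'],
    'Cluster Labels': [1, 3],
    '1st Most Common Venue': ['Café', 'Restaurant'],
})

# Left join on the address: unmatched locations get NaN in the new columns
merged = locations.join(venues_sorted.set_index('Neighbourhood'), on='Address')
print(merged)

# Keep only the candidates that received a cluster label
clustered = merged.dropna(subset=['Cluster Labels'])
print(len(clustered), 'of', len(merged), 'candidates have a cluster label')
```

The NaN values also turn the label column into floats, which is why the labels appear as `0.0`, `1.0` and so on in the cluster tables.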


In [94]:
#!pip install colored --upgrade
from colored import fg, bg, attr
from bs4 import BeautifulSoup
import time
import requests
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import folium # map rendering library (used for the cluster map below)
from colour import Color
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

In [118]:
ros_lat = 55.584091 
ros_lon = 13.0456746

In [122]:
# create map
map_clusters = folium.Map(location=[ros_lat, ros_lon], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; candidates without a cluster label (NaN) are skipped
clustered = malmo_merged.dropna(subset=['Cluster Labels'])
for lat, lon, poi, cluster in zip(clustered['Latitude'], clustered['Longitude'], clustered['Address'], clustered['Cluster Labels']):
    cluster = int(cluster)  # labels are stored as floats because of the NaN rows
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### Cluster Examination

#### Cluster 1

In [110]:
malmo_merged.loc[malmo_merged['Cluster Labels'] == 0, malmo_merged.columns[[0] + list(range(5, malmo_merged.shape[1]))]]

Unnamed: 0,Address,Y,Restaurants in area,Distance to Indian restaurant,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
153,"Kåseholmsgatan 10, 216 22 Limhamn",6161005.0,0,3261.72839,0.0,Pizza Place,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
194,"Vilebovägen 31, 217 63 Malmö",6162045.0,1,1761.138932,0.0,Pizza Place,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
234,"Fågelbacksgatan 5-7, 217 44 Malmö",6163084.0,0,1208.805573,0.0,Pizza Place,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega


#### Cluster 2

In [115]:
malmo_merged.loc[malmo_merged['Cluster Labels'] == 1, malmo_merged.columns[[0] + list(range(5, malmo_merged.shape[1]))]]

Unnamed: 0,Address,Y,Restaurants in area,Distance to Indian restaurant,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
110,"Minnesdalsvägen 17, 212 91 Malmö",6159447.0,1,5865.700286,1.0,Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
177,"Per Albin Hanssons väg 35, 214 32 Malmö",6161525.0,4,971.229247,1.0,BBQ Joint,Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
251,"Nicoloviusgatan 6, 217 57 Malmö",6163603.0,2,1210.427781,1.0,Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
288,"Sundspromenaden 23, 211 16 Malmö",6164643.0,4,492.755116,1.0,Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
311,"Borrgatan 25, 211 24 Malmö",6165162.0,1,1224.080297,1.0,Italian Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega


#### Cluster 3

In [112]:
malmo_merged.loc[malmo_merged['Cluster Labels'] == 3, malmo_merged.columns[[0] + list(range(5, malmo_merged.shape[1]))]]

Unnamed: 0,Address,Y,Restaurants in area,Distance to Indian restaurant,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,"Skogholmsgatan 12, 213 76 Malmö",6158407.0,1,4350.487329,3.0,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
124,"Jägersrovägen 151, 213 75 Malmö",6159966.0,2,3326.49084,3.0,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
161,"Botildenborgsvägen 22, 213 62 Malmö",6161005.0,1,1739.106866,3.0,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
201,"von Lingens väg 50, 213 71 Malmö",6162045.0,2,1903.618882,3.0,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
236,"Torpgatan 5, 211 52 Malmö",6163084.0,7,502.312143,3.0,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant,Deli / Bodega
289,"Stora Varvsgatan 17, 211 19 Malmö",6164643.0,4,131.297509,3.0,Sandwich Place,Restaurant,Thai Restaurant,Food Truck,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Café,Chinese Restaurant


#### Cluster 4

In [113]:
malmo_merged.loc[malmo_merged['Cluster Labels'] == 4, malmo_merged.columns[[0] + list(range(5, malmo_merged.shape[1]))]]

Unnamed: 0,Address,Y,Restaurants in area,Distance to Indian restaurant,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
158,"Eriksfältsgatan 18, 214 32 Malmö",6161005.0,3,1258.638915,4.0,Café,Falafel Restaurant,Thai Restaurant,Tapas Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Chinese Restaurant,Deli / Bodega
216,"Kapellgatan 14 U4, 214 21 Malmö",6162564.0,18,193.697972,4.0,Café,Thai Restaurant,Tapas Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Chinese Restaurant,Deli / Bodega,Eastern European Restaurant
235,"Rönngatan 5A, 211 47 Malmö",6163084.0,17,712.731142,4.0,Café,Thai Restaurant,Tapas Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Chinese Restaurant,Deli / Bodega,Eastern European Restaurant
255,"Kungsgatan 13, 211 49 Malmö",6163603.0,5,430.450917,4.0,Café,Thai Restaurant,Tapas Restaurant,Asian Restaurant,BBQ Joint,Bakery,Burger Joint,Chinese Restaurant,Deli / Bodega,Eastern European Restaurant


### Results and Discussion

Our analysis shows that although there is a great number of restaurants in Malmö (~300 in our initial area of interest, which was 12x12 km around Rosengård), there are pockets of low restaurant density away from the city center. The highest concentration of restaurants was detected in Centrum, so we focused our attention on the Centrum area, which offers a combination of popularity among tourists, closeness to the city center, strong socio-economic dynamics and a number of pockets of high restaurant density.

After directing our attention to this narrower area of interest we first created a dense grid of location candidates (spaced 100 m apart); those locations were then filtered so that those with more than two restaurants within a radius of 250 m and those with an Indian restaurant closer than 400 m were removed.

Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.

The result is a set of zones containing the largest number of potential restaurant-customer locations, based on the number of and distance to existing venues, both restaurants in general and Indian restaurants in particular. This, of course, does not imply that those zones are actually optimal locations for the new warehouse. The purpose of this analysis was only to provide information on areas close to the center that are crowded with existing restaurants (particularly Indian ones); it is entirely possible that there is a very good reason for the large number of restaurants in any of those areas, reasons which would make them suitable for a new business regardless of competition in the area. Recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also takes other factors into account and meets all other relevant conditions.

### Conclusion

The purpose of this project was to identify areas of Malmö close to the center with a high number of restaurants (particularly Indian restaurants) in order to aid stakeholders in narrowing down the search for an optimal location for a new warehouse. By calculating the restaurant density distribution from Foursquare data we first identified general boroughs that justify further analysis, and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing the greatest number of potential locations), and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

The final decision on the optimal warehouse location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.