**1. INTRODUCTION**

**In this project, we try to find an optimal location for a restaurant. The study is aimed at stakeholders interested in opening an Italian restaurant in Berlin, Germany.
Because Berlin already has many restaurants, we look for areas that are not yet crowded with them. We are particularly interested in areas with no Italian restaurants in the vicinity. Provided the first two conditions are met, we also prefer locations as close to the city centre as possible.
Based on these criteria, we use data science methods to identify a handful of the most promising neighbourhoods. The advantages of each area are then described clearly so that stakeholders can choose the best possible final location.**

**2. DATA**

**Based on the definition of our problem, the factors that will influence our decision are:**
* Number of existing restaurants in the neighborhood (any type of restaurant)
* Number of and distance to Italian restaurants in the neighborhood, if any
* Distance of the neighborhood from the city center

**We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.**

**The following data sources will be needed to extract or generate the required information:**
* Centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using Google Maps API reverse geocoding
* The number of restaurants in every neighborhood, along with their type and location, will be obtained using the Foursquare API
* The coordinates of the Berlin center will be obtained using Google Maps API geocoding of a well-known Berlin location (Alexanderplatz)

In [6]:
!pip install pandas
!pip install numpy
!pip install bs4
!pip install matplotlib
!pip install requests
!pip install folium
!pip install geopy
!pip install shapely
!pip install pyproj
!pip install seaborn
!pip install wikipedia
!pip install geocoder
!pip install lxml



In [11]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import wikipedia as wp
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import requests
import folium
import io
from pandas import json_normalize  # the pandas.io.json import path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import json
import seaborn as sns
import pyproj
import shapely.geometry
import math

In [12]:
google_api_key = 'AIzaSyBPWm8_OUjB8lmS0lofFZeRr6MAuz7LAtw'

In [13]:
def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Maps API; return [lat, lon], or [None, None] on failure."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        print(response)  # Raw response, printed unconditionally for inspection
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # Network error, malformed JSON, or empty result list
        return [None, None]
    
address = 'Alexanderplatz, Berlin, Germany'
berlin_center = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, berlin_center))

{'results': [{'address_components': [{'long_name': 'Berlin', 'short_name': 'Berlin', 'types': ['locality', 'political']}, {'long_name': 'Mitte', 'short_name': 'Mitte', 'types': ['political', 'sublocality', 'sublocality_level_1']}, {'long_name': 'Berlin', 'short_name': 'Berlin', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'Germany', 'short_name': 'DE', 'types': ['country', 'political']}, {'long_name': '10178', 'short_name': '10178', 'types': ['postal_code']}], 'formatted_address': '10178 Berlin, Germany', 'geometry': {'location': {'lat': 52.52198139999999, 'lng': 13.413306}, 'location_type': 'GEOMETRIC_CENTER', 'viewport': {'northeast': {'lat': 52.5233303802915, 'lng': 13.4146549802915}, 'southwest': {'lat': 52.5206324197085, 'lng': 13.4119570197085}}}, 'place_id': 'ChIJbygR2x5OqEcRbhbkZsMB_DA', 'plus_code': {'compound_code': 'GCC7+Q8 Berlin, Germany', 'global_code': '9F4MGCC7+Q8'}, 'types': ['establishment', 'point_of_interest', 'tourist_attraction']}], 'statu

In [14]:
# WGS84 longitude/latitude <-> UTM zone 33N (EPSG:32633), the zone that covers Berlin.
# pyproj.transform() is deprecated, so we use Transformer objects; always_xy=True keeps the (lon, lat) order.
to_utm = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32633', always_xy=True)
to_lonlat = pyproj.Transformer.from_crs('EPSG:32633', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = to_utm.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Berlin center longitude={}, latitude={}'.format(berlin_center[1], berlin_center[0]))
x, y = lonlat_to_xy(berlin_center[1], berlin_center[0])
print('Berlin center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Berlin center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Berlin center longitude=13.413306, latitude=52.52198139999999
Berlin center UTM X=392347.62823280576, Y=5820280.114077939
Berlin center longitude=13.413306, latitude=52.52198139999999


In [15]:
berlin_center_x, berlin_center_y = lonlat_to_xy(berlin_center[1], berlin_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = berlin_center_x - 6000
x_step = 600
y_min = berlin_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(berlin_center_x, berlin_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
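
The factor `k = sqrt(3) / 2` is what makes the grid hexagonal: rows are spaced `600 * k` metres apart and every other row is shifted by half a step, so each center should be equally far from its six nearest neighbours. The following is a minimal sketch that checks this on a small patch of grid, assuming the same step and offset rule as the cell above. The helper name `hex_grid` is hypothetical and not part of the notebook.

```python
import math

def hex_grid(rows, cols, step=600.0):
    """Generate (x, y) centers of a small hexagonal grid using the row/offset rule from the cell above."""
    k = math.sqrt(3) / 2  # ratio of row spacing to column spacing
    points = []
    for i in range(rows):
        y = i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0.0  # shift every other row by half a step
        for j in range(cols):
            points.append((j * step + x_offset, y))
    return points

points = hex_grid(5, 5)
center = points[2 * 5 + 2]  # a point in the interior of the patch

# Distances from the interior point to its six closest neighbours
neighbour_distances = sorted(
    math.hypot(x - center[0], y - center[1]) for (x, y) in points if (x, y) != center
)[:6]
print([round(d, 6) for d in neighbour_distances])
```

All six distances come out at the grid step (600 m), which is why circles of radius 300 m drawn around neighbouring centers touch without overlapping.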


In [16]:
map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.Marker(berlin_center, popup='Alexanderplatz').add_to(map_berlin)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_berlin)
    #folium.Marker([lat, lon]).add_to(map_berlin)
map_berlin

In [17]:
def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a coordinate with the Google Maps API; return the formatted address or None."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # Network error, malformed JSON, or empty result list
        return None

addr = get_address(google_api_key, berlin_center[0], berlin_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(berlin_center[0], berlin_center[1], addr))

Reverse geocoding check
-----------------------
Address of [52.52198139999999, 13.413306] is: Alexanderplatz, 10178 Berlin, Germany


In [18]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Germany', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [19]:
addresses[150:170]

['Frankfurter Allee 147, 10365 Berlin',
 'Magdalenenstraße 14, 10365 Berlin',
 'Fanningerstraße 46, 10365 Berlin',
 'Wilhelm-Caspar-Wegely-Platz 6, 10623 Berlin',
 'Händelallee 53, 10557 Berlin',
 'Spreeweg 1, 10557 Berlin',
 'Kastanienallee, 10557 Berlin',
 'Platz der Republik (Berlin), 10557 Berlin',
 'Pariser Platz 6A, 10117 Berlin',
 'B2 38, 10117 Berlin',
 'Unter den Linden 5, 10117 Berlin',
 'Rathausstraße 53a, 10178 Berlin',
 'Waisenstraße 16, 10179 Berlin',
 'Neue Blumenstraße 1, 10179 Berlin',
 'Blumenstraße 46, 10243 Berlin',
 'Karl-Marx-Allee 89, 10243 Berlin',
 'Weidenweg 27, 10249 Berlin',
 'Rigaer Str. 95, 10247 Berlin',
 'Bänschstraße 55, 10247 Berlin',
 'Parkaue 30, 10367 Berlin']

In [20]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Ringbahnstraße 72, 12099 Berlin | 52.470257 | 13.388666 | 390547.628233 | 5814564.0 | 5992.495307 |
| 1 | 09R/27L, 12101 Berlin | 52.470377 | 13.397496 | 391147.628233 | 5814564.0 | 5840.3767 |
| 2 | 09R/27L, 12049 Berlin | 52.470497 | 13.406325 | 391747.628233 | 5814564.0 | 5747.173218 |
| 3 | Oderstraße 174, 12049 Berlin | 52.470615 | 13.415154 | 392347.628233 | 5814564.0 | 5715.767665 |
| 4 | Warthestraße 22, 12051 Berlin | 52.470733 | 13.423984 | 392947.628233 | 5814564.0 | 5747.173218 |
| 5 | NO ADDRESS | 52.470851 | 13.432813 | 393547.628233 | 5814564.0 | 5840.3767 |
| 6 | Karl-Marx-Straße 211, 12055 Berlin | 52.470967 | 13.441643 | 394147.628233 | 5814564.0 | 5992.495307 |
| 7 | Hessenring 36, 12101 Berlin | 52.474746 | 13.37525 | 389647.628233 | 5815084.0 | 5855.766389 |
| 8 | Kleineweg 125, 12101 Berlin | 52.474867 | 13.384081 | 390247.628233 | 5815084.0 | 5604.462508 |
| 9 | 09L/27R, 12101 Berlin | 52.474987 | 13.392911 | 390847.628233 | 5815084.0 | 5408.326913 |
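
The 'Distance from center' column is computed in projected UTM coordinates. As an independent sanity check, a great-circle (haversine) distance computed directly from latitude and longitude should agree with it to well within one percent. The sketch below uses only the values printed above for Alexanderplatz and for row 0. The spherical Earth radius of 6,371 km is an assumption of the haversine approximation, and the function name `haversine_m` is hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres on a spherical Earth (an approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Values copied from the outputs above: Alexanderplatz and row 0 of df_locations
center_lat, center_lon = 52.52198139999999, 13.413306
row0_lat, row0_lon = 52.470257, 13.388666
row0_utm_distance = 5992.495307

d = haversine_m(center_lat, center_lon, row0_lat, row0_lon)
relative_difference = abs(d - row0_utm_distance) / row0_utm_distance
print('haversine: {:.1f} m, UTM: {:.1f} m, relative difference: {:.3%}'.format(
    d, row0_utm_distance, relative_difference))
```

A small discrepancy is expected, because UTM applies a scale factor and the haversine formula ignores the Earth's flattening. A difference of several percent, on the other hand, would point to swapped coordinates or the wrong UTM zone.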


In [21]:
df_locations.to_pickle('./locations.pkl')    

In [29]:
CLIENT_ID = 'BREP5Y2TPDKZTTDYEKL4VBIPUELQZZRBXFJTMJGLDTKPJVPZ' # my Foursquare ID
CLIENT_SECRET = 'F4OSHTF0FOM1ASZKWAQP2P0WVCA03AIVHEDE4BM4LBKTSIW0' # my Foursquare Secret
VERSION = '20180604'
LIMIT = 30
foursquare_client_id = CLIENT_ID
foursquare_client_secret = CLIENT_SECRET
print('Your credentials:')
print('CLIENT_ID: '+CLIENT_ID)
print('CLIENT_SECRET: '+CLIENT_SECRET)



Your credentials:
CLIENT_ID: BREP5Y2TPDKZTTDYEKL4VBIPUELQZZRBXFJTMJGLDTKPJVPZ
CLIENT_SECRET: F4OSHTF0FOM1ASZKWAQP2P0WVCA03AIVHEDE4BM4LBKTSIW0


In [30]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
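
To make the behaviour of `is_restaurant` concrete, the sketch below restates the function so that the block runs on its own and applies it to a few hand-written category lists. The category names and the short placeholder IDs (`'id-a'`, `'id-b'`, `'id-c'`) are hypothetical examples, not Foursquare data. Only the two IDs in `italian_ids` are taken from `italian_restaurant_categories` above.

```python
def is_restaurant(categories, specific_filter=None):
    """Return (is_restaurant, matches_specific_filter) for a list of (name, id) category tuples."""
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

# Two IDs copied from italian_restaurant_categories; everything else is illustrative
italian_ids = ['4bf58dd8d48988d110941735', '55a5a1ebe4b013909087cbb6']

plain = is_restaurant([('Steakhouse', 'id-a')], italian_ids)
fast_food = is_restaurant([('Fast Food Restaurant', 'id-b')], italian_ids)
italian_by_id = is_restaurant([('Some Italian sub-category', '55a5a1ebe4b013909087cbb6')], italian_ids)
mixed = is_restaurant([('Fast Food Restaurant', 'id-b'), ('Diner', 'id-c')], italian_ids)

print(plain, fast_food, italian_by_id, mixed)
```

Two details are worth noting. A venue whose category ID is in the filter counts as a restaurant even if its name contains none of the keywords. And because the flags are updated category by category, a later category can override an earlier 'fast food' exclusion, as the last example shows.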

In [31]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
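except (OSError, pickle.UnpicklingError, EOFError):
    pass  # No usable cache on disk; fall through and query the API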

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
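
The choice of `radius=350` in `get_restaurants` can be checked with a short calculation. In a hexagonal grid, the point farthest from every center is the centroid of three mutually adjacent centers, which lies at a distance of step / sqrt(3) from each of them. The sketch below works this out for the 600 m step used for the candidate areas. It only covers the geometry, and assumes that no single query is truncated by the `limit=100` cap.

```python
import math

grid_step = 600.0      # spacing between neighbouring area centers (x_step in the grid cell)
query_radius = 350.0   # radius passed to get_venues_near_location

# Worst case: the centroid of an equilateral triangle formed by three adjacent centers
worst_case_distance = grid_step / math.sqrt(3)
fully_covered = query_radius >= worst_case_distance

print('Farthest point from any center: {:.1f} m'.format(worst_case_distance))  # 346.4 m
print('Full coverage with radius {:.0f} m: {}'.format(query_radius, fully_covered))  # True
```

A radius of 350 m is therefore just enough to leave no gaps in the interior of the grid, while the `venue_distance<=300` filter keeps the per-area counts restricted to the non-overlapping 300 m circles.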


In [32]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 2073
Total number of Italian restaurants: 299
Percentage of Italian restaurants: 14.42%
Average number of restaurants in neighborhood: 4.8901098901098905


In [33]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4c49b2c520ab1b8de2062117', 'Trattoria Toscana', 52.4715056699932, 13.385329206997389, 'Tempelhofer Damm 104 (Hoeppnerstr.), 12099 Berlin', 265, True, 390324.0734948359, 5814708.250705032)
('529c75dc498e5f28c08db637', 'Orient Food', 52.468418, 13.385631, 'Tempelhofer damm 124, 12099 Berlin', 290, False, 390336.8958508584, 5814364.380615842)
('4fc6a39be4b0b925af772392', 'Lodos Bistro', 52.47099391193633, 13.385207714672129, 'Berlin', 248, False, 390314.54923433566, 5814651.517235505)
('52b5d5d0498edc50653df469', 'Yasaka-Sushi', 52.46877834428962, 13.385458971242336, 'Tempelhofer Damm 124, 12099 Berlin', 272, False, 390326.1066713873, 5814404.719417979)
('4e68e3beae60186280aeda86', 'Mahatma', 52.47215311646854, 13.38533023091156, 'Tempelhofer Damm 102 (Manfred-von-Richthofen-Str.), 12101 Berlin', 313, False, 390325.75282294105, 5814780.258426609)
('53bfbf97498e767d04eba262', 'My Asia', 52.46863425153149, 13.38513817439509, 'Tempelhofer Dam

In [34]:
print('List of Italian restaurants')
print('---------------------------')
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(italian_restaurants))

List of Italian restaurants
---------------------------
('4c49b2c520ab1b8de2062117', 'Trattoria Toscana', 52.4715056699932, 13.385329206997389, 'Tempelhofer Damm 104 (Hoeppnerstr.), 12099 Berlin', 265, True, 390324.0734948359, 5814708.250705032)
('4e7793c48130fa6bd3b581ff', 'Ninì e Pettirosso', 52.470317, 13.43698, 'Selkestr. 27, 12051 Berlin', 324, True, 393829.34645155125, 5814498.875631407)
('52c863ca498e73da17e59cb9', 'Caligari', 52.475843, 13.423658, 'Kienitzerstr. 110, 12049 Berlin', 313, True, 392937.9001831622, 5815133.146853393)
('514e2368e4b03aaaf1b5055c', 'La Pecora Nera', 52.477266, 13.421193, 'Herrfurthplatz 6, 12049 Berlin', 246, True, 392773.95274748007, 5815295.070954637)
('55a69d78498eeec7e1b94b3b', 'Bottega No. 6', 52.478164, 13.438749, 'Richardstr. 6, 12043 Berlin', 309, True, 393968.36637404084, 5815369.029899831)
('4e0777612271dfa46baf6c41', 'Pipaso', 52.47881647595757, 13.44599754315937, 'Sonnenallee, 12059 Berlin', 333, True, 394462.1781169361, 5815430.982869017)

In [35]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: Mabuhay, Scandic Restaurant, Tischlein deck dich
Restaurants around location 102: Solar, Mirami - Sushi & Asia fusion cuisin, Layla, Mexican, Cucina Italiana, Restaurant Hof zwei, Ristorante Marinelli, Gaststätte Italia 90
Restaurants around location 103: NaNum, Nobelhart & Schmutzig, Mama Cook, Charlotte 1, Paracas, Trattoria da Vinci, Café Nullpunkt, Orient Food
Restaurants around location 104: 
Restaurants around location 105: Shishi, Pacifico, TAT Imbiss
Restaurants around location 106: Die Henne, Zur kleinen Markthalle, Parantez, Habibi, Maroush, Cevichería, Santa Maria, Mezehaus
Restaurants around location 107: La Piadina, Der Goldene Hahn, 3 Schwestern, Trattoria Marechiaro, Weltrestaurant Markthalle, Long March Canteen, Olive
Restaurants around location 108: Salumeria Lamuri, Restaurant Richard
Restaurants around location 109: Vincent Vegan, Scheers Schnitzel, Michelberger Restaurant, Tony 

In [36]:
map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.Marker(berlin_center, popup='Alexanderplatz').add_to(map_berlin)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_berlin)
map_berlin

**METHODOLOGY**

In this project, we focus on identifying areas of Berlin with a low density of restaurants, especially those with a low number of Italian restaurants. We limit our analysis to the area within about 6 km of the city centre.

In the first step, we collected the required data: the location and type (category) of every restaurant within 6 km of the Berlin center (Alexanderplatz). We also identified Italian restaurants (according to Foursquare categorization).

The second step of our analysis is to calculate and explore 'restaurant density' across different areas of Berlin. We will use heatmaps to identify a few promising areas close to the center with a low number of restaurants in general (and no Italian restaurants in the vicinity), and focus our attention on those areas.

In the third and final step, we focus on the most promising areas and, within them, create clusters of locations that meet some basic requirements established in discussion with stakeholders: we consider locations with no more than two restaurants within a radius of 250 meters, and we want locations without Italian restaurants within a radius of 400 meters. We will present a map of all such locations and also cluster them (using k-means clustering) to identify general zones/neighborhoods/addresses that should serve as a starting point for the stakeholders' final 'street level' exploration and search for the optimal venue location.
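
The two requirements from the final step can be expressed as a simple predicate. This is a minimal sketch rather than the notebook's actual implementation: the function name `is_candidate`, its parameter names, and the example values are hypothetical. Treating a restaurant at exactly 400 m as acceptable is also an assumption.

```python
def is_candidate(restaurants_within_250m, nearest_italian_m,
                 max_restaurants=2, min_italian_distance=400):
    """Return True if a location meets both requirements described above."""
    few_restaurants = restaurants_within_250m <= max_restaurants
    no_italian_nearby = nearest_italian_m >= min_italian_distance
    return few_restaurants and no_italian_nearby

# Hypothetical locations: (restaurants within 250 m, distance to nearest Italian restaurant in m)
examples = [(0, 836.0), (2, 410.5), (3, 900.0), (1, 265.9)]
flags = [is_candidate(count, distance) for count, distance in examples]
for (count, distance), ok in zip(examples, flags):
    print(count, distance, ok)
```

Only locations that pass both tests would be kept for the map and for the subsequent clustering step.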

**ANALYSIS**

Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the number of restaurants in each candidate area:

In [38]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 4.8901098901098905


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | Ringbahnstraße 72, 12099 Berlin | 52.470257 | 13.388666 | 390547.628233 | 5814564.0 | 5992.495307 | 5 |
| 1 | 09R/27L, 12101 Berlin | 52.470377 | 13.397496 | 391147.628233 | 5814564.0 | 5840.3767 | 0 |
| 2 | 09R/27L, 12049 Berlin | 52.470497 | 13.406325 | 391747.628233 | 5814564.0 | 5747.173218 | 0 |
| 3 | Oderstraße 174, 12049 Berlin | 52.470615 | 13.415154 | 392347.628233 | 5814564.0 | 5715.767665 | 0 |
| 4 | Warthestraße 22, 12051 Berlin | 52.470733 | 13.423984 | 392947.628233 | 5814564.0 | 5747.173218 | 0 |
| 5 | NO ADDRESS | 52.470851 | 13.432813 | 393547.628233 | 5814564.0 | 5840.3767 | 7 |
| 6 | Karl-Marx-Straße 211, 12055 Berlin | 52.470967 | 13.441643 | 394147.628233 | 5814564.0 | 5992.495307 | 6 |
| 7 | Hessenring 36, 12101 Berlin | 52.474746 | 13.37525 | 389647.628233 | 5815084.0 | 5855.766389 | 0 |
| 8 | Kleineweg 125, 12101 Berlin | 52.474867 | 13.384081 | 390247.628233 | 5815084.0 | 5604.462508 | 0 |
| 9 | 09L/27R, 12101 Berlin | 52.474987 | 13.392911 | 390847.628233 | 5815084.0 | 5408.326913 | 0 |


In [40]:
distances_to_italian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

df_locations['Distance to Italian restaurant'] = distances_to_italian_restaurant
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Italian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | Ringbahnstraße 72, 12099 Berlin | 52.470257 | 13.388666 | 390547.628233 | 5814564.0 | 5992.495307 | 5 | 265.86682 |
| 1 | 09R/27L, 12101 Berlin | 52.470377 | 13.397496 | 391147.628233 | 5814564.0 | 5840.3767 | 0 | 836.032805 |
| 2 | 09R/27L, 12049 Berlin | 52.470497 | 13.406325 | 391747.628233 | 5814564.0 | 5747.173218 | 0 | 1259.881092 |
| 3 | Oderstraße 174, 12049 Berlin | 52.470615 | 13.415154 | 392347.628233 | 5814564.0 | 5715.767665 | 0 | 819.728563 |
| 4 | Warthestraße 22, 12051 Berlin | 52.470733 | 13.423984 | 392947.628233 | 5814564.0 | 5747.173218 | 0 | 568.883623 |
| 5 | NO ADDRESS | 52.470851 | 13.432813 | 393547.628233 | 5814564.0 | 5840.3767 | 7 | 289.225825 |
| 6 | Karl-Marx-Straße 211, 12055 Berlin | 52.470967 | 13.441643 | 394147.628233 | 5814564.0 | 5992.495307 | 6 | 324.945712 |
| 7 | Hessenring 36, 12101 Berlin | 52.474746 | 13.37525 | 389647.628233 | 5815084.0 | 5855.766389 | 0 | 773.780919 |
| 8 | Kleineweg 125, 12101 Berlin | 52.474867 | 13.384081 | 390247.628233 | 5815084.0 | 5604.462508 | 0 | 383.409176 |
| 9 | 09L/27R, 12101 Berlin | 52.474987 | 13.392911 | 390847.628233 | 5815084.0 | 5408.326913 | 0 | 644.41313 |


In [41]:
print('Average distance to closest Italian restaurant from each area center:', df_locations['Distance to Italian restaurant'].mean())

Average distance to closest Italian restaurant from each area center: 479.95671085814854


In [42]:
berlin_boroughs_url = 'https://raw.githubusercontent.com/m-hoerz/berlin-shapes/master/berliner-bezirke.geojson'
berlin_boroughs = requests.get(berlin_boroughs_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

In [43]:
from folium import plugins
from folium.plugins import HeatMap

map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_berlin) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
folium.Circle(berlin_center, radius=1000, fill=False, color='white').add_to(map_berlin)
folium.Circle(berlin_center, radius=2000, fill=False, color='white').add_to(map_berlin)
folium.Circle(berlin_center, radius=3000, fill=False, color='white').add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [44]:
map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_berlin) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
folium.Circle(berlin_center, radius=1000, fill=False, color='white').add_to(map_berlin)
folium.Circle(berlin_center, radius=2000, fill=False, color='white').add_to(map_berlin)
folium.Circle(berlin_center, radius=3000, fill=False, color='white').add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

**KREUZBERG AND FRIEDRICHSHAIN**

Popular travel guides and websites often describe Kreuzberg and Friedrichshain as beautiful, interesting, culturally rich, 'hip' and 'cool' Berlin neighborhoods that are popular with tourists and loved by Berliners.

*"Bold and brazen, Kreuzberg's creative people, places, and spaces might challenge your paradigm."* Tags: Nightlife, Artsy, Dining, Trendy, Loved by Berliners, Great Transit (airbnb.com)

*"Kreuzberg has long been revered for its diverse cultural life and as a part of Berlin where alternative lifestyles have flourished. Envisioning the glamorous yet gritty nature of Berlin often conjures up scenes from this neighbourhood, where cultures, movements and artistic flare adorn the walls of building and fills the air. Brimming with nightclubs, street food, and art galleries, Kreuzberg is the place to be for Berlin’s young and trendy."* (theculturetrip.com)

*"Imagine an art gallery turned inside out and you’ll begin to envision Friedrichshain. Single walls aren’t canvases for creative works, entire buildings are canvases. This zealously expressive east Berlin neighborhood forgoes social norms"* Tags: Artsy, Nightlife, Trendy, Dining, Touristy, Shopping, Great Transit, Loved by Berliners (airbnb.com)

*"As anyone from Kreuzberg will tell you, this district is not just the coolest in Berlin, but the hippest location in the entire universe. Kreuzberg has long been famed for its diverse cultural life, its experimental alternative lifestyles and the powerful spell it exercises on young people from across Germany. In 2001, Kreuzberg and Friedrichshain were merged to form one administrative borough. When it comes to club culture, Friedrichshain is now out in front – with southern Friedrichshain particularly ranked as home to the highest density of clubs in the city."* (visitberlin.de)

Popular with tourists, alternative and bohemian yet booming and trendy, relatively close to the city center and well connected, these boroughs appear to justify further analysis.

Let's define a new, narrower region of interest that includes the low-restaurant-count parts of Kreuzberg and Friedrichshain closest to Alexanderplatz.


In [45]:
roi_x_min = berlin_center_x - 2000
roi_y_max = berlin_center_y + 1000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_berlin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin
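
The offsets in the cell above shift the region of interest away from Alexanderplatz rather than centering it there. As a quick check of that arithmetic, the sketch below repeats the same offsets with a hypothetical origin `(0, 0)` standing in for `berlin_center_x`, `berlin_center_y`. It assumes the projected coordinates are in meters, with x growing eastward and y northward.

```python
import math

# Hypothetical projected coordinates of the city center, in meters (not the real values).
center_x, center_y = 0.0, 0.0

# Same offsets as in the notebook cell.
roi_x_min = center_x - 2000
roi_y_max = center_y + 1000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500

# Displacement of the ROI center relative to the city center.
dx = roi_center_x - center_x
dy = roi_center_y - center_y
offset = math.hypot(dx, dy)
print(dx, dy, round(offset))
```

Under these assumptions the circle's center sits 500 m east and 1500 m south of the city center, roughly 1.6 km away, so Alexanderplatz itself still falls inside the 2500 m radius.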

In [46]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2261 candidate neighborhood centers generated.
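
The factor `k = sqrt(3) / 2` and the 50 m shift on alternating rows are what make the grid hexagonal: every interior candidate ends up with six neighbors at exactly the same distance. The following is a minimal, self-contained sketch of that construction on a small toy grid. The function name `hex_grid` and its parameters are illustrative and not part of the notebook.

```python
import math

def hex_grid(x_min, y_min, cols, rows, spacing=100.0):
    """Return (x, y) centers of a hexagonal lattice with the given spacing."""
    k = math.sqrt(3) / 2  # row height as a fraction of the horizontal spacing
    points = []
    for i in range(rows):
        y = y_min + i * spacing * k
        # Shift every other row by half a step, as the notebook does for even rows.
        x_offset = spacing / 2 if i % 2 == 0 else 0.0
        for j in range(cols):
            points.append((x_min + j * spacing + x_offset, y))
    return points

pts = hex_grid(0.0, 0.0, cols=5, rows=5)

# Pick an interior point (row 2, column 2) and measure distances to all others.
cx, cy = pts[2 * 5 + 2]
dists = sorted(math.hypot(x - cx, y - cy) for x, y in pts if (x, y) != (cx, cy))
nearest_six = dists[:6]
print([round(d, 6) for d in nearest_six])
```

The six nearest neighbors all lie one `spacing` away, and the next ring is noticeably farther, which is the property that gives each candidate location a uniform neighborhood.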


In [47]:
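def count_restaurants_nearby(x, y, restaurants, radius=250):    
    """Count restaurants within `radius` meters of the point (x, y)."""
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]  # projected X/Y of the venue
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    """Return the distance from (x, y) to the closest restaurant."""
    d_min = 100000  # sentinel: returned unchanged if `restaurants` is empty
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]  # projected X/Y of the venue
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min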

roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')

Generating data on location candidates... done.
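
To make the two features concrete, here is a small standalone sketch that computes them for one point using made-up coordinates. The helper names `count_within` and `nearest_distance` are hypothetical. Unlike the notebook's helper, the nearest-distance variant returns `math.inf` for an empty venue list instead of a fixed sentinel, which is one possible design choice rather than the notebook's behavior.

```python
import math

def count_within(x, y, points, radius=250.0):
    """Number of (px, py) points no farther than `radius` from (x, y)."""
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

def nearest_distance(x, y, points):
    """Distance to the closest point, or math.inf when `points` is empty."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

# Hypothetical venue coordinates in meters (not real data).
all_restaurants = [(100.0, 0.0), (0.0, 200.0), (300.0, 400.0)]
italian_only = [(300.0, 400.0)]

nearby = count_within(0.0, 0.0, all_restaurants)
closest_italian = nearest_distance(0.0, 0.0, italian_only)
no_venues = nearest_distance(0.0, 0.0, [])
print(nearby, closest_italian, no_venues)
```

For the origin, two of the three toy venues fall inside the 250 m radius, and the only Italian venue is 500 m away, so this point would pass both thresholds used in the filtering step.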


In [48]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Italian restaurant |
|---|---|---|---|---|---|---|
| 0 | 52.486123 | 13.421225 | 392797.628233 | 5816280.0 | 7 | 151.316228 |
| 1 | 52.486143 | 13.422697 | 392897.628233 | 5816280.0 | 8 | 177.795718 |
| 2 | 52.486793 | 13.4131 | 392247.628233 | 5816367.0 | 0 | 477.672615 |
| 3 | 52.486813 | 13.414572 | 392347.628233 | 5816367.0 | 0 | 460.972108 |
| 4 | 52.486832 | 13.416044 | 392447.628233 | 5816367.0 | 0 | 362.229211 |
| 5 | 52.486852 | 13.417516 | 392547.628233 | 5816367.0 | 2 | 264.432822 |
| 6 | 52.486872 | 13.418988 | 392647.628233 | 5816367.0 | 5 | 169.231894 |
| 7 | 52.486891 | 13.420461 | 392747.628233 | 5816367.0 | 5 | 85.756343 |
| 8 | 52.486911 | 13.421933 | 392847.628233 | 5816367.0 | 8 | 77.902932 |
| 9 | 52.486931 | 13.423405 | 392947.628233 | 5816367.0 | 12 | 157.428026 |


In [49]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=400)
print('Locations with no Italian restaurants within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 789
Locations with no Italian restaurants within 400m: 373
Locations with both conditions met: 287


In [50]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin) 
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [51]:
map_berlin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [52]:
number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_berlin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin
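
The clustering cell relies on scikit-learn's `KMeans` to group the good locations into 15 zones. As a rough illustration of what that step does, here is a bare-bones Lloyd iteration in plain Python on toy points. It is a sketch of the general idea only, not scikit-learn's implementation: the function name `lloyd_kmeans`, the fixed starting centers, and the toy data are all assumptions made for the example.

```python
import math

def lloyd_kmeans(points, centers, iterations=20):
    """Assign each point to its nearest center, then move centers to the group mean."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        new_centers = []
        for c, members in enumerate(groups):
            if members:
                # Mean of each coordinate over the cluster members.
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # assignments are stable, so further iterations change nothing
        centers = new_centers
    return centers

# Two well-separated toy groups of X/Y coordinates in meters (not real data).
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0),
          (1000.0, 1000.0), (1100.0, 1000.0), (1000.0, 1100.0), (1100.0, 1100.0)]
centers = lloyd_kmeans(points, centers=[points[0], points[-1]])
print(centers)
```

Because a center is the mean of its members, it is not guaranteed to coincide with one of the good locations. That is consistent with treating the centers only as starting points for further street-level exploration.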

In [53]:
map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin) 
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [54]:
map_berlin = folium.Map(location=[52.498972, 13.409591], zoom_start=15)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [55]:
map_berlin = folium.Map(location=[52.516347, 13.428403], zoom_start=15)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_berlin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

In [56]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', Germany', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, berlin_center_x, berlin_center_y)
    print('{}{} => {:.1f}km from Alexanderplatz'.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

Ritterstraße 90, 10969 Berlin                      => 2.3km from Alexanderplatz
Platz der Vereinten Nationen 30, 10249 Berlin      => 1.1km from Alexanderplatz
Köpenicker Str. 40, 10179 Berlin                   => 1.7km from Alexanderplatz
Vor dem Schlesischen Tor 2, 10997 Berlin           => 3.7km from Alexanderplatz
Gitschiner Str. 34, 10969 Berlin                   => 2.7km from Alexanderplatz
Stallschreiberstraße 46, 10969 Berlin              => 1.7km from Alexanderplatz
Ifflandstraße 12, 10179 Berlin                     => 0.9km from Alexanderplatz
Landsberger Allee 37, 10249 Berlin                 => 1.9km from Alexanderplatz
Schloßpl. 1, 10178 Berlin                          => 1.1km from Alexanderplatz
Alt, 10999 Berlin                                  => 3.9km from Alexanderplatz
Michaelkirchpl. 15, 10179 Berlin                   => 1.7km from Alexanderplatz
Helen-Ernst-Straße 28, 10243 Berlin                => 2.

In [57]:
map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(berlin_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_berlin)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_berlin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_berlin)
map_berlin