# The Battle of the Neighborhoods
## Applied Data Science Capstone Project
### Table of contents
#### Week 1
* [Introduction: Business Opportunity](#introduction)
* [Data](#data)

#### Week 2
* [Methodology](#methodology)
* Results
* Discussion
* Conclusion
* [Final deliverables](#final)

## Introduction: Business Opportunity <a name="introduction"></a>

The idea is to bring a foreign culture to a neighborhood, making its residents happier while generating earnings.

The target audience is decision makers at a bank, since the project needs the bank's financial support.

The plan is to use Foursquare data to explore the venues around candidate locations and find an optimal location for an Italian restaurant in Berlin, Germany.

## Data <a name="data"></a>

Based on the introduction, the following factors influence the choice of a candidate location:
* the number of existing restaurants in the neighborhood (of any type).
* the distance to the nearest similar business (an Italian restaurant).
* the distance of the neighborhood from the city center.

The data sources are:
* [Address data from Bing Maps](#address)
* [Coordinate data from Bing Maps](#coordinate)
* [Venue data from Foursquare](#venue)

In [1]:
# The code was removed by Watson Studio for sharing.

Tokens and Keys are imported


In [2]:
'''
If you want to run the notebook yourself, set the following variables:

# IBM Cloud
project_id = 'xxxxxx'
project_access_token = 'xxxxxx'

# Bing Map
BING_API_KEY = 'xxxxxx'

# Foursquare
client_id = 'xxxxxx'
client_secret = 'xxxxxx'

print('Tokens and Keys are imported')
'''
!pip install geocoder
!pip install shapely
!pip install pyproj
!pip install folium

print('-'*10,'Python packages are installed','-'*10)

---------- Python packages are installed ----------


#### Coordinates data<a name="coordinate"></a>

In [3]:
import geocoder

def get_coordinates(address):
    g = geocoder.bing(address, key=BING_API_KEY)
    lat = g.latlng[0]
    lon = g.latlng[1]
    return lat, lon

address = 'Alexanderplatz, Berlin, Germany'

berlin_center = get_coordinates(address)
print("Coordinate of {}: {}".format(address, berlin_center))

Coordinate of Alexanderplatz, Berlin, Germany: (52.521671295166016, 13.413330078125)


In [4]:
import shapely.geometry

import pyproj

import math

# WGS84 geographic coordinates <-> UTM zone 33N (metres), which covers Berlin.
# The transformers are built once and reused; always_xy=True fixes the axis
# order to (longitude, latitude) and (x, y).
proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
proj_xy = pyproj.Proj(proj='utm', zone=33, datum='WGS84')
to_xy = pyproj.Transformer.from_proj(proj_latlon, proj_xy, always_xy=True)
to_lonlat = pyproj.Transformer.from_proj(proj_xy, proj_latlon, always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in the UTM plane, in metres
    return math.hypot(x2 - x1, y2 - y1)

print('Coordinate transformation check')
print('-'*31)
print('Berlin center longitude = {}, latitude = {}'.format(berlin_center[1], berlin_center[0]))
x, y = lonlat_to_xy(berlin_center[1], berlin_center[0])
print('Berlin center UTM X = {}, Y = {}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Berlin center longitude = {}, latitude = {}'.format(lo, la))

Coordinate transformation check
-------------------------------
Berlin center longitude = 13.413330078125, latitude = 52.521671295166016
Berlin center UTM X = 392348.5035818579, Y = 5820245.587784032
Berlin center longitude = 13.413330078125, latitude = 52.521671295166016
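
The notebook hard-codes `zone = 33`. Standard UTM zones are 6° of longitude wide and numbered eastwards from 180° W, so the zone can be derived from the longitude. The helper `utm_zone` below is hypothetical (not part of the original notebook) and ignores the special zones around Norway and Svalbard; it confirms that Berlin, at about 13.4° E, falls in zone 33 (12° E to 18° E).

```python
import math

def utm_zone(lon):
    """Standard UTM zone number (1-60) for a longitude in degrees.

    Zones are 6 degrees wide, numbered eastwards from 180 W.
    The irregular zones around Norway and Svalbard are not handled.
    """
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

print(utm_zone(13.413330078125))  # Alexanderplatz -> 33
```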


In [5]:
berlin_center_x, berlin_center_y = lonlat_to_xy(berlin_center[1], berlin_center[0])

k = math.sqrt(3) / 2
x_min = berlin_center_x - 6000
x_step = 600
y_min = berlin_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0,int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2 == 0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(berlin_center_x, berlin_center_y, x, y)
        if distance_from_center <= 6001:
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
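
Cell In [5] lays the candidate centers on a hexagonal grid: rows are `600 * k` metres apart with `k = sqrt(3) / 2`, and every other row is shifted by 300 m, so each center is exactly 600 m from its six nearest neighbours. The sketch below generalizes that layout as a hypothetical helper `hex_grid` (not in the original). Unlike the notebook it places a grid point exactly on the center, so its point count can differ from the 364 above.

```python
import math

def hex_grid(center_x, center_y, radius, step):
    """Points of an offset-row hexagonal grid within `radius` of the center.

    Rows are step * sqrt(3) / 2 apart and odd rows are shifted by step / 2,
    so every interior point is exactly `step` from its six nearest neighbours.
    """
    row_step = step * math.sqrt(3) / 2
    n_rows = int(radius / row_step)
    n_cols = int(radius / step) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_step
        x_offset = step / 2 if i % 2 else 0.0   # shift every other row
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * step + x_offset
            # Keep only points inside the circular search area
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

print(len(hex_grid(0.0, 0.0, 6000.0, 600.0)), 'grid points')
```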


In [6]:
import folium

map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.Marker(berlin_center, popup='Alexanderplatz').add_to(map_berlin)
for lat, lon in zip(latitudes, longitudes):
    #folium.Circle([lat, lon], radius = 300, color = 'blue', fill = False).add_to(map_berlin)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
    #folium.Marker([lat, lon]).add_to(map_berlin)
map_berlin

#### Address data<a name='address'></a>

In [7]:
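def get_address(lat, lon):
    g = geocoder.bing([lat, lon], method='reverse', key=BING_API_KEY)
    try:
        return g.json['raw']['address']['formattedAddress']
    except (TypeError, KeyError):
        # No result (g.json is None) or no formatted address: return None so the
        # caller can substitute a placeholder
        return None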

addr = get_address(berlin_center[0], berlin_center[1])
print('Reverse geocoding check')
print('-'*23)
print('Address of [{}, {}] is: {}'.format(berlin_center[0], berlin_center[1], addr))


Reverse geocoding check
-----------------------
Address of [52.521671295166016, 13.413330078125] is: Alexanderplatz 1-5, 10178 Berlin


In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    addresses.append(address)
    print('.', end='')
print('done.')
addresses[:10]

Obtaining location addresses: ............................................................................................................................................................................................................................................................................................................................................................................done.


['A100, 12101 Berlin',
 '12101, Berlin, Germany',
 '12101, Berlin, Germany',
 'Oderstraße 174, 12101 Berlin',
 'Warthestraße 23, 12051 Berlin',
 'Schierker Straße 19, 12053 Berlin',
 'Karl-Marx Straße 213, 12055 Berlin',
 'Hessenring 36, 12101 Berlin',
 'Kleineweg 133, 12101 Berlin',
 '12101, Berlin, Germany']

In [9]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                            'Latitude': latitudes,
                            'Longitude': longitudes,
                            'X': xs,
                            'Y': ys,
                            'Distance from center': distances_from_center})
df_locations.head()

|   | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---------|----------|-----------|---|---|----------------------|
| 0 | A100, 12101 Berlin | 52.469947 | 13.38869 | 390548.503582 | 5814530.0 | 5992.495307 |
| 1 | 12101, Berlin, Germany | 52.470067 | 13.39752 | 391148.503582 | 5814530.0 | 5840.3767 |
| 2 | 12101, Berlin, Germany | 52.470186 | 13.406349 | 391748.503582 | 5814530.0 | 5747.173218 |
| 3 | Oderstraße 174, 12101 Berlin | 52.470305 | 13.415178 | 392348.503582 | 5814530.0 | 5715.767665 |
| 4 | Warthestraße 23, 12051 Berlin | 52.470423 | 13.424008 | 392948.503582 | 5814530.0 | 5747.173218 |


#### Venues Data<a name='venue'></a>

In [10]:
import requests

# https://developer.foursquare.com/docs/build-with-foursquare/categories/
food_category = '4d4b7105d754a06374d81259'

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner','taverna','steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, radius=500, limit=100):
    version = '20210101'
    url = 'https://api.foursquare.com/v2/venues/explore'
    # Let requests build and encode the query string
    params = {'ll': '{},{}'.format(lat, lon),
              'categoryId': category,
              'radius': radius,
              'limit': limit,
              'v': version,
              'client_id': client_id,
              'client_secret': client_secret}
    try:
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network error, non-JSON reply, or a response without venue items
        venues = []
    return venues

print('Venue coding check')
print('-'*18)
print(get_venues_near_location(berlin_center[0], berlin_center[1], food_category)[:1])


Venue coding check
------------------
[('5ab2728716fa0464f2ec0a42', 'Mama Van - Sai Gon Deli', [('Vietnamese Restaurant', '4bf58dd8d48988d14a941735')], (52.52277618060749, 13.41046135741149), 'Karl-Liebknecht-Str. 15, 10178 Berlin', 229)]
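
The list comprehension in `get_venues_near_location` walks the nested `response -> groups[0] -> items -> venue` structure of the explore response. The sketch below spells that walk out as a hypothetical helper `parse_explore_items` and runs it on an invented payload trimmed to the fields the notebook reads. Unlike the original it uses `.get()` for the address and distance, so a venue missing those fields yields an empty address or `None` instead of dropping the whole batch.

```python
def parse_explore_items(payload):
    """Convert an explore JSON payload into the notebook's venue tuples:
    (id, name, [(category name, category id)], (lat, lng), address, distance)."""
    venues = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        location = venue['location']
        address = ', '.join(location.get('formattedAddress', []))
        # Drop the country suffix, as format_address does
        for suffix in (', Deutschland', ', Germany'):
            address = address.replace(suffix, '')
        venues.append((venue['id'],
                       venue['name'],
                       [(c['name'], c['id']) for c in venue['categories']],
                       (location['lat'], location['lng']),
                       address,
                       location.get('distance')))
    return venues


# Hypothetical payload, trimmed to the fields the notebook reads
sample_payload = {'response': {'groups': [{'items': [{'venue': {
    'id': 'venue-1',
    'name': 'Example Trattoria',
    'categories': [{'name': 'Italian Restaurant', 'id': '4bf58dd8d48988d110941735'}],
    'location': {'lat': 52.52, 'lng': 13.41, 'distance': 120,
                 'formattedAddress': ['Beispielstr. 1', '10178 Berlin', 'Deutschland']}}}]}]}}

print(parse_explore_items(sample_payload))
```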


In [11]:

def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []
    
    print('Obtaining venues around candidates locations: ', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, radius=350)
        #print(venues, lat, lon, food_category)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance <= 300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print('.', end='')
    print('done.')
    return restaurants, italian_restaurants, location_restaurants

print('get restaurants coding check')
print('-'*28)
print(get_restaurants([berlin_center[0]], [berlin_center[1]])[:1])


get restaurants coding check
----------------------------
Obtaining venues around candidates locations: .done.
({'5ab2728716fa0464f2ec0a42': ('5ab2728716fa0464f2ec0a42', 'Mama Van - Sai Gon Deli', 52.52277618060749, 13.41046135741149, 'Karl-Liebknecht-Str. 15, 10178 Berlin', 229, False, 392156.5873505683, 5820372.757047131), '5a6f557389e4906a1487e680': ('5a6f557389e4906a1487e680', 'THE REED', 52.522172, 13.4084076, 'Karl-Liebknecht-Str. 13, 10178 Berlin', 338, False, 392015.7762002655, 5820308.629403383), '569791a2498eecf0031f2026': ('569791a2498eecf0031f2026', 'Döner Curry', 52.52076904649568, 13.412803377932226, 'Berlin', 106, False, 392310.56407849886, 5820146.0237856805), '4f2ebefbe4b083c4288c3b15': ('4f2ebefbe4b083c4288c3b15', 'Restaurant Sphere', 52.52091214530329, 13.409809865557389, 'Panoramastr. 1A, 10178 Berlin', 252, False, 392107.82188021275, 5820166.409504815), '5370ac46498e5c1ba757eb85': ('5370ac46498e5c1ba757eb85', 'Esra', 52.52360868074495, 13.40931253282819, 'Rosa-Luxe

In [12]:
'''
save and load data from IBM Cloud Object Storage
https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html#save-data
'''
from project_lib import Project
project = Project(project_id=project_id, project_access_token=project_access_token)

import numpy as np

import pickle

restaurants = {}
italian_restaurants = {}
location_restaurants = []

try:
    restaurants = pickle.load(project.get_file('restaurants.pkl'))
    italian_restaurants = pickle.load(project.get_file('italian_restaurants.pkl'))
    location_restaurants = pickle.load(project.get_file('location_restaurants.pkl'))

except Exception:
    # Cache missing or unreadable: query Foursquare, then save the results for next time
    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    project.save_data(data=pickle.dumps(restaurants),file_name='restaurants.pkl',overwrite=True)
    project.save_data(data=pickle.dumps(italian_restaurants),file_name='italian_restaurants.pkl',overwrite=True)
    project.save_data(data=pickle.dumps(location_restaurants),file_name='location_restaurants.pkl',overwrite=True)

#restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)

print('Total number of restaurants: ', len(restaurants))
print('Total number of Italian restaurants: ', len(italian_restaurants))
print('Percent of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood: ', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants:  2049
Total number of Italian restaurants:  298
Percent of Italian restaurants: 14.54%
Average number of restaurants in neighborhood:  4.876373626373627
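
Outside Watson Studio, `project_lib` is not available. A minimal local stand-in for the load-or-query pattern above, assuming a writable local directory, could cache to a pickle file on disk. The helper `load_or_compute`, the demo function `fake_query` and its data are hypothetical.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the pickled object at `path`, or call `compute()` and cache its result there."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

# Demo with a stand-in for the Foursquare query (hypothetical data)
calls = []

def fake_query():
    calls.append(1)
    return {'venue-1': ('venue-1', 'Example Trattoria')}

cache_path = os.path.join(tempfile.mkdtemp(), 'restaurants.pkl')
first = load_or_compute(cache_path, fake_query)   # computes and writes the cache
second = load_or_compute(cache_path, fake_query)  # reads the cache; fake_query is not called again
print(len(calls))  # -> 1
```

In the notebook this would wrap the expensive call, e.g. `restaurants, italian_restaurants, location_restaurants = load_or_compute('venues.pkl', lambda: get_restaurants(latitudes, longitudes))`, storing all three results as one tuple (the file name is illustrative).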


In [13]:
print('List of all restaurants')
print('-'*23)
for r in list(restaurants.values())[:10]:
    print(r)
print('.'*3)
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4c49b2c520ab1b8de2062117', 'Trattoria Toscana', 52.4715056699932, 13.385329206997389, 'Tempelhofer Damm 104 (Hoeppnerstr.), 12099 Berlin', 349, True, 390324.0734948358, 5814708.250705032)
('529c75dc498e5f28c08db637', 'Orient Food', 52.468418, 13.385631, 'Tempelhofer damm 124, 12099 Berlin', 268, False, 390336.8958508584, 5814364.380615842)
('52b5d5d0498edc50653df469', 'Yasaka-Sushi', 52.46877834428962, 13.385458971242336, 'Tempelhofer Damm 124, 12099 Berlin', 254, False, 390326.1066713873, 5814404.719417979)
('4fc6a39be4b0b925af772392', 'Lodos Bistro', 52.47099391193633, 13.385207714672129, 'Berlin', 263, False, 390314.54923433566, 5814651.517235505)
('53bfbf97498e767d04eba262', 'My Asia', 52.46863425153149, 13.38513817439509, 'Tempelhofer Damm 122 (Ringahnstraße), 12103 Berlin', 281, False, 390303.9585045325, 5814389.180474727)
('4f53b76de4b0754d59050702', 'Yoko Sushi', 52.46798, 13.38516391, 'Tempelhofer Damm 128, 12099 Berlin', 324, 

In [14]:
print('List of Italian restaurants')
print('-'*23)
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('.'*3)
print('Total:', len(italian_restaurants))

List of Italian restaurants
-----------------------
('4c49b2c520ab1b8de2062117', 'Trattoria Toscana', 52.4715056699932, 13.385329206997389, 'Tempelhofer Damm 104 (Hoeppnerstr.), 12099 Berlin', 349, True, 390324.0734948358, 5814708.250705032)
('4e7793c48130fa6bd3b581ff', 'Ninì e Pettirosso', 52.470317, 13.43698, 'Selkestr. 27, 12051 Berlin', 320, True, 393829.34645155125, 5814498.875631407)
('52c863ca498e73da17e59cb9', 'Caligari', 52.475843, 13.423658, 'Kienitzerstr. 110, 12049 Berlin', 321, True, 392937.9001831622, 5815133.146853393)
('514e2368e4b03aaaf1b5055c', 'La Pecora Nera', 52.477266, 13.421193, 'Herrfurthplatz 6, 12049 Berlin', 324, True, 392773.95274748007, 5815295.070954637)
('55a69d78498eeec7e1b94b3b', 'Bottega No. 6', 52.478164, 13.438749, 'Richardstr. 6, 12043 Berlin', 341, True, 393968.3663740409, 5815369.029899831)
('51dc5936498ed7f9f36a0f04', 'Su Nuraghe', 52.47474284089369, 13.446922302246094, 'Richardplatz 1, Berlin', 98, True, 394515.235132202, 5814976.556749946)
('4d

In [15]:
print('Restaurants around location')
print('-'*23)
for i in range(10):
    rs = location_restaurants[i][:2]
    names = [r[1] for r in rs]
    print('Restaurants around location {}: {}'.format(i, names))

Restaurants around location
-----------------------
Restaurants around location 0: ['Trattoria Toscana', 'Orient Food']
Restaurants around location 1: ['Korean BBQ']
Restaurants around location 2: []
Restaurants around location 3: []
Restaurants around location 4: ['Warthe-Mal']
Restaurants around location 5: ['Merakli Köfteci', 'STAR Gemüse-Schawarma & Falafel']
Restaurants around location 6: ['Café Vux', 'Hakiki']
Restaurants around location 7: []
Restaurants around location 8: ['Mauwal', 'Mahatma']
Restaurants around location 9: []


In [16]:
map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.Marker(berlin_center, popup='Alexanderplatz').add_to(map_berlin)
for res in restaurants.values():
    lat = res[2]
    lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat,lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_berlin)
print('Generating map...',end='')
map_berlin

Generating map...

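## Methodology <a name="methodology"></a>
The goal is to detect areas with a low density of **restaurants**, particularly those with few **Italian restaurants**. The analysis first maps restaurant density across central Berlin, then narrows the search to a region of interest around Kreuzberg and Friedrichshain. There, a denser grid of candidate locations is scored by the number of restaurants within 250 m and the distance to the nearest Italian restaurant; candidates that pass both thresholds are clustered into zones with k-means.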
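
Written out, the per-candidate quantities used in the analysis are as follows, with $p$ a candidate's UTM position in metres, $R$ the set of restaurant positions, $R_{it} \subseteq R$ the Italian ones and $c$ Alexanderplatz. The thresholds are the ones the notebook applies later.

$$
n(p) = \left|\{\, r \in R : \lVert p - r \rVert < 250 \,\}\right|, \qquad
d_{it}(p) = \min_{r \in R_{it}} \lVert p - r \rVert, \qquad
d_c(p) = \lVert p - c \rVert
$$

$$
\text{good}(p) \iff n(p) \le 2 \;\wedge\; d_{it}(p) \ge 400
$$

Here $\lVert p - r \rVert = \sqrt{(x_p - x_r)^2 + (y_p - y_r)^2}$ is the planar distance computed by `calc_xy_distance`.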


## Analysis
Basic exploratory analysis.

In [None]:
location_restaurants_count = [len(res) for res in location_restaurants]
df_locations['Restaurants in area'] = location_restaurants_count
print('Average number of restaurants in every area with radius = 300m:', np.array(location_restaurants_count).mean())
df_locations.head()

In [None]:
distances_to_italian_restaurant = []
for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d < min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

df_locations['Distance to italian restaurant'] = distances_to_italian_restaurant
print('Average distance to the closest Italian restaurant from each area center:', df_locations['Distance to italian restaurant'].mean())
df_locations.sample(10)

Create a heatmap showing the density of restaurants, with the borders of the Berlin boroughs.

In [None]:
berlin_boroughs_url = 'https://raw.githubusercontent.com/m-hoerz/berlin-shapes/master/berliner-bezirke.geojson'
berlin_boroughs = requests.get(berlin_boroughs_url).json()
def boroughs_style(feature):
    return {'color': 'blue', 'fill': False}


In [None]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

In [None]:
from folium import plugins
from folium.plugins import HeatMap

map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for r in [1000, 2000, 3000]:
    folium.Circle(berlin_center, radius=r, fill=False, color='white').add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
print('Generating map...',end='')
map_berlin

Create another heatmap showing the density of Italian restaurants only.

In [None]:
map_berlin = folium.Map(location=berlin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(italian_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for r in [1000, 2000, 3000]:
    folium.Circle(berlin_center, radius=r, fill=False, color='white').add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
print('Generating map...',end='')
map_berlin


## Kreuzberg and Friedrichshain
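Narrow the region of interest to a circle with a 2.5 km radius whose center is shifted 1.5 km south and 0.5 km east of Alexanderplatz, covering the areas to its south-west, south and south-east.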

In [None]:
roi_x_min = berlin_center_x - 2000
roi_y_max = berlin_center_y + 1000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_berlin = folium.Map(location=roi_center, zoom_start=13)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
print('Generating map...',end='')
map_berlin

Create a new, denser grid of candidate locations (100 m spacing instead of 600 m) inside the region of interest.

In [None]:
k = math.sqrt(3) / 2
x_step = 100
y_step = 100 * k
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if d <= 2501:
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')
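
As a rough check on the grid sizes: a hexagonal grid with spacing $s$ has one point per $s^2\sqrt{3}/2$ of area, so a circle of radius $r$ holds approximately

$$
N \approx \frac{\pi r^2}{s^2\sqrt{3}/2}.
$$

For the first grid ($r = 6000$ m, $s = 600$ m) this gives $N \approx 1.131\times10^{8} / 3.118\times10^{5} \approx 363$, in line with the 364 centers generated earlier. For the region-of-interest grid ($r = 2500$ m, $s = 100$ m) it gives $N \approx 1.963\times10^{7} / 8.660\times10^{3} \approx 2267$, so this cell should report roughly 2,270 candidates (the estimate ignores edge effects).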

Calculate, for every candidate, the number of restaurants in the vicinity (radius = 250 m) and the distance to the closest Italian restaurant.

In [None]:
def count_restaurants_nearby(x, y, restaurants, radius=250):
    count = 0
    for res in restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d < radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 10000
    for res in restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d <= d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_italian_distances = []
print('Generating data on location candidates...', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')
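
The two loops above compare every candidate with every restaurant in pure Python. The same two quantities can be computed with NumPy broadcasting; the helper `nearby_stats` below is a hypothetical sketch, not part of the original notebook. It keeps the strict `< radius` test and the 10,000 m fallback when there are no Italian restaurants, but unlike `find_nearest_restaurant` it does not cap distances at 10 km. Memory grows with candidates × restaurants (a few thousand of each needs on the order of 100 MB).

```python
import numpy as np

def nearby_stats(candidate_xy, restaurant_xy, italian_xy, radius=250.0, no_italian=10000.0):
    """Per candidate (x, y) in metres: number of restaurants strictly within
    `radius`, and distance to the nearest Italian restaurant."""
    cand = np.asarray(candidate_xy, dtype=float).reshape(-1, 2)   # (N, 2)
    rest = np.asarray(restaurant_xy, dtype=float).reshape(-1, 2)  # (M, 2)
    ita = np.asarray(italian_xy, dtype=float).reshape(-1, 2)      # (K, 2)

    # Broadcasting (N, 1, 2) - (1, M, 2) gives all pairwise offsets; the norm over
    # the last axis turns them into an (N, M) distance matrix.
    d_rest = np.linalg.norm(cand[:, None, :] - rest[None, :, :], axis=2)
    counts = (d_rest < radius).sum(axis=1)

    if len(ita) == 0:
        # Same fallback as the loop version: no Italian restaurant found
        nearest_italian = np.full(len(cand), no_italian)
    else:
        d_ita = np.linalg.norm(cand[:, None, :] - ita[None, :, :], axis=2)
        nearest_italian = d_ita.min(axis=1)
    return counts, nearest_italian
```

With the notebook's data it would be called as `nearby_stats(list(zip(roi_xs, roi_ys)), [(r[7], r[8]) for r in restaurants.values()], [(r[7], r[8]) for r in italian_restaurants.values()])`.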


Create a DataFrame of the candidate locations.

In [None]:
df_roi_locations = pd.DataFrame({'Latitude': roi_latitudes,
                                 'Longitude': roi_longitudes,
                                 'X': roi_xs,
                                 'Y': roi_ys,
                                 'Restaurants nearby': roi_restaurant_counts,
                                 'Distance to italian restaurant': roi_italian_distances})
df_roi_locations.head(10)

Keep only the locations with no more than 2 restaurants within 250 m and no Italian restaurant within 400 m.

In [None]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than 2 restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to italian restaurant']>=400)
print('Locations with no italian restaurant within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]
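
The two conditions can also be combined in a single pandas boolean mask. A sketch with a hypothetical helper `filter_good_locations` and made-up demo values; each comparison needs its own parentheses because `&` binds more tightly than `<=` and `>=` in Python.

```python
import pandas as pd

def filter_good_locations(df, max_restaurants=2, min_italian_distance=400):
    """Rows with at most `max_restaurants` nearby and no Italian restaurant closer
    than `min_italian_distance` metres."""
    mask = ((df['Restaurants nearby'] <= max_restaurants) &
            (df['Distance to italian restaurant'] >= min_italian_distance))
    return df[mask]

# Made-up values for illustration
demo = pd.DataFrame({'Restaurants nearby': [0, 3, 2, 1],
                     'Distance to italian restaurant': [650.0, 900.0, 120.0, 400.0]})
good = filter_good_locations(demo)
print(good.index.tolist())  # -> [0, 3]
```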

Show these locations on a map.

In [None]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values
good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in good_locations:
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

Show the good locations as a heatmap.

In [None]:
map_berlin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in good_locations:
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

Cluster these locations with k-means (15 clusters) to find the centers of zones containing good locations.

In [None]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_berlin)
HeatMap(restaurant_latlons).add_to(map_berlin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_berlin)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_berlin)
for lat, lon in good_locations:
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin
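
The clustering runs on the planar `X`, `Y` columns, so distances are in metres rather than degrees. To illustrate what k-means does with these points, here is a minimal sketch of the standard Lloyd iteration in NumPy; `kmeans_lloyd` is a hypothetical helper, and scikit-learn's `KMeans` additionally uses k-means++ initialisation by default, so its results can differ from this sketch.

```python
import numpy as np

def kmeans_lloyd(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means on planar (x, y) points in metres."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct input points chosen at random
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([pts[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated groups of made-up locations
demo_xy = [(0, 0), (10, 0), (0, 10), (10, 10),
           (1000, 1000), (1010, 1000), (1000, 1010), (1010, 1010)]
centers, labels = kmeans_lloyd(demo_xy, 2)
print(sorted(centers.tolist()))
```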

Show these zones on a map as shaded areas.

In [None]:
map_berlin = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(berlin_center).add_to(map_berlin)
for lat, lon in good_locations:
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066FF', fill_opacity=0.07).add_to(map_berlin)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

Zoom in on the candidate area **Kreuzberg**.

In [None]:

map_berlin = folium.Map(location=get_coordinates('Kreuzberg'), zoom_start=15)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin)
for lat, lon in good_locations:
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_berlin)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

Zoom in on the candidate area **Friedrichshain**.

In [None]:
map_berlin = folium.Map(location=get_coordinates('Friedrichshain'), zoom_start=15)
folium.Marker(berlin_center).add_to(map_berlin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_berlin)
for lat, lon in good_locations:
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_berlin)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_berlin)
folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_berlin)
map_berlin

Get the addresses of the centers of these candidate areas.

In [None]:
candidate_area_address = []
print('='*60)
print('Addresses of centers of areas recommended for further analysis')
print('='*60 + '\n')

for lon, lat in cluster_centers:
    addr = get_address(lat, lon)
    candidate_area_address.append(addr)
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, berlin_center_x, berlin_center_y)
    # Left-align the address in a 50-character column; str() also handles a missing (None) address
    print('{:<50} => {:.1f}km from Alexanderplatz'.format(str(addr), d/1000))
    

In [None]:
map_berlin = folium.Map(location=roi_center, zoom_start=13)
folium.Circle(berlin_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_berlin)
for lonlat, addr in zip(cluster_centers, candidate_area_address):
    folium.Marker([lonlat[1],lonlat[0]], popup=addr).add_to(map_berlin)
for lat, lon in good_locations:
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_berlin)
map_berlin

Special thanks to the author of the example notebook