# The Battle of Neighborhoods

## 1. Introduction

Perth is the capital and largest city of the Australian state of Western Australia. It is one of the world's most successful multicultural cities, and as a result a rich variety of cuisines from different backgrounds has been adopted and developed there. Even in a small neighborhood you can find many different types of restaurants: Chinese, Italian, Indian, Mexican, Thai, Japanese, and more. If you are passionate about starting a food business, such as opening a restaurant, you have to accept one reality: the competition is tough.

One of our clients, Mr. Romano, an immigrant from Italy, is keen to open an Italian restaurant in Perth city center. But there are already many restaurants in the city. Where is the best location to open an Italian restaurant?

To help Mr. Romano answer this question, we will use data science to generate a shortlist of promising neighborhoods, from which he can choose the location with the best business value. The locations we are looking for have to meet at least three criteria: as close to the city center as possible, not already crowded with restaurants, and with no Italian restaurants in the vicinity.


## 2. Data

Based on the definition of the problem, factors that might impact our decision are:
    
•	Number of the existing restaurants in the neighborhood

•	Number of the existing Italian restaurants in the neighborhood

•	Distance of the neighborhood from the city center 

The following data sources will be needed to extract the required information:
    
•	Centers of hexagonal neighborhoods will be generated algorithmically, and the approximate address of each center will be obtained using ‘geopy.geocoders’

•	Restaurant data, including number, type and location in every neighborhood, will be obtained using the Foursquare API

•	The coordinates of Perth center will be obtained using ‘geopy.geocoders’


### Create neighborhood candidates 

In [2]:
import numpy as np
import pandas as pd

In [3]:
import matplotlib.pyplot
import seaborn as sns

In [4]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [5]:
from geopy.geocoders import Nominatim 
import requests 

In [6]:
geolocator = Nominatim(user_agent="perth_explorer")

In [7]:
def get_coordinates(address, verbose=False):
    location = geolocator.geocode(address)
    lat = location.latitude
    lon = location.longitude
    return[lat, lon]
    
address = 'Hay Street Mall, Perth, Australia'
perth_center = get_coordinates(address)
print('Coordinate of {}: {}'.format(address, perth_center))

Coordinate of Hay Street Mall, Perth, Australia: [-31.9540732, 115.858585]


Next, we will create a grid of neighborhood candidates, equally spaced, centered on the city center and within ~6 km of Hay Street Mall. Our neighborhoods will be defined as circular areas with a radius of 300 meters. To calculate distances accurately, we need functions to convert between the WGS84 spherical coordinate system (latitude/longitude in degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters). We will then project those coordinates back to latitude/longitude degrees so they can be shown on a Folium map.

In [8]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

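# NOTE: zone=33 covers central Europe; Perth (longitude ~115.86 E) lies in UTM zone 50,
# southern hemisphere. The outputs shown in this notebook were produced with zone=33,
# so the X/Y values and the distances derived from them are distorted.
# pyproj.transform is also deprecated in newer pyproj releases in favor of pyproj.Transformer.
def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM X/Y (meters)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse projection: UTM X/Y (meters) back to WGS84 longitude/latitude (degrees)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]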

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Perth center longitude={}, latitude={}'.format(perth_center[1], perth_center[0]))
x, y = lonlat_to_xy(perth_center[1], perth_center[0])
print('Perth center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Perth center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Perth center longitude=115.858585, latitude=-31.9540732
Perth center UTM X=8138447.55030681, Y=-11860351.441958493
Perth center longitude=115.858584999991, latitude=-31.954073200013827
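
As a sanity check that does not depend on pyproj, projected distances can be compared against great-circle distances computed directly from latitude and longitude. The sketch below uses the haversine formula on a spherical Earth, which is a different technique from the UTM projection used in this notebook; the function name `haversine_m` and the radius constant are our own choices for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

perth_center_latlon = (-31.9540732, 115.858585)   # Hay Street Mall, from the geocoding cell
first_candidate_latlon = (-31.952881, 115.893766)  # row 0 of df_locations
print(round(haversine_m(*perth_center_latlon, *first_candidate_latlon)), 'm')
```

The printed value can be compared with the 'Distance from center' column for the same row; a large gap between the two would suggest that the projection zone deserves a second look.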


In [9]:
perth_center_x, perth_center_y = lonlat_to_xy(perth_center[1], perth_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = perth_center_x - 6000
x_step = 600
y_min = perth_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(perth_center_x, perth_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
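
To make the geometry of the grid explicit, here is a minimal, self-contained sketch of a hexagonal lattice built around an origin. It is not the exact cell above (the function `hex_grid` and its symmetric row/column indexing are our own simplification), but it shows the two ingredients: rows spaced by `spacing * sqrt(3) / 2` and every other row shifted by half a step, so that each point's nearest neighbors are all exactly `spacing` away.

```python
import math

def hex_grid(center_x, center_y, spacing, radius):
    """Return (x, y) centers of a hexagonal lattice lying within `radius` of the center."""
    row_height = spacing * math.sqrt(3) / 2      # vertical distance between rows
    n_rows = int(radius / row_height)
    n_cols = int(radius / spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        x_offset = spacing / 2 if i % 2 else 0.0  # shift odd rows by half a step
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

grid = hex_grid(0.0, 0.0, spacing=600, radius=6000)
# Distance from the central point to its closest neighbor
nearest = min(math.hypot(x, y) for x, y in grid if (x, y) != (0.0, 0.0))
print(len(grid), 'points; nearest neighbor at', round(nearest, 3), 'm')
```

With a spacing of 600 m, circles of radius 300 m drawn around neighboring centers just touch, which is why the notebook pairs `x_step = 600` with a neighborhood radius of 300 m.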


In [10]:
!pip install folium
import folium



In [11]:
map_perth = folium.Map(location=perth_center, zoom_start=13)
folium.Marker(perth_center, popup='Hay Street Mall').add_to(map_perth)
for lat, lon in zip(latitudes, longitudes):
    # Draw each candidate neighborhood as a circle with a 300 m radius
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_perth)
map_perth

Then we use geopy.geocoders to obtain the approximate address of each candidate center.

In [12]:
def get_address(latitude, longitude):
    # Reverse-geocode a coordinate pair; return None if the lookup fails
    try:
        location = geolocator.reverse('{},{}'.format(latitude, longitude))
        return location.address
    except Exception:
        return None

addr = get_address(perth_center[0], perth_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(perth_center[0], perth_center[1], addr))

Reverse geocoding check
-----------------------
Address of [-31.9540732, 115.858585] is: Mo Expresso, Trinity Arcade, Perth, City of Perth, Western Australia, 6000, Australia


In [15]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', Australia', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
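
The loop above sends several hundred reverse-geocoding requests in a row. Public geocoding services generally expect clients to pace their requests, so it may be worth spacing the calls out. The sketch below is one way to do that with the standard library only; `throttled` and `fake_reverse` are hypothetical names used for illustration, and the stand-in function replaces the real network call so the example runs offline.

```python
import time

def throttled(func, min_delay_seconds=1.0, sleep=time.sleep, clock=time.monotonic):
    """Wrap `func` so that consecutive calls start at least `min_delay_seconds` apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper

def fake_reverse(lat, lon):
    # Stand-in for a reverse-geocoding call; returns a formatted coordinate string
    return '{:.4f},{:.4f}'.format(lat, lon)

slow_reverse = throttled(fake_reverse, min_delay_seconds=0.01)
print([slow_reverse(-31.95, 115.86), slow_reverse(-31.96, 115.87)])
```

In the notebook, the same wrapper could be applied to `get_address` before the loop, with a delay chosen to match the usage policy of the geocoding service.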


In [16]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Victoria Park Drive, Burswood, Town of Victori... | -31.952881 | 115.893766 | 8136648.0 | -11866070.0 | 5992.495307 |
| 1 | Victoria Park Drive, Burswood, Town of Victori... | -31.950053 | 115.892572 | 8137248.0 | -11866070.0 | 5840.3767 |
| 2 | Placid Avenue, Burswood, Town of Victoria Park... | -31.947225 | 115.891379 | 8137848.0 | -11866070.0 | 5747.173218 |
| 3 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.944397 | 115.890186 | 8138448.0 | -11866070.0 | 5715.767665 |
| 4 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.941569 | 115.888993 | 8139048.0 | -11866070.0 | 5747.173218 |
| 5 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.938742 | 115.8878 | 8139648.0 | -11866070.0 | 5840.3767 |
| 6 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.935914 | 115.886608 | 8140248.0 | -11866070.0 | 5992.495307 |
| 7 | Crown Perth, Bolton Avenue, Burswood, Town of ... | -31.958006 | 115.892682 | 8135748.0 | -11865550.0 | 5855.766389 |
| 8 | Roger Mackay Drive, Burswood, Town of Victoria... | -31.955177 | 115.891489 | 8136348.0 | -11865550.0 | 5604.462508 |
| 9 | Burswood, Town of Victoria Park, Western Austr... | -31.952348 | 115.890296 | 8136948.0 | -11865550.0 | 5408.326913 |


In [17]:
df_locations.to_pickle('./locations.pkl')   

Then we use the Foursquare API to get information about the restaurants in each candidate neighborhood.

In [18]:
# @hidden_cell
# Foursquare credentials: substitute your own values and keep them out of shared notebooks
client_id = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
version = '20210129'

In [19]:
# Category IDs corresponding to Italian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

In [20]:
def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Western Australia', '')
    address = address.replace(', Australia', '')
    return address

In [21]:
import json

In [22]:
def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20210129'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
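    except (requests.RequestException, KeyError, IndexError, ValueError):  # network failure or unexpected response shape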
        venues = []
    return venues
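
The request URL above is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library's `urlencode`, which takes care of escaping. The endpoint and parameter names are copied from the cell above; the helper name `build_explore_url` and the placeholder credentials are our own.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20210129'):
    """Build the venues/explore request URL used in this notebook from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),  # latitude,longitude of the search center
        'categoryId': category,
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(-31.9540732, 115.858585, '4d4b7105d754a06374d81259',
                        'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', radius=350)
# Parse the query string back to confirm the parameters survived encoding
print(parse_qs(urlsplit(url).query)['ll'])
```

Keeping the parameters in a dictionary also makes it easier to log a request without the secret, by dropping that key before printing.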

In [23]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
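except (OSError, pickle.UnpicklingError, EOFError):
    # Cache files are missing or unreadable; fall through and query the API below
    pass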

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Restaurant data loaded.
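
The load-or-query pattern above can be factored into a small helper so that the same logic serves all three pickle files. This is a minimal sketch under our own naming (`load_or_compute` is hypothetical, not part of the notebook), demonstrated on a temporary file with toy data instead of real venue data.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return (value, loaded_from_cache); call compute() and cache its result if needed."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f), True
    except (OSError, pickle.UnpicklingError, EOFError):
        result = compute()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result, False

cache_path = os.path.join(tempfile.mkdtemp(), 'demo_cache.pkl')
# First call: no cache file yet, so the function computes and stores the value
first, first_from_cache = load_or_compute(cache_path, lambda: {'venue-1': ('Demo', 1.0)})
# Second call: the value is read back from disk and compute() is not used
second, second_from_cache = load_or_compute(cache_path, lambda: {})
print(first_from_cache, second_from_cache, first == second)
```

One caveat with any such cache is that it has to be deleted by hand when the grid or the search radius changes, otherwise stale results are silently reused.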


In [24]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 362
Total number of Italian restaurants: 24
Percentage of Italian restaurants: 6.63%
Average number of restaurants in neighborhood: 2.2747252747252746


In [25]:
map_perth = folium.Map(location=perth_center, zoom_start=13)
folium.Marker(perth_center, popup='Hay Street Mall').add_to(map_perth)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_perth)
map_perth

## 3. Methodology

Let's perform some data analysis and derive additional information from our raw data. First, let's count the number of restaurants in every candidate area:

In [26]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 2.2747252747252746


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | Victoria Park Drive, Burswood, Town of Victori... | -31.952881 | 115.893766 | 8136648.0 | -11866070.0 | 5992.495307 | 0 |
| 1 | Victoria Park Drive, Burswood, Town of Victori... | -31.950053 | 115.892572 | 8137248.0 | -11866070.0 | 5840.3767 | 0 |
| 2 | Placid Avenue, Burswood, Town of Victoria Park... | -31.947225 | 115.891379 | 8137848.0 | -11866070.0 | 5747.173218 | 0 |
| 3 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.944397 | 115.890186 | 8138448.0 | -11866070.0 | 5715.767665 | 0 |
| 4 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.941569 | 115.888993 | 8139048.0 | -11866070.0 | 5747.173218 | 0 |
| 5 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.938742 | 115.8878 | 8139648.0 | -11866070.0 | 5840.3767 | 0 |
| 6 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.935914 | 115.886608 | 8140248.0 | -11866070.0 | 5992.495307 | 0 |
| 7 | Crown Perth, Bolton Avenue, Burswood, Town of ... | -31.958006 | 115.892682 | 8135748.0 | -11865550.0 | 5855.766389 | 1 |
| 8 | Roger Mackay Drive, Burswood, Town of Victoria... | -31.955177 | 115.891489 | 8136348.0 | -11865550.0 | 5604.462508 | 0 |
| 9 | Burswood, Town of Victoria Park, Western Austr... | -31.952348 | 115.890296 | 8136948.0 | -11865550.0 | 5408.326913 | 0 |


Next, we calculate the distance from every candidate area center to the nearest Italian restaurant:

In [27]:
distances_to_italian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

df_locations['Distance to Italian restaurant'] = distances_to_italian_restaurant
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Italian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | Victoria Park Drive, Burswood, Town of Victori... | -31.952881 | 115.893766 | 8136648.0 | -11866070.0 | 5992.495307 | 0 | 1748.047741 |
| 1 | Victoria Park Drive, Burswood, Town of Victori... | -31.950053 | 115.892572 | 8137248.0 | -11866070.0 | 5840.3767 | 0 | 2330.409339 |
| 2 | Placid Avenue, Burswood, Town of Victoria Park... | -31.947225 | 115.891379 | 8137848.0 | -11866070.0 | 5747.173218 | 0 | 2609.697278 |
| 3 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.944397 | 115.890186 | 8138448.0 | -11866070.0 | 5715.767665 | 0 | 2747.771503 |
| 4 | Belmont Park Racecourse, Placid Avenue, Burswo... | -31.941569 | 115.888993 | 8139048.0 | -11866070.0 | 5747.173218 | 0 | 3001.662302 |
| 5 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.938742 | 115.8878 | 8139648.0 | -11866070.0 | 5840.3767 | 0 | 2696.851827 |
| 6 | St John of God Mt Lawley Hospital, Thirlmere R... | -31.935914 | 115.886608 | 8140248.0 | -11866070.0 | 5992.495307 | 0 | 2117.203158 |
| 7 | Crown Perth, Bolton Avenue, Burswood, Town of ... | -31.958006 | 115.892682 | 8135748.0 | -11865550.0 | 5855.766389 | 1 | 780.028072 |
| 8 | Roger Mackay Drive, Burswood, Town of Victoria... | -31.955177 | 115.891489 | 8136348.0 | -11865550.0 | 5604.462508 | 0 | 1379.703075 |
| 9 | Burswood, Town of Victoria Park, Western Austr... | -31.952348 | 115.890296 | 8136948.0 | -11865550.0 | 5408.326913 | 0 | 1979.575045 |


In [28]:
print('Average distance to closest Italian restaurant from each area center:', df_locations['Distance to Italian restaurant'].mean())

Average distance to closest Italian restaurant from each area center: 1289.6940133341382


In [29]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

Let's create a heatmap showing the density of restaurants:

In [55]:
from folium import plugins
from folium.plugins import HeatMap

map_perth = folium.Map(location=perth_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_perth) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_perth)
folium.Marker(perth_center).add_to(map_perth)
folium.Circle(perth_center, radius=1000, fill=False, color='white').add_to(map_perth)
folium.Circle(perth_center, radius=2000, fill=False, color='white').add_to(map_perth)
folium.Circle(perth_center, radius=3000, fill=False, color='white').add_to(map_perth)
map_perth

Let's define a new, narrower region of interest:

In [31]:
roi_x_min = perth_center_x - 2000
roi_y_max = perth_center_y + 1000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_perth = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_perth)
folium.Marker(perth_center).add_to(map_perth)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_perth)
map_perth

This nicely covers all the pockets of low restaurant density closest to Perth center.

Let's also create a new, denser grid of location candidates restricted to our new region of interest:

In [32]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500
roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)
print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2261 candidate neighborhood centers generated.


Now let's calculate the two most important quantities for each location candidate: the number of restaurants in the vicinity (we'll use a radius of 250 meters) and the distance to the closest Italian restaurant.

In [33]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')


Generating data on location candidates... done.


In [34]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Italian restaurant |
|---|---|---|---|---|---|---|
| 0 | -31.945183 | 115.879808 | 8138898.0 | -11864350.0 | 0 | 1622.409564 |
| 1 | -31.944712 | 115.879609 | 8138998.0 | -11864350.0 | 0 | 1707.483777 |
| 2 | -31.947923 | 115.880421 | 8138348.0 | -11864260.0 | 0 | 1134.738111 |
| 3 | -31.947451 | 115.880223 | 8138448.0 | -11864260.0 | 0 | 1208.684673 |
| 4 | -31.94698 | 115.880024 | 8138548.0 | -11864260.0 | 0 | 1286.15967 |
| 5 | -31.946508 | 115.879825 | 8138648.0 | -11864260.0 | 0 | 1366.563118 |
| 6 | -31.946037 | 115.879627 | 8138748.0 | -11864260.0 | 0 | 1449.407745 |
| 7 | -31.945566 | 115.879428 | 8138848.0 | -11864260.0 | 0 | 1534.298168 |
| 8 | -31.945094 | 115.87923 | 8138948.0 | -11864260.0 | 0 | 1620.912992 |
| 9 | -31.944623 | 115.879031 | 8139048.0 | -11864260.0 | 0 | 1708.990048 |


Let us now filter those locations: we're interested only in locations with no more than two restaurants in radius of 250 meters, and no Italian restaurants in radius of 400 meters.

In [35]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=400)
print('Locations with no Italian restaurants within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 1722
Locations with no Italian restaurants within 400m: 1797
Locations with both conditions met: 1510
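
To see how the two filters interact, here is a small worked example on toy coordinates in meters. It restates the counting and nearest-distance steps with plain tuples instead of the notebook's restaurant records; the function names and the toy points are ours, while the thresholds (at most two restaurants within 250 m, no Italian restaurant within 400 m) are the ones used above.

```python
import math

def count_within(x, y, points, radius):
    """Number of points lying within `radius` of (x, y)."""
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

def nearest_distance(x, y, points):
    """Distance from (x, y) to the closest point, or infinity if there are none."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

def is_good_location(x, y, all_restaurants, italian_restaurants,
                     max_nearby=2, nearby_radius=250, min_italian_distance=400):
    """Apply both criteria: few restaurants nearby and no Italian restaurant too close."""
    return (count_within(x, y, all_restaurants, nearby_radius) <= max_nearby
            and nearest_distance(x, y, italian_restaurants) >= min_italian_distance)

# Toy data: three restaurants clustered at the origin and one Italian restaurant further away
all_restaurants = [(0, 0), (100, 0), (0, 100), (1000, 1000)]
italian_restaurants = [(1000, 1000)]
candidates = [(0, 0), (600, 600), (900, 900)]
print([is_good_location(x, y, all_restaurants, italian_restaurants) for x, y in candidates])
```

The first candidate fails on crowding (three restaurants within 250 m), the third fails on proximity to the Italian restaurant (about 141 m away), and only the middle one passes both tests.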


Let's see how this looks on a map.

In [38]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_perth = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_perth)
HeatMap(restaurant_latlons).add_to(map_perth)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_perth)
folium.Marker(perth_center).add_to(map_perth)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_perth) 
map_perth

In [39]:
map_perth = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_perth)
folium.Marker(perth_center).add_to(map_perth)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_perth)
map_perth

Let us now cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [40]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_perth = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_perth)
HeatMap(restaurant_latlons).add_to(map_perth)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_perth)
folium.Marker(perth_center).add_to(map_perth)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_perth) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_perth)
map_perth
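
For readers who want to see what the clustering step does internally, the sketch below implements the basic k-means iteration (Lloyd's algorithm) in plain Python on toy points. It is only an illustration of the idea: scikit-learn's `KMeans` chooses its starting centers differently and is what this notebook actually uses, whereas here the initial centers are supplied by the caller and the name `lloyd_kmeans` is our own.

```python
import math

def lloyd_kmeans(points, initial_centers, max_iterations=100):
    """Alternate between assigning points to the nearest center and re-averaging centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assignment step: group each point with its closest center
        clusters = [[] for _ in centers]
        for px, py in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.hypot(px - centers[i][0], py - centers[i][1]))
            clusters[nearest].append((px, py))
        # Update step: move each center to the mean of its members (keep it if empty)
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(center)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers

# Two well-separated square groups of points
toy_points = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
print(lloyd_kmeans(toy_points, initial_centers=[(0, 0), (12, 12)]))
```

Each returned center is the mean of the points assigned to it, which is why the cluster centers in the notebook mark the middle of a zone of good locations rather than a specific good location.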

In [52]:
map_perth = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(perth_center).add_to(map_perth)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_perth)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_perth)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_perth) 
map_perth

Finally, let's get the addresses of the centers of those zones, which can be presented to the client.

In [53]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(lat, lon).replace(', Australia', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, perth_center_x, perth_center_y)
    print('{}{} => {:.1f}km from Hay Street Mall'.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

Traffic Police, Bronte Street, East Perth, Perth, City of Perth, Western Australia, 6004 => 2.9km from Hay Street Mall
238, Brisbane Street, Perth, City Of Vincent, Western Australia, 6003 => 2.6km from Hay Street Mall
Tully Road, East Perth, Perth, City of Perth, Western Australia, 6004 => 3.4km from Hay Street Mall
The Emperors Crown Backpackers, 85, Stirling Street, Perth, City Of Vincent, City of Perth, Western Australia, 6000 => 1.2km from Hay Street Mall
Governors Avenue, Perth, City of Perth, Western Australia, 6000 => 1.1km from Hay Street Mall
Wright Street, Highgate, City Of Vincent, Western Australia, 6050 => 3.5km from Hay Street Mall
Northbridge Tunnel (West Bound), Aberdeen Street, Perth, City of Perth, Western Australia, 6003 => 2.1km from Hay Street Mall
Youth With A Mission, Gladstone Street, Perth, City Of Vincent, Western Australia, 6004 => 2.4km from Hay Street Mall
William Street after Glendowe

Let's see all those zones on the map:

In [54]:
map_perth = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(perth_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_perth)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_perth) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_perth)
map_perth

The results, discussion, and conclusion will be included in the full report.