# **Capstone Project - Green Tea Restaurant Relocation**

*This is the notebook for Coursera IBM Data Science Capstone Project.*


## Import all Libraries
This way, we only need to run this cell once.

In [1]:
# this way, we only need to run this cell once

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

# clustering algorithm library
from sklearn.cluster import KMeans

# The Python Math Library provides us access to
# some common math functions and constants in Python
import math

# The pickle module implements binary protocols
# for serializing and de-serializing a Python object structure.
import pickle

# Performs cartographic transformations (converts from longitude,latitude
# to native map projection x,y coordinates and vice versa) using proj (https://proj.org)
!pip install pyproj
import pyproj

import progressbar # library to show a progress bar while the code is processing data.

# function for transforming a JSON file into a pandas DataFrame
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from folium import plugins
from folium.plugins import HeatMap

print('Folium installed')
print('Libraries imported.')

Collecting pyproj
  Downloading https://files.pythonhosted.org/packages/d6/70/eedc98cd52b86de24a1589c762612a98bea26cde649ffdd60c1db396cce8/pyproj-2.4.2.post1-cp36-cp36m-manylinux2010_x86_64.whl (10.1MB)
Installing collected packages: pyproj
Successfully installed pyproj-2.4.2.post1
/bin/bash: conda: command not found
Folium installed
Libraries imported.


## Create Neighborhood for Edmonton


### API Credentials
The following cell contains API credentials which will be removed/hidden prior to pushing to the repository.

In [2]:
google_api_key = '***'

fs_client_id = '***'
fs_client_secret = '***'

print ("\"Roger that, sir!\"")

"Roger that, sir!"


### Geocoding Request and Response (latitude/longitude lookup)
Define a function that queries the Google Maps Geocoding API for the latitude and longitude of an address.

In [0]:
def g_enquiry (api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?address={}&key={}'.format(address, api_key)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except:
        return [None, None]

Let's test the function.

In [4]:
address = 'Rogers Place, Edmonton, Alberta'
dt_yeg = g_enquiry (google_api_key, address)
print('Coordinate of {} is {}'.format(address, dt_yeg))

Coordinate of Rogers Place, Edmonton, Alberta is [53.54623600000001, -113.497221]


Looks good. The function will be used to look up the latitude and longitude of locations.
However, latitude and longitude are angles in degrees, so applying the Euclidean distance formula to them directly does not give a distance in meters. We therefore project the coordinates into the Universal Transverse Mercator ([UTM](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)) system, whose X/Y values are in meters. To do that, we will define functions to convert coordinates back and forth and to calculate distances.

In [0]:
# Define a function to convert lat/long degree to UTM coordinate.
def lonlat_to_utm(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlon',datum='WGS84')
    proj_utm = pyproj.Proj(proj="utm", zone=12, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_utm, lon, lat)
    return xy[0], xy[1]

# Define a function to convert UTM coordinates to lat/long degrees
def utm_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlon',datum='WGS84')
    proj_utm = pyproj.Proj(proj="utm", zone=12, datum='WGS84')
    lonlat = pyproj.transform(proj_utm, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

# Define a function to calculate the Euclidean distance between two UTM points (e.g. a location and downtown).
def calc_utm_distance(x1, y1, x2, y2):
    return math.sqrt((x2-x1)*(x2-x1) + (y2-y1)*(y2-y1))


Let's check that the functions work as expected. Converting the longitude/latitude of downtown Edmonton to UTM and back with utm_to_lonlat should return nearly the original values.

In [6]:
print('Coordinate transformation check')
print('-------------------------------')
print('Downtown longitude={}, latitude={}'.format(dt_yeg[1], dt_yeg[0]))
x, y = lonlat_to_utm(dt_yeg[1], dt_yeg[0])
print('dt_yeg UTM X={}, Y={}'.format(x, y))
lo, la = utm_to_lonlat(x, y)
print('dt_yeg longitude={}, latitude={}'.format(lo, la))
print("The long/lat degree should be almost the same to the one converted from UTM System")

Coordinate transformation check
-------------------------------
Downtown longitude=-113.497221, latitude=53.54623600000001
dt_yeg UTM X=334548.70396402385, Y=5935938.463901036
dt_yeg longitude=-113.497221, latitude=53.546236000000015
The long/lat degree should be almost the same to the one converted from UTM System
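
As an independent cross-check on distances computed from UTM coordinates, the great-circle (haversine) formula can be applied directly to latitude/longitude pairs. The following is a minimal sketch, not part of the original notebook: `haversine_m` is a hypothetical helper, and the spherical Earth radius of 6,371 km is an assumption, so results will differ slightly from the UTM-based values.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Coordinates taken from this notebook: Rogers Place and the current restaurant marker.
rogers_place = (53.54623600000001, -113.497221)
green_tea = (53.549231, -113.619792)
d = haversine_m(*rogers_place, *green_tea)
print('Rogers Place to Green Tea Restaurant: {:.0f} m'.format(d))
```

If the haversine result and the `calc_utm_distance` result for the same pair of points disagree by more than a fraction of a percent, the UTM zone or the argument order (longitude first, latitude second) is worth re-checking.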


The following cell generates candidate neighborhood centers on a hexagonal grid, spaced 500 meters apart and within about 1 km of downtown, and draws each one as a circle with a radius of 250 meters. The map also shows where the Green Tea Restaurant currently is.

In [7]:
dt_yeg_x, dt_yeg_y = lonlat_to_utm(dt_yeg[1], dt_yeg[0]) # downtown in UTM coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = dt_yeg_x - 1000
x_step = 500
y_min = dt_yeg_y - 1000 - (int(21/k)*k*500 - 1000)/2
y_step = 500 * k 

latitudes = []
longitudes = []
distances_from_dt = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 250 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_dt = calc_utm_distance(dt_yeg_x, dt_yeg_y, x, y)
        if (distance_from_dt <= 1001):
            lon, lat = utm_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_dt.append(distance_from_dt)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

map_yeg = folium.Map(location=dt_yeg, zoom_start=12.2)
folium.Marker(location=[53.549231, -113.619792], popup="Green Tea Restaurant", icon=folium.Icon(icon='info-sign')).add_to(map_yeg)
folium.Marker(dt_yeg, popup='Downtown').add_to(map_yeg)
for lat, lon in zip(latitudes, longitudes):
    # folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_yeg) 
    folium.Circle([lat, lon], radius=250, color='green', fill=False).add_to(map_yeg)
    # folium.Marker([lat, lon]).add_to(map_yeg)
map_yeg

15 candidate neighborhood centers generated.
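
The grid construction above can also be written as a standalone function. This is a minimal sketch rather than the notebook's exact code: `hex_grid` is a hypothetical helper, it centers the lattice on the given point (so it does not reproduce the same set of points as the cell above), and the `tol` parameter is an assumption that plays the role of the `<= 1001` comparison, absorbing floating-point error for points that sit exactly on the boundary.

```python
import math

def hex_grid(center_x, center_y, spacing, max_radius, tol=1e-6):
    """Return (x, y) points of a hexagonal lattice within max_radius of the center."""
    row_height = spacing * math.sqrt(3) / 2       # vertical distance between rows
    n_rows = int(max_radius // row_height) + 1    # rows to scan above and below the center
    n_cols = int(max_radius // spacing) + 1       # columns to scan left and right
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        x_offset = spacing / 2 if i % 2 else 0.0  # shift every other row by half a step
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_radius + tol:
                points.append((x, y))
    return points

pts = hex_grid(0.0, 0.0, 500.0, 1000.0)
print(len(pts), 'grid points generated.')
```

Because every other row is shifted by half a step and rows are `spacing * sqrt(3) / 2` apart, each point is equally distant from its six nearest neighbors, which is what lets equal-radius circles cover the area evenly.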


In [0]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

Let's test the function. This request should return the address for the coordinates of Rogers Place.

In [9]:
addr = get_address(google_api_key, dt_yeg[0], dt_yeg[1])
print('Address of [{}, {}] is: {}'.format(dt_yeg[0], dt_yeg[1], addr))

Address of [53.54623600000001, -113.497221] is: 10232 104 Ave NW, Edmonton, AB T5J 1B9, Canada


Next, we will use the above function to look up the addresses of the candidate neighborhood centers.

In [10]:
loaded = False
try:
    with open('locations.pkl', 'rb') as f:
        df_locations = pickle.load(f)
        loaded = True
        print('DataFrame imported from local file.')
except:
    pass

if not loaded:
    print('Obtaining location addresses...')
    addresses = []
    for lat, lon in zip(latitudes, longitudes):
        address = get_address(google_api_key, lat, lon)
        if address is None:
            address = 'NO ADDRESS'
        address = address.replace(', Canada', '') # We don't need country part of address
        addresses.append(address)
    print("Done!")
    df_locations = pd.DataFrame({'Address': addresses,
                        'Latitude': latitudes,
                        'Longitude': longitudes,
                        'X': xs,
                        'Y': ys,
                        'Distance from Downtown': distances_from_dt})
    # Save to local file
    df_locations.to_pickle('./locations.pkl')
    print ('DataFrame saved to local file locations.pkl')

Obtaining location addresses...
Done!
DataFrame saved to local file locations.pkl
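
The load-or-fetch pattern used in the cell above (try the local pickle first, call the API only when it is missing) can be factored into one helper. This is a minimal sketch under assumptions: `load_or_compute` and `expensive` are hypothetical names, and a temporary directory is used so the example does not touch `locations.pkl`.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object pickled at `path`; if it cannot be read, compute and cache it."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        result = compute()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result

calls = []

def expensive():
    # Stand-in for the slow API loop; records how often it actually runs.
    calls.append(1)
    return {'addresses': ['example address']}

cache_path = os.path.join(tempfile.mkdtemp(), 'cache.pkl')
first = load_or_compute(cache_path, expensive)   # computes and writes the cache
second = load_or_compute(cache_path, expensive)  # reads the cache, no recomputation
print('compute calls:', len(calls))
```

Catching only the specific exceptions keeps unrelated failures (for example a typo inside the `try` block) from being silently swallowed, which a bare `except:` would do.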


In [11]:
df_locations

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from Downtown
0,"9923 103 St NW, Edmonton, AB T5K 2J3",53.537857,-113.496728,334548.703964,5935005.0,933.012702
1,"The Executive, 10105 109 St NW, Edmonton, AB T...",53.541509,-113.508264,333798.703964,5935438.0,901.387819
2,"105 Street & Jasper Avenue, Edmonton, AB T5J 3N1",53.541667,-113.500726,334298.703964,5935438.0,559.016994
3,"10043 Jasper Ave, Edmonton, AB T5J 1S6",53.541825,-113.493188,334798.703964,5935438.0,559.016994
4,"9751 Jasper Ave, Edmonton, AB T5J 0C5",53.541982,-113.485649,335298.703964,5935438.0,901.387819
5,"107 Street & 103 Avenue, Edmonton, AB T5J 1K3",53.545477,-113.504724,334048.703964,5935871.0,504.467341
6,"10340 103 St NW, Edmonton, AB T5J 0Y9",53.545634,-113.497186,334548.703964,5935871.0,66.987298
7,"10248 99 St NW, Edmonton, AB T5J",53.545792,-113.489647,335048.703964,5935871.0,504.467341
8,"10568 109 St NW, Edmonton, AB T5H 3B2",53.549286,-113.508724,333798.703964,5936304.0,834.550535
9,"10572 105 St NW, Edmonton, AB T5H 2W7",53.549444,-113.501184,334298.703964,5936304.0,443.25455


### Foursquare Request
In this step, we will use the Foursquare API to get information about restaurants in each neighborhood.

Refer to the [Foursquare website](https://developer.foursquare.com/docs/resources/categories) for a full list of category IDs.

In [0]:
root_category = '4d4b7105d754a06374d81259' # Food category

chinese_restaurant_categories = ['52af3a5e3cf9994f4e043bea', '52af3a723cf9994f4e043bec',
                                 '52af3a7c3cf9994f4e043bed', '58daa1558bbb0b01f18ec1d3',
                                 '52af3a673cf9994f4e043beb', '52af3a903cf9994f4e043bee',
                                 '4bf58dd8d48988d1f5931735', '52af3a9f3cf9994f4e043bef',
                                 '52af3aaa3cf9994f4e043bf0', '52af3ab53cf9994f4e043bf1',
                                 '52af3abe3cf9994f4e043bf2', '52af3ac83cf9994f4e043bf3',
                                 '52af3ad23cf9994f4e043bf4', '52af3add3cf9994f4e043bf5',
                                 '52af3af23cf9994f4e043bf7', '52af3ae63cf9994f4e043bf6',
                                 '52af3afc3cf9994f4e043bf8', '52af3b053cf9994f4e043bf9',
                                 '52af3b213cf9994f4e043bfa', '52af3b293cf9994f4e043bfb',
                                 '52af3b343cf9994f4e043bfc', '52af3b3b3cf9994f4e043bfd',
                                 '52af3b463cf9994f4e043bfe', '52af3b633cf9994f4e043c01',
                                 '52af3b513cf9994f4e043bff', '52af3b593cf9994f4e043c00',
                                 '52af3b6e3cf9994f4e043c02', '52af3b773cf9994f4e043c03',
                                 '52af3b813cf9994f4e043c04', '52af3b893cf9994f4e043c05',
                                 '52af3b913cf9994f4e043c06', '52af3b9a3cf9994f4e043c07',
                                 '52af3ba23cf9994f4e043c08', '4bf58dd8d48988d145941735']

Next, define functions to classify venues and to fetch venue information from the API.

In [0]:
def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Alberta', '')
    address = address.replace(', Canada', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20181020'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
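
The request URL in `get_venues_near_location` is assembled with `str.format`. An alternative is to build the query string from a dictionary so that every value is URL-encoded. The sketch below is illustrative only: `build_explore_url` is a hypothetical helper, the endpoint and parameter names are copied from the function above, and `'ID'` and `'SECRET'` are placeholder credentials.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20181020'):
    """Build the venues/explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(53.546236, -113.497221, '4d4b7105d754a06374d81259',
                        'ID', 'SECRET', radius=350)
query = parse_qs(urlsplit(url).query)  # decode again to inspect what would be sent
print(query['ll'], query['radius'])
```

The same dictionary could also be passed to `requests.get` through its `params` argument, which performs the encoding itself.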

In [14]:
def get_restaurants(lats, lons):
    restaurants = {}
    chinese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations...')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure the search areas overlap and give full coverage, so we don't miss any restaurant (we're using dictionaries to remove the duplicates that result from the overlaps)
        venues = get_venues_near_location(lat, lon, root_category, fs_client_id, fs_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_chinese = is_restaurant(venue_categories, specific_filter=chinese_restaurant_categories)
            if is_res:
                x, y = lonlat_to_utm(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_chinese, x, y)
                if venue_distance<=250:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_chinese:
                    chinese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
    print('Done!')
    return restaurants, chinese_restaurants, location_restaurants


restaurants = {}
chinese_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_300.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('chinese_restaurants_300.pkl', 'rb') as f:
        chinese_restaurants = pickle.load(f)
    with open('location_restaurants_300.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

if not loaded:
    restaurants, chinese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    # Let's persist this to the local file system
    with open('restaurants_300.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('chinese_restaurants_300.pkl', 'wb') as f:
        pickle.dump(chinese_restaurants, f)
    with open('location_restaurants_300.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Obtaining venues around candidate locations...
Done!
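
Because the search circles overlap, the same venue can come back from more than one request. The toy example below sketches how keying a dictionary by venue ID removes those duplicates while each area still keeps its own list of venues within 250 m. The helper name `collect_venues` and the sample tuples are made up for illustration and are not real API results.

```python
def collect_venues(areas, inner_radius=250):
    """areas: one list per search circle of (venue_id, name, distance_m) tuples."""
    unique = {}    # venue_id -> venue tuple; a repeated ID overwrites instead of duplicating
    per_area = []  # venues within inner_radius of each area center
    for venues in areas:
        per_area.append([v for v in venues if v[2] <= inner_radius])
        for v in venues:
            unique[v[0]] = v
    return unique, per_area

# Made-up sample: 'Venue B' is returned by two overlapping search circles.
areas = [
    [('a', 'Venue A', 120), ('b', 'Venue B', 310)],
    [('b', 'Venue B', 90), ('c', 'Venue C', 260)],
]
unique, per_area = collect_venues(areas)
print(len(unique), 'unique venues;', [len(a) for a in per_area], 'venues per area')
```

One consequence of this design is that the distance stored for a duplicated venue is the one from the last area that returned it, so that field should be read as "distance from some area center" rather than from a specific one.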


In [15]:
restaurants

{'4b0586f5f964a5200b7822e3': ('4b0586f5f964a5200b7822e3',
  'Hardware Grill',
  53.54241267633521,
  -113.48561741048763,
  '9698 Jasper Avenue, Edmonton AB T5H 3V5',
  48,
  False,
  335302.50106957206,
  5935486.328482499),
 '4b0586f5f964a520137822e3': ('4b0586f5f964a520137822e3',
  'Hoa An Restaurant',
  53.55182864803684,
  -113.48905412826757,
  '9653 107 Ave. Northwest, Edmonton AB T5H 0T8',
  205,
  True,
  335111.4423685989,
  5936541.538788642),
 '4b0586f5f964a5201a7822e3': ('4b0586f5f964a5201a7822e3',
  'Mikado',
  53.54573419816865,
  -113.5090962806086,
  '10305 109 Street (across from MacEwan University), Edmonton AB T5J 4X9',
  290,
  False,
  333760.09740411496,
  5935910.31342741),
 '4b0586f6f964a5201b7822e3': ('4b0586f6f964a5201b7822e3',
  'Khazana Restaurant',
  53.542735468061984,
  -113.50425082076549,
  '10177 107 Street, Edmonton AB T5J 1J5',
  306,
  False,
  334069.34044477774,
  5935565.490205238),
 '4b0586f6f964a5201d7822e3': ('4b0586f6f964a5201d7822e3',
  'Ha

Let's print out the results and some basic statistics.

In [16]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Chinese restaurants:', len(chinese_restaurants))
print('Percentage of Chinese restaurants: {:.2f}%'.format(len(chinese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 95
Total number of Chinese restaurants: 13
Percentage of Chinese restaurants: 13.68%
Average number of restaurants in neighborhood: 5.066666666666666


In [17]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b6cb50ef964a520e04d2ce3', "Rigoletto's", 53.539053967983804, -113.49765235103246, '10305 100ave, Edmonton AB', 146, False, 334492.10604495555, 5935140.674094922)
('4b675a53f964a520cf492be3', 'LaRonde Restaurant - Chateau Lacombe', 53.53926154306514, -113.49404976360684, '10111 Bellamy Hill Rd. (at MacDonald Dr. NW), Edmonton AB T5J 1N7', 290, False, 334731.5973645472, 5935155.3934075665)
('531a181a498ec163295d9915', '3 amigos restaurant', 53.54088, -113.496908, 'Canada', 267, False, 334548.5444278917, 5935342.037195342)
('4b078a18f964a52090fe22e3', 'Kyoto Japanese Cuisine', 53.54116949229921, -113.50940302018427, '10128 109 St NW, Edmonton AB T5J 1M7', 84, False, 333721.882603335, 5935403.340123962)
('4b0586f6f964a5201b7822e3', 'Khazana Restaurant', 53.542735468061984, -113.50425082076549, '10177 107 Street, Edmonton AB T5J 1J5', 306, False, 334069.34044477774, 5935565.490205238)
('4b15f4fef964a52006b623e3', 'Wildflower Grill', 53.5393

In [18]:
print('List of Chinese restaurants')
print('---------------------------')
for r in list(chinese_restaurants.values())[:10]:
    print(r)
print('---------------------------')
print('Total:', len(chinese_restaurants))

List of Chinese restaurants
---------------------------
('4b0d7213f964a5207e4823e3', 'Chicken for Lunch', 53.5416196652816, -113.49322176707038, '10060 Jasper Ave (Scotia Place Food Court), Edmonton AB T5J 3R8', 22, True, 334795.63929736556, 5935415.7447195)
('4c390ee30a71c9b604e741c9', 'Wok Box', 53.54092188103267, -113.49492447788208, '10119 Jasper Ave (101st St.), Edmonton AB', 152, True, 334680.11647926795, 5935342.088275741)
('4b0d726ff964a520884823e3', 'New Tan Tan Restaurant', 53.5428996062331, -113.48581530685958, '10133 - 97 St (101A Ave), Edmonton AB T5J 0L2', 102, True, 335291.281831532, 5935540.9431688115)
('4ee10dfab8f7413830dfa933', 'Kallin Chinese Seafood Restaurant', 53.54883873135637, -113.50865857269224, '10548 - 109 Street, Edmonton AB', 49, True, 333801.26035802613, 5936254.57997853)
('4b06e394f964a52042f222e3', 'The Lingnan', 53.54954173860989, -113.4996271679795, '10582 104 St. NW (106 Ave. NW), Edmonton AB T5H 2W1', 103, True, 334402.2251457613, 5936311.725210998

In [19]:
print('Restaurants around location')
print('---------------------------')
for i in range(0,5):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

print('---------------------------')

Restaurants around location
---------------------------
Restaurants around location 1: Rigoletto's, LaRonde Restaurant - Chateau Lacombe
Restaurants around location 2: Kyoto Japanese Cuisine, Hoang Long, Oodle Noodle, Mucho Burrito Fresh Mexican Grill, Doan's Downtown, Ruby Dragon, The Sultan Palace
Restaurants around location 3: Corso 32, Uccellino, Tzin, I Love Sushi, Blue Plate Diner, Pita Pit, Wishbone, Drunken Ox Sober Cat
Restaurants around location 4: Tres Carnales Taquería, Bistro Praha, Chicken for Lunch, V Sandwiches, Sorrentino's Bistro-Bar, State & Main, The Chopped Leaf, Wok Box
Restaurants around location 5: Hardware Grill, New Tan Tan Restaurant, Normand's Bistro, Canada Place - Food Court
---------------------------


### Plotting Restaurants on the Map
Now let's plot the restaurants on the map. Chinese restaurants will be represented by red dots, and other restaurants by blue dots.


In [20]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=14)
folium.Marker(dt_yeg, popup='Downtown').add_to(map_yeg)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_chinese = res[6]
    color = 'red' if is_chinese else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_yeg)
map_yeg

## Analysis
First, we will perform some basic exploratory analysis. Let's count the total number of restaurants in each candidate neighbourhood.

In [21]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=250m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=250m: 5.066666666666666


Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from Downtown,Restaurants in area
0,"9923 103 St NW, Edmonton, AB T5K 2J3",53.537857,-113.496728,334548.703964,5935005.0,933.012702,2
1,"The Executive, 10105 109 St NW, Edmonton, AB T...",53.541509,-113.508264,333798.703964,5935438.0,901.387819,7
2,"105 Street & Jasper Avenue, Edmonton, AB T5J 3N1",53.541667,-113.500726,334298.703964,5935438.0,559.016994,11
3,"10043 Jasper Ave, Edmonton, AB T5J 1S6",53.541825,-113.493188,334798.703964,5935438.0,559.016994,18
4,"9751 Jasper Ave, Edmonton, AB T5J 0C5",53.541982,-113.485649,335298.703964,5935438.0,901.387819,4
5,"107 Street & 103 Avenue, Edmonton, AB T5J 1K3",53.545477,-113.504724,334048.703964,5935871.0,504.467341,2
6,"10340 103 St NW, Edmonton, AB T5J 0Y9",53.545634,-113.497186,334548.703964,5935871.0,66.987298,5
7,"10248 99 St NW, Edmonton, AB T5J",53.545792,-113.489647,335048.703964,5935871.0,504.467341,2
8,"10568 109 St NW, Edmonton, AB T5H 3B2",53.549286,-113.508724,333798.703964,5936304.0,834.550535,2
9,"10572 105 St NW, Edmonton, AB T5H 2W7",53.549444,-113.501184,334298.703964,5936304.0,443.25455,1


In [0]:
distances_to_chinese_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in chinese_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_utm_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_chinese_restaurant.append(min_distance)

df_locations['Distance to Chinese restaurant'] = distances_to_chinese_restaurant
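
The nearest-restaurant search in the cell above starts from a sentinel of 10000, which would be reported as a real distance if the list of Chinese restaurants were empty. A compact alternative, sketched here with a hypothetical `nearest_distance` helper, uses `min` with an explicit default instead.

```python
import math

def nearest_distance(x, y, points):
    """Distance from (x, y) to the closest of `points`, or infinity if there are none."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

print(nearest_distance(0, 0, [(3, 4), (6, 8)]))  # closest point is (3, 4)
print(nearest_distance(0, 0, []))                # no points at all
```

With UTM coordinates as input, the result is in meters, matching `calc_utm_distance`.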

In [23]:
df_locations.head()

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from Downtown,Restaurants in area,Distance to Chinese restaurant
0,"9923 103 St NW, Edmonton, AB T5K 2J3",53.537857,-113.496728,334548.703964,5935005.0,933.012702,2,361.377601
1,"The Executive, 10105 109 St NW, Edmonton, AB T...",53.541509,-113.508264,333798.703964,5935438.0,901.387819,7,816.120081
2,"105 Street & Jasper Avenue, Edmonton, AB T5J 3N1",53.541667,-113.500726,334298.703964,5935438.0,559.016994,11,393.400264
3,"10043 Jasper Ave, Edmonton, AB T5J 1S6",53.541825,-113.493188,334798.703964,5935438.0,559.016994,18,22.924951
4,"9751 Jasper Ave, Edmonton, AB T5J 0C5",53.541982,-113.485649,335298.703964,5935438.0,901.387819,4,102.747693


In [24]:
print('Average distance to closest chinese restaurant from each area center:', df_locations['Distance to Chinese restaurant'].mean())

Average distance to closest chinese restaurant from each area center: 299.6669363220621


OK, so on average a Chinese restaurant can be found within ~300m of every candidate area center. That's fairly close, so we need to filter our areas carefully!

Let's create a map showing a heatmap (density) of restaurants and try to extract some meaningful information from it. Also, let's show a few circles indicating distances of 300m, 600m, and 1000m from Rogers Place.

In [0]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
chinese_latlons = [[res[2], res[3]] for res in chinese_restaurants.values()]

In [26]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_yeg) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
folium.Circle(dt_yeg, radius=300, fill=False, color='#5c5c5c').add_to(map_yeg)
folium.Circle(dt_yeg, radius=600, fill=False, color='#5c5c5c').add_to(map_yeg)
folium.Circle(dt_yeg, radius=1000, fill=False, color='#5c5c5c').add_to(map_yeg)
map_yeg

There are a few pockets of low restaurant density.

Create another map showing the heatmap/density of Chinese restaurants only.

In [27]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_yeg) #cartodbpositron cartodbdark_matter
HeatMap(chinese_latlons).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
folium.Circle(dt_yeg, radius=300, fill=False, color='#5c5c5c').add_to(map_yeg)
folium.Circle(dt_yeg, radius=600, fill=False, color='#5c5c5c').add_to(map_yeg)
folium.Circle(dt_yeg, radius=1000, fill=False, color='#5c5c5c').add_to(map_yeg)
map_yeg

We can see a high density of Chinese restaurants in the top-right corner of the map. That is Chinatown.

## **Locate Solutions**

In [28]:
roi_x_min = dt_yeg_x - 400
roi_y_max = dt_yeg_y + 200
roi_width = 1000
roi_height = 1000
roi_dt_x = roi_x_min + 500
roi_dt_y = roi_y_max - 500
roi_dt_lon, roi_dt_lat = utm_to_lonlat(roi_dt_x, roi_dt_y)
roi_dt = [roi_dt_lat, roi_dt_lon]

map_yeg = folium.Map(location=roi_dt, zoom_start=15)
HeatMap(restaurant_latlons).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
folium.Circle(roi_dt, radius=1000, color='white', fill=True, fill_opacity=0.4).add_to(map_yeg)
map_yeg

In [29]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_dt_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_utm_distance(roi_dt_x, roi_dt_y, x, y)
        if (d <= 501):
            lon, lat = utm_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

89 candidate neighborhood centers generated.


OK. Now let's calculate the two most important things for each location candidate: the number of restaurants in the vicinity (we'll use a radius of 250 meters) and the distance to the closest Chinese restaurant.

In [30]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_utm_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_utm_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_chinese_distances = []

print('Generating data on location candidates... ')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, chinese_restaurants)
    roi_chinese_distances.append(distance)
print('Done.')

Generating data on location candidates... 
Done.


In [0]:
# Store the data in a DataFrame
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Chinese restaurant':roi_chinese_distances})

In [32]:
df_roi_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Chinese restaurant
0,53.539709,-113.499101,334398.703964,5935217.0,8,307.991691
1,53.539741,-113.497593,334498.703964,5935217.0,7,220.400496
2,53.539772,-113.496086,334598.703964,5935217.0,9,149.311338
3,53.539804,-113.494578,334698.703964,5935217.0,12,126.536053
4,53.539835,-113.493071,334798.703964,5935217.0,12,172.420618
5,53.539867,-113.491563,334898.703964,5935217.0,11,223.945659
6,53.540471,-113.4999,334348.703964,5935304.0,11,333.648311
7,53.540503,-113.498393,334448.703964,5935304.0,9,234.603266
8,53.540534,-113.496885,334548.703964,5935304.0,11,136.953238
9,53.540566,-113.495378,334648.703964,5935304.0,14,49.736168


Let us now **filter** those locations: we're interested only in **locations with no more than three restaurants nearby** (within 250 meters) and **no Chinese restaurants within a radius of 300 meters**.


In [33]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=3))
print('Locations with no more than three restaurants nearby:', good_res_count.sum())

good_chn_distance = np.array(df_roi_locations['Distance to Chinese restaurant']>=300)
print('Locations with no Chinese restaurants within 300m:', good_chn_distance.sum())

good_locations = np.logical_and(good_res_count, good_chn_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than three restaurants nearby: 14
Locations with no Chinese restaurants within 300m: 47
Locations with both conditions met: 12


Now, plot the processed data on the map.

In [34]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_yeg = folium.Map(location=roi_dt, zoom_start=15)
folium.TileLayer('cartodbpositron').add_to(map_yeg)
HeatMap(restaurant_latlons).add_to(map_yeg)
folium.Circle(roi_dt, radius=1000, color='white', fill=True, fill_opacity=0.6).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_yeg) 
map_yeg

We now have a few locations, and we know that each of them has no more than three restaurants within a radius of 250m and no Chinese restaurant closer than 300m. Any of those locations is a potential candidate for a new Chinese restaurant, at least based on nearby competition.

Let's now show those good locations in the form of a heatmap:

In [35]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=15)
HeatMap(good_locations, radius=25).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_yeg)
map_yeg

What we have now is a clear indication of zones with a low number of restaurants in the vicinity and no Chinese restaurants within 300m.

Let's now cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [36]:
k = 2

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=k, random_state=0).fit(good_xys)

cluster_centers = [utm_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_yeg = folium.Map(location=roi_dt, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_yeg)
HeatMap(restaurant_latlons).add_to(map_yeg)
folium.Circle(roi_dt, radius=500, color='white', fill=True, fill_opacity=0.4).add_to(map_yeg)
folium.Marker(dt_yeg).add_to(map_yeg)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_yeg) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_yeg)
map_yeg

Those two clusters group most of the candidate locations, and the cluster centers are placed in the middle of the zones 'rich' in location candidates.
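
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the basic k-means iteration (assign each point to its nearest center, then move each center to the mean of its points). It is an illustration, not scikit-learn's implementation: `kmeans_2d` is a hypothetical helper, the farthest-point initialization is an assumption chosen to keep the example deterministic, and the sample points are made up.

```python
import math

def kmeans_2d(points, k, iterations=20):
    """Plain k-means on (x, y) tuples; returns a list of k center tuples."""
    # Assumed initialization: start from the first point, then repeatedly add the
    # point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(
            math.hypot(p[0] - c[0], p[1] - c[1]) for c in centers)))
    for _ in range(iterations):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.hypot(
                p[0] - centers[i][0], p[1] - centers[i][1]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers

# Made-up points forming two well-separated groups (units are arbitrary).
sample = [(0, 0), (0, 100), (100, 0), (100, 100),
          (1000, 1000), (1000, 1100), (1100, 1000), (1100, 1100)]
centers = kmeans_2d(sample, 2)
print(sorted(centers))
```

Running k-means on UTM X/Y values rather than on latitude/longitude, as the notebook does, means the distances it minimizes are in meters in both directions.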

The circles represent ranges that could be ideal for the restaurant relocation. Let's see the circles on a map.

In [37]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=15)
folium.Marker(dt_yeg).add_to(map_yeg)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_yeg)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_yeg)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_yeg) 
map_yeg

**Finally**, reverse geocode those candidate area centers to get addresses that can be presented to stakeholders.

In [38]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', Canada', '')
    index = addr.index(", AB")
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_utm(lon, lat)
    d = calc_utm_distance(x, y, dt_yeg_x, dt_yeg_y)
    print('{}{}=> {:0.2f}m from Rogers Place'.format(addr[:index], ' '*(35-len(addr[:index])), d))

Addresses of centers of areas recommended for further analysis

10220 104 Ave NW, Edmonton         => 58.84m from Rogers Place
99 Street & 103A Avenue, Edmonton  => 450.92m from Rogers Place


Plot these addresses on the map.

In [39]:
map_yeg = folium.Map(location=dt_yeg, zoom_start=16)
folium.Circle(dt_yeg, radius=10, color='#f01111', fill=True, fill_color='#ff6e6e', fill_opacity=0.8, popup='Rogers Place').add_to(map_yeg)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_yeg) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_yeg)
map_yeg


### This concludes our notebook. You may refer to the report or presentation slides for a detailed explanation. Thank you!