# Capstone Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)

<img src="https://www.telegraph.co.uk/content/dam/Travel/hotels/europe/france/paris/paris-cityscape-overview-guide.jpg?imwidth=450" height="200" width="300">

## Introduction: Business Problem <a name="introduction"></a>

In this project, I will try to find an optimal location for a restaurant in the capital city of France, **Paris**.

Paris already has a great many restaurants.
We are planning to open an **Italian** restaurant, so we will look for locations with few restaurants of any kind and, ideally, no Italian restaurants nearby. The location should also be popular with tourists, which means it should not be too far from the city center. We may not be able to satisfy every condition, so the most important ones are that there are not too many restaurants in general, preferably no Italian restaurants nearby, and that the location is not too far from the city center.

By applying the techniques covered in this course, I will try to identify the best locations according to these criteria.
The results should be useful to stakeholders who are planning to open an Italian restaurant in Paris.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Italian restaurants in the neighborhood, if any
* distance of neighborhood from city center

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using the **Foursquare API**
* the number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**
* coordinates of the Paris center point will be obtained using the **Nominatim geocoder** (via the `geopy` library)

### Slicing Neighbourhoods

In this section we will create a grid of cells covering the area we will explore: approximately 12x12 kilometers around the center of the city. As the center point, we will pick the Eiffel Tower.

First, we need the location of that center point. Using the *geolocator* we will find the latitude and longitude of the Eiffel Tower.

In [1]:
from geopy.geocoders import Nominatim

In [2]:
address = 'Eiffel Tower'
geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Eiffel Tower, Paris, France are {}, {}.'.format(latitude, longitude))
eiffel_tower = [latitude, longitude]

The geographical coordinates of Eiffel Tower, Paris, France are 48.8582602, 2.29449905431968.


Now that we have the location of the Eiffel Tower, we can create the grid described above. The cell centers will be equally spaced, centered on the Eiffel Tower, and within ~6km of it. Because distances have to be calculated in meters, we first project latitude and longitude to Cartesian (UTM) coordinates. To display the calculated locations on a map, we then convert the values back to latitude and longitude.

In [3]:
import pyproj
import math

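def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to UTM x/y in meters.
    # Note: Paris lies in UTM zone 31; zone=33 is what produced the outputs shown below.
    # pyproj.transform is deprecated in newer pyproj releases (pyproj.Transformer replaces it).
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: UTM x/y in meters back to WGS84 longitude/latitude.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]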

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Paris center longitude={}, latitude={}'.format(eiffel_tower[1], eiffel_tower[0]))
x, y = lonlat_to_xy(eiffel_tower[1], eiffel_tower[0])
print('Paris center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Paris center longitude={}, latitude={}'.format(lo, la))

Paris center longitude=2.29449905431968, latitude=48.8582602
Paris center UTM X=-430870.1660402089, Y=5490027.833357609
Paris center longitude=2.2944990543196813, latitude=48.858260200000004
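
As a dependency-free cross-check of the idea above, the sketch below converts longitude/latitude to meter offsets with a simple equirectangular approximation around a reference point. This is a swapped-in technique, not the UTM projection used in this notebook; the helper names and the mean Earth radius constant are assumptions of the sketch, and the approximation is only reasonable for areas a few kilometers across.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumption of this sketch)

def lonlat_to_local_xy(lon, lat, ref_lon, ref_lat):
    """Approximate (east, north) offsets in meters of a point from a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y

def local_xy_to_lonlat(x, y, ref_lon, ref_lat):
    """Inverse of lonlat_to_local_xy for the same reference point."""
    lon = ref_lon + math.degrees(x / (EARTH_RADIUS_M * math.cos(math.radians(ref_lat))))
    lat = ref_lat + math.degrees(y / EARTH_RADIUS_M)
    return lon, lat

# Reference point: the Eiffel Tower coordinates printed earlier in this notebook
ref_lat, ref_lon = 48.8582602, 2.29449905431968

# Hypothetical test point a few kilometers east of the reference
x, y = lonlat_to_local_xy(2.35, 48.86, ref_lon, ref_lat)
lon, lat = local_xy_to_lonlat(x, y, ref_lon, ref_lat)
print('offset east={:.0f} m, north={:.0f} m'.format(x, y))
print('round trip: lon={:.6f}, lat={:.6f}'.format(lon, lat))
```

The round trip should return the input coordinates, which is the same sanity check the cell above performs with the UTM helpers.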


Let's create a **hexagonal grid of cells**: we offset every other row, and adjust vertical row spacing so that every cell center is equally distant from all its neighbors.

In [5]:
eiffel_tower_x, eiffel_tower_y = lonlat_to_xy(eiffel_tower[1], eiffel_tower[0]) # City center in Cartesian coordinates

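k = math.sqrt(3) / 2 # Ratio of row spacing to column spacing in a hexagonal grid
x_min = eiffel_tower_x - 6000
x_step = 600
# Shift the first row by half the difference between the total row height and 12 km to center the grid vertically
y_min = eiffel_tower_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 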

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(eiffel_tower_x, eiffel_tower_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
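
To see why the row spacing factor of sqrt(3)/2 makes every cell center equidistant from its neighbors, here is a minimal sketch on a small 5x5 grid. The function name `hex_grid` and the grid size are illustrative assumptions; only the offset and spacing rules are taken from the cell above.

```python
import math

def hex_grid(cols, rows, step):
    """Return (x, y) centers of a hexagonal grid with `step` spacing between neighbors."""
    k = math.sqrt(3) / 2  # row spacing factor (height of an equilateral triangle with side 1)
    points = []
    for i in range(rows):
        x_offset = step / 2 if i % 2 == 0 else 0  # shift every other row by half a step
        for j in range(cols):
            points.append((j * step + x_offset, i * step * k))
    return points

points = hex_grid(cols=5, rows=5, step=600)
center = points[12]  # middle cell: row 2, column 2

# Distances from the middle cell to its six closest neighbors
neighbor_distances = sorted(math.dist(center, p) for p in points if p != center)[:6]
print([round(d, 3) for d in neighbor_distances])
```

All six distances come out equal to the step, which is the property the grid construction relies on.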


Now let's visualize the data we have so far: the center point (the Eiffel Tower) and the neighborhood centers we have calculated.

In [6]:
import folium
import requests

In [7]:
map_paris = folium.Map(location=eiffel_tower, zoom_start=12)
folium.Marker(eiffel_tower, popup='Eiffel Tower').add_to(map_paris)
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='red', fill=True, fill_color='white', fill_opacity=1).add_to(map_paris) 
    folium.Circle([lat, lon], radius=300, color='red', fill=False).add_to(map_paris)
    #folium.Marker([lat, lon]).add_to(map_paris)
map_paris

On this map, we can see that we have equally divided our area into "neighborhoods" around the Eiffel Tower.

It would be useful to have an approximate address for each cell center, so in the next cells we will call the Foursquare API to obtain them.

*The next cell is hidden because it contains authorization values for the Foursquare API.*

In [174]:
client_id = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
version = '20191010'

In [13]:
addresses = []
for lat, lon in zip(latitudes, longitudes):
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lon)
    results = requests.get(url).json()
    # Use the first address line of the first returned venue; guard against an empty or missing venue list
    venues = results.get('response', {}).get('venues', [])
    if venues and venues[0]['location'].get('formattedAddress'):
        address = venues[0]['location']['formattedAddress'][0]
    else:
        address = 'NO ADDRESS'
    addresses.append(address)

print('Done')

Done


In [20]:
import pandas as pd
import pickle

Now that we have the addresses, we will place them in a data frame and save it, so we don't have to call the API again every time we need them.

In [15]:
df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | France | 48.805421 | 2.283475 | -432670.16604 | 5.484312e+06 | 5992.495307 |
| 1 | Avenue De Paris | 48.806315 | 2.291446 | -432070.16604 | 5.484312e+06 | 5840.376700 |
| 2 | France | 48.807208 | 2.299417 | -431470.16604 | 5.484312e+06 | 5747.173218 |
| 3 | 45 avenue Marx Dormoy | 48.808101 | 2.307389 | -430870.16604 | 5.484312e+06 | 5715.767665 |
| 4 | 35 Rue Molière | 48.808993 | 2.315361 | -430270.16604 | 5.484312e+06 | 5747.173218 |
| ... | ... | ... | ... | ... | ... | ... |
| 359 | 30 Rue Chanzy | 48.907520 | 2.273587 | -431470.16604 | 5.495744e+06 | 5747.173218 |
| 360 | Rue de la Station | 48.908416 | 2.281574 | -430870.16604 | 5.495744e+06 | 5715.767665 |
| 361 | France | 48.909312 | 2.289561 | -430270.16604 | 5.495744e+06 | 5747.173218 |
| 362 | Place Marguerite Durand | 48.910206 | 2.297549 | -429670.16604 | 5.495744e+06 | 5840.376700 |


In [17]:
df_locations.to_pickle('./locations.pkl')
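
The save-then-reuse idea can be sketched with a small helper that uses only the standard library. The name `load_or_compute`, the fake API function and the temporary cache path are all hypothetical; the notebook itself saves the data frame with `to_pickle`.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object pickled at `path`; if it is absent, call `compute()` and cache the result."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

calls = []

def fake_api_call():
    # Stand-in for the slow API loop; records how many times it actually runs
    calls.append(1)
    return ['45 avenue Marx Dormoy', '35 Rue Molière']

cache_path = os.path.join(tempfile.mkdtemp(), 'addresses.pkl')
first = load_or_compute(cache_path, fake_api_call)   # runs the "API" and writes the cache
second = load_or_compute(cache_path, fake_api_call)  # served from the cache file
print(first == second, len(calls))
```

Only the first call reaches the stand-in API; the second is read back from disk.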

### Foursquare

In this section we will search for information about the restaurants in our neighborhoods. We're interested in venues in the 'food' category, but only those that are proper restaurants: coffee shops, pizza places, bakeries etc. are not direct competitors, so we don't care about those. We will therefore include in our list only venues that have 'restaurant' in the category name, and we'll make sure to detect and include all the subcategories of the specific 'Italian restaurant' category, as we need info on Italian restaurants in the neighborhood.

In [18]:
food_category = '4d4b7105d754a06374d81259'

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   item['venue']['location']['formattedAddress'],
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # request failed or the response had an unexpected shape
        venues = []
    return venues
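
To make the filtering rule concrete, here is a simplified restatement of the name-based check applied to a few made-up category lists. The function name `looks_like_restaurant` and the `hypothetical-id-*` values are assumptions for illustration; the Italian ID is the first entry of the list above.

```python
RESTAURANT_WORDS = ('restaurant', 'diner', 'taverna', 'steakhouse')

def looks_like_restaurant(categories, specific_ids=()):
    """Return (is_restaurant, is_specific) for a list of (name, id) category tuples."""
    restaurant = False
    specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if any(word in lowered for word in RESTAURANT_WORDS):
            restaurant = True
        if 'fast food' in lowered:
            restaurant = False  # fast food is not treated as a direct competitor
        if category_id in specific_ids:
            specific = True
            restaurant = True
    return restaurant, specific

italian_ids = {'4bf58dd8d48988d110941735'}  # first ID from the list above
samples = {
    'bakery': [('Bakery', 'hypothetical-id-1')],
    'french': [('French Restaurant', 'hypothetical-id-2')],
    'fast_food': [('Fast Food Restaurant', 'hypothetical-id-3')],
    'italian': [('Italian Restaurant', '4bf58dd8d48988d110941735')],
}
results = {key: looks_like_restaurant(cats, italian_ids) for key, cats in samples.items()}
for key, value in results.items():
    print(key, value)
```

Bakeries and fast food places are rejected, ordinary restaurants are kept, and only venues whose category ID is in the Italian list are flagged as Italian.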

In [21]:
def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:
    # Cache files missing or unreadable: fall through and query the API below
    pass

if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [22]:
import numpy as np

In [23]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 2666
Total number of Italian restaurants: 324
Percentage of Italian restaurants: 12.15%
Average number of restaurants in neighborhood: 6.403846153846154


In [24]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4f817c1fe4b0b2237d17f834', 'La marine', 48.80482, 2.28136, ['Trouville', 'France'], 168, False, -432836.2056250459, 5484271.730835456)
('4f6db5e6e4b0463c93ed61b5', 'o minhoto', 48.80843846873504, 2.283992963370548, ['France'], 278, False, -432575.49887857225, 5484639.817899577)
('4e0621c5ae60a90eabc0c12d', 'Iki Sushi', 48.80423190270499, 2.2883880732021566, ['10 boulevard de Vanves', '92320 Châtillon', 'France'], 322, False, -432333.0416026593, 5484119.276652427)
('5143928fe4b0d65fe0755118', 'Asie Royale', 48.80583888001345, 2.2912824153900146, ['39 avenue de Paris', '92320 Châtillon', 'France'], 54, False, -432091.0607111484, 5484261.332718332)
('4b7bdad4f964a520c8702fe3', 'Piccolo Dino', 48.808678441390995, 2.2965718451646255, ['111 avenue de Paris', '92320 Châtillon', 'France'], 339, True, -431650.69916849525, 5484510.197383144)
('4ba37893f964a520d53f38e3', 'Aoyama', 48.8084689013508, 2.296408659828958, ['134 avenue de Paris', '92320

In [25]:
print('List of Italian restaurants')
print('---------------------------')
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(italian_restaurants))

List of Italian restaurants
---------------------------
('4b7bdad4f964a520c8702fe3', 'Piccolo Dino', 48.808678441390995, 2.2965718451646255, ['111 avenue de Paris', '92320 Châtillon', 'France'], 339, True, -431650.69916849525, 5484510.197383144)
('54b4fef1498ef216ab2b0e4e', 'Milano', 48.80965268865753, 2.327658534049988, ['79 avenue Laplace', '94110 Arcueil', 'France'], 295, True, -429358.068721978, 5484232.767353718)
('4bf122b13a15d13a02323f9f', 'Mezzo di Pasta', 48.81079513822355, 2.3291125815396754, ['C.C La Vache Noire', '94110 Arcueil', 'France'], 160, True, -429230.27676722466, 5484341.314740278)
('4c2334aa9085d13a8d7687cc', 'La Spaghetteria', 48.814015, 2.302883, ['183 avenue Pierre Brossolette', '92120 Montrouge', 'France'], 205, True, -431088.7331353909, 5485023.001007017)
('4cd2952083e0721ebcb45797', 'Santa Rita', 48.8170904590696, 2.328253984451294, ['24 rue Barbès', '92120 Montrouge', 'France'], 316, True, -429175.08993562264, 5485049.266314293)
('4bdac2e42a3a0f4715e2abb6',

Now that we have collected information about all restaurants and about Italian restaurants in particular, let's display them on a map to see where they are located. Italian restaurants are shown in red and restaurants serving other cuisines in blue.

With this data in hand, we can begin to uncover which neighborhood will be best for our restaurant.

In [28]:
map_paris = folium.Map(location=eiffel_tower, zoom_start=13)
folium.Marker(eiffel_tower, popup='Eiffel Tower').add_to(map_paris)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_paris)
map_paris

## Methodology <a name="methodology"></a>

In this project we will direct our efforts at detecting areas of Paris that have low restaurant density, particularly those with a low number of Italian restaurants. We will limit our analysis to the area within ~6km of the city center.

In the first step we chose the **Eiffel Tower, Paris, France** as the center point,
divided the area within **6km** of it into equally spaced candidate neighborhoods, and used Foursquare to collect the location and type of every restaurant in them, including **Italian restaurants**.

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Paris. 
We will use **heatmaps** to identify a few promising areas close to the center with a low number of restaurants in general (and hopefully no Italian restaurants nearby) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than four restaurants in a radius of 250 meters**, and we want locations **without Italian restaurants in a radius of 200 meters**. We will present a map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>

As a first step, let's count the **number of restaurants in every area candidate**:

In [29]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 6.403846153846154


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | France | 48.805421 | 2.283475 | -432670.16604 | 5484312.0 | 5992.495307 | 1 |
| 1 | Avenue De Paris | 48.806315 | 2.291446 | -432070.16604 | 5484312.0 | 5840.3767 | 1 |
| 2 | France | 48.807208 | 2.299417 | -431470.16604 | 5484312.0 | 5747.173218 | 3 |
| 3 | 45 avenue Marx Dormoy | 48.808101 | 2.307389 | -430870.16604 | 5484312.0 | 5715.767665 | 0 |
| 4 | 35 Rue Molière | 48.808993 | 2.315361 | -430270.16604 | 5484312.0 | 5747.173218 | 3 |
| 5 | 60 avenue Marx Dormoy | 48.809885 | 2.323333 | -429670.16604 | 5484312.0 | 5840.3767 | 1 |
| 6 | C.C La Vache Noire (Place de la Vache Noire) | 48.810776 | 2.331306 | -429070.16604 | 5484312.0 | 5992.495307 | 9 |
| 7 | 92140 Clamart | 48.808639 | 2.270346 | -433570.16604 | 5484832.0 | 5855.766389 | 2 |
| 8 | France | 48.809534 | 2.278317 | -432970.16604 | 5484832.0 | 5604.462508 | 1 |
| 9 | France | 48.810428 | 2.286288 | -432370.16604 | 5484832.0 | 5408.326913 | 1 |


OK, now let's calculate the **distance to nearest Italian restaurant from every area candidate center** (not only those within 300m - we want distance to closest one, regardless of how distant it is).

In [30]:
distances_to_italian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

df_locations['Distance to Italian restaurant'] = distances_to_italian_restaurant

In [31]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Italian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | France | 48.805421 | 2.283475 | -432670.16604 | 5484312.0 | 5992.495307 | 1 | 1038.541703 |
| 1 | Avenue De Paris | 48.806315 | 2.291446 | -432070.16604 | 5484312.0 | 5840.3767 | 1 | 463.905834 |
| 2 | France | 48.807208 | 2.299417 | -431470.16604 | 5484312.0 | 5747.173218 | 3 | 268.045476 |
| 3 | 45 avenue Marx Dormoy | 48.808101 | 2.307389 | -430870.16604 | 5484312.0 | 5715.767665 | 0 | 743.77456 |
| 4 | 35 Rue Molière | 48.808993 | 2.315361 | -430270.16604 | 5484312.0 | 5747.173218 | 3 | 915.537954 |
| 5 | 60 avenue Marx Dormoy | 48.809885 | 2.323333 | -429670.16604 | 5484312.0 | 5840.3767 | 1 | 322.013917 |
| 6 | C.C La Vache Noire (Place de la Vache Noire) | 48.810776 | 2.331306 | -429070.16604 | 5484312.0 | 5992.495307 | 9 | 162.760412 |
| 7 | 92140 Clamart | 48.808639 | 2.270346 | -433570.16604 | 5484832.0 | 5855.766389 | 2 | 489.669726 |
| 8 | France | 48.809534 | 2.278317 | -432970.16604 | 5484832.0 | 5604.462508 | 1 | 539.317569 |
| 9 | France | 48.810428 | 2.286288 | -432370.16604 | 5484832.0 | 5408.326913 | 1 | 788.025541 |


In [32]:
print('Average distance to closest Italian restaurant from each area center:', df_locations['Distance to Italian restaurant'].mean())

Average distance to closest Italian restaurant from each area center: 1063.4328630439534


OK, so **on average an Italian restaurant can be found within ~1km** of every area center candidate. That's fairly close, so we need to filter our areas carefully!

Let's create a map showing a **heatmap / density of restaurants** and try to extract some meaningful info from it. Also, let's show the **borders of Paris boroughs** on our map and a few circles indicating distances of 1km, 2km and 3km from the Eiffel Tower.

In [34]:
from folium import plugins
from folium.plugins import HeatMap

In [41]:
paris_boroughs_url = 'https://france-geojson.gregoiredavid.fr/repo/departements/75-paris/communes-75-paris.geojson'
paris_boroughs = requests.get(paris_boroughs_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [36]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

In [42]:
map_paris_heat = folium.Map(location=eiffel_tower, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_paris_heat) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_paris_heat)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=1000, fill=False, color='white').add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=2000, fill=False, color='white').add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=3000, fill=False, color='white').add_to(map_paris_heat)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat

Looks like a few pockets of low restaurant density closest to city center can be found **north, west and south-east from Eiffel Tower**. 

Let's create another map showing the **heatmap/density of Italian restaurants** only.

In [43]:
map_paris_heat = folium.Map(location=eiffel_tower, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_paris_heat) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_paris_heat)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=1000, fill=False, color='white').add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=2000, fill=False, color='white').add_to(map_paris_heat)
folium.Circle(eiffel_tower, radius=3000, fill=False, color='white').add_to(map_paris_heat)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat

This map is not so 'hot' (Italian restaurants represent a subset of ~12% of all restaurants in Paris) but it also indicates higher density of existing Italian restaurants directly east and north from Eiffel Tower, with closest pockets of **low Italian restaurant density positioned west, south-west and south from city center**.

Based on this we will now focus our analysis on areas *south-west and south of the Paris center*: we will move the center of our area of interest and reduce its size to a radius of **2.5km**. This places our location candidates mostly in the boroughs **Paris 15e Arrondissement and Paris 16e Arrondissement**. The Paris 15e Arrondissement is not very interesting to our stakeholders since it's mostly a residential zone. The Paris 16e Arrondissement, on the other hand, is a much better choice since it's nearer to the Seine and the Eiffel Tower. Another interesting area is **Paris 7e Arrondissement**, but it's a little bit crowded. So the best choice is **Paris 16e Arrondissement**, and we will proceed to analyze that borough in the following sections.

## Paris 16e Arrondissement

**Paris 16e Arrondissement** is a large district that occupies most of the west of Paris, extending east-west between the bends of the Seine from the Jardins du Trocadéro immediately facing the Eiffel Tower to the expansive Bois de Boulogne (which occupies a large part of the 16th's territory), and north-south from the Étoile to the southern border of Paris. It is known as the residence of choice for affluent Parisians and for hosting internationally famous events such as the Roland Garros French Open tennis tournament; it is also home to the stadium of the Paris Saint-Germain football club.

Some of the reviews from TripAdvisor:

*It is one of the richest arrondissements of Paris and this is easy to guess from the many beautiful residential buildings, nice restaurants and shops. Except some very touristic areas (like Trocadero) it is less crowded by tourists.
Also, the Bois de Boulogne and many parks, lakes, sports fields are in the same district.*

*The area we visited in the 16th Arrondissement had the best views of the Eiffel Tower, that is not situated in the Seine. As we ventured further along though, there wasn’t much to do and the people there weren’t too friendly. We don’t have time for stuck up people so we left.*

*One of Paris' poshest areas is full of massive apartment buildings the look like fortresses. But it also has some cute village-like areas, some good views of the River. High-end restaurants and tea salons. Pretty parks.*

Now, let's define a new, narrower region of interest, which will include the low-restaurant-count parts of Paris 16e Arrondissement closest to the Eiffel Tower.

In [152]:
roi_x_min = eiffel_tower_x - 3000
roi_y_max = eiffel_tower_y + 3000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_paris_heat = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_paris_heat)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_paris_heat)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat

Not bad - this nicely covers all the pockets of low restaurant density in Paris 16e Arrondissement closest to Eiffel Tower.

Let's also create a new, denser grid of location candidates restricted to our new region of interest (let's make our location candidates 100m apart).

In [153]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 1000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

1720 candidate neighborhood centers generated.


OK. Now let's calculate the two most important things for each location candidate: the **number of restaurants in the vicinity** (we'll use a radius of **250 meters**) and the **distance to the closest Italian restaurant**.

In [154]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')

Generating data on location candidates... done.
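
As a small worked example of the two measures, the sketch below applies the same distance logic to a handful of made-up points in meters. The function names and coordinates are hypothetical, and it takes plain `(x, y)` tuples instead of the restaurant tuples used above; it also returns infinity when there are no points, where the notebook's helper uses a large sentinel value.

```python
import math

def count_within(x, y, points, radius):
    """Number of points within `radius` meters of (x, y)."""
    return sum(1 for px, py in points if math.hypot(px - x, py - y) <= radius)

def nearest_distance(x, y, points):
    """Distance in meters from (x, y) to the closest point, or infinity if there are none."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

# Made-up restaurant positions relative to a candidate location at the origin
all_restaurants = [(0, 100), (300, 0), (0, -250), (300, 400)]
italian_only = [(300, 400)]

nearby = count_within(0, 0, all_restaurants, radius=250)
italian_distance = nearest_distance(0, 0, italian_only)
print(nearby, italian_distance)
```

Two of the four restaurants fall inside the 250 m radius (the one at exactly 250 m is included), and the only Italian restaurant is 500 m away.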


In [155]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Italian restaurant |
|---|---|---|---|---|---|---|
| 0 | 48.84977 | 2.259063 | -433620.16604 | 5489528.0 | 8 | 366.525137 |
| 1 | 48.849919 | 2.260392 | -433520.16604 | 5489528.0 | 7 | 288.198547 |
| 2 | 48.850069 | 2.261722 | -433420.16604 | 5489528.0 | 5 | 227.543686 |
| 3 | 48.850218 | 2.263051 | -433320.16604 | 5489528.0 | 5 | 201.230852 |
| 4 | 48.850367 | 2.264381 | -433220.16604 | 5489528.0 | 4 | 221.836838 |
| 5 | 48.850517 | 2.26571 | -433120.16604 | 5489528.0 | 5 | 223.436549 |
| 6 | 48.850666 | 2.26704 | -433020.16604 | 5489528.0 | 5 | 153.381124 |
| 7 | 48.850815 | 2.26837 | -432920.16604 | 5489528.0 | 4 | 130.872636 |
| 8 | 48.850965 | 2.269699 | -432820.16604 | 5489528.0 | 3 | 175.298387 |
| 9 | 48.851114 | 2.271029 | -432720.16604 | 5489528.0 | 1 | 253.636358 |


Let us now **filter** those locations: we're interested only in **locations with no more than four restaurants within a radius of 250 meters** and **no Italian restaurants within a radius of 200 meters**.

In [156]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=4))
print('Locations with no more than four restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=200)
print('Locations with no Italian restaurants within 200m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than four restaurants nearby: 1085
Locations with no Italian restaurants within 200m: 1287
Locations with both conditions met: 1063
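
The same two-condition filter can also be written directly with pandas boolean masks combined with `&`. The sketch below uses a few made-up rows with the same column names, purely for illustration.

```python
import pandas as pd

# Made-up rows, for illustration only
df = pd.DataFrame({
    'Restaurants nearby': [8, 4, 3, 1],
    'Distance to Italian restaurant': [366.5, 221.8, 175.3, 253.6],
})

few_restaurants = df['Restaurants nearby'] <= 4
no_italian_close = df['Distance to Italian restaurant'] >= 200

# A row is kept only when both masks are True
df_good = df[few_restaurants & no_italian_close]
print(len(df_good), 'rows pass both filters')
```

Here the first row fails the restaurant-count condition and the third row fails the distance condition, so only the second and fourth rows remain.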


Let's visualize what we have gathered and see how it looks on a map.

In [157]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_paris_heat = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_paris_heat)
HeatMap(restaurant_latlons).add_to(map_paris_heat)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_paris_heat)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris_heat) 
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat

Looking good. We now have a clear indication of zones with a low number of restaurants in the vicinity and *no* Italian restaurants nearby.

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

In [158]:
from sklearn.cluster import KMeans

In [178]:
number_of_clusters = 20

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_paris_heat = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_paris_heat)
HeatMap(restaurant_latlons).add_to(map_paris_heat)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_paris_heat)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='grey', fill=True, fill_opacity=0.25).add_to(map_paris_heat) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='white', fill=True, fill_color='white', fill_opacity=1).add_to(map_paris_heat)
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat
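
The clustering step groups nearby good locations and uses each cluster's centroid as the zone center. A minimal sketch on synthetic points follows; the data, the choice of two clusters, and the explicit `n_init=10` are assumptions made for this illustration only. It also measures how far the points of a cluster lie from their center, which is one way to judge whether a fixed-radius circle drawn around a center is a fair picture of the zone.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two made-up groups of points, coordinates in meters
group_a = rng.normal(loc=(0.0, 0.0), scale=50.0, size=(30, 2))
group_b = rng.normal(loc=(2000.0, 0.0), scale=50.0, size=(30, 2))
points = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
centers = kmeans.cluster_centers_
sizes = np.bincount(kmeans.labels_)

# Largest distance from any point to the center of its own cluster
spread = np.linalg.norm(points - centers[kmeans.labels_], axis=1).max()
print(centers.round(0), sizes, round(spread))
```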

Let's see those zones on a city map without heatmap, using shaded areas to indicate our clusters:

In [168]:
map_paris_heat = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(eiffel_tower).add_to(map_paris_heat)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#CA84B7', fill_opacity=0.07).add_to(map_paris_heat)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='white', fill=True, fill_color='white', fill_opacity=1).add_to(map_paris_heat)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='grey', fill=False).add_to(map_paris_heat) 
folium.GeoJson(paris_boroughs, style_function=boroughs_style, name='geojson').add_to(map_paris_heat)
map_paris_heat

In [175]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lon)
    results = requests.get(url).json()
    # Use the first nearby venue's address as the address of the zone center
    venues = results.get('response', {}).get('venues', [])
    if venues:
        address = venues[0]['location']['formattedAddress'][0]
    else:
        # No venue returned (missing key or empty list)
        address = 'NO ADDRESS'
    candidate_area_addresses.append(address)

    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, eiffel_tower_x, eiffel_tower_y)
    print('{:<50} => {:.1f}km from Eiffel Tower'.format(address, d/1000))

Addresses of centers of areas recommended for further analysis

57 boulevard de Montmorency (Bois de Boulogne)     => 2.6km from Eiffel Tower
France                                             => 1.9km from Eiffel Tower
8 avenue Dutuit                                    => 1.8km from Eiffel Tower
Chemin de ceinture du Lac Inférieur                => 2.9km from Eiffel Tower
58 avenue de Wagram                                => 2.4km from Eiffel Tower
5 avenue Anatole France (Tour Eiffel, 1er étage)   => 0.1km from Eiffel Tower
14 rue de Monceau                                  => 2.2km from Eiffel Tower
14 Boulevard de la Tour, Maubourg                  => 1.2km from Eiffel Tower
21 rue de Sablonville (Rue Montrosier)             => 2.7km from Eiffel Tower
1 boulevard Lannes                                 => 2.0km from Eiffel Tower
22 rue Cortambert                                  => 1.1km from Eiffel Tower
Porte de la Muette                                 => 2.0km from Eiffel Tower
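
Indexing into a nested API response fails if any level is missing or empty. The sketch below shows one defensive way to pull out an address line, assuming the `response` / `venues` / `location` / `formattedAddress` shape used in the lookup above. `first_address` is a hypothetical helper and the payloads are made up for illustration.

```python
def first_address(payload, default='NO ADDRESS'):
    """Return the first address line of the first venue that has one, else `default`."""
    venues = payload.get('response', {}).get('venues') or []
    for venue in venues:
        lines = venue.get('location', {}).get('formattedAddress') or []
        if lines:
            return lines[0]
    return default

# Made-up payloads shaped like the response used above
ok = {'response': {'venues': [{'location': {'formattedAddress': ['8 avenue Dutuit', 'France']}}]}}
empty = {'response': {'venues': []}}

print(first_address(ok))
print(first_address(empty))
```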


This concludes our analysis. We have created 20 addresses representing centers of zones containing locations with a low number of restaurants and no Italian restaurants nearby, all zones being fairly close to the city center (all less than 4 km from the Eiffel Tower, and about half of those less than 2 km from it). Although zones are shown on the map with a radius of ~500 meters (grey circles), their shape is actually very irregular, and their centers/addresses should be considered only as a starting point for exploring the surrounding neighborhoods in search of potential restaurant locations. Most of the zones are located in the **Paris 16e Arrondissement**, as we have mostly focused on that area, which we identified as interesting because it is popular with tourists, fairly close to the city center and well connected by public transport.

On the map below, the red dot marks the Eiffel Tower, and the markers with address popups mark the centers of zones that are good candidates for opening an Italian restaurant.

In [176]:
map_paris_heat = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(eiffel_tower, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_paris_heat)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_paris_heat)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_paris_heat)
map_paris_heat