### Capstone Project - The Battle of Neighborhoods (Week 1)

## Table of contents

[Introduction](#intro_cell)  
[Data section](#data_cell)  
[Methodology section](#methodology_cell)  
[Analysis section](#analysis_cell)  
[Results and Discussion section](#results_cell)  
[Conclusion section](#conclusion_cell)  
[END](#end_cell)  

### Part 1 : Introduction and Data Sections

## 1. Introduction<a id='intro_cell'></a> 

My brother is a restorers and has a japanese restaurant in the north of Paris (France). He wants to expand his business and create a chain of japanese restaurants in France at first and then in Europe . For his second restaurant, he stays in Paris and he would like to detect the optimal location for this second restaurant. He would like to be in a neighborhood similar to the first restaurant area with some constraints that are:

* low japanese restaurants in vicinity (< 5)
* neighborhood with two middle school at least
* neighborhood with one bike station at least
* neighborhood close to city center as possible

__Business Problem and Interested Audience:__

The challenge is to find the optimal location for a japanese restaurant in a city like Paris or New York. I believe this is a relevant challenge with valid questions for anyone want to have a chain of restaurants in  to other large city in US, EU or Asia. The same methodology can be applied with other type of restaurant.

## 2. Data section<a id='data_cell'></a> 

The following data is required to answer the issues of the problem:

* List of Boroughs and neighborhoods of Paris with their geodata (latitud and longitud)
* List of middle schools Paris with their geodata (latitud and longitud)
* List of bike station in Paris with their address location
* List of existing japanese restaurants in the neighborhood
* distance of neighborhood from city center

Following data sources will be needed to extract/generate the required information:

* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using Foursquare
List of japanese restaurants and their type and location in every neighborhood will be obtained using Foursquare API

* coordinate of Paris center will be obtained using Foursquare

* List of Boroughs and neighborhoods of Paris with their geodata (latitud and longitud) are available for free at the Paris City Hall website (https://opendata.paris.fr/explore/dataset/quartier_paris/table/)

* List of middle schools Paris with their geodata (latitud and longitud) are available for free at the Paris City Hall website (https://opendata.paris.fr/explore/dataset/secteurs-scolaires/table/?disjunctive.id_projet&disjunctive.zone_commune&disjunctive.annee_scol)

* List of bike station in Paris with their address location are available for free at the Paris City Hall website (https://opendata.paris.fr/explore/dataset/velib-emplacement-des-stations/table/)

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# Neighborhood Candidates

 Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods  
 and get data about school and bike station in Paris

In [2]:
paris_neighborhoods = pd.read_csv("https://opendata.paris.fr/explore/dataset/quartier_paris/download/?format=csv&timezone=Europe/Berlin&use_labels_for_header=true", sep=';', error_bad_lines=False, index_col=0)
paris_school = pd.read_csv("https://opendata.paris.fr/explore/dataset/secteurs-scolaires/download/?format=csv&timezone=Europe/Berlin&use_labels_for_header=true", sep=';', error_bad_lines=False, index_col=0)
paris_bike_station = pd.read_csv("https://opendata.paris.fr/explore/dataset/velib-emplacement-des-stations/download/?format=csv&timezone=Europe/Berlin&use_labels_for_header=true", sep=';', error_bad_lines=False, index_col=0)

In [3]:
paris_neighborhoods.head()

Unnamed: 0_level_0,C_QU,C_QUINSEE,L_QU,C_AR,N_SQ_AR,PERIMETRE,SURFACE,Geometry X Y,Geometry
N_SQ_QU,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
750000014,14,7510402,Saint-Gervais,4,750000004,2678.340923,422028.2,"48.8557186509,2.35816233385","{""type"": ""Polygon"", ""coordinates"": [[[2.363764..."
750000034,34,7510902,Chaussée-d'Antin,9,750000009,3133.580092,543441.2,"48.8735473802,2.33226886887","{""type"": ""Polygon"", ""coordinates"": [[[2.335450..."
750000042,42,7511102,Saint-Ambroise,11,750000011,4052.567737,837992.9,"48.8623450235,2.37611805592","{""type"": ""Polygon"", ""coordinates"": [[[2.370939..."
750000058,58,7511502,Necker,15,750000015,5979.711469,1578484.0,"48.8427112503,2.31077745364","{""type"": ""Polygon"", ""coordinates"": [[[2.306149..."
750000012,12,7510304,Sainte-Avoie,3,750000003,1861.804114,213316.4,"48.862557245,2.35485151825","{""type"": ""Polygon"", ""coordinates"": [[[2.358217..."


In [4]:
paris_school.head()

Unnamed: 0_level_0,Libellé,Zone commune,Etiquette,Libellé établissement 1,Libellé établissement 2,Libellé établissement 3,Libellé établissement 4,Adresse 1,Adresse 2,Adresse 3,Adresse 4,Annee Scolaire,geo_shape,geo_point_2d
Type d'établissement,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
MATERNELLES (année scolaire 2017/2018),BERCY (167) MAT,0,,BERCY (167) MAT,,,,167 RUE DE BERCY,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8430854612,2.37480814291"
ELEMENTAIRES (année scolaire 2017/2018),VICTOR COUSIN (14) ELEM,0,,VICTOR COUSIN (14) ELEM,,,,14 rue VICTOR COUSIN,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8468696747,2.34420482452"
ELEMENTAIRES (année scolaire 2018/2019),MOTTE PICQUET (10) ELEM,0,,MOTTE PICQUET (10) ELEM,,,,10 AVENUE DE LA MOTTE-PICQUET,,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8575282256,2.3078884073"
COLLEGES (année scolaire 2017/2018),STEPHANE MALLARME,0,Secteur du collège STEPHANE MALLARME,STEPHANE MALLARME,,,,29 RUE DE LA JONQUIERE,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8908115125,2.32336574239"
MATERNELLES (année scolaire 2018/2019),ZM CARNOT(16)/MARSOULAN(16),1,,CARNOT (16) MAT,MARSOULAN (16) MAT,,,8 avenue LAMORICIERE,16 rue MARSOULAN,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8454557926,2.41183853455"


In [5]:
paris_bike_station.head()

Unnamed: 0_level_0,Nom de la station,Capacité de la station,Lattitude,Longitude,Coordonnées géographiques
Identifiant de la station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
54000559,Jouffroy d'Abbans - Wagram,40,48.881973,2.301132,"48.8819732984,2.30113215744"
210561800,Saint-Romain - Cherche-Midi,17,48.847082,2.321375,"48.8470815908,2.32137478888"
911767210,Alexandre Parodi - Quai de Valmy,24,48.879359,2.366596,"48.8793594194,2.36659616232"
82567101,Château - République,26,48.862924,2.415504,"48.8629238006,2.4155035615"
50191171,Thionville - Ourcq,24,48.889176,2.383365,"48.8891761374,2.38336533308"


### Data Cleaning

In [6]:
paris_neighborhoods = paris_neighborhoods.reset_index()
paris_school = paris_school.reset_index()
paris_bike_station = paris_bike_station.reset_index()

In [7]:
#paris_school = paris_school.drop(["Etiquette", "Libellé établissement 2", "Libellé établissement 3", "Libellé établissement 4", "Adresse 2", "Adresse 3", "Adresse 4"], 1)

In [8]:
paris_school.head()

Unnamed: 0,Type d'établissement,Libellé,Zone commune,Etiquette,Libellé établissement 1,Libellé établissement 2,Libellé établissement 3,Libellé établissement 4,Adresse 1,Adresse 2,Adresse 3,Adresse 4,Annee Scolaire,geo_shape,geo_point_2d
0,MATERNELLES (année scolaire 2017/2018),BERCY (167) MAT,0,,BERCY (167) MAT,,,,167 RUE DE BERCY,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8430854612,2.37480814291"
1,ELEMENTAIRES (année scolaire 2017/2018),VICTOR COUSIN (14) ELEM,0,,VICTOR COUSIN (14) ELEM,,,,14 rue VICTOR COUSIN,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8468696747,2.34420482452"
2,ELEMENTAIRES (année scolaire 2018/2019),MOTTE PICQUET (10) ELEM,0,,MOTTE PICQUET (10) ELEM,,,,10 AVENUE DE LA MOTTE-PICQUET,,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8575282256,2.3078884073"
3,COLLEGES (année scolaire 2017/2018),STEPHANE MALLARME,0,Secteur du collège STEPHANE MALLARME,STEPHANE MALLARME,,,,29 RUE DE LA JONQUIERE,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8908115125,2.32336574239"
4,MATERNELLES (année scolaire 2018/2019),ZM CARNOT(16)/MARSOULAN(16),1,,CARNOT (16) MAT,MARSOULAN (16) MAT,,,8 avenue LAMORICIERE,16 rue MARSOULAN,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8454557926,2.41183853455"


In [9]:
paris_neighborhoods[['L_QU', 'Geometry X Y']]

Unnamed: 0,L_QU,Geometry X Y
0,Saint-Gervais,"48.8557186509,2.35816233385"
1,Chaussée-d'Antin,"48.8735473802,2.33226886887"
2,Saint-Ambroise,"48.8623450235,2.37611805592"
3,Necker,"48.8427112503,2.31077745364"
4,Sainte-Avoie,"48.862557245,2.35485151825"
5,Rochechouart,"48.8798119198,2.344861291"
6,Folie-Méricourt,"48.8674031901,2.37296482493"
7,Saint-Fargeau,"48.8710347391,2.40617153015"
8,Batignolles,"48.8884815139,2.31385616901"
9,Val-de-Grâce,"48.841684288,2.34386092632"


In [10]:
paris_neighborhoods[['Latitude','Longitude']] = paris_neighborhoods['Geometry X Y'].str.split(',',expand=True)

In [11]:
paris_school[['Latitude','Longitude']] = paris_school['geo_point_2d'].str.split(',',expand=True)

In [12]:
paris_neighborhoods.head()

Unnamed: 0,N_SQ_QU,C_QU,C_QUINSEE,L_QU,C_AR,N_SQ_AR,PERIMETRE,SURFACE,Geometry X Y,Geometry,Latitude,Longitude
0,750000014,14,7510402,Saint-Gervais,4,750000004,2678.340923,422028.2,"48.8557186509,2.35816233385","{""type"": ""Polygon"", ""coordinates"": [[[2.363764...",48.8557186509,2.35816233385
1,750000034,34,7510902,Chaussée-d'Antin,9,750000009,3133.580092,543441.2,"48.8735473802,2.33226886887","{""type"": ""Polygon"", ""coordinates"": [[[2.335450...",48.8735473802,2.33226886887
2,750000042,42,7511102,Saint-Ambroise,11,750000011,4052.567737,837992.9,"48.8623450235,2.37611805592","{""type"": ""Polygon"", ""coordinates"": [[[2.370939...",48.8623450235,2.37611805592
3,750000058,58,7511502,Necker,15,750000015,5979.711469,1578484.0,"48.8427112503,2.31077745364","{""type"": ""Polygon"", ""coordinates"": [[[2.306149...",48.8427112503,2.31077745364
4,750000012,12,7510304,Sainte-Avoie,3,750000003,1861.804114,213316.4,"48.862557245,2.35485151825","{""type"": ""Polygon"", ""coordinates"": [[[2.358217...",48.862557245,2.35485151825


In [13]:
paris_school.head()

Unnamed: 0,Type d'établissement,Libellé,Zone commune,Etiquette,Libellé établissement 1,Libellé établissement 2,Libellé établissement 3,Libellé établissement 4,Adresse 1,Adresse 2,Adresse 3,Adresse 4,Annee Scolaire,geo_shape,geo_point_2d,Latitude,Longitude
0,MATERNELLES (année scolaire 2017/2018),BERCY (167) MAT,0,,BERCY (167) MAT,,,,167 RUE DE BERCY,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8430854612,2.37480814291",48.8430854612,2.37480814291
1,ELEMENTAIRES (année scolaire 2017/2018),VICTOR COUSIN (14) ELEM,0,,VICTOR COUSIN (14) ELEM,,,,14 rue VICTOR COUSIN,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8468696747,2.34420482452",48.8468696747,2.34420482452
2,ELEMENTAIRES (année scolaire 2018/2019),MOTTE PICQUET (10) ELEM,0,,MOTTE PICQUET (10) ELEM,,,,10 AVENUE DE LA MOTTE-PICQUET,,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8575282256,2.3078884073",48.8575282256,2.3078884073
3,COLLEGES (année scolaire 2017/2018),STEPHANE MALLARME,0,Secteur du collège STEPHANE MALLARME,STEPHANE MALLARME,,,,29 RUE DE LA JONQUIERE,,,,2017-2018,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8908115125,2.32336574239",48.8908115125,2.32336574239
4,MATERNELLES (année scolaire 2018/2019),ZM CARNOT(16)/MARSOULAN(16),1,,CARNOT (16) MAT,MARSOULAN (16) MAT,,,8 avenue LAMORICIERE,16 rue MARSOULAN,,,2018-2019,"{""type"": ""MultiPolygon"", ""coordinates"": [[[[2....","48.8454557926,2.41183853455",48.8454557926,2.41183853455


In [14]:
paris_neighborhoods_data = paris_neighborhoods[['L_QU', 'Latitude', 'Longitude']]
paris_neighborhoods_data['Latitude'] = paris_neighborhoods_data['Latitude'].astype(float)
paris_neighborhoods_data['Longitude'] = paris_neighborhoods_data['Longitude'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()


In [15]:
paris_school_data = paris_school[["Type d'établissement", 'Libellé','Latitude', 'Longitude']]
paris_school_data['Latitude'] = paris_school_data['Latitude'].astype(float)
paris_school_data['Longitude'] = paris_school_data['Longitude'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()


In [16]:
paris_school_data.head()

Unnamed: 0,Type d'établissement,Libellé,Latitude,Longitude
0,MATERNELLES (année scolaire 2017/2018),BERCY (167) MAT,48.843085,2.374808
1,ELEMENTAIRES (année scolaire 2017/2018),VICTOR COUSIN (14) ELEM,48.84687,2.344205
2,ELEMENTAIRES (année scolaire 2018/2019),MOTTE PICQUET (10) ELEM,48.857528,2.307888
3,COLLEGES (année scolaire 2017/2018),STEPHANE MALLARME,48.890812,2.323366
4,MATERNELLES (année scolaire 2018/2019),ZM CARNOT(16)/MARSOULAN(16),48.845456,2.411839


In [17]:
paris_bike_station_data = paris_bike_station[['Nom de la station', 'Lattitude', 'Longitude']]
paris_bike_station_data['Lattitude'] = paris_bike_station_data['Lattitude'].astype(float)
paris_bike_station_data['Longitude'] = paris_bike_station_data['Longitude'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()


In [18]:
paris_neighborhoods_data.dtypes

L_QU          object
Latitude     float64
Longitude    float64
dtype: object

In [19]:
paris_bike_station_data.dtypes

Nom de la station     object
Lattitude            float64
Longitude            float64
dtype: object

In [20]:
paris_school_data.dtypes

Type d'établissement     object
Libellé                  object
Latitude                float64
Longitude               float64
dtype: object

In [21]:
address = 'Paris'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of The Beaches are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of The Beaches are 48.8566101, 2.3514992.


In [22]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(paris_neighborhoods_data['Latitude'], paris_neighborhoods_data['Longitude'], paris_neighborhoods_data['L_QU']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  

for lat, lng, label in zip(paris_bike_station_data['Lattitude'], paris_bike_station_data['Longitude'], paris_bike_station_data['Nom de la station']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)


map_paris

### Foursquare

Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are proper restaurants - coffe shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in out list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Italian restaurant' category, as we need info on Italian restaurants in the neighborhood.

### Define Foursquare Credentials and Version

In [23]:

CLIENT_ID = 'Q5QJCLVIONHRSPBJIEU2UTNCT2H3JRXLK1HZSRZS5XJT5XWI' # your Foursquare ID
CLIENT_SECRET = 'WH4H5WVRN1B4BEH555BKSPVTAVNLR4XWH2KPTZYHM42RYW0Q' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT  = 100

In [24]:
latitudes = paris_neighborhoods_data['Latitude'].values
longitudes = paris_neighborhoods_data['Longitude'].values

In [25]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)



In [26]:
# Category IDs corresponding to Italian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

japanese_restaurant_categories = ['4bf58dd8d48988d111941735','55a59bace4b013909087cb0c','55a59bace4b013909087cb30',
                                 '55a59bace4b013909087cb21','55a59bace4b013909087cb06','55a59bace4b013909087cb1b',
                                 '55a59bace4b013909087cb1e','55a59bace4b013909087cb18','55a59bace4b013909087cb24',
                                 '55a59bace4b013909087cb15','55a59bace4b013909087cb27','55a59bace4b013909087cb12',
                                 '4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2d','55a59a31e4b013909087cb00',
                                 '55a59af1e4b013909087cb03','55a59bace4b013909087cb2a','55a59bace4b013909087cb0f',
                                 '55a59bace4b013909087cb33','55a59bace4b013909087cb09','55a59bace4b013909087cb36']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

In [27]:

# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found japanese restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    japanese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to meke sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_japanese = is_restaurant(venue_categories, specific_filter=japanese_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_japanese, x, y)
                if venue_distance<=1000:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_japanese:
                    japanese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, japanese_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
japanese_restaurants = {}
location_restaurants = []

restaurants, japanese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)


Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [28]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Japanese restaurants:', len(japanese_restaurants))
print('Percentage of Japanese restaurants: {:.2f}%'.format(len(japanese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 1746
Total number of Japanese restaurants: 212
Percentage of Japanese restaurants: 12.14%
Average number of restaurants in neighborhood: 23.925


In [29]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('526d201d11d2d42b81af4ee6', 'Miznon', 48.857200621424404, 2.3589573284780676, '22 rue des Ecouffes, 75004 Paris, France', 174, False, -426178.8923768628, 5489112.6125680925)
('4b867a97f964a520e48b31e3', "L'As du Fallafel", 48.85741376683434, 2.3590779304504395, '34 rue des Rosiers (Rue des Ecouffes), 75004 Paris, France', 200, False, -426166.0896838608, 5489134.733208605)
('4db07d97fa8ca4b3e9e2caca', 'Pamela Popo', 48.85574916092363, 2.3569190697806426, '15 rue Francois Miron, 75004 Paris, France', 91, False, -426355.0302407106, 5488977.004003589)
('526add8411d26c630d168674', 'Chez Mademoiselle', 48.85429032043008, 2.3598099177322163, '16 rue Charlemagne, 75004 Paris, France', 199, False, -426171.0215477565, 5488779.720531615)
('4bc61f2d04e8b713f079352d', 'Autour du Saumon', 48.85558694633289, 2.3578015122239786, '60 rue François Miron, 75004 Paris, France', 30, False, -426293.56526887056, 5488948.140121554)
('4c6eb67006ed6dcb107fa722', 

In [30]:
print('List of Japanese restaurants')
print('---------------------------')
for r in list(japanese_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(japanese_restaurants))

List of Japanese restaurants
---------------------------
('4b5214b5f964a520536627e3', 'Allo Sushi', 48.8563168088396, 2.3571211959625566, '13 rue Cloche Perce, 75004 Paris, France', 101, True, -426329.634452709, 5489037.385018682)
('4b201338f964a520dd2c24e3', 'Kyo', 48.857172400826315, 2.3551261928219493, '9 rue de la Verrerie, 75004 Paris, France', 275, True, -426459.44029719173, 5489156.7917822935)
('4d9771d6daec224bd1b2303e', 'Sancho', 48.855803, 2.356545, '7 rue François Miron, Paris, France', 118, True, -426381.3645402378, 5488987.58660638)
('4bc20f3a4cdfc9b6c1759521', 'Kiccho', 48.85726111085153, 2.354937636717164, '11 rue de la Verrerie, 75004 Paris, France', 292, True, -426471.561689885, 5489168.946512529)
('4bafa817f964a52079143ce3', 'Sushi Bâ', 48.8629561687714, 2.379144185932262, '39 rue Saint-Ambroise, 75011 Paris, France', 231, True, -424595.90142995387, 5489501.130291076)
('4bec631c75b2c9b6086a438d', 'Naoki', 48.86335663128362, 2.3797349631786346, '5 rue Guillaume Bertran

In [31]:
location_japanese_restaurants = []
TMP_list = []
for i in location_restaurants:
    for j in i:
        if (j[6] == True):
            TMP_list.append(j)
    TMP_list = tuple(TMP_list)
    location_japanese_restaurants.append(TMP_list)
    TMP_list = []

In [32]:
# create map of Manhattan using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)

folium.Marker([latitude, longitude], popup='Paris').add_to(map_paris)

# add markers to map
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_japanese = res[6]
    color = 'green' if is_japanese else 'indigo'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_paris) 
    
map_paris

Looking good. So now we have all the restaurants in area, and we know which ones are japanese restaurants! We also know which restaurants exactly are in vicinity of every neighborhood candidate center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new japanese restaurant!

## 3. Methodology section

In this project we will direct our efforts on detecting areas of Paris that have low japaneses restaurant density (< 5), close to two middle school at least and close to one bike station at least. Given a neighborhood, we will limit our analysis to area ~1km around neighborhood center.

In first step we have collected the required data: location and type (category) of every restaurant within Paris, middle school in Paris and bike station in Paris. We have also identified Japanese restaurants (according to Foursquare categorization).

Second step in our analysis will be calculation and exploration,
    - number of japanese restaurants in 1km arround the center each neighborhood.
    - number of middle school in 1km arround the center each neighborhood.
    - number of bike station in 1km arround the center each neighborhood.

In third and final step we will focus on most promising areas and within those create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will take into consideration locations with no more than two restaurants in radius of 250 meters, and we want locations without Japanese restaurants in radius of 400 meters. We will present map of all such locations but also create clusters (using k-means clustering) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.

## 4. Analysis section<a id='methodology_cell'></a>

In [34]:
location_japanese_restaurants_count = [len(res) for res in location_japanese_restaurants]

paris_neighborhoods_data['Japanese Restaurants in area'] = location_japanese_restaurants_count

print('Average number of restaurants in every area with radius=1000m:', np.array(location_japanese_restaurants_count).mean())

paris_neighborhoods_data.head(10)

Average number of restaurants in every area with radius=1000m: 3.0125


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()


Unnamed: 0,L_QU,Latitude,Longitude,Japanese Restaurants in area
0,Saint-Gervais,48.855719,2.358162,4
1,Chaussée-d'Antin,48.873547,2.332269,0
2,Saint-Ambroise,48.862345,2.376118,2
3,Necker,48.842711,2.310777,0
4,Sainte-Avoie,48.862557,2.354852,4
5,Rochechouart,48.879812,2.344861,2
6,Folie-Méricourt,48.867403,2.372965,6
7,Saint-Fargeau,48.871035,2.406172,1
8,Batignolles,48.888482,2.313856,4
9,Val-de-Grâce,48.841684,2.343861,2


In [35]:
nb_bike_station_neighborhood = []

tmp_nb_bike_station = 0
j = 0
for neighborhoods_lat, neighborhoods_long in zip(paris_neighborhoods_data['Latitude'], paris_neighborhoods_data['Longitude']):
    neighborhoods_x, neighborhoods_y = lonlat_to_xy(neighborhoods_lat, neighborhoods_long)
    for byke in paris_bike_station_data.itertuples():
        byke_x, byke_y = lonlat_to_xy(byke.Lattitude, byke.Longitude)
        d = calc_xy_distance(neighborhoods_x, neighborhoods_y, byke_x, byke_y)
        if d<500:
            tmp_nb_bike_station += 1
        #print (tmp_nb_bike_station)

    nb_bike_station_neighborhood.append(tmp_nb_bike_station)
    tmp_nb_bike_station = 0
    j += 1

print (j)


80


In [36]:
paris_neighborhoods_data['Number Byke Station'] = nb_bike_station_neighborhood
paris_neighborhoods_data.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,L_QU,Latitude,Longitude,Japanese Restaurants in area,Number Byke Station
0,Saint-Gervais,48.855719,2.358162,4,5
1,Chaussée-d'Antin,48.873547,2.332269,0,9
2,Saint-Ambroise,48.862345,2.376118,2,6
3,Necker,48.842711,2.310777,0,4
4,Sainte-Avoie,48.862557,2.354852,4,6


In [40]:
nb_middle_schools_neighborhood = []

tmp_nb_middle_schools = 0
i = 0
for neighborhoods_lat, neighborhoods_long in zip(paris_neighborhoods_data['Latitude'], paris_neighborhoods_data['Longitude']):
    neighborhoods_x, neighborhoods_y = lonlat_to_xy(neighborhoods_lat, neighborhoods_long)
    for ms in paris_school_data.itertuples():
        k = 0
        byke_x, byke_y = lonlat_to_xy(ms.Latitude, ms.Longitude)
        d = calc_xy_distance(neighborhoods_x, neighborhoods_y, byke_x, byke_y)
        if d<500:
            tmp_nb_middle_schools += 1
        #print (tmp_nb_middle_schools)

    nb_middle_schools_neighborhood.append(tmp_nb_middle_schools)
    tmp_nb_middle_schools = 0
    i += 1

print (i)

80


In [43]:
paris_neighborhoods_data['Number Middle Schools'] = nb_middle_schools_neighborhood
paris_neighborhoods_data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,L_QU,Latitude,Longitude,Japanese Restaurants in area,Number Byke Station,Number Middle Schools
0,Saint-Gervais,48.855719,2.358162,4,5,10
1,Chaussée-d'Antin,48.873547,2.332269,0,9,4
2,Saint-Ambroise,48.862345,2.376118,2,6,8
3,Necker,48.842711,2.310777,0,4,5
4,Sainte-Avoie,48.862557,2.354852,4,6,10
5,Rochechouart,48.879812,2.344861,2,8,15
6,Folie-Méricourt,48.867403,2.372965,6,6,11
7,Saint-Fargeau,48.871035,2.406172,1,5,5
8,Batignolles,48.888482,2.313856,4,4,8
9,Val-de-Grâce,48.841684,2.343861,2,2,9


In [50]:
optimal_location = paris_neighborhoods_data.loc[(paris_neighborhoods_data["Japanese Restaurants in area"] <= 3) & (paris_neighborhoods_data["Number Byke Station"] >=3) & (paris_neighborhoods_data["Number Middle Schools"] >= 2)]

optimal_location

Unnamed: 0,L_QU,Latitude,Longitude,Japanese Restaurants in area,Number Byke Station,Number Middle Schools
1,Chaussée-d'Antin,48.873547,2.332269,0,9,4
2,Saint-Ambroise,48.862345,2.376118,2,6,8
3,Necker,48.842711,2.310777,0,4,5
5,Rochechouart,48.879812,2.344861,2,8,15
7,Saint-Fargeau,48.871035,2.406172,1,5,5
11,Plaisance,48.830317,2.315305,0,5,8
13,Saint-Victor,48.847664,2.354093,2,6,6
18,Mail,48.868008,2.344699,2,8,9
22,Grenelle,48.850172,2.291853,3,4,8
23,Saint-Germain-l'Auxerrois,48.86065,2.33491,1,4,5


In [55]:
optimal_location.shape

(40, 6)

In [52]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(optimal_location['Latitude'], optimal_location['Longitude'], optimal_location['L_QU']):
    label = folium.Popup(label, parse_html=True)
    folium.Marker([lat, lng], popup=label).add_to(map_paris)

map_paris

## 5. Results and Discussion section<a id='analysis_cell'></a> 

Our analysis shows us a very high number of restaurants in Paris and a very high concentration of Japanese restaurant in the center of Paris. We then focused on the outermost neighborhoods while staying in Paris. We held the neighborhoods with a low number of Japanese restaurants (<= 3), a large number of bike stations (> = 3) and with a number of colleges also raised to give more chance to our future Japanese restaurant.

First of all, we retrieved all the data on a site set up by the city of Paris (data are available for free). We then proceeded to a data processing before retrieving the list of restaurants (especially Japanese restaurants) with foursquare. The data collected on bike stations and schools, also allowed us to make an aggregation to determine the optimal location for our next restaurant.

Result of all this is 40 zones containing largest number of potential new restaurant locations based on number of and distance to existing japanese restaurant - both restaurants in general and Japanese restaurants particularly. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! Purpose of this analysis was to only provide info on neighborhoods in Paris it is entirely possible that there is a very good reason for small number of restaurants in any of those neighborhoods, reasons which would make them unsuitable for a new restaurant regardless of lack of competition in the area. Recommended zones should therefore be considered only as a starting point for more detailed analysis which could eventually result in location which has not only no nearby competition but also other factors taken into account and all other relevant conditions met.

## 7. Conclusion section<a id='results_cell'></a> 

Purpose of this project was to identify Paris neighborhood close to 2 middle school at least,  close to 3 bike station at least and with low number of japanese restaurants in order to aid stakeholders in narrowing down the search for optimal location for a new Japanese restaurant. By calculating restaurant density distribution from Foursquare data we have first identified general neighborhood that justify further analysis (external of Paris), and then generated extensive collection of locations which satisfy some basic requirements. Clustering of those locations was then performed in order to create major zones of interest (containing greatest number of potential locations) and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

Final decission on optimal restaurant location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like attractiveness of each location (proximity to park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.

### END...<a id='end_cell'></a> 