# Capstone Project - The Battle of Neighborhoods


# Opening a Restaurant in Paris

# I. Introduction 

According to Insee (www.insee.fr), Paris has more than 45,000 restaurants for about 2.2 million people, whereas London has approximately 16,000 restaurants for 9 million people. This much higher density makes opening a new restaurant in Paris considerably more challenging than doing so in London. When choosing a restaurant type and a good location, an entrepreneur or investor usually relies on common sense and domain knowledge. Too often, an ill-considered decision leads to poor income and, eventually, bankruptcy.

According to several surveys, up to 40% of such start-ups fail in their very first year. Suppose an investor has enough time and money, as well as the passion, to open the best eating spot in Paris. What type of restaurant should it be? Where would be the best place for it?

What if we could cluster city neighborhoods based on the similarity of their nearby restaurants? What if we could visualize these clusters on a map? What if we could find out which types of restaurant are the most and least common in each location? Equipped with that knowledge, we could make a smart choice among the huge number of restaurant types and available places.
Let us allow machine learning to get the job done.

Target audience: investors, entrepreneurs, and chefs interested in opening a restaurant in Paris who need objective advice on which type of restaurant is likely to be more successful and where exactly it should be opened.

# II. Data

## Data used in this project

This project uses data from the Paris open data portal (https://opendata.paris.fr) to collect information about boroughs and neighborhoods; the two separate dataframes are then merged.

We also use the Foursquare API (www.foursquare.com) to collect the top 100 restaurants for each location.

## We will work as follows:

1. Using two tables from the Paris open data portal, collect information about Paris boroughs and neighborhoods.

2. Merge these two separate dataframes into one, which will be used in the next steps.

3. Use the Geopy and Folium libraries to get the coordinates of every location and map the geospatial data on a map of Paris.

4. Using the Foursquare API, collect the top 100 restaurants and their categories for each location within a radius of 300 meters.

5. Group the collected restaurants by location and take the mean frequency of occurrence of each type, preparing the data for clustering.

6. Cluster the neighborhoods with the k-means algorithm and analyze the top 10 most common restaurant types in each cluster.

7. Visualize clusters on the map, thus showing the best locations for opening the chosen restaurant.

# III. Methodology

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import time # for time delay while working with API
import requests # library to handle requests
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
from geopy.geocoders import Nominatim # Convert an address into latitude and longitude values
import geopy.geocoders # Convert an address into latitude and longitude values
import json # library to handle JSON files
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors # Matplotlib and associated plotting modules
from sklearn.cluster import KMeans # k-means
import folium # Map rendering library
import re # regular expressions

## III.1. Collecting Paris Boroughs

In [2]:
body1 = r"H:/PYTHON/COURSERA/IBM/CAPSTONE PROJECT/arrd.xls"

In [3]:
df1 = pd.read_excel(body1)

In [4]:
df1.head()

Unnamed: 0,N_SQ_AR,C_AR,C_ARINSEE,L_AR,L_AROFF,N_SQ_CO,SURFACE,PERIMETRE,Geometry X Y,Geometry,OBJECTID,LONGUEUR
0,750000002,2,75102,2ème Ardt,Bourse,750001537,991153.7,4554.10436,"48.8682792225, 2.34280254689","{""type"": ""Polygon"", ""coordinates"": [[[2.351518...",2,4553.938764
1,750000003,3,75103,3ème Ardt,Temple,750001537,1170883.0,4519.263648,"48.86287238, 2.3600009859","{""type"": ""Polygon"", ""coordinates"": [[[2.363828...",3,4519.071982
2,750000012,12,75112,12ème Ardt,Reuilly,750001537,16314780.0,24089.666298,"48.8349743815, 2.42132490078","{""type"": ""Polygon"", ""coordinates"": [[[2.413879...",12,24088.038922
3,750000001,1,75101,1er Ardt,Louvre,750001537,1824613.0,6054.936862,"48.8625627018, 2.33644336205","{""type"": ""Polygon"", ""coordinates"": [[[2.328007...",1,6054.680862
4,750000004,4,75104,4ème Ardt,Hôtel-de-Ville,750001537,1600586.0,5420.908434,"48.8543414263, 2.35762962032","{""type"": ""Polygon"", ""coordinates"": [[[2.368512...",4,5420.636779


In [5]:
df1.shape

(20, 12)

In [6]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 12 columns):
N_SQ_AR         20 non-null int64
C_AR            20 non-null int64
C_ARINSEE       20 non-null int64
L_AR            20 non-null object
L_AROFF         20 non-null object
N_SQ_CO         20 non-null int64
SURFACE         20 non-null float64
PERIMETRE       20 non-null float64
Geometry X Y    20 non-null object
Geometry        20 non-null object
OBJECTID        20 non-null int64
LONGUEUR        20 non-null float64
dtypes: float64(3), int64(5), object(4)
memory usage: 2.0+ KB


In [7]:
df1['Borough'] = df1['L_AR'] + ' - ' + df1['L_AROFF']

In [8]:
df1 = df1[['C_AR','Borough']]

In [9]:
df1 = df1.sort_values(by='C_AR').reset_index(drop=True)

In [10]:
df1.head()

Unnamed: 0,C_AR,Borough
0,1,1er Ardt - Louvre
1,2,2ème Ardt - Bourse
2,3,3ème Ardt - Temple
3,4,4ème Ardt - Hôtel-de-Ville
4,5,5ème Ardt - Panthéon


In [11]:
df1.shape

(20, 2)

In [12]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 2 columns):
C_AR       20 non-null int64
Borough    20 non-null object
dtypes: int64(1), object(1)
memory usage: 400.0+ bytes


## III.2. Collecting Paris Neighborhoods

In [13]:
body2 = r"H:/PYTHON/COURSERA/IBM/CAPSTONE PROJECT/qts.xls"

In [14]:
df2 = pd.read_excel(body2)

In [15]:
df2.head()

Unnamed: 0,N_SQ_QU,C_QU,C_QUINSEE,L_QU,C_AR,N_SQ_AR,PERIMETRE,SURFACE,Geometry X Y,Geometry,OBJECTID,LONGUEUR
0,750000010,10,7510302,Enfants-Rouges,3,750000003,2139.625388,271750.323937,"48.863887392, 2.36312330099","{""type"": ""Polygon"", ""coordinates"": [[[2.367101...",50,2139.535591
1,750000016,16,7510404,Notre-Dame,4,750000004,3283.163371,378252.153674,"48.8528955862, 2.35277501212","{""type"": ""Polygon"", ""coordinates"": [[[2.361313...",56,3282.999717
2,750000018,18,7510502,Jardin-des-Plantes,5,750000005,4052.729521,798389.398463,"48.8419401934, 2.35689388962","{""type"": ""Polygon"", ""coordinates"": [[[2.364561...",58,4052.473226
3,750000025,25,7510701,Saint-Thomas-d'Aquin,7,750000007,3827.253353,826559.43678,"48.8552632694, 2.32558765258","{""type"": ""Polygon"", ""coordinates"": [[[2.322133...",7,3827.053421
4,750000035,35,7510903,Faubourg-Montmartre,9,750000009,2786.541926,417335.080621,"48.8739346918, 2.34325257947","{""type"": ""Polygon"", ""coordinates"": [[[2.340255...",17,2786.448978


Each borough of Paris has 4 neighborhoods (20 boroughs, so 80 neighborhoods in total).

In [16]:
df2.shape

(80, 12)

In [17]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 12 columns):
N_SQ_QU         80 non-null int64
C_QU            80 non-null int64
C_QUINSEE       80 non-null int64
L_QU            80 non-null object
C_AR            80 non-null int64
N_SQ_AR         80 non-null int64
PERIMETRE       80 non-null float64
SURFACE         80 non-null float64
Geometry X Y    80 non-null object
Geometry        80 non-null object
OBJECTID        80 non-null int64
LONGUEUR        80 non-null float64
dtypes: float64(3), int64(6), object(3)
memory usage: 7.6+ KB


In [18]:
df2 = df2[['C_AR','L_QU','Geometry X Y']]

We can use 'Geometry X Y' to set Latitude and Longitude for each neighborhood.

In [19]:
coords = df2['Geometry X Y'].str.split(',', expand=True).astype(float) # split "lat, lon" once into two float columns
df2['Latitude'] = coords[0]
df2['Longitude'] = coords[1]

In [20]:
df2 = df2[['C_AR','L_QU','Latitude','Longitude']]

In [21]:
df2 = df2.sort_values(by='C_AR').reset_index(drop=True)

In [22]:
df2.head()

Unnamed: 0,C_AR,L_QU,Latitude,Longitude
0,1,Halles,48.862289,2.344899
1,1,Place-Vendôme,48.867019,2.328582
2,1,Palais-Royal,48.86466,2.336309
3,1,St-Germain-l'Auxerrois,48.86065,2.33491
4,2,Mail,48.868008,2.344699


## III.3. Merging the two dataframes

In [23]:
df = pd.merge(df1,df2, on='C_AR', how='outer')

In [24]:
df.rename(columns={'C_AR': 'Number', 'L_QU': 'Neighborhood'}, inplace=True)

In [25]:
df.shape

(80, 5)
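
Since each neighborhood should belong to exactly one borough, the merge itself can be asked to check that relationship. The sketch below is not part of the original notebook: it uses two small made-up frames in place of `df1` and `df2`, and relies on the `validate` and `indicator` options of `pd.merge`.

```python
import pandas as pd

# Toy stand-ins for df1 (boroughs) and df2 (neighborhoods); values are illustrative only
boroughs = pd.DataFrame({'C_AR': [1, 2],
                         'Borough': ['1er Ardt - Louvre', '2ème Ardt - Bourse']})
neighborhoods = pd.DataFrame({'C_AR': [1, 1, 2],
                              'L_QU': ['Halles', 'Palais-Royal', 'Mail']})

# validate='one_to_many' raises MergeError if a borough code is duplicated on the left side;
# indicator=True adds a '_merge' column showing which side each row came from
merged = pd.merge(boroughs, neighborhoods, on='C_AR', how='outer',
                  validate='one_to_many', indicator=True)

# With an outer join, rows found on only one side are marked 'left_only' or 'right_only'
unmatched = merged[merged['_merge'] != 'both']
print(merged.shape, len(unmatched))
```

An empty `unmatched` frame would suggest that every borough has neighborhoods and every neighborhood has a borough, which is what the shape `(80, 5)` above implies for the real data.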

In [26]:
df.head()

Unnamed: 0,Number,Borough,Neighborhood,Latitude,Longitude
0,1,1er Ardt - Louvre,Halles,48.862289,2.344899
1,1,1er Ardt - Louvre,Place-Vendôme,48.867019,2.328582
2,1,1er Ardt - Louvre,Palais-Royal,48.86466,2.336309
3,1,1er Ardt - Louvre,St-Germain-l'Auxerrois,48.86065,2.33491
4,2,2ème Ardt - Bourse,Mail,48.868008,2.344699


## III.4. Map geospatial data

In [27]:
# Get the Paris "central" point
paris_address = 'Paris, France'
geolocator = Nominatim(user_agent='opening_restaurant_paris')
location = geolocator.geocode(paris_address)
paris_lat = location.latitude
paris_lon = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(paris_address, paris_lat, paris_lon))

The geographical coordinates of Paris, France are 48.8566101, 2.3514992.


In [28]:
# Create map of Paris using starting point coordinates
paris_map = folium.Map(location=[paris_lat, paris_lon], zoom_start=12) 

In [29]:
# Add markers to map
for lat, lng, bor, nei in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(nei, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(paris_map)

In [30]:
paris_map

# IV. Exploring Paris Restaurants

Next, we are going to start utilizing the Foursquare API to explore neighborhoods and segment them.

## IV.1. Collecting Restaurants

Let's explore the first neighborhood in our dataframe.

In [31]:
df.loc[0,'Neighborhood']

'Halles'

Get the neighborhood's latitude and longitude values.

In [32]:
loc_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
loc_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value
loc_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(loc_name, 
                                                               loc_latitude, 
                                                               loc_longitude))

Latitude and longitude values of Halles are 48.8622891081, 2.34489885831.


Now, let's get the top 100 venues that are in Halles within a radius of 300 meters.

In [33]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare credentials
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # never publish real credentials in a notebook
VERSION = '20190101'

In [34]:
radius = 300
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&ll={2},{3}&v={4}&radius={5}&limit={6}&query=restaurant'.format(CLIENT_ID, CLIENT_SECRET, loc_latitude, loc_longitude, VERSION, radius, LIMIT)
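
Building a long query string with positional `format` arguments makes it easy to swap two values by accident. As a possible alternative, here is a minimal sketch that assembles the same kind of URL from named parameters with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20190101', radius=300, limit=100):
    """Assemble a venues/explore URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
        'query': 'restaurant',
    }
    # urlencode percent-encodes special characters (the comma in 'll' becomes %2C)
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; coordinates are those of Halles shown above
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', 48.8622891081, 2.34489885831)
print(example_url)
```

Keeping `radius` and `limit` as named arguments also makes it harder for the single-neighborhood request and the later loop over all neighborhoods to drift apart silently.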

In [35]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c8449501ed2196e4b32e4a1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Les Halles',
  'headerFullLocation': 'Les Halles, Paris',
  'headerLocationGranularity': 'neighborhood',
  'query': 'restaurant',
  'totalResults': 80,
  'suggestedBounds': {'ne': {'lat': 48.8649891108, 'lng': 2.3489953470523512},
   'sw': {'lat': 48.8595891054, 'lng': 2.340802369567649}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '57222bc7498ea8de6c6d8c24',
       'name': 'Café Belleville',
       'location': {'address': "L'Exception Concept Store",
        'crossStreet': '24 rue Berger',
        'lat': 48.861209781831484,
        'lng': 2.3468057763265,
        ...

In [36]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [37]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

In [38]:
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Café Belleville,Café,48.86121,2.346806
1,Boutique yam'Tcha,Chinese Restaurant,48.86171,2.34238
2,Baltard Au Louvre,Restaurant,48.863441,2.342502
3,Enza & Famiglia,Italian Restaurant,48.861191,2.343449
4,La Tavola Calda,Pizza Place,48.860493,2.345559


In [39]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

80 venues were returned by Foursquare.


Let's create a function that repeats the same process for all the neighborhoods of Paris.

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}&query=restaurant'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues
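
The list comprehension above assumes that every venue carries at least one category. A slightly more defensive way to parse the response could look like the sketch below, where the parsing is pulled out into a pure function that skips venues without a category. The helper name `extract_venue_rows` and the sample items are made up for illustration; the items only mimic the shape of the API response shown earlier.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into (neighborhood, lat, lng, venue, category) tuples.

    Hypothetical helper: venues with an empty 'categories' list are skipped
    instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to cluster on, so leave this venue out
        rows.append((name, lat, lng, venue.get('name'), categories[0]['name']))
    return rows

# Made-up items shaped like the API response shown earlier
sample_items = [
    {'venue': {'name': 'Café Belleville', 'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorized Spot', 'categories': []}},
]
rows = extract_venue_rows('Halles', 48.862289, 2.344899, sample_items)
print(rows)
```

A short `time.sleep(...)` between requests, which is presumably what the `time` import at the top is intended for, might also reduce the risk of hitting API rate limits.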

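Now we run the above function on each neighborhood and create a new dataframe called paris_venues. Since no radius is passed, the function's default of 500 meters applies here rather than the 300 meters used for Halles above.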

In [41]:
paris_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Let's check the size of the resulting dataframe.

In [42]:
print(paris_venues.shape)
paris_venues.head()

(4791, 5)


Unnamed: 0,Neighborhood,Latitude,Longitude,Venue,Venue Category
0,Halles,48.862289,2.344899,Café Belleville,Café
1,Halles,48.862289,2.344899,Boutique yam'Tcha,Chinese Restaurant
2,Halles,48.862289,2.344899,Baltard Au Louvre,Restaurant
3,Halles,48.862289,2.344899,Enza & Famiglia,Italian Restaurant
4,Halles,48.862289,2.344899,La Régalade Saint-Honoré,French Restaurant


Let's check how many restaurants were returned for each neighborhood.

In [43]:
paris_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
Amérique,9
Archives,100
Arsenal,62
Arts-et-Metiers,100
Auteuil,4
Batignolles,88
Bel-Air,2
Belleville,33
Bercy,46
Bonne-Nouvelle,100


Next, let's check whether the Foursquare API returned no restaurants for some locations.

In [44]:
x = paris_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count().shape[0]
y = df.shape[0]
if x != y:
    print('Missing data for {0} locations:'.format(y-x))
    print(set(df['Neighborhood']).symmetric_difference(set(paris_venues['Neighborhood'])))

Missing data for 1 locations:
{'Picpus'}


We need to remove 'Picpus' from the main Paris dataframe.

In [45]:
df = df[df.Neighborhood != 'Picpus']
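
If several neighborhoods came back empty, removing them one by one would not scale. The following sketch, on made-up toy frames standing in for `df` and `paris_venues`, lists the requested neighborhoods that have no venues with a plain set difference and then keeps only the covered ones with `isin`.

```python
import pandas as pd

# Toy stand-ins for df and paris_venues; values are illustrative only
all_neighborhoods = pd.DataFrame({'Neighborhood': ['Halles', 'Picpus', 'Mail']})
venues = pd.DataFrame({'Neighborhood': ['Halles', 'Halles', 'Mail'],
                       'Venue': ['A', 'B', 'C']})

# Neighborhoods that were requested but are absent from the results
missing = sorted(set(all_neighborhoods['Neighborhood']) - set(venues['Neighborhood']))

# Keep only the neighborhoods that have at least one venue
covered = all_neighborhoods['Neighborhood'].isin(venues['Neighborhood'])
kept = all_neighborhoods[covered].reset_index(drop=True)
print(missing, kept.shape)
```

A one-directional difference is used here because the venue table can only contain names that were requested in the first place.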

Let's find out how many unique categories can be curated from all the returned restaurants.

In [46]:
print('There are {0} unique categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 123 unique categories.


## IV.2. Exploring Restaurants

To begin the analysis, we need to transform the collected information using one-hot encoding.

In [47]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move location column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Basque Restaurant,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Burger Joint,Burgundian Restaurant,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Ch'ti Restaurant,Chinese Restaurant,Comfort Food Restaurant,Corsican Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Hot Dog Joint,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Jiangxi Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lyonese Bouchon,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Portuguese Restaurant,Provençal Restaurant,Ramen Restaurant,Restaurant,Romanian Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Savoyard Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shandong Restaurant,Shanxi Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Trattoria/Osteria,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [48]:
paris_onehot.shape

(4791, 124)

Next, let's group the rows by neighborhood and take the mean of each one-hot column. This gives the frequency of occurrence of each category per neighborhood and prepares the dataframe for clustering.

In [49]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Basque Restaurant,Belgian Restaurant,Bistro,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Burger Joint,Burgundian Restaurant,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Ch'ti Restaurant,Chinese Restaurant,Comfort Food Restaurant,Corsican Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Hot Dog Joint,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Jiangxi Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lyonese Bouchon,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Portuguese Restaurant,Provençal Restaurant,Ramen Restaurant,Restaurant,Romanian Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Savoyard Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shandong Restaurant,Shanxi Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Trattoria/Osteria,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Amérique,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0
1,Archives,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.07,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0
2,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.064516,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.274194,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.080645,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.016129,0.0,0.0,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.032258,0.0,0.016129,0.0,0.0,0.048387,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0
3,Arts-et-Metiers,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.05,0.06,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.04,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.0
4,Auteuil,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Batignolles,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.034091,0.0,0.0,0.068182,0.0,0.0,0.034091,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.034091,0.0,0.0,0.022727,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.034091,0.0,0.0,0.0,0.102273,0.068182,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.056818,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.022727,0.0,0.011364,0.0,0.022727,0.0
6,Bel-Air,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Belleville,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.030303,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.060606,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.121212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0
8,Bercy,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.152174,0.0,0.0,0.043478,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.043478,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.152174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.065217,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.108696,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0
9,Bonne-Nouvelle,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.07,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.07,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
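
To see why the mean of one-hot columns equals a category frequency, here is a small worked sketch on made-up data (neighborhoods `A` and `B` and their venues are purely illustrative). It applies the same two steps as above: `get_dummies`, then `groupby(...).mean()`.

```python
import pandas as pd

# Toy venue list; neighborhoods and categories are illustrative only
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'B', 'B'],
    'Venue Category': ['Café', 'Bistro', 'Café', 'Café', 'Café', 'Bakery'],
})

# One indicator column per category (dtype=int keeps the values as 0/1)
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of a 0/1 column is the share of venues that fall in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods are compared by their mix of categories rather than by their number of venues. The flip side is that neighborhoods with very few venues, such as Bel-Air with 2, get very coarse frequencies.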


Let's look at the most common venue categories in each neighborhood.

In [50]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [51]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

In [52]:
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amérique,Café,French Restaurant,Asian Restaurant,Bistro,Bakery,Vietnamese Restaurant,Hawaiian Restaurant,Fondue Restaurant,Diner,Doner Restaurant
1,Archives,French Restaurant,Italian Restaurant,Café,Bistro,Bakery,Burger Joint,Japanese Restaurant,Restaurant,Pizza Place,Vegetarian / Vegan Restaurant
2,Arsenal,French Restaurant,Italian Restaurant,Gastropub,Bakery,Pizza Place,Seafood Restaurant,Tapas Restaurant,Sushi Restaurant,Japanese Restaurant,Café
3,Arts-et-Metiers,French Restaurant,Japanese Restaurant,Chinese Restaurant,Italian Restaurant,Vietnamese Restaurant,Restaurant,Café,Moroccan Restaurant,Burger Joint,Bistro
4,Auteuil,French Restaurant,Café,Wings Joint,Fondue Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Eastern European Restaurant
5,Batignolles,French Restaurant,Italian Restaurant,Bistro,Japanese Restaurant,Restaurant,Café,Chinese Restaurant,Bakery,Breakfast Spot,Indian Restaurant
6,Bel-Air,Café,French Restaurant,Wings Joint,Fondue Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Eastern European Restaurant
7,Belleville,French Restaurant,Restaurant,Japanese Restaurant,Italian Restaurant,Burger Joint,Bistro,Gastropub,Diner,Indian Restaurant,Café
8,Bercy,Bakery,French Restaurant,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Café,Bistro,Cambodian Restaurant,Sandwich Place,Chinese Restaurant
9,Bonne-Nouvelle,French Restaurant,Italian Restaurant,Bakery,Thai Restaurant,Japanese Restaurant,Pizza Place,Burger Joint,Café,Chinese Restaurant,Restaurant


## IV.3. Clustering Restaurants

Run k-means to group the neighborhoods into 5 clusters.

In [53]:
# set number of clusters
kclusters = 5

# keep only the venue-frequency columns; the neighborhood name is not a feature
paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 0, 2, 4, 2, 4, 2, 1, 2])
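
One common way to sanity-check a choice such as `kclusters = 5` is to compare the k-means inertia across several values of k. The following is a minimal sketch under stated assumptions: the random matrix `features` is a synthetic stand-in for `paris_grouped_clustering` (it is not the project data), and the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for paris_grouped_clustering: 79 rows of made-up
# venue frequencies (NOT the project data)
rng = np.random.default_rng(0)
features = rng.random((79, 20))

# Fit k-means for a range of k and record the inertia
# (within-cluster sum of squared distances)
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, random_state=4, n_init=10).fit(features)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

A value of k after which the inertia stops dropping sharply (the "elbow") is a widely used heuristic; on the real frequency table the curve may look quite different from this synthetic one.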

In [54]:
kmeans.labels_

array([3, 2, 0, 2, 4, 2, 4, 2, 1, 2, 2, 0, 1, 2, 2, 0, 0, 0, 2, 2, 0, 2,
       0, 2, 2, 1, 1, 2, 2, 0, 0, 2, 0, 2, 2, 2, 0, 2, 1, 0, 2, 0, 2, 0,
       2, 2, 2, 1, 2, 0, 2, 2, 2, 3, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 0, 2,
       0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 1, 2])
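
As a quick sanity check, the label array printed above can be tallied to see how many neighborhoods fall into each cluster. This is a minimal sketch using only the standard library; the list `labels` is copied from the output above, and the variable names are illustrative.

```python
from collections import Counter

# Cluster labels copied from the kmeans.labels_ output above (79 neighborhoods)
labels = [
    3, 2, 0, 2, 4, 2, 4, 2, 1, 2, 2, 0, 1, 2, 2, 0, 0, 0, 2, 2, 0, 2,
    0, 2, 2, 1, 1, 2, 2, 0, 0, 2, 0, 2, 2, 2, 0, 2, 1, 0, 2, 0, 2, 0,
    2, 2, 2, 1, 2, 0, 2, 2, 2, 3, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 0, 2,
    0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 1, 2,
]

cluster_sizes = Counter(labels)

# Report sizes with 1-based cluster numbers (label 0 is "Cluster 1")
for label in sorted(cluster_sizes):
    print(f"Cluster {label + 1}: {cluster_sizes[label]} neighborhoods")
```

The tally is 20, 9, 46, 2 and 2 neighborhoods for labels 0 to 4, so more than half of the neighborhoods land in a single cluster.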

In [55]:
neighborhoods_venues_sorted.shape

(79, 11)

In [56]:
df.shape

(79, 5)

Let's create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood.

In [57]:
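# work on a copy so the original df is left unchanged
paris_merged = df.copy()

# add clustering labels (assigned by row position, so df must list the
# neighborhoods in the same order as paris_grouped)
paris_merged['Cluster Labels'] = kmeans.labels_

# join the top-10 venue columns onto the neighborhood coordinates
paris_merged = paris_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')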

paris_merged.head()

Unnamed: 0,Number,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,1er Ardt - Louvre,Halles,48.862289,2.344899,3,French Restaurant,Italian Restaurant,Pizza Place,Bistro,Bakery,Chinese Restaurant,Café,Burger Joint,Thai Restaurant,Japanese Restaurant
1,1,1er Ardt - Louvre,Place-Vendôme,48.867019,2.328582,2,French Restaurant,Japanese Restaurant,Sandwich Place,Café,Italian Restaurant,Bakery,Korean Restaurant,Restaurant,Salad Place,Burger Joint
2,1,1er Ardt - Louvre,Palais-Royal,48.86466,2.336309,0,French Restaurant,Japanese Restaurant,Café,Italian Restaurant,Korean Restaurant,Ramen Restaurant,Restaurant,Bakery,Bistro,Udon Restaurant
3,1,1er Ardt - Louvre,St-Germain-l'Auxerrois,48.86065,2.33491,2,French Restaurant,Café,Italian Restaurant,Japanese Restaurant,Sandwich Place,Chinese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Fast Food Restaurant
4,2,2ème Ardt - Bourse,Mail,48.868008,2.344699,4,French Restaurant,Italian Restaurant,Bistro,Bakery,Creperie,Thai Restaurant,Salad Place,Burger Joint,Restaurant,Asian Restaurant
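
The cell above attaches `kmeans.labels_` to `df` by row position. The labels follow the row order of `paris_grouped`, so this pairs each label with the right neighborhood only if both dataframes list the neighborhoods in the same order. Below is a minimal sketch of a name-based alternative, assuming small stand-in dataframes: the toy `df`, `grouped` and `labels` are illustrative and are not the project data.

```python
import pandas as pd

# Stand-in for df: neighborhoods in arrondissement order
# (coordinates copied from the table above)
df = pd.DataFrame({
    'Neighborhood': ['Halles', 'Palais-Royal', 'Mail'],
    'Latitude': [48.862289, 48.86466, 48.868008],
    'Longitude': [2.344899, 2.336309, 2.344699],
})

# Stand-in for paris_grouped: the same neighborhoods in alphabetical order
grouped = pd.DataFrame({'Neighborhood': ['Halles', 'Mail', 'Palais-Royal']})

# Hypothetical labels, one per row of `grouped` (the order k-means sees)
labels = [0, 1, 2]

# Attach each label to its neighborhood name first, then merge on the name
labelled = grouped[['Neighborhood']].assign(**{'Cluster Labels': labels})
merged = df.merge(labelled, on='Neighborhood', how='left')

print(merged[['Neighborhood', 'Cluster Labels']])
```

With a positional assignment, Palais-Royal and Mail would swap labels in this toy example. Merging on the name keeps each label with its neighborhood regardless of row order.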


# V. Results

With the clusters computed, we can now examine the results.

## V.1. Examine Clusters

Let's examine each cluster and the restaurant categories that distinguish it.

### Cluster 1

In [58]:
cluster_1 = paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[2] + list(range(5, paris_merged.shape[1]))]]
cluster_1

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Palais-Royal,0,French Restaurant,Japanese Restaurant,Café,Italian Restaurant,Korean Restaurant,Ramen Restaurant,Restaurant,Bakery,Bistro,Udon Restaurant
11,Arts-et-Metiers,0,French Restaurant,Japanese Restaurant,Chinese Restaurant,Italian Restaurant,Vietnamese Restaurant,Restaurant,Café,Moroccan Restaurant,Burger Joint,Bistro
15,Arsenal,0,French Restaurant,Italian Restaurant,Gastropub,Bakery,Pizza Place,Seafood Restaurant,Tapas Restaurant,Sushi Restaurant,Japanese Restaurant,Café
16,Jardin-des-Plantes,0,French Restaurant,Italian Restaurant,Café,Bakery,Korean Restaurant,Sushi Restaurant,Sandwich Place,Greek Restaurant,Restaurant,Mediterranean Restaurant
17,Sorbonne,0,French Restaurant,Café,Japanese Restaurant,Burger Joint,Italian Restaurant,Creperie,Bakery,Bistro,Mexican Restaurant,Sandwich Place
20,Odeon,0,French Restaurant,Café,Italian Restaurant,Bistro,Bakery,Pizza Place,Argentinian Restaurant,Vietnamese Restaurant,Seafood Restaurant,Burger Joint
22,Notre-Dame-des-Champs,0,French Restaurant,Bakery,Café,Japanese Restaurant,Bistro,Italian Restaurant,Sandwich Place,Pizza Place,Creperie,Steakhouse
29,Madeleine,0,French Restaurant,Italian Restaurant,Café,Salad Place,Asian Restaurant,Sandwich Place,Bistro,Sushi Restaurant,Gastropub,Seafood Restaurant
30,Europe,0,French Restaurant,Italian Restaurant,Pizza Place,Sandwich Place,Thai Restaurant,Bakery,Salad Place,Bistro,Sushi Restaurant,Restaurant
32,Rochechouart,0,French Restaurant,Bakery,Italian Restaurant,Sandwich Place,Bistro,Vegetarian / Vegan Restaurant,Pizza Place,Japanese Restaurant,Restaurant,Café


In [59]:
cluster_1.describe(include='all')

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,20,20.0,20,20,20,20,20,20,20,20,20,20
unique,20,,4,8,13,11,14,16,15,15,16,15
top,Arsenal,,French Restaurant,Italian Restaurant,Italian Restaurant,Japanese Restaurant,Bakery,Café,Bakery,Bakery,Japanese Restaurant,Café
freq,1,,17,7,4,3,3,2,3,2,3,3
mean,,0.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,0.0,,,,,,,,,,
25%,,0.0,,,,,,,,,,
50%,,0.0,,,,,,,,,,
75%,,0.0,,,,,,,,,,


### Cluster 2

In [60]:
cluster_2 = paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[2] + list(range(5, paris_merged.shape[1]))]]
cluster_2

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Enfants-Rouges,1,French Restaurant,Japanese Restaurant,Bistro,Café,Italian Restaurant,Bakery,Burger Joint,Pizza Place,Moroccan Restaurant,Sandwich Place
12,Notre-Dame,1,French Restaurant,Japanese Restaurant,Italian Restaurant,Bakery,Café,Bistro,Mexican Restaurant,Seafood Restaurant,Burger Joint,Diner
25,Invalides,1,French Restaurant,Café,Italian Restaurant,Bakery,Restaurant,Cafeteria,Vietnamese Restaurant,Japanese Restaurant,Food Court,Diner
26,Saint-Thomas-d'Aquin,1,French Restaurant,Café,Italian Restaurant,Bakery,Restaurant,American Restaurant,Salad Place,Brasserie,Sandwich Place,Pizza Place
38,Porte-Saint-Martin,1,French Restaurant,Italian Restaurant,Bistro,Pizza Place,Asian Restaurant,Breakfast Spot,Restaurant,Bakery,Mexican Restaurant,Burger Joint
48,Gare,1,Japanese Restaurant,Café,French Restaurant,Fast Food Restaurant,Italian Restaurant,Vietnamese Restaurant,Sandwich Place,Bakery,Thai Restaurant,Restaurant
57,Grenelle,1,French Restaurant,Korean Restaurant,Japanese Restaurant,Bistro,Pizza Place,Italian Restaurant,Bakery,Brasserie,Middle Eastern Restaurant,Ethiopian Restaurant
62,Porte-Dauphine,1,Café,Pizza Place,French Restaurant,Wings Joint,Fondue Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Eastern European Restaurant
78,Charonne,1,Bakery,Pizza Place,French Restaurant,Fast Food Restaurant,Japanese Restaurant,Café,Food,Brazilian Restaurant,Brasserie,Bistro


In [61]:
cluster_2.describe(include='all')

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,9,9.0,9,9,9,9,9,9,9,9,9,9
unique,9,,4,5,4,6,7,9,9,7,9,8
top,Porte-Dauphine,,French Restaurant,Café,French Restaurant,Bakery,Italian Restaurant,Vietnamese Restaurant,Mexican Restaurant,Brasserie,Mexican Restaurant,Diner
freq,1,,6,3,3,3,2,1,1,2,1,2
mean,,1.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,1.0,,,,,,,,,,
25%,,1.0,,,,,,,,,,
50%,,1.0,,,,,,,,,,
75%,,1.0,,,,,,,,,,


### Cluster 3

In [62]:
cluster_3 = paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[2] + list(range(5, paris_merged.shape[1]))]]
cluster_3

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Place-Vendôme,2,French Restaurant,Japanese Restaurant,Sandwich Place,Café,Italian Restaurant,Bakery,Korean Restaurant,Restaurant,Salad Place,Burger Joint
3,St-Germain-l'Auxerrois,2,French Restaurant,Café,Italian Restaurant,Japanese Restaurant,Sandwich Place,Chinese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Fast Food Restaurant
5,Bonne-Nouvelle,2,French Restaurant,Italian Restaurant,Bakery,Thai Restaurant,Japanese Restaurant,Pizza Place,Burger Joint,Café,Chinese Restaurant,Restaurant
7,Vivienne,2,French Restaurant,Japanese Restaurant,Italian Restaurant,Bistro,Korean Restaurant,Ramen Restaurant,Vietnamese Restaurant,Creperie,Café,Salad Place
9,Archives,2,French Restaurant,Italian Restaurant,Café,Bistro,Bakery,Burger Joint,Japanese Restaurant,Restaurant,Pizza Place,Vegetarian / Vegan Restaurant
10,Sainte-Avoie,2,French Restaurant,Café,Restaurant,Italian Restaurant,Chinese Restaurant,Bistro,Burger Joint,Bakery,Asian Restaurant,Japanese Restaurant
13,Saint-Gervais,2,French Restaurant,Italian Restaurant,Bakery,Bistro,Falafel Restaurant,Gastropub,Café,Burger Joint,Seafood Restaurant,Sushi Restaurant
14,Saint-Merri,2,French Restaurant,Café,Bakery,Burger Joint,Italian Restaurant,Restaurant,Portuguese Restaurant,Bistro,Pizza Place,Sushi Restaurant
18,Val-de-Grace,2,French Restaurant,Café,Italian Restaurant,Bistro,Asian Restaurant,Creperie,Chinese Restaurant,Turkish Restaurant,Sushi Restaurant,Falafel Restaurant
19,Saint-Victor,2,French Restaurant,Italian Restaurant,Café,Bakery,Vietnamese Restaurant,Japanese Restaurant,Brasserie,Falafel Restaurant,Burger Joint,Bistro


In [63]:
cluster_3.describe(include='all')

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,46,46.0,46,46,46,46,46,46,46,46,46,46
unique,46,,4,13,12,16,20,19,24,26,27,31
top,Clignancourt,,French Restaurant,Italian Restaurant,Italian Restaurant,Bistro,Japanese Restaurant,Japanese Restaurant,Diner,Restaurant,Salad Place,Sushi Restaurant
freq,1,,41,14,10,7,7,8,4,4,3,4
mean,,2.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,2.0,,,,,,,,,,
25%,,2.0,,,,,,,,,,
50%,,2.0,,,,,,,,,,
75%,,2.0,,,,,,,,,,


### Cluster 4

In [64]:
cluster_4 = paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[2] + list(range(5, paris_merged.shape[1]))]]
cluster_4

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Halles,3,French Restaurant,Italian Restaurant,Pizza Place,Bistro,Bakery,Chinese Restaurant,Café,Burger Joint,Thai Restaurant,Japanese Restaurant
54,Parc-de-Montsouris,3,Italian Restaurant,Restaurant,Middle Eastern Restaurant,Pizza Place,Vietnamese Restaurant,Cafeteria,Sushi Restaurant,Doner Restaurant,French Restaurant,Diner


In [65]:
cluster_4.describe(include='all')

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,2,2.0,2,2,2,2,2,2,2,2,2,2
unique,2,,2,2,2,2,2,2,2,2,2,2
top,Parc-de-Montsouris,,French Restaurant,Restaurant,Middle Eastern Restaurant,Bistro,Bakery,Chinese Restaurant,Sushi Restaurant,Doner Restaurant,French Restaurant,Japanese Restaurant
freq,1,,1,1,1,1,1,1,1,1,1,1
mean,,3.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,3.0,,,,,,,,,,
25%,,3.0,,,,,,,,,,
50%,,3.0,,,,,,,,,,
75%,,3.0,,,,,,,,,,


### Cluster 5

In [66]:
cluster_5 = paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[2] + list(range(5, paris_merged.shape[1]))]]
cluster_5

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Mail,4,French Restaurant,Italian Restaurant,Bistro,Bakery,Creperie,Thai Restaurant,Salad Place,Burger Joint,Restaurant,Asian Restaurant
6,Gaillon,4,French Restaurant,Japanese Restaurant,Italian Restaurant,Korean Restaurant,Sandwich Place,Restaurant,Café,Vietnamese Restaurant,Ramen Restaurant,Burger Joint


In [67]:
cluster_5.describe(include='all')

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,2,2.0,2,2,2,2,2,2,2,2,2,2
unique,2,,1,2,2,2,2,2,2,2,2,2
top,Mail,,French Restaurant,Japanese Restaurant,Bistro,Bakery,Sandwich Place,Restaurant,Salad Place,Burger Joint,Restaurant,Asian Restaurant
freq,1,,2,1,1,1,1,1,1,1,1,1
mean,,4.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,4.0,,,,,,,,,,
25%,,4.0,,,,,,,,,,
50%,,4.0,,,,,,,,,,
75%,,4.0,,,,,,,,,,


## V.2. Visualizing Clusters

Finally, let's visualize the resulting clusters.

In [68]:
# create map
map_clusters = folium.Map(location=[paris_lat, paris_lon], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; rainbow[cluster-1] wraps label 0 around to the last color (index -1)
for lat, lon, poi, cluster in zip(paris_merged['Latitude'], paris_merged['Longitude'], paris_merged['Neighborhood'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# VI. Discussion

**MAP LEGEND**  
Cluster 1 - Red dots  
Cluster 2 - Purple dots  
Cluster 3 - Blue dots  
Cluster 4 - Green dots  
Cluster 5 - Orange dots
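
The legend follows from how the map cell indexes the color list: it uses `rainbow[cluster-1]`, so label 0 wraps around to the last color. Here is a minimal sketch of that lookup, assuming the five rainbow colors run from purple to red as the legend suggests (the color names stand in for the hex codes, and `legend_entry` is an illustrative helper).

```python
# Color names standing in for the five hex codes in `rainbow`, in list order
# (assumed order, inferred from the map legend)
rainbow_names = ['Purple', 'Blue', 'Green', 'Orange', 'Red']


def legend_entry(cluster_label):
    """Return the legend line for a 0-based k-means label."""
    # Same indexing as the map cell: label 0 -> index -1 -> last color
    color = rainbow_names[cluster_label - 1]
    return f"Cluster {cluster_label + 1} - {color} dots"


for label in range(5):
    print(legend_entry(label))
```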

When analyzing the most common restaurant types in each cluster, the stakeholder should favor the less common types as the safer choice: there is no sense in opening the 17th pizzeria on the same street. Of course, a location may have more than 10 types, and one might object that, by this logic, the stakeholder should pick the very last type in the full list rather than the 10th. But the further down the popularity list we go, the greater the risk that there is no demand for that type of food, and that we open a restaurant this particular location does not need. Interested customers are a must for a successful business. That is why our recommendations stop at the 9th and 10th positions.

Recommendations, based on the description of each cluster:
 - Cluster 1 Locations: __Japanese Restaurant__ or __Café__
 - Cluster 2 Locations: __Middle Eastern Restaurant__ or __Diner__
 - Cluster 3 Locations: __Pizza Place__ or __Sushi Restaurant__
 - Cluster 4 Locations: __French Restaurant__ or __Diner__
 - Cluster 5 Locations: __Ramen Restaurant__ or __Burger Joint__
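
The 9th and 10th positions can be tallied per cluster to see which category occurs most often there. This is a minimal sketch using the 10th most common venue of the nine Cluster 2 neighborhoods listed in the Results section; the variable names are illustrative.

```python
from collections import Counter

# 10th most common venue for each Cluster 2 neighborhood,
# copied from the Cluster 2 table in the Results section
tenth_venues = {
    'Enfants-Rouges': 'Sandwich Place',
    'Notre-Dame': 'Diner',
    'Invalides': 'Diner',
    "Saint-Thomas-d'Aquin": 'Pizza Place',
    'Porte-Saint-Martin': 'Burger Joint',
    'Gare': 'Restaurant',
    'Grenelle': 'Ethiopian Restaurant',
    'Porte-Dauphine': 'Eastern European Restaurant',
    'Charonne': 'Bistro',
}

# Count how often each category appears in the 10th position
tally = Counter(tenth_venues.values())
top_category, top_count = tally.most_common(1)[0]

print(top_category, top_count)
```

Diner is the only category that appears more than once in that position, which is what the `top` and `freq` rows of `describe` report for Cluster 2. When every category appears only once, as in the two-neighborhood clusters, `top` is just one of several tied entries.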

Once the type of restaurant is chosen, it is time to select the right place.

# VII. Conclusion

In this report we worked out a methodology to determine the most promising type of restaurant and where it should be opened.

We collected information about Paris neighborhoods from a government website and mapped them using geospatial libraries. Using the Foursquare API, we collected the top 100 restaurants and their types for each location within a radius of 300 meters of its central point. We then grouped the collected restaurants by location and took the mean frequency of occurrence of each type to prepare the data for clustering. Finally, we clustered the neighborhoods with the k-means algorithm and analyzed the top 10 most common restaurant types in each cluster, making useful observations. We also visualized the clusters on a map, showing the best locations for opening the chosen type of restaurant.

This type of analysis can be applied to any city that has geospatial information available, and to any type of venue (shopping, clubs, etc.) available in the Foursquare database.