<a href="https://colab.research.google.com/github/mojanens/Coursera_capstone/blob/main/Battle_of_the_neighborhoods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Business problem description:**

My client wants to open a new yoga studio in Paris. The task is to find the best location for it: A neighborhood where there is least existing competition (few or no yoga studios at all), but potential demand for yoga studio (similar venues are trending in Foursquare than in those neigborhoods that yoga studios are also trending high).

**Data used to solve the problem:**

The trendind venues and existing yoga studios are searched for from Foursquare Paris data. The city of Paris is divided into 20 arrondissements. However, many of the arrondissements are so large in area that in finding the potential location for a yoga studio it is more appropriate to analyse the Foursquare data on quarter level:https://en.wikipedia.org/wiki/Quarters_of_Paris. 

The coordinates of the quarters can be retrieved using Geocoder. We will look at the radius of 500 m from the reference coordinates of the quarter. 500 m can be considered a distance you can take in order to go to one's favourite yoga studio. As some of the quarters are small in area, the 500-m circles may in some cases overlap, meaning that same venues may be included in several quarters. This does not matter in our case, as the cluster of the nearby services is most important for the analysis. Also, there may be areas that are not covered by any of those circles. The model is not perfect, but will provide a starting point for searching a location for a new yoga studio.

Install required packages

In [2]:
! pip3 install requests
! pip3 install beautifulsoup4



Import packages needed

In [3]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import json
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
import numpy as np # library to handle data in a vectorized manner
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors


Define Foursquare credentials and version

In [4]:
CLIENT_ID = 'LSGK0R44VUDVZPWZOLVBWFFCE5BM23X1N0MVDPRINXFHFYWB' 
CLIENT_SECRET = 'AQTYIEGCIE1SYL5KZKKE3DG1PQDNJISXC5ZPNIVEYBZQMH1F' 
VERSION = '20180604'
LIMIT = 100 

Import the html table into Pandas dataframe

In [5]:
url = 'https://fr.wikipedia.org/wiki/Liste_des_quartiers_administratifs_de_Paris'

dfs = pd.read_html(url)
df = dfs[0]
df.head()

Unnamed: 0,"Arrondissement[1],[n 1]",Quartiers,Quartiers.1,Population en1999 (hab.)[2],Superficie(ha)[2],Densitéhab/km2,Plan
0,1er arrondissementdit « du Louvre »,1er,Saint-Germain-l'Auxerrois,1 672,869,1 924,
1,1er arrondissementdit « du Louvre »,2e,Halles,8 984,412,21 806,
2,1er arrondissementdit « du Louvre »,3e,Palais-Royal,3 195,274,11 661,
3,1er arrondissementdit « du Louvre »,4e,Place-Vendôme,3 044,269,11 316,
4,2e arrondissementdit « de la Bourse »,5e,Gaillon,1 345,188,7 154,


Drop the columns that do not contain essential information for this exercise (population, surface, habitation density and map)

In [6]:
df_paris = df.drop(columns=['Population en1999 (hab.)[2]',	'Superficie(ha)[2]','Densitéhab/km2',	'Plan'])
df_paris.head()

Unnamed: 0,"Arrondissement[1],[n 1]",Quartiers,Quartiers.1
0,1er arrondissementdit « du Louvre »,1er,Saint-Germain-l'Auxerrois
1,1er arrondissementdit « du Louvre »,2e,Halles
2,1er arrondissementdit « du Louvre »,3e,Palais-Royal
3,1er arrondissementdit « du Louvre »,4e,Place-Vendôme
4,2e arrondissementdit « de la Bourse »,5e,Gaillon


Find the geospatial coordinates for the quartiers

In [7]:
address = 'Palais-Royal, FR'

#for address in addresses
geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Palais-Royal are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Palais-Royal are 48.863584700000004, 2.3362042200938715.


In [8]:
df_addresses = df_paris.drop(columns=['Arrondissement[1],[n 1]',	'Quartiers'])
address_list = df_addresses.values.tolist()
address_list

[["Saint-Germain-l'Auxerrois"],
 ['Halles'],
 ['Palais-Royal'],
 ['Place-Vendôme'],
 ['Gaillon'],
 ['Vivienne'],
 ['Mail'],
 ['Bonne-Nouvelle'],
 ['Arts-et-Métiers'],
 ['Enfants-Rouges'],
 ['Archives'],
 ['Sainte-Avoye'],
 ['Saint-Merri'],
 ['Saint-Gervais'],
 ['Arsenal'],
 ['Notre-Dame'],
 ['Saint-Victor'],
 ['Jardin-des-Plantes'],
 ['Val-de-Grâce'],
 ['Sorbonne'],
 ['Monnaie'],
 ['Odéon'],
 ['Notre-Dame-des-Champs'],
 ['Saint-Germain-des-Prés'],
 ["Saint-Thomas-d'Aquin"],
 ['Invalides'],
 ['École-Militaire'],
 ['Gros-Caillou'],
 ['Champs-Élysées'],
 ['Faubourg-du-Roule'],
 ['Madeleine'],
 ['Europe'],
 ['Saint-Georges'],
 ["Chaussée-d'Antin"],
 ['Faubourg-Montmartre'],
 ['Rochechouart'],
 ['Saint-Vincent-de-Paul'],
 ['Porte-Saint-Denis'],
 ['Porte-Saint-Martin'],
 ['Hôpital-Saint-Louis'],
 ['Folie-Méricourt'],
 ['Saint-Ambroise'],
 ['Roquette'],
 ['Sainte-Marguerite'],
 ['Bel-Air'],
 ['Picpus'],
 ['Bercy'],
 ['Quinze-Vingts'],
 ['Salpêtrière'],
 ['Gare'],
 ['Maison-Blanche'],
 ['Croul

In [9]:

table_contents = []
geolocator = Nominatim(user_agent="paris_explorer")
for address in address_list:
  addr = str(address)+', Paris, FR' 
  cell = {}
  location = geolocator.geocode(addr)
  latitude = location.latitude
  longitude = location.longitude
  cell['Quartier'] = str(address)
  cell['Latidude'] = latitude
  cell['Longitude'] = longitude
  table_contents.append(cell)
#print('The geograpical coordinate of ', address, 'are {}, {}.'.format(latitude, longitude))
df_loc=pd.DataFrame(table_contents)

In [11]:
df_loc.head(10)

Unnamed: 0,Quartier,Latidude,Longitude
0,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299
1,['Halles'],48.862466,2.346009
2,['Palais-Royal'],48.863585,2.336204
3,['Place-Vendôme'],48.867463,2.329428
4,['Gaillon'],48.869135,2.332909
5,['Vivienne'],48.868859,2.339363
6,['Mail'],48.868054,2.344593
7,['Bonne-Nouvelle'],48.870623,2.34875
8,['Arts-et-Métiers'],48.865441,2.356132
9,['Enfants-Rouges'],48.864241,2.362585


function that extracts the category of each of the venues; borrowed from the New York exercise


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Quartier', 
                  'Quartier Latitude', 
                  'Quartier Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [13]:
paris_venues = getNearbyVenues(names=df_loc['Quartier'],
                                   latitudes=df_loc['Latidude'],
                                   longitudes=df_loc['Longitude']
                                  )

["Saint-Germain-l'Auxerrois"]
['Halles']
['Palais-Royal']
['Place-Vendôme']
['Gaillon']
['Vivienne']
['Mail']
['Bonne-Nouvelle']
['Arts-et-Métiers']
['Enfants-Rouges']
['Archives']
['Sainte-Avoye']
['Saint-Merri']
['Saint-Gervais']
['Arsenal']
['Notre-Dame']
['Saint-Victor']
['Jardin-des-Plantes']
['Val-de-Grâce']
['Sorbonne']
['Monnaie']
['Odéon']
['Notre-Dame-des-Champs']
['Saint-Germain-des-Prés']
["Saint-Thomas-d'Aquin"]
['Invalides']
['École-Militaire']
['Gros-Caillou']
['Champs-Élysées']
['Faubourg-du-Roule']
['Madeleine']
['Europe']
['Saint-Georges']
["Chaussée-d'Antin"]
['Faubourg-Montmartre']
['Rochechouart']
['Saint-Vincent-de-Paul']
['Porte-Saint-Denis']
['Porte-Saint-Martin']
['Hôpital-Saint-Louis']
['Folie-Méricourt']
['Saint-Ambroise']
['Roquette']
['Sainte-Marguerite']
['Bel-Air']
['Picpus']
['Bercy']
['Quinze-Vingts']
['Salpêtrière']
['Gare']
['Maison-Blanche']
['Croulebarbe']
['Montparnasse']
['Parc-de-Montsouris']
['Petit-Montrouge']
['Plaisance']
['Saint-Lambert']
['

In [14]:
print(paris_venues.shape)
paris_venues.head()

(5188, 7)


Unnamed: 0,Quartier,Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299,Cour Carrée du Louvre,48.86036,2.338543,Pedestrian Plaza
1,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299,La Vénus de Milo (Vénus de Milo),48.859943,2.337234,Exhibit
2,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299,Musée du Louvre,48.860847,2.33644,Art Museum
3,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299,Pylones,48.861419,2.334143,Gift Shop
4,"[""Saint-Germain-l'Auxerrois""]",48.860211,2.336299,Pont des Arts,48.858565,2.337635,Bridge


Find the yoga studios in Paris and in which quartiers they are located

In [17]:
paris_venues[paris_venues['Venue Category']== 'Yoga Studio']

Unnamed: 0,Quartier,Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1399,['Notre-Dame'],48.852937,2.35005,Rasa Yoga,48.851454,2.346311,Yoga Studio
1647,['Sorbonne'],48.849123,2.345325,Rasa Yoga,48.851454,2.346311,Yoga Studio
3313,['Folie-Méricourt'],48.86738,2.373423,Ashtanga Yoga Paris,48.865272,2.373232,Yoga Studio
3426,['Saint-Ambroise'],48.861052,2.374796,Ashtanga Yoga Paris,48.865272,2.373232,Yoga Studio


In [18]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Quartier'] = df_paris['Quartiers.1'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Quartier,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basque Restaurant,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,...,Sports Club,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Trattoria/Osteria,Travel Agency,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Saint-Germain-l'Auxerrois,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Halles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Palais-Royal,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Place-Vendôme,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Gaillon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
paris_grouped = paris_onehot.groupby('Quartier').mean().reset_index()
paris_grouped

Unnamed: 0,Quartier,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basque Restaurant,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,...,Sports Club,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Syrian Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Trattoria/Osteria,Travel Agency,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Amérique,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Archives,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Arsenal,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Arts-et-Métiers,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Auteuil,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Val-de-Grâce,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
76,Villette,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
77,Vivienne,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
78,École-Militaire,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Function for the most common venues (borrowed from the New York exercise)

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [24]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Quartier']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
quartiers_venues_sorted = pd.DataFrame(columns=columns)
quartiers_venues_sorted['Quartier'] = paris_grouped['Quartier']

for ind in np.arange(paris_grouped.shape[0]):
    quartiers_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

quartiers_venues_sorted.head()

Unnamed: 0,Quartier,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amérique,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop
1,Archives,Historic Site,Zoo Exhibit,Donut Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit
2,Arsenal,Church,Donut Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Zoo Exhibit
3,Arts-et-Métiers,Art Museum,Dry Cleaner,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Zoo Exhibit
4,Auteuil,Restaurant,Zoo Exhibit,Ethiopian Restaurant,Dry Cleaner,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Exhibit


Divide Parisian quartiers in 5 similar clusters by their 10 most common venues

In [25]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop('Quartier', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 4, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [26]:
quartiers_venues_sorted.insert(1, 'Cluster Labels', kmeans.labels_)

quartiers_venues_sorted.head()


Unnamed: 0,Quartier,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amérique,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop
1,Archives,4,Historic Site,Zoo Exhibit,Donut Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit
2,Arsenal,0,Church,Donut Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Zoo Exhibit
3,Arts-et-Métiers,0,Art Museum,Dry Cleaner,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Zoo Exhibit
4,Auteuil,0,Restaurant,Zoo Exhibit,Ethiopian Restaurant,Dry Cleaner,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Exhibit


In [27]:

df_loc['Quartier'] = df_paris['Quartiers.1'] 
df_paris_merged = df_loc

df_paris_merged = df_paris_merged.join(quartiers_venues_sorted.set_index('Quartier'), on='Quartier', how ='inner')

df_paris_merged.head() 

Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Saint-Germain-l'Auxerrois,48.860211,2.336299,0,Pedestrian Plaza,Ethiopian Restaurant,Dry Cleaner,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Zoo Exhibit,Donut Shop
1,Halles,48.862466,2.346009,0,Exhibit,Zoo Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop
2,Palais-Royal,48.863585,2.336204,0,Art Museum,Dry Cleaner,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Zoo Exhibit
3,Place-Vendôme,48.867463,2.329428,0,Gift Shop,Zoo Exhibit,Fountain,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit
4,Gaillon,48.869135,2.332909,0,Bridge,Zoo Exhibit,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Farmers Market


In [28]:
address = 'Paris, FR'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Paris are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Paris are 48.8566969, 2.3514616.


In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_paris_merged['Latidude'], df_paris_merged['Longitude'], df_paris_merged['Quartier'], df_paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [30]:
df_arrondissement = df_paris['Arrondissement[1],[n 1]']

In [31]:
df_paris_merged['Arrondissement']=df_arrondissement

Find the clusters of those quartiers where the yoga studios are located (Notre-Dame, Sorbonne, Folie-Méricourt and Saint-Ambroise)

In [37]:
df_paris_merged[df_paris_merged['Quartier']== 'Notre-Dame']

Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Arrondissement
15,Notre-Dame,48.852937,2.35005,0,Coffee Shop,Zoo Exhibit,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,4e arrondissementdit « de l'Hôtel-de-Ville »


In [42]:
df_paris_merged[df_paris_merged['Quartier']== 'Sorbonne']


Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Arrondissement
19,Sorbonne,48.849123,2.345325,2,Hotel,Zoo Exhibit,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,5e arrondissementdit « du Panthéon »


In [38]:
df_paris_merged[df_paris_merged['Quartier']== 'Folie-Méricourt']

Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Arrondissement
40,Folie-Méricourt,48.86738,2.373423,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,11e arrondissementdit « de Popincourt »


In [39]:
df_paris_merged[df_paris_merged['Quartier']== 'Saint-Ambroise']

Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Arrondissement
41,Saint-Ambroise,48.861052,2.374796,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,11e arrondissementdit « de Popincourt »


We can conclude that there are yoga studios in quartiers in clusters 0, 2 and 3.

There are only 4 venues classified as yoga studios in Paris, but two of them are located in cluster 3 quartiers. To simplify our search, let's take a closer look into cluster 3.The most recommended places for a yoga studio would be the quartiers of cluster 3 that do not have yet a yoga studio. The potential for the yoga studio location is estimated in the most common venues perspective. 


In [41]:
df_paris_merged.loc[df_paris_merged['Cluster Labels'] == 3, df_paris_merged.columns[[0] + list(range(14, df_paris_merged.shape[1]))]]

Unnamed: 0,Quartier,Arrondissement
22,Notre-Dame-des-Champs,6e arrondissementdit « du Luxembourg »
29,Faubourg-du-Roule,8e arrondissementdit « de l'Élysée »
40,Folie-Méricourt,11e arrondissementdit « de Popincourt »
41,Saint-Ambroise,11e arrondissementdit « de Popincourt »
49,Gare,13e arrondissementdit « des Gobelins »
59,Javel,15e arrondissementdit « de Vaugirard »
70,Goutte-d'Or,18e arrondissementdit « des Buttes-Montmartre »
71,Chapelle,18e arrondissementdit « des Buttes-Montmartre »
74,Amérique,19e arrondissementdit « des Buttes-Chaumont »
75,Combat,19e arrondissementdit « des Buttes-Chaumont »


Let's print the quartiers of cluster 3 that do not have yet a yoga studio, but would be potential places for it - in the most common venues perspective.

In [54]:
df_3=df_paris_merged.loc[df_paris_merged['Cluster Labels'] == 3]

df_3.drop(index=40,axis=1, inplace=True)

df_3.drop(index=41,axis=1, inplace=True)

df_3

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Quartier,Latidude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Arrondissement
22,Notre-Dame-des-Champs,48.844688,2.328831,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,6e arrondissementdit « du Luxembourg »
29,Faubourg-du-Roule,48.874026,2.302803,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,8e arrondissementdit « de l'Élysée »
49,Gare,48.827898,2.372877,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,13e arrondissementdit « des Gobelins »
59,Javel,48.839247,2.27905,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,15e arrondissementdit « de Vaugirard »
70,Goutte-d'Or,48.892676,2.35604,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,18e arrondissementdit « des Buttes-Montmartre »
71,Chapelle,48.884407,2.360306,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,18e arrondissementdit « des Buttes-Montmartre »
74,Amérique,48.882424,2.394025,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,19e arrondissementdit « des Buttes-Chaumont »
75,Combat,48.877421,2.37102,3,French Restaurant,Exhibit,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Falafel Restaurant,Donut Shop,19e arrondissementdit « des Buttes-Chaumont »


Display the potential yoga studio locations on a map

In [58]:
# create map of potential yoga studio locations
map_yoga = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map

for lat, lon, poi in zip(df_3['Latidude'], df_3['Longitude'], df_3['Quartier']):
    label = folium.Popup(str(poi) + ' Cluster ', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_yoga)
       
map_yoga