Capstone Project. Real Estate Agent Helper
========================

# 1. Introduction. Business Problem

## *Problem description and Business understanding*

A real estate agent generally has to be able to provide all the available information about the characteristics of an accommodation, such as the area, the number of rooms, and whether there is a garden or a balcony. Agents usually have easy access to this kind of information.

However, agents sometimes have to face specific client demands about the characteristics of the surrounding location. This information is harder to access and to summarize in a way that satisfies the demand and is useful to the client.

Examples of real estate client demands concerning locations are:

* How many primary and high schools are there for my kids? Are there parks around?
* Are there many bars and restaurants around to enjoy the nightlife?
* What about cultural venues? I love going to the theater and the cinema. Are there many options around?

**The general question is: how can we provide useful information about locations in response to a specific real estate client demand? In other words, is the future accommodation suitable for a client's specific needs?**

We can identify different kinds of needs. We start by defining three (3) kinds of needs, or scenarios:

* Family and cultural environment: is the location suitable for kids and families? Are there parks and schools? Is it safe? Are there museums, cinemas and theaters?
* Nightlife environment: is the location suitable for a person who enjoys going out frequently? How many bars, restaurants and discos are there?
* Service facilities: how many service facilities are there, such as hotels, banks, train stations, spas or gyms?

This solution also compares how similar any particular location is to another with respect to these scenarios.

## *Project environment. Analytic approach*

**Where to implement this solution**

This solution is suitable for any city that contains many different kinds of locations within a defined area. For this project, the city of Paris will be explored.

* Why Paris? It is a city in which each location has its own particularity and meets different kinds of needs.

#### Analytic Approach:

For this project, a clustering machine learning solution is needed in order to group the different neighborhoods for the three defined scenarios, as well as for a general clustering that considers all venues.

In the next section we explain the data requirements and how to collect the data needed to answer our problem.

# 2. Data requirements, sources and collection

## *Data requirements*

In order to solve the problem, we have to collect, combine and analyze two kinds of information/datasets:

* *Demographic information:* for Paris, or any other city, we have to determine how it is divided. Is it divided only into neighborhoods, or into other zones as well? After some web research, we can say that the city of Paris is divided into districts ("arrondissements") and "quartiers", which we can translate as neighborhoods.


* *Location venues:* we need a dataset that provides all the different kinds of venues for a particular location. For this project we will use the Foursquare location data used in this capstone module.

## *Data sources and collection*

* For the demographic information we can use two sources or methods:

    - Web scraping of websites like Wikipedia: there are many Wikipedia pages that provide tables of the different districts of Paris and their names, like https://fr.wikipedia.org/wiki/Liste_des_quartiers_administratifs_de_Paris, or
    
    - A public dataset from the Paris city hall website https://opendata.paris.fr/explore/dataset/quartier_paris/table/ that contains all the "quartiers administratifs" (official neighborhoods) of the city. This public data source offers the dataset in different formats (csv, json, excel, etc.). We are going to use the csv file, which provides the district number, the neighborhood name and the coordinates (latitude and longitude). **This is the method that I will use, since it provides all the necessary information.**
    
    
* For the venues:
    - As mentioned, we are going to use the Foursquare dataset through its API, with the explore URL request:
        
    https://api.foursquare.com/v2/venues/explore?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&ll=LATITUDE,LONGITUDE&v=VERSION&limit=LIMIT
    
    
* The latitude and longitude information is obtained from the demographic dataset.
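The explore request above can also be assembled with the standard library instead of by string formatting. This is a minimal sketch that only reuses the parameter names from the URL template; the helper name `build_explore_url` and the example values are hypothetical placeholders, not real credentials.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng, version, limit, radius=None):
    """Hypothetical helper: build the explore URL from the template above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # latitude,longitude taken from the neighborhood dataset
        "v": version,
        "limit": limit,
    }
    if radius is not None:
        params["radius"] = radius  # optional search radius, as used by getNearbyVenues
    # safe="," keeps the comma in "ll" readable instead of percent-encoding it
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

url = build_explore_url("CLIENT_ID", "CLIENT_SECRET", 48.855719, 2.358162, "VERSION", 100, radius=500)
print(url)
```

Using `urlencode` keeps every value escaped consistently, which matters if a secret ever contains characters such as `&` or `=`.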

In [1]:
#import packages and libraries for dataframe processing
!pip install pandas
import pandas as pd
import numpy as np
import requests



In [2]:
paris_data_url="https://www.data.gouv.fr/fr/datasets/r/a3b31fdc-85dc-4aeb-94c6-a8b57aebef77"
paris_df = pd.read_csv(paris_data_url, sep=';')
print(paris_df.shape)
paris_df.head()

(80, 10)


Unnamed: 0,n_sq_qu,c_qu,c_quinsee,l_qu,c_ar,n_sq_ar,perimetre,surface,geom_x_y,geom
0,750000014,14,7510402,Saint-Gervais,4,750000004,2678.340923,422028.2,"48.8557186509,2.35816233385","{""type"": ""Polygon"", ""coordinates"": [[[2.363764..."
1,750000025,25,7510701,Saint-Thomas-d'Aquin,7,750000007,3827.253353,826559.4,"48.8552632694,2.32558765258","{""type"": ""Polygon"", ""coordinates"": [[[2.322133..."
2,750000038,38,7511002,Porte-Saint-Denis,10,750000010,2736.292954,472113.6,"48.873617661,2.35228289495","{""type"": ""Polygon"", ""coordinates"": [[[2.355344..."
3,750000001,1,7510101,Saint-Germain-l'Auxerrois,1,750000001,5057.549475,869000.7,"48.8606501352,2.33491032928","{""type"": ""Polygon"", ""coordinates"": [[[2.344593..."
4,750000073,73,7511901,Villette,19,750000019,5191.01883,1285705.0,"48.8876610888,2.37446821213","{""type"": ""Polygon"", ""coordinates"": [[[2.370498..."


In [3]:
#The columns n_sq_qu, c_qu, c_quinsee, n_sq_ar, perimetre, surface and geom are not necessary, so we delete them
paris_df.drop(['n_sq_qu', 'c_qu', 'c_quinsee', 'n_sq_ar', 'perimetre', 'surface','geom'], axis='columns', inplace=True)

In [4]:
#We are going to rename the columns 
paris_df = paris_df.rename(columns = {'l_qu': 'Neighborhood', 'c_ar':'District', 'geom_x_y':'coor'})
paris_df.head()

Unnamed: 0,Neighborhood,District,coor
0,Saint-Gervais,4,"48.8557186509,2.35816233385"
1,Saint-Thomas-d'Aquin,7,"48.8552632694,2.32558765258"
2,Porte-Saint-Denis,10,"48.873617661,2.35228289495"
3,Saint-Germain-l'Auxerrois,1,"48.8606501352,2.33491032928"
4,Villette,19,"48.8876610888,2.37446821213"


In [5]:
#We see that the coordinates are in the same column. We split this into two columns and then we can drop the "coor" column
paris_df[['Latitude', 'Longitude']]= paris_df.coor.str.split(",", expand=True)
paris_df.drop(['coor'], axis='columns', inplace= True)

In [6]:
#We change the data type of Latitude and Longitude columns to float
paris_df['Latitude'] = paris_df['Latitude'].astype('float64')
paris_df['Longitude'] = paris_df['Longitude'].astype('float64')
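The split and the type conversion can also be chained in one step. A small sketch on a toy frame; the two rows are copied from the output above, and the name `toy` is hypothetical.

```python
import pandas as pd

# Toy frame mirroring the "coor" column of paris_df ("lat,lng" strings)
toy = pd.DataFrame({
    "Neighborhood": ["Saint-Gervais", "Villette"],
    "coor": ["48.8557186509,2.35816233385", "48.8876610888,2.37446821213"],
})

# Split on the comma into two columns and convert both to float64 at once
toy[["Latitude", "Longitude"]] = toy["coor"].str.split(",", expand=True).astype("float64")
toy = toy.drop(columns="coor")
print(toy.dtypes)
```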

In [7]:
paris_df.dtypes

Neighborhood     object
District          int64
Latitude        float64
Longitude       float64
dtype: object

## We have now obtained the final Paris demographic dataframe

In [8]:
print(paris_df.shape)
paris_df.head(10)

(80, 4)


Unnamed: 0,Neighborhood,District,Latitude,Longitude
0,Saint-Gervais,4,48.855719,2.358162
1,Saint-Thomas-d'Aquin,7,48.855263,2.325588
2,Porte-Saint-Denis,10,48.873618,2.352283
3,Saint-Germain-l'Auxerrois,1,48.86065,2.33491
4,Villette,19,48.887661,2.374468
5,Val-de-Grâce,5,48.841684,2.343861
6,Necker,15,48.842711,2.310777
7,Père-Lachaise,20,48.863719,2.395273
8,La Chapelle,18,48.894012,2.364387
9,Europe,8,48.878148,2.317175


## Let's plot the different neighborhoods on a map

In [9]:
#We import the libraries
!conda install -c conda-forge folium=0.5.0 --yes 
import folium
!pip install pgeocode
import pgeocode




In [10]:
h = pgeocode.Nominatim('fr')

loc = h.query_postal_code("75001") #Postal code of central Paris (1st arrondissement)
map_Paris = folium.Map(location=[loc.latitude, loc.longitude],zoom_start=13)

for lat, lng, label in zip(paris_df['Latitude'], paris_df['Longitude'], paris_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Paris)

map_Paris

## Now let's explore the venues using the Foursquare API

In [11]:
# The cell defining CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT was removed by Watson Studio for sharing.

### We create a function in order to retrieve the venues for the neighborhoods


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
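The nested lookup inside `getNearbyVenues` follows the shape of the explore response: `response`, then `groups[0]`, then `items`, each item holding a `venue`. The sketch below runs the same extraction on a hand-written mock response, so it needs no credentials; the mock contains only the fields the function reads and is an assumption, not a real API reply. The hypothetical helper `extract_venues` also skips venues with an empty `categories` list, which would otherwise raise an `IndexError` at `['categories'][0]`.

```python
# Hand-written mock containing only the keys that getNearbyVenues reads (not a real API reply)
mock_json = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Tasca",
                           "location": {"lat": 48.856686, "lng": 2.356374},
                           "categories": [{"name": "Portuguese Restaurant"}]}},
                {"venue": {"name": "Unnamed spot",
                           "location": {"lat": 48.8560, "lng": 2.3570},
                           "categories": []}},  # no category: skipped below
            ]
        }]
    }
}

def extract_venues(name, lat, lng, payload):
    """Return (neighborhood, lat, lng, venue, venue lat, venue lng, category) tuples."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        if not venue["categories"]:
            continue  # avoid IndexError on venues without a category
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     venue["categories"][0]["name"]))
    return rows

rows = extract_venues("Saint-Gervais", 48.855719, 2.358162, mock_json)
print(rows)
```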

## Now we get the venues for each neighborhood

In [13]:
neighborhood_names = paris_df['Neighborhood']
neighborhood_lats = paris_df['Latitude']
neighborhood_longs = paris_df['Longitude']
paris_venues = getNearbyVenues(neighborhood_names, neighborhood_lats, neighborhood_longs)

Saint-Gervais
Saint-Thomas-d'Aquin
Porte-Saint-Denis
Saint-Germain-l'Auxerrois
Villette
Val-de-Grâce
Necker
Père-Lachaise
La Chapelle
Europe
Sainte-Marguerite
Parc-de-Montsouris
Saint-Lambert
Monnaie
Odéon
Champs-Elysées
Maison-Blanche
Croulebarbe
Arsenal
Jardin-des-Plantes
Porte-Saint-Martin
Roquette
Picpus
Plaisance
Batignolles
Saint-Merri
Notre-Dame
Gros-Caillou
Mail
Bonne-Nouvelle
Gare
Clignancourt
Goutte-d'Or
Invalides
Faubourg-Montmartre
Gaillon
Amérique
Notre-Dame-des-Champs
Petit-Montrouge
Pont-de-Flandre
Ecole-Militaire
Muette
Grenelle
Chaillot
Auteuil
Epinettes
Sainte-Avoie
Hôpital-Saint-Louis
Belleville
Ternes
Arts-et-Métiers
Archives
Faubourg-du-Roule
Sorbonne
Saint-Georges
Chaussée-d'Antin
Palais-Royal
Folie-Méricourt
Salpêtrière
Place-Vendôme
Combat
Charonne
Javel
Vivienne
Enfants-Rouges
Saint-Germain-des-Prés
Saint-Vincent-de-Paul
Saint-Ambroise
Bel-Air
Montparnasse
Plaine de Monceaux
Saint-Victor
Madeleine
Saint-Fargeau
Porte-Dauphine
Grandes-Carrières
Quinze-Vingts
Roc

### Let's see the dataframe of the Paris venues

In [14]:
print(paris_venues.shape)
paris_venues.head(10)

(4885, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Gervais,48.855719,2.358162,Tasca,48.856686,2.356374,Portuguese Restaurant
1,Saint-Gervais,48.855719,2.358162,Miznon,48.857201,2.358957,Israeli Restaurant
2,Saint-Gervais,48.855719,2.358162,Murciano Boulangerie et Patisserie,48.856984,2.359789,Bakery
3,Saint-Gervais,48.855719,2.358162,Aux Merveilleux de Fred,48.855686,2.356369,Dessert Shop
4,Saint-Gervais,48.855719,2.358162,Autour du Saumon,48.855587,2.357802,Scandinavian Restaurant
5,Saint-Gervais,48.855719,2.358162,Jardin de l'Hôtel de Sens,48.853842,2.358404,Garden
6,Saint-Gervais,48.855719,2.358162,Florence Kahn,48.857242,2.359057,Deli / Bodega
7,Saint-Gervais,48.855719,2.358162,Grom,48.856737,2.356933,Ice Cream Shop
8,Saint-Gervais,48.855719,2.358162,Vingt Vins d'Art,48.855214,2.35794,Wine Bar
9,Saint-Gervais,48.855719,2.358162,Comme à Lisbonne,48.856767,2.356462,Café


Let's get the number of venues for each neighborhood

In [15]:
paris_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Amérique,10,10,10,10,10,10
Archives,100,100,100,100,100,100
Arsenal,64,64,64,64,64,64
Arts-et-Métiers,100,100,100,100,100,100
Auteuil,14,14,14,14,14,14
...,...,...,...,...,...,...
Sorbonne,100,100,100,100,100,100
Ternes,64,64,64,64,64,64
Val-de-Grâce,51,51,51,51,51,51
Villette,44,44,44,44,44,44
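Because `count()` repeats the same number in every column, `size()` gives the number of venues per neighborhood as a single Series. A sketch on a toy stand-in for `paris_venues` (the rows are illustrative, not the real results). Note that several neighborhoods above show exactly 100 venues, which may reflect the request `LIMIT` rather than the true number of venues.

```python
import pandas as pd

# Toy stand-in for paris_venues (illustrative rows only)
venues = pd.DataFrame({
    "Neighborhood": ["Amérique", "Amérique", "Archives", "Archives", "Archives"],
    "Venue Category": ["Supermarket", "Pool", "French Restaurant", "Hotel", "Bar"],
})

# One number per neighborhood instead of one identical number per column
venue_counts = venues.groupby("Neighborhood").size().sort_values(ascending=False)
print(venue_counts)
```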


Let's see the most frequent venue categories

In [16]:
df_paris_categories= paris_venues.groupby('Venue Category').count()
print(df_paris_categories.shape)
df_paris_categories.sort_values(by="Venue",ascending=False).head(15)

(302, 6)


Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
French Restaurant,676,676,676,676,676,676
Hotel,422,422,422,422,422,422
Italian Restaurant,194,194,194,194,194,194
Bar,157,157,157,157,157,157
Japanese Restaurant,134,134,134,134,134,134
Bakery,132,132,132,132,132,132
Café,120,120,120,120,120,120
Bistro,109,109,109,109,109,109
Plaza,106,106,106,106,106,106
Wine Bar,90,90,90,90,90,90


Let's get the total list of venue categories. Then we are going to define the list of venues for each of the defined needs. Recall that we defined three kinds of needs:

* Family and cultural environment: is the location suitable for kids and families? Are there parks and schools? Is it safe? Are there museums, cinemas and theaters?
* Nightlife environment: is the location suitable for a person who enjoys going out frequently? How many bars, restaurants and discos are there?
* Service facilities: how many service facilities are there, such as hotels, banks, train stations, spas or gyms?


In [17]:
paris_venues['Venue Category'].unique().tolist()


['Portuguese Restaurant',
 'Israeli Restaurant',
 'Bakery',
 'Dessert Shop',
 'Scandinavian Restaurant',
 'Garden',
 'Deli / Bodega',
 'Ice Cream Shop',
 'Wine Bar',
 'Café',
 'Bookstore',
 'Restaurant',
 'Art Museum',
 'French Restaurant',
 'Falafel Restaurant',
 'Italian Restaurant',
 'Cupcake Shop',
 'Coffee Shop',
 'Art Gallery',
 'Burger Joint',
 'Creperie',
 'Hotel',
 'Tea Room',
 'Chocolate Shop',
 'Pastry Shop',
 'Thai Restaurant',
 'Cultural Center',
 'Cocktail Bar',
 "Men's Store",
 'Cosmetics Shop',
 'Jewish Restaurant',
 'Burgundian Restaurant',
 'Sushi Restaurant',
 'Lyonese Bouchon',
 'Clothing Store',
 'Pedestrian Plaza',
 'Memorial Site',
 'Tapas Restaurant',
 'Miscellaneous Shop',
 'Gourmet Shop',
 'Pizza Place',
 'Salon / Barbershop',
 'Boutique',
 'Bistro',
 'Plaza',
 'Gastropub',
 'Bar',
 'Perfume Shop',
 'African Restaurant',
 'Pub',
 'Shoe Store',
 'Lingerie Store',
 'Furniture / Home Store',
 'Arts & Crafts Store',
 "Women's Store",
 'Cheese Shop',
 'Seafood Rest

## Let's now define the list of venues for each of the needs and environments

1. For the first category: family and cultural venues. We have:

In [18]:
paris_fam_cultural_venues=['Bookstore','Garden','Park','Theater','Art Museum','Historic Site','Pedestrian Plaza','Art Gallery','Indie Movie Theater',
                           'Museum','Concert Hall','Comedy Club','Arts & Crafts Store','Music Venue','Science Museum','Church','Cultural Center',
                           'Middle Eastern Restaurant','Playground']

2. For nightlife and restaurants, we have:

In [19]:
paris_night_rest_venues=['French Restaurant','Italian Restaurant','Bar','Japanese Restaurant','Café','Bistro','Plaza','Wine Bar',
                         'Restaurant','Coffee Shop','Pizza Place','Cocktail Bar','Sandwich Place','Thai Restaurant','Ice Cream Shop',
                         'Chinese Restaurant','Indian Restaurant','Vietnamese Restaurant','Tea Room','Burger Joint','Seafood Restaurant',
                         'Asian Restaurant','Creperie','Korean Restaurant','Sushi Restaurant','Dessert Shop','Salad Place',
                         'Vegetarian / Vegan Restaurant','Pub','Beer Bar','Tapas Restaurant','Hotel Bar','Moroccan Restaurant','Gastropub',
                         'Steakhouse','Mexican Restaurant','Diner','Brasserie','Lebanese Restaurant','Breakfast Spot','Fast Food Restaurant',
                         'Greek Restaurant','Falafel Restaurant','Food & Drink Shop','Mediterranean Restaurant','Argentinian Restaurant','Juice Bar',
                         'African Restaurant','Ethiopian Restaurant','American Restaurant','Nightclub','Noodle House','Liquor Store','Ramen Restaurant',
                         'Turkish Restaurant','Udon Restaurant','Cajun / Creole Restaurant','Lounge','Portuguese Restaurant','Scandinavian Restaurant',
                         'Bubble Tea Shop','Corsican Restaurant','Food Truck','Fountain','Israeli Restaurant','Movie Theater','Peruvian Restaurant',
                         'Basque Restaurant','Fish & Chips Shop','New American Restaurant','Southwestern French Restaurant']

3. And finally, for services and facilities, we have:

In [20]:
paris_services_venues = ['Hotel','Bakery','Supermarket','Pastry Shop','Clothing Store','Cheese Shop','Gym / Fitness Center','Boutique',
                         'Cosmetics Shop','Chocolate Shop','Gourmet Shop','Convenience Store','Spa','Farmers Market','Grocery Store','Wine Shop',
                         'Bagel Shop','Deli / Bodega','Furniture / Home Store','Candy Store','Beer Store','Bike Rental / Bike Share','Bus Stop',
                         'Jewelry Store','Miscellaneous Shop','Pool','Gym','Cupcake Shop','Perfume Shop','Department Store','Multiplex',
                         'Shoe Store','Electronics Store','Hostel','Metro Station','Record Shop','Tailor Shop','Toy / Game Store',
                         'Accessories Store','Souvlaki Shop','Train Station','Tram Station','Yoga Studio']
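These three lists can then be used to restrict the one-hot frequency table to the venues relevant to each need. A minimal sketch of one way to do this, assuming a table shaped like `paris_grouped` (a `Neighborhood` column followed by one frequency column per category); the toy values and the helper name `scenario_table` are hypothetical. Categories that never occur in the data are skipped, and the row sum gives the share of a neighborhood's venues that fall in the scenario.

```python
import pandas as pd

# Toy table shaped like paris_grouped: Neighborhood + per-category frequencies
grouped = pd.DataFrame({
    "Neighborhood": ["Amérique", "Archives"],
    "Park": [0.1, 0.0],
    "Bar": [0.0, 0.05],
    "Hotel": [0.0, 0.12],
    "Art Gallery": [0.0, 0.04],
})

def scenario_table(grouped, categories):
    """Keep only the scenario's categories that actually occur as columns."""
    present = [c for c in categories if c in grouped.columns]
    table = grouped[["Neighborhood"] + present].copy()
    table["Scenario share"] = table[present].sum(axis=1)
    return table

# 'Playground' is not a column of the toy table, so it is ignored
fam = scenario_table(grouped, ["Park", "Art Gallery", "Playground"])
print(fam)
```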

## Let's apply one-hot encoding and group all the venues for each neighborhood

In [21]:
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Amérique,0.0,0.0,0.00000,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.00000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.0,0.0
1,Archives,0.0,0.0,0.00000,0.000000,0.0,0.00,0.04,0.02,0.00,...,0.00000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.0,0.0
2,Arsenal,0.0,0.0,0.00000,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.03125,0.0,0.0,0.000000,0.015625,0.000000,0.000000,0.00,0.0,0.0
3,Arts-et-Métiers,0.0,0.0,0.00000,0.000000,0.0,0.01,0.01,0.00,0.00,...,0.01000,0.0,0.0,0.040000,0.050000,0.000000,0.000000,0.00,0.0,0.0
4,Auteuil,0.0,0.0,0.00000,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.00000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Sorbonne,0.0,0.0,0.00000,0.000000,0.0,0.01,0.00,0.00,0.01,...,0.00000,0.0,0.0,0.010000,0.030000,0.000000,0.000000,0.01,0.0,0.0
76,Ternes,0.0,0.0,0.03125,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.00000,0.0,0.0,0.015625,0.015625,0.000000,0.000000,0.00,0.0,0.0
77,Val-de-Grâce,0.0,0.0,0.00000,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.00000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.0,0.0
78,Villette,0.0,0.0,0.00000,0.000000,0.0,0.00,0.00,0.00,0.00,...,0.00000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.0,0.0
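To see what the values in `paris_grouped` mean, the same two steps (one-hot encoding, then the mean per neighborhood) can be run on a tiny frame: each value is the fraction of a neighborhood's venues that belong to that category, so every row sums to 1. The toy rows are illustrative.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Hotel", "Park", "Hotel", "Hotel"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot["Neighborhood"] = venues["Neighborhood"]

# Mean of 0/1 indicators = share of each category among the neighborhood's venues
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```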


### Let's now sort and display the top 10 venues for each neighborhood

In [22]:
#Let's create the function that sorts the venues for each neighborhood

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
#Now let's put this into a dataframe and display them

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 'st', 'nd', 'rd' only cover the first three positions; the rest use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_venues_sorted = pd.DataFrame(columns=columns)
paris_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    paris_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

paris_venues_sorted.head()



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amérique,Supermarket,Pool,Café,Grocery Store,Park,French Restaurant,Bistro,Bed & Breakfast,Street Art,Pedestrian Plaza
1,Archives,French Restaurant,Hotel,Bar,Coffee Shop,Clothing Store,Art Gallery,Plaza,Bookstore,Pizza Place,Japanese Restaurant
2,Arsenal,French Restaurant,Hotel,Gastropub,Italian Restaurant,Plaza,Thai Restaurant,Park,Cocktail Bar,Vegetarian / Vegan Restaurant,Tapas Restaurant
3,Arts-et-Métiers,French Restaurant,Hotel,Wine Bar,Coffee Shop,Restaurant,Chinese Restaurant,Vietnamese Restaurant,Italian Restaurant,Cocktail Bar,Bar
4,Auteuil,Stadium,Tennis Court,Garden,French Restaurant,Outdoors & Recreation,Botanical Garden,Plaza,Office,Museum,Racecourse
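If a neighborhood has fewer distinct venue categories than `num_top_venues`, the remaining slots returned by `return_most_common_venues` are categories with frequency 0, and their order only reflects how the sort breaks ties. A possible variant, under the hypothetical name `top_nonzero_venues`, keeps only categories with a positive frequency:

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but drops zero-frequency categories."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood label
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])

row = pd.Series({"Neighborhood": "B", "Bar": 0.0, "Hotel": 0.75, "Park": 0.25, "Zoo": 0.0})
print(top_nonzero_venues(row, 10))
```

The result can be shorter than `num_top_venues`, so it would need padding (for example with empty strings) before being written into the fixed-width `paris_venues_sorted` frame.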


## We now have all the datasets needed to continue with the methodology!

# 3. Methodology

The goal is to propose to clients the neighborhoods that satisfy a given need. To do so, we need to group similar neighborhoods, that is, neighborhoods that share a similar frequency of venues for each category.

**Since we want to group or segment different elements based on their similarities or dissimilarities, we will use an unsupervised machine learning technique called clustering.**

- Specifically, we will use the k-means algorithm. We do not use DBSCAN because we are not searching for anomalies: DBSCAN labels points in low-density regions as noise, and we do not want to leave any neighborhood without a cluster.
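The difference that motivates this choice can be seen on a toy example: k-means assigns every point to a cluster, whereas DBSCAN marks points in low-density regions as noise with the label `-1`. The points and the `eps`/`min_samples` values below are arbitrary illustrations, not tuned for the Paris data.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Three close points and one isolated point
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [10.0, 10.0]])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).labels_
dbscan_labels = DBSCAN(eps=0.5, min_samples=2).fit(X).labels_

print("k-means:", kmeans_labels)  # every point gets a cluster id >= 0
print("DBSCAN: ", dbscan_labels)  # the isolated point is labelled -1 (noise)
```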

**So let's start!!!**

### 3.1 Clustering the city considering all the venues

In this part we segment the city without taking into consideration the 3 scenarios or needs that we defined. That means we group the neighborhoods by taking into account all the venues collected.

In general, we are going to get 5 different clusters for all 80 neighborhoods for each scenario.
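The value k = 5 is fixed by hand. One common sanity check, sketched here on random data as a stand-in for `paris_clust_general`, is to compare the within-cluster sum of squares (`inertia_`) over a range of k and look for the point where it stops dropping sharply (the "elbow"). The data and the range of k are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((80, 20))  # stand-in: 80 rows of category frequencies

inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 2))
```

On the real frequency table, plotting these values against k would show whether 5 sits near a bend in the curve or whether another value separates the neighborhoods better.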

In [24]:
#Let's import all the libraries

!pip install scikit-learn==0.23.1
from sklearn.cluster import KMeans



In [25]:
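# We apply the k-means clustering algorithm

kclusters = 5
# keep only the category frequency columns; axis is passed by keyword (pandas >= 2.0 rejects a positional axis)
paris_clust_general = paris_grouped.drop('Neighborhood', axis=1)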

#apply k-means algorithm

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_clust_general)
kmeans.labels_[0:10]


array([2, 2, 0, 2, 2, 0, 4, 0, 2, 2], dtype=int32)

Now we merge the dataframes

In [26]:
paris_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_gen_cluster = paris_df

paris_gen_cluster = paris_gen_cluster.join(paris_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_gen_cluster.head(10) 

Unnamed: 0,Neighborhood,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Saint-Gervais,4,48.855719,2.358162,2,French Restaurant,Ice Cream Shop,Italian Restaurant,Hotel,Cosmetics Shop,Pastry Shop,Cultural Center,Plaza,Thai Restaurant,Wine Bar
1,Saint-Thomas-d'Aquin,7,48.855263,2.325588,0,French Restaurant,Hotel,Coffee Shop,Bakery,Cheese Shop,Historic Site,Art Gallery,Garden,Restaurant,Café
2,Porte-Saint-Denis,10,48.873618,2.352283,3,Hotel,French Restaurant,Bistro,Vegetarian / Vegan Restaurant,Japanese Restaurant,Restaurant,Bar,Bakery,Pizza Place,Café
3,Saint-Germain-l'Auxerrois,1,48.86065,2.33491,3,Hotel,Plaza,French Restaurant,Historic Site,Art Museum,Italian Restaurant,Café,Boutique,Fountain,Park
4,Villette,19,48.887661,2.374468,2,Café,Hotel,Bar,French Restaurant,Japanese Restaurant,Supermarket,Multiplex,Metro Station,Diner,Plaza
5,Val-de-Grâce,5,48.841684,2.343861,3,Hotel,French Restaurant,Bar,Bistro,Café,Creperie,Italian Restaurant,Bakery,Ice Cream Shop,Brewery
6,Necker,15,48.842711,2.310777,3,Hotel,French Restaurant,Dessert Shop,Bar,Pharmacy,Café,Japanese Restaurant,Gym / Fitness Center,Korean BBQ Restaurant,Italian Restaurant
7,Père-Lachaise,20,48.863719,2.395273,2,Bistro,Bakery,Wine Bar,French Restaurant,Bar,Italian Restaurant,Music Venue,Bus Line,Restaurant,Cemetery
8,La Chapelle,18,48.894012,2.364387,2,French Restaurant,Soccer Field,Mexican Restaurant,Thai Restaurant,Diner,Wine Bar,Vietnamese Restaurant,Supermarket,Asian Restaurant,Chinese Restaurant
9,Europe,8,48.878148,2.317175,0,French Restaurant,Hotel,Restaurant,Italian Restaurant,Wine Shop,Sandwich Place,Pizza Place,Thai Restaurant,Middle Eastern Restaurant,Lawyer
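`join(..., on='Neighborhood')` is a left join: every row of `paris_df` is kept, and a neighborhood with no row in `paris_venues_sorted` (for example, one for which no venues were returned) would get `NaN` in the new columns. A small sketch with hypothetical toy frames shows how to detect such rows before the labels are used as integers:

```python
import pandas as pd

neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"], "District": [1, 2, 3]})
sorted_venues = pd.DataFrame({"Neighborhood": ["A", "B"],
                              "Cluster Labels": [0, 2],
                              "1st Most Common Venue": ["Bar", "Hotel"]})

merged = neighborhoods.join(sorted_venues.set_index("Neighborhood"), on="Neighborhood")
print(merged)

# Rows without a match (here "C") must be dropped or handled before int(cluster) is applied
missing = merged[merged["Cluster Labels"].isna()]
print(missing["Neighborhood"].tolist())
```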


### Let's now visualize our first general clustering

In [27]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_paris_cluster_gen = folium.Map(location=[loc.latitude, loc.longitude],zoom_start=13)

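#Set colors for each cluster: kclusters evenly spaced colors from the rainbow colormap
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]  # only len(ys), i.e. kclusters, is used below
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))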
rainbow = [colors.rgb2hex(i) for i in colors_array]

#add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_gen_cluster['Latitude'], paris_gen_cluster['Longitude'], paris_gen_cluster['Neighborhood'], paris_gen_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_paris_cluster_gen)
       
map_paris_cluster_gen
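In the marker loop, `rainbow[int(cluster)-1]` relies on Python's negative indexing: label 0 picks the last color of the palette, label 1 the first, and so on. That mapping is why the cluster descriptions below refer to red, purple and blue spots for labels 0, 1 and 2. A quick check of the mapping, using the same colormap calls as the cell above:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Same expression as the map cell: label k uses palette[k - 1]
for label in range(kclusters):
    print(label, palette[int(label) - 1])
```

Using `palette[label]` instead would give label 0 the first color of the colormap, so the color names in the text below would no longer match.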

## Let's examine each cluster 

### Cluster 1

In [28]:
paris_gen_cluster.loc[paris_gen_cluster['Cluster Labels'] == 0, paris_gen_cluster.columns[[0] + list(range(5,paris_gen_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Saint-Thomas-d'Aquin,French Restaurant,Hotel,Coffee Shop,Bakery,Cheese Shop,Historic Site,Art Gallery,Garden,Restaurant,Café
9,Europe,French Restaurant,Hotel,Restaurant,Italian Restaurant,Wine Shop,Sandwich Place,Pizza Place,Thai Restaurant,Middle Eastern Restaurant,Lawyer
10,Sainte-Marguerite,French Restaurant,Hotel,Bar,Korean Restaurant,Bistro,Pastry Shop,Thai Restaurant,Pizza Place,Convenience Store,Garden
12,Saint-Lambert,French Restaurant,Bakery,Sushi Restaurant,Italian Restaurant,Supermarket,Café,Hotel,Plaza,Burger Joint,Bus Stop
17,Croulebarbe,French Restaurant,Sushi Restaurant,Hotel,Bar,Bakery,Sandwich Place,Italian Restaurant,Supermarket,Thai Restaurant,Moroccan Restaurant
18,Arsenal,French Restaurant,Hotel,Gastropub,Italian Restaurant,Plaza,Thai Restaurant,Park,Cocktail Bar,Vegetarian / Vegan Restaurant,Tapas Restaurant
24,Batignolles,French Restaurant,Hotel,Bar,Italian Restaurant,Restaurant,Japanese Restaurant,Café,Bistro,Park,Indian Restaurant
25,Saint-Merri,French Restaurant,Ice Cream Shop,Coffee Shop,Art Gallery,Tea Room,Plaza,Hotel,Bakery,Restaurant,Sushi Restaurant
26,Notre-Dame,French Restaurant,Bakery,Plaza,Japanese Restaurant,Ice Cream Shop,Wine Bar,Tapas Restaurant,Bookstore,Cupcake Shop,Hotel
27,Gros-Caillou,French Restaurant,Italian Restaurant,Hotel,Café,Ice Cream Shop,Bistro,Pizza Place,Dessert Shop,Coffee Shop,Burger Joint


We can say that in this first cluster the most common venues are French restaurants, bars and hotels. These neighborhoods seem to be much visited by both locals and tourists.

In the map, this first cluster is shown as red spots.

### Cluster 2

In [29]:
paris_gen_cluster.loc[paris_gen_cluster['Cluster Labels'] == 1, paris_gen_cluster.columns[[0] + list(range(5,paris_gen_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Picpus,Locksmith,Accessories Store,Paper / Office Supplies Store,Peruvian Restaurant,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Park,Outdoors & Recreation


There is only one neighborhood in this cluster (Picpus). Its most common venues are shops and services.

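The list also shows open spaces like pedestrian plazas and parks, although if few venues were returned for Picpus, the lower-ranked entries may be categories with zero frequency. It is not a crowded area, or at least much less crowded than cluster 1.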

In the map, the second cluster is shown as the purple spot.

### Cluster 3

In [30]:
paris_gen_cluster.loc[paris_gen_cluster['Cluster Labels'] == 2, paris_gen_cluster.columns[[0] + list(range(5,paris_gen_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Gervais | French Restaurant | Ice Cream Shop | Italian Restaurant | Hotel | Cosmetics Shop | Pastry Shop | Cultural Center | Plaza | Thai Restaurant | Wine Bar |
| 4 | Villette | Café | Hotel | Bar | French Restaurant | Japanese Restaurant | Supermarket | Multiplex | Metro Station | Diner | Plaza |
| 7 | Père-Lachaise | Bistro | Bakery | Wine Bar | French Restaurant | Bar | Italian Restaurant | Music Venue | Bus Line | Restaurant | Cemetery |
| 8 | La Chapelle | French Restaurant | Soccer Field | Mexican Restaurant | Thai Restaurant | Diner | Wine Bar | Vietnamese Restaurant | Supermarket | Asian Restaurant | Chinese Restaurant |
| 13 | Monnaie | French Restaurant | Plaza | Wine Bar | Hotel | Italian Restaurant | Wine Shop | Cocktail Bar | Restaurant | Ice Cream Shop | Bar |
| 14 | Odéon | French Restaurant | Hotel | Café | Plaza | Italian Restaurant | Garden | Fountain | Bakery | Bistro | Ice Cream Shop |
| 16 | Maison-Blanche | French Restaurant | Supermarket | Farmers Market | Café | Bus Stop | Bistro | Plaza | Park | Pizza Place | Southwestern French Restaurant |
| 19 | Jardin-des-Plantes | French Restaurant | Hotel | Science Museum | Garden | Greek Restaurant | Bakery | Tea Room | Korean Restaurant | Botanical Garden | Italian Restaurant |
| 20 | Porte-Saint-Martin | Coffee Shop | French Restaurant | Theater | Hotel | Bakery | Cheese Shop | Italian Restaurant | Indian Restaurant | Breakfast Spot | Cocktail Bar |
| 21 | Roquette | Bar | French Restaurant | Italian Restaurant | Supermarket | Bistro | Bakery | Cocktail Bar | Hotel | Record Shop | Pizza Place |


This cluster is similar to the first cluster. 

However, it shows a wider variety of services, shops and open spaces such as parks and gardens.

On the map, this cluster is shown as blue spots.

### Cluster 4

In [31]:
paris_gen_cluster.loc[paris_gen_cluster['Cluster Labels'] == 3, paris_gen_cluster.columns[[0] + list(range(5,paris_gen_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Porte-Saint-Denis | Hotel | French Restaurant | Bistro | Vegetarian / Vegan Restaurant | Japanese Restaurant | Restaurant | Bar | Bakery | Pizza Place | Café |
| 3 | Saint-Germain-l'Auxerrois | Hotel | Plaza | French Restaurant | Historic Site | Art Museum | Italian Restaurant | Café | Boutique | Fountain | Park |
| 5 | Val-de-Grâce | Hotel | French Restaurant | Bar | Bistro | Café | Creperie | Italian Restaurant | Bakery | Ice Cream Shop | Brewery |
| 6 | Necker | Hotel | French Restaurant | Dessert Shop | Bar | Pharmacy | Café | Japanese Restaurant | Gym / Fitness Center | Korean BBQ Restaurant | Italian Restaurant |
| 11 | Parc-de-Montsouris | Italian Restaurant | Japanese Restaurant | Hotel | Athletics & Sports | Café | Indian Restaurant | French Restaurant | Chinese Restaurant | Theater | Bistro |
| 15 | Champs-Elysées | French Restaurant | Hotel | Boutique | Women's Store | Italian Restaurant | Japanese Restaurant | Steakhouse | Plaza | Garden | Salad Place |
| 23 | Plaisance | Hotel | French Restaurant | Café | Bistro | Bar | Supermarket | Japanese Restaurant | Bakery | Grocery Store | Restaurant |
| 34 | Faubourg-Montmartre | French Restaurant | Hotel | Italian Restaurant | Japanese Restaurant | Pizza Place | Vegetarian / Vegan Restaurant | Coffee Shop | Chinese Restaurant | Wine Bar | Sandwich Place |
| 38 | Petit-Montrouge | Hotel | French Restaurant | Supermarket | Italian Restaurant | Bistro | Discount Store | Plaza | Sandwich Place | Bakery | Japanese Restaurant |
| 40 | Ecole-Militaire | Hotel | French Restaurant | Plaza | Diner | Café | Gym | Bistro | Garden | Asian Restaurant | Farmers Market |


This cluster is a hybrid of the first and third clusters. Like the first cluster, it has many restaurants and hotels.

Like the third cluster, it also has many shops and service facilities, but fewer open spaces.

On the map, this cluster is shown as green spots.

### Cluster 5

In [32]:
paris_gen_cluster.loc[paris_gen_cluster['Cluster Labels'] == 4, paris_gen_cluster.columns[[0] + list(range(5,paris_gen_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | Bel-Air | French Restaurant | Sports Club | Café | Playground | Plaza | Paper / Office Supplies Store | Perfume Shop | Performing Arts Venue | Pedestrian Plaza | Pastry Shop |


Like cluster 2, this cluster contains a single neighborhood. It is also similar to cluster 2 in having open spaces such as playgrounds and plazas.

However, it has more restaurants than cluster 2, and its shops are different.

On the map, the neighborhood is shown in orange.

### 3.2 Clustering the city considering the family and cultural venues

Let's repeat the clustering exercise, but this time take into account only the venues considered suitable for family and cultural environments.

The venues were defined in the paris_fam_cultural_venues list.
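The cell below lists the column names by hand. A minimal sketch of building the same kind of subset straight from a venue list, assuming paris_grouped has a Neighborhood column plus one frequency column per venue category (as the outputs suggest); select_scenario_venues is a hypothetical helper and the toy data is made up:

```python
import pandas as pd

def select_scenario_venues(grouped, venue_list):
    """Keep the Neighborhood column plus the venue columns named in venue_list.

    Venue names that are not columns of grouped are skipped and returned,
    so a typo in the list does not raise a KeyError.
    """
    present = [v for v in venue_list if v in grouped.columns]
    missing = [v for v in venue_list if v not in grouped.columns]
    return grouped[['Neighborhood'] + present], missing

# Toy stand-in for paris_grouped (made-up frequencies)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Park': [0.1, 0.0],
    'Bookstore': [0.0, 0.2],
    'Hotel': [0.3, 0.1],
})
subset, missing = select_scenario_venues(toy_grouped, ['Park', 'Bookstore', 'Zoo'])
print(list(subset.columns))  # ['Neighborhood', 'Park', 'Bookstore']
print(missing)               # ['Zoo']
```

With the notebook's data this would presumably be called as select_scenario_venues(paris_grouped, paris_fam_cultural_venues), assuming that list holds the same names as the hardcoded columns. Returning the missing names instead of raising makes a typo in the list easy to spot.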


In [33]:
# Let's create a dataframe with only these family and cultural venues
paris_family_cultural = paris_grouped[['Neighborhood','Bookstore','Garden','Park','Theater','Art Museum','Historic Site','Pedestrian Plaza','Art Gallery','Indie Movie Theater',
                           'Museum','Concert Hall','Comedy Club','Arts & Crafts Store','Music Venue','Science Museum','Church','Cultural Center',
                           'Middle Eastern Restaurant','Playground']]
paris_family_cultural.head()


| | Neighborhood | Bookstore | Garden | Park | Theater | Art Museum | Historic Site | Pedestrian Plaza | Art Gallery | Indie Movie Theater | Museum | Concert Hall | Comedy Club | Arts & Crafts Store | Music Venue | Science Museum | Church | Cultural Center | Middle Eastern Restaurant | Playground |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amérique | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Archives | 0.03 | 0.02 | 0.02 | 0.0 | 0.02 | 0.02 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 2 | Arsenal | 0.0 | 0.015625 | 0.03125 | 0.0 | 0.0 | 0.0 | 0.015625 | 0.0 | 0.0 | 0.015625 | 0.0 | 0.0 | 0.0 | 0.015625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Arts-et-Métiers | 0.01 | 0.0 | 0.01 | 0.01 | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.01 | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 |
| 4 | Auteuil | 0.0 | 0.071429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [34]:
# Now let's put the top 7 most common venues for this scenario into a dataframe

num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 'st', 'nd', 'rd' only cover the first three positions; use 'th' afterwards
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_family_sorted = pd.DataFrame(columns=columns)
paris_family_sorted['Neighborhood'] = paris_family_cultural['Neighborhood']

for ind in np.arange(paris_family_cultural.shape[0]):
    paris_family_sorted.iloc[ind, 1:] = return_most_common_venues(paris_family_cultural.iloc[ind, :], num_top_venues)

paris_family_sorted.head()


| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Amérique | Park | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
| 1 | Archives | Art Gallery | Bookstore | Park | Art Museum | Historic Site | Garden | Cultural Center |
| 2 | Arsenal | Park | Museum | Pedestrian Plaza | Music Venue | Garden | Comedy Club | Middle Eastern Restaurant |
| 3 | Arts-et-Métiers | Bookstore | Art Gallery | Church | Comedy Club | Concert Hall | Museum | Historic Site |
| 4 | Auteuil | Museum | Garden | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
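The top-N cell is repeated for each scenario. Here is a minimal sketch of a reusable version, top_venues_frame, a hypothetical helper and not the notebook's return_most_common_venues (which is defined earlier and not shown here). It differs on purpose in one respect: venues with a frequency of zero are left blank instead of being ranked. If the original helper simply sorts every column, zero-frequency venues fill the remaining slots in arbitrary order, and the frequency table above suggests this happens: Amérique's only non-zero value there is Park (0.1), yet its ranking continues with Bookstore, Concert Hall and Middle Eastern Restaurant.

```python
import pandas as pd

def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... (11th to 13th use 'th')."""
    if 10 <= n % 100 <= 13:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_venues_frame(freq_df, num_top_venues):
    """Build a Neighborhood + 'Nth Most Common Venue' frame from venue frequencies.

    Only venues with a frequency above zero are ranked; unused slots stay ''.
    """
    columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i + 1))
                                  for i in range(num_top_venues)]
    rows = []
    for _, row in freq_df.iterrows():
        freqs = row.drop('Neighborhood').astype(float)
        # Drop zero frequencies, then sort; mergesort is stable for tied values
        freqs = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
        top = list(freqs.index[:num_top_venues])
        top += [''] * (num_top_venues - len(top))
        rows.append([row['Neighborhood']] + top)
    return pd.DataFrame(rows, columns=columns)

# Made-up frequencies: B has a single non-zero venue
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Park': [0.5, 0.0],
    'Bookstore': [0.3, 0.0],
    'Church': [0.0, 0.0],
    'Museum': [0.2, 1.0],
})
result = top_venues_frame(toy, 3)
print(result)
```

Under these assumptions the scenario cells would reduce to calls such as top_venues_frame(paris_family_cultural, 7).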


In [35]:
# Let's now reapply the k-means algorithm

paris_clust_family_cultural = paris_family_cultural.drop(columns='Neighborhood')

#apply k-means algorithm

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_clust_family_cultural)
kmeans.labels_[0:10]


array([4, 1, 1, 3, 0, 3, 2, 1, 3, 3], dtype=int32)
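The number of clusters, kclusters, is fixed earlier in the notebook. As an optional sanity check that is not part of the original analysis, scikit-learn's silhouette score can compare candidate values of k on the same kind of frequency matrix (higher is better). This sketch uses synthetic data with two obvious groups, so its numbers say nothing about Paris; silhouette_by_k is a hypothetical helper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic 'venue frequency' vectors forming two well-separated groups
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.8, 0.1, 0.1], scale=0.02, size=(10, 3))
group_b = rng.normal(loc=[0.1, 0.1, 0.8], scale=0.02, size=(10, 3))
X = np.vstack([group_a, group_b])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With the notebook's data, X would be paris_clust_family_cultural (or the nightlife and services equivalents).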

Let's merge the cluster labels back into the neighborhood dataframe.

In [36]:
paris_family_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_family_cluster = paris_df

paris_family_cluster = paris_family_cluster.join(paris_family_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_family_cluster.head(10) 

| | Neighborhood | District | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Gervais | 4 | 48.855719 | 2.358162 | 1 | Cultural Center | Bookstore | Art Museum | Art Gallery | Arts & Crafts Store | Pedestrian Plaza | Garden |
| 1 | Saint-Thomas-d'Aquin | 7 | 48.855263 | 2.325588 | 1 | Historic Site | Art Gallery | Garden | Bookstore | Pedestrian Plaza | Arts & Crafts Store | Middle Eastern Restaurant |
| 2 | Porte-Saint-Denis | 10 | 48.873618 | 2.352283 | 3 | Indie Movie Theater | Comedy Club | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church |
| 3 | Saint-Germain-l'Auxerrois | 1 | 48.86065 | 2.33491 | 1 | Art Museum | Historic Site | Arts & Crafts Store | Park | Pedestrian Plaza | Church | Garden |
| 4 | Villette | 19 | 48.887661 | 2.374468 | 3 | Middle Eastern Restaurant | Bookstore | Concert Hall | Cultural Center | Church | Science Museum | Music Venue |
| 5 | Val-de-Grâce | 5 | 48.841684 | 2.343861 | 3 | Church | Comedy Club | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Science Museum |
| 6 | Necker | 15 | 48.842711 | 2.310777 | 3 | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum | Music Venue |
| 7 | Père-Lachaise | 20 | 48.863719 | 2.395273 | 3 | Bookstore | Theater | Music Venue | Playground | Art Museum | Historic Site | Pedestrian Plaza |
| 8 | La Chapelle | 18 | 48.894012 | 2.364387 | 3 | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum | Music Venue |
| 9 | Europe | 8 | 48.878148 | 2.317175 | 3 | Middle Eastern Restaurant | Art Museum | Bookstore | Concert Hall | Cultural Center | Church | Science Museum |
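join with on='Neighborhood' keeps every row of paris_df, the way a left join does. A neighborhood without a row in the sorted frame (for instance, one for which no venues were returned, if that can happen with this data) would get NaN in Cluster Labels, and the mapping cell's int(cluster) would then fail. A small sketch on made-up frames with the same column names shows the effect and a dropna guard:

```python
import pandas as pd

# Toy stand-ins for paris_df and a sorted-venues frame (made-up values)
neighborhoods = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [48.85, 48.86, 48.87],
    'Longitude': [2.35, 2.33, 2.37],
})
sorted_venues = pd.DataFrame({
    'Cluster Labels': [0, 1],
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Park', 'Museum'],
})

# Join the Neighborhood column against the other frame's index: C has no match
merged = neighborhoods.join(sorted_venues.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].isna().sum())  # 1

# Drop unmatched rows and restore integer labels before building the map
mappable = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(mappable['Neighborhood'].tolist())  # ['A', 'B']
```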


## Let's now visualize the family and cultural clusters

In [37]:
map_paris_cluster_family = folium.Map(location=[loc.latitude, loc.longitude],zoom_start=13)

# Set one color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map. rainbow[int(cluster)-1] uses negative indexing,
# so cluster 0 gets the last color (red) and cluster 1 the first (purple)
for lat, lon, poi, cluster in zip(paris_family_cluster['Latitude'], paris_family_cluster['Longitude'], paris_family_cluster['Neighborhood'], paris_family_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_paris_cluster_family)
       
map_paris_cluster_family
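Because the marker color is rainbow[int(cluster)-1], the color names used in the cluster descriptions follow Python's negative indexing. A small sketch that lists the label-to-color mapping with the same matplotlib calls, assuming kclusters is 5 (as the five clusters below suggest); cluster_color_legend is a hypothetical helper:

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

def cluster_color_legend(kclusters):
    """Return {cluster label: hex color} exactly as the map cell assigns them."""
    rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]
    # Same expression as the map cell: label 0 wraps around to the last color
    return {label: rainbow[label - 1] for label in range(kclusters)}

legend = cluster_color_legend(5)
for label, hex_color in legend.items():
    print('Cluster', label, hex_color)
```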

## Let's examine each cluster in detail

### Cluster 1

In [38]:
paris_family_cluster.loc[paris_family_cluster['Cluster Labels'] == 0, paris_family_cluster.columns[[0] + list(range(5,paris_family_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 19 | Jardin-des-Plantes | Science Museum | Garden | Museum | Historic Site | Indie Movie Theater | Comedy Club | Middle Eastern Restaurant |
| 40 | Ecole-Militaire | Garden | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
| 44 | Auteuil | Museum | Garden | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
| 45 | Epinettes | Pedestrian Plaza | Garden | Bookstore | Comedy Club | Middle Eastern Restaurant | Cultural Center | Church |


If you love gardens and general museums, these neighborhoods are ideal.

The locations are shown in red on the map.

### Cluster 2

In [39]:
paris_family_cluster.loc[paris_family_cluster['Cluster Labels'] == 1, paris_family_cluster.columns[[0] + list(range(5,paris_family_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Gervais | Cultural Center | Bookstore | Art Museum | Art Gallery | Arts & Crafts Store | Pedestrian Plaza | Garden |
| 1 | Saint-Thomas-d'Aquin | Historic Site | Art Gallery | Garden | Bookstore | Pedestrian Plaza | Arts & Crafts Store | Middle Eastern Restaurant |
| 3 | Saint-Germain-l'Auxerrois | Art Museum | Historic Site | Arts & Crafts Store | Park | Pedestrian Plaza | Church | Garden |
| 13 | Monnaie | Bookstore | Park | Indie Movie Theater | Garden | Museum | Historic Site | Pedestrian Plaza |
| 14 | Odéon | Garden | Bookstore | Art Museum | Comedy Club | Playground | Historic Site | Pedestrian Plaza |
| 15 | Champs-Elysées | Garden | Theater | Historic Site | Park | Art Gallery | Science Museum | Music Venue |
| 16 | Maison-Blanche | Park | Garden | Bookstore | Comedy Club | Middle Eastern Restaurant | Cultural Center | Church |
| 18 | Arsenal | Park | Museum | Pedestrian Plaza | Music Venue | Garden | Comedy Club | Middle Eastern Restaurant |
| 25 | Saint-Merri | Art Gallery | Pedestrian Plaza | Bookstore | Park | Art Museum | Historic Site | Garden |
| 26 | Notre-Dame | Bookstore | Park | Cultural Center | Garden | Comedy Club | Middle Eastern Restaurant | Church |


This is the area for art galleries and art museums.

The neighborhoods of this cluster are shown in purple on the map.

### Cluster 3

In [40]:
paris_family_cluster.loc[paris_family_cluster['Cluster Labels'] == 2, paris_family_cluster.columns[[0] + list(range(5,paris_family_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 68 | Bel-Air | Playground | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum | Music Venue |


This cluster has only one neighborhood.

There are not many museums nearby, but this neighborhood is good if you like open spaces, concerts and general cultural activities.

The neighborhood is shown in blue on the map.

### Cluster 4

In [41]:
paris_family_cluster.loc[paris_family_cluster['Cluster Labels'] == 3, paris_family_cluster.columns[[0] + list(range(5,paris_family_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 2 | Porte-Saint-Denis | Indie Movie Theater | Comedy Club | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church |
| 4 | Villette | Middle Eastern Restaurant | Bookstore | Concert Hall | Cultural Center | Church | Science Museum | Music Venue |
| 5 | Val-de-Grâce | Church | Comedy Club | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Science Museum |
| 6 | Necker | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum | Music Venue |
| 7 | Père-Lachaise | Bookstore | Theater | Music Venue | Playground | Art Museum | Historic Site | Pedestrian Plaza |
| 8 | La Chapelle | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum | Music Venue |
| 9 | Europe | Middle Eastern Restaurant | Art Museum | Bookstore | Concert Hall | Cultural Center | Church | Science Museum |
| 10 | Sainte-Marguerite | Arts & Crafts Store | Garden | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church |
| 12 | Saint-Lambert | Bookstore | Park | Theater | Middle Eastern Restaurant | Comedy Club | Cultural Center | Church |
| 17 | Croulebarbe | Museum | Park | Indie Movie Theater | Comedy Club | Middle Eastern Restaurant | Cultural Center | Church |


This is the cluster with the largest number of neighborhoods.

Here you can find many cultural shops such as bookstores, as well as general museums and theaters.

It also has many cultural centers and gardens. In general, it is a very versatile cluster.

The neighborhoods are shown in green on the map.

### Cluster 5

In [42]:
paris_family_cluster.loc[paris_family_cluster['Cluster Labels'] == 4, paris_family_cluster.columns[[0] + list(range(5,paris_family_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 11 | Parc-de-Montsouris | Park | Theater | Bookstore | Comedy Club | Middle Eastern Restaurant | Cultural Center | Church |
| 36 | Amérique | Park | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
| 60 | Combat | Park | Art Museum | Bookstore | Comedy Club | Middle Eastern Restaurant | Cultural Center | Church |
| 73 | Saint-Fargeau | Park | Bookstore | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |
| 74 | Porte-Dauphine | Museum | Park | Concert Hall | Middle Eastern Restaurant | Cultural Center | Church | Science Museum |


If you are looking for parks, these are the neighborhoods for you.

These are shown in orange on the map.

### 3.3 Clustering the city considering the nightlife venues

What if you love nightlife, bars and restaurants? Let's find out!
The venues were defined in the paris_night_rest_venues list.

In [43]:
# Let's create a dataframe with only the nightlife venues
paris_nightlife = paris_grouped[['Neighborhood','French Restaurant','Italian Restaurant','Bar','Japanese Restaurant','Café','Bistro','Plaza','Wine Bar',
                         'Restaurant','Coffee Shop','Pizza Place','Cocktail Bar','Sandwich Place','Thai Restaurant','Ice Cream Shop',
                         'Chinese Restaurant','Indian Restaurant','Vietnamese Restaurant','Tea Room','Burger Joint','Seafood Restaurant',
                         'Asian Restaurant','Creperie','Korean Restaurant','Sushi Restaurant','Dessert Shop','Salad Place',
                         'Vegetarian / Vegan Restaurant','Pub','Beer Bar','Tapas Restaurant','Hotel Bar','Moroccan Restaurant','Gastropub',
                         'Steakhouse','Mexican Restaurant','Diner','Brasserie','Lebanese Restaurant','Breakfast Spot','Fast Food Restaurant',
                         'Greek Restaurant','Falafel Restaurant','Food & Drink Shop','Mediterranean Restaurant','Argentinian Restaurant','Juice Bar',
                         'African Restaurant','Ethiopian Restaurant','American Restaurant','Nightclub','Noodle House','Liquor Store','Ramen Restaurant',
                         'Turkish Restaurant','Udon Restaurant','Cajun / Creole Restaurant','Lounge','Portuguese Restaurant','Scandinavian Restaurant',
                         'Bubble Tea Shop','Corsican Restaurant','Food Truck','Fountain','Israeli Restaurant','Movie Theater','Peruvian Restaurant',
                         'Basque Restaurant','Fish & Chips Shop','New American Restaurant','Southwestern French Restaurant']]
paris_nightlife.head()

| | Neighborhood | French Restaurant | Italian Restaurant | Bar | Japanese Restaurant | Café | Bistro | Plaza | Wine Bar | Restaurant | ... | Corsican Restaurant | Food Truck | Fountain | Israeli Restaurant | Movie Theater | Peruvian Restaurant | Basque Restaurant | Fish & Chips Shop | New American Restaurant | Southwestern French Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amérique | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Archives | 0.09 | 0.01 | 0.04 | 0.03 | 0.0 | 0.02 | 0.03 | 0.0 | 0.02 | ... | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Arsenal | 0.171875 | 0.046875 | 0.015625 | 0.015625 | 0.0 | 0.0 | 0.046875 | 0.015625 | 0.015625 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.015625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015625 |
| 3 | Arts-et-Métiers | 0.11 | 0.03 | 0.03 | 0.0 | 0.01 | 0.01 | 0.0 | 0.05 | 0.04 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Auteuil | 0.071429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [44]:
# Now let's put the top 7 most common venues for this scenario into a dataframe

num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 'st', 'nd', 'rd' only cover the first three positions; use 'th' afterwards
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_nightlife_sorted = pd.DataFrame(columns=columns)
paris_nightlife_sorted['Neighborhood'] = paris_nightlife['Neighborhood']

for ind in np.arange(paris_nightlife.shape[0]):
    paris_nightlife_sorted.iloc[ind, 1:] = return_most_common_venues(paris_nightlife.iloc[ind, :], num_top_venues)

paris_nightlife_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Amérique | French Restaurant | Café | Bistro | Juice Bar | Liquor Store | Noodle House | Nightclub |
| 1 | Archives | French Restaurant | Bar | Coffee Shop | Plaza | Pizza Place | Japanese Restaurant | Bistro |
| 2 | Arsenal | French Restaurant | Italian Restaurant | Gastropub | Plaza | Tapas Restaurant | Vegetarian / Vegan Restaurant | Cocktail Bar |
| 3 | Arts-et-Métiers | French Restaurant | Wine Bar | Restaurant | Coffee Shop | Vietnamese Restaurant | Chinese Restaurant | Italian Restaurant |
| 4 | Auteuil | French Restaurant | Plaza | Argentinian Restaurant | Noodle House | Nightclub | American Restaurant | Ethiopian Restaurant |


In [45]:
# Let's now reapply the k-means algorithm

paris_clust_nightlife = paris_nightlife.drop(columns='Neighborhood')

#apply k-means algorithm

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_clust_nightlife)
kmeans.labels_[0:10]


array([4, 2, 3, 2, 2, 3, 0, 1, 2, 2], dtype=int32)

Let's merge the dataframes

In [46]:
paris_nightlife_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_nightlife_cluster = paris_df

paris_nightlife_cluster = paris_nightlife_cluster.join(paris_nightlife_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_nightlife_cluster.head(10) 

| | Neighborhood | District | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Gervais | 4 | 48.855719 | 2.358162 | 2 | French Restaurant | Italian Restaurant | Ice Cream Shop | Plaza | Thai Restaurant | Wine Bar | Coffee Shop |
| 1 | Saint-Thomas-d'Aquin | 7 | 48.855263 | 2.325588 | 3 | French Restaurant | Coffee Shop | Café | Restaurant | Peruvian Restaurant | Korean Restaurant | Gastropub |
| 2 | Porte-Saint-Denis | 10 | 48.873618 | 2.352283 | 1 | French Restaurant | Bistro | Vegetarian / Vegan Restaurant | Pizza Place | Bar | Japanese Restaurant | Restaurant |
| 3 | Saint-Germain-l'Auxerrois | 1 | 48.86065 | 2.33491 | 2 | French Restaurant | Plaza | Café | Italian Restaurant | Japanese Restaurant | Cocktail Bar | Korean Restaurant |
| 4 | Villette | 19 | 48.887661 | 2.374468 | 4 | Café | French Restaurant | Bar | Japanese Restaurant | Food Truck | Fast Food Restaurant | Breakfast Spot |
| 5 | Val-de-Grâce | 5 | 48.841684 | 2.343861 | 1 | French Restaurant | Bar | Bistro | Café | Italian Restaurant | Creperie | Ice Cream Shop |
| 6 | Necker | 15 | 48.842711 | 2.310777 | 1 | French Restaurant | Japanese Restaurant | Café | Dessert Shop | Bar | Sandwich Place | Gastropub |
| 7 | Père-Lachaise | 20 | 48.863719 | 2.395273 | 4 | Bistro | French Restaurant | Wine Bar | Bar | Italian Restaurant | Brasserie | Restaurant |
| 8 | La Chapelle | 18 | 48.894012 | 2.364387 | 0 | French Restaurant | Wine Bar | Diner | Asian Restaurant | Vietnamese Restaurant | Chinese Restaurant | Thai Restaurant |
| 9 | Europe | 8 | 48.878148 | 2.317175 | 3 | French Restaurant | Sandwich Place | Italian Restaurant | Restaurant | Pizza Place | Thai Restaurant | Pub |


## Let's now visualize the nightlife clusters

In [47]:
map_paris_cluster_nightlife = folium.Map(location=[loc.latitude, loc.longitude],zoom_start=13)

# Set one color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map. rainbow[int(cluster)-1] uses negative indexing,
# so cluster 0 gets the last color (red) and cluster 1 the first (purple)
for lat, lon, poi, cluster in zip(paris_nightlife_cluster['Latitude'], paris_nightlife_cluster['Longitude'], paris_nightlife_cluster['Neighborhood'], paris_nightlife_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_paris_cluster_nightlife)
       
map_paris_cluster_nightlife

## Let's examine each cluster in detail

### Cluster 1

In [48]:
paris_nightlife_cluster.loc[paris_nightlife_cluster['Cluster Labels'] == 0, paris_nightlife_cluster.columns[[0] + list(range(5,paris_nightlife_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 8 | La Chapelle | French Restaurant | Wine Bar | Diner | Asian Restaurant | Vietnamese Restaurant | Chinese Restaurant | Thai Restaurant |
| 33 | Invalides | French Restaurant | Plaza | Café | Italian Restaurant | Coffee Shop | Thai Restaurant | Cocktail Bar |
| 40 | Ecole-Militaire | French Restaurant | Plaza | Diner | Café | Bistro | Asian Restaurant | African Restaurant |
| 41 | Muette | French Restaurant | Diner | Lebanese Restaurant | Breakfast Spot | Fast Food Restaurant | Greek Restaurant | Falafel Restaurant |
| 68 | Bel-Air | French Restaurant | Café | Plaza | Juice Bar | Liquor Store | Noodle House | Nightclub |


This first cluster is for restaurant lovers only. French restaurants dominate, but there are many other dining options too.

However, there are few bars nearby.

Neighborhoods of this cluster are shown in red.

### Cluster 2

In [49]:
paris_nightlife_cluster.loc[paris_nightlife_cluster['Cluster Labels'] == 1, paris_nightlife_cluster.columns[[0] + list(range(5,paris_nightlife_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 2 | Porte-Saint-Denis | French Restaurant | Bistro | Vegetarian / Vegan Restaurant | Pizza Place | Bar | Japanese Restaurant | Restaurant |
| 5 | Val-de-Grâce | French Restaurant | Bar | Bistro | Café | Italian Restaurant | Creperie | Ice Cream Shop |
| 6 | Necker | French Restaurant | Japanese Restaurant | Café | Dessert Shop | Bar | Sandwich Place | Gastropub |
| 10 | Sainte-Marguerite | French Restaurant | Bar | Korean Restaurant | Bistro | Diner | Beer Bar | Seafood Restaurant |
| 21 | Roquette | Bar | French Restaurant | Italian Restaurant | Bistro | Cocktail Bar | Vietnamese Restaurant | Pizza Place |
| 23 | Plaisance | French Restaurant | Bar | Café | Bistro | Japanese Restaurant | Restaurant | Vegetarian / Vegan Restaurant |
| 31 | Clignancourt | French Restaurant | Bar | Bistro | Italian Restaurant | Restaurant | Pizza Place | Vietnamese Restaurant |
| 45 | Epinettes | French Restaurant | Restaurant | Bar | Ethiopian Restaurant | Noodle House | Sushi Restaurant | Burger Joint |
| 47 | Hôpital-Saint-Louis | French Restaurant | Bar | Coffee Shop | Bistro | Wine Bar | Thai Restaurant | Pizza Place |
| 48 | Belleville | French Restaurant | Bar | Japanese Restaurant | Café | Restaurant | Italian Restaurant | Thai Restaurant |


In this second cluster we also find many French restaurants and a good variety of international restaurants.

The difference from the first cluster is that there are a lot of bars too.

These seem to be the neighborhoods for having fun.

These neighborhoods are shown in purple.

### Cluster 3

In [50]:
paris_nightlife_cluster.loc[paris_nightlife_cluster['Cluster Labels'] == 2, paris_nightlife_cluster.columns[[0] + list(range(5,paris_nightlife_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Gervais | French Restaurant | Italian Restaurant | Ice Cream Shop | Plaza | Thai Restaurant | Wine Bar | Coffee Shop |
| 3 | Saint-Germain-l'Auxerrois | French Restaurant | Plaza | Café | Italian Restaurant | Japanese Restaurant | Cocktail Bar | Korean Restaurant |
| 13 | Monnaie | French Restaurant | Plaza | Wine Bar | Italian Restaurant | Coffee Shop | Seafood Restaurant | Tea Room |
| 19 | Jardin-des-Plantes | French Restaurant | Greek Restaurant | Italian Restaurant | Plaza | Korean Restaurant | Tea Room | Chinese Restaurant |
| 20 | Porte-Saint-Martin | French Restaurant | Coffee Shop | Pizza Place | Italian Restaurant | Indian Restaurant | Cocktail Bar | Asian Restaurant |
| 22 | Picpus | French Restaurant | Argentinian Restaurant | Noodle House | Nightclub | American Restaurant | Ethiopian Restaurant | African Restaurant |
| 28 | Mail | French Restaurant | Wine Bar | Cocktail Bar | Italian Restaurant | Salad Place | Bar | Thai Restaurant |
| 29 | Bonne-Nouvelle | French Restaurant | Cocktail Bar | Wine Bar | Italian Restaurant | Bar | Coffee Shop | Chinese Restaurant |
| 30 | Gare | Thai Restaurant | Sandwich Place | Café | Japanese Restaurant | Vietnamese Restaurant | Coffee Shop | Creperie |
| 32 | Goutte-d'Or | Bar | Plaza | Asian Restaurant | French Restaurant | Chinese Restaurant | Food & Drink Shop | Mediterranean Restaurant |


The neighborhoods of this cluster have a very good balance between restaurants and bars.

The neighborhoods of this cluster are shown in blue.
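The balance between restaurants and bars can also be checked numerically from the nightlife frequencies. A minimal sketch, assuming one frequency column per venue category as in paris_nightlife; the keyword grouping (any name containing Bar, Pub, Nightclub or Lounge counts as a bar, any name containing Restaurant as a restaurant) is an assumption made for illustration, not the notebook's classification, and it counts Juice Bar and Hotel Bar as bars. bar_restaurant_balance is a hypothetical helper:

```python
import pandas as pd

BAR_KEYWORDS = ('Bar', 'Pub', 'Nightclub', 'Lounge')  # hypothetical grouping

def bar_restaurant_balance(freq_df):
    """Return per-neighborhood bar share, restaurant share and bar/restaurant ratio."""
    venue_cols = [c for c in freq_df.columns if c != 'Neighborhood']
    bar_cols = [c for c in venue_cols if any(k in c for k in BAR_KEYWORDS)]
    rest_cols = [c for c in venue_cols if 'Restaurant' in c]
    out = pd.DataFrame({
        'Neighborhood': freq_df['Neighborhood'],
        'bar_share': freq_df[bar_cols].sum(axis=1),
        'restaurant_share': freq_df[rest_cols].sum(axis=1),
    })
    # The ratio is NaN when a neighborhood has no restaurants at all
    out['bar_to_restaurant'] = out['bar_share'] / out['restaurant_share'].where(out['restaurant_share'] > 0)
    return out

# Made-up frequencies for two neighborhoods
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'French Restaurant': [0.4, 0.2],
    'Wine Bar': [0.1, 0.2],
    'Pub': [0.0, 0.1],
    'Café': [0.5, 0.5],
})
res = bar_restaurant_balance(toy)
print(res)
```

A ratio close to 1 would correspond to the kind of balance described for this cluster; values well below 1 point to restaurant-dominated neighborhoods such as those in clusters 1 and 4.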

### Cluster 4

In [51]:
paris_nightlife_cluster.loc[paris_nightlife_cluster['Cluster Labels'] == 3, paris_nightlife_cluster.columns[[0] + list(range(5,paris_nightlife_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 1 | Saint-Thomas-d'Aquin | French Restaurant | Coffee Shop | Café | Restaurant | Peruvian Restaurant | Korean Restaurant | Gastropub |
| 9 | Europe | French Restaurant | Sandwich Place | Italian Restaurant | Restaurant | Pizza Place | Thai Restaurant | Pub |
| 12 | Saint-Lambert | French Restaurant | Italian Restaurant | Café | Plaza | Sushi Restaurant | Thai Restaurant | Lebanese Restaurant |
| 15 | Champs-Elysées | French Restaurant | Japanese Restaurant | Italian Restaurant | Steakhouse | Plaza | Nightclub | Bistro |
| 17 | Croulebarbe | French Restaurant | Sushi Restaurant | Bar | Italian Restaurant | Sandwich Place | Thai Restaurant | Ramen Restaurant |
| 18 | Arsenal | French Restaurant | Italian Restaurant | Gastropub | Plaza | Tapas Restaurant | Vegetarian / Vegan Restaurant | Cocktail Bar |
| 24 | Batignolles | French Restaurant | Bar | Italian Restaurant | Restaurant | Japanese Restaurant | Café | Bistro |
| 25 | Saint-Merri | French Restaurant | Ice Cream Shop | Plaza | Coffee Shop | Tea Room | Café | Sushi Restaurant |
| 26 | Notre-Dame | French Restaurant | Plaza | Japanese Restaurant | Wine Bar | Ice Cream Shop | Bar | Italian Restaurant |
| 27 | Gros-Caillou | French Restaurant | Italian Restaurant | Café | Pizza Place | Ice Cream Shop | Dessert Shop | Coffee Shop |


As in cluster 1, the neighborhoods in cluster 4 offer mostly restaurants and not so many bars.

However, we find more Italian restaurants here.

Neighborhoods of this cluster are shown in green.

### Cluster 5

In [52]:
paris_nightlife_cluster.loc[paris_nightlife_cluster['Cluster Labels'] == 4, paris_nightlife_cluster.columns[[0] + list(range(5,paris_nightlife_cluster.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 4 | Villette | Café | French Restaurant | Bar | Japanese Restaurant | Food Truck | Fast Food Restaurant | Breakfast Spot |
| 7 | Père-Lachaise | Bistro | French Restaurant | Wine Bar | Bar | Italian Restaurant | Brasserie | Restaurant |
| 11 | Parc-de-Montsouris | Italian Restaurant | Japanese Restaurant | French Restaurant | Chinese Restaurant | Diner | Indian Restaurant | Lebanese Restaurant |
| 14 | Odéon | French Restaurant | Café | Plaza | Ice Cream Shop | Fountain | Italian Restaurant | Bistro |
| 16 | Maison-Blanche | French Restaurant | Café | Bistro | Plaza | Pizza Place | Diner | Asian Restaurant |
| 36 | Amérique | French Restaurant | Café | Bistro | Juice Bar | Liquor Store | Noodle House | Nightclub |
| 42 | Grenelle | Japanese Restaurant | French Restaurant | Bistro | Sandwich Place | Beer Bar | Korean Restaurant | Creperie |
| 55 | Chaussée-d'Antin | Bistro | French Restaurant | Salad Place | Coffee Shop | Japanese Restaurant | Italian Restaurant | Indian Restaurant |
| 73 | Saint-Fargeau | Diner | Food Truck | Japanese Restaurant | Café | Bistro | Noodle House | Nightclub |


This cluster has versatile food options: French, Italian and Japanese restaurants, bistros, and calmer venues such as cafés.

The neighborhoods of this cluster are shown in orange.

### 3.4 Clustering the city considering the services venues (shops, hotels, transport, etc.)

What if you want all the service venues to be close by?

Perhaps you like to shop, go to the gym or the spa, and have all the facilities nearby.

The venues were defined in the paris_services_venues list.
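With three venue lists defined, each neighborhood can also be summarized by the share of its venues that falls into each scenario, which makes it easy to compare two locations across scenarios. A minimal sketch, assuming paris_grouped-style frequency columns; scenario_shares is a hypothetical helper and the short venue lists stand in for paris_fam_cultural_venues, paris_night_rest_venues and paris_services_venues:

```python
import pandas as pd

def scenario_shares(grouped, scenarios):
    """Sum each neighborhood's venue frequencies over every scenario's venue list.

    scenarios maps a scenario name to a list of venue categories; categories
    missing from grouped are ignored.
    """
    out = grouped[['Neighborhood']].copy()
    for name, venues in scenarios.items():
        cols = [v for v in venues if v in grouped.columns]
        out[name] = grouped[cols].sum(axis=1)
    return out

# Made-up frequencies for two neighborhoods
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Park': [0.25, 0.0],
    'Bar': [0.25, 0.5],
    'Hotel': [0.5, 0.25],
    'Museum': [0.0, 0.25],
})
toy_scenarios = {  # hypothetical, shortened venue lists
    'family_cultural': ['Park', 'Museum'],
    'nightlife': ['Bar'],
    'services': ['Hotel', 'Spa'],
}
shares = scenario_shares(toy_grouped, toy_scenarios)
print(shares)
```

If, as the values in the tables suggest, each row of paris_grouped is a mean of one-hot venue indicators, each share reads as the fraction of the neighborhood's returned venues that belong to that scenario.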


In [53]:
# Let's create a dataframe with only the services venues
paris_services = paris_grouped[['Neighborhood','Hotel','Bakery','Supermarket','Pastry Shop','Clothing Store','Cheese Shop','Gym / Fitness Center','Boutique',
                         'Cosmetics Shop','Chocolate Shop','Gourmet Shop','Convenience Store','Spa','Farmers Market','Grocery Store','Wine Shop',
                         'Bagel Shop','Deli / Bodega','Furniture / Home Store','Candy Store','Beer Store','Bike Rental / Bike Share','Bus Stop',
                         'Jewelry Store','Miscellaneous Shop','Pool','Gym','Cupcake Shop','Perfume Shop','Department Store','Multiplex',
                         'Shoe Store','Electronics Store','Hostel','Metro Station','Record Shop','Tailor Shop','Toy / Game Store',
                         'Accessories Store','Souvlaki Shop','Train Station','Tram Station','Yoga Studio']]
paris_services.head()

Unnamed: 0,Neighborhood,Hotel,Bakery,Supermarket,Pastry Shop,Clothing Store,Cheese Shop,Gym / Fitness Center,Boutique,Cosmetics Shop,...,Hostel,Metro Station,Record Shop,Tailor Shop,Toy / Game Store,Accessories Store,Souvlaki Shop,Train Station,Tram Station,Yoga Studio
0,Amérique,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Archives,0.06,0.01,0.01,0.01,0.04,0.02,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Arsenal,0.09375,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arts-et-Métiers,0.09,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Auteuil,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
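The column list above retypes by hand the venue names that the text says were collected in `paris_services_venues`. Below is a minimal sketch of building the same subset directly from a list, assuming the list holds plain venue-name strings. The toy `grouped` frame and `services_venues` list are hypothetical stand-ins for `paris_grouped` and `paris_services_venues`. Keeping only the names that exist as columns avoids a `KeyError` when a category never appears in the venue data.

```python
import pandas as pd

# Hypothetical stand-ins for paris_grouped and paris_services_venues
grouped = pd.DataFrame({
    "Neighborhood": ["Amérique", "Archives"],
    "Hotel": [0.0, 0.06],
    "Supermarket": [0.2, 0.01],
    "Bistro": [0.1, 0.05],
})
services_venues = ["Hotel", "Supermarket", "Spa"]  # "Spa" is not a column of grouped

# Keep only venue names that exist as columns, preserving the list order
present = [venue for venue in services_venues if venue in grouped.columns]
services = grouped[["Neighborhood"] + present]
print(services)
```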


In [54]:
# Now let's put the top 7 most common venues for this scenario into a dataframe

num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # From the 4th position on, use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_services_sorted = pd.DataFrame(columns=columns)
paris_services_sorted['Neighborhood'] = paris_services['Neighborhood']

for ind in np.arange(paris_services.shape[0]):
    paris_services_sorted.iloc[ind, 1:] = return_most_common_venues(paris_services.iloc[ind, :], num_top_venues)

paris_services_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Amérique,Supermarket,Pool,Grocery Store,Hotel,Electronics Store,Gym,Cupcake Shop
1,Archives,Hotel,Clothing Store,Cheese Shop,Deli / Bodega,Gourmet Shop,Miscellaneous Shop,Bakery
2,Arsenal,Hotel,Gourmet Shop,Bakery,Beer Store,Perfume Shop,Bagel Shop,Spa
3,Arts-et-Métiers,Hotel,Supermarket,Bakery,Perfume Shop,Boutique,Furniture / Home Store,Deli / Bodega
4,Auteuil,Hotel,Electronics Store,Miscellaneous Shop,Pool,Gym,Cupcake Shop,Perfume Shop
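Some rankings above include venues whose frequency is 0.0. For example, Auteuil ranks Hotel first even though its Hotel frequency in the previous table is 0.0. Assuming `return_most_common_venues` sorts frequencies in descending order, this means Auteuil has no venue from the service list, and its "ranking" only reflects how ties were broken. Below is a minimal sketch of a ranking that skips zero frequencies and pads with `None`. It assumes each row holds a `Neighborhood` label plus numeric frequencies, as in `paris_services`. The name `top_venues_nonzero` is hypothetical and is not the notebook's `return_most_common_venues`.

```python
import pandas as pd

def top_venues_nonzero(row: pd.Series, num_top_venues: int) -> list:
    """Rank venues by frequency, skipping zeros; pad with None to a fixed length."""
    freqs = row.drop("Neighborhood").astype(float)
    # A stable sort keeps the original column order among equal frequencies
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="mergesort")
    names = list(freqs.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

row = pd.Series({"Neighborhood": "Toy", "Hotel": 0.0, "Supermarket": 0.2, "Pool": 0.1})
print(top_venues_nonzero(row, 3))  # ['Supermarket', 'Pool', None]
```

In the loop above, `top_venues_nonzero(paris_services.iloc[ind, :], num_top_venues)` could replace the original call. Neighborhoods without service venues would then show empty cells instead of an arbitrary order.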


In [55]:
# Let's now reapply the k-means algorithm

paris_clust_services = paris_services.drop(columns='Neighborhood')

#apply k-means algorithm

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_clust_services)
kmeans.labels_[0:10]


array([3, 2, 2, 2, 1, 2, 1, 1, 2, 1], dtype=int32)
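Before interpreting the clusters, it helps to check how many neighborhoods each one holds. A cluster of two neighborhoods rests on far less evidence than one of ten. Below is a minimal sketch with NumPy that uses only the first ten labels printed above, for illustration. On the full result, one would pass `kmeans.labels_` and `kclusters` instead.

```python
import numpy as np

# First ten labels printed above (illustration only; use kmeans.labels_ in the notebook)
labels = np.array([3, 2, 2, 2, 1, 2, 1, 1, 2, 1], dtype=np.int32)
n_clusters = 5  # assumption: kclusters was set to 5 earlier in the notebook

# Count how many neighborhoods fall into each label 0..n_clusters-1
sizes = np.bincount(labels, minlength=n_clusters)
for label, size in enumerate(sizes):
    print(f"Cluster label {label}: {size} neighborhoods")
```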

Let's merge the dataframes

In [56]:
paris_services_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_services_cluster = paris_df

paris_services_cluster = paris_services_cluster.join(paris_services_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_services_cluster.head(10) 

Unnamed: 0,Neighborhood,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Saint-Gervais,4,48.855719,2.358162,1,Hotel,Pastry Shop,Cosmetics Shop,Clothing Store,Bakery,Gourmet Shop,Cupcake Shop
1,Saint-Thomas-d'Aquin,7,48.855263,2.325588,2,Hotel,Bakery,Cheese Shop,Supermarket,Clothing Store,Department Store,Chocolate Shop
2,Porte-Saint-Denis,10,48.873618,2.352283,2,Hotel,Bakery,Convenience Store,Farmers Market,Cheese Shop,Furniture / Home Store,Wine Shop
3,Saint-Germain-l'Auxerrois,1,48.86065,2.33491,0,Hotel,Furniture / Home Store,Boutique,Cosmetics Shop,Shoe Store,Spa,Hostel
4,Villette,19,48.887661,2.374468,2,Hotel,Multiplex,Metro Station,Supermarket,Farmers Market,Gym / Fitness Center,Cosmetics Shop
5,Val-de-Grâce,5,48.841684,2.343861,2,Hotel,Bakery,Cupcake Shop,Tram Station,Train Station,Jewelry Store,Miscellaneous Shop
6,Necker,15,48.842711,2.310777,0,Hotel,Gym / Fitness Center,Bakery,Grocery Store,Pool,Pastry Shop,Supermarket
7,Père-Lachaise,20,48.863719,2.395273,1,Bakery,Gourmet Shop,Electronics Store,Miscellaneous Shop,Pool,Gym,Cupcake Shop
8,La Chapelle,18,48.894012,2.364387,1,Farmers Market,Supermarket,Cheese Shop,Hotel,Pool,Gym,Cupcake Shop
9,Europe,8,48.878148,2.317175,2,Hotel,Wine Shop,Supermarket,Pastry Shop,Bus Stop,Bakery,Electronics Store
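`DataFrame.join` performs a left join by default. If a neighborhood in `paris_df` has no row in `paris_services_sorted` (for example, because no venues were returned for it), it gets `NaN` in `Cluster Labels`, and the `int(cluster)` call in the map cell would then raise a `ValueError`. Below is a minimal sketch of spotting and dropping such rows. The small frames are hypothetical stand-ins for the notebook's data, including the made-up "Toy-Quartier" row.

```python
import pandas as pd

# Hypothetical stand-ins for paris_df and paris_services_sorted
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Saint-Gervais", "Necker", "Toy-Quartier"],
    "Latitude": [48.855719, 48.842711, 48.86],
    "Longitude": [2.358162, 2.310777, 2.35],
})
sorted_venues = pd.DataFrame({
    "Neighborhood": ["Saint-Gervais", "Necker"],
    "Cluster Labels": [1, 0],
})

# join is a left join by default: unmatched neighborhoods get NaN labels
merged = neighborhoods.join(sorted_venues.set_index("Neighborhood"), on="Neighborhood")
missing = merged.loc[merged["Cluster Labels"].isna(), "Neighborhood"].tolist()
print("No cluster label:", missing)

# Drop unmatched rows and restore an integer dtype for the labels
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
```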


## Let's visualize the service clusters

In [57]:
map_paris_cluster_services = folium.Map(location=[loc.latitude, loc.longitude],zoom_start=13)

# Set one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one marker per neighborhood, colored by its cluster label
# (int(cluster)-1 wraps label 0 around to the last color of the palette)
for lat, lon, poi, cluster in zip(paris_services_cluster['Latitude'], paris_services_cluster['Longitude'], paris_services_cluster['Neighborhood'], paris_services_cluster['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_paris_cluster_services)

map_paris_cluster_services
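The marker color is picked with `rainbow[int(cluster)-1]`, so label 0 wraps around to the last entry of the palette through Python's negative indexing. The `cm.rainbow` colors run from purple to red. As a result, label 0 (Cluster 1 below) is drawn in red and label 1 (Cluster 2) in purple. Below is a minimal sketch of that index arithmetic in plain Python. `palette_index` is a hypothetical helper, not part of the notebook.

```python
def palette_index(cluster_label: int, n_colors: int) -> int:
    """Palette position selected by rainbow[cluster_label - 1].

    Python maps index -1 to n_colors - 1, which the modulo reproduces.
    """
    return (cluster_label - 1) % n_colors

n_colors = 5  # assumption: one color per cluster, kclusters = 5
palette = list(range(n_colors))  # stand-in for the list of hex colors
for label in range(n_colors):
    # The modulo form and direct negative indexing pick the same entry
    assert palette[label - 1] == palette[palette_index(label, n_colors)]
    print(f"label {label} -> palette[{palette_index(label, n_colors)}]")
```

Writing `rainbow[int(cluster)]` would be the more direct mapping. However, it would shift every color, so the color names in the cluster descriptions below would also need updating.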

## Let's examine each cluster in detail

### Cluster 1

In [58]:
paris_services_cluster.loc[paris_services_cluster['Cluster Labels'] == 0, paris_services_cluster.columns[[0] + list(range(5,paris_services_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
3,Saint-Germain-l'Auxerrois,Hotel,Furniture / Home Store,Boutique,Cosmetics Shop,Shoe Store,Spa,Hostel
6,Necker,Hotel,Gym / Fitness Center,Bakery,Grocery Store,Pool,Pastry Shop,Supermarket
23,Plaisance,Hotel,Supermarket,Bakery,Grocery Store,Farmers Market,Gym,Beer Store
38,Petit-Montrouge,Hotel,Supermarket,Bakery,Gym / Fitness Center,Cosmetics Shop,Multiplex,Hostel
40,Ecole-Militaire,Hotel,Pastry Shop,Gym,Farmers Market,Electronics Store,Pool,Cupcake Shop
42,Grenelle,Hotel,Gym / Fitness Center,Cheese Shop,Bakery,Bagel Shop,Bike Rental / Bike Share,Supermarket
43,Chaillot,Hotel,Bakery,Cosmetics Shop,Bagel Shop,Convenience Store,Gourmet Shop,Spa
52,Faubourg-du-Roule,Hotel,Jewelry Store,Cosmetics Shop,Pastry Shop,Clothing Store,Gym / Fitness Center,Boutique
66,Saint-Vincent-de-Paul,Hotel,Farmers Market,Supermarket,Cosmetics Shop,Grocery Store,Hostel,Gym
72,Madeleine,Hotel,Boutique,Gourmet Shop,Clothing Store,Wine Shop,Bakery,Cosmetics Shop


The neighborhoods of this cluster are suitable for tourism: Hotel is the most common service venue in every one of them.

The neighborhoods of this cluster are shown in red
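A statement such as "suitable for tourism due to the variety of hotels" can be backed by a count: how often each venue is the 1st most common one within each cluster. Below is a minimal sketch with pandas. The sample input is a handful of rows copied from the cluster tables in this section. In the notebook, the same call would run on `paris_services_cluster`.

```python
import pandas as pd

# A few rows copied from the cluster tables above (sample input only)
sample = pd.DataFrame({
    "Neighborhood": ["Necker", "Plaisance", "Père-Lachaise", "La Chapelle", "Roquette"],
    "Cluster Labels": [0, 0, 1, 1, 1],
    "1st Most Common Venue": ["Hotel", "Hotel", "Bakery", "Farmers Market", "Supermarket"],
})

# Share of each top venue within each cluster
shares = (
    sample.groupby("Cluster Labels")["1st Most Common Venue"]
    .value_counts(normalize=True)
    .rename("share")
)
print(shares)
```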

### Cluster 2

In [59]:
paris_services_cluster.loc[paris_services_cluster['Cluster Labels'] == 1, paris_services_cluster.columns[[0] + list(range(5,paris_services_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Saint-Gervais,Hotel,Pastry Shop,Cosmetics Shop,Clothing Store,Bakery,Gourmet Shop,Cupcake Shop
7,Père-Lachaise,Bakery,Gourmet Shop,Electronics Store,Miscellaneous Shop,Pool,Gym,Cupcake Shop
8,La Chapelle,Farmers Market,Supermarket,Cheese Shop,Hotel,Pool,Gym,Cupcake Shop
12,Saint-Lambert,Hotel,Supermarket,Bakery,Convenience Store,Metro Station,Multiplex,Bus Stop
13,Monnaie,Hotel,Wine Shop,Cosmetics Shop,Electronics Store,Miscellaneous Shop,Candy Store,Chocolate Shop
16,Maison-Blanche,Bus Stop,Farmers Market,Supermarket,Bakery,Pool,Hotel,Tram Station
17,Croulebarbe,Hotel,Bakery,Supermarket,Multiplex,Farmers Market,Electronics Store,Pool
20,Porte-Saint-Martin,Hotel,Bakery,Cheese Shop,Supermarket,Furniture / Home Store,Electronics Store,Gym
21,Roquette,Supermarket,Hotel,Bakery,Pastry Shop,Record Shop,Grocery Store,Beer Store
22,Picpus,Hotel,Electronics Store,Miscellaneous Shop,Pool,Gym,Cupcake Shop,Perfume Shop


The neighborhoods of this cluster provide an interesting variety of services.

Hotels, supermarkets and bakeries are among their most common venues. These are typical downtown neighborhoods.

The neighborhoods of this cluster are shown in purple

### Cluster 3

In [60]:
paris_services_cluster.loc[paris_services_cluster['Cluster Labels'] == 2, paris_services_cluster.columns[[0] + list(range(5,paris_services_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
1,Saint-Thomas-d'Aquin,Hotel,Bakery,Cheese Shop,Supermarket,Clothing Store,Department Store,Chocolate Shop
2,Porte-Saint-Denis,Hotel,Bakery,Convenience Store,Farmers Market,Cheese Shop,Furniture / Home Store,Wine Shop
4,Villette,Hotel,Multiplex,Metro Station,Supermarket,Farmers Market,Gym / Fitness Center,Cosmetics Shop
5,Val-de-Grâce,Hotel,Bakery,Cupcake Shop,Tram Station,Train Station,Jewelry Store,Miscellaneous Shop
9,Europe,Hotel,Wine Shop,Supermarket,Pastry Shop,Bus Stop,Bakery,Electronics Store
10,Sainte-Marguerite,Hotel,Record Shop,Beer Store,Convenience Store,Pastry Shop,Train Station,Shoe Store
11,Parc-de-Montsouris,Hotel,Bus Stop,Electronics Store,Miscellaneous Shop,Pool,Gym,Cupcake Shop
14,Odéon,Hotel,Bakery,Boutique,Bagel Shop,Convenience Store,Chocolate Shop,Miscellaneous Shop
15,Champs-Elysées,Hotel,Boutique,Department Store,Tailor Shop,Bakery,Clothing Store,Jewelry Store
18,Arsenal,Hotel,Gourmet Shop,Bakery,Beer Store,Perfume Shop,Bagel Shop,Spa


This is also a typical downtown cluster.

Like cluster 1, it is well suited to tourists thanks to its variety of hotels.

The difference from cluster 1 is that cluster 3 seems to offer more bakeries and supermarkets.

The neighborhoods of this cluster are shown in blue
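One way to check whether cluster 3 really offers more bakeries and supermarkets than cluster 1 is to average the venue frequencies per cluster. Cluster N in the headings corresponds to k-means label N-1. Below is a minimal sketch with pandas that uses the five rows of `paris_services` printed earlier together with their first five labels. None of those rows carries label 0, so the sketch only illustrates the method. On the full data, one would attach `kmeans.labels_` to `paris_services` and group the same way. The centroid coordinates in `kmeans.cluster_centers_` play a similar role.

```python
import pandas as pd

# First five rows of paris_services (Bakery, Supermarket) with their k-means labels
freqs = pd.DataFrame({
    "Neighborhood": ["Amérique", "Archives", "Arsenal", "Arts-et-Métiers", "Auteuil"],
    "Bakery": [0.0, 0.01, 0.015625, 0.01, 0.0],
    "Supermarket": [0.2, 0.01, 0.0, 0.01, 0.0],
})
freqs["Cluster Labels"] = [3, 2, 2, 2, 1]

# Mean frequency of each venue type per cluster (a simple cluster profile)
profile = freqs.groupby("Cluster Labels")[["Bakery", "Supermarket"]].mean()
print(profile)
```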

### Cluster 4

In [62]:
paris_services_cluster.loc[paris_services_cluster['Cluster Labels'] == 3, paris_services_cluster.columns[[0] + list(range(5,paris_services_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
36,Amérique,Supermarket,Pool,Grocery Store,Hotel,Electronics Store,Gym,Cupcake Shop
73,Saint-Fargeau,Supermarket,Bakery,Pool,Electronics Store,Miscellaneous Shop,Gym,Cupcake Shop


The two neighborhoods of this cluster offer everyday services, led by supermarkets, but they are less suitable for visitors because they lack hotels.

The neighborhoods of this cluster are shown in green

### Cluster 5

In [63]:
paris_services_cluster.loc[paris_services_cluster['Cluster Labels'] == 4, paris_services_cluster.columns[[0] + list(range(5,paris_services_cluster.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
41,Muette,Pool,Gym / Fitness Center,Hotel,Electronics Store,Miscellaneous Shop,Gym,Cupcake Shop
74,Porte-Dauphine,Train Station,Gym / Fitness Center,Grocery Store,Hotel,Electronics Store,Pool,Gym


Like cluster 4, the neighborhoods of this cluster are not well suited to tourists.

It is not a typical downtown cluster either, because supermarkets and bakeries do not appear among its most common venues.

# 4. Results and discussion

This tool helps real estate agents and clients identify the most suitable neighborhoods to live in, considering both general and specific needs.

This tool performs four clustering exercises. The results are described below:

a) General clustering: for this clustering exercise we considered all the venues, without distinguishing any category. We found the following:

- In this first analysis we saw that some neighborhoods are more crowded than others.
- We identified neighborhoods that offer more open spaces than others (playgrounds, gardens and plazas).
- We identified crowded areas that offer several kinds of restaurant venues and services such as shops.

b) Family and cultural clustering: in this exercise we considered only the family and cultural venues. We found the following:

- A cluster of neighborhoods that offers open spaces like gardens, as well as museums
- A cluster in which art is essential, in the form of art museums and galleries
- Other neighborhoods without art galleries or museums that offer other cultural activities, such as concerts
- A cluster with many bookstores and theaters

c) Nightlife clustering (bars and restaurants): in this exercise we found:

- Neighborhoods that offer more options to eat than to drink (not many bars)
- The opposite: clusters that offer a very good variety of bars and nightclubs
- Neighborhoods that offer both options: calmer venues like restaurants and cafés, and "louder" venues like bars and nightclubs

d) Services clustering: in this exercise we found:

- Neighborhoods that are more suitable for tourism, offering a good variety of hotels
- Neighborhoods with a more "downtown" kind of life, offering a good variety of supermarkets, shops and bakeries
- Less crowded neighborhoods that do not offer venues like supermarkets, grocery stores and bakeries

# 5. Conclusions

This tool helps real estate agents target and search for the most suitable neighborhoods across many criteria.

Clients can have many specific demands about the characteristics of different locations.

This tool collects and summarizes the venue data and segments the neighborhoods into groups that offer similar venues.

With this tool a client could know:

- The most and least crowded neighborhoods
- The neighborhoods that offer more open spaces, like gardens and parks
- The amount and kind of cultural offerings in each neighborhood, such as art galleries, museums and theaters
- The neighborhoods that offer more restaurants and cafés
- The neighborhoods that are more suitable for tourism
- The neighborhoods that are more suitable for nightlife activities (bars and nightclubs)

Finally, because we use a clustering algorithm, the neighborhoods are grouped by similarity. This can help clients who want to move to a neighborhood similar to their current one, or to a completely different one.
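Finding a neighborhood similar to the current one amounts to looking up the current neighborhood's cluster and listing the others that share it. Below is a minimal sketch with pandas. The helper name `similar_neighborhoods` is hypothetical, and the sample rows are copied from the services cluster tables above.

```python
import pandas as pd

def similar_neighborhoods(clustered: pd.DataFrame, name: str) -> list:
    """Other neighborhoods sharing the cluster label of `name` (assumed present)."""
    label = clustered.loc[clustered["Neighborhood"] == name, "Cluster Labels"].iloc[0]
    same = clustered[(clustered["Cluster Labels"] == label) & (clustered["Neighborhood"] != name)]
    return same["Neighborhood"].tolist()

# Rows copied from the services cluster tables (sample input only)
sample = pd.DataFrame({
    "Neighborhood": ["Muette", "Porte-Dauphine", "Amérique", "Saint-Fargeau"],
    "Cluster Labels": [4, 4, 3, 3],
})
print(similar_neighborhoods(sample, "Muette"))  # ['Porte-Dauphine']
```

Filtering with `!=` on the label instead returns neighborhoods from other clusters, for clients looking for a completely different environment.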
