# Capstone Project Week 4 & 5

## 0) Due to GitHub's rendering limitations, I invite you to review the project through this webpage: https://nbviewer.jupyter.org/github/ernesst/Coursera_Capstone/blob/master/Capstone%20Project%20W4-W5.ipynb

## 1) Introduction/Business Problem

Paris is known as the world's tourism capital, and the city's attractions are well covered by tourist websites such as Tripadvisor. However, once an experienced tourist has done the best-known activities, they usually want to discover local life and get off the beaten track. 
This project will help such tourists by giving them a unique fingerprint of the neighbourhoods (called arrondissements in French) on several topics: population density, top 5 venues, top 5 restaurant types, and top 5 ongoing cultural activities. 
Thus, depending on their interests, experienced tourists will gain a deeper understanding of the city's layout, and the unsupervised clustering will give a hint about neighbourhood similarities. 


## 2) Data / a description of the data and how it will be used to solve the problem.

We are going to use Foursquare data for this analysis; however, this provider has two main limitations: 
- With the free account, the number of venues is limited to 100 per location.
- Foursquare data is limited for some topics, such as cultural events and population density. 

Thus we will need other data sets: 
- Wikipedia for the neighbourhood description: https://en.wikipedia.org/wiki/Demographics_of_Paris
- https://opendata.paris.fr/pages/home/, a wonderful free data resource for Paris. 
- https://www.data.gouv.fr/ for the geographic neighbourhood layout

We will need to clean and merge these data sets in order to get the most out of them. 
Then we will visit Paris in a different way, thanks to unsupervised k-means clustering:
- We will look at the population density,
- Establish a profile of the main activities,
- Establish a profile of the restaurant types,
- Establish a profile of the current cultural activities. 

The methodology for this project will be the following: 
- Data collection from the different sources listed above,
- Data processing in order to obtain dataframes grouped by neighbourhood, where each subject of interest is presented. We will also need to link GPS coordinates to this information in order to use folium and plot it on a map for easier review. 
- As there is quite a lot of information, and it is somewhat difficult to assess similarities between neighbourhoods, we are going to use machine learning, with the unsupervised k-means algorithm, to ease the assessment by experienced tourists and thus their choice of future destination. 
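
The last step of this methodology can be sketched on a toy table. This is only a minimal illustration, not the notebook's actual data: the `profiles` frame, its values, and the choice of `n_clusters=2` are assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-arrondissement profiles: each row holds the share of
# venues in a category (values invented for illustration).
profiles = pd.DataFrame(
    {
        "Arrondissement": [1, 2, 3, 4],
        "French Restaurant": [0.30, 0.28, 0.05, 0.04],
        "Museum": [0.02, 0.03, 0.20, 0.22],
    }
)

# k-means works on the numeric features only, so drop the identifier column.
features = profiles.drop(columns="Arrondissement")

# n_clusters=2 is an assumption for this toy data; random_state makes runs repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Attach the cluster label to each arrondissement for later mapping.
profiles["Cluster"] = kmeans.labels_
print(profiles[["Arrondissement", "Cluster"]])
```

Arrondissements with similar profiles end up with the same label, which is the "hint about similarities" the experienced tourist can use.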

## 3) Library import

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
#!pip install geopy
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# GeoPandas makes it easy to process geodata
!pip install geopandas
import geopandas

# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
#!conda install -c conda-forge folium --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import csv

print('Libraries imported.')

Libraries imported.


## 4) Let's import arrondissement data from Wikipedia about Paris's demography

In [2]:
url = 'https://en.wikipedia.org/wiki/Demographics_of_Paris'
dfs = pd.read_html(url)

print("This page contains {} tables".format(len(dfs)))

This page contains 11 tables


### The table at index 3 (the fourth table on the page) is the relevant one, let's get it and inspect it

In [3]:
# Select the table at index 3
arrondissements = dfs[3]
# Let's inspect it; look at the last row!
print(arrondissements)
# Drop the last (summary) row
arrondissements.drop(arrondissements.tail(1).index, inplace=True)
# Convert the arrondissement number to numeric
arrondissements['Arrondissement'] = pd.to_numeric(arrondissements.Arrondissement)
print(arrondissements.dtypes)

arrondissements.head(20)

   Arrondissement  Area (km2)  Population  Population per km2
0              01       1.826       17268                9457
1              02       0.992       22558               22740
2              03       1.171       36727               31364
3              04       1.601       28068               17532
4              05       2.541       61080               24038
5              06       2.154       44154               20499
6              07       4.088       58166               14228
7              08       3.881       39409               10154
8              09       2.179       60293               27670
9              10       2.892       95436               33000
10             11       3.666      156831               42780
11             12       6.377      146527               22977
12             13       7.146      184235               25782
13             14       5.621      142535               25358
14             15       8.502      240723               28314
15      

Unnamed: 0,Arrondissement,Area (km2),Population,Population per km2
0,1,1.826,17268,9457
1,2,0.992,22558,22740
2,3,1.171,36727,31364
3,4,1.601,28068,17532
4,5,2.541,61080,24038
5,6,2.154,44154,20499
6,7,4.088,58166,14228
7,8,3.881,39409,10154
8,9,2.179,60293,27670
9,10,2.892,95436,33000


### We now have the arrondissements and their information.
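
Picking the table by position (`dfs[3]`) can break if the Wikipedia page layout changes. As a hedged alternative, the sketch below selects the table by its column names and drops any non-numeric row. The helper `find_table` and the stand-in `tables` list are hypothetical; the "Total" row label is invented to mimic the summary line.

```python
import pandas as pd

def find_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError("No table with columns {}".format(required_columns))

# Stand-ins for the list returned by pd.read_html: a dummy table, then a
# cut-down demographics table with an invented "Total" summary row.
tables = [
    pd.DataFrame({"A": [1], "B": [2]}),
    pd.DataFrame(
        {
            "Arrondissement": ["01", "02", "Total"],
            "Population": [17268, 22558, 39826],
        }
    ),
]

arr = find_table(tables, ["Arrondissement", "Population"]).copy()

# errors="coerce" turns non-numeric labels (the summary row) into NaN,
# so they can be dropped without relying on their position.
arr["Arrondissement"] = pd.to_numeric(arr["Arrondissement"], errors="coerce")
arr = arr.dropna(subset=["Arrondissement"]).astype({"Arrondissement": int})
print(arr)
```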

## 5) Let's import the GeoJSON for the neighbourhoods (arrondissements)

In [4]:
!wget   https://www.data.gouv.fr/en/datasets/r/4765fe48-35fd-4536-b029-4727380ce23c -O arrondissements.geojson
print('GeoJSON file downloaded!')
!ls ./

--2020-12-26 12:03:32--  https://www.data.gouv.fr/en/datasets/r/4765fe48-35fd-4536-b029-4727380ce23c
Resolving www.data.gouv.fr (www.data.gouv.fr)... 37.59.183.93
Connecting to www.data.gouv.fr (www.data.gouv.fr)|37.59.183.93|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://opendata.paris.fr/explore/dataset/arrondissements/download?format=geojson&timezone=Europe/Berlin&use_labels_for_header=false [following]
--2020-12-26 12:03:32--  https://opendata.paris.fr/explore/dataset/arrondissements/download?format=geojson&timezone=Europe/Berlin&use_labels_for_header=false
Resolving opendata.paris.fr (opendata.paris.fr)... 34.248.20.69, 34.249.199.226
Connecting to opendata.paris.fr (opendata.paris.fr)|34.248.20.69|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘arrondissements.geojson’

arrondissements.geo     [ <=>                ] 202.63K  --.-KB/s    in 0.04s   

2020-12-26 12:03:33 (5.

In [5]:
# Getting Paris GPS coordinates to center the maps
address = 'Paris, FR'
geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Paris are 48.8566969, 2.3514616.


In [6]:
# Let's create a dataframe for easier processing, thanks to the geopandas library.
arrondissement_geo = r'./arrondissements.geojson'
Paris = geopandas.read_file(arrondissement_geo)
Paris.head()

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ..."
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ..."
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ..."
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ..."
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ..."


In [7]:
# Merge both dataframes on the arrondissement number for easier processing; first we need to rename a column.
arrondissements.rename(columns = {'Arrondissement':'c_ar'}, inplace = True) 
Paris = Paris.merge(arrondissements, on = "c_ar")
Paris.head()

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry,Area (km2),Population,Population per km2
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ...",1.826,17268,9457
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ...",0.992,22558,22740
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ...",6.786,187799,27674
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ...",1.171,36727,31364
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ...",4.088,58166,14228
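
The merge above relies on both `c_ar` columns having the same dtype and on each arrondissement appearing once on each side. A small sketch of how this could be checked, using toy frames with a few values copied from the tables above (the frame names `geo` and `demo` are hypothetical):

```python
import pandas as pd

# Toy versions of the two frames; only a few columns are kept.
geo = pd.DataFrame(
    {"c_ar": [1, 2, 19], "l_aroff": ["Louvre", "Bourse", "Buttes-Chaumont"]}
)
demo = pd.DataFrame(
    {"c_ar": ["01", "02", "19"], "Population": [17268, 22558, 187799]}
)

# Keys must share a dtype, otherwise no row would match.
demo["c_ar"] = pd.to_numeric(demo["c_ar"])

# validate raises if a key is duplicated on either side;
# indicator adds a "_merge" column telling where each row came from.
merged = geo.merge(demo, on="c_ar", how="left", validate="one_to_one", indicator=True)

# Rows not marked "both" would be arrondissements without demographic data.
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), "rows merged,", len(unmatched), "unmatched")
```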


### As we can see, we are missing the GPS coordinates of each neighbourhood's center, which geopandas was not able to import. Let's fix it. 

In [8]:
# Let's get the GPS coordinates of the arrondissements that geopandas missed.
rows = []
with open(arrondissement_geo) as json_file:
    data = json.load(json_file)
    for feature in data['features']:
        props = feature['properties']
        # geom_x_y holds the center as [latitude, longitude]
        rows.append([props['c_ar'], props['geom_x_y'][0], props['geom_x_y'][1]])
        
missing_df = pd.DataFrame(rows, columns=["c_ar", "latitude", "longitude"])

#add missing data to main dataframe 
Paris = Paris.merge(missing_df, on = "c_ar")
Paris.head()

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry,Area (km2),Population,Population per km2,latitude,longitude
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ...",1.826,17268,9457,48.862563,2.336443
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ...",0.992,22558,22740,48.868279,2.342803
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ...",6.786,187799,27674,48.887076,2.384821
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ...",1.171,36727,31364,48.862872,2.360001
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ...",4.088,58166,14228,48.856174,2.312188


### Look at the last two columns. Now that we have all the information, let's look at the population density in Paris.
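
Because `geom_x_y` is read by position, it may be worth checking that the first value really is the latitude. The sketch below does this on a cut-down GeoJSON sample; the bounding box values are approximate assumptions for Paris, and `centre_of` is a hypothetical helper.

```python
import json

# Approximate bounding box for Paris (assumed values, for a sanity check only).
LAT_RANGE = (48.80, 48.91)
LON_RANGE = (2.22, 2.48)

# A cut-down GeoJSON document with the two properties used above;
# coordinates are those of the first two rows shown in the table.
sample = json.loads("""
{"features": [
  {"properties": {"c_ar": 1, "geom_x_y": [48.862563, 2.336443]}},
  {"properties": {"c_ar": 2, "geom_x_y": [48.868279, 2.342803]}}
]}
""")

def centre_of(feature):
    """Return (c_ar, latitude, longitude), checking the assumed order."""
    props = feature["properties"]
    lat, lon = props["geom_x_y"]
    lat_ok = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
    lon_ok = LON_RANGE[0] <= lon <= LON_RANGE[1]
    if not (lat_ok and lon_ok):
        raise ValueError(
            "Unexpected coordinates for arrondissement {}".format(props["c_ar"])
        )
    return props["c_ar"], lat, lon

centres = [centre_of(feature) for feature in sample["features"]]
print(centres)
```

If the two values were swapped, the check would raise instead of silently placing markers far from Paris.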

## 6) Maps of the population density of Paris.

In [9]:
Paris_map = folium.Map(location=[latitude , longitude], zoom_start=13)
# add choropleth layer to map (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=arrondissement_geo,
    data=Paris,
    columns=['c_ar', 'Population per km2'],
    key_on='feature.properties.c_ar',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Population per km2').add_to(Paris_map)
# add markers to map
for lat, lng, l_aroff, c_arinsee in zip(Paris['latitude'], Paris['longitude'], Paris['l_aroff'], Paris['c_arinsee']):
    label = '{}, {}'.format(l_aroff, c_arinsee)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Paris_map)  
Paris_map



### So now we have a map of Paris segmented by arrondissement and colored by population density. Sweet!
The density map shows contrasting areas, including:
- Center of Paris: low density, where the main museums, embassies, and French government buildings are located.
- North-east of Paris: the highest density.
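
The density values behind this map can be reproduced from the population and area figures. A small sketch, using four rows copied from the table in section 4 (the variable names are chosen for the example only):

```python
# (arrondissement, area in km2, population), copied from the table in section 4
rows = [
    (1, 1.826, 17268),
    (8, 3.881, 39409),
    (10, 2.892, 95436),
    (11, 3.666, 156831),
]

# Density is simply population divided by area, rounded to whole inhabitants.
density = {arr: round(pop / area) for arr, area, pop in rows}

# Sort arrondissements from the densest to the least dense.
ranked = sorted(density, key=density.get, reverse=True)

print(density)  # {1: 9457, 8: 10154, 10: 33000, 11: 42780}
print(ranked)   # [11, 10, 8, 1]
```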

## 7) Now let's try to establish a profile per arrondissement, first from Foursquare venues

### Define Foursquare Credentials and Version - hidden
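
Since the credentials cell is hidden, here is one possible way to define `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT` without writing secrets in the notebook. This is a sketch, not the removed code: the environment variable names and the default version string are hypothetical.

```python
import os

def load_foursquare_settings(env=os.environ):
    """Read credentials from environment variables (variable names are hypothetical)."""
    try:
        client_id = env["FOURSQUARE_CLIENT_ID"]
        client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
    # The API version is a date string (YYYYMMDD); this default is an arbitrary placeholder.
    version = env.get("FOURSQUARE_VERSION", "20200101")
    # 100 is the per-location venue limit mentioned in section 2.
    limit = int(env.get("FOURSQUARE_LIMIT", "100"))
    return client_id, client_secret, version, limit

# Demonstration with a plain dict standing in for the real environment.
CLIENT_ID, CLIENT_SECRET, VERSION, LIMIT = load_foursquare_settings(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
print(VERSION, LIMIT)
```

In a real run, calling `load_foursquare_settings()` without arguments would read the actual environment.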

In [10]:
# The code was removed by Watson Studio for sharing.

In [11]:
# Define the function to get the venues
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Arrondissement', 
                  'Arrondissement Latitude', 
                  'Arrondissement Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
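
The function above indexes straight into the response, so a venue without categories or an empty response would raise an error. A more defensive parsing step could look like the sketch below. The helper name `extract_venues` is hypothetical, and the sample payload only mimics the fields read by `getNearbyVenues`; the second venue is invented.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one 'explore' response into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to a placeholder when the category list is missing or empty.
        categories = venue.get("categories") or [{}]
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Unknown"),
        ))
    return rows

# Sample payload: the first venue comes from the output shown in this section,
# the second one (no category) is invented to exercise the fallback.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Musée du Louvre",
               "location": {"lat": 48.860847, "lng": 2.33644},
               "categories": [{"name": "Art Museum"}]}},
    {"venue": {"name": "Unnamed place",
               "location": {"lat": 48.86, "lng": 2.33},
               "categories": []}},
]}]}}

venue_rows = extract_venues(1, 48.862563, 2.336443, sample_payload)
print(venue_rows)
```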

In [12]:
# Processing data from Foursquare
Paris_venues = getNearbyVenues(names=Paris['c_ar'],
                                   latitudes=Paris['latitude'],
                                   longitudes=Paris['longitude']
                                  )

1
2
19
3
7
5
8
17
20
6
11
13
9
18
4
14
16
10
15
12


### Let's look at what we get

In [13]:
print(Paris_venues.shape)
Paris_venues.head()

(1830, 7)


Unnamed: 0,Arrondissement,Arrondissement Latitude,Arrondissement Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1,48.862563,2.336443,Musée du Louvre,48.860847,2.33644,Art Museum
1,1,48.862563,2.336443,Palais Royal,48.863236,2.337127,Historic Site
2,1,48.862563,2.336443,Comédie-Française,48.863088,2.336612,Theater
3,1,48.862563,2.336443,Place du Palais Royal,48.862523,2.336688,Plaza
4,1,48.862563,2.336443,Cour Napoléon,48.861172,2.335088,Plaza


In [14]:
Paris_venues.groupby('Arrondissement').count()

Arrondissement,Arrondissement Latitude,Arrondissement Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,100,100,100,100,100,100
2,100,100,100,100,100,100
3,100,100,100,100,100,100
4,100,100,100,100,100,100
5,100,100,100,100,100,100
6,100,100,100,100,100,100
7,100,100,100,100,100,100
8,100,100,100,100,100,100
9,100,100,100,100,100,100
10,100,100,100,100,100,100


In [15]:
print('There are {} unique categories.'.format(len(Paris_venues['Venue Category'].unique())))

There are 228 unique categories.


### It's interesting to note that we reach the upper limit of the free developer account on Foursquare, so we likely don't have a representative set of information for each neighbourhood. Let's see what we get!
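
To see which arrondissements are affected by this cap, the venue counts can be compared with the limit. A minimal sketch with invented counts (on the real data, `Paris_venues['Arrondissement'].value_counts()` would give the same kind of tally):

```python
from collections import Counter

LIMIT = 100  # per-location cap mentioned in section 2

# Hypothetical data: one arrondissement number per returned venue.
venue_arrondissements = [1] * 100 + [2] * 100 + [19] * 63

# Count venues per arrondissement and keep those that hit the cap.
counts = Counter(venue_arrondissements)
capped = sorted(arr for arr, n in counts.items() if n >= LIMIT)

print("Capped at the limit:", capped)
```

An arrondissement that reaches the cap probably has more venues than we could retrieve, so its profile should be read with caution.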

In [16]:
# one hot encoding
Paris_onehot = pd.get_dummies(Paris_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighbourhood column back to dataframe
Paris_onehot['Arrondissement'] = Paris_venues['Arrondissement'] 

# move neighbourhood column to the first column
col_name="Arrondissement"
first_col = Paris_onehot.pop(col_name)
Paris_onehot.insert(0, col_name, first_col)
Paris_onehot.head()

Unnamed: 0,Arrondissement,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bar,Basketball Court,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Butcher,Café,Cambodian Restaurant,Canal,Candy Store,Cantonese Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Corsican Restaurant,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Drive-in Theater,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health Food Store,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangxi Restaurant,Juice Bar,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Lyonese Bouchon,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,New American Restaurant,Nightclub,Okonomiyaki Restaurant,Opera House,Organic Grocery,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Street Fair,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [17]:
Paris_onehot.shape

(1830, 229)
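
To see why the next step averages the one-hot columns, here is a toy example: the mean of a 0/1 column within a group is the share of that group's venues falling in the category. The `toy_*` names and values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for Paris_venues (values invented for illustration).
toy_venues = pd.DataFrame(
    {
        "Arrondissement": [1, 1, 2, 2],
        "Venue Category": ["Café", "Bar", "Café", "Café"],
    }
)

# One indicator column per category; dtype=int keeps plain 0/1 values.
toy_onehot = pd.get_dummies(
    toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int
)
toy_onehot.insert(0, "Arrondissement", toy_venues["Arrondissement"])

# The mean of each 0/1 column is the frequency of that category per arrondissement.
toy_grouped = toy_onehot.groupby("Arrondissement").mean().reset_index()
print(toy_grouped)
```

Here arrondissement 1 is half cafés and half bars, while arrondissement 2 is all cafés, which is exactly the kind of profile used for the clustering.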

In [18]:
Paris_grouped = Paris_onehot.groupby('Arrondissement').mean().reset_index()
Paris_grouped

Unnamed: 0,Arrondissement,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bar,Basketball Court,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Butcher,Café,Cambodian Restaurant,Canal,Candy Store,Cantonese Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Corsican Restaurant,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Drive-in Theater,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Health Food Store,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangxi Restaurant,Juice Bar,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Lyonese Bouchon,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,New American Restaurant,Nightclub,Okonomiyaki Restaurant,Opera House,Organic Grocery,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Record Shop,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Street Fair,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.09,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.02,0.0,0.0
1,2,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.14,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.02,0.0,0.0
2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.04,0.03,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0
3,4,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.19,0.02,0.0,0.03,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
4,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.14,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.03,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.05,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0
5,6,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.02,0.02,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.1,0.01,0.0,0.05,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.04,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0
6,7,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.29,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.03,0.0,0.0,0.1,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.13,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.15,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0
8,9,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.02,0.03,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.15,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.04,0.01,0.0,0.0,0.0
9,10,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.15,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.04,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0


In [19]:
Paris_grouped.shape

(20, 229)

In [20]:
# Let's print the top 5 venues per neighbourhood.
num_top_venues = 5

for hood in Paris_grouped['Arrondissement']:
    print("----"+str(hood)+"----")
    temp = Paris_grouped[Paris_grouped['Arrondissement'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1----
                 venue  freq
0                Hotel  0.09
1    French Restaurant  0.09
2  Japanese Restaurant  0.08
3                Plaza  0.07
4   Italian Restaurant  0.05


----2----
                 venue  freq
0    French Restaurant  0.14
1             Wine Bar  0.06
2  Japanese Restaurant  0.05
3                Hotel  0.05
4         Cocktail Bar  0.04


----3----
                venue  freq
0   French Restaurant  0.06
1      Clothing Store  0.04
2         Coffee Shop  0.04
3  Italian Restaurant  0.04
4         Art Gallery  0.04


----4----
               venue  freq
0  French Restaurant  0.19
1              Plaza  0.05
2              Hotel  0.04
3       Cocktail Bar  0.03
4        Coffee Shop  0.03


----5----
                venue  freq
0   French Restaurant  0.14
1  Italian Restaurant  0.05
2               Plaza  0.05
3               Hotel  0.04
4              Bakery  0.04


----6----
               venue  freq
0  French Restaurant  0.10
1              Plaza  0.07
2  

### Nice, but not so easy to process for the experienced tourist. Let's plot this information on a map and cluster the neighbourhoods.

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
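
To see what this helper does, here is a minimal sketch on a toy frame. The frame `toy`, its frequencies and the function name `top_categories` are made up for illustration and are not part of the Paris data; the logic mirrors `return_most_common_venues`: skip the first entry (the arrondissement number), sort the remaining frequencies in descending order and keep the first labels.

```python
import pandas as pd

def top_categories(row, num_top):
    # Drop the first entry (the arrondissement id), then rank the frequencies.
    frequencies = row.iloc[1:].astype(float)
    ranked = frequencies.sort_values(ascending=False)
    return list(ranked.index[:num_top])

# Hypothetical frequencies, chosen without ties so the ranking is unambiguous.
toy = pd.DataFrame({
    "Arrondissement": [1, 2],
    "Hotel": [0.09, 0.05],
    "French Restaurant": [0.08, 0.14],
    "Wine Bar": [0.02, 0.06],
})

top_first = top_categories(toy.iloc[0], 2)
top_second = top_categories(toy.iloc[1], 2)
print(top_first)
print(top_second)
```

When two categories share the same frequency, their relative order depends on the sort algorithm, so ties in the real tables should not be over-interpreted.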

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Arrondissement']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Arrondissement'] = Paris_grouped['Arrondissement']

for ind in np.arange(Paris_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Paris_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted[:20]

Unnamed: 0,Arrondissement,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Hotel,French Restaurant,Japanese Restaurant,Plaza,Italian Restaurant,Wine Bar,Coffee Shop,Art Museum,Historic Site,Garden
1,2,French Restaurant,Wine Bar,Hotel,Japanese Restaurant,Cocktail Bar,Italian Restaurant,Pedestrian Plaza,Spa,Bookstore,Restaurant
2,3,French Restaurant,Italian Restaurant,Art Gallery,Coffee Shop,Clothing Store,Cocktail Bar,Bakery,Pastry Shop,Bookstore,Restaurant
3,4,French Restaurant,Plaza,Hotel,Coffee Shop,Wine Bar,Ice Cream Shop,Pastry Shop,Garden,Cocktail Bar,Cultural Center
4,5,French Restaurant,Italian Restaurant,Plaza,Bakery,Hotel,Indie Movie Theater,Museum,Science Museum,Café,Coffee Shop
5,6,French Restaurant,Plaza,Garden,Hotel,Pastry Shop,Wine Bar,Coffee Shop,Bookstore,Creperie,Bistro
6,7,French Restaurant,Hotel,Plaza,Garden,Historic Site,Coffee Shop,Cocktail Bar,Italian Restaurant,Café,History Museum
7,8,Hotel,French Restaurant,Boutique,Garden,Clothing Store,Pastry Shop,Tailor Shop,Cosmetics Shop,Coffee Shop,Plaza
8,9,French Restaurant,Hotel,Italian Restaurant,Bakery,Wine Bar,Plaza,Coffee Shop,Cheese Shop,Cocktail Bar,Vegetarian / Vegan Restaurant
9,10,French Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Bistro,Breakfast Spot,Bakery,Asian Restaurant,Vegetarian / Vegan Restaurant


In [23]:
# set number of clusters
kclusters = 10

Paris_grouped_clustering = Paris_grouped.drop('Arrondissement', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([0, 2, 1, 2, 2, 1, 9, 7, 2, 2, 3, 4, 6, 5, 5, 5, 5, 3, 3, 8],
      dtype=int32)
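
The value `kclusters = 10` is a choice, and with only 20 arrondissements it leaves several clusters with a single member. One possible way to sanity-check such a choice is the silhouette score. The sketch below is not part of the original analysis: it runs on synthetic points (`X`, built around three made-up centers) rather than on `Paris_grouped_clustering`, and only illustrates the mechanics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: 20 points around each of three well-separated centers.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several values of k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

On the Paris data the same loop could be run on `Paris_grouped_clustering`, keeping k below the number of rows, since the silhouette score is only defined when there are fewer clusters than samples.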

In [24]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
Paris_merged = Paris
neighbourhoods_venues_sorted.rename(columns = {'Arrondissement':'c_ar'}, inplace = True) 
# join the cluster labels and top venues onto the Paris dataframe, which holds latitude/longitude for each neighbourhood
Paris_merged = Paris_merged.join(neighbourhoods_venues_sorted.set_index('c_ar'), on='c_ar')

Paris_merged.head() # check the last columns!

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry,Area (km2),Population,Population per km2,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ...",1.826,17268,9457,48.862563,2.336443,0,Hotel,French Restaurant,Japanese Restaurant,Plaza,Italian Restaurant,Wine Bar,Coffee Shop,Art Museum,Historic Site,Garden
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ...",0.992,22558,22740,48.868279,2.342803,2,French Restaurant,Wine Bar,Hotel,Japanese Restaurant,Cocktail Bar,Italian Restaurant,Pedestrian Plaza,Spa,Bookstore,Restaurant
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ...",6.786,187799,27674,48.887076,2.384821,3,French Restaurant,Bar,Café,Bistro,Italian Restaurant,Supermarket,Concert Hall,Pizza Place,Canal,Seafood Restaurant
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ...",1.171,36727,31364,48.862872,2.360001,1,French Restaurant,Italian Restaurant,Art Gallery,Coffee Shop,Clothing Store,Cocktail Bar,Bakery,Pastry Shop,Bookstore,Restaurant
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ...",4.088,58166,14228,48.856174,2.312188,9,French Restaurant,Hotel,Plaza,Garden,Historic Site,Coffee Shop,Cocktail Bar,Italian Restaurant,Café,History Museum


In [25]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

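# Map.choropleth is deprecated in recent folium releases; folium.Choropleth is its replacement
folium.Choropleth(
    geo_data=arrondissement_geo,
    data=Paris_merged,
    columns=['c_ar', 'Cluster Labels'],
    key_on='feature.properties.c_ar',
    fill_color='YlOrRd',
    fill_opacity=0.4,
    line_opacity=0.2,
    legend_name='Venue clusters',
).add_to(map_clusters)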




# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, first_venue, second_venue, third_venue, fourth_venue, fifth_venue in zip(Paris_merged['latitude'], Paris_merged['longitude'], Paris_merged['c_ar'], Paris_merged['Cluster Labels'], Paris_merged['1st Most Common Venue'], Paris_merged['2nd Most Common Venue'],Paris_merged['3rd Most Common Venue'],Paris_merged['4th Most Common Venue'],Paris_merged['5th Most Common Venue']):
    #label = folium.Popup( str(poi) + " Arrondissement " + ' Cluster '+ """  <p>"""+ str(cluster) + ' Main venue :  '  + str(first_venue), parse_html=True)
    text = str(poi) + " Arrondissement " + """  <p>"""+ ' Cluster '+  str(cluster) + """  <p>"""+   ' Main venues :  ' + """  <p>"""+  str(first_venue)+ """  <p>"""+  str(second_venue)+ """  <p>"""+  str(third_venue)+ """  <p>"""+  str(fourth_venue)+ """  <p>"""+  str(fifth_venue)
    text_processed = folium.Html(text, script=True)  # wrap the popup text as raw HTML
    iframe = folium.IFrame(html=text_processed, width=350, height=300)
    label = folium.Popup(iframe, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)



       
map_clusters




### Nice, now the experienced tourist has a map where each neighbourhood is
- presented with its top 5 venues,
- clustered with similar neighbourhoods, so that he can choose whether or not to stay in the same type of neighbourhood.

### I invite you to click on the center of each neighbourhood to see its main venues.
See below for the summary.

In [26]:
neighbourhoods_venues_sorted[:20]

Unnamed: 0,Cluster Labels,c_ar,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,1,Hotel,French Restaurant,Japanese Restaurant,Plaza,Italian Restaurant,Wine Bar,Coffee Shop,Art Museum,Historic Site,Garden
1,2,2,French Restaurant,Wine Bar,Hotel,Japanese Restaurant,Cocktail Bar,Italian Restaurant,Pedestrian Plaza,Spa,Bookstore,Restaurant
2,1,3,French Restaurant,Italian Restaurant,Art Gallery,Coffee Shop,Clothing Store,Cocktail Bar,Bakery,Pastry Shop,Bookstore,Restaurant
3,2,4,French Restaurant,Plaza,Hotel,Coffee Shop,Wine Bar,Ice Cream Shop,Pastry Shop,Garden,Cocktail Bar,Cultural Center
4,2,5,French Restaurant,Italian Restaurant,Plaza,Bakery,Hotel,Indie Movie Theater,Museum,Science Museum,Café,Coffee Shop
5,1,6,French Restaurant,Plaza,Garden,Hotel,Pastry Shop,Wine Bar,Coffee Shop,Bookstore,Creperie,Bistro
6,9,7,French Restaurant,Hotel,Plaza,Garden,Historic Site,Coffee Shop,Cocktail Bar,Italian Restaurant,Café,History Museum
7,7,8,Hotel,French Restaurant,Boutique,Garden,Clothing Store,Pastry Shop,Tailor Shop,Cosmetics Shop,Coffee Shop,Plaza
8,2,9,French Restaurant,Hotel,Italian Restaurant,Bakery,Wine Bar,Plaza,Coffee Shop,Cheese Shop,Cocktail Bar,Vegetarian / Vegan Restaurant
9,2,10,French Restaurant,Coffee Shop,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Bistro,Breakfast Spot,Bakery,Asian Restaurant,Vegetarian / Vegan Restaurant
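
To read the clusters rather than the individual rows, the table can be regrouped by label. A minimal sketch, assuming only the `Cluster Labels` and `c_ar` columns; the values below are copied from the ten rows displayed above, and the names `labelled` and `members` are illustrative.

```python
import pandas as pd

# Cluster label and arrondissement number for the ten rows shown above.
labelled = pd.DataFrame({
    "Cluster Labels": [0, 2, 1, 2, 2, 1, 9, 7, 2, 2],
    "c_ar": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Collect the arrondissements that share each cluster label.
members = labelled.groupby("Cluster Labels")["c_ar"].apply(list).to_dict()
print(members)
```

Among these ten rows, cluster 2 groups arrondissements 2, 4, 5, 9 and 10, while several labels contain a single arrondissement.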


## 8) There are quite a lot of restaurants in Paris, let's analyze them per neighbourhood

In [27]:
# For that we reuse the processed dataframe Paris_onehot, keeping only the restaurant columns and the arrondissement (neighbourhood)
Paris_onehot_restaurants = Paris_onehot
# temporarily rename the key column so that the 'Restaurant' filter below keeps it
Paris_onehot_restaurants.rename(columns = {'Arrondissement':'Arrondissement_Restaurant'}, inplace = True)
Paris_onehot_restaurants = Paris_onehot_restaurants[Paris_onehot_restaurants.filter(regex='Restaurant').columns]
Paris_onehot_restaurants.rename(columns = {'Arrondissement_Restaurant':'Arrondissement'}, inplace = True)

Paris_grouped_restaurants = Paris_onehot_restaurants.groupby('Arrondissement').mean().reset_index()
Paris_grouped_restaurants.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Arrondissement,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Basque Restaurant,Brazilian Restaurant,Breton Restaurant,Cambodian Restaurant,Cantonese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Corsican Restaurant,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Jiangxi Restaurant,Korean BBQ Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Moroccan Restaurant,New American Restaurant,Okonomiyaki Restaurant,Persian Restaurant,Peruvian Restaurant,Portuguese Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanxi Restaurant,South American Restaurant,Southern / Soul Food Restaurant,Southwestern French Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.01,0.05,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0
1,2,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.14,0.0,0.01,0.0,0.0,0.03,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0
2,3,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.01,0.04,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02
3,4,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.19,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0
4,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.14,0.0,0.02,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01
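
The warning printed above appears because a column is renamed in place on a slice of another dataframe. A possible alternative, sketched here on a made-up one-hot frame (`onehot` and its values are hypothetical, not the real `Paris_onehot`), is to select the key column and the restaurant columns explicitly and take a copy, which also leaves the source dataframe untouched.

```python
import pandas as pd

# Hypothetical one-hot frame: one row per venue, one column per category.
onehot = pd.DataFrame({
    "Arrondissement": [1, 1, 2, 2],
    "French Restaurant": [1, 0, 1, 0],
    "Hotel": [0, 1, 0, 0],
    "Italian Restaurant": [0, 0, 0, 1],
})

# Keep the key column plus every category whose name contains 'Restaurant'.
restaurant_cols = [c for c in onehot.columns if "Restaurant" in c]
restaurants = onehot[["Arrondissement"] + restaurant_cols].copy()

# Mean of the one-hot columns = frequency of each restaurant type per arrondissement.
grouped = restaurants.groupby("Arrondissement").mean().reset_index()
print(grouped)
```

Because of the explicit `.copy()`, later in-place changes to `restaurants` do not touch `onehot`, and the temporary rename of `Arrondissement` is no longer needed.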


In [28]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Arrondissement']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted_restaurants = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted_restaurants['Arrondissement'] = Paris_grouped_restaurants['Arrondissement']

for ind in np.arange(Paris_grouped_restaurants.shape[0]):
    neighbourhoods_venues_sorted_restaurants.iloc[ind, 1:] = return_most_common_venues(Paris_grouped_restaurants.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted_restaurants[:20]

Unnamed: 0,Arrondissement,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,French Restaurant,Japanese Restaurant,Italian Restaurant,Udon Restaurant,Sushi Restaurant
1,2,French Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Argentinian Restaurant
2,3,French Restaurant,Italian Restaurant,Restaurant,Vietnamese Restaurant,Seafood Restaurant
3,4,French Restaurant,Restaurant,Scandinavian Restaurant,Tapas Restaurant,Falafel Restaurant
4,5,French Restaurant,Italian Restaurant,Greek Restaurant,Japanese Restaurant,Lebanese Restaurant
5,6,French Restaurant,American Restaurant,Seafood Restaurant,Mexican Restaurant,Italian Restaurant
6,7,French Restaurant,Italian Restaurant,Korean Restaurant,Spanish Restaurant,Greek Restaurant
7,8,French Restaurant,Corsican Restaurant,Italian Restaurant,Mediterranean Restaurant,Seafood Restaurant
8,9,French Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Seafood Restaurant
9,10,French Restaurant,Japanese Restaurant,Italian Restaurant,Asian Restaurant,Indian Restaurant


In [29]:
# set number of clusters
kclusters = 10

Paris_grouped_clustering_restaurants = Paris_grouped_restaurants.drop('Arrondissement', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Paris_grouped_clustering_restaurants)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([4, 0, 9, 8, 0, 5, 2, 5, 0, 0, 9, 4, 1, 3, 7, 3, 6, 0, 0, 5],
      dtype=int32)

In [30]:
# add clustering labels
neighbourhoods_venues_sorted_restaurants.insert(0, 'Cluster Labels', kmeans.labels_)
Paris_merged_restaurants = Paris
neighbourhoods_venues_sorted_restaurants.rename(columns = {'Arrondissement':'c_ar'}, inplace = True) 
# join the cluster labels and top restaurant types onto the Paris dataframe, which holds latitude/longitude for each neighbourhood
Paris_merged_restaurants = Paris_merged_restaurants.join(neighbourhoods_venues_sorted_restaurants.set_index('c_ar'), on='c_ar')

Paris_merged_restaurants.head() # check the last columns!

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry,Area (km2),Population,Population per km2,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ...",1.826,17268,9457,48.862563,2.336443,4,French Restaurant,Japanese Restaurant,Italian Restaurant,Udon Restaurant,Sushi Restaurant
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ...",0.992,22558,22740,48.868279,2.342803,0,French Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Argentinian Restaurant
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ...",6.786,187799,27674,48.887076,2.384821,0,French Restaurant,Italian Restaurant,Restaurant,Seafood Restaurant,Asian Restaurant
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ...",1.171,36727,31364,48.862872,2.360001,9,French Restaurant,Italian Restaurant,Restaurant,Vietnamese Restaurant,Seafood Restaurant
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ...",4.088,58166,14228,48.856174,2.312188,2,French Restaurant,Italian Restaurant,Korean Restaurant,Spanish Restaurant,Greek Restaurant


In [31]:
# create map
map_clusters_restaurants = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

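# Map.choropleth is deprecated in recent folium releases; folium.Choropleth is its replacement
folium.Choropleth(
    geo_data=arrondissement_geo,
    data=Paris_merged_restaurants,
    columns=['c_ar', 'Cluster Labels'],
    key_on='feature.properties.c_ar',
    fill_color='YlOrRd',
    fill_opacity=0.4,
    line_opacity=0.2,
    legend_name='Restaurant clusters',
).add_to(map_clusters_restaurants)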




# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, first_venue, second_venue, third_venue, fourth_venue, fifth_venue in zip(Paris_merged_restaurants['latitude'], Paris_merged_restaurants['longitude'], Paris_merged_restaurants['l_ar'], Paris_merged_restaurants['Cluster Labels'], Paris_merged_restaurants['1st Most Common Venue'], Paris_merged_restaurants['2nd Most Common Venue'],Paris_merged_restaurants['3rd Most Common Venue'],Paris_merged_restaurants['4th Most Common Venue'],Paris_merged_restaurants['5th Most Common Venue']):
    #label = folium.Popup( str(poi) + " Arrondissement " + ' Cluster '+ """  <p>"""+ str(cluster) + ' Main venue :  '  + str(first_venue), parse_html=True)
    text = str(poi) + """  <p>"""+ ' Cluster '+  str(cluster) + """  <p>"""+   ' Main venues :  ' + """  <p>"""+  str(first_venue)+ """  <p>"""+  str(second_venue)+ """  <p>"""+  str(third_venue)+ """  <p>"""+  str(fourth_venue)+ """  <p>"""+  str(fifth_venue)
    text_processed = folium.Html(text, script=True)  # wrap the popup text as raw HTML
    iframe = folium.IFrame(html=text_processed, width=350, height=300)
    label = folium.Popup(iframe, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_restaurants)



       
map_clusters_restaurants



### Nice, now the experienced tourist has a map where each neighbourhood is
- presented with its top 5 restaurant types,
- clustered with similar neighbourhoods.

### I invite you to click on the center of each neighbourhood to see its main restaurant types.
See below for the summary.

In [32]:
neighbourhoods_venues_sorted_restaurants[:20]

Unnamed: 0,Cluster Labels,c_ar,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,4,1,French Restaurant,Japanese Restaurant,Italian Restaurant,Udon Restaurant,Sushi Restaurant
1,0,2,French Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Argentinian Restaurant
2,9,3,French Restaurant,Italian Restaurant,Restaurant,Vietnamese Restaurant,Seafood Restaurant
3,8,4,French Restaurant,Restaurant,Scandinavian Restaurant,Tapas Restaurant,Falafel Restaurant
4,0,5,French Restaurant,Italian Restaurant,Greek Restaurant,Japanese Restaurant,Lebanese Restaurant
5,5,6,French Restaurant,American Restaurant,Seafood Restaurant,Mexican Restaurant,Italian Restaurant
6,2,7,French Restaurant,Italian Restaurant,Korean Restaurant,Spanish Restaurant,Greek Restaurant
7,5,8,French Restaurant,Corsican Restaurant,Italian Restaurant,Mediterranean Restaurant,Seafood Restaurant
8,0,9,French Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Seafood Restaurant
9,0,10,French Restaurant,Japanese Restaurant,Italian Restaurant,Asian Restaurant,Indian Restaurant


## 9) Paris is driven by culture, let's analyze the types of ongoing events.

### Thanks to https://opendata.paris.fr we can get a constantly updated dataset of events in Paris.

In [33]:
!wget "https://opendata.paris.fr/explore/dataset/que-faire-a-paris-/download/?format=geojson&timezone=Europe/Berlin&lang=fr" -O culture.geojson
print('GeoJSON file downloaded!')
!ls ./

--2020-12-26 12:03:57--  https://opendata.paris.fr/explore/dataset/que-faire-a-paris-/download/?format=geojson&timezone=Europe/Berlin&lang=fr
Resolving opendata.paris.fr (opendata.paris.fr)... 34.248.20.69, 34.249.199.226
Connecting to opendata.paris.fr (opendata.paris.fr)|34.248.20.69|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘culture.geojson’

culture.geojson         [          <=>       ]   6.92M  1.02MB/s    in 6.8s    

2020-12-26 12:04:07 (1.02 MB/s) - ‘culture.geojson’ saved [7251275]

GeoJSON file downloaded!
arrondissements.geojson  culture.geojson


In [34]:
# let's look at it
culture_geo = r'./culture.geojson'
Paris_culture = geopandas.read_file(culture_geo)
Paris_culture.head()
# there is quite a lot of information

Unnamed: 0,blind,pmr,date_end,deaf,updated_at,access_type,occurrences,contact_name,cover_alt,id,category,title,address_street,date_start,price_detail,access_link,contact_url,address_name,contact_twitter,contact_phone,description,tags,contact_mail,access_mail,lead_text,cover_url,contact_facebook,access_phone,cover_credit,address_city,price_type,cover,url,date_description,address_zipcode,transport,programs,geometry
0,0,0,2021-02-26T17:00:00+01:00,0,2020-12-18T17:53:48+01:00,reservation,2021-02-15T10:00:00+01:00,"GOBELINS, l'école de l'image",Gobelins,105400,Animations -> Stage,Stage d'initiation à l'animation,"73, boulevard Saint Marcel",2021-02-15T10:00:00+01:00,580 euros les 30 heures,https://ateliers.gobelins.fr/Stage-d-initiatio...,https://www.gobelins.fr/,"GOBELINS, l'école de l'image",https://twitter.com/gobelins_paris?ref_src=tws...,0631508717,<p><b>PRÉSENTATION</b></p><p>Tu aimes le ciném...,Ados;Geek;Cinéma;Enfants,lbouhali@gobelins.fr,lbouhali@gobelins.fr,Un atelier d'initiation à l'animation axé sur ...,https://quefaire-api.paris.fr/images/45880,https://fr-fr.facebook.com/gobelins.ecole,631508717.0,Gobelins,Paris,payant,"{'mimetype': 'image/jpeg', 'format': 'JPEG', '...",https://quefaire.paris.fr/105400/stage-d-initi...,"Du 15 au 19 février 2021 : <br />lundi, mardi,...",75013,Ligne 7 Station les Gobelins\nLigne 5 Station ...,,POINT (2.35402 48.83368)
1,0,0,2021-02-13T18:00:00+01:00,0,2020-12-17T15:59:30+01:00,libre,2020-12-07T14:00:00+01:00,Bibliothèque du cinéma François Truffaut,"Exposition ""Vinyles, quand la musique se dessine""",113425,Expositions -> Illustration / BD,"Vinyle, quand la musique se dessine","4 rue du cinéma, forum des halles",2020-12-05T13:00:00+01:00,,,https://www.paris.fr/equipements/bibliotheque-...,Bibliothèque du cinéma François Truffaut,https://twitter.com/truffaut_cinema,01 40 26 29 33,"<p>A l'occasion de l'année de la BD, les établ...",Geek;Cinéma;Expos;Bibliothèques;En famille,bibliotheque.cinema@paris.fr,,Une exposition itinérante proposée par les tro...,https://quefaire-api.paris.fr/images/67888,https://www.facebook.com/bibliothequeducinemaf...,,© Pixabay,Paris,gratuit,"{'mimetype': 'image/jpeg', 'format': 'JPEG', '...",https://quefaire.paris.fr/113425/vinyle-quand-...,Du 5 décembre 2020 au 13 février 2021 : <br />...,75001,"1 : Tuileries (73m)\n1, 7 : Palais Royal - Mus...","BD 2020, la bande dessinée est mise à l’honneu...",POINT (2.33105 48.86405)
2,0,1,2021-01-29T21:00:00+01:00,0,2020-12-17T15:35:12+01:00,reservation,2021-01-29T19:00:00+01:00,Centre Wallonie Bruxelles,Wooshing Machine,113776,Spectacles -> Théâtre,Woosh DELUXXIII - THE MAGNIFICENT 4 XXL IN PARIS,127-129 rue Saint-Martin,2021-01-29T19:00:00+01:00,Plein tarif: 10€\nRéduit: 8€\nGroupe: 5€ (5per...,http://bit.ly/3nw8D4Q,https://bit.ly/2K4KkwJ,Centre Wallonie-Bruxelles,,0153019696,<h4>#PROJECTION #SPECTACLE</h4><p>19H00 &gt; <...,Étudiants;Insolite;Ados;Cinéma;En famille,reservation@cwb.fr,reservation@cwb.fr,Soirée consacrée au focus sur la Compagnie Woo...,https://cdn.paris.fr/qfap/2020/12/17/76441_V29...,http://bit.ly/37tcGt2,153019696.0,© Stéphane Broc,Paris,payant,"{'mimetype': 'image/jpeg', 'format': 'JPEG', '...",https://quefaire.paris.fr/113776/woosh-deluxxi...,Le vendredi 29 janvier 2021<br />de 19h à 21h<...,75004,11 : Rambuteau (227m)\n4 : Étienne Marcel (315m),Woosh DELUXXIII - Focus Cie Wooshing Machine (...,POINT (2.35048 48.86092)
3,0,1,2021-01-15T21:00:00+01:00,0,2020-12-17T11:18:26+01:00,reservation,2021-01-15T19:00:00+01:00,Centre Wallonie Bruxelles,Wooshing Machine,113762,Spectacles -> Autre spectacle,Woosh DELUXXIII - Opening night,127-129 rue Saint-Martin,2021-01-15T19:00:00+01:00,Entrée gratuite sur réservation.\nDans le resp...,http://bit.ly/3nw8D4Q,https://www.cwb.fr/,Centre Wallonie-Bruxelles,,0153019696,<p><b>19H00 &gt; OPENING NIGHT (50’) </b></p><...,Insolite;En famille,reservation@cwb.fr,reservation@cwb.fr,Ouverture de la première soirée consacrée au f...,https://cdn.paris.fr/qfap/2020/12/17/76423_NDI...,https://fr-fr.facebook.com/CentreWallonieBruxe...,153019696.0,DR,Paris,gratuit,"{'mimetype': 'image/jpeg', 'format': 'JPEG', '...",https://quefaire.paris.fr/113762/woosh-deluxxi...,Le vendredi 15 janvier 2021<br />de 19h à 21h<...,75004,11 : Rambuteau (227m)\n4 : Étienne Marcel (315m),Woosh DELUXXIII - Focus Cie Wooshing Machine (...,POINT (2.35048 48.86092)
4,0,1,2021-01-28T18:30:00+01:00,1,2020-12-16T17:18:05+01:00,reservation,2020-12-17T17:30:00+01:00,Médiathèque de la Canopée,Logo Atelier de conversation lsf en ligne,113201,Animations -> Atelier / Cours,Atelier de conversation en LSF [EN LIGNE],10 passage de la Canopée,2020-12-17T17:30:00+01:00,,http://bit.ly/AnimCanopee,https://bibliothequecanopee.wordpress.com/,Médiathèque de la Canopée la fontaine,https://twitter.com/bibCanopee,0144507656,<p>Vous êtes Sourd ou Entendant ? Français ou ...,Bibliothèques,mediatheque.canopee@paris.fr,,Un rendez-vous pour pratiquer la Langue des S...,https://cdn.paris.fr/qfap/2020/12/16/76416_QXR...,https://www.facebook.com/bibcanopee/?ref=hl,,La Canopée,Paris,gratuit,"{'mimetype': 'image/jpeg', 'format': 'JPEG', '...",https://quefaire.paris.fr/113201/atelier-de-co...,Le jeudi 17 décembre 2020<br />de 17h30 à 18h3...,75001,"Les Halles, ligne 4 / Châtelet, ligne 1, 7, 11...",Les bibliothèques en ligne (https://quefaire.p...,POINT (2.34685 48.86240)


In [35]:
# Let's work with a slimmer dataframe, in view of the merge, by selecting the relevant columns.
# We need to extract the neighbourhood number.
Paris_culture['Arrondissement'] = Paris_culture['address_zipcode'].str[-2:]
Paris_culture = Paris_culture[['category','Arrondissement']]
Paris_culture.head(20)


Unnamed: 0,category,Arrondissement
0,Animations -> Stage,13
1,Expositions -> Illustration / BD,1
2,Spectacles -> Théâtre,4
3,Spectacles -> Autre spectacle,4
4,Animations -> Atelier / Cours,1
5,Événements -> Autre événement,13
6,Concerts -> Hip-Hop,13
7,Expositions -> Art Contemporain,4
8,Expositions -> Beaux-Arts,19
9,Animations -> Atelier / Cours,18
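
Taking the last two characters of `address_zipcode` assumes that every event has a Paris postal code. Below is a minimal sketch of a stricter extraction, assuming the column can also contain suburban codes (for example 93100) or the alternative 16th arrondissement code 75116. `zip_to_arrondissement` is a hypothetical helper that is not part of the notebook, and the sample rows are toy data.

```python
import pandas as pd

def zip_to_arrondissement(zipcode):
    """Return the arrondissement number (1-20) for a Paris postal code, else None."""
    code = str(zipcode).strip()
    # Paris postal codes have 5 digits and start with 75.
    if len(code) != 5 or not code.isdigit() or not code.startswith("75"):
        return None
    # 75116 is also used for the 16th arrondissement.
    if code == "75116":
        return 16
    number = int(code[-2:])
    return number if 1 <= number <= 20 else None

# Toy data, not taken from the real dataset.
sample = pd.DataFrame({"address_zipcode": ["75013", "75001", "93100", "75116", "92120"]})
sample["Arrondissement"] = sample["address_zipcode"].map(zip_to_arrondissement)

# Keep only the events located inside Paris.
paris_only = sample.dropna(subset=["Arrondissement"]).astype({"Arrondissement": int})
print(paris_only)
```

With a plain `str[-2:]`, a code such as 92120 would be counted as the 20th arrondissement, so filtering on the `75` prefix first keeps suburban events out of the profile.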


In [36]:
# one hot encoding
Paris_culture_onehot = pd.get_dummies(Paris_culture[['category']], prefix="", prefix_sep="")
# add neighbourhood column back to dataframe
Paris_culture_onehot['Arrondissement'] = Paris_culture['Arrondissement'] 

# move neighbourhood column to the first column
col_name="Arrondissement"
first_col = Paris_culture_onehot.pop(col_name)
Paris_culture_onehot.insert(0, col_name, first_col)

Paris_culture_onehot_grouped = Paris_culture_onehot.groupby('Arrondissement').mean().reset_index()
Paris_culture_onehot_grouped.drop(Paris_culture_onehot_grouped[Paris_culture_onehot_grouped['Arrondissement'] == '00'].index, inplace=True)
Paris_culture_onehot_grouped.head(20)

Unnamed: 0,Arrondissement,Animations -> Atelier / Cours,Animations -> Autre animation,Animations -> Balade,Animations -> Conférence / Débat,Animations -> Lecture / Rencontre,Animations -> Loisirs / Jeux,Animations -> Stage,Animations -> Visite guidée,Concerts -> Autre concert,Concerts -> Chanson française,Concerts -> Classique,Concerts -> Folk,Concerts -> Hip-Hop,Concerts -> Jazz,Concerts -> Musiques du Monde,Concerts -> Pop / Variété,Concerts -> Rock,Concerts -> Électronique,Expositions -> Art Contemporain,Expositions -> Autre expo,Expositions -> Beaux-Arts,Expositions -> Design / Mode,Expositions -> Histoire / Civilisations,Expositions -> Illustration / BD,Expositions -> Photographie,Expositions -> Sciences / Techniques,Expositions -> Street-art,Spectacles -> Autre spectacle,Spectacles -> Cirque / Art de la Rue,Spectacles -> Danse,Spectacles -> Humour,Spectacles -> Jeune public,Spectacles -> Opéra / Musical,Spectacles -> Projection,Spectacles -> Théâtre,Événements -> Autre événement,Événements -> Brocante / Marché,Événements -> Festival / Cycle,Événements -> Fête / Parade,Événements -> Salon,Événements -> Soirée / Bal,Événements -> Événement sportif
1,1,0.096,0.016,0.0,0.016,0.056,0.016,0.04,0.04,0.0,0.0,0.016,0.0,0.0,0.552,0.0,0.0,0.0,0.0,0.016,0.0,0.032,0.016,0.0,0.024,0.0,0.0,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.024,0.032,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.333333,0.0
3,3,0.052632,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.026316,0.0,0.0,0.052632,0.013158,0.065789,0.052632,0.065789,0.0,0.039474,0.0,0.131579,0.0,0.0,0.013158,0.0,0.092105,0.0,0.052632,0.0,0.052632,0.078947,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0
4,4,0.028169,0.028169,0.028169,0.070423,0.014085,0.014085,0.014085,0.056338,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.140845,0.028169,0.014085,0.0,0.084507,0.028169,0.056338,0.0,0.0,0.028169,0.0,0.056338,0.014085,0.014085,0.0,0.028169,0.183099,0.028169,0.0,0.014085,0.0,0.0,0.0,0.0
5,5,0.08,0.04,0.0,0.0,0.04,0.04,0.08,0.12,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.16,0.08,0.0,0.04,0.0,0.0,0.0,0.0
6,6,0.1,0.0,0.0,0.0,0.0,0.0,0.15,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.1,0.0,0.05,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.1,0.0
7,7,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.272727,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0
8,8,0.052632,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.052632,0.210526,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.052632,0.0,0.105263,0.0,0.105263,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
9,9,0.2,0.066667,0.0,0.0,0.066667,0.066667,0.066667,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
10,10,0.242424,0.0,0.030303,0.090909,0.060606,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.30303,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.0
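
To read the table above: each value is the share of an arrondissement's events that fall in a given category. A small sketch on toy data (the category names and counts are invented for illustration) shows why the mean of one-hot columns gives that share.

```python
import pandas as pd

# Toy events, not taken from the real dataset.
events = pd.DataFrame({
    "Arrondissement": ["01", "01", "01", "01", "02", "02"],
    "category": ["Jazz", "Jazz", "Jazz", "Atelier", "Salon", "Atelier"],
})

# One column per category, 1.0 where the event belongs to it.
onehot = pd.get_dummies(events["category"], dtype=float)
onehot.insert(0, "Arrondissement", events["Arrondissement"])

# The mean of a 0/1 column is the fraction of rows where it equals 1.
shares = onehot.groupby("Arrondissement").mean()
print(shares)

# Each row sums to 1 because every event has exactly one category.
print(shares.sum(axis=1))
```

This also means that arrondissements with very few events get coarse profiles (values such as 0.333333 in the 2nd arrondissement), which is worth keeping in mind when comparing clusters.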


In [37]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Arrondissement']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third rank
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Paris_culture_onehot_grouped_sorted = pd.DataFrame(columns=columns)
Paris_culture_onehot_grouped_sorted['Arrondissement'] = Paris_culture_onehot_grouped['Arrondissement']

for ind in np.arange(Paris_culture_onehot_grouped.shape[0]):
    Paris_culture_onehot_grouped_sorted.iloc[ind, 1:] = return_most_common_venues(Paris_culture_onehot_grouped.iloc[ind, :], num_top_venues)

Paris_culture_onehot_grouped_sorted[:20]

Unnamed: 0,Arrondissement,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,1,Concerts -> Jazz,Animations -> Atelier / Cours,Animations -> Lecture / Rencontre,Animations -> Stage,Animations -> Visite guidée
2,2,Événements -> Soirée / Bal,Expositions -> Histoire / Civilisations,Événements -> Autre événement,Animations -> Loisirs / Jeux,Spectacles -> Projection
3,3,Animations -> Conférence / Débat,Expositions -> Photographie,Spectacles -> Danse,Spectacles -> Théâtre,Expositions -> Beaux-Arts
4,4,Spectacles -> Théâtre,Expositions -> Art Contemporain,Expositions -> Histoire / Civilisations,Animations -> Conférence / Débat,Expositions -> Photographie
5,5,Spectacles -> Théâtre,Animations -> Visite guidée,Expositions -> Autre expo,Animations -> Atelier / Cours,Expositions -> Histoire / Civilisations
6,6,Expositions -> Photographie,Animations -> Visite guidée,Animations -> Stage,Expositions -> Beaux-Arts,Événements -> Soirée / Bal
7,7,Expositions -> Beaux-Arts,Expositions -> Histoire / Civilisations,Concerts -> Classique,Animations -> Loisirs / Jeux,Expositions -> Art Contemporain
8,8,Concerts -> Classique,Expositions -> Beaux-Arts,Animations -> Lecture / Rencontre,Expositions -> Histoire / Civilisations,Expositions -> Photographie
9,9,Animations -> Visite guidée,Animations -> Atelier / Cours,Animations -> Loisirs / Jeux,Expositions -> Sciences / Techniques,Expositions -> Street-art
10,10,Expositions -> Photographie,Animations -> Atelier / Cours,Animations -> Conférence / Débat,Animations -> Lecture / Rencontre,Événements -> Autre événement
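
The cell above relies on `return_most_common_venues`, which is defined earlier in the notebook and not shown in this section. As an assumption about its behaviour, a helper of this kind can be sketched as follows: it skips the first entry of the row (the arrondissement label), sorts the remaining frequencies in decreasing order and returns the leading category names.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues largest frequencies in a grouped row."""
    # Skip the first entry, which holds the arrondissement label.
    frequencies = row.iloc[1:].astype(float)
    ranked = frequencies.sort_values(ascending=False)
    return ranked.index.values[:num_top_venues]

# Toy row with invented category names.
row = pd.Series({"Arrondissement": "01", "Jazz": 0.552, "Atelier": 0.096,
                 "Stage": 0.04, "Balade": 0.0})
top = return_most_common_venues(row, 2)
print(list(top))
```

When several categories share the same frequency, which is common for arrondissements with few events, their relative order in the top 5 is arbitrary with the default sort.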


In [38]:
# set number of clusters
kclusters = 10

# drop the label column so that only the category frequencies are clustered
Paris_culture_onehot_grouped_clustering = Paris_culture_onehot_grouped.drop(columns='Arrondissement')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Paris_culture_onehot_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([3, 2, 0, 0, 6, 6, 2, 2, 6, 6, 0, 6, 6, 0, 0, 4, 4, 6, 4, 6],
      dtype=int32)
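
The number of clusters is set by hand to 10 in the cell above. One possible way to compare candidate values, sketched here under the assumption that the silhouette score from scikit-learn is an acceptable criterion, is to fit k-means for several values of k and keep the best-scoring one. The feature matrix below is synthetic and only stands in for the grouped frequency table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic feature matrix (20 rows, 3 columns) built around three centers.
rng = np.random.default_rng(0)
centers = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
features = np.vstack([c + rng.normal(0, 0.03, size=(7, 3)) for c in centers])[:20]

# Fit k-means for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

# Higher silhouette means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real data the same loop would run on `Paris_culture_onehot_grouped_clustering`. The score is only a guide, and a value of k that produces readable groups on the map may still be preferable.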

In [39]:
# add clustering labels
Paris_culture_onehot_grouped_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
Paris_merged_cultured = Paris
Paris_culture_onehot_grouped_sorted.rename(columns = {'Arrondissement':'c_ar'}, inplace = True) 
# convert the key to numeric so it matches Paris['c_ar'], then join to add latitude/longitude for each arrondissement
Paris_culture_onehot_grouped_sorted['c_ar']=pd.to_numeric(Paris_culture_onehot_grouped_sorted.c_ar)

Paris_merged_cultured = Paris_merged_cultured.join(Paris_culture_onehot_grouped_sorted.set_index('c_ar'), on='c_ar')

Paris_merged_cultured.head() # check the last columns!

Unnamed: 0,n_sq_co,perimetre,l_ar,surface,n_sq_ar,l_aroff,c_arinsee,c_ar,geometry,Area (km2),Population,Population per km2,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,750001537,6054.936862,1er Ardt,1824613.0,750000001,Louvre,75101,1,"POLYGON ((2.32801 48.86992, 2.32997 48.86851, ...",1.826,17268,9457,48.862563,2.336443,3,Concerts -> Jazz,Animations -> Atelier / Cours,Animations -> Lecture / Rencontre,Animations -> Stage,Animations -> Visite guidée
1,750001537,4554.10436,2ème Ardt,991153.7,750000002,Bourse,75102,2,"POLYGON ((2.35152 48.86443, 2.35095 48.86341, ...",0.992,22558,22740,48.868279,2.342803,2,Événements -> Soirée / Bal,Expositions -> Histoire / Civilisations,Événements -> Autre événement,Animations -> Loisirs / Jeux,Spectacles -> Projection
2,750001537,11253.182479,19ème Ardt,6792651.0,750000019,Buttes-Chaumont,75119,19,"POLYGON ((2.38943 48.90122, 2.39014 48.90108, ...",6.786,187799,27674,48.887076,2.384821,4,Concerts -> Classique,Animations -> Conférence / Débat,Animations -> Atelier / Cours,Concerts -> Musiques du Monde,Concerts -> Jazz
3,750001537,4519.263648,3ème Ardt,1170883.0,750000003,Temple,75103,3,"POLYGON ((2.36383 48.86750, 2.36389 48.86747, ...",1.171,36727,31364,48.862872,2.360001,0,Animations -> Conférence / Débat,Expositions -> Photographie,Spectacles -> Danse,Spectacles -> Théâtre,Expositions -> Beaux-Arts
4,750001537,8099.424883,7ème Ardt,4090057.0,750000007,Palais-Bourbon,75107,7,"POLYGON ((2.32090 48.86306, 2.32094 48.86305, ...",4.088,58166,14228,48.856174,2.312188,2,Expositions -> Beaux-Arts,Expositions -> Histoire / Civilisations,Concerts -> Classique,Animations -> Loisirs / Jeux,Expositions -> Art Contemporain
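
The join only works if `c_ar` has the same type on both sides, which is why the key is converted with `pd.to_numeric` first. A minimal sketch on toy frames (the values are invented) of how to check that every arrondissement found a match, using the `indicator` option of `merge`:

```python
import pandas as pd

# Toy frames, not the real data: integer keys on one side, strings on the other.
paris = pd.DataFrame({"c_ar": [1, 2, 3],
                      "l_ar": ["1er Ardt", "2ème Ardt", "3ème Ardt"]})
culture = pd.DataFrame({"c_ar": ["01", "02", "93"], "Cluster Labels": [3, 2, 5]})

# Align the key dtype before merging.
culture["c_ar"] = pd.to_numeric(culture["c_ar"])

# indicator=True adds a _merge column telling where each row came from.
checked = paris.merge(culture, on="c_ar", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "c_ar"].tolist()
print(unmatched)
```

An unmatched arrondissement ends up with `NaN` in `Cluster Labels`, which turns the column into floats and would break any later use of the label as a list index.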


In [40]:
# create map
map_clusters_culture = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Map.choropleth() is deprecated in favour of folium.Choropleth
folium.Choropleth(
    geo_data=arrondissement_geo,
    data=Paris_merged_cultured,
    columns=['c_ar', 'Cluster Labels'],
    key_on='feature.properties.c_ar',
    fill_color='YlOrRd',
    fill_opacity=0.4,
    line_opacity=0.2,
    legend_name='Cluster on events',
).add_to(map_clusters_culture)




# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, first_venue, second_venue, third_venue, fourth_venue, fifth_venue in zip(Paris_merged_cultured['latitude'], Paris_merged_cultured['longitude'], Paris_merged_cultured['l_ar'], Paris_merged_cultured['Cluster Labels'], Paris_merged_cultured['1st Most Common Venue'], Paris_merged_cultured['2nd Most Common Venue'], Paris_merged_cultured['3rd Most Common Venue'],Paris_merged_cultured['4th Most Common Venue'],Paris_merged_cultured['5th Most Common Venue']):
    # popup text: arrondissement name, cluster, then the five main cultural activities
    text = '<p>'.join([str(poi), 'Cluster ' + str(cluster), 'Main activities:', str(first_venue), str(second_venue), str(third_venue), str(fourth_venue), str(fifth_venue)])
    text_processed = folium.Html(text, script=True)
    iframe = folium.IFrame(html=text_processed, width=350, height=300)
    label = folium.Popup(iframe, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_culture)



       
map_clusters_culture
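
The popup text in the cell above is assembled by string concatenation. As an illustration only, a small helper (hypothetical, named `build_popup_html` here) can build the same content and escape characters such as `>` that appear in the category names. It uses only the Python standard library, and its result could then be handed to `folium.IFrame(html=...)`.

```python
from html import escape

def build_popup_html(name, cluster, activities):
    """Build the popup body for one arrondissement as an HTML string."""
    lines = [
        f"<b>{escape(str(name))}</b>",
        f"Cluster {int(cluster)}",
        "Main activities:",
    ]
    # Escape each activity so that '->' is shown literally in the popup.
    lines += [escape(str(activity)) for activity in activities]
    return "<br>".join(lines)

html_text = build_popup_html(
    "1er Ardt", 3, ["Concerts -> Jazz", "Animations -> Atelier / Cours"]
)
print(html_text)
```

Keeping the formatting in one function makes it easier to reuse the same popup layout for the venue, restaurant and culture maps.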



### Nice, now the experienced tourist has a map where each neighbourhood is
- presented with its top 5 ongoing cultural activities,
- clustered with other similar neighbourhoods.

### I invite you to click on the center of each neighbourhood to see its main cultural activities.
See below for the summary.

In [41]:
Paris_culture_onehot_grouped_sorted[:20]

Unnamed: 0,Cluster Labels,c_ar,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,3,1,Concerts -> Jazz,Animations -> Atelier / Cours,Animations -> Lecture / Rencontre,Animations -> Stage,Animations -> Visite guidée
2,2,2,Événements -> Soirée / Bal,Expositions -> Histoire / Civilisations,Événements -> Autre événement,Animations -> Loisirs / Jeux,Spectacles -> Projection
3,0,3,Animations -> Conférence / Débat,Expositions -> Photographie,Spectacles -> Danse,Spectacles -> Théâtre,Expositions -> Beaux-Arts
4,0,4,Spectacles -> Théâtre,Expositions -> Art Contemporain,Expositions -> Histoire / Civilisations,Animations -> Conférence / Débat,Expositions -> Photographie
5,6,5,Spectacles -> Théâtre,Animations -> Visite guidée,Expositions -> Autre expo,Animations -> Atelier / Cours,Expositions -> Histoire / Civilisations
6,6,6,Expositions -> Photographie,Animations -> Visite guidée,Animations -> Stage,Expositions -> Beaux-Arts,Événements -> Soirée / Bal
7,2,7,Expositions -> Beaux-Arts,Expositions -> Histoire / Civilisations,Concerts -> Classique,Animations -> Loisirs / Jeux,Expositions -> Art Contemporain
8,2,8,Concerts -> Classique,Expositions -> Beaux-Arts,Animations -> Lecture / Rencontre,Expositions -> Histoire / Civilisations,Expositions -> Photographie
9,6,9,Animations -> Visite guidée,Animations -> Atelier / Cours,Animations -> Loisirs / Jeux,Expositions -> Sciences / Techniques,Expositions -> Street-art
10,6,10,Expositions -> Photographie,Animations -> Atelier / Cours,Animations -> Conférence / Débat,Animations -> Lecture / Rencontre,Événements -> Autre événement
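
To make the summary easier to use, the arrondissements can be listed by cluster. A short sketch, using only the cluster labels of the first ten rows shown above:

```python
import pandas as pd

# Values copied from the first ten rows of the summary table above.
summary = pd.DataFrame({
    "c_ar": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Cluster Labels": [3, 2, 0, 0, 6, 6, 2, 2, 6, 6],
})

# Collect the arrondissement numbers that share each cluster label.
members = summary.groupby("Cluster Labels")["c_ar"].apply(list).to_dict()
print(members)
```

A tourist who enjoyed the 5th arrondissement could, for instance, look at the other members of the same cluster to find neighbourhoods with a similar cultural offer.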


## 10) Discussion

Through this project, we have used several tools:
- Main Python libraries: pandas, NumPy, GeoPandas, folium, Matplotlib, json, requests
- Shell tools like wget

and used them to process several sources of data: web pages, JSON, GeoJSON.

The objective of the project was to offer an experienced tourist a novel way to look at Paris and discover its life outside of the main activities. As everybody has a different way of enjoying city life, the objective here was not to suggest anything in particular, but rather to raise curiosity through a new set of information.
Now the experienced tourist has:
- 4 maps of Paris sliced by neighbourhood, with colored tiles and/or a marker at each center describing the subject of interest: population density, top 5 main venues, top 5 restaurant types, top 5 ongoing cultural events.
- A clustering of the neighbourhoods, thanks to the unsupervised clustering algorithm, to help plan thematic holidays by staying in similar neighbourhoods, or to discover several types of local life by selecting different types of neighbourhood.

As can be guessed, thanks to the high quality of the freely accessible data for Paris, the study can be refined for specific tourist tastes.

Thanks!
