# Clustering and segmenting Neighborhood in Rennes, France

## Introduction / Business Problem

Rennes is my birth city and was quite in advance on its time by giving open datas to the people since at least ten years. Now, there is lots of data from the city of Rennes and other contractors that have business in Rennes. Like for example the company that run the bus in Rennes has some API to get datas about them. Bus stop location, real time traffic, etc..

Our objectif in this project is to cluster Rennes Neighborhood using foursquare and using datas from the city of Rennes.

We use foursquare API but we also add data about transportation of Rennes city.

Then we try to analyse cluster predicted and assigned them to groups of people. 

Overall business objectif  is to direct people/businesses, that are looking to settle in Rennes, into the good neighborhood.

## Data

In order to complete out goals, two source of datas are going to be used : 
- Foursquare API
- Rennes' datasets

All datas are quite easily available. Biggest challenge is to find out the neighborhood and assign each venue, structure, etc.. to the correct neighborhood.


### Foursquare API

Foursquare is a social location service that allows users to explore the world around them. They are at this time able to review locations in which they come and go and give a notation and a comment about this place. 

The Foursquare API allows application developers to interact with the Foursquare platform. We can retrieve venues and all the details about it (notations, comments, users, etc..), but also details about users.

In order to cluster our neighborhood, we will use Foursquare API to have data about Rennes locations and venues.
We will then be able to link top 10 venues to each neighborhood.

### Rennes' datasets

In addition to the Foursquare venues, we will add transport informations to our neighborhood : 
- Number of Bus stops
- Number of charger for electric vehicles
- Number of kilometers of paid parking
- Number of bike supports
- Number of kilometers of bike ways
- Number of culture equipment

Each of these datasets are coming from Rennes Open Data Services. We downloaded some csv files containing informations about Rennes and others cities surrounding Rennes. Each equipment, location, etc. is coming with a Latitude and a Longitude. for simplicity purpose, we will use in this study only the informations concerning Rennes. 

In Rennes Open Datas Services we were also able to find on very important dataset, which is the one that cut Rennes into neighborhood.

Thanksfully Rennes is a quite small city compared to New York and Toronto, so we do not need to cut down the numbers of neighborhood.

In order to analyse our clusters, we also have datas about : 
- Sexe, age and nationality of population in Rennes
- Seniority of habitants of Rennes

We will use these datas to see if our analysis is correct at the end of the project. 
We will indeed be able to see if habitants of a neighbourhood are concording with the analysis we made about each clusters we created.

# Notebook

## Data import

First we import all the libraries needed

In [4]:
import pandas as pd
import numpy as np
import requests
import json # library to handle JSON files
from math import *

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.17.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


After this we define this little function that will help us compute distance between two points defined with Longitude and Latitude. This will be very helpfull for us in order to define the appartenance of an object (bus stop, electric car charger, etc.) to a neighbourhood.

In [1]:
def haversine(lon1, lat1, lon2, lat2):

    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

## Neighbourhood

First we import the Neighbourhood of Rennes. We downloaded csv files from Rennes Open Data website into the Watson Studio. We use the code provided by IBM to import the data into the notebook. After This we only remove the unused column and we keep only the Latitude and Longitude (as floats), and of course the name of the neighbourhood.

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighbourhood,lat,long
0,Le Blosne,48.085013,-1.658945
1,Cleunay - Arsenal - Redon,48.095816,-1.722033
2,Saint Martin,48.126865,-1.683262
3,Villejean - Beauregard,48.129004,-1.711953
4,Bréquigny,48.086038,-1.685403


Let's see this on the map : 

In [5]:
latitude = 48.117266
longitude = -1.6777926

# create map of Rennes using latitude and longitude values
map_rennes = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(rennes_neighbourhood['lat'], rennes_neighbourhood['long'], rennes_neighbourhood['Neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_rennes)  
    
map_rennes

## Other Datas from Rennes Open Data Website :

In all the datas that we will get in this section, we will remove unused columns and keep a column with the equipment name, the longitude and the latitude as float and we will add a column full of Nan named Neighbourhood in order to class each object into a neighbourhood later. And then we will show the head.

Also for simplicity we only keep datas from the city of Rennes only, as in some data frames, their is data of suburb of Rennes that we remove.

### Electric Car Charger

In [6]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='bornes-de-recharge-dediees-aux-vehicules-electriques-sur-le-territoire-de-rennes.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

electric_car_charger = pd.read_csv(body,delimiter=';')
electric_car_charger['addr'], electric_car_charger['town'] = electric_car_charger['site_adr'].str.split(',').str
electric_car_charger = electric_car_charger.loc[electric_car_charger['town'] == ' Rennes']
electric_car_charger['lat'], electric_car_charger['long'] = electric_car_charger['Geo Point'].str.split(',').str
electric_car_charger = electric_car_charger[['lat','long']]
electric_car_charger['Equipment'] = 'electric car charger'
electric_car_charger.head()

Unnamed: 0,lat,long,Equipment
2,48.1061492349,-1.67716224362,electric car charger
3,48.1109995579,-1.6836300089,electric car charger
4,48.1305281329,-1.6383229456,electric car charger
6,48.0924137533,-1.674211241,electric car charger
7,48.1135228591,-1.68623278686,electric car charger


### Bus stops

In [7]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='equipement-accessibilite-arrets-bus.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

bus_stop = pd.read_csv(body,delimiter=';')
bus_stop = bus_stop.loc[bus_stop['Commune (nom)'] == 'Rennes']
bus_stop['lat'], bus_stop['long'] = bus_stop['Coordonnées'].str.split(',').str
bus_stop = bus_stop[['lat','long']]
bus_stop['Equipment'] = 'bus stop'
bus_stop.head()

Unnamed: 0,lat,long,Equipment
0,48.127369,-1.640433,bus stop
1,48.121446,-1.655036,bus stop
2,48.119241,-1.667693,bus stop
3,48.11605,-1.674245,bus stop
4,48.11252,-1.680352,bus stop


### Bike Stops

For this one we keep the number of support at each point in order to sum them up later.

In [8]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='supports-velos.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

bike_stops = pd.read_csv(body,delimiter=';')
bike_stops = bike_stops.loc[bike_stops['nom_commune'] == 'Rennes']
bike_stops['lat'], bike_stops['long'] = bike_stops['Geo Point'].str.split(',').str
bike_stops = bike_stops[['lat', 'long']]
bike_stops['Equipment'] = 'bike stops'
bike_stops.head()

Unnamed: 0,lat,long,Equipment
0,48.1174872812,-1.6777579592,bike stops
1,48.1098139465,-1.67522515707,bike stops
2,48.1096012826,-1.67985481523,bike stops
3,48.113188555,-1.67762495861,bike stops
4,48.0859188837,-1.64220623748,bike stops


### Cultural equipment

For this one, name of the neighbourhood is already included. We do not even need to keep Latitude and Longitude.

In [65]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='liste-des-equipements-et-organismes-culturels-de-rennes-metropole.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

cultural_equipment = pd.read_csv(body,delimiter=';')
cultural_equipment = cultural_equipment.loc[cultural_equipment['CommuneNom'] == 'Rennes']
cultural_equipment['Equipment'] = 'cultural equipement'
cultural_equipment['Neighbourhood'] = cultural_equipment['QuarNom']
cultural_equipment = cultural_equipment[['Equipment','Neighbourhood']]
cultural_equipment = cultural_equipment.replace({'Bourg-L\'Evêque - La Touche - Moulin du Comte': 'Bourg l\'Evesque - La Touche - Moulin du Comte'}, regex=True)
cultural_equipment = cultural_equipment.replace({'Sud-Gare': 'Sud gare'}, regex=True)
cultural_equipment = cultural_equipment.replace({'Francisco-Ferrer - Landry - Poterie': 'Francisco Ferrer - Landry - Poterie'}, regex=True)
cultural_equipment = cultural_equipment.replace({'Cleunay - Arsenal - Redon - La Courrouze': 'Cleunay - Arsenal - Redon'}, regex=True)
cultural_equipment = cultural_equipment.replace({'Saint-Martin': 'Saint Martin'}, regex=True)
cultural_equipment = cultural_equipment.replace({'Thabor - Saint-Hélier -  Alphonse Guérin': 'Thabor - Saint-Hélier - Alphonse Guérin'}, regex=True)

    
cultural_equipment.head()

Unnamed: 0,Equipment,Neighbourhood
0,cultural equipement,Centre
1,cultural equipement,Thabor - Saint-Hélier - Alphonse Guérin
2,cultural equipement,Maurepas - Bellangerais
3,cultural equipement,Maurepas - Bellangerais
4,cultural equipement,Jeanne d'Arc - Longs Champs - Beaulieu


### Green Roads

In [10]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='amenagements-velo-et-zones-de-circulation-apaisee-sur-rennes-metropole.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

green_roads = pd.read_csv(body,delimiter=';')

green_roads = green_roads.loc[green_roads['c_insee'] == 35238.0]
green_roads['lat'], green_roads['long'] = green_roads['Geo Point'].str.split(',').str
green_roads = green_roads[['lat','long']]
green_roads['Equipment'] = 'green roads'
green_roads.head()

Unnamed: 0,lat,long,Equipment
0,48.1260914882,-1.63356338546,green roads
1,48.1138992815,-1.67989604862,green roads
2,48.1125929026,-1.68146027273,green roads
3,48.1011841606,-1.67756709221,green roads
4,48.0909412214,-1.66796278621,green roads


### Paid Parking

In [11]:
body = client_2551b64066e74033992250268342bdad.get_object(Bucket='courseraproject-donotdelete-pr-pstz2scfjmnlsh',Key='portions-de-voies-en-stationnement-payant-sur-la-ville-de-rennes.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

paid_parking = pd.read_csv(body,delimiter=';')
paid_parking['lat'], paid_parking['long'] = paid_parking['Geo Point'].str.split(',').str
paid_parking = paid_parking[['lat','long']]
paid_parking['Equipment'] = 'parking'
paid_parking.head()

Unnamed: 0,lat,long,Equipment
0,48.1123867128,-1.66129260255,parking
1,48.1047938066,-1.69226198181,parking
2,48.109626229,-1.67280206572,parking
3,48.1041845846,-1.69095615868,parking
4,48.1060532383,-1.69053215943,parking


Lets merge all of our Data Frames (except cultural equipment that we will add once we found the neigbourhood of each equipment).
We also make sure that latitude and longitude are floats.

In [66]:
frames = [paid_parking, green_roads, bike_stops, bus_stop, electric_car_charger]
equipments = pd.concat(frames)
equipments['lat'] = equipments['lat'].astype(float)
equipments['long'] = equipments['long'].astype(float)
equipments = equipments.reset_index(drop=True)
equipments.head()

Unnamed: 0,lat,long,Equipment
0,48.112387,-1.661293,parking
1,48.104794,-1.692262,parking
2,48.109626,-1.672802,parking
3,48.104185,-1.690956,parking
4,48.106053,-1.690532,parking


## Put equipment into Neighboorhood

For simplicity reason we will assign the equipment to the closest neighbourhood.

We will create column with name of the Neighbourhood.
For the computation of the distances we use the fonction defined at the begining.
We will compute distance for each equipment from each neighbourhood.

In [67]:
df = pd.get_dummies(rennes_neighbourhood,prefix='',prefix_sep='')
df.drop(df.index, inplace=True)
df = df.drop(['lat', 'long'], axis=1)
df.applymap(str)
equipments = pd.concat([equipments, df], axis=1)
equipments.head()

Unnamed: 0,lat,long,Equipment,Bourg l'Evesque - La Touche - Moulin du Comte,Bréquigny,Centre,Cleunay - Arsenal - Redon,Francisco Ferrer - Landry - Poterie,Jeanne d'Arc - Longs Champs - Beaulieu,Le Blosne,Maurepas - Bellangerais,Saint Martin,Sud gare,Thabor - Saint-Hélier - Alphonse Guérin,Villejean - Beauregard
0,48.112387,-1.661293,parking,,,,,,,,,,,,
1,48.104794,-1.692262,parking,,,,,,,,,,,,
2,48.109626,-1.672802,parking,,,,,,,,,,,,
3,48.104185,-1.690956,parking,,,,,,,,,,,,
4,48.106053,-1.690532,parking,,,,,,,,,,,,


In [68]:
check_distance = equipments.copy()

for i in rennes_neighbourhood.index:
    check_distance['Neighbourhood']=rennes_neighbourhood.at[i,'Neighbourhood']
    check_distance['Neighbourhood_lat']=rennes_neighbourhood.at[i,'lat']
    check_distance['Neighbourhood_long']=rennes_neighbourhood.at[i,'long']
    check_distance['dist_from_Neighbourhood']= check_distance.apply(lambda row: haversine(row['Neighbourhood_long'], 
                                            row['Neighbourhood_lat'], 
                                            row['long'], 
                                            row['lat']), axis=1)
    
    print(rennes_neighbourhood.at[i,'Neighbourhood'])
    for j in check_distance.index: 
        equipments.loc[j,check_distance.loc[j,'Neighbourhood']] = check_distance.loc[j,'dist_from_Neighbourhood']

equipments.head()

Le Blosne
Cleunay - Arsenal - Redon
Saint Martin
Villejean - Beauregard
Bréquigny
Jeanne d'Arc - Longs Champs - Beaulieu
Sud gare
Bourg l'Evesque - La Touche - Moulin du Comte
Maurepas - Bellangerais
Thabor - Saint-Hélier - Alphonse Guérin
Francisco Ferrer - Landry - Poterie
Centre


Unnamed: 0,lat,long,Equipment,Bourg l'Evesque - La Touche - Moulin du Comte,Bréquigny,Centre,Cleunay - Arsenal - Redon,Francisco Ferrer - Landry - Poterie,Jeanne d'Arc - Longs Champs - Beaulieu,Le Blosne,Maurepas - Bellangerais,Saint Martin,Sud gare,Thabor - Saint-Hélier - Alphonse Guérin,Villejean - Beauregard
0,48.112387,-1.661293,parking,3.896795,3.433599,1.418882,4.872061,2.216049,1.572841,3.048843,2.762201,2.291612,1.948695,0.043649,4.189906
1,48.104794,-1.692262,parking,1.690138,2.146864,1.131022,2.425713,3.758173,3.982923,3.31066,4.60351,2.543529,1.62255,2.450537,3.063282
2,48.109626,-1.672802,parking,3.033481,2.784813,0.580635,3.965113,2.636811,2.45597,2.923974,3.350906,2.068178,1.407921,0.909966,3.617837
3,48.104185,-1.690956,parking,1.804669,2.059528,1.102737,2.488301,3.647867,3.927924,3.193212,4.594616,2.585828,1.504398,2.383094,3.169517
4,48.106053,-1.690532,parking,1.771707,2.257959,0.943917,2.601469,3.671544,3.805979,3.31312,4.41368,2.37627,1.599981,2.285037,3.006851


Now we find the minimum value and set it to one and set all the other values to 0

In [15]:
def nan_all_but_min(df):
    arr = df.values
    idx = np.argmin(arr, axis=1)
    newarr = np.full_like(arr, np.nan, dtype='float')
    newarr[np.arange(arr.shape[0]), idx] = arr[np.arange(arr.shape[0]), idx]
    df = pd.DataFrame(newarr, columns=df.columns, index=df.index)
    return df

In [69]:
df = equipments[['Equipment','lat','long']]
equipments = pd.concat([df,
                        nan_all_but_min(equipments[['Bourg l\'Evesque - La Touche - Moulin du Comte','Bréquigny','Centre','Cleunay - Arsenal - Redon',
                                                    'Francisco Ferrer - Landry - Poterie','Jeanne d\'Arc - Longs Champs - Beaulieu','Le Blosne',
                                                    'Maurepas - Bellangerais','Saint Martin','Sud gare','Thabor - Saint-Hélier - Alphonse Guérin',
                                                    'Villejean - Beauregard']]).notnull().astype('int')
                       ],axis=1)


In [70]:
equipments.head()

Unnamed: 0,Equipment,lat,long,Bourg l'Evesque - La Touche - Moulin du Comte,Bréquigny,Centre,Cleunay - Arsenal - Redon,Francisco Ferrer - Landry - Poterie,Jeanne d'Arc - Longs Champs - Beaulieu,Le Blosne,Maurepas - Bellangerais,Saint Martin,Sud gare,Thabor - Saint-Hélier - Alphonse Guérin,Villejean - Beauregard
0,parking,48.112387,-1.661293,0,0,0,0,0,0,0,0,0,0,1,0
1,parking,48.104794,-1.692262,0,0,1,0,0,0,0,0,0,0,0,0
2,parking,48.109626,-1.672802,0,0,1,0,0,0,0,0,0,0,0,0
3,parking,48.104185,-1.690956,0,0,1,0,0,0,0,0,0,0,0,0
4,parking,48.106053,-1.690532,0,0,1,0,0,0,0,0,0,0,0,0


Now we do a get_dummies reversed and delete all the useless values.

In [71]:
def get_neighbourhood(row):
    for c in equipments.columns:
        if row[c]==1:
            return c
        
equipments['Neighbourhood'] = equipments.apply(get_neighbourhood, axis=1)

In [72]:
equipments = equipments[['Equipment','Neighbourhood']]
frames = [equipments, cultural_equipment]
equipments = pd.concat(frames)
equipments.head()

Unnamed: 0,Equipment,Neighbourhood
0,parking,Thabor - Saint-Hélier - Alphonse Guérin
1,parking,Centre
2,parking,Centre
3,parking,Centre
4,parking,Centre


Finally we can onehot encode the datas in order to be able to join them later with foursquare datas.

In [73]:
# one hot encoding
equipment_onehot = pd.get_dummies(equipments[['Equipment']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
equipment_onehot['Neighbourhood'] = equipments['Neighbourhood']

# move neighborhood column to the first column
fixed_columns_equipment = [equipment_onehot.columns[-1]] + list(equipment_onehot.columns[:-1])
equipment_onehot = equipment_onehot[fixed_columns_equipment]
equipment_onehot = equipment_onehot.dropna(axis = 0)

equipment_onehot.head()

Unnamed: 0,Neighbourhood,bike stops,bus stop,cultural equipement,electric car charger,green roads,parking
0,Thabor - Saint-Hélier - Alphonse Guérin,0,0,0,0,0,1
1,Centre,0,0,0,0,0,1
2,Centre,0,0,0,0,0,1
3,Centre,0,0,0,0,0,1
4,Centre,0,0,0,0,0,1


Let's compute the mean and chose the 5 most common in order to compute clusters with datas of equipment alone

In [74]:
equipment_grouped = equipment_onehot.groupby('Neighbourhood').mean().reset_index()
equipment_grouped

Unnamed: 0,Neighbourhood,bike stops,bus stop,cultural equipement,electric car charger,green roads,parking
0,Bourg l'Evesque - La Touche - Moulin du Comte,0.141689,0.166213,0.06812,0.002725,0.621253,0.0
1,Bréquigny,0.293785,0.19209,0.090395,0.00565,0.418079,0.0
2,Centre,0.213752,0.065022,0.013453,0.005979,0.33707,0.364723
3,Cleunay - Arsenal - Redon,0.077778,0.2,0.233333,0.0,0.488889,0.0
4,Francisco Ferrer - Landry - Poterie,0.128767,0.178082,0.030137,0.0,0.663014,0.0
5,Jeanne d'Arc - Longs Champs - Beaulieu,0.09589,0.243151,0.034247,0.010274,0.616438,0.0
6,Le Blosne,0.176991,0.073746,0.073746,0.0,0.675516,0.0
7,Maurepas - Bellangerais,0.106796,0.145631,0.07767,0.0,0.669903,0.0
8,Saint Martin,0.109375,0.165625,0.021875,0.0,0.6,0.103125
9,Sud gare,0.175105,0.099156,0.023207,0.008439,0.57173,0.122363


In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [76]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_equipment_sorted = pd.DataFrame(columns=columns)
neighbourhoods_equipment_sorted['Neighbourhood'] = equipment_grouped['Neighbourhood']

for ind in np.arange(equipment_grouped.shape[0]):
    neighbourhoods_equipment_sorted.iloc[ind, 1:] = return_most_common_venues(equipment_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_equipment_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Bourg l'Evesque - La Touche - Moulin du Comte,green roads,bus stop,bike stops,cultural equipement,electric car charger
1,Bréquigny,green roads,bike stops,bus stop,cultural equipement,electric car charger
2,Centre,parking,green roads,bike stops,bus stop,cultural equipement
3,Cleunay - Arsenal - Redon,green roads,cultural equipement,bus stop,bike stops,parking
4,Francisco Ferrer - Landry - Poterie,green roads,bus stop,bike stops,cultural equipement,parking


Clustering method applied on datas to get similar neighbourdhood.

In [77]:
kclusters = 5
equipment_grouped_clustering = equipment_grouped.drop('Neighbourhood', 1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(equipment_grouped_clustering)
kmeans.labels_[0:10] 
equipment_merged = rennes_neighbourhood.copy()
equipment_merged['Cluster Labels'] = kmeans.labels_
equipment_merged = equipment_merged.join(neighbourhoods_equipment_sorted.set_index('Neighbourhood'), on='Neighbourhood')
equipment_merged.head()

Unnamed: 0,Neighbourhood,lat,long,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Le Blosne,48.085013,-1.658945,2,green roads,bike stops,cultural equipement,bus stop,parking
1,Cleunay - Arsenal - Redon,48.095816,-1.722033,4,green roads,cultural equipement,bus stop,bike stops,parking
2,Saint Martin,48.126865,-1.683262,1,green roads,bus stop,bike stops,parking,cultural equipement
3,Villejean - Beauregard,48.129004,-1.711953,3,green roads,bike stops,bus stop,cultural equipement,parking
4,Bréquigny,48.086038,-1.685403,2,green roads,bike stops,bus stop,cultural equipement,electric car charger


Finally let's display them on the map

In [78]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(equipment_merged['lat'], equipment_merged['long'], equipment_merged['Neighbourhood'], equipment_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Foursquare Datas

Now that we found datas from Rennes Open Data website and that we displayed it on the map, let's do the same with foursquare datas.

Define Foursquare Credentials and Version

In [79]:
CLIENT_ID = 'TOLWE2WLGB0D2OZEWWAE4DVFAXQ01H1MJCJ5LDZVQ3FRVQK1' # your Foursquare ID
CLIENT_SECRET = 'GVLHU0OQDCDO3PW5LBSNWAGGP0QRRTLRWZXBS2J2CNNCUAA5' # your Foursquare Secret
VERSION = '20180609'
LIMIT = 100

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TOLWE2WLGB0D2OZEWWAE4DVFAXQ01H1MJCJ5LDZVQ3FRVQK1
CLIENT_SECRET:GVLHU0OQDCDO3PW5LBSNWAGGP0QRRTLRWZXBS2J2CNNCUAA5


Lets get venues of all neighbouhoods in Rennes

In [80]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [81]:
rennes_venues = getNearbyVenues(names=rennes_neighbourhood['Neighbourhood'],
                                   latitudes=rennes_neighbourhood['lat'],
                                   longitudes=rennes_neighbourhood['long']
                                  )

Le Blosne
Cleunay - Arsenal - Redon
Saint Martin
Villejean - Beauregard
Bréquigny
Jeanne d'Arc - Longs Champs - Beaulieu
Sud gare
Bourg l'Evesque - La Touche - Moulin du Comte
Maurepas - Bellangerais
Thabor - Saint-Hélier - Alphonse Guérin
Francisco Ferrer - Landry - Poterie
Centre


In [82]:
print(rennes_venues.shape)
rennes_venues.head()

(100, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Le Blosne,48.085013,-1.658945,"Arrêt Triangle [32,33,61,161ex]",48.086173,-1.660973,Bus Stop
1,Le Blosne,48.085013,-1.658945,Métro Triangle ⓐ,48.086451,-1.66033,Metro Station
2,Le Blosne,48.085013,-1.658945,C.C Le Torigné,48.082942,-1.657291,Shopping Mall
3,Le Blosne,48.085013,-1.658945,Centre culturel Le Triangle,48.088499,-1.65926,Performing Arts Venue
4,Le Blosne,48.085013,-1.658945,Métro Le Blosne ⓐ,48.087712,-1.65429,Metro Station


In [83]:
print('There are {} uniques categories.'.format(len(rennes_venues['Venue Category'].unique())))

There are 51 uniques categories.


## Analyze Each Neighborhood

In [84]:
# one hot encoding
rennes_onehot = pd.get_dummies(rennes_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
rennes_onehot['Neighbourhood'] = rennes_venues['Neighbourhood']

# move neighborhood column to the first column
fixed_columns = [rennes_onehot.columns[-1]] + list(rennes_onehot.columns[:-1])
rennes_onehot = rennes_onehot[fixed_columns]

rennes_onehot.head()

Unnamed: 0,Neighbourhood,American Restaurant,Art Museum,Asian Restaurant,Auto Workshop,Bakery,Bar,Bistro,Brasserie,Burger Joint,...,Soccer Field,Soccer Stadium,Stadium,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Wine Shop
0,Le Blosne,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Le Blosne,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Le Blosne,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Le Blosne,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Le Blosne,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [85]:
rennes_grouped = rennes_onehot.groupby('Neighbourhood').mean().reset_index()
rennes_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Art Museum,Asian Restaurant,Auto Workshop,Bakery,Bar,Bistro,Brasserie,Burger Joint,...,Soccer Field,Soccer Stadium,Stadium,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Wine Shop
0,Bourg l'Evesque - La Touche - Moulin du Comte,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bréquigny,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
2,Centre,0.0,0.017544,0.017544,0.0,0.035088,0.087719,0.017544,0.017544,0.035088,...,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.052632,0.0,0.0
3,Cleunay - Arsenal - Redon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Francisco Ferrer - Landry - Poterie,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Jeanne d'Arc - Longs Champs - Beaulieu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Le Blosne,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Maurepas - Bellangerais,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
8,Saint Martin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
9,Sud gare,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [86]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_rennes_sorted = pd.DataFrame(columns=columns)
neighbourhoods_rennes_sorted['Neighbourhood'] = rennes_grouped['Neighbourhood']

for ind in np.arange(rennes_grouped.shape[0]):
    neighbourhoods_rennes_sorted.iloc[ind, 1:] = return_most_common_venues(rennes_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_rennes_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bourg l'Evesque - La Touche - Moulin du Comte,Grocery Store,French Restaurant,Soccer Stadium,Bus Stop,Wine Shop,College Cafeteria,Gym Pool,Garden,Food & Drink Shop,Fast Food Restaurant
1,Bréquigny,American Restaurant,Supermarket,Park,Gym Pool,Stadium,Hotel,Bakery,Creperie,Grocery Store,Garden
2,Centre,Plaza,Bar,Creperie,Coffee Shop,Historic Site,Irish Pub,Thai Restaurant,Bakery,Burger Joint,Hotel
3,Cleunay - Arsenal - Redon,French Restaurant,Soccer Field,Wine Shop,College Cafeteria,Gym Pool,Grocery Store,Garden,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant
4,Francisco Ferrer - Landry - Poterie,Auto Workshop,Garden,French Restaurant,Park,Bus Stop,Wine Shop,Concert Hall,Gym Pool,Grocery Store,Food & Drink Shop
5,Jeanne d'Arc - Longs Champs - Beaulieu,Bus Stop,College Cafeteria,Stadium,Concert Hall,Wine Shop,Gym Pool,Grocery Store,Garden,French Restaurant,Food & Drink Shop
6,Le Blosne,Metro Station,Performing Arts Venue,Shopping Mall,Bus Stop,Wine Shop,Concert Hall,Grocery Store,Garden,French Restaurant,Food & Drink Shop
7,Maurepas - Bellangerais,Gym Pool,Tennis Court,Skating Rink,Coffee Shop,Grocery Store,Garden,French Restaurant,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant
8,Saint Martin,Thrift / Vintage Store,Grocery Store,Food & Drink Shop,Bus Stop,Concert Hall,Historic Site,Gym Pool,Garden,French Restaurant,Fast Food Restaurant
9,Sud gare,Metro Station,Fast Food Restaurant,Shop & Service,Wine Shop,College Cafeteria,Grocery Store,Garden,French Restaurant,Food & Drink Shop,Falafel Restaurant


In [87]:
kclusters = 5
rennes_grouped_clustering = rennes_grouped.drop('Neighbourhood', 1)
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(rennes_grouped_clustering)
kmeans2.labels_[0:10]
rennes_merged = rennes_neighbourhood.drop(rennes_neighbourhood.index[3]).copy() #One neighbourhood retuned 0 venues from foursquare
rennes_merged['Cluster Labels'] = kmeans2.labels_
rennes_merged = rennes_merged.join(neighbourhoods_rennes_sorted.set_index('Neighbourhood'), on='Neighbourhood')
rennes_merged.head()

Unnamed: 0,Neighbourhood,lat,long,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Le Blosne,48.085013,-1.658945,4,Metro Station,Performing Arts Venue,Shopping Mall,Bus Stop,Wine Shop,Concert Hall,Grocery Store,Garden,French Restaurant,Food & Drink Shop
1,Cleunay - Arsenal - Redon,48.095816,-1.722033,1,French Restaurant,Soccer Field,Wine Shop,College Cafeteria,Gym Pool,Grocery Store,Garden,Food & Drink Shop,Fast Food Restaurant,Falafel Restaurant
2,Saint Martin,48.126865,-1.683262,1,Thrift / Vintage Store,Grocery Store,Food & Drink Shop,Bus Stop,Concert Hall,Historic Site,Gym Pool,Garden,French Restaurant,Fast Food Restaurant
4,Bréquigny,48.086038,-1.685403,3,American Restaurant,Supermarket,Park,Gym Pool,Stadium,Hotel,Bakery,Creperie,Grocery Store,Garden
5,Jeanne d'Arc - Longs Champs - Beaulieu,48.12082,-1.644283,4,Bus Stop,College Cafeteria,Stadium,Concert Hall,Wine Shop,Gym Pool,Grocery Store,Garden,French Restaurant,Food & Drink Shop


In [88]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(rennes_merged['lat'], rennes_merged['long'], rennes_merged['Neighbourhood'], rennes_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Merge of the two datasets

Now that we displayed both datasets on the map, let's merge them and see what happens.

In [89]:
result = rennes_onehot.append(equipment_onehot)
result = result.drop(result['Neighbourhood'] == 0)
result = result.fillna(0)
result.head()

Unnamed: 0,American Restaurant,Art Museum,Asian Restaurant,Auto Workshop,Bakery,Bar,Bistro,Brasserie,Burger Joint,Burrito Place,...,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Wine Shop,bike stops,bus stop,cultural equipement,electric car charger,green roads,parking
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [90]:
result_grouped = result.groupby('Neighbourhood').mean().reset_index()
result_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Art Museum,Asian Restaurant,Auto Workshop,Bakery,Bar,Bistro,Brasserie,Burger Joint,...,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Wine Shop,bike stops,bus stop,cultural equipement,electric car charger,green roads,parking
0,Bourg l'Evesque - La Touche - Moulin du Comte,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.140162,0.16442,0.067385,0.002695,0.614555,0.0
1,Bréquigny,0.005464,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.284153,0.185792,0.087432,0.005464,0.404372,0.0
2,Centre,0.0,0.000717,0.000717,0.0,0.001435,0.003587,0.000717,0.000717,0.001435,...,0.0,0.002152,0.0,0.0,0.205165,0.06241,0.012195,0.005739,0.323529,0.350072
3,Cleunay - Arsenal - Redon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.076087,0.195652,0.228261,0.0,0.478261,0.0
4,Francisco Ferrer - Landry - Poterie,0.0,0.0,0.0,0.002703,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.127027,0.175676,0.02973,0.0,0.654054,0.0
5,Jeanne d'Arc - Longs Champs - Beaulieu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.094276,0.239057,0.03367,0.010101,0.606061,0.0
6,Le Blosne,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.174927,0.072886,0.072886,0.0,0.667638,0.0
7,Maurepas - Bellangerais,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.003205,0.0,0.0,0.0,0.105769,0.144231,0.076923,0.0,0.663462,0.0
8,Saint Martin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.003086,0.0,0.108025,0.16358,0.021605,0.0,0.592593,0.101852
9,Sud gare,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.17364,0.098326,0.023013,0.008368,0.566946,0.121339


In [91]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = result_grouped['Neighbourhood']

for ind in np.arange(result_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(result_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bourg l'Evesque - La Touche - Moulin du Comte,green roads,bus stop,bike stops,cultural equipement,electric car charger,Bus Stop,Grocery Store,French Restaurant,Soccer Stadium,parking
1,Bréquigny,green roads,bike stops,bus stop,cultural equipement,American Restaurant,Supermarket,Gym Pool,Park,Stadium,Hotel
2,Centre,parking,green roads,bike stops,bus stop,cultural equipement,electric car charger,Bar,Plaza,Coffee Shop,Creperie
3,Cleunay - Arsenal - Redon,green roads,cultural equipement,bus stop,bike stops,Soccer Field,French Restaurant,Food & Drink Shop,Creperie,Cupcake Shop,Falafel Restaurant
4,Francisco Ferrer - Landry - Poterie,green roads,bus stop,bike stops,cultural equipement,Park,French Restaurant,Auto Workshop,Bus Stop,Garden,Falafel Restaurant


In [92]:
kclusters = 5
result_grouped_clustering = result_grouped.drop('Neighbourhood', 1)
kmeans3 = KMeans(n_clusters=kclusters, random_state=0).fit(result_grouped_clustering)
kmeans3.labels_[0:10] 
result_merged = rennes_neighbourhood.copy()
result_merged['Cluster Labels'] = kmeans3.labels_
result_merged = result_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
result_merged.head()

Unnamed: 0,Neighbourhood,lat,long,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Le Blosne,48.085013,-1.658945,1,green roads,bike stops,cultural equipement,bus stop,Metro Station,Shopping Mall,Performing Arts Venue,Fast Food Restaurant,Creperie,Cupcake Shop
1,Cleunay - Arsenal - Redon,48.095816,-1.722033,3,green roads,cultural equipement,bus stop,bike stops,Soccer Field,French Restaurant,Food & Drink Shop,Creperie,Cupcake Shop,Falafel Restaurant
2,Saint Martin,48.126865,-1.683262,2,green roads,bus stop,bike stops,parking,cultural equipement,Food & Drink Shop,Grocery Store,Bus Stop,Thrift / Vintage Store,Auto Workshop
3,Villejean - Beauregard,48.129004,-1.711953,4,green roads,bike stops,bus stop,cultural equipement,parking,Concert Hall,Historic Site,Gym Pool,Grocery Store,Garden
4,Bréquigny,48.086038,-1.685403,1,green roads,bike stops,bus stop,cultural equipement,American Restaurant,Supermarket,Gym Pool,Park,Stadium,Hotel


In [93]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(result_merged['lat'], result_merged['long'], result_merged['Neighbourhood'], result_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Sadly, because of the lack of datas about Rennes on Foursquare, we see that the datas from Rennes Open Data website are clearly taking the head over datas from Foursquare. 

Lets compare the differents clusters in order to try to find out more about this

In [124]:
result_merged.loc[result_merged['Cluster Labels'] == 0, result_merged.columns[[0] + list(range(4, result_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Maurepas - Bellangerais,green roads,bus stop,bike stops,cultural equipement,Tennis Court,Gym Pool,Skating Rink,Fast Food Restaurant,Creperie,Cupcake Shop
9,Thabor - Saint-Hélier - Alphonse Guérin,parking,green roads,bike stops,bus stop,cultural equipement,Bus Stop,electric car charger,Wine Shop,Park,Art Museum


In [125]:
result_merged.loc[result_merged['Cluster Labels'] == 1, result_merged.columns[[0] + list(range(4, result_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Le Blosne,green roads,bike stops,cultural equipement,bus stop,Metro Station,Shopping Mall,Performing Arts Venue,Fast Food Restaurant,Creperie,Cupcake Shop
4,Bréquigny,green roads,bike stops,bus stop,cultural equipement,American Restaurant,Supermarket,Gym Pool,Park,Stadium,Hotel
5,Jeanne d'Arc - Longs Champs - Beaulieu,green roads,bus stop,bike stops,cultural equipement,electric car charger,Bus Stop,College Cafeteria,Stadium,Concert Hall,Creperie
6,Sud gare,green roads,bike stops,parking,bus stop,cultural equipement,electric car charger,Metro Station,Fast Food Restaurant,Shop & Service,Thrift / Vintage Store
7,Bourg l'Evesque - La Touche - Moulin du Comte,green roads,bus stop,bike stops,cultural equipement,electric car charger,Bus Stop,Grocery Store,French Restaurant,Soccer Stadium,parking


In [126]:
result_merged.loc[result_merged['Cluster Labels'] == 2, result_merged.columns[[0] + list(range(4, result_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Saint Martin,green roads,bus stop,bike stops,parking,cultural equipement,Food & Drink Shop,Grocery Store,Bus Stop,Thrift / Vintage Store,Auto Workshop
10,Francisco Ferrer - Landry - Poterie,green roads,bus stop,bike stops,cultural equipement,Park,French Restaurant,Auto Workshop,Bus Stop,Garden,Falafel Restaurant


In [128]:
result_merged.loc[result_merged['Cluster Labels'] == 3, result_merged.columns[[0] + list(range(4, result_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Cleunay - Arsenal - Redon,green roads,cultural equipement,bus stop,bike stops,Soccer Field,French Restaurant,Food & Drink Shop,Creperie,Cupcake Shop,Falafel Restaurant
11,Centre,parking,green roads,bike stops,bus stop,cultural equipement,electric car charger,Bar,Plaza,Coffee Shop,Creperie


In [127]:
result_merged.loc[result_merged['Cluster Labels'] == 4, result_merged.columns[[0] + list(range(4, result_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Villejean - Beauregard,green roads,bike stops,bus stop,cultural equipement,parking,Concert Hall,Historic Site,Gym Pool,Grocery Store,Garden


As we can see our imported datas are clearly taking the lead on the Foursquare Dataset.