<h1 align = 'center'>Clustering the Neighbourhoods of London and Paris</h1>

## Introduction
---

London and Paris are popular tourist and holiday destinations for people from all around the world. Both cities are diverse and multicultural and offer a wide variety of sought-after experiences. We group the neighbourhoods of London and of Paris separately and draw insights into what they look like today.

## Business Problem
---


The aim is to help tourists choose destinations based on the experiences each neighbourhood offers and on what they are looking for. The analysis can also help people who are thinking about migrating to London or Paris, or relocating to a different neighbourhood within the same city. Our findings will help stakeholders make informed decisions and address concerns such as the kinds of cuisine, provision stores and other amenities each area has to offer.

## Data Description
---

We require geographical location data for both London and Paris. Postal codes in each city serve as a starting point. Using postal codes, we can find the neighbourhoods, boroughs, venues and their most popular venue categories.

## London

To derive our solution, we scrape our data from https://en.wikipedia.org/wiki/List_of_areas_of_London

This Wikipedia page has information about all the neighbourhoods; we limit it to London.

1. *borough* : Name of the London borough, used as the neighbourhood name in this analysis
2. *town* : Name of the post town
3. *post_code* : Postal codes for London.

This Wikipedia page lacks information about geographical locations. To solve this problem we use the ArcGIS API.

### ArcGIS API

ArcGIS Online enables you to connect people, locations, and data using interactive maps. Work with smart, data-driven styles and intuitive analysis tools that deliver location intelligence. Share your insights with the world or specific groups. 

More specifically, we use ArcGIS to get the geolocations of the neighbourhoods of London. The following columns are added to our initial dataset to complete it.

4. *latitude* : Latitude for Neighbourhood
5. *longitude* : Longitude for Neighbourhood

## Paris

To derive our solution, we leverage JSON data available at https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e

The JSON file has data about all the neighbourhoods in France; we limit it to Paris.

1. *postal_code* : Postal codes for France
2. *nom_comm* : Name of Neighbourhoods in France
3. *nom_dept* : Name of the boroughs, equivalent to towns in France
4. *geo_point_2d* : Tuple containing the latitude and longitude of the Neighbourhoods.
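
As a rough illustration of how these fields might be used, the sketch below filters a few records down to Paris and splits `geo_point_2d` into separate latitude and longitude values. The sample records and their coordinates are made up for illustration, and the assumptions that each record is a flat dictionary and that the department is stored as `PARIS` are ours rather than something the field list above confirms.

```python
# Hypothetical records shaped like the fields listed above; values are illustrative only.
records = [
    {"postal_code": "75001", "nom_comm": "PARIS-1ER-ARRONDISSEMENT",
     "nom_dept": "PARIS", "geo_point_2d": [48.8626, 2.3363]},
    {"postal_code": "69001", "nom_comm": "LYON-1ER-ARRONDISSEMENT",
     "nom_dept": "RHONE", "geo_point_2d": [45.7699, 4.8292]},
]

def to_paris_rows(records):
    """Keep Paris records and unpack geo_point_2d into latitude and longitude."""
    rows = []
    for rec in records:
        if rec["nom_dept"] != "PARIS":
            continue
        # geo_point_2d is described as (latitude, longitude), in that order.
        latitude, longitude = rec["geo_point_2d"]
        rows.append({
            "postal_code": rec["postal_code"],
            "neighbourhood": rec["nom_comm"],
            "latitude": float(latitude),
            "longitude": float(longitude),
        })
    return rows

paris_rows = to_paris_rows(records)
print(paris_rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame` if a dataframe is preferred.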

## Foursquare API Data

We will need data about the venues in the neighbourhoods of each borough. To obtain it we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare location platform is therefore our sole source of venue data, since all the required venue information can be obtained through its API.

After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside each and every neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.

The data retrieved from Foursquare contains information about venues within a specified distance of the latitude and longitude of each postcode. The information obtained per venue is as follows:

1. *Neighbourhood* : Name of the Neighbourhood
2. *Neighbourhood Latitude* : Latitude of the Neighbourhood
3. *Neighbourhood Longitude* : Longitude of the Neighbourhood
4. *Venue* : Name of the Venue
5. *Venue Latitude* : Latitude of Venue
6. *Venue Longitude* : Longitude of Venue
7. *Venue Category* : Category of Venue


Based on all the information collected for both London and Paris, we have sufficient data to build our model. We cluster the neighbourhoods together based on similar venue categories. We then present our observations and findings. Using this data, our stakeholders can make the necessary decisions.

# Methodology

We will be creating our model with the help of Python so we start off by importing all the required packages.

In [1]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans

## Exploring London

### Neighbourhoods of London
We begin by collecting and refining the data needed for our business solution.

### Data collection

Scraping the data we need from the wiki page

In [2]:
url_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_london_url = requests.get(url_london)
wiki_london_url.status_code

200

A status code of 200 denotes a successful request.

In [3]:
data_london = pd.read_html(wiki_london_url.content)[1]   # Parsing returns all the tables on the page; we need the second one
data_london

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
526,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
527,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
528,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
529,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


### Feature Selection

In [4]:
data_london.columns

Index(['Location', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')

We will work with only some of the columns, so we drop the unnecessary ones.

In [5]:
df = data_london.drop([data_london.columns[0],data_london.columns[4],data_london.columns[5]],axis=1)
df.head()

Unnamed: 0,London borough,Post town,Postcode district
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


In [6]:
df.columns = ['Borough','Town','Post_code']
df.head()

Unnamed: 0,Borough,Town,Post_code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


### Data Preprocessing

The Borough column contains bracketed footnote markers at the end of some values, so we strip them.

In [7]:
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df.head()

Unnamed: 0,Borough,Town,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
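
One detail of the chained `rstrip` above is worth checking: when there is a space before the footnote marker, as in `Bexley, Greenwich [7]`, the space survives the strip. The sketch below compares the chain with a regular-expression alternative; `clean_borough` and `chained_rstrip` are hypothetical helper names introduced here, not part of the notebook, and the outputs shown elsewhere in this notebook come from the original chain.

```python
import re

def clean_borough(value):
    """Remove a trailing footnote marker such as '[7]' and any surrounding whitespace."""
    return re.sub(r"\s*\[\d+\]\s*$", "", value).strip()

def chained_rstrip(value):
    """The strip chain from the cell above, kept here for comparison."""
    return value.rstrip(']').rstrip('0123456789').rstrip('[')

sample = "Bexley, Greenwich [7]"
print(repr(chained_rstrip(sample)))  # the space before '[' is left behind
print(repr(clean_borough(sample)))
```

If the trailing space matters for later grouping or merging on the borough name, something like `df['Borough'].map(clean_borough)` could be used in place of the chain.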


In [8]:
# Checking the shape of the dataframe
df.shape

(531, 3)

## Feature Engineering

Keeping only the rows whose Town column contains LONDON

In [9]:
df = df[df['Town'].str.contains('LONDON')]
df.head()

Unnamed: 0,Borough,Town,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,City,LONDON,EC3
7,Westminster,LONDON,WC2
9,Bromley,LONDON,SE20


In [10]:
#Checking the size of the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 308 entries, 0 to 528
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Borough    308 non-null    object
 1   Town       308 non-null    object
 2   Post_code  308 non-null    object
dtypes: object(3)
memory usage: 9.6+ KB


# Geolocations of the London Neighbourhoods

## ArcGIS API

Retrieving the coordinates of each neighbourhood using the ArcGIS API.

In [11]:
# !pip install arcgis

In [12]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

Before writing a function that returns the latitude and longitude of every London neighbourhood, we look at what a single geocoding result contains.

In [13]:
geocode('SE2, London, England, GBR',as_featureset=True).features

[{"geometry": {"x": 0.12127000000003818, "y": 51.492450000000076, "spatialReference": {"wkid": 4326, "latestWkid": 4326}}, "attributes": {"Loc_name": "World", "Status": "M", "Score": 100, "Match_addr": "SE2, London, England", "LongLabel": "SE2, London, England, GBR", "ShortLabel": "SE2", "Addr_type": "Locality", "Type": "City", "PlaceName": "SE2", "Place_addr": "SE2, London, England", "Phone": "", "URL": "", "Rank": 15, "AddBldg": "", "AddNum": "", "AddNumFrom": "", "AddNumTo": "", "AddRange": "", "Side": "", "StPreDir": "", "StPreType": "", "StName": "", "StType": "", "StDir": "", "BldgType": "", "BldgName": "", "LevelType": "", "LevelName": "", "UnitType": "", "UnitName": "", "SubAddr": "", "StAddr": "", "Block": "", "Sector": "", "Nbrhd": "SE2", "District": "", "City": "London", "MetroArea": "", "Subregion": "London", "Region": "England", "RegionAbbr": "ENG", "Territory": "", "Zone": "", "Postal": "", "PostalExt": "", "Country": "GBR", "LangCode": "ENG", "Distance": 0, "X": 0.121270

In [14]:
def co_ordinates(postal):
  '''
  Return the co-ordinates (latitude and longitude) of a place given its postal code.
  '''
  # In the output above the first result has a match score of 100,
  # so we take the co-ordinates from index 0.
  g = geocode(address='{}, London, England, GBR'.format(postal))[0]
  lati = g['location']['y']
  longi = g['location']['x']
  return [str(lati), str(longi)]

In [15]:
print(co_ordinates('EC3'))  # checking if the function works

['51.51200000000006', '-0.08057999999994081']
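
The function above assumes that every postal code returns at least one match. A minimal defensive sketch is shown below, under the assumption that results are a list of dictionaries with `['location']['x']` and `['location']['y']`, as used in the function above. `safe_coordinates` and `fake_geocoder` are hypothetical names; the fake geocoder only stands in for the real service so the example runs offline.

```python
def safe_coordinates(postal, geocoder):
    """Return (latitude, longitude) as floats, or (None, None) when nothing matches.

    `geocoder` is any callable that takes an address string and returns a list
    of results, each with result['location']['x'] (longitude) and
    result['location']['y'] (latitude).
    """
    results = geocoder('{}, London, England, GBR'.format(postal))
    if not results:
        return (None, None)
    location = results[0]['location']
    return (float(location['y']), float(location['x']))

# Stand-in geocoder for illustration only; it is not part of the arcgis package.
def fake_geocoder(address):
    known = {'EC3, London, England, GBR': [{'location': {'x': -0.08058, 'y': 51.512}}]}
    return known.get(address, [])

print(safe_coordinates('EC3', fake_geocoder))
print(safe_coordinates('ZZ9', fake_geocoder))
```

With the real service, the callable could be `lambda address: geocode(address=address)`. Returning floats directly would also make the later string-to-float conversion unnecessary.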


Applying the above function to every postal code

In [16]:
uk_postal = df['Post_code']
all_latlong_uk = uk_postal.apply(lambda x: co_ordinates(x))
all_latlong_uk

0       [51.492450000000076, 0.12127000000003818]
1        [51.51324000000005, -0.2674599999999714]
6       [51.51200000000006, -0.08057999999994081]
7       [51.51651000000004, -0.11967999999995982]
9       [51.41009000000008, -0.05682999999993399]
                          ...                    
521    [51.589770000000044, 0.030520000000024083]
522      [51.50642000000005, -0.1272099999999341]
525     [51.615920000000074, -0.1767399999999384]
526      [51.48207000000008, 0.07143000000002075]
528      [51.50645000000003, -0.2369099999999662]
Name: Post_code, Length: 308, dtype: object

Separating latitude and longitude into separate variables

In [17]:
latitude = [i[0] for i in all_latlong_uk]
longitude = [j[1] for j in all_latlong_uk]

Adding them into our dataframe

In [18]:
df['Latitude'] = latitude
df['Longitude'] = longitude
df.head()

Unnamed: 0,Borough,Town,Post_code,Latitude,Longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245000000008,0.1212700000000381
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324000000005,-0.2674599999999714
6,City,LONDON,EC3,51.51200000000006,-0.0805799999999408
7,Westminster,LONDON,WC2,51.51651000000004,-0.1196799999999598
9,Bromley,LONDON,SE20,51.41009000000008,-0.0568299999999339


Converting them into floating point numbers

In [19]:
df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 308 entries, 0 to 528
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Borough    308 non-null    object 
 1   Town       308 non-null    object 
 2   Post_code  308 non-null    object 
 3   Latitude   308 non-null    float64
 4   Longitude  308 non-null    float64
dtypes: float64(2), object(3)
memory usage: 14.4+ KB
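
The separate, assign and convert steps above can also be folded into a single expansion. The sketch below shows one way to do it on a toy stand-in for `all_latlong_uk`; the `pairs` and `coords` names are introduced here for illustration only.

```python
import pandas as pd

# Toy stand-in for `all_latlong_uk`: a Series of [latitude, longitude] string pairs.
pairs = pd.Series(
    [['51.49245', '0.12127'], ['51.51324', '-0.26746']],
    index=[0, 6],
)

# Expand each pair into two columns, keep the original index, and cast to float.
coords = pd.DataFrame(
    pairs.tolist(), index=pairs.index, columns=['Latitude', 'Longitude']
).astype(float)
print(coords)
```

Because `coords` keeps the original index, it lines up row by row with the filtered dataframe when the two are joined.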


In [20]:
df.head()

Unnamed: 0,Borough,Town,Post_code,Latitude,Longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
6,City,LONDON,EC3,51.512,-0.08058
7,Westminster,LONDON,WC2,51.51651,-0.11968
9,Bromley,LONDON,SE20,51.41009,-0.05683


# Visualizing a map for London

We will be using the folium package to visualize the map of London and its neighbourhoods.

First, retrieving the co-ordinates of London

In [21]:
geocode(address='London, England, GBR', as_featureset=True).features

[{"geometry": {"x": -0.1272099999999341, "y": 51.50642000000005, "spatialReference": {"wkid": 4326, "latestWkid": 4326}}, "attributes": {"Loc_name": "World", "Status": "T", "Score": 100, "Match_addr": "London, England", "LongLabel": "London, England, GBR", "ShortLabel": "London", "Addr_type": "Locality", "Type": "City", "PlaceName": "London", "Place_addr": "London, England", "Phone": "", "URL": "", "Rank": 1.75, "AddBldg": "", "AddNum": "", "AddNumFrom": "", "AddNumTo": "", "AddRange": "", "Side": "", "StPreDir": "", "StPreType": "", "StName": "", "StType": "", "StDir": "", "BldgType": "", "BldgName": "", "LevelType": "", "LevelName": "", "UnitType": "", "UnitName": "", "SubAddr": "", "StAddr": "", "Block": "", "Sector": "", "Nbrhd": "", "District": "", "City": "London", "MetroArea": "", "Subregion": "London", "Region": "England", "RegionAbbr": "ENG", "Territory": "", "Zone": "", "Postal": "", "PostalExt": "", "Country": "GBR", "LangCode": "ENG", "Distance": 0, "X": -0.1272099999999341

In [22]:
#Create a map for London
map_london = folium.Map(location=[51.50642000000005,-0.1272099999999341],zoom_start=12)

#adding markers into the map
for lat,lon,bor,town in zip(df['Latitude'],df['Longitude'],df['Borough'],df['Town']):
  label = "{}, {}".format(town,bor)
  label = folium.Popup(label,parse_html=True)
  folium.CircleMarker(
      [lat,lon],
      radius = 5,
      popup = label,
      color = 'red',
      fill = True).add_to(map_london)

map_london

## Venues in London

To find venues and venue categories in London, we will use the Foursquare API.

In [23]:
CLIENT_ID = 'your_client_id'
CLIENT_SECRET = 'your_client_secret'
VERSION = '20180605'

Defining a function to get the near-by venues in the neighbourhood. This will help us get venue categories too.

In [28]:
LIMIT = 100
def getNearByVenues(names,latitude,longitude,radius=500):
  venues = []
  for name, lat, lon in zip(names,latitude,longitude):
    print(name)

    # build the Foursquare explore request for venues around this point
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,CLIENT_SECRET,VERSION,lat,lon,radius,LIMIT)
    # make a GET request and keep only the list of venue items
    result = requests.get(url).json()["response"]['groups'][0]["items"]

    #return only relevant information for each nearby venue
    venues.append([(name,lat,lon,v['venue']['name'],v['venue']['categories'][0]['name']) for v in result])

  near_venues = pd.DataFrame([item for venue_list in venues for item in venue_list])
  near_venues.columns = ['Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude','Venue','Venue Category']

  return (near_venues)
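
The function above keeps five fields per venue and indexes `categories[0]` directly, which would raise an `IndexError` for a venue without a category. The sketch below separates the parsing step so it can be checked offline. `parse_venues` and `sample_payload` are hypothetical; the payload is hand-built to mirror the structure the function above reads, the venue coordinates in it are made up, and the assumption that coordinates sit under `venue['location']['lat']` and `venue['location']['lng']` is ours and should be checked against a real response.

```python
def parse_venues(name, lat, lon, payload):
    """Turn one explore-style response into rows of venue information."""
    groups = payload.get('response', {}).get('groups') or [{}]
    items = groups[0].get('items', [])
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # A venue may carry no category; fall back to None instead of failing.
        category = categories[0]['name'] if categories else None
        location = venue.get('location', {})
        rows.append((name, lat, lon, venue['name'],
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-built payload for illustration; coordinates of the venues are invented.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Lesnes Abbey', 'location': {'lat': 51.49, 'lng': 0.12},
               'categories': [{'name': 'Historic Site'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 51.50, 'lng': 0.13},
               'categories': []}},
]}]}}

rows = parse_venues('Bexley, Greenwich', 51.49245, 0.12127, sample_payload)
print(rows)
```

Each row here has seven entries, matching the seven fields listed in the data description, including the venue latitude and longitude.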

In [29]:
venues_london = getNearByVenues(df['Borough'],df['Latitude'],df['Longitude'])

Bexley, Greenwich 
Ealing, Hammersmith and Fulham
City
Westminster
Bromley
Islington
Islington
Barnet
Enfield
Wandsworth
Southwark
City
Richmond upon Thames
Barnet
Islington
Wandsworth
Westminster
Bromley
Newham
Ealing
Westminster
Lewisham
Camden
Southwark
Tower Hamlets
Bexley
City
Lewisham
Greenwich
Tower Hamlets
Camden
Haringey
Tower Hamlets
Haringey
Barnet
Brent
Lambeth
Lewisham
Tower Hamlets
Kensington and Chelsea, Hammersmith and Fulham
Brent
Barnet
Barnet
Southwark
Tower Hamlets
Camden
Tower Hamlets
Waltham Forest
Newham
Islington
Richmond upon Thames
Lewisham
Camden
Westminster
Greenwich
Kensington and Chelsea
Barnet
Westminster
Lewisham
Waltham Forest
Hounslow, Ealing, Hammersmith and Fulham
Brent
Barnet
Lambeth, Wandsworth
Islington
Barnet
Merton
Barnet
Westminster
Barnet, Brent, Camden
Lewisham
Bexley
Haringey
Bromley
Tower Hamlets
Newham
Hackney
Islington
Southwark
Lewisham
Brent
Southwark
Ealing
Kensington and Chelsea
Wandsworth
Southwark
Barnet
Newham
Richmond upon Thames


In [30]:
venues_london.shape

(10419, 5)

In [84]:
venues_london.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Bexley, Greenwich",51.49245,0.12127,Lesnes Abbey,Historic Site
1,"Bexley, Greenwich",51.49245,0.12127,Sainsbury's,Supermarket
2,"Bexley, Greenwich",51.49245,0.12127,Lidl,Supermarket
3,"Bexley, Greenwich",51.49245,0.12127,Abbey Wood Railway Station (ABW),Train Station
4,"Bexley, Greenwich",51.49245,0.12127,Bean @ Work,Coffee Shop


We have collected 10419 venue records, with their categories, in London.

In [31]:
venues_london['Venue Category'].value_counts()

Pub                     784
Coffee Shop             657
Café                    558
Hotel                   338
Italian Restaurant      311
                       ... 
Caucasian Restaurant      1
Skate Park                1
Fishing Store             1
Kosher Restaurant         1
Poke Place                1
Name: Venue Category, Length: 303, dtype: int64

In [32]:
venues_london.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,Westminster,51.51656,-0.11968,James Smith & Sons
Adult Boutique,Islington,51.52969,-0.08697,Sh! Women's Erotic Emporium
African Restaurant,Westminster,51.52587,-0.08808,Red Sea Restaurant
American Restaurant,Waltham Forest,51.61780,0.02795,Spielburger
Antique Shop,Westminster,51.51651,-0.11968,The London Silver Vaults
...,...,...,...,...
Wings Joint,Hammersmith and Fulham,51.54187,-0.19795,Wingmans
Women's Store,Westminster,51.55457,-0.11478,Vivien of Holloway
Xinjiang Restaurant,Southwark,51.47480,-0.09313,Silk Road
Yoga Studio,Westminster,51.55457,-0.03558,yogahaven


## One Hot Encoding

K-means works on numeric features, so we one-hot encode the venue categories before clustering.

In [33]:
London_venue_cat = pd.get_dummies(venues_london[['Venue Category']], prefix="", prefix_sep="")

Adding Neighbourhood to our encoded dataframe

In [34]:
London_venue_cat['Neighbourhood'] = venues_london['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [London_venue_cat.columns[-1]] + list(London_venue_cat.columns[:-1])
London_venue_cat = London_venue_cat[fixed_columns]

London_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Shop,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,...,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stationery Store,Steakhouse,Street Food Gathering,Student Center,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Turkish Restaurant,University,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Grouping our results by Neighbourhood and calculating the mean value of each venue category

In [35]:
london_grouped = London_venue_cat.groupby('Neighbourhood').mean().reset_index()
london_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Shop,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,...,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stationery Store,Steakhouse,Street Food Gathering,Student Center,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Turkish Restaurant,University,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barnet,0.0,0.0,0.0,0.001812,0.0,0.0,0.0,0.007246,0.0,0.0,0.0,0.0,0.019928,0.0,0.0,0.0,0.007246,0.0,0.001812,0.012681,0.018116,0.005435,0.0,0.0,0.005435,0.0,0.0,0.0,0.0,0.0,0.009058,0.0,0.0,0.0,0.0,0.0,0.005435,0.0,0.0,...,0.005435,0.0,0.0,0.0,0.009058,0.001812,0.0,0.005435,0.03442,0.019928,0.0,0.0,0.001812,0.005435,0.0,0.01087,0.005435,0.0,0.0,0.0,0.001812,0.0,0.0,0.016304,0.0,0.027174,0.0,0.0,0.0,0.0,0.007246,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Barnet, Brent, Camden",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
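
To make the encoding and grouping steps concrete, here is a small sketch on made-up data. The neighbourhood names `A` and `B` and the `toy_` variables are illustrative only. Taking the mean of a 0/1 column within a neighbourhood gives the share of that neighbourhood's venues that fall in each category.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Coffee Shop', 'Park', 'Pub'],
})

# One column per category; dtype=float keeps the later mean straightforward.
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
toy_onehot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# The mean of a 0/1 column within a group is the share of venues in that category.
toy_grouped = toy_onehot.groupby('Neighbourhood').mean().reset_index()
print(toy_grouped)
```

Each row of `toy_grouped` sums to 1 across the category columns, which is why neighbourhoods with very different venue counts remain comparable.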


Let's make a function to get the top most common venue categories.

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
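
A quick way to see what this function returns is to run it on a single made-up row. The sketch below repeats the function so that it is self-contained and adds an explicit cast to float, which is our addition; `toy_row` and its values are invented.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name), sort the remaining
    # frequencies in descending order, and keep the top category labels.
    row_categories = row.iloc[1:].astype(float)
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

toy_row = pd.Series({'Neighbourhood': 'Toyville', 'Pub': 0.5, 'Park': 0.15,
                     'Coffee Shop': 0.3, 'Hotel': 0.05})
top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

When a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled by categories with a frequency of 0, so the later ranks in such rows should be read with care.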

There are too many venue categories to inspect at once, so we take the top 10 for each neighbourhood to describe it.

Creating the column labels for the top venue categories.

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # past 'st', 'nd', 'rd', every label up to the 10th ends in 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))
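
The loop above is sufficient for ten labels, since every rank from 4 to 10 ends in "th". If `num_top_venues` were ever raised past 20, ranks such as 21 would be labelled "21th". A more general helper could look like the sketch below; `ordinal` is a hypothetical name and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

labels = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(labels)
```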

## Top venue categories

Getting the top venue categories in London

In [38]:
# create a new dataframe for London
neighborhoods_venues_sorted_london = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_london['Neighbourhood'] = london_grouped['Neighbourhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted_london.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_london.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barnet,Coffee Shop,Café,Grocery Store,Pub,Bus Stop,Pharmacy,Italian Restaurant,Supermarket,Chinese Restaurant,Turkish Restaurant
1,"Barnet, Brent, Camden",Gym / Fitness Center,Bus Stop,Clothing Store,Supermarket,Hardware Store,Zoo Exhibit,Filipino Restaurant,Exhibit,Falafel Restaurant,Farmers Market
2,Bexley,Supermarket,Historic Site,Platform,Convenience Store,Train Station,Coffee Shop,Golf Course,Construction & Landscaping,Bus Stop,Park
3,"Bexley, Greenwich",Bus Stop,Golf Course,Sports Club,Construction & Landscaping,Home Service,Historic Site,Park,Massage Studio,Fast Food Restaurant,Ethiopian Restaurant
4,"Bexley, Greenwich",Supermarket,Platform,Convenience Store,Historic Site,Train Station,Coffee Shop,Zoo Exhibit,Event Space,Exhibit,Falafel Restaurant


# Model Building

## K Means

We choose 5 as the number of clusters for our dataset.

In [39]:
london = london_grouped.drop('Neighbourhood',axis=1)
model = KMeans(n_clusters=5, random_state=45)
model.fit(london)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=45, tol=0.0001, verbose=0)

## Labelling the clustered data

In [40]:
model.labels_

array([0, 4, 2, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
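
The labels printed above are dominated by label 0, so it can be useful to count how many neighbourhoods fall into each cluster before interpreting them. The sketch below fits K-means on a tiny made-up frequency matrix and counts cluster sizes; the `toy` data and the choice of two clusters are for illustration only and say nothing about the London result.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency matrix: rows are neighbourhoods, columns are venue-category shares.
toy = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

toy_model = KMeans(n_clusters=2, n_init=10, random_state=45).fit(toy)
toy_labels = toy_model.labels_ + 1   # shift to 1-based labels, as done for the dataframe

# Count how many rows were assigned to each cluster label.
values, counts = np.unique(toy_labels, return_counts=True)
sizes = {int(v): int(c) for v, c in zip(values, counts)}
print(toy_labels, sizes)
```

The same `np.unique(..., return_counts=True)` call can be applied to `model.labels_` to see how unevenly the London neighbourhoods are distributed.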

Adding these labels to our top common venues dataframe

In [41]:
neighborhoods_venues_sorted_london.insert(0,'Cluster label',model.labels_+1)

Joining the initial dataframe with our combined data on neighbourhood adds the latitude and longitude of each neighbourhood and prepares it for plotting.

In [42]:
neighborhoods_venues_sorted_london.rename(columns={'Neighbourhood':'Borough'},inplace=True)

In [43]:
df.head()

Unnamed: 0,Borough,Town,Post_code,Latitude,Longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
6,City,LONDON,EC3,51.512,-0.08058
7,Westminster,LONDON,WC2,51.51651,-0.11968
9,Bromley,LONDON,SE20,51.41009,-0.05683


In [44]:
df_london = pd.merge(df,neighborhoods_venues_sorted_london,on='Borough')
df_london.head()

Unnamed: 0,Borough,Town,Post_code,Latitude,Longitude,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127,3,Supermarket,Platform,Convenience Store,Historic Site,Train Station,Coffee Shop,Zoo Exhibit,Event Space,Exhibit,Falafel Restaurant
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746,1,Grocery Store,Indian Restaurant,Train Station,Park,Bed & Breakfast,Breakfast Spot,Home Service,Filipino Restaurant,Ethiopian Restaurant,Event Space
2,City,LONDON,EC3,51.512,-0.08058,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
3,City,LONDON,"EC1, EC2",51.51841,-0.08815,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
4,City,LONDON,EC4,51.51389,-0.10434,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
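
The three City rows above carry identical venue columns because the venue summary has one row per borough name, and the merge copies that row onto every postcode row of the borough. The toy sketch below shows the same many-to-one behaviour; the two small frames and the venue value for Bromley are made up for illustration.

```python
import pandas as pd

postcodes = pd.DataFrame({
    'Borough': ['City', 'City', 'Bromley'],
    'Post_code': ['EC3', 'EC4', 'SE20'],
})
summary = pd.DataFrame({
    'Borough': ['City', 'Bromley'],
    'Cluster label': [1, 1],
    '1st Most Common Venue': ['Coffee Shop', 'Pub'],
})

# validate='many_to_one' makes pandas raise if a borough appears twice in `summary`.
merged = pd.merge(postcodes, summary, on='Borough', validate='many_to_one')
print(merged)
```

Passing `validate` is optional; it is used here only as a guard against accidental row duplication during the merge.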


Checking for null values

In [45]:
df_london.isnull().sum()

Borough                   0
Town                      0
Post_code                 0
Latitude                  0
Longitude                 0
Cluster label             0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

## Visualizing the clustered neighborhood

In [46]:
geocode('London,England,GBR',as_featureset=True).features

[{"geometry": {"x": -0.1272099999999341, "y": 51.50642000000005, "spatialReference": {"wkid": 4326, "latestWkid": 4326}}, "attributes": {"Loc_name": "World", "Status": "T", "Score": 100, "Match_addr": "London, England", "LongLabel": "London, England, GBR", "ShortLabel": "London", "Addr_type": "Locality", "Type": "City", "PlaceName": "London", "Place_addr": "London, England", "Phone": "", "URL": "", "Rank": 1.75, "AddBldg": "", "AddNum": "", "AddNumFrom": "", "AddNumTo": "", "AddRange": "", "Side": "", "StPreDir": "", "StPreType": "", "StName": "", "StType": "", "StDir": "", "BldgType": "", "BldgName": "", "LevelType": "", "LevelName": "", "UnitType": "", "UnitName": "", "SubAddr": "", "StAddr": "", "Block": "", "Sector": "", "Nbrhd": "", "District": "", "City": "London", "MetroArea": "", "Subregion": "London", "Region": "England", "RegionAbbr": "ENG", "Territory": "", "Zone": "", "Postal": "", "PostalExt": "", "Country": "GBR", "LangCode": "ENG", "Distance": 0, "X": -0.1272099999999341

In [47]:
# create map
map_clusters = folium.Map(location=[51.50642000000005, -0.1272099999999341], zoom_start=11)

# set color scheme for the clusters
kclusters = 5  #same clusters as defined and fitted in the model
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_london['Latitude'], df_london['Longitude'], df_london['Borough'], df_london['Cluster label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Let's verify each cluster

*Cluster 1*

In [48]:
df_london.loc[df_london['Cluster label'] == 1, df_london.columns[[1] + list(range(5,df_london.shape[1]))]]

Unnamed: 0,Town,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,LONDON,1,Grocery Store,Indian Restaurant,Train Station,Park,Bed & Breakfast,Breakfast Spot,Home Service,Filipino Restaurant,Ethiopian Restaurant,Event Space
2,LONDON,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
3,LONDON,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
4,LONDON,1,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,Pub,Restaurant,Wine Bar,Cocktail Bar,French Restaurant,Sandwich Place
5,LONDON,1,Hotel,Coffee Shop,Café,Pub,Sandwich Place,Italian Restaurant,Theater,Restaurant,Bakery,Hotel Bar
...,...,...,...,...,...,...,...,...,...,...,...,...
303,LONDON,1,Grocery Store,Pub,Café,Bridal Shop,Park,Chinese Restaurant,Seafood Restaurant,Bar,Bakery,BBQ Joint
304,LONDON,1,Indian Restaurant,Pharmacy,Coffee Shop,Gym / Fitness Center,Supermarket,Gastropub,Sandwich Place,Pet Store,Italian Restaurant,Park
305,LONDON,1,Flower Shop,Pub,Park,Restaurant,Train Station,Gym / Fitness Center,Tennis Court,Wine Shop,Fish & Chips Shop,Film Studio
306,LONDON,1,Italian Restaurant,Coffee Shop,Pub,Wine Bar,Sandwich Place,Gym / Fitness Center,Hotel,Falafel Restaurant,Cocktail Bar,French Restaurant


*Cluster 2*

In [49]:
df_london.loc[df_london['Cluster label'] == 2, df_london.columns[[1] + list(range(5,df_london.shape[1]))]]

Unnamed: 0,Town,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
298,"HARROW, STANMOREEDGWARE, LONDON",2,Chinese Restaurant,Construction & Landscaping,Gym,Bakery,Metro Station,Food & Drink Shop,Flower Shop,Flea Market,Fishing Store,Fish Market


*Cluster 3*

In [50]:
df_london.loc[df_london['Cluster label'] == 3, df_london.columns[[1] + list(range(5,df_london.shape[1]))]]

Unnamed: 0,Town,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,LONDON,3,Supermarket,Platform,Convenience Store,Historic Site,Train Station,Coffee Shop,Zoo Exhibit,Event Space,Exhibit,Falafel Restaurant
175,"BEXLEYHEATH, LONDON",3,Supermarket,Historic Site,Platform,Convenience Store,Train Station,Coffee Shop,Golf Course,Construction & Landscaping,Bus Stop,Park
176,LONDON,3,Supermarket,Historic Site,Platform,Convenience Store,Train Station,Coffee Shop,Golf Course,Construction & Landscaping,Bus Stop,Park
177,"LONDON, SIDCUP",3,Supermarket,Historic Site,Platform,Convenience Store,Train Station,Coffee Shop,Golf Course,Construction & Landscaping,Bus Stop,Park
178,LONDON,3,Supermarket,Historic Site,Platform,Convenience Store,Train Station,Coffee Shop,Golf Course,Construction & Landscaping,Bus Stop,Park


*Cluster 4*

In [51]:
df_london.loc[df_london['Cluster label'] == 4, df_london.columns[[1] + list(range(5,df_london.shape[1]))]]

Unnamed: 0,Town,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
271,"LONDON, WELLING",4,Bus Stop,Golf Course,Sports Club,Construction & Landscaping,Home Service,Historic Site,Park,Massage Studio,Fast Food Restaurant,Ethiopian Restaurant
272,"LONDON, ERITH",4,Bus Stop,Golf Course,Sports Club,Construction & Landscaping,Home Service,Historic Site,Park,Massage Studio,Fast Food Restaurant,Ethiopian Restaurant


*Cluster 5*

In [52]:
df_london.loc[df_london['Cluster label'] == 5, df_london.columns[[1] + list(range(5,df_london.shape[1]))]]

Unnamed: 0,Town,Cluster label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
252,LONDON,5,Gym / Fitness Center,Bus Stop,Clothing Store,Supermarket,Hardware Store,Zoo Exhibit,Filipino Restaurant,Exhibit,Falafel Restaurant,Farmers Market


# Exploring Paris

## Extracting data into a pandas dataframe

In [53]:
!wget -q -O 'paris_data.json' https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
print('File Downloaded')

File Downloaded


In [54]:
raw_data = pd.read_json('paris_data.json')
raw_data.head()

Unnamed: 0,datasetid,recordid,fields,geometry,record_timestamp
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,"{'code_comm': '645', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.2517129721...",2016-09-21T00:29:06.175+02:00
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,"{'code_comm': '133', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.0529405055...",2016-09-21T00:29:06.175+02:00
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,"{'code_comm': '378', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.1971816504...",2016-09-21T00:29:06.175+02:00
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,"{'code_comm': '243', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [2.7097808131...",2016-09-21T00:29:06.175+02:00
4,correspondances-code-insee-code-postal,21e809b1d4480333c8b6fe7addd8f3b06f343e2c,"{'code_comm': '003', 'nom_dept': 'VAL-DE-MARNE...","{'type': 'Point', 'coordinates': [2.3335102498...",2016-09-21T00:29:06.175+02:00


### Data Preprocessing

We will break down each of the nested fields and create the dataframe that we need.

In [55]:
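# Expand the list of 'fields' dicts into columns in one step
# (DataFrame.append in a loop was removed in pandas 2.0)
processed = pd.DataFrame(raw_data['fields'].tolist())
# sort the columns alphabetically for a stable layout
processed = processed.sort_index(axis=1)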

processed.head()

Unnamed: 0,code_arr,code_cant,code_comm,code_dept,code_reg,geo_point_2d,geo_shape,id_geofla,insee_com,nom_comm,nom_dept,nom_region,population,postal_code,statut,superficie,z_moyen
0,3,3,645,91,11,"[48.750443119964764, 2.251712972144151]","{'type': 'Polygon', 'coordinates': [[[2.238024...",16275,91645,VERRIERES-LE-BUISSON,ESSONNE,ILE-DE-FRANCE,15.5,91370,Commune simple,999.0,121.0
1,3,20,133,77,11,"[48.41256065214989, 3.052940505560729]","{'type': 'Polygon', 'coordinates': [[[3.076046...",31428,77133,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,ILE-DE-FRANCE,0.2,77126,Commune simple,1082.0,88.0
2,1,9,378,91,11,"[48.52726809075556, 2.19718165044305]","{'type': 'Polygon', 'coordinates': [[[2.203466...",30975,91378,MAUCHAMPS,ESSONNE,ILE-DE-FRANCE,0.3,91730,Commune simple,313.0,150.0
3,5,14,243,77,11,"[48.87307018579678, 2.7097808131278462]","{'type': 'Polygon', 'coordinates': [[[2.727542...",17000,77243,LAGNY-SUR-MARNE,SEINE-ET-MARNE,ILE-DE-FRANCE,20.2,77400,Chef-lieu canton,579.0,71.0
4,3,34,3,94,11,"[48.80588035965699, 2.333510249842654]","{'type': 'Polygon', 'coordinates': [[[2.343851...",32123,94003,ARCUEIL,VAL-DE-MARNE,ILE-DE-FRANCE,19.5,94110,Chef-lieu canton,232.0,70.0


### Feature Selection

We select the columns that we are going to work with.

In [56]:
df2 = processed[['postal_code','nom_comm','nom_dept','geo_point_2d']]
df2

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,91370,VERRIERES-LE-BUISSON,ESSONNE,"[48.750443119964764, 2.251712972144151]"
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,"[48.41256065214989, 3.052940505560729]"
2,91730,MAUCHAMPS,ESSONNE,"[48.52726809075556, 2.19718165044305]"
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,"[48.87307018579678, 2.7097808131278462]"
4,94110,ARCUEIL,VAL-DE-MARNE,"[48.80588035965699, 2.333510249842654]"
...,...,...,...,...
1295,77520,CESSOY-EN-MONTOIS,SEINE-ET-MARNE,"[48.50730730461658, 3.138844194183689]"
1296,93420,VILLEPINTE,SEINE-SAINT-DENIS,"[48.95902025378707, 2.536306342059409]"
1297,77130,CANNES-ECLUSE,SEINE-ET-MARNE,"[48.36403767307805, 2.990786679832767]"
1298,78930,VILLETTE,YVELINES,"[48.92627887061508, 1.6937417245662671]"


In [57]:
df_paris = df2[df2['nom_dept'].str.contains('PARIS')].reset_index(drop=True)
df_paris

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,"[48.87689616237872, 2.337460241388529]"
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,"[48.86790337886785, 2.344107166658533]"
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,"[48.85941549762748, 2.378741060237548]"
3,75003,PARIS-3E-ARRONDISSEMENT,PARIS,"[48.86305413181178, 2.359361058970589]"
4,75006,PARIS-6E-ARRONDISSEMENT,PARIS,"[48.84896809191946, 2.332670898588416]"
5,75004,PARIS-4E-ARRONDISSEMENT,PARIS,"[48.854228281954754, 2.357361938142205]"
6,75001,PARIS-1ER-ARRONDISSEMENT,PARIS,"[48.8626304851685, 2.336293446550539]"
7,75017,PARIS-17E-ARRONDISSEMENT,PARIS,"[48.88733716648682, 2.307485559493426]"
8,75008,PARIS-8E-ARRONDISSEMENT,PARIS,"[48.87252726662346, 2.312582560420059]"
9,75013,PARIS-13E-ARRONDISSEMENT,PARIS,"[48.82871768452136, 2.362468228516128]"


In [58]:
df_paris.shape

(20, 4)

We will be working with 20 of the 1300 records: the rows whose department is Paris, one per arrondissement.

### Geo-location of Neighbourhoods

The coordinates we need are in the geo_point_2d column, so we separate them into latitudes and longitudes.

In [59]:
locations = df_paris['geo_point_2d'].astype('str')
locations

0      [48.87689616237872, 2.337460241388529]
1      [48.86790337886785, 2.344107166658533]
2      [48.85941549762748, 2.378741060237548]
3      [48.86305413181178, 2.359361058970589]
4      [48.84896809191946, 2.332670898588416]
5     [48.854228281954754, 2.357361938142205]
6       [48.8626304851685, 2.336293446550539]
7      [48.88733716648682, 2.307485559493426]
8      [48.87252726662346, 2.312582560420059]
9      [48.82871768452136, 2.362468228516128]
10     [48.83515623066034, 2.419807034965275]
11    [48.844508659617546, 2.349859385560182]
12     [48.88686862295828, 2.384694327870042]
13     [48.86318677744551, 2.400819826729021]
14     [48.87602855694339, 2.361112904561707]
15     [48.86039876035177, 2.262099559395783]
16    [48.892735074561706, 2.348711933867703]
17     [48.85608259819694, 2.312438687733857]
18     [48.84015541860987, 2.293559372435076]
19     [48.82899321160942, 2.327100883257538]
Name: geo_point_2d, dtype: object

In [60]:
lat = locations.apply(lambda x: x.split(',')[0].lstrip('['))
lat

0      48.87689616237872
1      48.86790337886785
2      48.85941549762748
3      48.86305413181178
4      48.84896809191946
5     48.854228281954754
6       48.8626304851685
7      48.88733716648682
8      48.87252726662346
9      48.82871768452136
10     48.83515623066034
11    48.844508659617546
12     48.88686862295828
13     48.86318677744551
14     48.87602855694339
15     48.86039876035177
16    48.892735074561706
17     48.85608259819694
18     48.84015541860987
19     48.82899321160942
Name: geo_point_2d, dtype: object

In [61]:
lon = locations.apply(lambda x: x.split(',')[1].rstrip(']'))
lon

0      2.337460241388529
1      2.344107166658533
2      2.378741060237548
3      2.359361058970589
4      2.332670898588416
5      2.357361938142205
6      2.336293446550539
7      2.307485559493426
8      2.312582560420059
9      2.362468228516128
10     2.419807034965275
11     2.349859385560182
12     2.384694327870042
13     2.400819826729021
14     2.361112904561707
15     2.262099559395783
16     2.348711933867703
17     2.312438687733857
18     2.293559372435076
19     2.327100883257538
Name: geo_point_2d, dtype: object

In [62]:
#creating new column and adding the values
df_paris['Latitude'] = lat
df_paris['Longitude'] = lon

#converting into float
df_paris['Latitude'] = df_paris['Latitude'].astype('float')
df_paris['Longitude'] = df_paris['Longitude'].astype('float')

#dropping the geo_point_2d column
df_paris.drop('geo_point_2d',axis=1,inplace=True)

df_paris.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,Latitude,Longitude
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,48.876896,2.33746
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,48.867903,2.344107
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,48.859415,2.378741
3,75003,PARIS-3E-ARRONDISSEMENT,PARIS,48.863054,2.359361
4,75006,PARIS-6E-ARRONDISSEMENT,PARIS,48.848968,2.332671
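
The string-splitting steps above work, but if `geo_point_2d` holds Python lists (an assumption based on how the column is displayed before the cast to `str`), the pairs can be expanded directly without any text parsing. The following is a minimal sketch on a toy frame named `toy`, which is hypothetical and only mirrors the layout of `df_paris`:

```python
import pandas as pd

# Toy frame mirroring df_paris; the two coordinate pairs are copied from the output above
toy = pd.DataFrame({
    'nom_comm': ['PARIS-9E-ARRONDISSEMENT', 'PARIS-2E-ARRONDISSEMENT'],
    'geo_point_2d': [[48.87689616237872, 2.337460241388529],
                     [48.86790337886785, 2.344107166658533]],
})

# Expand each [lat, lon] list into two float columns, aligned on the original index
coords = pd.DataFrame(toy['geo_point_2d'].tolist(),
                      columns=['Latitude', 'Longitude'],
                      index=toy.index)

# Replace the list column with the two numeric columns
toy = toy.drop(columns='geo_point_2d').join(coords)
print(toy)
```

Because the values never pass through a string, no `lstrip`/`rstrip` calls or float conversion are needed.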


Co-ordinates for Paris

In [63]:
geocode('Paris,France,FR',as_featureset=True).features[0]

{"geometry": {"x": 2.3488000000000397, "y": 48.85341000000005, "spatialReference": {"wkid": 4326, "latestWkid": 4326}}, "attributes": {"Loc_name": "World", "Status": "M", "Score": 100, "Match_addr": "Paris France, Paris, \u00cele-de-France", "LongLabel": "Paris France, Paris, \u00cele-de-France, FRA", "ShortLabel": "Paris France", "Addr_type": "Locality", "Type": "City", "PlaceName": "Paris France", "Place_addr": "Paris, \u00cele-de-France", "Phone": "", "URL": "", "Rank": 9.75, "AddBldg": "", "AddNum": "", "AddNumFrom": "", "AddNumTo": "", "AddRange": "", "Side": "", "StPreDir": "", "StPreType": "", "StName": "", "StType": "", "StDir": "", "BldgType": "", "BldgName": "", "LevelType": "", "LevelName": "", "UnitType": "", "UnitName": "", "SubAddr": "", "StAddr": "", "Block": "", "Sector": "", "Nbrhd": "", "District": "", "City": "Paris", "MetroArea": "", "Subregion": "Paris", "Region": "\u00cele-de-France", "RegionAbbr": "", "Territory": "", "Zone": "", "Postal": "", "PostalExt": "", "C

## Visualizing Paris and its neighbourhoods

In [64]:
map_paris = folium.Map(location=[48.85341000000005, 2.3488000000000397], zoom_start=11)

for lat, lon, borough, town in zip(df_paris['Latitude'], df_paris['Longitude'], df_paris['nom_comm'], df_paris['nom_dept']):
    label = folium.Popup('{}, {}'.format(borough, town), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#A52A2A',
        fill_opacity=0.7).add_to(map_paris)
       
map_paris

## Venues in Paris

In [65]:
venues_paris = getNearByVenues(df_paris['nom_comm'],df_paris['Latitude'],df_paris['Longitude'])

PARIS-9E-ARRONDISSEMENT
PARIS-2E-ARRONDISSEMENT
PARIS-11E-ARRONDISSEMENT
PARIS-3E-ARRONDISSEMENT
PARIS-6E-ARRONDISSEMENT
PARIS-4E-ARRONDISSEMENT
PARIS-1ER-ARRONDISSEMENT
PARIS-17E-ARRONDISSEMENT
PARIS-8E-ARRONDISSEMENT
PARIS-13E-ARRONDISSEMENT
PARIS-12E-ARRONDISSEMENT
PARIS-5E-ARRONDISSEMENT
PARIS-19E-ARRONDISSEMENT
PARIS-20E-ARRONDISSEMENT
PARIS-10E-ARRONDISSEMENT
PARIS-16E-ARRONDISSEMENT
PARIS-18E-ARRONDISSEMENT
PARIS-7E-ARRONDISSEMENT
PARIS-15E-ARRONDISSEMENT
PARIS-14E-ARRONDISSEMENT


In [66]:
venues_paris

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,So Nat,Vegetarian / Vegan Restaurant
1,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,RAP,Gourmet Shop
2,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Farine & O,Bakery
3,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Le Bouclier de Bacchus,Wine Bar
4,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Place Saint-Georges,Plaza
...,...,...,...,...,...
1270,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,L'Ordonnance,French Restaurant
1271,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,U Express,Supermarket
1272,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Picard,Food & Drink Shop
1273,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Parc Hotel Paris,Hotel


We managed to collect 1275 venue records

## Grouping Venue Categories

In [67]:
venues_paris.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Afghan Restaurant,PARIS-11E-ARRONDISSEMENT,48.859415,2.378741,Afghanistan
African Restaurant,PARIS-9E-ARRONDISSEMENT,48.876896,2.361113,Wally Le Saharien
American Restaurant,PARIS-19E-ARRONDISSEMENT,48.892735,2.384694,Harper's
Antique Shop,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Hôtel des Ventes Drouot
Argentinian Restaurant,PARIS-3E-ARRONDISSEMENT,48.863054,2.359361,Anahi
...,...,...,...,...
Wine Bar,PARIS-9E-ARRONDISSEMENT,48.892735,2.400820,Vingt Vins d'Art
Wine Shop,PARIS-3E-ARRONDISSEMENT,48.876029,2.400820,Trois Fois Vin
Women's Store,PARIS-2E-ARRONDISSEMENT,48.867903,2.344107,L'Appartement Sézane
Zoo,PARIS-12E-ARRONDISSEMENT,48.835156,2.419807,Parc zoologique de Paris


There are 209 unique venue categories in Paris, compared with 300 in London, so both cities offer a wide variety of venue types.
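
The category count can also be read off directly rather than from the length of the grouped table. This is a small sketch on a made-up frame (`toy_venues` and its values are hypothetical); on the real data the same call would be applied to `venues_paris['Venue Category']`:

```python
import pandas as pd

# Hypothetical venue records with the same two columns used in the notebook
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B', 'B', 'B'],
    'Venue Category': ['Bakery', 'Wine Bar', 'Bakery', 'Hotel', 'Hotel'],
})

# Number of distinct categories across all neighbourhoods
n_categories = toy_venues['Venue Category'].nunique()

# Number of distinct categories within each neighbourhood
per_area = toy_venues.groupby('Neighbourhood')['Venue Category'].nunique()

print(n_categories)
print(per_area)
```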

## One-Hot Encoding the category column

In [68]:
paris_encode = pd.get_dummies(venues_paris[['Venue Category']], prefix="", prefix_sep="")

#adding Neighbourhood
paris_encode['Neighbourhood'] = venues_paris['Neighbourhood']

# moving neighborhood column to the first column
fixed_columns = [paris_encode.columns[-1]] + list(paris_encode.columns[0:-1])
paris_encode = paris_encode[fixed_columns]

paris_encode.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auvergne Restaurant,Baby Store,Bagel Shop,Bakery,Bar,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Cambodian Restaurant,Canal,Candy Store,Cheese Shop,Chinese Restaurant,...,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Snack Place,Soba Restaurant,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Venue category mean value

In [69]:
Paris_grouped = paris_encode.groupby('Neighbourhood').mean().reset_index()
Paris_grouped.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auvergne Restaurant,Baby Store,Bagel Shop,Bakery,Bar,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Cambodian Restaurant,Canal,Candy Store,Cheese Shop,Chinese Restaurant,...,Seafood Restaurant,Shanxi Restaurant,Shoe Store,Shopping Mall,Snack Place,Soba Restaurant,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-10E-ARRONDISSEMENT,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,...,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0
1,PARIS-11E-ARRONDISSEMENT,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.04878,0.04878,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.073171,0.0,0.02439,0.0,0.0,0.0,0.0
2,PARIS-12E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2
3,PARIS-13E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.015873,0.0,0.0,0.0,0.079365,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079365,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.206349,0.0,0.0,0.0,0.0,0.0,0.0
4,PARIS-14E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
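
To see why the mean of the one-hot columns is useful, the sketch below repeats the two steps on a tiny made-up dataset (`toy_venues` is hypothetical). Each value in the result is the share of a neighbourhood's venues that fall in a category, so every row sums to 1:

```python
import pandas as pd

# Hypothetical venue records: four venues in A, two in B
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Hotel', 'Zoo', 'Hotel', 'Hotel'],
})

# One indicator column per category (1.0 where the venue belongs to it)
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# Averaging the indicators per neighbourhood gives category frequencies
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
```

Here neighbourhood A ends up with 0.5 for Bakery and 0.25 each for Hotel and Zoo, while B is 1.0 for Hotel. These frequency vectors are what the clustering step later compares.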


## Top venue categories

We reuse the `return_most_common_venues` function defined earlier to determine the top venue categories for each neighbourhood.

In [70]:
# create a new dataframe for Paris
neighborhoods_venues_sorted_paris = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_paris['Neighbourhood'] = Paris_grouped['Neighbourhood']

for ind in np.arange(Paris_grouped.shape[0]):
    neighborhoods_venues_sorted_paris.iloc[ind, 1:] = return_most_common_venues(Paris_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_paris.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,PARIS-10E-ARRONDISSEMENT,French Restaurant,Bistro,Hotel,Café,Coffee Shop,Italian Restaurant,Pizza Place,Asian Restaurant,Japanese Restaurant,Indian Restaurant
1,PARIS-11E-ARRONDISSEMENT,Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Bar,Café,Pastry Shop,French Restaurant,Mediterranean Restaurant,Plaza
2,PARIS-12E-ARRONDISSEMENT,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,Gas Station,Gaming Cafe,Furniture / Home Store,Frozen Yogurt Shop
3,PARIS-13E-ARRONDISSEMENT,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Japanese Restaurant,Juice Bar,Hotel,Bus Stop,Plaza
4,PARIS-14E-ARRONDISSEMENT,French Restaurant,Hotel,Bakery,Food & Drink Shop,Bistro,Italian Restaurant,Pizza Place,Fast Food Restaurant,Brasserie,Supermarket
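
As a reminder of what the ranking step does, here is a minimal sketch of a helper with the same call shape. The name `top_categories` and the sample row are hypothetical, and this is not necessarily how `return_most_common_venues` is written in this notebook:

```python
import pandas as pd

def top_categories(row, num_top):
    """Return the num_top category names with the highest frequency in one row."""
    # Skip the leading 'Neighbourhood' entry and keep only the frequencies
    freqs = row.iloc[1:].astype(float)
    # Sort from most to least frequent and keep the category names
    return freqs.sort_values(ascending=False).index.values[:num_top]

# Hypothetical row shaped like one row of Paris_grouped
row = pd.Series({'Neighbourhood': 'A', 'Bakery': 0.5, 'Hotel': 0.25,
                 'Zoo': 0.1, 'Bar': 0.15})
top = top_categories(row, 3)
print(top)
```

One caveat when reading the table above: a ranking of this kind always returns ten names, even when fewer categories are present. PARIS-12E-ARRONDISSEMENT has a frequency of 0.2 for each of its non-zero categories, which means only five venues were returned for it, so its 6th to 10th entries have zero frequency and their order carries no meaning.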


# Model Building

## K Means

In [71]:
paris = Paris_grouped.drop('Neighbourhood',axis=1)
model = KMeans(n_clusters=5,random_state=45)
model.fit(paris)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=45, tol=0.0001, verbose=0)

In [72]:
model.labels_

array([0, 0, 1, 2, 4, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0], dtype=int32)

### Labelling clustered data

In [73]:
neighborhoods_venues_sorted_paris.insert(0, 'Cluster Labels', model.labels_+1)
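
Before interpreting the clusters, it helps to check how many neighbourhoods fell into each one. The sketch below counts the labels printed above (copied by hand into `labels`) and reports them with the same +1 offset used for the `Cluster Labels` column:

```python
import numpy as np

# Labels copied from the model.labels_ output above
labels = np.array([0, 0, 1, 2, 4, 0, 3, 4, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 4, 4, 0])

# Count how many neighbourhoods were assigned to each of the 5 clusters
sizes = np.bincount(labels, minlength=5)

# Report with 1-based cluster numbers, matching the 'Cluster Labels' column
for cluster_id, size in enumerate(sizes, start=1):
    print(f'Cluster {cluster_id}: {size} neighbourhoods')
```

The counts show that most arrondissements land in one cluster, while three clusters contain a single arrondissement each.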

Joining our Paris data with the clustered data of venue categories

In [74]:
neighborhoods_venues_sorted_paris.rename(columns={'Neighbourhood':'nom_comm'},inplace=True)

In [75]:
df_paris = pd.merge(df_paris,neighborhoods_venues_sorted_paris,on='nom_comm')
df_paris.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,48.876896,2.33746,1,French Restaurant,Hotel,Bistro,Japanese Restaurant,Cocktail Bar,Bakery,Lounge,Wine Bar,Gym / Fitness Center,Vegetarian / Vegan Restaurant
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,48.867903,2.344107,1,French Restaurant,Cocktail Bar,Bakery,Coffee Shop,Italian Restaurant,Hotel,Creperie,Wine Bar,Furniture / Home Store,Salad Place
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,48.859415,2.378741,1,Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Bar,Café,Pastry Shop,French Restaurant,Mediterranean Restaurant,Plaza
3,75003,PARIS-3E-ARRONDISSEMENT,PARIS,48.863054,2.359361,1,French Restaurant,Italian Restaurant,Japanese Restaurant,Coffee Shop,Sandwich Place,Wine Bar,Gourmet Shop,Bakery,Art Gallery,Cocktail Bar
4,75006,PARIS-6E-ARRONDISSEMENT,PARIS,48.848968,2.332671,1,Chocolate Shop,Pastry Shop,French Restaurant,Bakery,Wine Bar,Fountain,Plaza,Tea Room,Italian Restaurant,Mexican Restaurant


In [76]:
#checking for any null values
df_paris.isnull().sum()

postal_code               0
nom_comm                  0
nom_dept                  0
Latitude                  0
Longitude                 0
Cluster Labels            0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

## Visualizing clustered neighbourhoods

In [77]:
geocode("Paris,France,FR",as_featureset=True).features[0]

{"geometry": {"x": 2.3488000000000397, "y": 48.85341000000005, "spatialReference": {"wkid": 4326, "latestWkid": 4326}}, "attributes": {"Loc_name": "World", "Status": "M", "Score": 100, "Match_addr": "Paris France, Paris, \u00cele-de-France", "LongLabel": "Paris France, Paris, \u00cele-de-France, FRA", "ShortLabel": "Paris France", "Addr_type": "Locality", "Type": "City", "PlaceName": "Paris France", "Place_addr": "Paris, \u00cele-de-France", "Phone": "", "URL": "", "Rank": 9.75, "AddBldg": "", "AddNum": "", "AddNumFrom": "", "AddNumTo": "", "AddRange": "", "Side": "", "StPreDir": "", "StPreType": "", "StName": "", "StType": "", "StDir": "", "BldgType": "", "BldgName": "", "LevelType": "", "LevelName": "", "UnitType": "", "UnitName": "", "SubAddr": "", "StAddr": "", "Block": "", "Sector": "", "Nbrhd": "", "District": "", "City": "Paris", "MetroArea": "", "Subregion": "Paris", "Region": "\u00cele-de-France", "RegionAbbr": "", "Territory": "", "Zone": "", "Postal": "", "PostalExt": "", "C

In [78]:
# create map
map_clusters = folium.Map(location=[48.85341000000005, 2.3488000000000397], zoom_start=11)

# set color scheme for the clusters
kclusters = 5  # same number of clusters as fitted in the model
# one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_paris['Latitude'], df_paris['Longitude'], df_paris['nom_comm'], df_paris['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining our clusters

*Cluster 1*

In [79]:
df_paris.loc[df_paris['Cluster Labels'] == 1, df_paris.columns[[1] + list(range(5,df_paris.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,PARIS-9E-ARRONDISSEMENT,1,French Restaurant,Hotel,Bistro,Japanese Restaurant,Cocktail Bar,Bakery,Lounge,Wine Bar,Gym / Fitness Center,Vegetarian / Vegan Restaurant
1,PARIS-2E-ARRONDISSEMENT,1,French Restaurant,Cocktail Bar,Bakery,Coffee Shop,Italian Restaurant,Hotel,Creperie,Wine Bar,Furniture / Home Store,Salad Place
2,PARIS-11E-ARRONDISSEMENT,1,Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Bar,Café,Pastry Shop,French Restaurant,Mediterranean Restaurant,Plaza
3,PARIS-3E-ARRONDISSEMENT,1,French Restaurant,Italian Restaurant,Japanese Restaurant,Coffee Shop,Sandwich Place,Wine Bar,Gourmet Shop,Bakery,Art Gallery,Cocktail Bar
4,PARIS-6E-ARRONDISSEMENT,1,Chocolate Shop,Pastry Shop,French Restaurant,Bakery,Wine Bar,Fountain,Plaza,Tea Room,Italian Restaurant,Mexican Restaurant
5,PARIS-4E-ARRONDISSEMENT,1,French Restaurant,Ice Cream Shop,Park,Clothing Store,Hotel,Wine Bar,Pastry Shop,Pedestrian Plaza,Plaza,Italian Restaurant
6,PARIS-1ER-ARRONDISSEMENT,1,French Restaurant,Japanese Restaurant,Hotel,Plaza,Italian Restaurant,Art Museum,Café,Coffee Shop,Historic Site,Udon Restaurant
11,PARIS-5E-ARRONDISSEMENT,1,French Restaurant,Hotel,Italian Restaurant,Bakery,Plaza,Café,Coffee Shop,Pub,Wine Bar,Vietnamese Restaurant
12,PARIS-19E-ARRONDISSEMENT,1,French Restaurant,Bar,Supermarket,Bakery,Hotel,Brewery,Japanese Restaurant,Bistro,Seafood Restaurant,Beer Bar
13,PARIS-20E-ARRONDISSEMENT,1,Plaza,Bistro,Bakery,Japanese Restaurant,Bar,French Restaurant,Café,Italian Restaurant,Hotel,Park
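
The column selector used in these cells, `df_paris.columns[[1] + list(range(5, df_paris.shape[1]))]`, keeps the neighbourhood name (position 1) and everything from the cluster label onwards (position 5 to the end), dropping the postal code, department and coordinates. A small sketch on a one-row frame (`toy` is hypothetical and uses only the first venue column) shows the effect:

```python
import pandas as pd

# One-row frame with the same leading columns as df_paris
toy = pd.DataFrame(
    [[75009, 'PARIS-9E-ARRONDISSEMENT', 'PARIS', 48.876896, 2.33746,
      1, 'French Restaurant']],
    columns=['postal_code', 'nom_comm', 'nom_dept', 'Latitude', 'Longitude',
             'Cluster Labels', '1st Most Common Venue'],
)

# Position 1 is the name; positions 5 onwards are the label and venue columns
wanted = toy.columns[[1] + list(range(5, toy.shape[1]))]
print(list(wanted))

# Same filtering pattern as in the notebook cells
print(toy.loc[toy['Cluster Labels'] == 1, wanted])
```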


*Cluster 2*

In [80]:
df_paris.loc[df_paris['Cluster Labels'] == 2, df_paris.columns[[1] + list(range(5,df_paris.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,PARIS-12E-ARRONDISSEMENT,2,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,Gas Station,Gaming Cafe,Furniture / Home Store,Frozen Yogurt Shop


*Cluster 3*

In [81]:
df_paris.loc[df_paris['Cluster Labels'] == 3, df_paris.columns[[1] + list(range(5,df_paris.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,PARIS-13E-ARRONDISSEMENT,3,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Japanese Restaurant,Juice Bar,Hotel,Bus Stop,Plaza


*Cluster 4*

In [82]:
df_paris.loc[df_paris['Cluster Labels'] == 4, df_paris.columns[[1] + list(range(5,df_paris.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,PARIS-16E-ARRONDISSEMENT,4,Lake,Pool,Plaza,French Restaurant,Bus Station,Art Museum,Boat or Ferry,Park,Zoo Exhibit,Farmers Market


*Cluster 5*

In [83]:
df_paris.loc[df_paris['Cluster Labels'] == 5, df_paris.columns[[1] + list(range(5,df_paris.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,PARIS-17E-ARRONDISSEMENT,5,French Restaurant,Hotel,Italian Restaurant,Restaurant,Japanese Restaurant,Café,Bakery,Plaza,Bistro,Tennis Stadium
8,PARIS-8E-ARRONDISSEMENT,5,French Restaurant,Hotel,Spa,Bakery,Grocery Store,Department Store,Corsican Restaurant,Resort,Park,Cocktail Bar
17,PARIS-7E-ARRONDISSEMENT,5,French Restaurant,Hotel,Café,Plaza,Italian Restaurant,Art Museum,Cocktail Bar,History Museum,Historic Site,Irish Pub
19,PARIS-14E-ARRONDISSEMENT,5,French Restaurant,Hotel,Bakery,Food & Drink Shop,Bistro,Italian Restaurant,Pizza Place,Fast Food Restaurant,Brasserie,Supermarket
