# The Battle of Neighborhoods 

## Introduction/Business Problem

We want to open a Chinese restaurant chain in France. To decide where the first restaurants should open, we need to find the most promising departments to start in.
For this business problem, we will explore location data and use information such as population density and the number and type of existing venues.

## Data

For this, we will use a CSV file with the following fields:
- Department
- Name
- Postal code
- Population in 2012
- Density
- Surface

The CSV file is available at https://sql.sh/736-base-donnees-villes-francaises and was produced in 2016.

We will use the Foursquare API to get information about venues and to help us pick the best places.

## Methodology


The data has the following main components: department, name, latitude and longitude of the department, density and population. We will add information about venues by using the Foursquare API.
Before that, we filter the departments to keep only the best candidates, meaning those with a large, dense population.

#### import libraries

In [14]:
import pandas as pd
import folium
import requests # library to handle requests
from sklearn.cluster import KMeans


#### read table

In [15]:
villes= pd.read_csv("labs/DP0701EN/villes_france.csv", sep=';')

In [16]:
villes.head()

Unnamed: 0,department,name,postalcode,population,population.1,densite,surface,lat,long
0,ozan,OZAN,1190,500,50000,93,6.6,4.91667,46.3833
1,cormoranche-sur-saone,CORMORANCHE-SUR-SAONE,1290,1000,100000,107,9.85,4.83333,46.2333
2,plagne-01,PLAGNE,1130,100,10000,20,6.2,5.73333,46.1833
3,tossiat,TOSSIAT,1250,1400,140000,138,10.17,5.31667,46.1333
4,pouillat,POUILLAT,1250,100,10000,14,6.23,5.43333,46.3333


In [17]:
villes.dtypes

department       object
name             object
postalcode       object
population        int64
population.1      int64
densite           int64
surface         float64
lat             float64
long            float64
dtype: object

In [5]:
villes.shape

(36700, 9)
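
In the preview above, the `lat` column holds values such as 4.84 for Lyon and `long` holds 45.76, which suggests the two labels are exchanged in this file (Lyon sits at roughly 45.76°N, 4.84°E). A minimal sketch of a sanity check is given below. The bounding box and the helper names `share_in_box` and `fix_swapped_coordinates` are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Approximate bounding box of mainland France (an assumption for this sketch).
LAT_RANGE = (41.0, 51.5)
LON_RANGE = (-5.5, 10.0)

def share_in_box(lat, lon):
    """Fraction of rows whose (lat, lon) pair falls inside the bounding box."""
    inside = lat.between(*LAT_RANGE) & lon.between(*LON_RANGE)
    return inside.mean()

def fix_swapped_coordinates(df, lat_col="lat", lon_col="long"):
    """Return the frame with both labels exchanged if that fits the box better."""
    as_is = share_in_box(df[lat_col], df[lon_col])
    swapped = share_in_box(df[lon_col], df[lat_col])
    if swapped > as_is:
        df = df.rename(columns={lat_col: lon_col, lon_col: lat_col})
    return df

# Two rows copied from the tables in this notebook.
sample = pd.DataFrame({
    "department": ["lyon", "paris"],
    "lat": [4.84139, 2.34445],
    "long": [45.7589, 48.86],
})
fixed = fix_swapped_coordinates(sample)
print(fixed[["department", "lat", "long"]])
```

Comparing shares rather than testing every row keeps the check tolerant of a few locations outside the box, such as overseas territories.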

#### Define credentials for API

In [43]:
CLIENT_ID = '' # my Foursquare ID
CLIENT_SECRET = '' # my Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('my credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

my credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [26]:
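def getNearbyVenues(names, latitudes, longitudes, radius=120000):
    """Query Foursquare around each location and return one row per venue.

    Relies on the module-level CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT.
    """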
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
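
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response with no groups, or a venue with no category, would raise an exception. The sketch below shows one possible way to make the parsing step tolerant of that. The helper name `parse_explore_items` and the sample payload are hypothetical; only the field names come from the code above.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style JSON payload into rows, skipping incomplete venues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Hypothetical payload, shaped like the fields read by the function above.
fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A",
                   "location": {"lat": 48.86, "lng": 2.34},
                   "categories": [{"name": "Pizza Place"}]}},
        {"venue": {"name": "Venue B",
                   "location": {"lat": 48.87, "lng": 2.35},
                   "categories": []}},  # no category: dropped
    ]}]}
}
rows = parse_explore_items(fake_payload, "paris", 48.86, 2.34445)
print(rows)
```

Keeping the parsing separate from the HTTP call also makes it possible to test it without network access or credentials.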

We decide to explore only the departments with a high density, since a high density means many potential customers within a small area.

In [20]:

villes_eligible=villes[villes['densite']>10000]

In [21]:
villes_eligible.shape

(37, 9)
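
As a small illustration of the filter above, the sketch below applies the same threshold to four rows taken from the tables in this notebook and sorts the survivors by density. The name `DENSITY_THRESHOLD` is an assumption introduced here for readability.

```python
import pandas as pd

toy = pd.DataFrame({
    "department": ["ozan", "tossiat", "lyon", "paris"],
    "densite": [93, 138, 10117, 21288],  # values taken from the tables in this notebook
})
DENSITY_THRESHOLD = 10000  # same cut-off as the cell above

eligible = (toy[toy["densite"] > DENSITY_THRESHOLD]
            .sort_values("densite", ascending=False)
            .reset_index(drop=True))
print(eligible)
```

Naming the threshold makes it easier to rerun the analysis with a lower cut-off if 37 departments turn out to be too few.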

In [22]:
villes_eligible#.head()


Unnamed: 0,department,name,postalcode,population,population.1,densite,surface,lat,long
28152,lyon,LYON,69001-69002-69003-69004-69005-69006-69007-6900...,474900,47490000,10117,47.87,4.84139,45.7589
30437,paris,PARIS,75001-75002-75003-75004-75005-75006-75007-7500...,2211000,221100000,21288,105.4,2.34445,48.86
35900,neuilly-sur-seine,NEUILLY-SUR-SEINE,92200,60300,6030000,16556,3.73,2.26667,48.8833
35901,chatillon-92,CHATILLON,92320,32500,3250000,11170,2.92,2.28333,48.8
35902,bois-colombes,BOIS-COLOMBES,92270,27800,2780000,15252,1.92,2.26667,48.9167
35903,puteaux,PUTEAUX,92800,44500,4450000,14029,3.19,2.23333,48.8667
35907,issy-les-moulineaux,ISSY-LES-MOULINEAUX,92130,63300,6330000,15142,4.25,2.26667,48.8167
35912,levallois-perret,LEVALLOIS-PERRET,92300,63000,6300000,26660,2.41,2.28333,48.9
35913,boulogne-billancourt,BOULOGNE-BILLANCOURT,92100,112200,11220000,18509,6.17,2.25,48.8333
35914,asnieres-sur-seine,ASNIERES-SUR-SEINE,92600,81700,8170000,17080,4.82,2.28556,48.9112


We used the Python folium library to visualize the eligible departments on a map of France. The map is built from the latitude and longitude values, as shown below:

In [23]:
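Long_paris = 2.34445
Lat_paris = 48.8600
# create a map centred on Paris, where most of the eligible departments are located
map_france = folium.Map(location=[Lat_paris, Long_paris], zoom_start=10)

# add markers to map
# note: in this dataset the 'lat' column holds longitudes and 'long' holds latitudes
# (e.g. Paris: lat=2.34445, long=48.86), so the two are exchanged here
for lng, lat in zip(villes_eligible['lat'], villes_eligible['long']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,  # CircleMarker radius is in pixels; 1000 would hide the whole map
        color='blue',
        fill=True
    ).add_to(map_france)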
   


In [44]:
map_france

### Get the venues using the Foursquare API

In [27]:
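LIMIT = 100  # maximum number of venues returned per Foursquare API call
# caution: in this dataset the 'lat' column appears to hold longitudes and 'long'
# latitudes (e.g. Paris: lat=2.34445, long=48.86); the columns are passed as-is
# here, which is what the output below reflects
department_venues = getNearbyVenues(names=villes_eligible['department'],
                                   latitudes=villes_eligible['lat'],
                                   longitudes=villes_eligible['long']
                                  )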

lyon
paris
neuilly-sur-seine
chatillon-92
bois-colombes
puteaux
issy-les-moulineaux
levallois-perret
boulogne-billancourt
asnieres-sur-seine
colombes
courbevoie
montrouge
clichy
vanves
suresnes
garenne-colombes
malakoff
bourg-la-reine
bagnolet
montreuil-93
lilas
aubervilliers
pre-saint-gervais
pantin
saint-ouen-93
epinay-sur-seine
alfortville
saint-mande
kremlin-bicetre
nogent-sur-marne
cachan
vincennes
charenton-le-pont
saint-maurice-94
gentilly
villejuif


In [28]:
print(department_venues.shape)
department_venues.head()

(177, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,lyon,4.84139,45.7589,GARAGE CORON,4.626298,45.586771,Automotive Shop
1,lyon,4.84139,45.7589,GARAGE DE FLACE - SARL GUYONET GUTRIN,4.813963,46.320638,Automotive Shop
2,lyon,4.84139,45.7589,GARAGE STEINMETZ,4.073168,45.605982,Automotive Shop
3,lyon,4.84139,45.7589,Indian Motorcycle Valence,4.885476,44.965873,Motorcycle Shop
4,lyon,4.84139,45.7589,GARAGE DU PARC - EURL MAYEUL FRADIN,4.066383,46.058804,Automotive Shop


In [29]:
#department_venues_save=department_venues

In [30]:
import pickle
#pickle.dump( department_venues, open( "save.p", "wb" ) )

In [None]:
#department_venues = pickle.load(open("save.p", "rb"))

In [31]:
department_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,lyon,4.84139,45.7589,GARAGE CORON,4.626298,45.586771,Automotive Shop
1,lyon,4.84139,45.7589,GARAGE DE FLACE - SARL GUYONET GUTRIN,4.813963,46.320638,Automotive Shop
2,lyon,4.84139,45.7589,GARAGE STEINMETZ,4.073168,45.605982,Automotive Shop
3,lyon,4.84139,45.7589,Indian Motorcycle Valence,4.885476,44.965873,Motorcycle Shop
4,lyon,4.84139,45.7589,GARAGE DU PARC - EURL MAYEUL FRADIN,4.066383,46.058804,Automotive Shop
...,...,...,...,...,...,...,...
172,gentilly,2.33333,48.8167,JCV 45,1.784857,47.982413,Automotive Shop
173,villejuif,2.36667,48.8000,MCA,2.314538,49.017836,Automotive Shop
174,villejuif,2.36667,48.8000,Domino's Pizza Cergy-Pontoise,2.080840,49.036700,Pizza Place
175,villejuif,2.36667,48.8000,SAINT JEAN AUTOMOBILES,3.023078,48.951442,Automotive Shop


In [32]:
department_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
alfortville,5,5,5,5,5,5
asnieres-sur-seine,4,4,4,4,4,4
aubervilliers,4,4,4,4,4,4
bagnolet,6,6,6,6,6,6
bois-colombes,4,4,4,4,4,4
boulogne-billancourt,5,5,5,5,5,5
bourg-la-reine,4,4,4,4,4,4
cachan,4,4,4,4,4,4
charenton-le-pont,5,5,5,5,5,5
chatillon-92,4,4,4,4,4,4


In [33]:
print('There are {} unique categories.'.format(len(department_venues['Venue Category'].unique())))

There are 6 unique categories.


In [34]:
# one hot encoding
department_onehot = pd.get_dummies(department_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
department_onehot['Neighborhood'] = department_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [department_onehot.columns[-1]] + list(department_onehot.columns[:-1])
department_onehot = department_onehot[fixed_columns]

department_onehot.head()

Unnamed: 0,Neighborhood,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
0,lyon,1,0,0,0,0,0
1,lyon,1,0,0,0,0,0
2,lyon,1,0,0,0,0,0
3,lyon,0,0,0,1,0,0
4,lyon,1,0,0,0,0,0


In [35]:
department_onehot.shape

(177, 7)

In [36]:
department_grouped = department_onehot.groupby('Neighborhood').mean().reset_index()
department_grouped

Unnamed: 0,Neighborhood,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
0,alfortville,0.8,0.0,0.0,0.0,0.2,0.0
1,asnieres-sur-seine,0.75,0.0,0.0,0.0,0.25,0.0
2,aubervilliers,0.5,0.0,0.25,0.0,0.25,0.0
3,bagnolet,0.333333,0.166667,0.166667,0.0,0.166667,0.166667
4,bois-colombes,0.75,0.0,0.0,0.0,0.25,0.0
5,boulogne-billancourt,0.8,0.0,0.0,0.0,0.2,0.0
6,bourg-la-reine,0.75,0.0,0.0,0.0,0.25,0.0
7,cachan,0.75,0.0,0.0,0.0,0.25,0.0
8,charenton-le-pont,0.8,0.0,0.0,0.0,0.2,0.0
9,chatillon-92,0.75,0.0,0.0,0.0,0.25,0.0


#### K-means is used to cluster the highest-density departments

In [37]:
# set number of clusters
kclusters = 4

# keep only the category frequencies (axis must be passed by keyword in recent pandas)
department_grouped_clustering = department_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(department_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 2, 1, 3, 0, 3, 3, 0, 3], dtype=int32)
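
The sketch below runs the same kind of clustering on four made-up frequency rows, to show how the labels should be read: the label numbers themselves are arbitrary, and only the grouping of rows matters. The data is hypothetical, and `n_init` is set explicitly as an assumption about what works consistently across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-frequency rows (columns: automotive, pizza, restaurant).
profiles = np.array([
    [0.80, 0.20, 0.00],
    [0.75, 0.25, 0.00],
    [0.30, 0.20, 0.50],
    [0.35, 0.15, 0.50],
])

# n_init is given explicitly because its default changed between scikit-learn versions.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = model.labels_
print(labels)
```

The first two rows end up in one cluster and the last two in the other, because k-means groups rows whose frequency profiles are close in Euclidean distance.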

In [38]:
# add clustering labels
department_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

villes_merged = villes_eligible

# join the cluster labels and category frequencies onto the eligible departments
villes_merged = villes_merged.join(department_grouped.set_index('Neighborhood'), on='department')

villes_merged.head() # check the last columns!

Unnamed: 0,department,name,postalcode,population,population.1,densite,surface,lat,long,Cluster Labels,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
28152,lyon,LYON,69001-69002-69003-69004-69005-69006-69007-6900...,474900,47490000,10117,47.87,4.84139,45.7589,0,0.888889,0.0,0.0,0.111111,0.0,0.0
30437,paris,PARIS,75001-75002-75003-75004-75005-75006-75007-7500...,2211000,221100000,21288,105.4,2.34445,48.86,3,0.75,0.0,0.0,0.0,0.25,0.0
35900,neuilly-sur-seine,NEUILLY-SUR-SEINE,92200,60300,6030000,16556,3.73,2.26667,48.8833,0,0.8,0.0,0.0,0.0,0.2,0.0
35901,chatillon-92,CHATILLON,92320,32500,3250000,11170,2.92,2.28333,48.8,3,0.75,0.0,0.0,0.0,0.25,0.0
35902,bois-colombes,BOIS-COLOMBES,92270,27800,2780000,15252,1.92,2.26667,48.9167,3,0.75,0.0,0.0,0.0,0.25,0.0
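
The join above matches the `department` column against the `Neighborhood` index. A minimal sketch of the same pattern on toy data follows. The row `nowhere` is hypothetical and shows what happens to a department for which no venues were returned.

```python
import pandas as pd

cities = pd.DataFrame({"department": ["lyon", "paris", "nowhere"],
                       "densite": [10117, 21288, 1]})
clusters = pd.DataFrame({"Neighborhood": ["lyon", "paris"],
                         "Cluster Labels": [0, 3]})

# Left join: every row of `cities` is kept, matched on its 'department' value.
merged = cities.join(clusters.set_index("Neighborhood"), on="department")
print(merged)
```

Unmatched rows get `NaN` in the joined columns, which also turns the integer labels into floats, so it can be worth checking `merged['Cluster Labels'].isna().sum()` before filtering on a label.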


# Results

The clusters are described below.

In [39]:
villes_merged.loc[villes_merged['Cluster Labels'] == 0, villes_merged.columns[[1] + list(range(5, villes_merged.shape[1]))]]

Unnamed: 0,name,densite,surface,lat,long,Cluster Labels,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
28152,LYON,10117,47.87,4.84139,45.7589,0,0.888889,0.0,0.0,0.111111,0.0,0.0
35900,NEUILLY-SUR-SEINE,16556,3.73,2.26667,48.8833,0,0.8,0.0,0.0,0.0,0.2,0.0
35903,PUTEAUX,14029,3.19,2.23333,48.8667,0,0.833333,0.0,0.0,0.0,0.166667,0.0
35907,ISSY-LES-MOULINEAUX,15142,4.25,2.26667,48.8167,0,0.8,0.0,0.0,0.0,0.2,0.0
35913,BOULOGNE-BILLANCOURT,18509,6.17,2.25,48.8333,0,0.8,0.0,0.0,0.0,0.2,0.0
35915,COLOMBES,10934,7.81,2.25,48.9167,0,0.833333,0.0,0.0,0.0,0.166667,0.0
35919,COURBEVOIE,20975,4.17,2.25222,48.8973,0,0.833333,0.0,0.0,0.0,0.166667,0.0
35925,SURESNES,12327,3.79,2.23333,48.8667,0,0.833333,0.0,0.0,0.0,0.166667,0.0
35929,LA GARENNE-COLOMBES,15521,1.78,2.25,48.9,0,0.833333,0.0,0.0,0.0,0.166667,0.0
35980,ALFORTVILLE,12043,3.67,2.41667,48.8,0,0.8,0.0,0.0,0.0,0.2,0.0


In [40]:
villes_merged.loc[villes_merged['Cluster Labels'] == 1, villes_merged.columns[[1] + list(range(5, villes_merged.shape[1]))]]

Unnamed: 0,name,densite,surface,lat,long,Cluster Labels,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
35937,BAGNOLET,13184,2.57,2.41667,48.8667,1,0.333333,0.166667,0.166667,0.0,0.166667,0.166667
35950,LES LILAS,17641,1.26,2.43333,48.8833,1,0.333333,0.166667,0.166667,0.0,0.166667,0.166667
35963,LE PRE-SAINT-GERVAIS,25778,0.7,2.4,48.8833,1,0.333333,0.166667,0.166667,0.0,0.166667,0.166667
35965,PANTIN,10805,5.01,2.4,48.9,1,0.4,0.2,0.2,0.0,0.2,0.0


In [41]:
villes_merged.loc[villes_merged['Cluster Labels'] == 2, villes_merged.columns[[1] + list(range(5, villes_merged.shape[1]))]]

Unnamed: 0,name,densite,surface,lat,long,Cluster Labels,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
35954,AUBERVILLIERS,13209,5.76,2.38333,48.9167,2,0.5,0.0,0.25,0.0,0.25,0.0
35971,EPINAY-SUR-SEINE,11929,4.57,2.30833,48.95,2,0.5,0.0,0.25,0.0,0.25,0.0


In [42]:
villes_merged.loc[villes_merged['Cluster Labels'] == 3, villes_merged.columns[[1] + list(range(5, villes_merged.shape[1]))]]

Unnamed: 0,name,densite,surface,lat,long,Cluster Labels,Automotive Shop,Cafeteria,Food Court,Motorcycle Shop,Pizza Place,Restaurant
30437,PARIS,21288,105.4,2.34445,48.86,3,0.75,0.0,0.0,0.0,0.25,0.0
35901,CHATILLON,11170,2.92,2.28333,48.8,3,0.75,0.0,0.0,0.0,0.25,0.0
35902,BOIS-COLOMBES,15252,1.92,2.26667,48.9167,3,0.75,0.0,0.0,0.0,0.25,0.0
35912,LEVALLOIS-PERRET,26660,2.41,2.28333,48.9,3,0.75,0.0,0.0,0.0,0.25,0.0
35914,ASNIERES-SUR-SEINE,17080,4.82,2.28556,48.9112,3,0.75,0.0,0.0,0.0,0.25,0.0
35920,MONTROUGE,23476,2.07,2.31667,48.8167,3,0.75,0.0,0.0,0.0,0.25,0.0
35922,CLICHY,19128,3.08,2.3,48.9,3,0.75,0.0,0.0,0.0,0.25,0.0
35923,VANVES,17308,1.56,2.3,48.8333,3,0.75,0.0,0.0,0.0,0.25,0.0
35933,MALAKOFF,14967,2.07,2.3,48.8167,3,0.75,0.0,0.0,0.0,0.25,0.0
35935,BOURG-LA-REINE,10744,1.86,2.31667,48.775,3,0.75,0.0,0.0,0.0,0.25,0.0


# Discussion

We found four clusters using the k-means method. The most promising places appear to be those in the second cluster (cluster label 1). It is the cluster with a good representation of Automotive Shop, Cafeteria, Food Court, Pizza Place and Restaurant venues.
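
One way to back this reading with numbers is to sum the food-related frequencies per department and average them per cluster. The sketch below does this for a few rows copied from the result tables. The column `food_share` and the choice of which categories count as food are assumptions made for illustration.

```python
import pandas as pd

FOOD_COLUMNS = ["Cafeteria", "Food Court", "Pizza Place", "Restaurant"]

# A few rows copied from the cluster tables in the Results section.
rows = pd.DataFrame(
    [
        ("LYON", 0, 0.888889, 0.0, 0.0, 0.111111, 0.0, 0.0),
        ("NEUILLY-SUR-SEINE", 0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0),
        ("BAGNOLET", 1, 0.333333, 0.166667, 0.166667, 0.0, 0.166667, 0.166667),
        ("PANTIN", 1, 0.4, 0.2, 0.2, 0.0, 0.2, 0.0),
        ("AUBERVILLIERS", 2, 0.5, 0.0, 0.25, 0.0, 0.25, 0.0),
        ("PARIS", 3, 0.75, 0.0, 0.0, 0.0, 0.25, 0.0),
    ],
    columns=["name", "Cluster Labels", "Automotive Shop", "Cafeteria",
             "Food Court", "Motorcycle Shop", "Pizza Place", "Restaurant"],
)

# Share of food-related venues per department, then averaged per cluster.
rows["food_share"] = rows[FOOD_COLUMNS].sum(axis=1)
ranking = (rows.groupby("Cluster Labels")["food_share"]
           .mean()
           .sort_values(ascending=False))
print(ranking.round(3))
```

A high food share can signal demand as well as competition, so this ranking is best treated as one input among others rather than a final verdict.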

# Conclusion

We will advise our customers to look for a site in BAGNOLET, LES LILAS, LE PRE-SAINT-GERVAIS or PANTIN. These places have a high population density, and the restaurants and other food venues found around their centers suggest a real interest for investment.