# Coursera Capstone Project

# Introduction

For my *Capstone project*, I will help an investor who is about to start a business importing and selling organic (**bio**) **vegetables** and **fruits**.


# Description and background

Over the past few years my client has tried several times to start his dream business, but each attempt failed. This time he asked me for help, and as a data scientist I decided to start by describing my approach.
- **NB**: our client wants to open his bio store in the **city of Paris**, with no preference for a particular neighbourhood.

1. **First**: we will retrieve a list of all Paris neighbourhoods.
2. **Second**: we will sort the neighbourhoods by the number of restaurants and fast-food outlets.
3. **Third**: we will sort the neighbourhoods in inverse order by the number of stores and supermarkets, so that areas with fewer competitors come first.
4. **Last**: we will sort the data in reverse order by population, so that the most populated areas come first.
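
The ordering described in the steps above can be sketched as a single multi-key sort in pandas. The values below are made up for illustration and are not results from the data sets; the key priority (restaurants first, then stores, then population) is only one possible reading of the steps.

```python
import pandas as pd

# Hypothetical toy values, for illustration only.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "count_restaurants": [80, 80, 40, 80],
    "count_stores": [5, 2, 1, 2],
    "population": [30.0, 50.0, 90.0, 70.0],
})

# More restaurants first, then fewer stores, then larger population.
ranked = toy.sort_values(
    ["count_restaurants", "count_stores", "population"],
    ascending=[False, True, False],
).reset_index(drop=True)

print(ranked["Neighborhood"].tolist())
```

Later keys only break ties left by earlier ones, so the order in which the keys are listed decides which criterion dominates.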

# Data of our project

## Paris neighbourhoods data
To derive our solution, we use JSON data available on the French government's open data website: [https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e](https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e)

The JSON file has data about all the neighbourhoods in **France**; we will limit it to Paris using department code = 75.
The columns we will use are:
1.  _postal_code_: the postal code of the area
2.  _nom_comm_: the name of the commune (for Paris, the arrondissement), used here as the neighbourhood
3.  _nom_dept_: the name of the department, used here as the city
4.  _geo_point_2d_: a pair containing the latitude and longitude of the neighbourhood

## Foursquare API
In the next step we will use the **Foursquare** data API to count the restaurants in every **Paris** area.
To get a better idea of the competition, we will also count the **stores** and **supermarkets** in those areas.
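
A minimal sketch of the counting step, assuming the `venues/explore` response keeps the `response` -> `groups` -> `items` nesting that the request cells in this notebook read. `count_venues` is a hypothetical helper name, and the sample payload is hand-written rather than a real API response.

```python
def count_venues(payload):
    """Return the number of venues in an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        # No group in the response: treat it as zero venues.
        return 0
    return len(groups[0].get("items", []))


# Hand-written payload that mimics the assumed nesting.
sample = {
    "response": {
        "groups": [
            {"items": [{"venue": {"name": "x"}}, {"venue": {"name": "y"}}]}
        ]
    }
}

print(count_venues(sample))            # 2
print(count_venues({"response": {}}))  # 0
```

Returning 0 for a missing group keeps a loop over neighbourhoods from stopping on a `KeyError` or `IndexError` when an area has no matching venue.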

# Final Step
After this research, we will give our client a list of all neighbourhoods, sorted so that those with more restaurants and less competition stand out, and let him choose where to open his business.


# Let's first import all modules and packages we need

In [201]:
import pandas as pd
import urllib.request as req
import requests
from geopy.geocoders import Nominatim
import folium
import json

## Retrieving data
In this step we download the France location data from the government website `https://data.gouv.fr` and store the result in a file called **data.json**.

In [202]:
url = 'https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e'
req.urlretrieve(url, 'data.json')

('data.json', <http.client.HTTPMessage at 0x7ffa839b3070>)

## Data cleaning
Now we build our pandas DataFrame, dropping the additional columns and keeping only those we are interested in.

In [203]:
france_df = pd.read_json("data.json")
france_df=pd.json_normalize(france_df.fields)
france_df.drop(["z_moyen","insee_com", "id_geofla", "code_cant", "superficie", "code_arr", "geo_shape.type","geo_shape.coordinates"], inplace=True, axis=1)
france_df[["code_comm","code_reg","population","code_dept"]] = france_df[["code_comm","code_reg","population","code_dept"]].apply(pd.to_numeric)
print(france_df.shape)
france_df.head()

(1300, 10)


| | code_comm | nom_dept | statut | nom_region | code_reg | code_dept | geo_point_2d | postal_code | nom_comm | population |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 645 | ESSONNE | Commune simple | ILE-DE-FRANCE | 11 | 91 | [48.750443119964764, 2.251712972144151] | 91370 | VERRIERES-LE-BUISSON | 15.5 |
| 1 | 133 | SEINE-ET-MARNE | Commune simple | ILE-DE-FRANCE | 11 | 77 | [48.41256065214989, 3.052940505560729] | 77126 | COURCELLES-EN-BASSEE | 0.2 |
| 2 | 378 | ESSONNE | Commune simple | ILE-DE-FRANCE | 11 | 91 | [48.52726809075556, 2.19718165044305] | 91730 | MAUCHAMPS | 0.3 |
| 3 | 243 | SEINE-ET-MARNE | Chef-lieu canton | ILE-DE-FRANCE | 11 | 77 | [48.87307018579678, 2.7097808131278462] | 77400 | LAGNY-SUR-MARNE | 20.2 |
| 4 | 414 | SEINE-ET-MARNE | Commune simple | ILE-DE-FRANCE | 11 | 77 | [48.62891464105825, 3.2582355268439223] | 77160 | SAINT-HILLIERS | 0.4 |
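
The cleaning cell above relies on `pd.json_normalize` to flatten the nested `fields` records into columns. Here is a small sketch of that behaviour on two hand-written records; the names `EXAMPLE-A` and `EXAMPLE-B` are placeholders, and the assumption that `code_dept` arrives as a string is only inferred from the `pd.to_numeric` call in the notebook.

```python
import pandas as pd

# Hand-written records that mimic the assumed layout of data.json.
records = [
    {"fields": {"nom_comm": "EXAMPLE-A", "code_dept": "75",
                "geo_shape": {"type": "Polygon"}}},
    {"fields": {"nom_comm": "EXAMPLE-B", "code_dept": "91",
                "geo_shape": {"type": "Polygon"}}},
]

raw = pd.DataFrame(records)

# Nested keys become dotted column names such as "geo_shape.type".
flat = pd.json_normalize(raw["fields"].tolist())

# Convert the department code to a number before filtering on it.
flat["code_dept"] = pd.to_numeric(flat["code_dept"])
paris_only = flat.loc[flat["code_dept"] == 75]

print(sorted(flat.columns))
print(paris_only["nom_comm"].tolist())
```

The dotted names explain why the notebook drops columns such as `geo_shape.type` and `geo_shape.coordinates`.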


Then we keep only the **Paris** neighbourhoods by selecting the rows with `code_dept == 75`.

In [204]:
paris_filtered_df = france_df.loc[france_df['code_dept'] == 75].reset_index()
print(paris_filtered_df.shape)
paris_filtered_df.head()

(20, 11)


| | index | code_comm | nom_dept | statut | nom_region | code_reg | code_dept | geo_point_2d | postal_code | nom_comm | population |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 96 | 109 | PARIS | Chef-lieu canton | ILE-DE-FRANCE | 11 | 75 | [48.87689616237872, 2.337460241388529] | 75009 | PARIS-9E-ARRONDISSEMENT | 60.3 |
| 1 | 102 | 102 | PARIS | Chef-lieu canton | ILE-DE-FRANCE | 11 | 75 | [48.86790337886785, 2.344107166658533] | 75002 | PARIS-2E-ARRONDISSEMENT | 22.4 |
| 2 | 103 | 111 | PARIS | Chef-lieu canton | ILE-DE-FRANCE | 11 | 75 | [48.85941549762748, 2.378741060237548] | 75011 | PARIS-11E-ARRONDISSEMENT | 152.7 |
| 3 | 113 | 108 | PARIS | Chef-lieu canton | ILE-DE-FRANCE | 11 | 75 | [48.87252726662346, 2.312582560420059] | 75008 | PARIS-8E-ARRONDISSEMENT | 40.3 |
| 4 | 126 | 113 | PARIS | Chef-lieu canton | ILE-DE-FRANCE | 11 | 75 | [48.82871768452136, 2.362468228516128] | 75013 | PARIS-13E-ARRONDISSEMENT | 182.0 |


Now let's split `geo_point_2d` into `Latitude` and `Longitude` and keep only the fields we are interested in.

In [205]:
paris_df = pd.DataFrame(list(paris_filtered_df["geo_point_2d"]), columns=['Latitude','Longitude'])
paris_df[["City", "Neighborhood", "postal_code", "population"]] = paris_filtered_df[["nom_dept", "nom_comm", "postal_code","population"]]
paris_df.head()


| | Latitude | Longitude | City | Neighborhood | postal_code | population |
|---|---|---|---|---|---|---|
| 0 | 48.876896 | 2.33746 | PARIS | PARIS-9E-ARRONDISSEMENT | 75009 | 60.3 |
| 1 | 48.867903 | 2.344107 | PARIS | PARIS-2E-ARRONDISSEMENT | 75002 | 22.4 |
| 2 | 48.859415 | 2.378741 | PARIS | PARIS-11E-ARRONDISSEMENT | 75011 | 152.7 |
| 3 | 48.872527 | 2.312583 | PARIS | PARIS-8E-ARRONDISSEMENT | 75008 | 40.3 |
| 4 | 48.828718 | 2.362468 | PARIS | PARIS-13E-ARRONDISSEMENT | 75013 | 182.0 |


Let's extract Paris coordinates using **Nominatim**

In [206]:
address = 'Paris'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


Using folium, let's show the Paris boroughs on the map.

In [207]:
paris_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(paris_df['Latitude'], paris_df['Longitude'], paris_df['City'], paris_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(paris_map)
    
paris_map

Here we set the variables we need for the Foursquare API.

In [208]:
### Define Foursquare Credentials and Version
CLIENT_ID = 'ILCOGTVELSR3YCEBTNWE1S0P35GGN33ORE3GK0HUQORM1DCO'
CLIENT_SECRET = 'TMHARFJZAWLFOUPHD0B3N1SXTURRWTVQH1WGDZERHKLYCYSR'
VERSION = '20180605'

# Limit of number of venues returned by Foursquare API
LIMIT = 100 
# Define radius of search for Foursquare API
radius = 500
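
As an alternative to writing the credentials directly in the notebook, the query parameters could be built from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `explore_params` are hypothetical, while the parameter keys mirror the query string used in the request cells of this notebook.

```python
import os

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lng, category_id, radius=500, limit=100,
                   version="20180605"):
    """Build the query parameters for one explore request."""
    return {
        # Hypothetical environment variable names; empty string if unset.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
        "categoryId": category_id,
    }


params = explore_params(48.8566969, 2.3514616, "4d4b7105d754a06374d81259")
print(params["ll"])

# The request itself could then be sent with:
#   requests.get(EXPLORE_URL, params=params)
```

Passing a dictionary through `params` lets `requests` handle the URL encoding, which avoids formatting the query string by hand.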

## Getting results
Here we count the number of restaurants for every neighbourhood.

In [209]:
food_category_id = '4d4b7105d754a06374d81259,4bf58dd8d48988d1ca941735,4bf58dd8d48988d1cc941735'
count_restaurants = {"Neighborhood":[], "count_restaurants":[]}
for lat, lng, Neighborhood in zip(paris_df['Latitude'], paris_df['Longitude'], paris_df["Neighborhood"]):
    url='https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
        CLIENT_ID,CLIENT_SECRET,VERSION,
        lat,lng,
        radius, LIMIT, food_category_id)
    results = requests.get(url).json()
    venues = results["response"]["groups"][0]["items"]
    count_restaurants["Neighborhood"].append(Neighborhood)
    count_restaurants["count_restaurants"].append(len(venues))



Now we add the restaurant counts to our Paris DataFrame.

In [210]:
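# Build the counts DataFrame from the dictionary filled in the previous cell
restaurants_df = pd.DataFrame(count_restaurants, columns=["Neighborhood", "count_restaurants"])

# Attach the restaurant counts to the Paris data, joining on "Neighborhood"
new_df = pd.merge(paris_df, restaurants_df, on="Neighborhood")
new_df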

| | Latitude | Longitude | City | Neighborhood | postal_code | population | count_restaurants |
|---|---|---|---|---|---|---|---|
| 0 | 48.876896 | 2.33746 | PARIS | PARIS-9E-ARRONDISSEMENT | 75009 | 60.3 | 100 |
| 1 | 48.867903 | 2.344107 | PARIS | PARIS-2E-ARRONDISSEMENT | 75002 | 22.4 | 100 |
| 2 | 48.859415 | 2.378741 | PARIS | PARIS-11E-ARRONDISSEMENT | 75011 | 152.7 | 45 |
| 3 | 48.872527 | 2.312583 | PARIS | PARIS-8E-ARRONDISSEMENT | 75008 | 40.3 | 41 |
| 4 | 48.828718 | 2.362468 | PARIS | PARIS-13E-ARRONDISSEMENT | 75013 | 182.0 | 62 |
| 5 | 48.835156 | 2.419807 | PARIS | PARIS-12E-ARRONDISSEMENT | 75012 | 142.9 | 5 |
| 6 | 48.863054 | 2.359361 | PARIS | PARIS-3E-ARRONDISSEMENT | 75003 | 35.7 | 86 |
| 7 | 48.848968 | 2.332671 | PARIS | PARIS-6E-ARRONDISSEMENT | 75006 | 43.1 | 33 |
| 8 | 48.854228 | 2.357362 | PARIS | PARIS-4E-ARRONDISSEMENT | 75004 | 28.2 | 100 |
| 9 | 48.876029 | 2.361113 | PARIS | PARIS-10E-ARRONDISSEMENT | 75010 | 95.9 | 100 |
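
Several counts in this output equal `LIMIT` (100). Since each request asks for at most `LIMIT` venues, such a value may be a truncated result rather than the true total. A small sketch, using three rows copied from the output above, that flags those rows; the column name `possibly_truncated` is an assumption introduced here for illustration.

```python
import pandas as pd

LIMIT = 100  # same cap as in the Foursquare settings cell

# Three rows copied from the output above.
counts = pd.DataFrame({
    "Neighborhood": [
        "PARIS-9E-ARRONDISSEMENT",
        "PARIS-11E-ARRONDISSEMENT",
        "PARIS-4E-ARRONDISSEMENT",
    ],
    "count_restaurants": [100, 45, 100],
})

# A count that reaches the cap should be read as "at least LIMIT".
counts["possibly_truncated"] = counts["count_restaurants"] >= LIMIT

print(counts)
```

Rows flagged this way cannot be ranked against each other on restaurant count alone, which is worth keeping in mind when reading the final sort.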


Here we count the number of stores for every neighbourhood.

In [211]:
food_category_id = '52f2ab2ebcbc57f1066b8b1c'
counts_stores = {"Neighborhood":[], "count_stores":[]}
for lat, lng, Neighborhood in zip(paris_df['Latitude'], paris_df['Longitude'], paris_df["Neighborhood"]):
    url='https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
        CLIENT_ID,CLIENT_SECRET,VERSION,
        lat,lng,
        radius, LIMIT, food_category_id)
    results = requests.get(url).json()
    venues = results["response"]["groups"][0]["items"]
    counts_stores["Neighborhood"].append(Neighborhood)
    counts_stores["count_stores"].append(len(venues))



Now let's prepare our result DataFrame by sorting the results.

In [214]:
stores_df = pd.DataFrame(counts_stores, columns=["Neighborhood", "count_stores"])

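# Attach the store counts, joining on "Neighborhood"
new_df2 = pd.merge(new_df, stores_df, on="Neighborhood")
# Rank: largest population first, then more restaurants, then fewer competing stores
final_df = new_df2.sort_values(["population", "count_restaurants", "count_stores"], ascending=[False, False, True])
final_df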

| | Latitude | Longitude | City | Neighborhood | postal_code | population | count_restaurants | count_stores |
|---|---|---|---|---|---|---|---|---|
| 17 | 48.840155 | 2.293559 | PARIS | PARIS-15E-ARRONDISSEMENT | 75015 | 236.5 | 61 | 3 |
| 15 | 48.892735 | 2.348712 | PARIS | PARIS-18E-ARRONDISSEMENT | 75018 | 200.6 | 51 | 2 |
| 13 | 48.863187 | 2.40082 | PARIS | PARIS-20E-ARRONDISSEMENT | 75020 | 197.1 | 48 | 4 |
| 12 | 48.886869 | 2.384694 | PARIS | PARIS-19E-ARRONDISSEMENT | 75019 | 184.8 | 31 | 2 |
| 4 | 48.828718 | 2.362468 | PARIS | PARIS-13E-ARRONDISSEMENT | 75013 | 182.0 | 62 | 1 |
| 10 | 48.860399 | 2.2621 | PARIS | PARIS-16E-ARRONDISSEMENT | 75016 | 169.4 | 2 | 0 |
| 16 | 48.887337 | 2.307486 | PARIS | PARIS-17E-ARRONDISSEMENT | 75017 | 168.5 | 64 | 1 |
| 2 | 48.859415 | 2.378741 | PARIS | PARIS-11E-ARRONDISSEMENT | 75011 | 152.7 | 45 | 6 |
| 5 | 48.835156 | 2.419807 | PARIS | PARIS-12E-ARRONDISSEMENT | 75012 | 142.9 | 5 | 0 |
| 19 | 48.828993 | 2.327101 | PARIS | PARIS-14E-ARRONDISSEMENT | 75014 | 137.2 | 24 | 1 |
