## Capstone Final Project: House Hunting in Nice, France

Capital of the French Riviera, Nice is one of the most popular tourist destinations in the world. Its assets are unique and numerous. Nice has an ideal location right between the Mediterranean Sea and the mountains, just a few kilometers from the Italian border. It boasts a mild climate, bright sun, diverse landscapes, and of course beautiful beaches with blue azure waters. It sounds like the perfect place to live!

Of course, when you are looking for the house of your dreams, there are several things to consider. Price is the obvious starting point, followed by the type of property (house or apartment), the number of rooms, the surface area, and so on. This information is easy to find on real estate agencies' websites. Getting a feel for the surroundings and the neighborhood's lifestyle, however, is more difficult.

The objective of this project is to identify suitable neighborhoods in which to purchase or rent a house or apartment in Nice, France. The suitability of a neighborhood will mainly depend on how easily diverse facilities and venues can be reached. This information should allow us to answer questions such as: How many restaurants will be near my new home? Are there schools in the neighborhood? Are there subway and bus stations nearby, and will the commute to work be practical? We will obtain accurate information about venues and facilities from Foursquare.

So, the final goal is to group similar neighborhoods (according to the characteristics mentioned above) and then present the information in a way that facilitates a wise selection of your new house!

In [95]:
import pandas as pd
import pandas_profiling
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import requests

For this project, I will use two datasets: (1) A dataset that contains a list of properties in Nice, and their geographical coordinates, and (2) the Foursquare dataset to have information about the venues in the surrounding area.

First, I download the dataset about properties (houses, apartments, etc.) and values in France (Nice, and other cities).

In [None]:
#data schema in: https://www.data.gouv.fr/fr/datasets/demandes-de-valeurs-foncieres-geolocalisees/
path = "https://cadastre.data.gouv.fr/data/etalab-dvf/latest/csv/2018/full.csv.gz" 
df_logement = pd.read_csv(path)

In [62]:
df_logement = df_logement[0:100000]

In [65]:
df_logement.shape
df_logement.info()
df_logement.head()

Unnamed: 0,id_mutation,date_mutation,numero_disposition,nature_mutation,valeur_fonciere,adresse_nom_voie,adresse_code_voie,code_postal,code_commune,nom_commune,code_departement,id_parcelle,nombre_lots,type_local,surface_reelle_bati,nombre_pieces_principales,longitude,latitude
0,2018-1,2018-01-03,1,Vente,109000.0,RUE GEN LOGEROT,1660,1000.0,1053,Bourg-en-Bresse,1,01053000AN0073,1,Dépendance,,0.0,5.22044,46.200062
1,2018-1,2018-01-03,1,Vente,109000.0,RUE GEN LOGEROT,1660,1000.0,1053,Bourg-en-Bresse,1,01053000AN0073,2,Appartement,73.0,4.0,5.22044,46.200062
2,2018-2,2018-01-04,1,Vente,239300.0,RUE DE LA BARMETTE,25,1250.0,1095,Nivigne et Suran,1,01095000AH0186,0,Maison,163.0,4.0,5.408041,46.255562
3,2018-2,2018-01-04,1,Vente,239300.0,RUE DE LA BARMETTE,25,1250.0,1095,Nivigne et Suran,1,01095000AH0186,0,Maison,51.0,2.0,5.408041,46.255562
4,2018-2,2018-01-04,1,Vente,239300.0,RUE DE LA BARMETTE,25,1250.0,1095,Nivigne et Suran,1,01095000AH0186,0,Maison,51.0,2.0,5.408041,46.255562


I will clean the data by deleting the columns with a high percentage of empty values and the columns that do not provide useful information for the analysis. I will also keep only the rows that relate to Nice.

In [64]:
df_logement = df_logement.drop(columns=['adresse_numero', 'adresse_suffixe', 'ancien_code_commune', 'ancien_id_parcelle', 
                                        'ancien_nom_commune', 'code_nature_culture_speciale', 'code_type_local', 'lot1_numero',
                                        'lot1_surface_carrez', 'lot2_numero', 'lot2_surface_carrez', 'lot3_numero', 'lot3_surface_carrez',
                                        'lot4_numero', 'lot4_surface_carrez', 'lot5_numero', 'lot5_surface_carrez', 'nature_culture', 
                                        'nature_culture_speciale', 'numero_volume', 'Unnamed: 0', 'code_nature_culture', 'surface_terrain'])
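
The columns dropped above were listed by hand. As a rough sketch of how the share of empty values per column could be measured first, the example below uses a small made-up frame in place of `df_logement`; the 50% threshold and the names `missing_share`, `THRESHOLD`, and `sparse_columns` are assumptions for illustration, not values taken from this analysis.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_logement; the values are made up for illustration.
df = pd.DataFrame({
    "valeur_fonciere": [109000.0, 239300.0, 110000.0, 53000.0],
    "lot5_numero": [np.nan, np.nan, np.nan, np.nan],
    "surface_reelle_bati": [73.0, 163.0, 35.0, np.nan],
    "lot1_surface_carrez": [np.nan, np.nan, 34.5, np.nan],
})

# Fraction of missing values per column, highest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Hypothetical threshold: drop columns that are more than half empty.
THRESHOLD = 0.5
sparse_columns = missing_share[missing_share > THRESHOLD].index.tolist()
df_clean = df.drop(columns=sparse_columns)

print(missing_share)
print(sparse_columns)
```

On the real dataset, the same `isna().mean()` call on `df_logement` would give a quick ranking of the sparsest columns before deciding which ones to drop.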

In [66]:
df_logement_nice = df_logement[df_logement.nom_commune == 'Nice']
print(df_logement_nice.shape)

(12646, 18)


In [70]:
df_logement_nice = df_logement_nice.dropna(subset=['longitude'])
print(df_logement_nice.shape)

(12630, 18)


In [71]:
df_logement_nice = df_logement_nice.dropna(subset=['latitude'])
print(df_logement_nice.shape)

(12630, 18)


I use Pandas Profiling to generate a report on the dataset.

In [72]:
pandas_profiling.ProfileReport(df_logement_nice)



In [73]:
# Keep 5 decimal places for the coordinates; note that format() returns strings, not floats
df_logement_nice['latitude'] = df_logement_nice['latitude'].apply(lambda x: format(x, '.5f')) 
df_logement_nice['longitude'] = df_logement_nice['longitude'].apply(lambda x: format(x, '.5f')) 

In [74]:
df_logement_nice.head(10)

Unnamed: 0,id_mutation,date_mutation,numero_disposition,nature_mutation,valeur_fonciere,adresse_nom_voie,adresse_code_voie,code_postal,code_commune,nom_commune,code_departement,id_parcelle,nombre_lots,type_local,surface_reelle_bati,nombre_pieces_principales,longitude,latitude
78293,2018-29550,2018-01-08,1,Vente,291560.0,BD CIMIEZ,1410,6000.0,6088,Nice,6,06088000LM0086,2,Appartement,52.0,2.0,7.2722,43.71372
78294,2018-29551,2018-01-04,1,Vente,326800.0,BD STALINGRAD,6285,6300.0,6088,Nice,6,06088000KL0015,2,Appartement,68.0,4.0,7.28766,43.69674
78295,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,6,06088000IR0235,1,Appartement,35.0,1.0,7.29337,43.71366
78296,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,6,06088000IR0235,1,Appartement,35.0,1.0,7.29337,43.71366
78297,2018-29553,2018-01-08,1,Vente,53000.0,BD CARNOT,1145,6300.0,6088,Nice,6,06088000KL0013,1,Dépendance,,0.0,7.28898,43.69681
78298,2018-29554,2018-01-10,1,Vente,114000.0,RUE SCALIERO,6130,6300.0,6088,Nice,6,06088000IX0212,2,Appartement,24.0,1.0,7.28474,43.70253
78299,2018-29555,2018-01-08,1,Vente,285000.0,RUE DE PARIS,4790,6000.0,6088,Nice,6,06088000LB0314,2,Appartement,99.0,4.0,7.26721,43.7045
78300,2018-29556,2018-01-10,1,Vente,112000.0,AV HENRY DUNANT,3230,6100.0,6088,Nice,6,06088000HA0178,2,Appartement,42.0,2.0,7.26549,43.72274
78301,2018-29557,2018-01-08,1,Vente,170000.0,AV DURANTE,2125,6000.0,6088,Nice,6,06088000LA0075,1,Appartement,37.0,2.0,7.26308,43.70234
78302,2018-29558,2018-01-12,1,Vente,760000.0,BD DU MONT BORON,4350,6300.0,6088,Nice,6,06088000IV0381,1,Dépendance,,0.0,7.29566,43.70617


Then, I define my Foursquare credentials and get the venues in the surrounding area (within a radius of 1000 meters).

In [None]:
CLIENT_ID = 'XX' # your Foursquare ID
CLIENT_SECRET = 'XX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [15]:
categories_url = 'https://api.foursquare.com/v2/venues/categories?client_id={}&client_secret={}&v={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
            
# make the GET request
results = requests.get(categories_url).json()

In [76]:
len(results['response']['categories'])

10

In [17]:
categories_list = []
# Let's print only the top-level categories and their IDs and also add them to categories_list

def print_categories(categories, level=0, max_level=0):    
    if level>max_level: return
    out = ''
    out += '-'*level
    for category in categories:
        print(out + category['name'] + ' (' + category['id'] + ')')
        print_categories(category['categories'], level+1, max_level)
        categories_list.append((category['name'], category['id']))
        




In [77]:
print_categories(results['response']['categories'], 0, 0)

Arts & Entertainment (4d4b7104d754a06370d81259)
College & University (4d4b7105d754a06372d81259)
Event (4d4b7105d754a06373d81259)
Food (4d4b7105d754a06374d81259)
Nightlife Spot (4d4b7105d754a06376d81259)
Outdoors & Recreation (4d4b7105d754a06377d81259)
Professional & Other Places (4d4b7105d754a06375d81259)
Residence (4e67e38e036454776db1fb3a)
Shop & Service (4d4b7105d754a06378d81259)
Travel & Transport (4d4b7105d754a06379d81259)


In [81]:
def get_venues_count(ll, radius, categoryId):
    #print(ll)
    explore_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={}&radius={}&categoryId={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION,
                ll,
                radius,
                categoryId)
    
    return requests.get(explore_url).json()['response']['totalResults']
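
The request URL in `get_venues_count` is assembled with string formatting. A minimal sketch of the same query string built with the standard library's `urlencode`, which percent-encodes each value, is shown below. The helper name `build_explore_url` is hypothetical, the credentials are placeholders, and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, ll, radius, category_id):
    """Return the explore URL with all query parameters percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": ll,                  # "latitude,longitude"
        "radius": radius,          # search radius in meters
        "categoryId": category_id,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials; the category ID is the "Food" entry listed earlier.
url = build_explore_url("XX", "XX", "20180605", "43.71372,7.27220", 1000,
                        "4d4b7105d754a06374d81259")
print(url)

# Decoding the query string recovers the original values.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"])
```

With `requests`, passing the same dictionary through the `params` argument of `requests.get` performs the encoding automatically.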

In [79]:
#Create new dataframe to store venues data
df_logement_venues = df_logement_nice.copy()
for c in categories_list:
    df_logement_venues[c[0]] = 0

In [80]:
df_logement_venues.head(10)

Unnamed: 0,id_mutation,date_mutation,numero_disposition,nature_mutation,valeur_fonciere,adresse_nom_voie,adresse_code_voie,code_postal,code_commune,nom_commune,...,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport
78293,2018-29550,2018-01-08,1,Vente,291560.0,BD CIMIEZ,1410,6000.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78294,2018-29551,2018-01-04,1,Vente,326800.0,BD STALINGRAD,6285,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78295,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78296,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78297,2018-29553,2018-01-08,1,Vente,53000.0,BD CARNOT,1145,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78298,2018-29554,2018-01-10,1,Vente,114000.0,RUE SCALIERO,6130,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78299,2018-29555,2018-01-08,1,Vente,285000.0,RUE DE PARIS,4790,6000.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78300,2018-29556,2018-01-10,1,Vente,112000.0,AV HENRY DUNANT,3230,6100.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78301,2018-29557,2018-01-08,1,Vente,170000.0,AV DURANTE,2125,6000.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0
78302,2018-29558,2018-01-12,1,Vente,760000.0,BD DU MONT BORON,4350,6300.0,6088,Nice,...,0,0,0,0,0,0,0,0,0,0


In [None]:
#Request number of venues, store result as CSV
for i, row in df_logement_venues.iterrows():
    coordinates = str(df_logement_venues.latitude[i])+','+str(df_logement_venues.longitude[i])
    #print(coordinates)
    for c in categories_list:
        df_logement_venues.loc[i, c[0]] = get_venues_count(coordinates, radius=1000, categoryId=c[1])
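
The loop above sends one request per row and category, even when several rows share the same coordinates (for example, the two RUE BERTOLA rows). One possible way to avoid repeated requests is to cache results per argument combination, sketched here with `functools.lru_cache`. The function `fake_venues_count` is a hypothetical stand-in for `get_venues_count` that makes no network call and returns a dummy value.

```python
from functools import lru_cache

calls = []  # records every simulated request, for illustration only

def fake_venues_count(ll, radius, category_id):
    """Hypothetical stand-in for get_venues_count; makes no network request."""
    calls.append((ll, radius, category_id))
    return 5  # dummy count, not real Foursquare data

@lru_cache(maxsize=None)
def cached_venues_count(ll, radius, category_id):
    # The body runs only the first time a given argument combination is seen.
    return fake_venues_count(ll, radius, category_id)

# The first two coordinate strings are identical, as in the duplicated rows.
coordinates = ["43.71366,7.29337", "43.71366,7.29337", "43.69681,7.28898"]
food = "4d4b7105d754a06374d81259"
counts = [cached_venues_count(ll, 1000, food) for ll in coordinates]

print(counts)
print(len(calls))  # number of simulated requests actually made
```

Because the coordinates were formatted as strings with five decimals, identical locations produce identical cache keys.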

In [83]:
df_logement_venues.shape

(12630, 28)

In [93]:
df_logement_venues.head(50)

Unnamed: 0,id_mutation,date_mutation,numero_disposition,nature_mutation,valeur_fonciere,adresse_nom_voie,adresse_code_voie,code_postal,code_commune,nom_commune,...,Arts & Entertainment,College & University,Event,Food,Nightlife Spot,Outdoors & Recreation,Professional & Other Places,Residence,Shop & Service,Travel & Transport
78293,2018-29550,2018-01-08,1,Vente,291560.0,BD CIMIEZ,1410,6000.0,6088,Nice,...,3,16,0,12,9,8,23,5,44,14
78294,2018-29551,2018-01-04,1,Vente,326800.0,BD STALINGRAD,6285,6300.0,6088,Nice,...,15,7,0,72,30,45,29,11,76,39
78295,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,...,2,7,1,5,1,3,8,6,35,13
78296,2018-29552,2018-01-09,1,Vente,110000.0,RUE BERTOLA,795,6300.0,6088,Nice,...,2,7,1,5,1,3,8,6,35,13
78297,2018-29553,2018-01-08,1,Vente,53000.0,BD CARNOT,1145,6300.0,6088,Nice,...,14,7,0,51,17,42,18,11,75,31
78298,2018-29554,2018-01-10,1,Vente,114000.0,RUE SCALIERO,6130,6300.0,6088,Nice,...,17,10,0,80,31,41,36,3,78,45
78299,2018-29555,2018-01-08,1,Vente,285000.0,RUE DE PARIS,4790,6000.0,6088,Nice,...,16,11,1,95,23,36,50,6,115,123
78300,2018-29556,2018-01-10,1,Vente,112000.0,AV HENRY DUNANT,3230,6100.0,6088,Nice,...,3,11,0,5,3,6,11,5,26,6
78301,2018-29557,2018-01-08,1,Vente,170000.0,AV DURANTE,2125,6000.0,6088,Nice,...,14,12,1,105,39,57,60,7,120,143
78302,2018-29558,2018-01-12,1,Vente,760000.0,BD DU MONT BORON,4350,6300.0,6088,Nice,...,5,6,0,13,10,8,10,4,72,14


In [None]:
# Save the venue counts computed above (the DataFrame is named df_logement_venues)
df_logement_venues.to_csv('logements_venues_nice.csv')