# House Rental

## Subject

### Helping to evaluate vacation renting sites

__Short description__: This project evaluates rental advertisements by studying electricity consumption in the towns where the properties are advertised. When users compare vacation destinations, it gives them additional information about the environmental conditions of each place. In our limited example, the environmental condition is based on the electricity consumption of the destination town.

__Further details__: In France, several house rental websites publish RSS (XML) feeds that can be parsed into a data set listing the available rentals. The town names appear inside the text of each entry.

At the same time, you have the CSV file from [ENEDIS](https://data.enedis.fr/explore/dataset/consommation-electrique-par-secteur-dactivite-commune/) containing the history of electricity consumption, which lets you estimate how much energy is consumed and for which purpose. You can therefore give every town an “electrical” description by calculating different indicators, for example:
* share/amount of non-residential consumption, which might indicate the importance of industrial installations in the town
* evolution of the residential consumption over several years, which might indicate the growth of the town
* evolution of the non-residential consumption
* other indicators left to you
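
The indicators listed above could be computed along the following lines. This is only a minimal sketch: the rows are made up, the helper `energy_indicators` and the output column names are hypothetical, and only the input column names come from the ENEDIS file. It assumes that "non-residential" means the sum of the professional, agriculture, industry, and tertiary sectors, and it compares the first and last available year of each commune.

```python
import pandas as pd

# Toy rows using the ENEDIS column names; the values are made up.
conso = pd.DataFrame({
    "Année": [2011, 2016, 2011, 2016],
    "Nom commune": ["A", "A", "B", "B"],
    "Code commune": [1001, 1001, 2002, 2002],
    "Conso totale Résidentiel (MWh)": [100.0, 120.0, 200.0, 190.0],
    "Conso totale Professionnel (MWh)": [20.0, 25.0, 50.0, 50.0],
    "Conso totale Agriculture (MWh)": [0.0, 0.0, 10.0, 10.0],
    "Conso totale Industrie (MWh)": [0.0, 0.0, 300.0, 350.0],
    "Conso totale Tertiaire (MWh)": [5.0, 5.0, 40.0, 50.0],
})

RES = "Conso totale Résidentiel (MWh)"
# Assumption: these four sectors together form the "non-residential" part.
NON_RES = [
    "Conso totale Professionnel (MWh)",
    "Conso totale Agriculture (MWh)",
    "Conso totale Industrie (MWh)",
    "Conso totale Tertiaire (MWh)",
]


def energy_indicators(df):
    """Return one row of indicators per commune code (hypothetical helper)."""
    df = df.copy()
    df["non_res"] = df[NON_RES].fillna(0).sum(axis=1)
    df["total"] = df["non_res"] + df[RES].fillna(0)
    # Sort by year so that first()/last() pick the earliest and latest year.
    grouped = df.sort_values("Année").groupby("Code commune")
    first, last = grouped.first(), grouped.last()
    return pd.DataFrame({
        # Share of non-residential consumption in the latest year
        "non_res_share": last["non_res"] / last["total"],
        # Relative change between the earliest and the latest year
        "res_growth": last[RES] / first[RES] - 1,
        "non_res_growth": last["non_res"] / first["non_res"] - 1,
        # Industrial consumption in the latest year
        "industry_mwh": last["Conso totale Industrie (MWh)"],
    })


indicators = energy_indicators(conso)
```

Communes with zero consumption in the first year produce infinite or missing growth values, so a real analysis would need to decide how to treat them.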

Putting both data sources together lets you sort and filter the rental advertisements by these “energy” indicators, for example “zero industry” advertisements for quiet locations. Finding an exact usage is left to you as part of the exercise.
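
A possible way to combine the two sources is sketched below, with made-up data. The frames, the `industry_mwh` column, and the helper `quiet_rentals` are hypothetical; the sketch assumes the rentals carry a lowercase `commune` column and that matching on the lowercased commune name is good enough.

```python
import pandas as pd

# Made-up rentals with a lowercase town name extracted from the title
rents = pd.DataFrame({
    "titre": ["Gîte A - QUIETVILLE -", "Gîte B - FACTORYTOWN -", "Gîte C - NOWHERE -"],
    "commune": ["quietville", "factorytown", "nowhere"],
})

# Made-up per-commune indicator: industrial consumption in MWh
indicators = pd.DataFrame({
    "Nom commune": ["Quietville", "Factorytown"],
    "industry_mwh": [0.0, 350.0],
})


def quiet_rentals(rents, indicators):
    """Keep rentals located in communes with no industrial consumption."""
    ind = indicators.assign(commune=indicators["Nom commune"].str.lower())
    # Left join: rentals without a matching commune get NaN indicators
    merged = rents.merge(ind, on="commune", how="left")
    # NaN == 0 is False, so unmatched rentals are dropped as well
    return merged[merged["industry_mwh"] == 0]


quiet = quiet_rentals(rents, indicators)
```

Dropping unmatched rentals is a design choice; keeping them with a "no data" label would be equally defensible.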

## Code

### Imports

In [20]:
import sys
import pandas as pd
import re
import numpy as np
import feedparser
from bokeh import io, plotting, layouts, models, palettes

pd.options.display.max_columns = 200
pd.options.display.max_rows = 200

In [2]:
base = "http://www.ty-gites.com/rss/"
location = "locations_vacances_"
regions = ["alsace",
           "aquitaine",
           "auvergne",
           "bourgogne",
           "bretagne",
           "centre_val_de_loire",
           "champagne_ardenne",
           "corse",
           "franche_comte",
           "ile_de_france",
           "languedoc_roussillon",
           "limousin",
           "lorraine",
           "midi_pyrenees",
           "nord_pas_de_calais",
           "normandie",
           "pays_de_la_loire",
           "picardie",
           "poitou_charentes",
           "provence_alpes_cote_d_azur",
           "rhone_alpes",
           "outre_mer"]

### Parsing RSS feeds into a DataFrame

In [3]:
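feeds, items = [], []
cpt = 0  # number of entries whose title yields no town name
# The town name sits between two dashes in the title, e.g. "... - KOGENHEIM -"
regex = r"-\s(\w|-)*\s-"

# Download and parse one RSS feed per region
for region in regions:
    url = base + location + region + ".xml"
    feeds.append(feedparser.parse(url))

# Keep the entries whose title contains a recognizable town name
for feed in feeds:
    for item in feed.entries:
        m = re.search(regex, item.title)
        if m:
            # Drop the surrounding dashes and spaces, then lowercase the town name
            commune = m.group(0)[1:-1].strip().lower()
            items.append((item.title, commune, item.published, item.summary, item.link))
        else:
            cpt += 1

rents = pd.DataFrame(items, columns=['titre', 'commune', 'date publication', 'description', 'lien'])
print("Missing %d entries" % cpt)
rents.head(10)

Missing 89 entries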
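
One plausible reason for the skipped entries is that the pattern above only accepts word characters and hyphens between the two dashes, so a town name containing a space would not match. The sketch below shows a more tolerant alternative that splits on the last ` - ` separator instead of using a regular expression. The helper `extract_town` is hypothetical, and the third title is invented to illustrate a multi-word name.

```python
def extract_town(title):
    """Return the lowercase town name from a title ending in ' - TOWN -', or None."""
    title = title.strip()
    if not title.endswith(" -"):
        return None
    # Remove the trailing " -" and split once on the last " - " separator
    parts = title[:-2].rsplit(" - ", 1)
    if len(parts) != 2 or not parts[1].strip():
        return None
    return parts[1].strip().lower()


titles = [
    "Gîte Le nid des hirondelles - KOGENHEIM -",
    "Gîte Chez Sylvie LEONHART - SAINT-HIPPOLYTE -",
    "Gîte Exemple - La Chapelle-Saint-Maurice -",  # invented title
    "No town here",
]
towns = [extract_town(t) for t in titles]
```

Whether this recovers the missing entries depends on how their titles are actually formatted, which should be checked on the feed itself.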


| | titre | commune | date publication | description | lien |
|---|---|---|---|---|---|
| 0 | Gîte Le nid des hirondelles - KOGENHEIM - | kogenheim | Wed, 28 Mar 2018 23:52:48 GMT | Le gîte "LE NID DES HIRONDELLES", est situé au... | http://www.ty-gites.com/location-gite-kogenhei... |
| 1 | Gîte Gites-Weyer - BERGHEIM - | bergheim | Wed, 28 Mar 2018 13:58:57 GMT | "Elégants et coquets, nos appartements sont un... | http://www.ty-gites.com/location-gite-bergheim... |
| 2 | Gîte en CENTRE ALSACE proche de SELESTAT - NE... | neubois | Sun, 25 Mar 2018 20:34:55 GMT | disponible pour MARCHES DE NOEL du 11 au 23 dé... | http://www.ty-gites.com/location-gite-neubois,... |
| 3 | Gîte Au murmure de l'eau - Ebersheim - | ebersheim | Sat, 24 Mar 2018 09:23:35 GMT | Au murmure de l'eau gîte rural de Patrick Stin... | http://www.ty-gites.com/location-gite-ebershei... |
| 4 | Chalet Le Rebberg - Soultzeren - | soultzeren | Fri, 23 Mar 2018 22:39:57 GMT | Bienvenue dans notre chalet!\nA 2 et jusqu'à 8... | http://www.ty-gites.com/location-chalet-soultz... |
| 5 | Gîte Chez Sylvie LEONHART - SAINT-HIPPOLYTE - | saint-hippolyte | Mon, 19 Mar 2018 15:20:56 GMT | L’Alsace offre une grande variété de paysages ... | http://www.ty-gites.com/location-gite-saint_hi... |
| 6 | Gîte Le Panoramic - Sondernach - | sondernach | Sun, 18 Mar 2018 23:15:16 GMT | Gîte ambiance bois très charmant à Sondernach,... | http://www.ty-gites.com/location-gite-sonderna... |
| 7 | Gîte La Marguerite - Sondernach - | sondernach | Sun, 18 Mar 2018 23:15:00 GMT | Le gîte se trouve au premier étage de cette ma... | http://www.ty-gites.com/location-gite-sonderna... |
| 8 | Gîte Freyburger Jeannine - Elbach - | elbach | Fri, 16 Mar 2018 18:01:37 GMT | Jeanine et Bernard FREYBURGER ont aménagé un l... | http://www.ty-gites.com/location-gite-elbach,4... |
| 9 | Gîte Le Jardin d'Elisa - Niederhaslach - | niederhaslach | Wed, 14 Mar 2018 18:20:40 GMT | Un charmant gîte de vacances en Alsace situé a... | http://www.ty-gites.com/location-gite-niederha... |


In [4]:
# The ENEDIS export is semicolon-separated
conso = pd.read_csv("datasets/conso.csv", delimiter=";")

dataEnedis = pd.DataFrame(data=conso)
dataEnedis.head(10)


Unnamed: 0,Année,Nom commune,Code commune,Nom EPCI,Code EPCI,Type EPCI,Nom département,Code département,Nom région,Code région,Nb sites Résidentiel,Conso totale Résidentiel (MWh),Conso moyenne Résidentiel (MWh),Nb sites Professionnel,Conso totale Professionnel (MWh),Conso moyenne Professionnel (MWh),Nb sites Agriculture,Conso totale Agriculture (MWh),Nb sites Industrie,Conso totale Industrie (MWh),Nb sites Tertiaire,Conso totale Tertiaire (MWh),Nb sites Secteur non affecté,Conso totale Secteur non affecté (MWh),Nombre d'habitants,Taux de logements collectifs,Taux de résidences principales,Superficie des logements < 30 m2,Superficie des logements 30 à 40 m2,Superficie des logements 40 à 60 m2,Superficie des logements 60 à 80 m2,Superficie des logements 80 à 100 m2,Superficie des logements > 100 m2,Résidences principales avant 1919,Résidences principales de 1919 à 1945,Résidences principales de 1946 à 1970,Résidences principales de 1971 à 1990,Résidences principales de 1991 à 2005,Résidences principales de 2006 à 2010,Résidences principales après 2011,Taux de chauffage électrique,Geo Shape,Geo Point 2D
0,2011,La Chapelle-Saint-Maurice,74060,CC de la Rive Gauche du Lac d'Annecy,247400732,CC,Haute-Savoie,74,Auvergne-Rhône-Alpes,84,63.0,388.407654,6.165201,17.0,123.886227,7.287425,0,0.0,0,0.0,1,33.605294,0,0.0,134,28.955467,87.22633,1.818182,1.818182,9.090909,38.181818,23.636364,25.454545,36.363636,5.454545,7.272727,21.818182,12.727273,10.909091,5.454545,36.363636,,
1,2015,Thaumiers,18261,CC le Dunois,241800424,CC,Cher,18,Centre-Val de Loire,24,257.0,1644.113231,6.397328,34.0,232.732521,6.845074,0,0.0,1,9.031,0,0.0,0,0.0,414,0.749064,75.107296,0.0,1.714286,7.428571,23.428571,26.285714,41.142857,65.142857,12.571429,4.571429,7.428571,2.285714,6.285714,1.714286,17.714286,,
2,2013,Neuville-sur-Saône,69143,CU de Lyon,246900245,CU,Rhône,69,Auvergne-Rhône-Alpes,84,3602.0,13663.962722,3.793438,750.0,6758.088071,9.010784,0,0.0,16,36270.708649,42,9052.230987,0,0.0,7242,72.548244,99.786579,4.208236,5.383414,18.980476,31.729883,21.571045,18.126951,12.69379,6.374103,24.723784,35.05317,12.476466,8.678691,0.0,17.43724,,
3,2011,Lachapelle-sous-Rougemont,90058,CC du Pays Sous Vosgien,249000217,CC,Territoire-de-Belfort,90,Bourgogne-Franche-Comté,27,271.0,1523.843323,5.623038,46.0,469.457076,10.205589,0,0.0,2,2138.948712,1,103.714138,0,0.0,597,36.191962,98.377403,1.652911,3.305781,15.289254,11.570255,25.206599,42.975199,37.190072,6.611563,9.504127,12.809908,19.008253,13.636384,1.239693,28.51238,,
4,2016,Seyre,31546,CC Coteaux du Lauragais Sud (Co.Laur.Sud),243100179,CC,Haute-Garonne,31,Occitanie,76,53.0,438.788782,8.279034,11.0,22.572683,2.052062,0,0.0,0,0.0,0,0.0,0,0.0,108,1.879739,91.391091,0.0,0.0,2.127638,10.63819,12.766082,74.468091,23.404272,2.127638,12.766082,17.021358,14.89372,6.382914,23.40402,34.042461,"{""type"": ""Polygon"", ""coordinates"": [[[1.678708...","43.3635974739, 1.66366858427"
5,2012,Bourdeaux,26056,CC du Val de Drôme,242600252,CC,Drôme,26,Auvergne-Rhône-Alpes,84,438.0,2059.138022,4.701228,101.0,994.960184,9.851091,1,76.59732,2,367.469879,1,383.057327,0,0.0,616,22.964509,65.429234,2.836879,2.836879,9.219858,18.085106,22.695035,44.326241,42.907801,4.964539,8.865248,18.439716,13.120567,7.801418,3.900709,13.829787,,
6,2014,Bonnée,45039,CC Val d'Or et Forêt,244500518,CC,Loiret,45,Centre-Val de Loire,24,323.0,2244.8422,6.949976,57.0,649.699815,11.398242,5,274.041233,1,297.602592,10,1943.785158,0,0.0,673,1.830332,93.484385,0.340143,0.340143,10.204075,21.428579,26.530616,41.156444,25.850331,2.380964,15.986398,38.095228,12.244897,5.44218,6.325278e-15,30.272117,,
7,2016,Étreval,54185,CC du Pays du Saintois,200035772,CC,Meurthe-et-Moselle,54,Grand-Est,44,,,,,,,0,0.0,0,0.0,0,0.0,0,0.0,71,0.0,86.993393,0.0,0.0,7.692423,11.538634,19.230683,61.538634,57.692423,11.538634,3.846211,11.538634,7.692423,7.692423,0.0,11.538634,"{""type"": ""Polygon"", ""coordinates"": [[[6.067205...","48.4566428205, 6.05278055195"
8,2015,Aulnoy,77013,CC Pays de Coulommiers,200035590,CC,Seine-et-Marne,77,Île-de-France,11,153.0,1543.139388,10.085878,22.0,150.427961,6.837635,2,801.69546,0,0.0,1,54.033168,0,0.0,379,2.533873,91.483555,0.0,0.0,6.338018,11.267604,22.535207,59.859102,47.88735,3.521152,7.746452,18.309839,11.97182,7.042235,3.521152,37.323963,,
9,2012,Causse-et-Diège,12257,nd,ZZZZZZZZZ,nd,Aveyron,12,Occitanie,76,426.0,2137.030473,5.016503,82.0,503.676535,6.142397,1,48.329299,0,0.0,3,94.139923,0,0.0,724,1.581028,75.0,2.380952,0.892857,6.845238,15.178571,22.916667,51.785714,42.559524,5.952381,8.333333,17.857143,13.690476,7.142857,4.464286,17.559524,,


In [5]:
# One row per distinct (name, code, population) combination
codeCommune = (
    conso[["Nom commune", "Code commune", "Nombre d'habitants"]]
    .rename(columns={"Nom commune": "Commune", "Code commune": "Code Commune"})
    .drop_duplicates()
)

codeCommune.sort_values(['Commune']).head(20)



| | Commune | Code Commune | Nombre d'habitants |
|---|---|---|---|
| 9270 | Aast | 64001 | 178 |
| 57449 | Abainville | 55001 | 297 |
| 8401 | Abancourt | 59001 | 461 |
| 86115 | Abancourt | 60001 | 639 |
| 96027 | Abaucourt | 54001 | 304 |
| 40553 | Abaucourt-Hautecourt | 55002 | 116 |
| 8700 | Abbans-Dessous | 25001 | 244 |
| 16265 | Abbans-Dessus | 25002 | 301 |
| 31404 | Abbaretz | 44001 | 1984 |
| 76817 | Abbecourt | 60002 | 763 |
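
The table above shows that a commune name is not always unique: Abancourt appears with two different codes. Since the rental titles only provide a name, it may help to normalize names before matching and to flag the ambiguous ones. The following is a minimal sketch under that assumption; `normalize_name` is a hypothetical helper, and the three rows are copied from the table above.

```python
import unicodedata

import pandas as pd


def normalize_name(name):
    """Lowercase a commune name, strip accents, and unify separators."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = no_accents.lower().replace("-", " ").replace("'", " ")
    # Collapse any run of whitespace into a single space
    return " ".join(cleaned.split())


communes = pd.DataFrame({
    "Commune": ["Aast", "Abancourt", "Abancourt"],
    "Code Commune": [64001, 59001, 60001],
})
communes["key"] = communes["Commune"].map(normalize_name)

# A key is ambiguous when it maps to more than one commune code
codes_per_key = communes.groupby("key")["Code Commune"].nunique()
ambiguous = set(codes_per_key[codes_per_key > 1].index)
```

Rentals whose town falls in `ambiguous` cannot be matched safely by name alone; the region of the feed they came from could help to disambiguate them.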


*We drop the unneeded fields from the dataset, because Bokeh requires a Google API key to process the Geo Points.*

In [6]:
df_enedis = dataEnedis.drop(['Geo Shape','Geo Point 2D','Nom EPCI'], axis=1)

#df_enedis.head(5)

### Checking for NaN values in the DataFrame

In [7]:
# True means that at least one cell of the DataFrame contains NaN.
# (The earlier while loop never terminated when no NaN was present.)
isNull = df_enedis.isnull().values.any()

print(isNull)

True
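
Knowing that a NaN exists somewhere says little about where it is. A short sketch of how the missing values could be located per column and per row follows; the two-row frame is a toy example that mirrors the Étreval row above, where the residential fields are empty.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the ENEDIS layout; Étreval has no residential data
df = pd.DataFrame({
    "Nom commune": ["Étreval", "Seyre"],
    "Nb sites Résidentiel": [np.nan, 53.0],
    "Conso totale Industrie (MWh)": [0.0, 0.0],
})

# Number of missing values in each column, keeping only affected columns
nan_per_column = df.isnull().sum()
columns_with_nan = nan_per_column[nan_per_column > 0]

# Rows that contain at least one missing value
rows_with_nan = df[df.isnull().any(axis=1)]
```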


In [28]:
# To drop as many rows with unusable data as possible
# df_enedis[df_enedis["Nombre d'habitants"] > 80].head(10)


### Generating charts with Bokeh from the ENEDIS file

In [9]:
io.output_notebook()

In [27]:
df_enedis.columns = [x.strip().replace(' ', '_') for x in df_enedis.columns]
df_enedis.head()

Unnamed: 0,Année,Nom_commune,Code_commune,Code_EPCI,Type_EPCI,Nom_département,Code_département,Nom_région,Code_région,Nb_sites_Résidentiel,Conso_totale_Résidentiel_(MWh),Conso_moyenne_Résidentiel_(MWh),Nb_sites_Professionnel,Conso_totale_Professionnel_(MWh),Conso_moyenne_Professionnel_(MWh),Nb_sites_Agriculture,Conso_totale_Agriculture_(MWh),Nb_sites_Industrie,Conso_totale_Industrie_(MWh),Nb_sites_Tertiaire,Conso_totale_Tertiaire_(MWh),Nb_sites_Secteur_non_affecté,Conso_totale_Secteur_non_affecté_(MWh),Nombre_d'habitants,Taux_de_logements_collectifs,Taux_de_résidences_principales,Superficie_des_logements_<_30_m2,Superficie_des_logements_30_à_40_m2,Superficie_des_logements_40_à_60_m2,Superficie_des_logements_60_à_80_m2,Superficie_des_logements_80_à_100_m2,Superficie_des_logements_>_100_m2,Résidences_principales_avant_1919,Résidences_principales_de_1919_à_1945,Résidences_principales_de_1946_à_1970,Résidences_principales_de_1971_à_1990,Résidences_principales_de_1991_à_2005,Résidences_principales_de_2006_à_2010,Résidences_principales_après_2011,Taux_de_chauffage_électrique
0,2011,La Chapelle-Saint-Maurice,74060,247400732,CC,Haute-Savoie,74,Auvergne-Rhône-Alpes,84,63.0,388.407654,6.165201,17.0,123.886227,7.287425,0,0.0,0,0.0,1,33.605294,0,0.0,134,28.955467,87.22633,1.818182,1.818182,9.090909,38.181818,23.636364,25.454545,36.363636,5.454545,7.272727,21.818182,12.727273,10.909091,5.454545,36.363636
1,2015,Thaumiers,18261,241800424,CC,Cher,18,Centre-Val de Loire,24,257.0,1644.113231,6.397328,34.0,232.732521,6.845074,0,0.0,1,9.031,0,0.0,0,0.0,414,0.749064,75.107296,0.0,1.714286,7.428571,23.428571,26.285714,41.142857,65.142857,12.571429,4.571429,7.428571,2.285714,6.285714,1.714286,17.714286
2,2013,Neuville-sur-Saône,69143,246900245,CU,Rhône,69,Auvergne-Rhône-Alpes,84,3602.0,13663.962722,3.793438,750.0,6758.088071,9.010784,0,0.0,16,36270.708649,42,9052.230987,0,0.0,7242,72.548244,99.786579,4.208236,5.383414,18.980476,31.729883,21.571045,18.126951,12.69379,6.374103,24.723784,35.05317,12.476466,8.678691,0.0,17.43724
3,2011,Lachapelle-sous-Rougemont,90058,249000217,CC,Territoire-de-Belfort,90,Bourgogne-Franche-Comté,27,271.0,1523.843323,5.623038,46.0,469.457076,10.205589,0,0.0,2,2138.948712,1,103.714138,0,0.0,597,36.191962,98.377403,1.652911,3.305781,15.289254,11.570255,25.206599,42.975199,37.190072,6.611563,9.504127,12.809908,19.008253,13.636384,1.239693,28.51238
4,2016,Seyre,31546,243100179,CC,Haute-Garonne,31,Occitanie,76,53.0,438.788782,8.279034,11.0,22.572683,2.052062,0,0.0,0,0.0,0,0.0,0,0.0,108,1.879739,91.391091,0.0,0.0,2.127638,10.63819,12.766082,74.468091,23.404272,2.127638,12.766082,17.021358,14.89372,6.382914,23.404017,34.042461


In [11]:

years_sorted = sorted(df_enedis['Année'].unique())


In [33]:
#numlines = len(df_enedis.columns)
#mypalette = palettes.Spectral11[0:numlines]

#p = plotting.figure(width=500, height=300)
#p.multi_line(xs=years_sorted,
#             ys=df_enedis['Conso_totale_Industrie_(MWh)'],
#             line_color=mypalette,
#             line_width=5)

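# Note: Bokeh 3.0 and later name these arguments width and height
p = plotting.figure(plot_width=400, plot_height=400)

# Plot the commune's own years, sorted, so that x and y always have the same length and order
bourdeaux = df_enedis[df_enedis["Nom_commune"] == 'Bourdeaux'].sort_values('Année')
p.line(bourdeaux['Année'], bourdeaux['Conso_totale_Industrie_(MWh)'], line_width=3)
plotting.show(p)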
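
The commented-out `multi_line` attempt above passes a flat list of years and a whole column, whereas `multi_line` expects one list of x values and one list of y values per line. The sketch below shows one way to build those lists for a few communes. Both helpers are hypothetical, the demo values are made up, and the `width`/`height` arguments follow the commented-out call.

```python
import pandas as pd


def multi_line_data(df, communes, column):
    """Build the list-of-lists inputs that multi_line expects."""
    xs, ys = [], []
    for name in communes:
        rows = df[df["Nom_commune"] == name].sort_values("Année")
        xs.append(rows["Année"].tolist())
        ys.append(rows[column].tolist())
    return xs, ys


def plot_communes(df, communes, column):
    """Draw one line per commune (at most 10 with this palette)."""
    # Imported lazily so the data helper works without Bokeh installed
    from bokeh import palettes, plotting

    xs, ys = multi_line_data(df, communes, column)
    colors = list(palettes.Category10[10][:len(communes)])
    p = plotting.figure(width=500, height=300)
    p.multi_line(xs=xs, ys=ys, line_color=colors, line_width=3)
    return p


# Made-up values, deliberately out of order, to show the sorting
demo = pd.DataFrame({
    "Année": [2012, 2011, 2012, 2011],
    "Nom_commune": ["Bourdeaux", "Bourdeaux", "Seyre", "Seyre"],
    "Conso_totale_Industrie_(MWh)": [367.0, 300.0, 0.0, 0.0],
})
xs, ys = multi_line_data(demo, ["Bourdeaux", "Seyre"], "Conso_totale_Industrie_(MWh)")
```

In the notebook, `plotting.show(plot_communes(df_enedis, ['Bourdeaux', 'Seyre'], 'Conso_totale_Industrie_(MWh)'))` would then render the figure.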

