# House Rental

## Subject

### Helping to evaluate vacation rental sites

__Short description__: This project evaluates rental advertisements by studying electricity consumption in the towns where the properties are advertised. When users choose between vacation destinations, it gives them additional information about the environmental conditions of those places. In our limited example, the environmental condition is based on electricity consumption in the destination town.

__Further details__: In France, several house rental web sites publish RSS (XML) feeds that can be parsed into a data set containing the list of available rentals. You will find the names of the towns inside the text.

At the same time, you have the CSV file from [ENEDIS](https://data.enedis.fr/explore/dataset/consommation-electrique-par-secteur-dactivite-commune/) containing the history of electricity consumption, which lets you estimate how much energy is consumed and for what purpose. You can thus give every town an “electrical” description by calculating different indicators, for example:
* share/amount of non-residential consumption, which might indicate the importance of industrial installations in the town
* evolution of residential consumption over several years, which might indicate the growth of the town
* evolution of non-residential consumption
* other indicators left to you
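
The indicators listed above could be computed along the following lines. This is a minimal sketch on made-up numbers, not on the ENEDIS file: the towns, the column names `conso_residentiel` and `conso_non_residentiel`, and the helper `electrical_profile` are all hypothetical stand-ins.

```python
import pandas as pd

# Made-up yearly consumption (MWh) for two hypothetical towns.
conso_demo = pd.DataFrame({
    "nom_commune": ["ville_a", "ville_a", "ville_b", "ville_b"],
    "année": [2011, 2016, 2011, 2016],
    "conso_residentiel": [100.0, 120.0, 50.0, 50.0],
    "conso_non_residentiel": [300.0, 280.0, 0.0, 0.0],
})

def electrical_profile(df):
    """Return one row per town with a few simple indicators (hypothetical helper)."""
    df = df.sort_values(["nom_commune", "année"])
    total = df["conso_residentiel"] + df["conso_non_residentiel"]
    df = df.assign(part_non_residentiel=df["conso_non_residentiel"] / total)
    grouped = df.groupby("nom_commune")
    first, last = grouped.first(), grouped.last()
    return pd.DataFrame({
        # Share of non-residential consumption in the most recent year.
        "part_non_residentiel": last["part_non_residentiel"],
        # Relative change between the first and the last available year.
        "evolution_residentiel": last["conso_residentiel"] / first["conso_residentiel"] - 1,
        # Same for non-residential consumption (NaN or inf when the
        # first year has no such consumption).
        "evolution_non_residentiel": last["conso_non_residentiel"] / first["conso_non_residentiel"] - 1,
    })

profile = electrical_profile(conso_demo)
print(profile)
```

Comparing only the first and last year is one possible choice; a fitted trend over all years would be less sensitive to a single unusual year.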

Putting both data sources together allows you to sort/filter the rental advertisements by the “energy” indicators, for example “zero industry” advertisements for quiet locations. Finding an exact usage is left to you as part of the exercise.
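
As an illustration of such a filter, the sketch below joins a few invented adverts with an invented indicator table and keeps the "zero industry" ones. The data, the column `part_non_residentiel` and the 0.05 threshold are assumptions chosen for the example only.

```python
import pandas as pd

# Hypothetical adverts and indicators, for illustration only.
ads = pd.DataFrame({
    "titre": ["Gîte A", "Maison B", "Studio C"],
    "nom_commune": ["ville_a", "ville_b", "ville_c"],
})
indicators = pd.DataFrame({
    "nom_commune": ["ville_a", "ville_b"],
    "part_non_residentiel": [0.7, 0.0],
})

# Attach each town's indicator to its adverts; adverts in towns
# without consumption data are dropped by the inner join.
scored = ads.merge(indicators, on="nom_commune", how="inner")

# "Zero industry" filter: the 0.05 threshold is an arbitrary assumption.
quiet = scored[scored["part_non_residentiel"] <= 0.05]
quiet = quiet.sort_values("part_non_residentiel")
print(quiet["titre"].tolist())
```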

## Code

### Imports

In [None]:
import sys
import pandas as pd
import re
import numpy as np
import feedparser
from bokeh import io, plotting, layouts, models, palettes
import cufflinks as cf
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
from bokeh.plotting import figure
from bokeh.io import push_notebook, show, output_notebook
plotly.tools.set_credentials_file(username='enzobes', api_key='NbVOe5BmvoI0upllgZ6J')

pd.options.display.max_columns= 200
pd.options.display.max_rows= 200

In [None]:
base = "http://www.ty-gites.com/rss/"
location = "locations_vacances_"
regions = ["alsace",
           "aquitaine",
           "auvergne",
           "bourgogne",
           "bretagne",
           "centre_val_de_loire",
           "champagne_ardenne",
           "corse",
           "franche_comte",
           "ile_de_france",
           "languedoc_roussillon",
           "limousin",
           "lorraine",
           "midi_pyrenees",
           "nord_pas_de_calais",
           "normandie",
           "pays_de_la_loire",
           "picardie",
           "poitou_charentes",
           "provence_alpes_cote_d_azur",
           "rhone_alpes",
           "outre_mer"]

### Parsing RSS feeds into Dataframe

In [None]:
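feeds, items = [], []
cpt = 0  # number of entries whose title yields no town name
# The town name is expected between two dashes in the entry title.
# This pattern only matches names made of word characters and hyphens (no spaces).
regex = r"-\s(\w|-)*\s-"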
for region in regions:
    url = base+location+region+".xml"
    feeds.append(feedparser.parse(url))

for feed in feeds:
    for item in feed.entries:
        m = re.search(regex, item.title)
        if m:
            items.append((item.title, m.group(0)[1:-1].lower().strip(), item.published, item.summary, item.link))
        else:
            cpt+=1

rents = pd.DataFrame(items, columns=['titre', 'nom_commune', 'date_publication', 'description', 'lien'])
print("%d entries are missing" % cpt)
rents.head()
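
The pattern used in the cell above cannot match town names that contain spaces. A possible, more permissive variant is sketched below. The titles are invented, the real feed format may differ, and `extract_town` is a hypothetical helper.

```python
import re

# Hypothetical titles; the real feed format may differ.
titles = [
    "Gîte 4 personnes - Le Havre - Seine-Maritime",
    "Maison - Saint-Étienne - Loire",
    "Annonce sans séparateur",
]

# Capture whatever sits between the first two " - " separators.
TOWN_RE = re.compile(r"\s-\s(.+?)\s-\s")

def extract_town(title):
    """Return the lowercase town name, or None when no match is found."""
    m = TOWN_RE.search(title)
    return m.group(1).strip().lower() if m else None

towns = [extract_town(t) for t in titles]
print(towns)
```

Requiring whitespace around the separator keeps hyphenated names such as "Saint-Étienne" in one piece.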

In [None]:
conso = pd.read_csv( "datasets/conso.csv" , delimiter = ";" )

dataEnedis = pd.DataFrame(data=conso)
dataEnedis.columns = [x.strip().replace(' ', '_').lower() for x in dataEnedis.columns]
dataEnedis['nom_commune'] = dataEnedis['nom_commune'].str.lower()
#dataEnedis['année'] = pd.to_datetime( dataEnedis['année'] , format = "%Y" )
dataEnedis.head()

In [None]:
# Town lookup table (name, code, population) built from the cleaned frame,
# so that the normalized column names and lowercase town names apply.
codeCommune = dataEnedis[['nom_commune', 'code_commune', "nombre_d'habitants"]].drop_duplicates()
CommuneByPeople = codeCommune.sort_values(["nombre_d'habitants"], ascending = False)
CommuneByPeople.head(10)



*We drop the unneeded columns from the dataset*

In [None]:
df_enedis = dataEnedis.drop(['geo_shape','geo_point_2d','nom_epci'], axis=1)

In [None]:
merged = codeCommune.merge(rents, on='nom_commune',how='inner')
merged.head()
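
An inner join silently drops adverts whose town name has no exact match in the consumption data. One way to see what is lost is a left join with pandas' `indicator=True` option, sketched here on invented data (`towns_demo`, `rents_demo` and the codes are made up):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
towns_demo = pd.DataFrame({"nom_commune": ["nice", "lille"], "code_commune": [1, 2]})
rents_demo = pd.DataFrame({
    "titre": ["Ad 1", "Ad 2", "Ad 3"],
    "nom_commune": ["nice", "paris", "nice"],
})

# A left join with indicator=True keeps every advert and records in the
# "_merge" column whether a matching town was found.
check = rents_demo.merge(towns_demo, on="nom_commune", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "nom_commune"].unique().tolist()
print(unmatched)
```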

### Check whether any value in the DataFrame is NaN

If the check prints `True`, at least one column contains `NaN` values.

In [None]:
# True if at least one cell of df_enedis is NaN, False otherwise.
isNull = bool(df_enedis.isnull().values.any())

print(isNull)
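
The check above only answers yes or no. To find out which columns are affected, the missing values can be counted per column. A minimal sketch on a small invented frame (`demo` is not the ENEDIS data):

```python
import numpy as np
import pandas as pd

# Small made-up frame with missing values in two of its three columns.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None], "c": [1, 2, 3]})

has_nan = bool(demo.isnull().values.any())   # is there any NaN at all?
nan_per_column = demo.isnull().sum()         # number of NaN per column
columns_with_nan = nan_per_column[nan_per_column > 0].index.tolist()
print(has_nan, columns_with_nan)
```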

### Generating charts with Bokeh from the ENEDIS file

In [None]:
io.output_notebook()

In [None]:
years_sorted = sorted(df_enedis['année'].unique())
years_sorted

In [None]:
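p = plotting.figure(plot_width=950, plot_height=400)

# Use the town's own years on the x axis so that x and y have the same length and order.
bourbriac = df_enedis[df_enedis["nom_commune"] == 'bourbriac'].sort_values('année')
r = p.line(bourbriac['année'], bourbriac['conso_totale_industrie_(mwh)'], line_width=2)

plotting.show(p)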
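
When plotting one town's history, the x and y values should come from the same rows so that they line up year by year. A minimal sketch on made-up data, where `town_series` and the `conso` column are hypothetical names:

```python
import pandas as pd

# Made-up rows, deliberately out of chronological order.
demo = pd.DataFrame({
    "nom_commune": ["ville_a", "ville_b", "ville_a", "ville_a"],
    "année": [2013, 2011, 2011, 2012],
    "conso": [30.0, 5.0, 10.0, 20.0],
})

def town_series(df, town, value_col):
    """Return (years, values) for one town, ordered by year."""
    rows = df[df["nom_commune"] == town].sort_values("année")
    return rows["année"].tolist(), rows[value_col].tolist()

years, values = town_series(demo, "ville_a", "conso")
print(years, values)
```

The two returned lists can be passed directly as the x and y arguments of a line glyph.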

In [None]:
df_enedis = df_enedis.sort_values(['nb_sites_résidentiel'], ascending=False)
df_enedis.head()

In [16]:
cf.set_config_file(offline=False, world_readable=True, theme='ggplot')

# Site counts for the 10 rows with the most residential sites
# (df_enedis was sorted by 'nb_sites_résidentiel' in the previous cell).
residents = df_enedis['nb_sites_résidentiel'].head(10).tolist()
pros = df_enedis['nb_sites_professionnel'].head(10).tolist()
indus = df_enedis['nb_sites_industrie'].head(10).tolist()

# Labels: the 10 most populated towns. This assumes they come in the same
# order as the rows selected above.
communes = CommuneByPeople['nom_commune'].head(10).tolist()
print(communes)

# Each trace name matches the series it displays.
trace0 = go.Bar(x=communes, y=residents, name="Residential sites")
trace1 = go.Bar(x=communes, y=pros, name="Professional sites")
trace2 = go.Bar(x=communes, y=indus, name="Industrial sites")

data = [trace0, trace1, trace2]
layout = go.Layout(showlegend=True)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='basic-bar')

['toulouse', 'nice', 'nantes', 'montpellier', 'bordeaux', 'lille', 'rennes', 'reims', 'le havre', 'saint-étienne']


## Merging RSS feed with dataset
### Looking for houses

We take the 10 towns with the most dwellings of each type.

In [None]:
import matplotlib.pyplot as plt

f, a = plt.subplots(2, 1, figsize=(15, 12))
df_rents01 = df_enedis.merge(merged, on=["code_commune", "nom_commune", "nombre_d'habitants"], how="inner")

# For each size band, keep the per-town maximum and plot the 10 largest values.
bands = [("superficie_des_logements_80_à_100_m2", "Dwellings of 80 to 100 m2"),
         ("superficie_des_logements_>_100_m2", "Dwellings over 100 m2")]
for ax, (col, title) in zip(a, bands):
    df_rents01.groupby("nom_commune")[col].max().nlargest(10).plot(title=title, kind="barh", ax=ax)
    ax.set_xlabel("Number of dwellings")
    ax.set_ylabel("Town")

In [19]:
#numlines = len(df_enedis.columns)
#mypalette = palettes.Spectral11[0:numlines]

#p = plotting.figure(width=500, height=300)
#p.multi_line(xs=years_sorted,
#             ys=df_enedis['Conso_totale_Industrie_(MWh)'],
#             line_color=mypalette,
#             line_width=5)

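p = plotting.figure(plot_width=950, plot_height=400)

def commune_series(name):
    # Years and industrial consumption for one town, ordered by year.
    rows = df_enedis[df_enedis["nom_commune"] == name].sort_values('année')
    return {'x': rows['année'].tolist(), 'y': rows['conso_totale_industrie_(mwh)'].tolist()}

initial = commune_series('chalabre')
r = p.line(initial['x'], initial['y'], line_width=2)

def selectCommune(x):
    # Replace x and y together so both columns keep the same length.
    r.data_source.data = commune_series(x)
    push_notebook()

show(p, notebook_handle=True)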

In [20]:
from ipywidgets import interact

interact(selectCommune, x=communes)