# Airbnb Paris
## by Mathieu Rella

# I. Business Understanding

We will explore Airbnb data for Paris to answer questions such as:

- Where in Paris is it good to rent on Airbnb?
- Which season is the most profitable for hosts?
- What do guests really think of Paris listings?
- Can we predict the price of a listing?

In [152]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px
import qgrid
import plotly.graph_objects as go

import plotly
plotly.__version__
import json
from plotly.offline import download_plotlyjs, init_notebook_mode,  iplot
init_notebook_mode(connected=True)

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
# Sklearn ML Modules
from sklearn.preprocessing import MultiLabelBinarizer,LabelEncoder,OneHotEncoder,StandardScaler 
import sklearn.metrics as mtr
import math

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [153]:
# Load each dataset into a pandas DataFrame

df_list = pd.read_csv('Data/listings.csv')
df_rev = pd.read_csv('Data/Reviews.csv')
df_cal = pd.read_csv('Data/calendar.csv')

#### F. Sentiment analysis

For this section I'm going to use TextBlob, a Python library for processing textual data, on the `comments` column of `df_rev`.

In [154]:
# Cast the comments column to str
df_rev['comments'] = df_rev['comments'].astype('str')
from textblob import TextBlob
from googletrans import Translator
translator = Translator()
from textblob.exceptions import NotTranslated
from time import sleep

def translate_comment(x):
    try:
        # Try to translate the comment to English; return a plain string
        # so the column can be passed to TextBlob again later
        return str(TextBlob(str(x)).translate(to='en'))
    except NotTranslated:
        # Raised when the output is the same as the input: keep the original text
        return str(x)


# Translate one comment at a time (a single pass, with a pause between requests)
comments_col = df_rev.columns.get_loc('comments')
for i in range(len(df_rev)):
    df_rev.iloc[i, comments_col] = translate_comment(df_rev.iloc[i, comments_col])
    # Sleep for one second between requests to limit the request rate
    sleep(1)

HTTPError: HTTP Error 429: Too Many Requests
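A 429 response means the translation endpoint is throttling the requests. One common mitigation is to retry with exponentially growing pauses instead of a fixed one. The sketch below is a generic helper, not part of TextBlob or googletrans: `call_with_backoff`, `max_tries` and `base_delay` are hypothetical names, and it assumes the error is a `urllib.error.HTTPError`, which matches the message format above. The `sleep` argument is injectable so the logic can be tried without actually waiting.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(func, arg, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(arg), retrying on HTTP 429 with exponentially growing pauses.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for attempt in range(max_tries):
        try:
            return func(arg)
        except HTTPError as err:
            # Only retry on rate limiting, and give up after the last attempt
            if err.code != 429 or attempt == max_tries - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... seconds
            sleep(base_delay * 2 ** attempt)
```

With this helper, a single comment could be translated as `call_with_backoff(translate_comment, comment)`. Even so, a free endpoint may keep refusing a dataset of this size, so this only softens the problem.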

In [155]:
# Text analysis using TextBlob

def sentiment_polarity_calc(text):
    try:
        return TextBlob(text).sentiment.polarity
    except Exception:
        # Scoring failed (e.g. non-string input): leave the score missing
        return None

df_rev['polarity_sentiment'] = df_rev['comments'].apply(sentiment_polarity_calc)

TextBlob rates every comment with a polarity score in the range [-1, 1]:

- below 0: the comment is considered negative
- exactly 0: the comment is neutral, or TextBlob could not rate it
- above 0: the comment is considered positive

Let's check whether TextBlob did its job well.

In [156]:
# Sort df_rev from the highest to the lowest polarity score
df_rev = df_rev.sort_values(by='polarity_sentiment',ascending=False)
df_rev.head(10)

| | listing_id | id | date | reviewer_id | reviewer_name | comments | polarity_sentiment |
|---|---|---|---|---|---|---|---|
| 748187 | 16011380 | 206787232 | 2017-10-27 | 6269557 | Romain | Perfect | 1.0 |
| 1069595 | 27004700 | 647616281 | 2020-08-07 | 159003555 | Annemieke | Very happy about the service and the hotel! | 1.0 |
| 419464 | 6332449 | 142762670 | 2017-04-09 | 56683279 | Laetitia | Perfect ! | 1.0 |
| 808809 | 17874801 | 585190141 | 2019-12-31 | 303696325 | Priscila | Great! | 1.0 |
| 897468 | 20599898 | 238617405 | 2018-02-26 | 156290654 | Jundan | The best Airbnb in the district Bourse (2). | 1.0 |
| 1196854 | 33936690 | 445494981 | 2019-04-29 | 129377934 | Sylviane | SOLINE nous a réservé un excellent accueil. L'... | 1.0 |
| 1038938 | 25754484 | 492357388 | 2019-07-21 | 149781677 | Baptiste | Hôte très sympathique et accueillante. Nous av... | 1.0 |
| 1196857 | 33937777 | 457418601 | 2019-05-24 | 51192410 | Jérôme | Logement idéal pour affaire et loisir... empla... | 1.0 |
| 808805 | 17874801 | 569631114 | 2019-11-27 | 291066073 | Thomas | Great ! | 1.0 |
| 419349 | 6330946 | 43134376 | 2015-08-17 | 38853315 | Damien | Séjour très agréable! Très belle chambre au cœ... | 1.0 |


In [157]:
# Worst polarity sentiment
df_rev.tail(10)

| | listing_id | id | date | reviewer_id | reviewer_name | comments | polarity_sentiment |
|---|---|---|---|---|---|---|---|
| 1075209 | 27164208 | 644512470 | 2020-07-31 | 6868907 | Jean Philippe | Il y a plusieurs annonces pour les mêmes chamb... | -1.0 |
| 333017 | 4430281 | 71538570 | 2016-04-26 | 43931818 | Nicolas | J'ai séjourné deux nuits chez (Website hidden ... | -1.0 |
| 102257 | 847825 | 526775997 | 2019-09-09 | 258193208 | Peter | Die Wohnung hat eine tolle Lage, Moulin Rouge ... | -1.0 |
| 1252617 | 37238894 | 504171060 | 2019-08-07 | 53170995 | Vyns | Merci à Mathilde d'avoir été compréhensive, ma... | -1.0 |
| 990354 | 23676144 | 404804510 | 2019-01-25 | 10178761 | Sonia | Très arrangeant malgré notre retard à l’ arriv... | -1.0 |
| 1022315 | 24889487 | 527231824 | 2019-09-10 | 93189181 | Fawaz | Worst than the expectations! | -1.0 |
| 455914 | 7098493 | 83262423 | 2016-07-02 | 5779347 | Aurelie | Bon séjour, l'appartement correspond bien au p... | -1.0 |
| 468251 | 7378106 | 645553442 | 2020-08-02 | 157366961 | Stefan | Que fais-tu, si tu arrives à Paris pendant l’é... | -1.0 |
| 590093 | 11542239 | 193088366 | 2017-09-12 | 34681875 | Chloé Quynh | Die Unterkunft ist zwar sehr klein, jedoch rel... | -1.0 |
| 671549 | 13648143 | 208797834 | 2017-11-03 | 134217735 | Aino Inga | Die Wohnung war klein aber fein! Alles sauber,... | -1.0 |


In [158]:
df_rev.iloc[-3,5]

"Que fais-tu, si tu arrives à Paris pendant l’été de la crise de Covid-19 et qu'il fait vraiment chaud ?\n\nTu vas voir Cécile, qui t'accueille avec un sourire et une bouteille d’eau dans son appartement. Tes soucis seront vite oubliés et ton séjour à Paris peut commencer.\n\nL'appartement est bien équipé, bien situé et la base idéale pour explorer Paris. Le métro et une rue avec une pharmacie, un supermarché, des restaurants et des cafés sont juste à quelques pas de l'appartement. Le Centre Pompidou est également à seulement quelques minutes à pied.\n\nJ'ai rencontré Cécile en tant qu'hôte; très sympathique, serviable et chaleureuse. Elle a même pensée à fournir des masques de protection et un désinfectant pour les mains. Aussi, un ami pouvait laisser sa valise avec Cécile jusqu'à l'enregistrement sur son AirBnB plus tard le jour d'arrivée.\n\nEncore une fois merci beaucoup Cécile ! À bientôt !"

TextBlob does not seem to work as well as expected for languages other than English: some French reviews, such as the one above, receive a negative polarity score they clearly do not deserve, although such cases appear to be a minority.
To make the analysis easier, we map each score to one of three labels: positive, negative, or neutral.

In [159]:
def getAnalysis(score):
    # Map a polarity score to a label
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

df_rev['textBlob_polarity_analysis'] = df_rev['polarity_sentiment'].apply(getAnalysis)
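Once each score has a label, a quick tally shows how the reviews split across the three classes. The snippet below is a minimal sketch on made-up scores (not taken from `df_rev`); it restates the same thresholds under the hypothetical name `label_polarity` so that it runs on its own.

```python
from collections import Counter


def label_polarity(score):
    """Same thresholds as getAnalysis: < 0 negative, 0 neutral, > 0 positive."""
    if score < 0:
        return 'Negative'
    if score == 0:
        return 'Neutral'
    return 'Positive'


# Made-up polarity scores, for illustration only
toy_scores = [1.0, 0.35, 0.0, -0.2, 0.0, 0.8, -1.0]

# Count how many scores fall under each label
label_counts = Counter(label_polarity(s) for s in toy_scores)
print(label_counts)
```

On the real data, the equivalent check would be `df_rev['textBlob_polarity_analysis'].value_counts()`.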

In [160]:
# Add neighbourhood column based on the id of df_list
df_rev['neighbourhood'] = df_rev.listing_id.map(df_list.set_index('id')['neighbourhood_cleansed'].to_dict())
df_rev

| | listing_id | id | date | reviewer_id | reviewer_name | comments | polarity_sentiment | textBlob_polarity_analysis | neighbourhood |
|---|---|---|---|---|---|---|---|---|---|
| 748187 | 16011380 | 206787232 | 2017-10-27 | 6269557 | Romain | Perfect | 1.0 | Positive | Palais-Bourbon |
| 1069595 | 27004700 | 647616281 | 2020-08-07 | 159003555 | Annemieke | Very happy about the service and the hotel! | 1.0 | Positive | Palais-Bourbon |
| 419464 | 6332449 | 142762670 | 2017-04-09 | 56683279 | Laetitia | Perfect ! | 1.0 | Positive | Opéra |
| 808809 | 17874801 | 585190141 | 2019-12-31 | 303696325 | Priscila | Great! | 1.0 | Positive | Buttes-Montmartre |
| 897468 | 20599898 | 238617405 | 2018-02-26 | 156290654 | Jundan | The best Airbnb in the district Bourse (2). | 1.0 | Positive | Bourse |
| 1196854 | 33936690 | 445494981 | 2019-04-29 | 129377934 | Sylviane | SOLINE nous a réservé un excellent accueil. L'... | 1.0 | Positive | Popincourt |
| 1038938 | 25754484 | 492357388 | 2019-07-21 | 149781677 | Baptiste | Hôte très sympathique et accueillante. Nous av... | 1.0 | Positive | Vaugirard |
| 1196857 | 33937777 | 457418601 | 2019-05-24 | 51192410 | Jérôme | Logement idéal pour affaire et loisir... empla... | 1.0 | Positive | Passy |
| 808805 | 17874801 | 569631114 | 2019-11-27 | 291066073 | Thomas | Great ! | 1.0 | Positive | Buttes-Montmartre |
| 419349 | 6330946 | 43134376 | 2015-08-17 | 38853315 | Damien | Séjour très agréable! Très belle chambre au cœ... | 1.0 | Positive | Reuilly |


In [161]:
# Average Polarity Analysis by neighbourhood
df_avg_pol = df_rev.copy()
df_avg_pol = df_avg_pol.groupby(['neighbourhood'])[['polarity_sentiment']].mean()
df_avg_pol = df_avg_pol.reset_index()
df_avg_pol = df_avg_pol.sort_values(["polarity_sentiment"], ascending=False)

# Bar chart of the average polarity score per neighbourhood
fig = px.bar(df_avg_pol, x='neighbourhood', y='polarity_sentiment')
fig.show()
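An average can be misleading when a neighbourhood has only a few reviews, so it may help to look at the review count next to the mean. The sketch below uses a small made-up frame with the same column names as `df_rev`; the output column names `mean_polarity` and `n_reviews` are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up reviews, for illustration only
toy_rev = pd.DataFrame({
    'neighbourhood': ['Passy', 'Passy', 'Passy', 'Bourse'],
    'polarity_sentiment': [0.5, 0.25, 0.0, 1.0],
})

# Mean polarity and number of scored reviews per neighbourhood
toy_summary = (
    toy_rev.groupby('neighbourhood')
    .agg(mean_polarity=('polarity_sentiment', 'mean'),
         n_reviews=('polarity_sentiment', 'count'))
    .reset_index()
    .sort_values('mean_polarity', ascending=False)
)
print(toy_summary)
```

In this toy data, Bourse ranks first on the strength of a single review, which is exactly the kind of case the count column makes visible.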

I was not able to translate the non-English comments because the translation service rejected the volume of requests (HTTP 429), and TextBlob does not always score untranslated comments correctly.
The chart above may therefore be inaccurate.
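One plausible reason untranslated reviews distort the averages: a scorer built on an English word lexicon only recognises English words, so a French review is scored from whichever tokens happen to match, or gets 0 when none do. The toy scorer below illustrates the idea with a tiny made-up lexicon. It is not TextBlob's actual lexicon or algorithm, and `TOY_LEXICON` and `toy_polarity` are hypothetical names.

```python
import re

# Tiny made-up English lexicon: word -> polarity in [-1, 1]
TOY_LEXICON = {'perfect': 1.0, 'great': 0.8, 'worst': -1.0, 'dirty': -0.6}


def toy_polarity(text):
    """Average the polarity of the known words; return 0.0 if none are known."""
    words = re.findall(r"\w+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity('Perfect stay, great host'))  # both known words contribute
print(toy_polarity('Séjour très agréable'))      # no known word -> 0.0
```

Under this simplified view, a clearly positive French comment is rated neutral just because none of its words are in the lexicon.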