# Paris Airbnb Price

How much should I rent my flat for per night?

I use linear regression to predict the nightly price of an apartment on Airbnb.
The dataset is available at http://insideairbnb.com/get-the-data.html under a Creative Commons CC0 1.0 Universal (CC0 1.0) "Public Domain Dedication" license.

## 1. Data Preprocessing
If the data file has not been decompressed yet, we have to decompress it first.

In [1]:
#import gzip
#import shutil
#with gzip.open('data/listings.csv.gz', 'rb') as f_in:
#    with open('data/listings.csv', 'wb') as f_out:
#        shutil.copyfileobj(f_in, f_out)
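
As an alternative to decompressing by hand, pandas can read a gzip-compressed CSV directly. The snippet below is a minimal sketch of that approach: it writes a tiny stand-in archive to a temporary directory (the path and the two sample rows are made up for illustration) and loads it with `pd.read_csv`.

```python
import gzip
import os
import tempfile

import pandas as pd

# Build a tiny gzip-compressed CSV so the sketch is self-contained;
# in the notebook, the path would point to the downloaded archive instead.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'listings.csv.gz')  # hypothetical path
with gzip.open(gz_path, 'wt', encoding='utf-8') as f:
    f.write('id,price\n2577,$125.00\n3109,$75.00\n')

# compression='infer' is the default, so the '.gz' suffix would be enough;
# it is spelled out here only for clarity.
sample = pd.read_csv(gz_path, compression='gzip')
print(sample.shape)
```

With this approach the uncompressed copy on disk is not needed, at the cost of decompressing the file on every load.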

First, we import the required libraries.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Then, we import the dataset and have a look at it.

In [3]:
dataset = pd.read_csv('data/listings.csv')
dataset.iloc[:5,:]



Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,2577,https://www.airbnb.com/rooms/2577,20181207151406,2018-12-08,Loft for 4 by Canal Saint Martin,"100 m2 loft (1100 sq feet) with high ceiling, ...",The district has any service or shop you may d...,"100 m2 loft (1100 sq feet) with high ceiling, ...",none,,...,t,,{PARIS},t,f,strict_14_with_grace_period,f,f,1,
1,3109,https://www.airbnb.com/rooms/3109,20181207151406,2018-12-08,zen and calm,Appartement très calme de 50M2 Belle lumière D...,I bedroom appartment in Paris 14,I bedroom appartment in Paris 14,none,,...,t,,{PARIS},f,f,flexible,f,f,1,0.29
2,5396,https://www.airbnb.com/rooms/5396,20181207151406,2018-12-08,Explore the heart of old Paris,"Cozy, well-appointed and graciously designed s...","Small, well appointed studio apartment at the ...","Cozy, well-appointed and graciously designed s...",none,"You are within walking distance to the Louvre,...",...,t,,{PARIS},t,f,strict_14_with_grace_period,f,f,1,1.29
3,7397,https://www.airbnb.com/rooms/7397,20181207151406,2018-12-08,MARAIS - 2ROOMS APT - 2/4 PEOPLE,"VERY CONVENIENT, WITH THE BEST LOCATION !",PLEASE ASK ME BEFORE TO MAKE A REQUEST !!! No ...,"VERY CONVENIENT, WITH THE BEST LOCATION ! PLEA...",none,,...,t,7510400829623.0,{PARIS},f,f,moderate,f,f,1,2.47
4,7964,https://www.airbnb.com/rooms/7964,20181207151406,2018-12-08,Large & sunny flat with balcony !,Very large & nice apartment all for you! - Su...,hello ! We have a great 75 square meter apartm...,Very large & nice apartment all for you! - Su...,none,,...,t,,{PARIS},f,f,strict_14_with_grace_period,f,f,1,0.06


There are a lot of variables.
Let's keep only the relevant ones, along with the dependent variable: the price per night.

In [4]:
dataset = dataset[['host_is_superhost','neighbourhood','zipcode','latitude','longitude','property_type',
                   'room_type','accommodates','bathrooms','bedrooms','beds','bed_type','amenities','square_feet','price',
                   'weekly_price','monthly_price','cleaning_fee','number_of_reviews','review_scores_rating',
                   'review_scores_accuracy','review_scores_cleanliness','review_scores_checkin',
                   'review_scores_communication','review_scores_location','review_scores_value']]
dataset.iloc[:5,:]

Unnamed: 0,host_is_superhost,neighbourhood,zipcode,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,...,monthly_price,cleaning_fee,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
0,f,République,75010,48.869933,2.362511,Loft,Entire home/apt,4,2.0,2.0,...,,$40.00,0,,,,,,,
1,f,Alésia,75014,48.833494,2.318518,Apartment,Entire home/apt,2,1.0,1.0,...,,$50.00,7,100.0,10.0,10.0,10.0,10.0,10.0,10.0
2,f,Saint-Paul - Ile Saint-Louis,75004,48.851001,2.35869,Apartment,Entire home/apt,2,1.0,0.0,...,"$2,000.00",$36.00,148,94.0,9.0,9.0,10.0,10.0,10.0,10.0
3,t,Le Marais,75004,48.857576,2.352751,Apartment,Entire home/apt,4,1.0,2.0,...,"$2,200.00",$50.00,231,94.0,10.0,9.0,10.0,10.0,10.0,9.0
4,f,Gare du Nord - Gare de I'Est,75009,48.874642,2.343411,Apartment,Entire home/apt,2,1.0,2.0,...,,$60.00,6,96.0,10.0,10.0,10.0,10.0,10.0,10.0


These variables are interesting, but we can do better.
- Some of these are directly related: neighbourhood and zipcode are determined by latitude and longitude, and the overall review score depends on the other review scores.
- Amenities are difficult to encode for a first version of the algorithm.
- The square feet field is rarely filled in.
- Weekly and monthly prices are not always available, so we set them aside for now.
- We are going to predict the total price, which is the nightly price plus the cleaning fee.
- When there are no reviews, the review score variables are 'NaN'.
- Sometimes the review score variables are 'NaN' even though there are reviews.
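
Claims such as "rarely filled in" or "not always available" can be checked by measuring the share of missing values per column. The following is a minimal sketch on a toy frame; the values are made up for illustration, and in the notebook the same expression would be applied to `dataset`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a few listing columns (values are made up for illustration).
toy = pd.DataFrame({
    'square_feet': [np.nan, np.nan, np.nan, 646.0],
    'weekly_price': [np.nan, '$800.00', np.nan, '$900.00'],
    'price': ['$125.00', '$75.00', '$115.00', '$90.00'],
})

# isna() marks missing cells; the column-wise mean is the share of missing values.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a high missing share are candidates for dropping, which is the reasoning applied to the square feet and weekly/monthly price fields here.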

In [5]:
# Drop redundant, sparse, or hard-to-encode columns (see the list above).
dataset = dataset.drop(columns=['neighbourhood','zipcode','amenities','square_feet','weekly_price','monthly_price','review_scores_rating'])
# Total price = nightly price + cleaning fee, after stripping '$' and ',' from both.
dataset['price'] = dataset['price'].replace(r'[$,]', '', regex=True).astype(float) + dataset['cleaning_fee'].replace(r'[$,]', '', regex=True).astype(float)
dataset = dataset.drop(columns=['cleaning_fee'])
dataset

Unnamed: 0,host_is_superhost,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,price,number_of_reviews,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
0,f,48.869933,2.362511,Loft,Entire home/apt,4,2.0,2.0,2.0,Real Bed,165.0,0,,,,,,
1,f,48.833494,2.318518,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,125.0,7,10.0,10.0,10.0,10.0,10.0,10.0
2,f,48.851001,2.358690,Apartment,Entire home/apt,2,1.0,0.0,1.0,Pull-out Sofa,151.0,148,9.0,9.0,10.0,10.0,10.0,10.0
3,t,48.857576,2.352751,Apartment,Entire home/apt,4,1.0,2.0,2.0,Real Bed,165.0,231,10.0,9.0,10.0,10.0,10.0,9.0
4,f,48.874642,2.343411,Apartment,Entire home/apt,2,1.0,2.0,2.0,Real Bed,159.0,6,10.0,10.0,10.0,10.0,10.0,10.0
5,f,48.865279,2.393263,Apartment,Entire home/apt,3,1.0,1.0,1.0,Real Bed,,1,,,,,,
6,f,48.858985,2.347347,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,275.0,0,,,,,,
7,t,48.862266,2.371341,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,90.0,17,10.0,10.0,10.0,10.0,10.0,10.0
8,t,48.862775,2.374680,Apartment,Entire home/apt,5,1.0,2.0,3.0,Real Bed,185.0,144,10.0,10.0,10.0,10.0,10.0,10.0
9,t,48.867430,2.375240,Apartment,Entire home/apt,4,1.0,1.0,2.0,Real Bed,140.0,147,10.0,10.0,10.0,10.0,10.0,10.0
