# Airbnb Price Prediction
### Date: 05.02.2025
### Author: Konrad Wróński
### Dataset: https://www.kaggle.com/datasets/konradb/inside-airbnb-netherlands
Description: Using Inside Airbnb listing data, I will try to predict the price of a listing. The data comes from three Dutch cities: Amsterdam, Rotterdam and The Hague.

### 1. Importing the libraries

In [94]:
import kagglehub
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
from sklearn.linear_model import LinearRegression

### 2. Data Ingestion

In [95]:
path = kagglehub.dataset_download("konradb/inside-airbnb-netherlands")

print("Path to dataset files:", path)

Path to dataset files: /Users/konradwronski/.cache/kagglehub/datasets/konradb/inside-airbnb-netherlands/versions/2


In [96]:
import os
# Build the file paths from the download location returned above instead of hard-coding them
amsterdam = pd.read_csv(os.path.join(path, "netherlands", "Amsterdam", "listings_detailed.csv"))
rotterdam = pd.read_csv(os.path.join(path, "netherlands", "Rotterdam", "listings_detailed.csv"))
hague = pd.read_csv(os.path.join(path, "netherlands", "The Hague", "listings_detailed.csv"))

### 3. Data Cleaning

In [97]:
amsterdam.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2818,https://www.airbnb.com/rooms/2818,20230309202119,2023-03-09,city scrape,Quiet Garden View Room & Super Fast Wi-Fi,Quiet Garden View Room & Super Fast Wi-Fi<br /...,"Indische Buurt (""Indies Neighborhood"") is a ne...",https://a0.muscache.com/pictures/10272854/8dcc...,3159,...,4.98,4.69,4.81,0363 5F3A 5684 6750 D14D,f,1,0,1,0,1.9
1,311124,https://www.airbnb.com/rooms/311124,20230309202119,2023-03-10,city scrape,*historic centre* *bright* *canal view* *jordaan*,> Please be so kind to book ONLY AFTER conta...,Perfect location in the lively centre. All his...,https://a0.muscache.com/pictures/5208672/5bb60...,1600010,...,4.92,4.93,4.6,0363 59D8 7D30 6CFA DC81,f,1,1,0,0,0.66


In [98]:
rotterdam.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,73155,https://www.airbnb.com/rooms/73155,20230326235203,2023-03-27,city scrape,Apartment in centre of Rotterdam,This bright comfortable one bedroom apartment ...,The apartment is in a quiet side street of the...,https://a0.muscache.com/pictures/712479/b2531a...,381163,...,4.95,4.97,4.86,0599 D9BA 806A 9A56 BDBB,f,1,1,0,0,0.78
1,77592,https://www.airbnb.com/rooms/77592,20230326235203,2023-03-27,city scrape,"Charming, Cozy Rotterdam Center",<b>The space</b><br />Welcome to Rotterdam! <...,,https://a0.muscache.com/pictures/e96512a1-b4c7...,416305,...,4.61,4.83,4.56,Exempt,f,1,1,0,0,0.12


In [99]:
hague.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,103875,https://www.airbnb.com/rooms/103875,20230326235216,2023-03-27,city scrape,Design house-city centre-private parking,"Your own modern house in the city center, surr...","This neighborhood is great, students, expats a...",https://a0.muscache.com/pictures/66433033/c313...,541315,...,5.0,4.95,4.85,0518B3A6F34367DC62EA,f,3,3,0,0,0.15
1,378711,https://www.airbnb.com/rooms/378711,20230326235216,2023-03-27,city scrape,ruime studio w.private kitchen/bath near beach.,Onze archictectonische half vrij staande villa...,Onze groene en rustige buurt ligt zeer gunstig...,https://a0.muscache.com/pictures/ed90f502-c421...,1902132,...,4.81,4.62,4.0,0518 8B36 A437 A071 6DE3,f,2,1,1,0,0.46


In [100]:
amsterdam.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6998 entries, 0 to 6997
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            6998 non-null   int64  
 1   listing_url                                   6998 non-null   object 
 2   scrape_id                                     6998 non-null   int64  
 3   last_scraped                                  6998 non-null   object 
 4   source                                        6998 non-null   object 
 5   name                                          6998 non-null   object 
 6   description                                   6992 non-null   object 
 7   neighborhood_overview                         4506 non-null   object 
 8   picture_url                                   6998 non-null   object 
 9   host_id                                       6998 non-null   i

In [101]:
rotterdam.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            892 non-null    int64  
 1   listing_url                                   892 non-null    object 
 2   scrape_id                                     892 non-null    int64  
 3   last_scraped                                  892 non-null    object 
 4   source                                        892 non-null    object 
 5   name                                          892 non-null    object 
 6   description                                   891 non-null    object 
 7   neighborhood_overview                         478 non-null    object 
 8   picture_url                                   892 non-null    object 
 9   host_id                                       892 non-null    int

In [102]:
hague.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 947 entries, 0 to 946
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            947 non-null    int64  
 1   listing_url                                   947 non-null    object 
 2   scrape_id                                     947 non-null    int64  
 3   last_scraped                                  947 non-null    object 
 4   source                                        947 non-null    object 
 5   name                                          947 non-null    object 
 6   description                                   947 non-null    object 
 7   neighborhood_overview                         616 non-null    object 
 8   picture_url                                   947 non-null    object 
 9   host_id                                       947 non-null    int

### 3.1 Merging Datasets

In [103]:
# Flag each listing with its source city; after concatenation the other two flags will be NaN
amsterdam["amsterdam"] = 1
rotterdam["rotterdam"] = 1
hague["hague"] = 1


In [104]:
cities = [amsterdam, rotterdam, hague]
result = pd.concat(cities)
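Because each city frame only has its own indicator column, every row gets NaN in the other two city columns after `pd.concat`, which is why `amsterdam`, `rotterdam` and `hague` show up in the missing-value counts. A minimal alternative sketch is to let `pd.concat` record the source city through `keys`, turn it into an ordinary `city` column, and one-hot encode it. The three small frames below are made-up stand-ins for the real listings, and `combined`/`city_dummies` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for the three city DataFrames (made-up values, not real listings)
amsterdam = pd.DataFrame({"price": ["$69.00", "$325.00"]})
rotterdam = pd.DataFrame({"price": ["$80.00"]})
hague = pd.DataFrame({"price": ["$120.00", "$95.00"]})

# keys= labels each block with its city; names= names the two index levels
combined = pd.concat(
    [amsterdam, rotterdam, hague],
    keys=["amsterdam", "rotterdam", "hague"],
    names=["city", "row"],
)

# Move the city label into a regular column and get a clean 0..n-1 index
combined = combined.reset_index(level="city").reset_index(drop=True)

# One indicator column per city, with no NaNs left to fill
city_dummies = pd.get_dummies(combined["city"], dtype="int64")
print(combined)
print(city_dummies)
```

A side effect of the final `reset_index(drop=True)` is that every row gets a unique index label, which the plain `pd.concat(cities)` call does not guarantee.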

In [105]:
missings = result.isna().sum().sort_values(ascending=False)
missings = missings[missings > 0]
missings



calendar_updated                8837
bathrooms                       8837
rotterdam                       7945
hague                           7890
neighbourhood_group_cleansed    6998
host_neighbourhood              5843
host_about                      3701
neighborhood_overview           3237
neighbourhood                   3237
host_response_time              2331
host_response_rate              2331
amsterdam                       1839
host_location                   1128
host_acceptance_rate            1083
review_scores_cleanliness        930
review_scores_accuracy           930
review_scores_checkin            930
review_scores_communication      930
review_scores_location           930
review_scores_value              930
review_scores_rating             924
last_review                      924
first_review                     924
reviews_per_month                924
bedrooms                         397
beds                             114
license                           57
b
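Raw counts are hard to compare when the cities differ this much in size (6998, 892 and 947 listings); for instance, `calendar_updated` and `bathrooms` are missing in all 8837 rows. A small sketch that expresses missingness as a percentage of rows, shown on a made-up toy frame (on the real data one would call it on `result`); `missing_share` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first, fully populated columns dropped."""
    share = df.isna().mean().mul(100).round(1)
    return share[share > 0].sort_values(ascending=False)

# Toy example (made-up values)
toy = pd.DataFrame({
    "bedrooms": [1.0, np.nan, 2.0, np.nan],
    "beds": [1.0, 1.0, np.nan, 2.0],
    "accommodates": [2, 4, 2, 3],
})
print(missing_share(toy))
```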

In [106]:
result[["amsterdam", "rotterdam", "hague"]] = result[["amsterdam", "rotterdam", "hague"]].fillna(0)
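After `fillna(0)` each listing should carry exactly one city flag, and the flags are stored as `float64` because they briefly held NaN (visible in the `info()` output further down). A quick sanity check and cast, sketched on a made-up toy frame that mimics that state:

```python
import pandas as pd

city_cols = ["amsterdam", "rotterdam", "hague"]

# Toy frame mimicking the state after concat + fillna(0): flags are floats
toy = pd.DataFrame({
    "amsterdam": [1.0, 0.0, 0.0],
    "rotterdam": [0.0, 1.0, 0.0],
    "hague":     [0.0, 0.0, 1.0],
})

# Every listing should come from exactly one city
assert (toy[city_cols].sum(axis=1) == 1).all(), "row with zero or several city flags"

# NaN forced the flags to float64; cast them back to integers
toy = toy.astype({col: "int64" for col in city_cols})
print(toy.dtypes)
```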


In [107]:
result.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
       'description', 'neighborhood_overview', 'picture_url', 'host_id',
       'host_url', 'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'ca

### 3.2 Removing columns 

In [108]:
final_columns = ['room_type', 'accommodates', 'bedrooms', 'beds',
                 'price', 'amsterdam', 'rotterdam', 'hague',
                 'number_of_reviews', 'host_listings_count']

result = result[final_columns]

In [109]:
result.head(2)

Unnamed: 0,room_type,accommodates,bedrooms,beds,price,amsterdam,rotterdam,hague,number_of_reviews,host_listings_count
0,Private room,2,1.0,2.0,$69.00,1.0,0.0,0.0,322,1.0
1,Entire home/apt,2,1.0,1.0,$325.00,1.0,0.0,0.0,87,1.0
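The target `price` is still a string such as `$69.00`, so it has to become numeric before it can be used in a regression. A minimal sketch on a made-up toy column; it assumes that larger amounts may include a thousands separator (e.g. `$1,200.00`), so commas are stripped along with the dollar sign. On the real data the same chain would be applied to `result["price"]`.

```python
import pandas as pd

# Toy price column in the same "$<amount>" format as the listings
prices = pd.Series(["$69.00", "$325.00", "$1,200.00"])

# Remove the dollar sign and any thousands separators, then convert to float
numeric_prices = prices.str.replace(r"[$,]", "", regex=True).astype(float)
print(numeric_prices.tolist())  # [69.0, 325.0, 1200.0]
```

Inside a regex character class `$` is a literal character, so `[$,]` matches only dollar signs and commas.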


### 3.3 Creating dummy variables

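One-hot encode `room_type`. With `drop_first=True`, the first category in sorted order (`Entire home/apt`) gets no column of its own and acts as the reference level.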

In [110]:
result = pd.get_dummies(result, columns=['room_type'], drop_first=True, dtype='int64')

In [111]:
result.head(2)

Unnamed: 0,accommodates,bedrooms,beds,price,amsterdam,rotterdam,hague,number_of_reviews,host_listings_count,room_type_Hotel room,room_type_Private room,room_type_Shared room
0,2,1.0,2.0,$69.00,1.0,0.0,0.0,322,1.0,0,1,0
1,2,1.0,1.0,$325.00,1.0,0.0,0.0,87,1.0,0,0,0
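The same reference-level reasoning applies to the city flags: `amsterdam + rotterdam + hague` equals 1 for every row, so together they duplicate the intercept of a linear model. scikit-learn's `LinearRegression` will still fit, but the individual city coefficients are not uniquely identified. A sketch on made-up toy rows that mirrors the encoding step and then drops one city flag; choosing `amsterdam` as the reference city is an arbitrary, hypothetical choice.

```python
import pandas as pd

# Toy rows (made-up) with the same columns that get encoded in the notebook
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Hotel room", "Shared room"],
    "amsterdam": [1, 0, 0, 1],
    "rotterdam": [0, 1, 0, 0],
    "hague":     [0, 0, 1, 0],
})

# The three city flags always sum to 1, i.e. they duplicate the intercept
assert (toy[["amsterdam", "rotterdam", "hague"]].sum(axis=1) == 1).all()

# drop_first drops the first category in sorted order ("Entire home/apt")
encoded = pd.get_dummies(toy, columns=["room_type"], drop_first=True, dtype="int64")

# Drop one city flag as well, so that city becomes the reference level
encoded = encoded.drop(columns=["amsterdam"])
print(encoded.columns.tolist())
```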


In [112]:
result.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8837 entries, 0 to 946
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   accommodates            8837 non-null   int64  
 1   bedrooms                8440 non-null   float64
 2   beds                    8723 non-null   float64
 3   price                   8837 non-null   object 
 4   amsterdam               8837 non-null   float64
 5   rotterdam               8837 non-null   float64
 6   hague                   8837 non-null   float64
 7   number_of_reviews       8837 non-null   int64  
 8   host_listings_count     8836 non-null   float64
 9   room_type_Hotel room    8837 non-null   int64  
 10  room_type_Private room  8837 non-null   int64  
 11  room_type_Shared room   8837 non-null   int64  
dtypes: float64(6), int64(5), object(1)
memory usage: 897.5+ KB
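Two things in this summary still need attention before modelling: the index runs from 0 to 946 for 8837 rows, so labels repeat (`pd.concat` kept each city's original index), and `bedrooms`, `beds` and `host_listings_count` have gaps. A sketch on a made-up toy frame showing one simple option, median imputation; it is a choice, not the only one, and in a train/test workflow the medians should be computed on the training split only.

```python
import numpy as np
import pandas as pd

# Toy frame with a duplicated index, as produced by pd.concat without ignore_index
toy = pd.DataFrame(
    {"bedrooms": [1.0, np.nan, 2.0], "beds": [2.0, 1.0, np.nan]},
    index=[0, 1, 0],
)

# Give every row a unique label so .loc lookups are unambiguous
toy = toy.reset_index(drop=True)

# Fill numeric gaps with each column's median (NaNs are skipped when computing it)
numeric_cols = ["bedrooms", "beds"]
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())
print(toy)
```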
