# Supervised Regression with XGBoost Algorithm: Predict AirBnB Listing Prices

## Problem Statement

Using the XGBoost algorithm, we will predict AirBnB listing prices from the available features: **host information** (response rate, location, etc.) and **listing information** (number of bedrooms, number of bathrooms, etc.).

## 0. Setup Google Colab Environment 

**Uncomment and execute the code cell below if you are running this notebook in Google Colab.**

In [1]:
# !python3 -m pip install xgboost numpy pandas scikit-learn plotnine --quiet 
# !python3 -m pip install seaborn pyarrow optuna plotly --quiet

## 1. Import Libraries and Modules

In [2]:
# Import of built-in libraries.
import os 
import warnings
import re 

# Ignore warnings
warnings.filterwarnings(action='ignore')

# Import of third-party libraries.
import IPython
import numpy as np 
import pandas as pd  
import matplotlib as mpl 
import seaborn as sns 
import xgboost as xgb

In [3]:
# Print versions of imported libraries.
print(f'ipython --version: {IPython.__version__}')
print(f'numpy --version: {np.__version__}')
print(f'pandas --version: {pd.__version__}')
print(f'matplotlib --version: {mpl.__version__}')
print(f'seaborn --version: {sns.__version__}')
print(f'xgboost --version: {xgb.__version__}')

ipython --version: 8.6.0
numpy --version: 1.22.3
pandas --version: 1.5.1
matplotlib --version: 3.5.3
seaborn --version: 0.12.1
xgboost --version: 1.5.0


In [4]:
# Import required modules from libraries

from IPython.display import display 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from matplotlib import pyplot as plt 

In [5]:
pd.set_option('display.max_columns', 1000)

## 2. Load Data Set as Pandas DataFrame

In [6]:
# Uncomment this code cell if using Google Colab
# data_loc = 'https://raw.githubusercontent.com/rs2pydev/reg_xgb_airbnb/master/dataset/listings.csv'
# listings_orig = pd.read_csv(filepath_or_buffer=data_loc, header='infer')

In [12]:
# Use this code cell in a local environment (comment it out if using Google Colab)
data_loc = './dataset/listings.csv'
listings_orig = pd.read_csv(filepath_or_buffer=data_loc, header='infer')
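
As an alternative to commenting and uncommenting the two cells above, the data location could be chosen at run time. The following is a minimal sketch, assuming that the presence of `google.colab` in `sys.modules` is an adequate check for a Colab kernel; `resolve_data_loc`, `REMOTE_LOC`, and `LOCAL_LOC` are hypothetical names that do not appear elsewhere in this notebook.

```python
import sys

# Both locations are the ones used in the cells above.
REMOTE_LOC = 'https://raw.githubusercontent.com/rs2pydev/reg_xgb_airbnb/master/dataset/listings.csv'
LOCAL_LOC = './dataset/listings.csv'

def resolve_data_loc(in_colab):
    """Return the remote URL on Colab, else the local path (hypothetical helper)."""
    return REMOTE_LOC if in_colab else LOCAL_LOC

# Assumption: a Colab kernel has the `google.colab` package loaded at start-up.
in_colab = 'google.colab' in sys.modules
data_loc = resolve_data_loc(in_colab)
print('Reading data from:', data_loc)
```

The resulting `data_loc` can then be passed to `pd.read_csv` exactly as in the cells above.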

In [19]:
listings = listings_orig.copy(deep=True)

## 3.1 Analytical EDA

In [20]:
# Shape of DataFrame.
print('Shape of DataFrame: ', listings.shape)

Shape of DataFrame:  (39881, 75)


In [21]:
# Display first few rows of DataFrame.
display(listings.head())

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2539,https://www.airbnb.com/rooms/2539,20220907064715,2022-09-07,city scrape,Clean & quiet apt home by the park,Renovated apt home in elevator building.<br />...,Close to Prospect Park and Historic Ditmas Park,https://a0.muscache.com/pictures/3949d073-a02e...,2787,https://www.airbnb.com/users/show/2787,John,2008-09-07,"New York, NY",Educated professional living in Brooklyn. I l...,within an hour,100%,90%,f,https://a0.muscache.com/im/pictures/user/86745...,https://a0.muscache.com/im/pictures/user/86745...,Gravesend,9.0,12.0,"['email', 'phone']",t,t,"Brooklyn , New York, United States",Kensington,Brooklyn,40.64529,-73.97238,Private room in rental unit,Private room,2,,1 shared bath,1.0,1.0,"[""First aid kit"", ""Microwave"", ""Stove"", ""Coffe...",$299.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,21,51,81,356,2022-09-07,9,0,0,2015-12-04,2018-10-19,4.89,4.88,5.0,5.0,5.0,4.75,4.88,,f,9,1,6,2,0.11
1,2595,https://www.airbnb.com/rooms/2595,20220907064715,2022-09-07,city scrape,Skylit Midtown Castle,"Beautiful, spacious skylit studio in the heart...",Centrally located in the heart of Manhattan ju...,https://a0.muscache.com/pictures/f0813a11-40b2...,2845,https://www.airbnb.com/users/show/2845,Jennifer,2008-09-09,"New York, NY",A New Yorker since (Phone number hidden by Air...,within a day,75%,23%,f,https://a0.muscache.com/im/pictures/user/50fc5...,https://a0.muscache.com/im/pictures/user/50fc5...,Midtown,6.0,9.0,"['email', 'phone', 'work_email']",t,t,"New York, United States",Midtown,Manhattan,40.75356,-73.98559,Entire rental unit,Entire home/apt,1,,1 bath,,1.0,"[""Stove"", ""Coffee maker"", ""Long term stays all...",$175.00,30,1125,30.0,30.0,1125.0,1125.0,30.0,1125.0,,t,0,0,5,280,2022-09-07,49,1,0,2009-11-21,2022-06-21,4.68,4.73,4.63,4.77,4.8,4.81,4.4,,f,3,3,0,0,0.31
2,5121,https://www.airbnb.com/rooms/5121,20220907064715,2022-09-07,city scrape,BlissArtsSpace!,One room available for rent in a 2 bedroom apt...,,https://a0.muscache.com/pictures/2090980c-b68e...,7356,https://www.airbnb.com/users/show/7356,Garon,2009-02-03,"New York, NY","I am an artist(painter, filmmaker) and curato...",within an hour,100%,100%,t,https://a0.muscache.com/im/pictures/user/72a61...,https://a0.muscache.com/im/pictures/user/72a61...,Bedford-Stuyvesant,2.0,2.0,"['email', 'phone']",t,t,,Bedford-Stuyvesant,Brooklyn,40.68535,-73.95512,Private room in rental unit,Private room,2,,,1.0,1.0,"[""Heating"", ""Kitchen"", ""Air conditioning"", ""Wi...",$60.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,5,30,60,335,2022-09-07,50,0,0,2009-05-28,2019-12-02,4.52,4.22,4.09,4.91,4.91,4.47,4.52,,f,2,0,2,0,0.31
3,45910,https://www.airbnb.com/rooms/45910,20220907064715,2022-09-07,city scrape,Beautiful Queens Brownstone! - 5BR,"<b>The space</b><br />Beautiful, fully furnish...",,https://a0.muscache.com/pictures/27117627/19ff...,204539,https://www.airbnb.com/users/show/204539,Mark,2010-08-17,"New York, NY",Father of two boys - 9 & 10.,within an hour,100%,19%,f,https://a0.muscache.com/im/users/204539/profil...,https://a0.muscache.com/im/users/204539/profil...,Ridgewood,7.0,7.0,"['email', 'phone']",t,t,,Ridgewood,Queens,40.70309,-73.89963,Entire townhouse,Entire home/apt,16,,2.5 baths,5.0,10.0,"[""Hair dryer"", ""Essentials"", ""Carbon monoxide ...",$425.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,30,60,90,365,2022-09-07,13,0,0,2012-01-03,2019-11-12,4.42,4.64,4.36,4.82,5.0,4.82,4.55,,f,6,6,0,0,0.1
4,5136,https://www.airbnb.com/rooms/5136,20220907064715,2022-09-07,city scrape,"Spacious Brooklyn Duplex, Patio + Garden",We welcome you to stay in our lovely 2 br dupl...,,https://a0.muscache.com/pictures/miso/Hosting-...,7378,https://www.airbnb.com/users/show/7378,Rebecca,2009-02-03,"New York, NY","Rebecca is an artist/designer, and Henoch is i...",,,33%,f,https://a0.muscache.com/im/users/7378/profile_...,https://a0.muscache.com/im/users/7378/profile_...,Greenwood Heights,1.0,5.0,"['email', 'phone']",t,t,,Sunset Park,Brooklyn,40.66265,-73.99454,Entire rental unit,Entire home/apt,4,,1.5 baths,2.0,2.0,"[""Hair dryer"", ""Cable TV"", ""Refrigerator"", ""BB...",$275.00,21,1125,21.0,21.0,1125.0,1125.0,21.0,1125.0,,t,0,0,0,179,2022-09-07,3,1,1,2014-01-02,2022-08-10,5.0,5.0,5.0,5.0,5.0,4.67,5.0,,f,1,1,0,0,0.03


In [22]:
# DataFrame Metadata Information 
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39881 entries, 0 to 39880
Data columns (total 75 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            39881 non-null  int64  
 1   listing_url                                   39881 non-null  object 
 2   scrape_id                                     39881 non-null  int64  
 3   last_scraped                                  39881 non-null  object 
 4   source                                        39881 non-null  object 
 5   name                                          39868 non-null  object 
 6   description                                   39036 non-null  object 
 7   neighborhood_overview                         23466 non-null  object 
 8   picture_url                                   39881 non-null  object 
 9   host_id                                       39881 non-null 

The first five columns hold scraping metadata rather than properties of a listing or its host, so they appear redundant for our regression task. The columns are:

- `id`  
- `listing_url` 
- `scrape_id`  
- `last_scraped`
- `source`    

Therefore, we discard these from our DataFrame. 
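
One way to sanity-check such a decision is to look at how many distinct values each column holds: a column with a single value (such as a scrape identifier shared by every row) or one distinct value per row (such as a listing id) offers little for a tree model to split on. The sketch below runs on a small toy frame whose values are only illustrative; `summarize_columns` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Toy frame standing in for the real data; values are illustrative only.
toy = pd.DataFrame({
    'id': [2539, 2595, 5121],
    'scrape_id': [20220907064715] * 3,
    'source': ['city scrape'] * 3,
    'bedrooms': [1.0, None, 1.0],
})

def summarize_columns(df):
    """Per-column count of distinct non-null values and fraction of missing values."""
    return pd.DataFrame({
        'n_unique': df.nunique(dropna=True),
        'frac_missing': df.isna().mean(),
    })

summary = summarize_columns(toy)
print(summary)
```

Applied to the full DataFrame, the same helper would show which columns are constant or identifier-like before any of them are dropped.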

In [23]:
cols_to_drop = ['id', 'listing_url', 'scrape_id', 'last_scraped', 'source']
# `columns=` already names the axis, so a separate `axis=1` is not needed.
listings = listings.drop(columns=cols_to_drop)

In [24]:
listings.shape
listings.head()

(39881, 70)

Unnamed: 0,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,Clean & quiet apt home by the park,Renovated apt home in elevator building.<br />...,Close to Prospect Park and Historic Ditmas Park,https://a0.muscache.com/pictures/3949d073-a02e...,2787,https://www.airbnb.com/users/show/2787,John,2008-09-07,"New York, NY",Educated professional living in Brooklyn. I l...,within an hour,100%,90%,f,https://a0.muscache.com/im/pictures/user/86745...,https://a0.muscache.com/im/pictures/user/86745...,Gravesend,9.0,12.0,"['email', 'phone']",t,t,"Brooklyn , New York, United States",Kensington,Brooklyn,40.64529,-73.97238,Private room in rental unit,Private room,2,,1 shared bath,1.0,1.0,"[""First aid kit"", ""Microwave"", ""Stove"", ""Coffe...",$299.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,21,51,81,356,2022-09-07,9,0,0,2015-12-04,2018-10-19,4.89,4.88,5.0,5.0,5.0,4.75,4.88,,f,9,1,6,2,0.11
1,Skylit Midtown Castle,"Beautiful, spacious skylit studio in the heart...",Centrally located in the heart of Manhattan ju...,https://a0.muscache.com/pictures/f0813a11-40b2...,2845,https://www.airbnb.com/users/show/2845,Jennifer,2008-09-09,"New York, NY",A New Yorker since (Phone number hidden by Air...,within a day,75%,23%,f,https://a0.muscache.com/im/pictures/user/50fc5...,https://a0.muscache.com/im/pictures/user/50fc5...,Midtown,6.0,9.0,"['email', 'phone', 'work_email']",t,t,"New York, United States",Midtown,Manhattan,40.75356,-73.98559,Entire rental unit,Entire home/apt,1,,1 bath,,1.0,"[""Stove"", ""Coffee maker"", ""Long term stays all...",$175.00,30,1125,30.0,30.0,1125.0,1125.0,30.0,1125.0,,t,0,0,5,280,2022-09-07,49,1,0,2009-11-21,2022-06-21,4.68,4.73,4.63,4.77,4.8,4.81,4.4,,f,3,3,0,0,0.31
2,BlissArtsSpace!,One room available for rent in a 2 bedroom apt...,,https://a0.muscache.com/pictures/2090980c-b68e...,7356,https://www.airbnb.com/users/show/7356,Garon,2009-02-03,"New York, NY","I am an artist(painter, filmmaker) and curato...",within an hour,100%,100%,t,https://a0.muscache.com/im/pictures/user/72a61...,https://a0.muscache.com/im/pictures/user/72a61...,Bedford-Stuyvesant,2.0,2.0,"['email', 'phone']",t,t,,Bedford-Stuyvesant,Brooklyn,40.68535,-73.95512,Private room in rental unit,Private room,2,,,1.0,1.0,"[""Heating"", ""Kitchen"", ""Air conditioning"", ""Wi...",$60.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,5,30,60,335,2022-09-07,50,0,0,2009-05-28,2019-12-02,4.52,4.22,4.09,4.91,4.91,4.47,4.52,,f,2,0,2,0,0.31
3,Beautiful Queens Brownstone! - 5BR,"<b>The space</b><br />Beautiful, fully furnish...",,https://a0.muscache.com/pictures/27117627/19ff...,204539,https://www.airbnb.com/users/show/204539,Mark,2010-08-17,"New York, NY",Father of two boys - 9 & 10.,within an hour,100%,19%,f,https://a0.muscache.com/im/users/204539/profil...,https://a0.muscache.com/im/users/204539/profil...,Ridgewood,7.0,7.0,"['email', 'phone']",t,t,,Ridgewood,Queens,40.70309,-73.89963,Entire townhouse,Entire home/apt,16,,2.5 baths,5.0,10.0,"[""Hair dryer"", ""Essentials"", ""Carbon monoxide ...",$425.00,30,730,30.0,30.0,730.0,730.0,30.0,730.0,,t,30,60,90,365,2022-09-07,13,0,0,2012-01-03,2019-11-12,4.42,4.64,4.36,4.82,5.0,4.82,4.55,,f,6,6,0,0,0.1
4,"Spacious Brooklyn Duplex, Patio + Garden",We welcome you to stay in our lovely 2 br dupl...,,https://a0.muscache.com/pictures/miso/Hosting-...,7378,https://www.airbnb.com/users/show/7378,Rebecca,2009-02-03,"New York, NY","Rebecca is an artist/designer, and Henoch is i...",,,33%,f,https://a0.muscache.com/im/users/7378/profile_...,https://a0.muscache.com/im/users/7378/profile_...,Greenwood Heights,1.0,5.0,"['email', 'phone']",t,t,,Sunset Park,Brooklyn,40.66265,-73.99454,Entire rental unit,Entire home/apt,4,,1.5 baths,2.0,2.0,"[""Hair dryer"", ""Cable TV"", ""Refrigerator"", ""BB...",$275.00,21,1125,21.0,21.0,1125.0,1125.0,21.0,1125.0,,t,0,0,0,179,2022-09-07,3,1,1,2014-01-02,2022-08-10,5.0,5.0,5.0,5.0,5.0,4.67,5.0,,f,1,1,0,0,0.03


In [None]:
# Print all column names.
col_names = listings.columns.to_list()
print('Column names: ', col_names, sep='\n', end='\n\n')

In [None]:
X = listings.drop(columns=['price'])
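
The rows displayed earlier show `price` stored as text such as `$299.00` and the host rates as text such as `100%`, so these columns need to become numeric before they can serve as a regression target or feature. The following is a minimal sketch on toy values copied from that display; `currency_to_float` and `percent_to_fraction` are hypothetical helpers, and the handling of thousands separators (for example `$1,200.00`) is an assumption, since no such value appears in the rows shown.

```python
import pandas as pd

# Toy rows in the same text format as the displayed listings.
toy = pd.DataFrame({
    'price': ['$299.00', '$175.00', '$60.00'],
    'host_response_rate': ['100%', '75%', None],
})

def currency_to_float(s):
    """Strip '$' and ',' then cast to float; unparseable entries become NaN."""
    return pd.to_numeric(s.str.replace(r'[$,]', '', regex=True), errors='coerce')

def percent_to_fraction(s):
    """Turn '75%' into 0.75; missing or unparseable entries become NaN."""
    return pd.to_numeric(s.str.rstrip('%'), errors='coerce') / 100.0

y_toy = currency_to_float(toy['price'])
rate_toy = percent_to_fraction(toy['host_response_rate'])
print(y_toy.tolist())
print(rate_toy.tolist())
```

Missing rates stay as `NaN` here rather than being filled in.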

In [None]:
listings.head(3)

## 3.2 Visual EDA