Problem: What are the biggest indicators of price for an Airbnb listing? How can we predict the price of an Airbnb listing?

In [1]:
#imports

In [1]:
import pandas as pd
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [2]:
pd.options.display.max_columns=300
pd.options.display.max_rows=5000

This first import loads the summary file, which gives the basic information for each Airbnb listing. We load it to see which data can be used and what it looks like, and to identify any columns that are entirely null and can be dropped at this stage.

In [3]:
listings_df=pd.read_csv('../edinburgh-inside-airbnb/listings-summary.csv')

In [4]:
listings_df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,15420,Georgian Boutique Apt City Centre,60423,Charlotte,,"Old Town, Princes Street and Leith Street",55.95689,-3.18768,Entire home/apt,80,3,283,2019-06-23,2.76,1,193
1,24288,"Cool central Loft, sleeps 4, 2 double bed+en-s...",46498,Gordon,,Meadows and Southside,55.94265,-3.18467,Entire home/apt,115,2,199,2019-06-19,1.86,1,4
2,38628,Edinburgh Holiday Let,165635,Trish,,Joppa,55.94308,-3.09525,Entire home/apt,46,4,52,2019-05-29,0.85,2,288
3,44552,Double room - spacious Leith flat,195950,Shaun,,South Leith,55.966,-3.17241,Private room,32,2,184,2019-06-04,1.71,1,136
4,47616,"City flat, close to nature and the Fringe",216203,Ben,,"Canongate, Southside and Dumbiedykes",55.94732,-3.17851,Private room,100,1,32,2019-05-26,0.84,1,0


In [5]:
listings_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13245 entries, 0 to 13244
Data columns (total 16 columns):
id                                13245 non-null int64
name                              13244 non-null object
host_id                           13245 non-null int64
host_name                         13226 non-null object
neighbourhood_group               0 non-null float64
neighbourhood                     13245 non-null object
latitude                          13245 non-null float64
longitude                         13245 non-null float64
room_type                         13245 non-null object
price                             13245 non-null int64
minimum_nights                    13245 non-null int64
number_of_reviews                 13245 non-null int64
last_review                       11213 non-null object
reviews_per_month                 11213 non-null float64
calculated_host_listings_count    13245 non-null int64
availability_365                  13245 non-null int64
dt

In [6]:
listings_df.isnull().sum()

id                                    0
name                                  1
host_id                               0
host_name                            19
neighbourhood_group               13245
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                        2032
reviews_per_month                  2032
calculated_host_listings_count        0
availability_365                      0
dtype: int64

In [7]:
listings_df.drop(columns='neighbourhood_group', inplace=True)  # dropped because the entire column is null

Since neighbourhood_group is null for all 13,245 listings, the entire column can be dropped.
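
An all-null column can also be removed without naming it. The following is a minimal sketch on a toy frame, not the notebook's own method; the `toy` data is made up for illustration, and `dropna(axis=1, how="all")` is the standard pandas call for dropping columns whose values are all missing.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for listings_df; the values are illustrative only.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "price": [80, 115, 46],
})

# Drop every column in which all values are null.
cleaned = toy.dropna(axis=1, how="all")
print(list(cleaned.columns))
```

This form keeps working if a later data release contains other fully empty columns.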

In [8]:
listings_df.isnull().sum()

id                                   0
name                                 1
host_id                              0
host_name                           19
neighbourhood                        0
latitude                             0
longitude                            0
room_type                            0
price                                0
minimum_nights                       0
number_of_reviews                    0
last_review                       2032
reviews_per_month                 2032
calculated_host_listings_count       0
availability_365                     0
dtype: int64

All other null values will be zeroed to ensure that the data can be read appropriately. Note that this also places a 0 in text and date columns such as name, host_name and last_review.

In [10]:
listings_df.replace(np.nan,0,inplace=True) #nulls are zeroed to show missing data
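
Replacing every null with 0 is the approach used here. As a hedged alternative, the sketch below fills each column with a value that matches its type. The `toy` frame and the `fill_values` mapping are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of gaps as listings_df (illustrative values).
toy = pd.DataFrame({
    "name": ["Flat A", np.nan],
    "last_review": ["2019-06-23", np.nan],
    "reviews_per_month": [2.76, np.nan],
})

# Hypothetical per-column fill values: empty text for names, 0 for rates.
fill_values = {"name": "", "reviews_per_month": 0.0}
filled = toy.fillna(value=fill_values)

# last_review is not filled; parsing it as a date turns the gap into NaT.
filled["last_review"] = pd.to_datetime(filled["last_review"])
print(filled.dtypes)
```

Keeping the date column as a real datetime means a missing review stays recognisably missing instead of becoming the number 0.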

In [11]:
listings_df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,15420,Georgian Boutique Apt City Centre,60423,Charlotte,"Old Town, Princes Street and Leith Street",55.95689,-3.18768,Entire home/apt,80,3,283,2019-06-23,2.76,1,193
1,24288,"Cool central Loft, sleeps 4, 2 double bed+en-s...",46498,Gordon,Meadows and Southside,55.94265,-3.18467,Entire home/apt,115,2,199,2019-06-19,1.86,1,4
2,38628,Edinburgh Holiday Let,165635,Trish,Joppa,55.94308,-3.09525,Entire home/apt,46,4,52,2019-05-29,0.85,2,288
3,44552,Double room - spacious Leith flat,195950,Shaun,South Leith,55.966,-3.17241,Private room,32,2,184,2019-06-04,1.71,1,136
4,47616,"City flat, close to nature and the Fringe",216203,Ben,"Canongate, Southside and Dumbiedykes",55.94732,-3.17851,Private room,100,1,32,2019-05-26,0.84,1,0


In [12]:
listings_all_df=pd.read_csv('../edinburgh-inside-airbnb/listings.csv')



In [13]:
listings_all_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13245 entries, 0 to 13244
Columns: 106 entries, id to reviews_per_month
dtypes: float64(24), int64(21), object(61)
memory usage: 10.7+ MB


In [14]:
listings_all_df.describe()

Unnamed: 0,id,scrape_id,thumbnail_url,medium_url,xl_picture_url,host_id,host_acceptance_rate,host_listings_count,host_total_listings_count,neighbourhood_group_cleansed,latitude,longitude,accommodates,bathrooms,bedrooms,beds,square_feet,guests_included,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,jurisdiction_names,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
count,13245.0,13245.0,0.0,0.0,0.0,13245.0,0.0,13226.0,13226.0,0.0,13245.0,13245.0,13245.0,13233.0,13241.0,13230.0,25.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,13245.0,11068.0,11063.0,11064.0,11059.0,11063.0,11056.0,11057.0,0.0,13245.0,13245.0,13245.0,13245.0,11213.0
mean,20077240.0,20190630000000.0,,,,78955190.0,,7.857856,7.857856,,55.950495,-3.198244,3.541185,1.226026,1.582735,2.032124,667.92,1.768139,2.890449,608.949943,2.731672,3.556663,604.539751,608.2604,3.056837,606.192269,7.563911,15.257984,26.832088,98.166478,37.725255,15.387844,95.024666,9.756305,9.589027,9.821593,9.832324,9.609171,9.48874,,5.44832,4.501397,0.910834,0.036089,1.93413
std,9844764.0,4.882997,,,,73674640.0,,41.214293,41.214293,,0.016005,0.037258,2.093676,0.545397,0.927004,1.579724,467.80052,1.504623,15.263121,542.925015,15.141566,15.952812,543.214251,542.420322,15.426088,542.487832,9.324226,16.985193,26.203676,111.798509,63.908517,22.853817,6.76945,0.662215,0.791137,0.576645,0.560235,0.65785,0.75456,,16.770714,16.639504,1.869773,0.517057,2.107862
min,15420.0,20190630000000.0,,,,33078.0,,0.0,0.0,,55.86454,-3.41834,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0,2.0,2.0,2.0,2.0,2.0,2.0,,1.0,0.0,0.0,0.0,0.01
25%,13279110.0,20190630000000.0,,,,19151740.0,,1.0,1.0,,55.94056,-3.21376,2.0,1.0,1.0,1.0,484.0,1.0,1.0,30.0,1.0,2.0,29.0,30.0,1.2,30.0,0.0,0.0,0.0,0.0,2.0,0.0,93.0,10.0,9.0,10.0,10.0,9.0,9.0,,1.0,0.0,0.0,0.0,0.37
50%,20171840.0,20190630000000.0,,,,51407140.0,,1.0,1.0,,55.95092,-3.19396,3.0,1.0,1.0,2.0,635.0,1.0,2.0,1124.0,2.0,2.0,1124.0,1124.0,2.0,1124.0,3.0,10.0,22.0,50.0,12.0,5.0,97.0,10.0,10.0,10.0,10.0,10.0,10.0,,1.0,1.0,0.0,0.0,1.11
75%,27397920.0,20190630000000.0,,,,129196800.0,,3.0,3.0,,55.96086,-3.17735,4.0,1.0,2.0,3.0,915.0,2.0,2.0,1125.0,2.0,3.0,1125.0,1125.0,2.9,1125.0,13.0,25.0,45.0,161.0,45.0,21.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,,3.0,1.0,1.0,0.0,2.89
max,36066010.0,20190630000000.0,,,,271101400.0,,1067.0,1067.0,,55.99176,-3.07895,19.0,9.0,13.0,30.0,1630.0,16.0,1000.0,3600.0,1000.0,1000.0,3600.0,3600.0,1000.0,3600.0,30.0,60.0,90.0,365.0,773.0,185.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0,,135.0,135.0,21.0,12.0,19.15


In [15]:
listings_all_df.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,15420,https://www.airbnb.com/rooms/15420,20190625184115,2019-06-25,Georgian Boutique Apt City Centre,"Stunning, impeccably refurbished spacious grou...",This is a huge and luxurious apartment for 2 -...,"Stunning, impeccably refurbished spacious grou...",none,"The neighbourhood is in the historic New Town,...",Please note that because of my interest in int...,It is easy to walk to many of the main tourist...,Guests have full access at the apartment. All...,Guests will be sent full details of what is su...,The apartment is strictly non-smoking and we r...,,,https://a0.muscache.com/im/pictures/cf69631f-4...,,60423,https://www.airbnb.com/users/show/60423,Charlotte,2009-12-06,"Edinburgh, Scotland, United Kingdom","I have a background in property, having worked...",within a few hours,100%,,t,https://a0.muscache.com/im/users/60423/profile...,https://a0.muscache.com/im/users/60423/profile...,,3.0,3.0,"['email', 'phone', 'manual_online', 'reviews',...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",New Town,"Old Town, Princes Street and Leith Street",,Edinburgh,City of Edinburgh,EH1 3LD,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.95689,-3.18768,f,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,""Wheelchair accessible"",Kitc...",861.0,$80.00,,,$200.00,$40.00,1,$0.00,3,30,1,3,30,30,2.9,30.0,today,t,6,12,22,193,2019-06-25,283,60,2011-01-18,2019-06-23,99.0,10.0,10.0,10.0,10.0,10.0,10.0,f,,,f,f,strict_14_with_grace_period,f,f,1,1,0,0,2.76
1,24288,https://www.airbnb.com/rooms/24288,20190625184115,2019-06-25,"Cool central Loft, sleeps 4, 2 double bed+en-s...",Boho rustic-chic former warehouse Loft located...,"Two bedroom, very central Loft apartment with ...",Boho rustic-chic former warehouse Loft located...,none,It's all in the mix: Culture-museums and galle...,The apartment is in the City centre so being a...,Walk to key central attractions or catch a bus...,The whole flat on the first floor and utility ...,Will meet guests on arrival and at check-out i...,Non smokers only and no smoking in the buildin...,,,https://a0.muscache.com/im/pictures/3460007/88...,,46498,https://www.airbnb.com/users/show/46498,Gordon,2009-10-17,"Edinburgh, Scotland, United Kingdom",Principal\r\nStudio DuB\r\nArchitecture & Plan...,within an hour,100%,,t,https://a0.muscache.com/im/users/46498/profile...,https://a0.muscache.com/im/users/46498/profile...,Southside,1.0,1.0,"['email', 'phone', 'reviews', 'offline_governm...",t,f,"Edinburgh, EH8 9JW, United Kingdom",Southside,Meadows and Southside,,Edinburgh,EH8 9JW,Scotland,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94265,-3.18467,t,Loft,Entire home/apt,4,1.5,2.0,2.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",,$115.00,,,$250.00,$30.00,4,$25.00,2,365,2,2,365,365,2.0,365.0,a week ago,t,1,3,4,4,2019-06-25,199,45,2010-09-19,2019-06-19,92.0,10.0,9.0,10.0,10.0,10.0,9.0,f,,,t,f,flexible,f,f,1,1,0,0,1.86
2,38628,https://www.airbnb.com/rooms/38628,20190625184115,2019-06-26,Edinburgh Holiday Let,Brunstane - Daiches Braes (close to Portobello...,Check out (Website hidden by Airbnb) Free Wi-...,Brunstane - Daiches Braes (close to Portobello...,none,Quiet and easy access to outside.,1.Brunstane -Daiches Braes From the Airpor (W...,From the Airpor (Website hidden by Airbnb) Ge...,The owner will meet the guests on arrival if p...,On arrival - I live upstairs and can help gues...,No smoking in the apt- smoking permitted on t...,,,https://a0.muscache.com/im/pictures/d9885120-1...,,165635,https://www.airbnb.com/users/show/165635,Trish,2010-07-13,"Edinburgh, Scotland, United Kingdom",Hi \r\nI like travelling and housing projects ...,within an hour,100%,,f,https://a0.muscache.com/im/users/165635/profil...,https://a0.muscache.com/im/users/165635/profil...,,2.0,2.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",,Joppa,,Edinburgh,City of Edinburgh,EH15 2,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94308,-3.09525,t,Apartment,Entire home/apt,2,1.0,0.0,2.0,Real Bed,"{TV,Internet,Wifi,""Wheelchair accessible"",Kitc...",,$46.00,$280.00,$800.00,$100.00,,2,$10.00,4,60,4,6,60,60,4.2,60.0,today,t,9,29,31,288,2019-06-26,52,18,2014-06-13,2019-05-29,94.0,10.0,9.0,10.0,10.0,10.0,10.0,f,,,t,f,strict_14_with_grace_period,f,f,2,2,0,0,0.85
3,44552,https://www.airbnb.com/rooms/44552,20190625184115,2019-06-25,Double room - spacious Leith flat,Pleasant double room in 2 bedroom ground floor...,You will be staying in a pleasant double room ...,Pleasant double room in 2 bedroom ground floor...,none,,,There are frequent buses to the centre and oth...,"Bedroom, shared bathroom, shared kitchen, and ...",I am normally around to help with any question...,Please bring one piece of photo ID when you ch...,,,https://a0.muscache.com/im/pictures/454814/0e3...,,195950,https://www.airbnb.com/users/show/195950,Shaun,2010-08-09,"Edinburgh, Scotland, United Kingdom",Having travelled extensively and seen some gre...,within a few hours,100%,,t,https://a0.muscache.com/im/users/195950/profil...,https://a0.muscache.com/im/users/195950/profil...,,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",Leith,South Leith,,Edinburgh,City of Edinburgh,EH6 8,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.966,-3.17241,t,Condominium,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",,$32.00,$230.00,$700.00,$0.00,$10.00,2,$0.00,2,21,2,2,21,21,2.0,21.0,3 days ago,t,18,25,45,136,2019-06-25,184,29,2010-08-16,2019-06-04,93.0,10.0,10.0,10.0,10.0,9.0,10.0,f,,,f,f,strict_14_with_grace_period,f,f,1,0,1,0,1.71
4,47616,https://www.airbnb.com/rooms/47616,20190625184115,2019-06-25,"City flat, close to nature and the Fringe",Annemarie & I would like to welcome you to our...,The flat has two floors with bedrooms and bath...,Annemarie & I would like to welcome you to our...,none,We're at the quiet end of a residential street...,"As we are both working office-hours, arrivals ...",There are 30 bus routes within 5 minutes' walk...,You will be sharing the flat with us and will ...,,Please enjoy your stay and help others to do t...,,,https://a0.muscache.com/im/pictures/49770983/3...,,216203,https://www.airbnb.com/users/show/216203,Ben,2010-08-29,"Edinburgh, Scotland, United Kingdom",I'm from Australia and live in Edinburgh with ...,within an hour,100%,,f,https://a0.muscache.com/im/pictures/7c0abffb-b...,https://a0.muscache.com/im/pictures/7c0abffb-b...,Southside,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",Southside,"Canongate, Southside and Dumbiedykes",,Edinburgh,City of Edinburgh,EH8 9,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94732,-3.17851,t,Condominium,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",H...",,$100.00,,,$75.00,$10.00,2,$0.00,1,31,1,1,31,31,1.0,31.0,2 months ago,t,0,0,0,0,2019-06-25,32,9,2016-05-08,2019-05-26,98.0,10.0,10.0,10.0,10.0,10.0,10.0,f,,,f,f,moderate,f,f,1,0,1,0,0.84


In [16]:
listings_all_df.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       ...
       'instant_bookable', 'is_business_travel_ready', 'cancellation_policy',
       'require_guest_profile_picture', 'require_guest_phone_verification',
       'calculated_host_listings_count',
       'calculated_host_listings_count_entire_homes',
       'calculated_host_listings_count_private_rooms',
       'calculated_host_listings_count_shared_rooms', 'reviews_per_month'],
      dtype='object', length=106)

In [17]:
listings_all_df.isna().sum().sort_values(ascending=False)

jurisdiction_names                              13245
neighbourhood_group_cleansed                    13245
xl_picture_url                                  13245
host_acceptance_rate                            13245
thumbnail_url                                   13245
medium_url                                      13245
square_feet                                     13220
license                                         13207
monthly_price                                   12457
weekly_price                                    12069
notes                                            7079
host_neighbourhood                               5687
host_about                                       5609
house_rules                                      5092
security_deposit                                 4743
access                                           4187
neighborhood_overview                            3773
interaction                                      3746
cleaning_fee                

In [18]:
listings_all_df.drop(columns=['jurisdiction_names','neighbourhood_group_cleansed','xl_picture_url','host_acceptance_rate','thumbnail_url',
                             'medium_url','square_feet','license','monthly_price','weekly_price'],inplace=True)
# columns dropped because 90% to 100% of their values are missing
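
The list of columns above was typed by hand from the null counts. A minimal sketch of deriving it from a missing-value share instead is shown below; the `toy` frame and the 0.9 `threshold` are assumptions chosen to mirror the "90% to 100%" rule in the comment.

```python
import numpy as np
import pandas as pd

# Toy frame: square_feet is 95% missing, cleaning_fee is 30% missing.
toy = pd.DataFrame({
    "id": range(20),
    "square_feet": [861.0] + [np.nan] * 19,
    "cleaning_fee": [40.0] * 14 + [np.nan] * 6,
})

threshold = 0.9  # assumed cutoff: drop columns with at least 90% nulls

# isna().mean() gives the fraction of missing values per column.
missing_share = toy.isna().mean()
to_drop = missing_share[missing_share >= threshold].index.tolist()

reduced = toy.drop(columns=to_drop)
print(to_drop)
```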

In [19]:
listings_all_df.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       'notes', 'transit', 'access', 'interaction', 'house_rules',
       'picture_url', 'host_id', 'host_url', 'host_name', 'host_since',
       'host_location', 'host_about', 'host_response_time',
       'host_response_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'street',
       'neighbourhood', 'neighbourhood_cleansed', 'city', 'state', 'zipcode',
       'market', 'smart_location', 'country_code', 'country', 'latitude',
       'longitude', 'is_location_exact', 'property_type', 'room_type',
       'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type',
       'amenities', 'price', 'security_deposit', 'cleaning_fee',
       'guests_includ

In [20]:
listings_info_all=pd.merge(listings_df,listings_all_df, on='id')

In [21]:
listings_info_all.head()

Unnamed: 0,id,name_x,host_id_x,host_name_x,neighbourhood_x,latitude_x,longitude_x,room_type_x,price_x,minimum_nights_x,number_of_reviews_x,last_review_x,reviews_per_month_x,calculated_host_listings_count_x,availability_365_x,listing_url,scrape_id,last_scraped,name_y,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,picture_url,host_id_y,host_url,host_name_y,host_since,host_location,host_about,host_response_time,host_response_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood_y,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude_y,longitude_y,is_location_exact,property_type,room_type_y,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price_y,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights_y,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365_y,calendar_last_scraped,number_of_reviews_y,number_of_reviews_ltm,first_review,last_review_y,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count_y,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month_y
0,15420,Georgian Boutique Apt City Centre,60423,Charlotte,"Old Town, Princes Street and Leith Street",55.95689,-3.18768,Entire home/apt,80,3,283,2019-06-23,2.76,1,193,https://www.airbnb.com/rooms/15420,20190625184115,2019-06-25,Georgian Boutique Apt City Centre,"Stunning, impeccably refurbished spacious grou...",This is a huge and luxurious apartment for 2 -...,"Stunning, impeccably refurbished spacious grou...",none,"The neighbourhood is in the historic New Town,...",Please note that because of my interest in int...,It is easy to walk to many of the main tourist...,Guests have full access at the apartment. All...,Guests will be sent full details of what is su...,The apartment is strictly non-smoking and we r...,https://a0.muscache.com/im/pictures/cf69631f-4...,60423,https://www.airbnb.com/users/show/60423,Charlotte,2009-12-06,"Edinburgh, Scotland, United Kingdom","I have a background in property, having worked...",within a few hours,100%,t,https://a0.muscache.com/im/users/60423/profile...,https://a0.muscache.com/im/users/60423/profile...,,3.0,3.0,"['email', 'phone', 'manual_online', 'reviews',...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",New Town,"Old Town, Princes Street and Leith Street",Edinburgh,City of Edinburgh,EH1 3LD,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.95689,-3.18768,f,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,""Wheelchair accessible"",Kitc...",$80.00,$200.00,$40.00,1,$0.00,3,30,1,3,30,30,2.9,30.0,today,t,6,12,22,193,2019-06-25,283,60,2011-01-18,2019-06-23,99.0,10.0,10.0,10.0,10.0,10.0,10.0,f,f,f,strict_14_with_grace_period,f,f,1,1,0,0,2.76
1,24288,"Cool central Loft, sleeps 4, 2 double bed+en-s...",46498,Gordon,Meadows and Southside,55.94265,-3.18467,Entire home/apt,115,2,199,2019-06-19,1.86,1,4,https://www.airbnb.com/rooms/24288,20190625184115,2019-06-25,"Cool central Loft, sleeps 4, 2 double bed+en-s...",Boho rustic-chic former warehouse Loft located...,"Two bedroom, very central Loft apartment with ...",Boho rustic-chic former warehouse Loft located...,none,It's all in the mix: Culture-museums and galle...,The apartment is in the City centre so being a...,Walk to key central attractions or catch a bus...,The whole flat on the first floor and utility ...,Will meet guests on arrival and at check-out i...,Non smokers only and no smoking in the buildin...,https://a0.muscache.com/im/pictures/3460007/88...,46498,https://www.airbnb.com/users/show/46498,Gordon,2009-10-17,"Edinburgh, Scotland, United Kingdom",Principal\r\nStudio DuB\r\nArchitecture & Plan...,within an hour,100%,t,https://a0.muscache.com/im/users/46498/profile...,https://a0.muscache.com/im/users/46498/profile...,Southside,1.0,1.0,"['email', 'phone', 'reviews', 'offline_governm...",t,f,"Edinburgh, EH8 9JW, United Kingdom",Southside,Meadows and Southside,Edinburgh,EH8 9JW,Scotland,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94265,-3.18467,t,Loft,Entire home/apt,4,1.5,2.0,2.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",$115.00,$250.00,$30.00,4,$25.00,2,365,2,2,365,365,2.0,365.0,a week ago,t,1,3,4,4,2019-06-25,199,45,2010-09-19,2019-06-19,92.0,10.0,9.0,10.0,10.0,10.0,9.0,f,t,f,flexible,f,f,1,1,0,0,1.86
2,38628,Edinburgh Holiday Let,165635,Trish,Joppa,55.94308,-3.09525,Entire home/apt,46,4,52,2019-05-29,0.85,2,288,https://www.airbnb.com/rooms/38628,20190625184115,2019-06-26,Edinburgh Holiday Let,Brunstane - Daiches Braes (close to Portobello...,Check out (Website hidden by Airbnb) Free Wi-...,Brunstane - Daiches Braes (close to Portobello...,none,Quiet and easy access to outside.,1.Brunstane -Daiches Braes From the Airpor (W...,From the Airpor (Website hidden by Airbnb) Ge...,The owner will meet the guests on arrival if p...,On arrival - I live upstairs and can help gues...,No smoking in the apt- smoking permitted on t...,https://a0.muscache.com/im/pictures/d9885120-1...,165635,https://www.airbnb.com/users/show/165635,Trish,2010-07-13,"Edinburgh, Scotland, United Kingdom",Hi \r\nI like travelling and housing projects ...,within an hour,100%,f,https://a0.muscache.com/im/users/165635/profil...,https://a0.muscache.com/im/users/165635/profil...,,2.0,2.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",,Joppa,Edinburgh,City of Edinburgh,EH15 2,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94308,-3.09525,t,Apartment,Entire home/apt,2,1.0,0.0,2.0,Real Bed,"{TV,Internet,Wifi,""Wheelchair accessible"",Kitc...",$46.00,$100.00,,2,$10.00,4,60,4,6,60,60,4.2,60.0,today,t,9,29,31,288,2019-06-26,52,18,2014-06-13,2019-05-29,94.0,10.0,9.0,10.0,10.0,10.0,10.0,f,t,f,strict_14_with_grace_period,f,f,2,2,0,0,0.85
3,44552,Double room - spacious Leith flat,195950,Shaun,South Leith,55.966,-3.17241,Private room,32,2,184,2019-06-04,1.71,1,136,https://www.airbnb.com/rooms/44552,20190625184115,2019-06-25,Double room - spacious Leith flat,Pleasant double room in 2 bedroom ground floor...,You will be staying in a pleasant double room ...,Pleasant double room in 2 bedroom ground floor...,none,,,There are frequent buses to the centre and oth...,"Bedroom, shared bathroom, shared kitchen, and ...",I am normally around to help with any question...,Please bring one piece of photo ID when you ch...,https://a0.muscache.com/im/pictures/454814/0e3...,195950,https://www.airbnb.com/users/show/195950,Shaun,2010-08-09,"Edinburgh, Scotland, United Kingdom",Having travelled extensively and seen some gre...,within a few hours,100%,t,https://a0.muscache.com/im/users/195950/profil...,https://a0.muscache.com/im/users/195950/profil...,,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",Leith,South Leith,Edinburgh,City of Edinburgh,EH6 8,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.966,-3.17241,t,Condominium,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",$32.00,$0.00,$10.00,2,$0.00,2,21,2,2,21,21,2.0,21.0,3 days ago,t,18,25,45,136,2019-06-25,184,29,2010-08-16,2019-06-04,93.0,10.0,10.0,10.0,10.0,9.0,10.0,f,f,f,strict_14_with_grace_period,f,f,1,0,1,0,1.71
4,47616,"City flat, close to nature and the Fringe",216203,Ben,"Canongate, Southside and Dumbiedykes",55.94732,-3.17851,Private room,100,1,32,2019-05-26,0.84,1,0,https://www.airbnb.com/rooms/47616,20190625184115,2019-06-25,"City flat, close to nature and the Fringe",Annemarie & I would like to welcome you to our...,The flat has two floors with bedrooms and bath...,Annemarie & I would like to welcome you to our...,none,We're at the quiet end of a residential street...,"As we are both working office-hours, arrivals ...",There are 30 bus routes within 5 minutes' walk...,You will be sharing the flat with us and will ...,,Please enjoy your stay and help others to do t...,https://a0.muscache.com/im/pictures/49770983/3...,216203,https://www.airbnb.com/users/show/216203,Ben,2010-08-29,"Edinburgh, Scotland, United Kingdom",I'm from Australia and live in Edinburgh with ...,within an hour,100%,f,https://a0.muscache.com/im/pictures/7c0abffb-b...,https://a0.muscache.com/im/pictures/7c0abffb-b...,Southside,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"Edinburgh, City of Edinburgh, United Kingdom",Southside,"Canongate, Southside and Dumbiedykes",Edinburgh,City of Edinburgh,EH8 9,Edinburgh,"Edinburgh, United Kingdom",GB,United Kingdom,55.94732,-3.17851,t,Condominium,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",H...",$100.00,$75.00,$10.00,2,$0.00,1,31,1,1,31,31,1.0,31.0,2 months ago,t,0,0,0,0,2019-06-25,32,9,2016-05-08,2019-05-26,98.0,10.0,10.0,10.0,10.0,10.0,10.0,f,f,f,moderate,f,f,1,0,1,0,0.84


In [22]:
print(listings_info_all.columns)

Index(['id', 'name_x', 'host_id_x', 'host_name_x', 'neighbourhood_x',
       'latitude_x', 'longitude_x', 'room_type_x', 'price_x',
       'minimum_nights_x',
       ...
       'instant_bookable', 'is_business_travel_ready', 'cancellation_policy',
       'require_guest_profile_picture', 'require_guest_phone_verification',
       'calculated_host_listings_count_y',
       'calculated_host_listings_count_entire_homes',
       'calculated_host_listings_count_private_rooms',
       'calculated_host_listings_count_shared_rooms', 'reviews_per_month_y'],
      dtype='object', length=110)


In [23]:
listings_info_all.drop(columns=['name_y','host_id_y','host_name_y','minimum_nights_y','availability_365_y','number_of_reviews_y',
                               'reviews_per_month_y','latitude_y','longitude_y','neighbourhood_y',
                                'room_type_y','price_y','last_review_y',
                                'calculated_host_listings_count_y'],inplace=True) #drop duplicate columns
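
The `_x`/`_y` pairs appear because both frames share many column names. One way to avoid creating them, sketched here on toy data rather than taken from the notebook, is to select only the columns the summary frame lacks before merging. The frames `summary` and `detail` are hypothetical stand-ins for `listings_df` and `listings_all_df`.

```python
import pandas as pd

# Hypothetical stand-ins for listings_df and listings_all_df.
summary = pd.DataFrame({"id": [1, 2], "name": ["A", "B"], "price": [80, 115]})
detail = pd.DataFrame({
    "id": [1, 2],
    "name": ["A", "B"],
    "price": ["$80.00", "$115.00"],
    "bedrooms": [1.0, 2.0],
})

# Keep the join key plus only the detail columns missing from the summary.
extra_cols = ["id"] + [c for c in detail.columns if c not in summary.columns]

# validate="one_to_one" raises if an id is repeated in either frame.
merged = pd.merge(summary, detail[extra_cols], on="id", validate="one_to_one")
print(list(merged.columns))
```

With this approach the shared columns keep their original names, whereas the cell above leaves the retained copies with an `_x` suffix.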

In [27]:
listings_info_all.isnull().sum().sort_values(ascending=False) #check for nulls; columns with a large majority of nulls will be deleted

notes                                           7079
host_neighbourhood                              5687
host_about                                      5609
house_rules                                     5092
security_deposit                                4743
access                                          4187
neighborhood_overview                           3773
interaction                                     3746
cleaning_fee                                    3741
transit                                         3559
host_response_rate                              3389
host_response_time                              3389
space                                           2905
state                                           2550
review_scores_location                          2189
review_scores_value                             2188
review_scores_checkin                           2186
review_scores_accuracy                          2182
review_scores_communication                   
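
The comment on the cell above suggests that mostly-null columns are candidates for deletion. One way to select them is by the fraction of missing values. This is only a sketch: the 0.5 threshold and the toy frame are assumptions, not the rule used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the 0.5 threshold below is an assumption.
toy = pd.DataFrame({
    "notes": [np.nan, np.nan, np.nan, "quiet street"],
    "cleaning_fee": ["$10.00", np.nan, "$20.00", "$15.00"],
    "price": [80, 115, 46, 32],
})

# Share of missing values per column (mean of a boolean mask).
null_fraction = toy.isnull().mean()

# Columns where more than half of the rows are missing.
mostly_null = null_fraction[null_fraction > 0.5].index.tolist()

trimmed = toy.drop(columns=mostly_null)
print(mostly_null)
```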

In [28]:
listings_info_all.replace(np.nan,0,inplace=True)

In [29]:
neighborhood_group_df=pd.read_csv('../edinburgh-inside-airbnb/neighbourhoods.csv')

In [30]:
neighborhood_group_df.neighbourhood_group.isnull().sum()

111

In [31]:
neighborhood_group_df.describe()

Unnamed: 0,neighbourhood_group
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


The neighbourhood group data frame can be disregarded: its neighbourhood_group column is entirely null (all 111 rows), and the neighbourhood names are already present in the listings data.

In [None]:
reviews_sum_df=pd.read_csv('../edinburgh-inside-airbnb/reviews-summary.csv')

In [None]:
reviews_sum_df.head()

The reviews summary is disregarded because its data is already contained in reviews.csv.

In [32]:
reviews_df=pd.read_csv('../edinburgh-inside-airbnb/reviews.csv')

In [33]:
reviews_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,15420,171793,2011-01-18,186358,Nels,My wife and I stayed at this beautiful apartme...
1,15420,176350,2011-01-31,95218,Gareth,Charlotte couldn't have been a more thoughtful...
2,15420,232149,2011-04-19,429751,Guido,I went to Edinburgh for the second time on Apr...
3,15420,236073,2011-04-23,420830,Mariah,This flat was incredible. As other guests have...
4,15420,263713,2011-05-15,203827,Linda,Fantastic host and the apartment was perfect. ...


In [34]:
reviews_df.comments=reviews_df.comments.astype(str) #cast as string for sentiment analysis

In [35]:
corpus = list(reviews_df['comments']) #move all into a list for sentiment analysis
corpus[0]

"My wife and I stayed at this beautiful apartment and our stay was spectacular.  The neighborhood is very cute.  We stayed for a full week and enjoyed going to all of the local cafes and restaurants.  We really enjoyed being within easy walking distance of all of the major tourist sites while not being surrounded by tourist traps.  I have recommended this apartment to my wife's parents who are thinking about visiting Edinburgh in the future. Charlotte was extremely helpful in getting us information about the next leg of our trip (visiting the Isle of Mull) and in making us feel welcomed.  My wife and I have visited four Airbnb apartments so far and this one is the best."

In [36]:
sia=SentimentIntensityAnalyzer() #sentiment analysis instantiation
sia.polarity_scores(corpus[0])

{'neg': 0.0, 'neu': 0.786, 'pos': 0.214, 'compound': 0.9816}

Below, a for loop scores each review for negative, neutral, positive, and compound sentiment. These scores are later averaged per listing to give an NPS-style sentiment score.

In [37]:
all_scores=[] #one dict of VADER scores per review
for comment in corpus:
    all_scores.append(sia.polarity_scores(comment))
scores=pd.DataFrame(all_scores) #columns: neg, neu, pos, compound
scores.tail()

Unnamed: 0,neg,neu,pos,compound
499666,0.0,0.657,0.343,0.8553
499667,0.0,0.33,0.67,0.9042
499668,0.0,0.951,0.049,0.4215
499669,0.0,1.0,0.0,0.0
499670,0.0,0.508,0.492,0.927
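
The loop above relies on the scores frame and the reviews frame sharing the same default index, because the later column-wise concat aligns rows by index. A hedged sketch of the same scoring step that carries the index across explicitly is shown here. `score_comments` and `toy_scorer` are hypothetical names; in this notebook the scorer would be `sia.polarity_scores`.

```python
import pandas as pd

def score_comments(comments, scorer):
    """Apply scorer (a function returning a dict of scores) to each comment."""
    records = [scorer(text) for text in comments]
    # Reuse the comments' index so a later pd.concat(axis=1) lines up row by row.
    return pd.DataFrame(records, index=comments.index)

# Hypothetical stand-in scorer, used only so the sketch runs without NLTK data.
def toy_scorer(text):
    hits = text.lower().count("great")
    return {"pos": float(hits), "compound": min(1.0, 0.5 * hits)}

# A non-default index, as would result from filtering reviews_df beforehand.
comments = pd.Series(["Great flat, great host", "It was fine"], index=[10, 11])
toy_scores = score_comments(comments, toy_scorer)
print(toy_scores)
```

If the reviews were filtered before scoring, a frame built without the index would silently misalign in the concat, which is why the index is passed through here.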


The reviews data frame and the sentiment scores are concatenated column-wise so that the scores can then be averaged for each listing.

In [38]:
reviews_info=pd.concat([reviews_df,scores],axis=1)
reviews_info.head() #create dataframe for reviews

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments,neg,neu,pos,compound
0,15420,171793,2011-01-18,186358,Nels,My wife and I stayed at this beautiful apartme...,0.0,0.786,0.214,0.9816
1,15420,176350,2011-01-31,95218,Gareth,Charlotte couldn't have been a more thoughtful...,0.0,0.835,0.165,0.8583
2,15420,232149,2011-04-19,429751,Guido,I went to Edinburgh for the second time on Apr...,0.0,0.684,0.316,0.9886
3,15420,236073,2011-04-23,420830,Mariah,This flat was incredible. As other guests have...,0.0,0.907,0.093,0.8385
4,15420,263713,2011-05-15,203827,Linda,Fantastic host and the apartment was perfect. ...,0.0,0.513,0.487,0.9637


In [39]:
review_scores=reviews_info.groupby(by='listing_id',as_index=False).mean(numeric_only=True) #average numeric columns only; text columns cannot be averaged
review_scores.head()

Unnamed: 0,listing_id,id,reviewer_id,neg,neu,pos,compound
0,15420,148706100.0,44070590.0,0.010307,0.676505,0.313194,0.910205
1,24288,143792700.0,44045580.0,0.01396,0.661819,0.324206,0.780775
2,38628,233678300.0,102155100.0,0.017538,0.742846,0.239596,0.6718
3,44552,91482860.0,32309300.0,0.013418,0.708614,0.277951,0.797253
4,47616,192696100.0,64302110.0,0.015125,0.659844,0.325063,0.814244
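
The cell above averages the raw VADER scores per listing, which is not a Net Promoter Score in the strict sense. If an NPS-style number were wanted, one possible sketch follows. It assumes the ±0.05 cutoffs on `compound` that are commonly suggested for VADER as the boundary between positive, neutral, and negative reviews. The function name, the threshold, and the toy data are all assumptions, not part of this notebook.

```python
import pandas as pd

# Hypothetical per-review compound scores for two listings.
toy_reviews = pd.DataFrame({
    "listing_id": [1, 1, 1, 1, 2, 2],
    "compound": [0.98, 0.86, -0.40, 0.00, 0.90, 0.70],
})

def nps_style_score(compound, threshold=0.05):
    """Percent of positive reviews minus percent of negative reviews."""
    promoters = (compound >= threshold).mean()    # share of positive reviews
    detractors = (compound <= -threshold).mean()  # share of negative reviews
    return 100 * (promoters - detractors)

# One NPS-style value per listing, on a scale from -100 to 100.
nps_by_listing = toy_reviews.groupby("listing_id")["compound"].apply(nps_style_score)
print(nps_by_listing)
```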


Superfluous columns are dropped prior to joining.

In [40]:
review_scores.drop(columns=['id','reviewer_id'],inplace=True) #drop unnecessary columns
review_scores.head()

Unnamed: 0,listing_id,neg,neu,pos,compound
0,15420,0.010307,0.676505,0.313194,0.910205
1,24288,0.01396,0.661819,0.324206,0.780775
2,38628,0.017538,0.742846,0.239596,0.6718
3,44552,0.013418,0.708614,0.277951,0.797253
4,47616,0.015125,0.659844,0.325063,0.814244


The final data frame is merged so that all data for EDA is in one place.

In [41]:
listing_with_reviews=listings_info_all.merge(review_scores,how='left',left_on='id',right_on='listing_id')
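
A left merge keeps listings that have no reviews, with nulls in the sentiment columns. The sketch below shows how the `validate` and `indicator` parameters of `DataFrame.merge` could be used to check the join. The toy frames and their values are hypothetical stand-ins for `listings_info_all` and `review_scores`.

```python
import pandas as pd

# Hypothetical stand-ins for listings_info_all and review_scores.
listings = pd.DataFrame({"id": [15420, 24288, 99999], "price_x": [80, 115, 60]})
scores = pd.DataFrame({"listing_id": [15420, 24288], "compound": [0.91, 0.78]})

merged = listings.merge(
    scores,
    how="left",
    left_on="id",
    right_on="listing_id",
    validate="one_to_one",  # raise if either key column contains duplicates
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Listings that found no matching row in the review scores.
unreviewed = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(unreviewed)
```

Counting the `left_only` rows gives a quick view of how many listings will carry missing sentiment values into the EDA.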

The calendar data is disregarded because its pricing is already present in the previous data frames.

In [42]:
calendar_df=pd.read_csv('../edinburgh-inside-airbnb/calendar.csv')
calendar_df.head() #disregard due to pricing in other dataframe

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,568026,2019-06-25,f,$64.00,$64.00,3.0,60.0
1,327159,2019-06-25,t,$113.00,$113.00,1.0,1.0
2,327159,2019-06-26,t,$113.00,$113.00,1.0,56.0
3,327159,2019-06-27,t,$113.00,$113.00,1.0,56.0
4,327159,2019-06-28,t,$113.00,$113.00,1.0,56.0


In [44]:
listing_with_reviews.to_csv('./Data/listing_with_reviews.csv')