In [1]:
import pandas as pd
import numpy as np

## Top Amenities on AirBnB listings
#### Goals for this step

- Find out which amenities are most common among listings with the highest review scores
- Find out the average review score of AirBnB listings that have fewer than 50% of the basic amenities


In [2]:
df = pd.read_pickle("C:/Users/Admin/Documents/ironhack/AirBnB_data/airbnb_amenities.pkl")

In [3]:
df.head()

amenity,wireless internet,kitchen,heating,essentials,washer,tv,smoke detector,internet,hangers,shampoo,...,cleaning before checkout,accessible-height toilet,handheld shower head,fireplace guards,baby monitor,hot water kettle,wide clearance to shower & toilet,firm mattress,review_score,price
0,1,0,1,1,0,1,0,1,0,1,...,0,0,0,0,0,0,0,0,90.0,74.0
1,1,1,1,1,0,0,0,1,1,0,...,0,0,0,0,0,0,0,0,87.0,55.0
2,1,1,1,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,100.0,993.0
3,1,1,1,1,1,1,1,1,1,1,...,0,0,0,0,0,0,0,0,100.0,697.0
4,1,1,1,1,0,0,1,1,0,1,...,0,0,0,0,0,0,0,0,94.0,424.0


## Find out top amenities

In [4]:
df["review_score"].describe()

count    379055.000000
mean         93.091855
std           8.374772
min          20.000000
25%          90.000000
50%          95.000000
75%         100.000000
max         100.000000
Name: review_score, dtype: float64

We will focus on review scores greater than or equal to 95, which is the median score. These will be our top reviews.

In [5]:
top_reviews = df.loc[df["review_score"] >= 95]

In [6]:
top_reviews.sum().sort_values(ascending=False)

amenity
price                                29850500.0
review_score                         20425141.0
wireless internet                      201326.0
heating                                192974.0
kitchen                                192957.0
                                        ...    
fireplace guards                          475.0
baby monitor                              471.0
hot water kettle                          370.0
wide clearance to shower & toilet         321.0
firm mattress                             281.0
Length: 102, dtype: float64
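Summing every column also totals `price` and `review_score`, which are not amenities, so they end up at the top of the ranking. A minimal sketch of one way to avoid this, assuming the same 0/1 amenity layout as above; the toy frame `toy_top` and the names `amenity_flags`, `amenity_counts` and `amenity_share` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Toy stand-in for top_reviews: 0/1 amenity flags plus the two numeric columns.
toy_top = pd.DataFrame({
    "wireless internet": [1, 1, 1, 0],
    "kitchen": [1, 0, 1, 1],
    "firm mattress": [0, 0, 1, 0],
    "review_score": [100.0, 97.0, 95.0, 98.0],
    "price": [993.0, 450.0, 95.0, 120.0],
})

# Keep only the amenity flags before aggregating.
amenity_flags = toy_top.drop(columns=["review_score", "price"])

# Number of listings offering each amenity, most common first.
amenity_counts = amenity_flags.sum().sort_values(ascending=False)

# The mean of a 0/1 column is the share of listings that offer the amenity.
amenity_share = amenity_flags.mean().sort_values(ascending=False)

print(amenity_counts)
print(amenity_share)
```

Shares are often easier to compare than raw counts, because they do not depend on how many listings pass the review score filter.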

When you enter a new listing on AirBnB, the site shows a checklist of amenities that guests usually expect. We will build a list of these basic amenities and remove them from the analysis to see which extra amenities are most common among listings with higher review scores.

In [7]:
amenities_top = list(top_reviews.columns)

In [8]:
amenities_top.sort()

In [39]:
top_reviews.sum().sort_values(ascending=False).head(50)

amenity
price                         29850500.0
review_score                  20425141.0
wireless internet               201326.0
heating                         192974.0
kitchen                         192957.0
essentials                      188023.0
washer                          158060.0
tv                              153821.0
smoke detector                  147089.0
hangers                         140932.0
shampoo                         139606.0
hair dryer                      132669.0
internet                        131895.0
iron                            130258.0
laptop friendly workspace       121962.0
family/kid friendly             119215.0
dryer                           100994.0
air conditioning                 92092.0
carbon monoxide detector         88061.0
first aid kit                    76658.0
fire extinguisher                76280.0
free parking on premises         68019.0
cable tv                         64438.0
buzzer/wireless intercom         59590.0
24-hour 

We will build the list of basic amenities from the AirBnB "basic" amenities checklist.
To match the amenity names in our data set, Wifi is mapped to "wireless internet", desk/workspace to "laptop friendly workspace" and closet/drawers to "hangers".

In [9]:
basic_amenities = [
    "essentials", "wireless internet", "internet", "tv", "heating", "air conditioning", "iron", "shampoo",
    "hair dryer", "breakfast", "laptop friendly workspace", "indoor fireplace", "hangers", "private entrance",
    "smoke detector", "carbon monoxide detector", "fire extinguisher", "first aid kit",
    "lock on bedroom door",
]


In [10]:
# Keep every column that is not one of the basic amenities
# (this also keeps review_score and price)
new_columns = [col for col in df.columns if col not in basic_amenities]


In [11]:
top_amenities_new = top_reviews[new_columns].reset_index(drop=True).copy()

top_amenities_new.head()

amenity,kitchen,washer,family/kid friendly,dryer,buzzer/wireless intercom,cable tv,free parking on premises,24-hour check-in,elevator in building,safety card,...,cleaning before checkout,accessible-height toilet,handheld shower head,fireplace guards,baby monitor,hot water kettle,wide clearance to shower & toilet,firm mattress,review_score,price
0,1,0,1,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,100.0,993.0
1,1,1,0,1,1,1,0,1,1,0,...,0,0,0,0,0,0,0,0,100.0,697.0
2,1,1,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,100.0,503.0
3,1,0,1,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,97.0,450.0
4,1,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,100.0,95.0


In [12]:
top_amenities_new.sum().head(10)

amenity
kitchen                     192957.0
washer                      158060.0
family/kid friendly         119215.0
dryer                       100994.0
buzzer/wireless intercom     59590.0
cable tv                     64438.0
free parking on premises     68019.0
24-hour check-in             56221.0
elevator in building         46873.0
safety card                  34405.0
dtype: float64

The list above shows the counts for the first ten non-basic amenities among listings with review scores of 95 or higher. Kitchen, washer and family/kid friendly are the most common.
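An amenity that is common among top-rated listings may simply be common everywhere. One way to check is to compare its share among top reviews with its share across all listings. The following is only a sketch on made-up data; `toy`, `amenity_cols` and `comparison` are hypothetical names, and the threshold of 95 is the one used earlier in this notebook:

```python
import pandas as pd

# Toy data: two amenity flags and a review score per listing.
toy = pd.DataFrame({
    "kitchen": [1, 1, 1, 1, 0, 1],
    "washer": [1, 1, 0, 0, 0, 0],
    "review_score": [100.0, 97.0, 95.0, 90.0, 85.0, 80.0],
})
amenity_cols = ["kitchen", "washer"]

# Same definition of a top review as above.
is_top = toy["review_score"] >= 95

# Share of listings offering each amenity, among top reviews and overall.
share_top = toy.loc[is_top, amenity_cols].mean()
share_all = toy[amenity_cols].mean()

comparison = pd.DataFrame({"share_top": share_top, "share_all": share_all})
comparison["difference"] = comparison["share_top"] - comparison["share_all"]

print(comparison.sort_values("difference", ascending=False))
```

A positive difference suggests the amenity is over-represented among top reviews. It does not show that the amenity causes higher scores.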

## Find out average review scores for listings with fewer than 50% of the basic amenities

In [13]:
df

amenity,wireless internet,kitchen,heating,essentials,washer,tv,smoke detector,internet,hangers,shampoo,...,cleaning before checkout,accessible-height toilet,handheld shower head,fireplace guards,baby monitor,hot water kettle,wide clearance to shower & toilet,firm mattress,review_score,price
0,1,0,1,1,0,1,0,1,0,1,...,0,0,0,0,0,0,0,0,90.0,74.0
1,1,1,1,1,0,0,0,1,1,0,...,0,0,0,0,0,0,0,0,87.0,55.0
2,1,1,1,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,100.0,993.0
3,1,1,1,1,1,1,1,1,1,1,...,0,0,0,0,0,0,0,0,100.0,697.0
4,1,1,1,1,0,0,1,1,0,1,...,0,0,0,0,0,0,0,0,94.0,424.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
379050,1,1,1,1,1,1,1,0,1,1,...,0,0,0,0,0,0,0,0,100.0,150.0
379051,1,1,1,1,1,1,0,1,0,0,...,0,0,0,0,0,0,0,0,96.0,35.0
379052,1,1,1,1,1,0,1,1,1,1,...,0,0,0,0,0,0,0,0,97.0,110.0
379053,1,1,1,1,1,1,1,0,1,0,...,0,0,0,0,0,0,0,0,94.0,69.0


In [14]:
df_basic_amenities = df[basic_amenities].copy()

In [15]:
df_basic_amenities["amenities_sum"] = df_basic_amenities.sum(axis=1)

In [16]:
# There are 19 basic amenities: flag listings that have at least 9 of them (roughly half)
df_basic_amenities["has_basic"] = np.where(df_basic_amenities["amenities_sum"] > 8, 1, 0)

In [17]:
df_basic_amenities

amenity,essentials,wireless internet,internet,tv,heating,air conditioning,iron,shampoo,hair dryer,breakfast,...,indoor fireplace,hangers,private entrance,smoke detector,carbon monoxide detector,fire extinguisher,first aid kit,lock on bedroom door,amenities_sum,has_basic
0,1,1,1,1,1,1,0,1,0,0,...,0,0,0,0,0,1,1,0,9,1
1,1,1,1,0,1,0,1,0,1,0,...,0,1,1,0,0,0,0,0,8,0
2,1,1,0,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,4,0
3,1,1,1,1,1,0,1,1,1,0,...,0,1,0,1,1,0,0,0,12,1
4,1,1,1,0,1,0,0,1,1,0,...,0,0,0,1,0,0,0,0,8,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
379050,1,1,0,1,1,1,0,1,1,0,...,0,1,0,1,1,1,1,1,13,1
379051,1,1,1,1,1,0,1,0,1,0,...,0,0,0,0,0,0,0,0,7,0
379052,1,1,1,0,1,0,1,1,0,0,...,1,1,0,1,1,1,0,0,11,1
379053,1,1,0,1,1,1,0,0,0,1,...,1,1,0,1,0,0,0,0,9,1
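The cutoff of 8 is hard-coded for a list of 19 basic amenities. A possible alternative, sketched here on a shortened and purely hypothetical list, is to derive the flag from the share of basic amenities a listing has. Note that a strict `share >= 0.5` rule is not identical to the `> 8` rule used above: with 19 amenities it would require at least 10 of them, not 9.

```python
import pandas as pd

# Shortened, hypothetical list of basic amenities for illustration only.
basic = ["essentials", "heating", "tv", "iron", "shampoo"]

toy = pd.DataFrame(
    [[1, 1, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [1, 1, 1, 1, 1]],
    columns=basic,
)

# Number of basic amenities per listing.
amenities_sum = toy[basic].sum(axis=1)

# Fraction of the basic list that each listing covers.
share = amenities_sum / len(basic)

# 1 if the listing has at least half of the basic amenities, else 0.
has_basic = (share >= 0.5).astype(int)

print(pd.DataFrame({"amenities_sum": amenities_sum, "share": share, "has_basic": has_basic}))
```

Deriving the flag from `len(basic)` keeps it consistent if the list of basic amenities changes later.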


In [18]:
df["has_basic"] = df_basic_amenities["has_basic"]
df["amenities_sum"] = df_basic_amenities["amenities_sum"]


In [19]:
df[df["has_basic"] == 0]["review_score"].describe()

count    144323.000000
mean         91.506336
std           9.568423
min          20.000000
25%          88.000000
50%          93.000000
75%         100.000000
max         100.000000
Name: review_score, dtype: float64
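The summary above covers only listings without the basic amenities. To put the two groups side by side, a `groupby` on the flag is one option. This is a sketch on toy values, not on the real data; `toy` and `summary` are illustrative names:

```python
import pandas as pd

# Toy data with the same two columns used in the cell above.
toy = pd.DataFrame({
    "has_basic": [0, 0, 1, 1, 1],
    "review_score": [88.0, 92.0, 95.0, 100.0, 96.0],
})

# One row per group: listings without (0) and with (1) the basic amenities.
summary = toy.groupby("has_basic")["review_score"].agg(["count", "mean", "median"])

print(summary)
```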

In [20]:
df_basic_amenities["amenities_sum"].value_counts().sort_index()

0      1170
1       773
2      2849
3      7570
4     15196
5     22296
6     27567
7     31784
8     35118
9     38067
10    38455
11    36896
12    34223
13    31678
14    24764
15    17711
16     9195
17     3025
18      653
19       65
Name: amenities_sum, dtype: int64

In [21]:
df[df["amenities_sum"] == 0]["review_score"].describe()

count    1170.000000
mean       90.967521
std        12.391765
min        20.000000
25%        86.000000
50%        95.000000
75%       100.000000
max       100.000000
Name: review_score, dtype: float64
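Checking a single value of `amenities_sum` at a time makes it hard to see a trend. A minimal sketch, assuming the `amenities_sum` and `review_score` columns built above, of how the mean score could be tabulated for every count at once (toy values, hypothetical names `toy` and `by_count`):

```python
import pandas as pd

# Toy data: number of basic amenities and review score per listing.
toy = pd.DataFrame({
    "amenities_sum": [0, 0, 8, 8, 12, 12],
    "review_score": [90.0, 92.0, 93.0, 95.0, 96.0, 98.0],
})

# Listing count and mean review score for each number of basic amenities.
by_count = (
    toy.groupby("amenities_sum")["review_score"]
    .agg(["count", "mean"])
    .sort_index()
)

print(by_count)
```

Keeping the `count` column next to the mean helps to spot groups that are too small to interpret, such as the very high and very low counts in the distribution above.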

In [36]:
# Listings with the lowest review score (20), keeping only the amenity columns and review_score
df_rs20 = (
    df[df["review_score"] == 20]
    .drop(columns=["price", "amenities_sum", "has_basic"])
    .reset_index(drop=True)
)

In [37]:
df_rs20.sum().sort_values(ascending=False).head(50)

amenity
review_score                  11480.0
kitchen                         517.0
wireless internet               510.0
essentials                      452.0
heating                         450.0
washer                          399.0
tv                              349.0
smoke detector                  317.0
family/kid friendly             311.0
shampoo                         298.0
hangers                         297.0
internet                        252.0
hair dryer                      245.0
laptop friendly workspace       238.0
iron                            238.0
air conditioning                231.0
dryer                           213.0
carbon monoxide detector        168.0
fire extinguisher               161.0
elevator in building            159.0
first aid kit                   151.0
free parking on premises        142.0
buzzer/wireless intercom        112.0
lock on bedroom door            109.0
smoking allowed                 106.0
cable tv                         99.0
24-h

In [None]:
["essentials", "wireless internet", "internet", "tv", "heating"
                                           , "air conditioning", "iron", "shampoo", "hair dryer", "breakfast"
                                           , "laptop friendly workspace", "indoor fireplace", "hangers"
                                           , "private entrance", "smoke detector", "carbon monoxide detector"
                                           , "fire extinguisher", "first aid kit", "lock on bedroom door"]




# Draft: basic amenities expected per country (only the United Kingdom is filled in so far)
countries_amenities = {
    "United Kingdom": [
        "essentials", "wireless internet", "internet", "tv", "heating",
        "iron", "shampoo", "hair dryer", "breakfast",
        "laptop friendly workspace", "indoor fireplace", "hangers",
        "private entrance", "smoke detector", "carbon monoxide detector",
        "fire extinguisher", "first aid kit", "lock on bedroom door",
    ],
    # "France": list still to be defined
}
United Kingdom     45143
France             41202
Spain              36008
Australia          28856
Canada             25358
Italy              24350
Denmark            16111
Netherlands        15659
Germany            15124
Austria             7289
Belgium             5735
Ireland             5234
Hong Kong           4453
Greece              3828
Switzerland         1699