In [1]:
import pandas as pd
import ast
import plotly.express as px

In [2]:
listings_gz = pd.read_csv('data/listings.csv.gz', compression='gzip')
amenities = listings_gz["amenities"].apply(ast.literal_eval)

# How do amenities affect the price of a listing?

We look at the amenities offered by listings in the Copenhagen area: which ones are most common, and whether listings that include a given amenity are priced differently from those that do not.

We start off by getting an overview of which amenities are the most common.

In [3]:
amenities = listings_gz["amenities"].apply(ast.literal_eval)
amenities_count = amenities.explode().value_counts()
amenities_count[:15]

Kitchen                    13304
Wifi                       12203
Essentials                 11386
Smoke alarm                 9777
Dishes and silverware       9640
Hot water                   9487
Long term stays allowed     9472
Refrigerator                9180
Heating                     9132
Cooking basics              8893
Hair dryer                  8600
Iron                        8496
Hangers                     7825
Bed linens                  7616
Washer                      7282
Name: amenities, dtype: int64

Of the 13815 listings in total, most list a kitchen, wifi and essentials among their amenities. Comparing prices between listings with and without these amenities would leave too few listings in the "without" group. We therefore focus on amenities that split the listings more evenly, namely those listed by between 5,000 and 9,000 listings.
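
The count thresholds can also be read as shares: 5,000 and 9,000 out of 13815 listings correspond to roughly 36% and 65%. The sketch below illustrates that view on a small made-up Series (`toy_amenities` and the share cut-offs are illustrative assumptions, not the Copenhagen data), and it assumes each listing names an amenity at most once.

```python
import pandas as pd

# Toy stand-in for the parsed `amenities` Series (made-up data).
toy_amenities = pd.Series([
    ["Kitchen", "Wifi", "Washer"],
    ["Kitchen", "Wifi"],
    ["Kitchen", "Washer"],
    ["Kitchen"],
])

# Count listings per amenity, then divide by the number of listings to get a share.
toy_counts = toy_amenities.explode().value_counts()
toy_share = toy_counts / len(toy_amenities)

# Keep amenities that are neither near-universal nor rare (illustrative cut-offs).
balanced = toy_share[(toy_share > 0.35) & (toy_share < 0.65)].index.tolist()
print(toy_share)
print(balanced)
```

Here "Kitchen" is dropped because every toy listing has it, which mirrors why the most common amenities are excluded from the comparison.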

In [4]:
# Amenities listed by more than 5000 but fewer than 9000 listings.
amenities_of_interest = amenities_count[(amenities_count > 5000) & (amenities_count < 9000)].index

# Parse the price strings once: drop the leading currency symbol and the thousands separators.
price = listings_gz["price"].apply(lambda x: float(x[1:].replace(",", "")))

# Median price of listings without and with each amenity.
prices = {}
for amenity in amenities_of_interest:
    has_amenity = amenities.apply(lambda x: amenity in x)
    prices[amenity] = {"Without": price[~has_amenity].median(), "With": price[has_amenity].median()}
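
The price conversion in the cell above implies that `price` is stored as text with a leading currency symbol and comma thousands separators. As a hedged sketch, the same conversion could be done with pandas string methods; the values in `raw_price` are made up, and the "$1,250.00" format is an assumption inferred from the parsing code rather than checked against the data.

```python
import pandas as pd

# Made-up price strings in the assumed "$1,250.00" format.
raw_price = pd.Series(["$850.00", "$1,250.00", "$12,000.00"])

# Drop the currency symbol and thousands separators, then convert to float.
parsed = raw_price.str.replace(r"[$,]", "", regex=True).astype(float)
print(parsed.tolist())
```

Unlike slicing off the first character, this variant leaves missing values as NaN instead of raising an error, which may or may not matter for this dataset.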

Having collected, for each amenity of interest, the median price of listings with and without it, we can now compare the two.

In [5]:
px.bar(pd.DataFrame(prices).T, barmode="group", 
       title="Median price of listings with and without amenities", 
       labels={"index": "Amenity", "value": "Price", "variable": "With or without"})
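
To read the gaps as numbers rather than bar heights, the with/without medians could also be tabulated as differences. The sketch below uses a made-up `toy_prices` dict in the same shape as `prices`; the amenity names and values are placeholders, not results from the data.

```python
import pandas as pd

# Made-up medians in the same shape as the `prices` dict (placeholder values).
toy_prices = {
    "Washer": {"Without": 900.0, "With": 1000.0},
    "Iron": {"Without": 950.0, "With": 940.0},
}

# One row per amenity, with absolute and relative differences in median price.
diff = pd.DataFrame(toy_prices).T
diff["Difference"] = diff["With"] - diff["Without"]
diff["Relative"] = diff["Difference"] / diff["Without"]

# Largest positive gap first.
diff = diff.sort_values("Difference", ascending=False)
print(diff)
```

Applied to the real data, `toy_prices` would simply be replaced by `prices`.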