# Exploring Airbnb listings data in Seattle

## Introduction

This notebook analyzes Seattle Airbnb listing data scraped on March 19, 2021. I use pandas to shape the data and matplotlib and Plotly to visualize it.


**Context**

Airbnb has disrupted the traditional hospitality industry as more and more travelers choose it as their primary accommodation provider. Since its founding in 2008, Airbnb has grown enormously, with the number of rentals listed on its website increasing rapidly each year.

Seattle is a seaport city on the West Coast of the United States. It is the seat of King County, Washington. With a 2019 population of 753,675, it is the largest city in both the state of Washington and the Pacific Northwest region of North America.

**Why this dataset**

This dataset interests me because I want to understand how listings are distributed across room types and locations, and how prices vary with each. In particular, this notebook explores the following questions.

- What are the top locations and their listing prices among all listings? 
- What are the top room types and their listing prices among all listings?
- What are the respective mean prices for the most popular room type and for the most popular location?

To this end, I will focus on four columns: **neighbourhood_group_cleansed**, **property_type**, **room_type**, and **price**.


In [None]:
# Import all necessary modules and initialize the plot style.
import numpy as np 
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

## Data import, a first look and clean-up

The dataset is stored in a CSV file. Let us import it and take a look at its raw form.

In [None]:
listing = pd.read_csv('../input/airbnb-seattle-listings-data/airbnb_seattle_listings_data.csv')
display(listing)

The dataset contains data scraped on several different days; let us focus on the March 19 snapshot. As we will see, it contains over 2,000 listings in total.

In [None]:
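# Keep only the rows last scraped on 2021-03-19.
# .copy() gives an independent frame, so later column assignments do not trigger SettingWithCopyWarning.
listing_snapshot = listing[listing['last_scraped'] == '2021-03-19'].copy()
len(listing_snapshot)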
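
Before filtering, it can help to confirm which scrape dates are actually present. The sketch below shows the pattern on a small made-up frame (the name `toy` and all of its rows are invented for illustration and are not taken from the dataset).

```python
import pandas as pd

# Hypothetical stand-in for the listings table; values are invented.
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'last_scraped': ['2021-03-19', '2021-03-19', '2021-03-20', '2021-03-19'],
})

# Count how many rows each scrape date contributes.
scrape_counts = toy['last_scraped'].value_counts()
print(scrape_counts)

# Keep a single snapshot; .copy() makes the result independent of `toy`.
toy_snapshot = toy[toy['last_scraped'] == '2021-03-19'].copy()
print(len(toy_snapshot))
```

Running the same `value_counts()` call on `listing['last_scraped']` would show how many rows belong to each scrape date in the real data.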

Some attributes of each listing are not useful for the analysis in this notebook, so we drop them. These include anonymized identifiers and listing-related URLs.

In [None]:
# Drop the columns that are not needed, then list the remaining column titles.
listing_snapshot = listing_snapshot.drop(['listing_url', 'host_id', 'host_url', 'scrape_id', 'last_scraped', 'picture_url', 'description'], axis=1)
print(listing_snapshot.columns)

Before we proceed further, let us check if there are missing or duplicate values. 

In [None]:
# Count the missing values in each column.
listing_snapshot.isnull().sum()

In [None]:
# Count fully duplicated rows (0 means there are none).
listing_snapshot.duplicated().sum()
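
If either check reports a non-zero count, the affected columns or rows need a closer look. As a minimal sketch, assuming a made-up frame named `toy` (its rows are invented, not taken from the dataset), the two checks and one possible follow-up look like this:

```python
import pandas as pd

# Hypothetical example frame with one missing name and one duplicated row.
toy = pd.DataFrame({
    'name': ['Loft', 'Cabin', 'Loft', None],
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt', 'Shared room'],
    'price': ['$100.00', '$80.00', '$100.00', '$45.00'],
})

# Show only the columns that actually contain missing values.
missing_per_column = toy.isnull().sum()
missing_per_column = missing_per_column[missing_per_column > 0]
print(missing_per_column)

# Count rows that repeat an earlier row exactly, then drop them.
n_duplicates = toy.duplicated().sum()
deduplicated = toy.drop_duplicates()
print(n_duplicates, len(deduplicated))
```

Whether to drop, fill, or keep rows with missing values depends on the column; the columns used in the analysis below are the ones that matter most.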

The data is ready for analysis.

## Analysis

### 1. What are the top locations and their listing prices among all listings?

The following figure shows the percentage of listings in each location (neighborhood).

In [None]:
neighborhood = listing_snapshot.neighbourhood_group_cleansed.value_counts()

plt.figure(figsize=(14,8))
(neighborhood / listing_snapshot.shape[0] * 100).plot(kind="bar");
plt.title("Neighborhood listings distribution");
plt.xlabel('Neighborhood name');
plt.ylabel('Percentage');

The following plot shows the average listing price per location in descending order.

In [None]:
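# Convert price strings to floats by stripping '$' and ','.
# Run this once: the column is numeric afterwards, so a second run would fail.
listing_snapshot['price'] = listing_snapshot['price'].apply(lambda price: float(price.replace('$', '').replace(',', '')))

# Mean price per neighborhood, highest first.
price_by_neighbourhood = listing_snapshot.groupby(['neighbourhood_group_cleansed'])['price'].mean().sort_values(ascending=False)

plt.figure(figsize=(20,6));
price_by_neighbourhood.plot(kind='bar');
plt.title('Mean listing price per neighborhood');
plt.xlabel('Neighborhood name');
plt.ylabel('Price (dollars)');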
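
The price column is stored as text with a dollar sign and thousands separators, which is why it has to be converted before averaging. A minimal sketch of a vectorized alternative to the per-row `apply`, using made-up sample strings (the name `raw_prices` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical price strings in the same style as the price column.
raw_prices = pd.Series(['$85.00', '$1,250.00', '$99.50'])

# Remove every '$' and ',' with one regular expression, then cast to float.
clean_prices = raw_prices.str.replace(r'[$,]', '', regex=True).astype(float)
print(clean_prices.tolist())  # [85.0, 1250.0, 99.5]
```

The string accessor works on the whole column at once and passes missing values through as `NaN` instead of raising an error.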

### 2. What are the top room types and their listing prices among all listings?

In [None]:
# Explore the percentage of different room types.
# size() counts every row per group, including rows where other columns are missing.
roomtype = listing_snapshot.groupby('room_type').size().reset_index(name='count')
roomtype['percentage'] = roomtype['count'] / roomtype['count'].sum() * 100

fig = px.pie(roomtype,
             names='room_type',
             values='percentage')
fig.update_traces(rotation=90, pull=0.05, textinfo="percent+label")
fig.show()

The pie chart above shows the distribution of room types in Seattle.
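
Percentages like those in the pie chart can also be computed directly with `value_counts(normalize=True)`. A minimal sketch on invented room-type labels (the series `toy_room_types` is hypothetical and does not reflect the real proportions):

```python
import pandas as pd

# Hypothetical room types for five listings.
toy_room_types = pd.Series(
    ['Entire home/apt', 'Entire home/apt', 'Private room',
     'Entire home/apt', 'Shared room'],
    name='room_type',
)

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages.
share = toy_room_types.value_counts(normalize=True) * 100
print(share)
```

The result is sorted from the most to the least common value, so the first index entry is the most frequent room type.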

The listing prices per room type are shown in the figure below.

In [None]:
# Mean price per room type, highest first.
price_by_room_type = listing_snapshot.groupby(['room_type'])['price'].mean().sort_values(ascending=False)

plt.figure(figsize=(20,6));
price_by_room_type.plot(kind='bar');
plt.title('Mean listing price per room type');
plt.xlabel('Room types');
plt.ylabel('Price (dollars)');

### 3. What are the respective mean prices for the most popular room type and in the most popular location?

Finally, let us tabulate the respective mean prices for the most popular neighborhood and the most popular room type.

In [None]:
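# neighborhood comes from value_counts(), so its first index entry is the most common neighborhood.
most_popular_neighborhood = neighborhood.index[0]
most_popular_neighborhood_price = listing_snapshot.groupby(['neighbourhood_group_cleansed'])['price'].mean()[most_popular_neighborhood]
# .iloc[0] takes the first row by position after sorting; [0] would select by index label instead.
most_popular_room_type = roomtype.sort_values(by=['percentage'], ascending=False)['room_type'].iloc[0]
most_popular_room_type_price = listing_snapshot.groupby(['room_type'])['price'].mean()[most_popular_room_type]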
df = pd.DataFrame({'category': ['most popular neighborhood', 'most popular room type'],
                   'value': [most_popular_neighborhood, most_popular_room_type],
                   'price': [most_popular_neighborhood_price, most_popular_room_type_price]})
display(df)
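
After `sort_values`, rows keep their original index labels, so selecting label `0` and selecting the first position can return different rows. The sketch below illustrates the difference on a made-up table (the frame `toy` and its percentages are invented for illustration):

```python
import pandas as pd

# Hypothetical room-type shares; the row with label 0 is not the largest.
toy = pd.DataFrame({
    'room_type': ['Hotel room', 'Entire home/apt', 'Private room'],
    'percentage': [5.0, 70.0, 25.0],
})

ranked = toy.sort_values(by='percentage', ascending=False)

# Position-based: first row of the sorted frame.
by_position = ranked['room_type'].iloc[0]

# Label-based: the row whose original index label is 0.
by_label = ranked['room_type'].loc[0]

# idxmax() finds the label of the largest value without sorting at all.
by_idxmax = toy.loc[toy['percentage'].idxmax(), 'room_type']

print(by_position, '|', by_label, '|', by_idxmax)
```

Position-based selection or `idxmax()` gives the top category regardless of how the rows were originally ordered.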