# Seattle Airbnb seasonal variations from 2020 Q4 to 2021 Q1
### and possible correlations with economic recovery after the coronavirus pandemic

## Context

The coronavirus pandemic has clobbered the travel industry, from flights to cruises to lodging. It has also had a devastating effect on Airbnb's business, as travel was scaled back while the virus spread. With vaccination progressing throughout the country and Airbnb adopting new marketing strategies, I wonder whether recent Airbnb datasets reveal traces of economic recovery.

## Prior research

According to a recent report, bookings on Airbnb have slumped from 96% to 41% since January 2020 because of travel restrictions that many countries imposed during the pandemic (Lionel Laurent, [Airbnb may become obsolete depending on recovery of tourism after Covid-19 crisis](http://theprint.in/economy/airbnb-may-become-obsolete-depending-on-recovery-of-tourism-after-covid-19-crisis/394346/)). This supports our basic assumption that the pandemic has had a significant impact on Airbnb listings. It also serves as a qualitative baseline for the analysis in this notebook.

Prior research published in April 2021 showed that Airbnb lost hosts throughout the pandemic, particularly hosts with only one listing (Laura Forman, [Companies Cheered by Post-Covid-19 Demand Have a Supply Problem](http://www.wsj.com/articles/companies-cheered-by-post-covid-19-demand-have-a-supply-problem-11619436602)). In this study, we will analyze the current weight of one-property listings among all listings and try to evaluate changes in this category.

In a report published in February 2021, Airbnb stated that it provided a critical and urgent social safety net that helped new hosts with only one listing stay economically afloat. In the US, these hosts could earn \\$3,900, almost twice as much as what has been made available via stimulus checks to date (Airbnb, [New Airbnb Hosts Have Earned \\$1 Billion During the Pandemic](http://news.airbnb.com/wp-content/uploads/sites/4/2021/02/New-Hosts-Earnings-Report-Airbnb.pdf?v2)). This subsidy strategy may stimulate one-property listings to recover, and I will test this conjecture in my data analysis.

## Dataset profile
These datasets are part of Inside Airbnb and were scraped each month from October 2020 to March 2021; the original source can be found [here](http://insideairbnb.com/get-the-data.html). They are available under a Public Domain Dedication license, so there is no restriction on their use.

Six datasets for Seattle are used in this notebook:

* listings_October_2020.csv, listings_November_2020.csv, listings_December_2020.csv, listings_January_2021.csv, listings_February_2021.csv, listings_March_2021.csv - summary information on listings in Seattle, such as location, host information, review information, amenities, etc.

These datasets are interesting to me because I would like to explore the variations of the Airbnb listings in Seattle and observe possible correlations with economic recovery after the coronavirus pandemic.

## Project plan

In this mini project, I will analyze the following questions:
* How does the number of listings change over this half year?
* How do prices change over this half year?
* What is the weight of one-property listings among all listings?
* Do Airbnb's marketing strategies effectively improve the situation of hosts with only one listing?

To this end, I will focus on analyzing **host_id**, **price**, and **calculated_host_listings_count** to explore my questions. To clarify, calculated_host_listings_count is the number of listings the host has (per Airbnb calculations).
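To make the meaning of this column concrete, here is a minimal sketch on toy data. The values and the `listings_per_host` column name are hypothetical, and I assume the count simply equals the number of rows sharing a **host_id**; the exact calculation behind calculated_host_listings_count may differ.

```python
import pandas as pd

# Toy data (hypothetical values): five listings owned by three hosts.
listings = pd.DataFrame({
    "id": [101, 102, 103, 104, 105],
    "host_id": [1, 1, 2, 3, 3],
})

# Count the listings of each host and write that count on every listing row.
# This mimics a per-host count that is repeated on each of the host's listings.
listings["listings_per_host"] = (
    listings.groupby("host_id")["id"].transform("count")
)

# Hosts that own exactly one listing.
one_property_hosts = (
    listings.loc[listings["listings_per_host"] == 1, "host_id"].unique().tolist()
)

print(listings)
print("One-property hosts:", one_property_hosts)
```

Because the count is repeated on every row of a multi-listing host, any host-level statistic has to look at each host only once.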

I will use aggregation, comparisons, and descriptive statistics to analyze listings offered by different types of hosts (e.g., one-property or multi-property), price changes, and customer engagement via reviews on Airbnb over the last half year, and visualize the findings with different types of charts.

In [None]:
# Import all necessary modules and initialize the plot style.
import numpy as np 
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

## Analysis

### Prepare the data

We first load the listing and pricing data for each month from 2020 Q4 and 2021 Q1.

In [None]:
data_by_month = {}
months = ['October_2020', 'November_2020', 'December_2020',
          'January_2021', 'February_2021', 'March_2021']
for month in months:
    file_path = '../input/seattle-airbnb-seasonal-data/listings_' + month + '.csv'
    data_by_month[month] = pd.read_csv(file_path)

### 1. How does the number of listings change over this half year?
In the following figure, I visualize the total number of listings scraped each month from 2020 Q4 to 2021 Q1.

In [None]:
# Build the table from a list of records; DataFrame.append was removed in pandas 2.0.
listings_by_month = pd.DataFrame(
    [{'Month': month, 'Listings': len(data_by_month[month])} for month in months],
    columns=['Month', 'Listings'])

In [None]:
display(listings_by_month)

# Let pandas create the figure so that figsize applies to the chart itself.
ax = listings_by_month.plot(kind="bar", x='Month', y='Listings', figsize=(14, 8))
ax.set_title("Listing trend from 2020 Q4 to 2021 Q1")
ax.set_xlabel('Month')
ax.set_ylabel('Total number of listings')

# Average only the numeric column; the first three rows are Q4, the last three are Q1.
print("2020 Q4 average monthly listings = ", listings_by_month['Listings'].iloc[:3].mean())
print("2021 Q1 average monthly listings = ", listings_by_month['Listings'].iloc[3:].mean())

The bar chart above shows that the monthly total number of listings stayed relatively flat between Q4 and Q1, with a slight decrease between the two quarters.
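The size of such a quarter-over-quarter change can be expressed as a percentage. The sketch below shows one way to do this; the listing counts are made-up placeholder values, not the actual Seattle numbers, and the `Quarter` column is a helper I introduce for illustration.

```python
import pandas as pd

# Placeholder counts (hypothetical, NOT the real Seattle figures).
counts = pd.DataFrame({
    "Month": ["October_2020", "November_2020", "December_2020",
              "January_2021", "February_2021", "March_2021"],
    "Listings": [4300, 4250, 4200, 4150, 4100, 4050],
})

# Label each month with its quarter, then average the counts per quarter.
counts["Quarter"] = ["2020 Q4"] * 3 + ["2021 Q1"] * 3
quarterly_mean = counts.groupby("Quarter", sort=False)["Listings"].mean()

# Relative change of the Q1 average with respect to the Q4 average, in percent.
change_pct = (quarterly_mean["2021 Q1"] / quarterly_mean["2020 Q4"] - 1) * 100

print(quarterly_mean)
print(f"Quarter-over-quarter change: {change_pct:.2f}%")
```

A change of a few percent would be consistent with describing the trend as "relatively flat", whereas a double-digit drop would not.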

### 2. How do prices change over this half year?
I will first calculate the mean listing price for each month, and then compare the mean prices month by month.

In [None]:
# Build the table from a list of records; DataFrame.append was removed in pandas 2.0.
mean_price_by_month = pd.DataFrame(
    [{'Month': month, 'Mean Price': data_by_month[month]['price'].mean()} for month in months],
    columns=['Month', 'Mean Price'])

In [None]:
display(mean_price_by_month)

# Let pandas create the figure so that figsize applies to the chart itself.
ax = mean_price_by_month.plot(x='Month', y='Mean Price', figsize=(14, 8))
ax.set_title("Mean price trend from 2020 Q4 to 2021 Q1")
ax.set_xlabel('Month')
ax.set_ylabel('Mean price (USD)')
# Assign the result to an unused variable to suppress the printed return value of xticks.
_ = plt.xticks(rotation=60)

We observe an interesting V-shaped trend in the mean listing price in the plot above. The price bottomed out in January 2021 and started to rebound in February. This trend may correlate with a possible seasonal variation in the travel and lodging business in Q4, and with the COVID-19 vaccination rollout throughout the country starting in Q1 (according to the timeline of Covid-19 vaccine developments [here](https://www.ajmc.com/view/a-timeline-of-covid-19-vaccine-developments-in-2021)). This hypothesis could be tested by comparison with listing data from past years.
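A V-shaped trend can also be checked numerically rather than only by eye. The following sketch locates the trough and computes month-over-month changes; the prices are made-up placeholder values that merely imitate a V shape, not the actual Seattle means.

```python
import pandas as pd

# Placeholder mean prices in USD (hypothetical, NOT the real Seattle figures).
prices = pd.Series(
    [150.0, 140.0, 130.0, 120.0, 125.0, 135.0],
    index=["October_2020", "November_2020", "December_2020",
           "January_2021", "February_2021", "March_2021"],
    name="Mean Price",
)

# Month-over-month change in percent; the first month has no predecessor (NaN).
mom_change_pct = prices.pct_change() * 100

# The trough of the "V" is the month with the lowest mean price.
trough_month = prices.idxmin()

print(mom_change_pct.round(2))
print("Trough month:", trough_month)
```

In a V-shaped series, the changes are negative before the trough and positive after it, which is what the signs of `mom_change_pct` show here.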

### 3. What is the weight of one-property listings among all listings?

I will use stacked bar charts to show the weight of each property-count category per month. Note that each row of the listing table represents a separate listing, while **calculated_host_listings_count** is computed for each host. As a result, we need to deduplicate the rows of the listing table by host ID. This avoids counting the same host multiple times and yields the correct distribution of the number of properties owned by hosts.
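To illustrate why the deduplication matters, here is a minimal sketch on toy data (the values are hypothetical). It compares the share of one-property hosts computed over all listing rows with the share computed over unique hosts.

```python
import pandas as pd

# Toy data (hypothetical): host 1 owns three listings, hosts 2 and 3 own one each.
listings = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3],
    "calculated_host_listings_count": [3, 3, 3, 1, 1],
})

# Without deduplication, host 1 is counted three times in the denominator.
share_by_row = (listings["calculated_host_listings_count"] == 1).mean()

# With deduplication, every host is counted exactly once.
hosts = listings.drop_duplicates(subset=["host_id"])
share_by_host = (hosts["calculated_host_listings_count"] == 1).mean()

print(f"Share computed over listing rows: {share_by_row:.2f}")
print(f"Share computed over unique hosts: {share_by_host:.2f}")
```

In this toy case two of the three hosts own a single property, yet the row-based figure understates their share because multi-property hosts contribute several rows.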

In [None]:
rows = []
for month in months:
    # Dedup by the host ID so that each host is counted once.
    month_listing_data = data_by_month[month].drop_duplicates(subset=['host_id'])
    host_property_count = month_listing_data['calculated_host_listings_count']
    # The mean of a boolean mask is the fraction of hosts in that category.
    rows.append({'Month': month,
                 '1-Property Listings': (host_property_count == 1).mean(),
                 '2-Property Listings': (host_property_count == 2).mean(),
                 'Multi-Property Listings': (host_property_count > 2).mean()})
# Build the table in one step; DataFrame.append was removed in pandas 2.0.
property_distribution_by_month = pd.DataFrame(rows)

In [None]:
display(property_distribution_by_month)

# Let pandas create the figure so that figsize applies to the chart itself.
ax = property_distribution_by_month.plot.bar(x='Month', stacked=True, figsize=(14, 8))
ax.set_title("Property distribution trend from 2020 Q4 to 2021 Q1")
ax.set_xlabel('Month')
ax.set_ylabel('Property distribution (share of hosts)');

As can be seen from the figure above, one-property listings carry the largest weight, at around 80%. In other words, most hosts on Airbnb offer only one property for rental. The weight of one-property listings went down slightly over the half year, from 80.5% to 79.5%. This observation suggests that Airbnb's marketing strategy via subsidies may not have achieved its intended effect yet.

## Conclusion

In this analysis, I have aimed to understand the trend of listings from 2020 Q4 to 2021 Q1 with the help of Airbnb data for Seattle, and to analyze whether Airbnb's marketing strategies effectively improve the situation of hosts with only one listing.

We observe that the total number of listings is declining and that the group of one-property hosts has shrunk; these hosts may be the ones most affected by the pandemic and may need more help from Airbnb or other channels. A good sign is that the average listing price has been rebounding since February 2021. The dataset I used does not provide a complete picture of how our economy is doing, but it may serve as one perspective on the recovery after the coronavirus pandemic, and the market still has a long way to go.

For future work, I could look for the direct and indirect reasons behind the declining number of one-property hosts on Airbnb.
