# Analysis of seasonality in Airbnb markets based on InsideAirbnb data
Despite many shortcomings, [InsideAirbnb](http://insideairbnb.com/get-the-data) is the *de facto* standard for analytics focused on the Airbnb marketplace. It's the most common data source for academic papers in this area. The reason is simple: [Airbnb](https://www.airbnb.com/) is difficult to scrape, and very expensive to scrape at scale.
The silver lining in using InsideAirbnb data is that, while none of the analyses based on it are fully accurate, they are all subject to the same biases, so comparisons between markets are still informative.
## Seasonality
The objective of this analysis is to discover different seasonality patterns in different locations, e.g. summer peaks at beach resorts and winter peaks at ski resorts, as opposed to year-round popularity in metropolitan/urban areas.
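One way to make "summer peak", "winter peak" and "year-round" concrete is a seasonality index: each calendar month's review count divided by the average month. The sketch below is a minimal, hypothetical classifier along those lines. The function name, the `flat_ratio` threshold of 1.5 and the month groupings (which assume a Northern Hemisphere market such as Asheville) are illustrative assumptions, not part of this analysis, and the example counts are toy numbers rather than InsideAirbnb data.

```python
import pandas as pd


def classify_seasonality(monthly_counts: pd.Series, flat_ratio: float = 1.5) -> dict:
    """Summarise a 12-month review profile (index 1..12 = calendar month).

    flat_ratio is a hypothetical threshold: if the busiest month has fewer than
    flat_ratio times the reviews of the quietest month, the market is treated
    as year-round rather than seasonal.
    """
    counts = monthly_counts.reindex(range(1, 13), fill_value=0).astype(float)
    index = counts / counts.mean()  # seasonality index: 1.0 = an average month
    peak, trough = int(index.idxmax()), int(index.idxmin())
    ratio = counts.max() / counts.min() if counts.min() > 0 else float('inf')
    if ratio < flat_ratio:
        pattern = 'year-round'
    elif peak in (6, 7, 8):  # assumes Northern Hemisphere summer
        pattern = 'summer peak'
    elif peak in (12, 1, 2):  # assumes Northern Hemisphere winter
        pattern = 'winter peak'
    else:
        pattern = f'peak in month {peak}'
    return {'index': index, 'peak': peak, 'trough': trough, 'ratio': ratio, 'pattern': pattern}


# Toy profile shaped like a beach resort (illustrative numbers only)
beach = pd.Series([20, 22, 30, 45, 70, 110, 130, 120, 60, 35, 22, 18], index=range(1, 13))
print(classify_seasonality(beach)['pattern'])
```

Running the same function per city would give one label and one 12-value index per market, which can then be compared side by side.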

### Assumption
I'm using the count of reviews as a proxy for the number of rentals.  While it's not entirely accurate, I believe it's just as good as the occupancy estimate from InsideAirbnb.
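Under the simplest version of this assumption, a constant share of stays ends with a review, so estimated stays are just reviews divided by that rate. The sketch below uses toy numbers, and `REVIEW_RATE = 0.5` is a hypothetical value chosen for illustration. It shows why a constant rate does not matter for seasonality: it scales every month equally, so each month's share of the total is unchanged. If the review rate itself varied by season, this argument would no longer hold.

```python
import pandas as pd

# Toy monthly review counts (illustrative only, not InsideAirbnb data)
reviews = pd.Series([40, 35, 50, 80, 120, 90], index=[1, 2, 3, 4, 5, 6], name='reviews')

# Hypothetical assumption: a fixed share of stays ends with a review
REVIEW_RATE = 0.5
est_stays = reviews / REVIEW_RATE

# Each month's share of the total is the same for reviews and estimated stays,
# so the seasonal shape does not depend on REVIEW_RATE
share_reviews = reviews / reviews.sum()
share_stays = est_stays / est_stays.sum()
print((share_reviews - share_stays).abs().max())
```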

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# import qgrid
# pip install datetime_truncate
# import datetime
# from datetime_truncate import truncate


In [None]:
# The plan is to eventually download data for all locations monitored by InsideAirbnb,
# but as a proof of concept I'm focusing on the Asheville, NC area.

city = 'Asheville'
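For the multi-city version, one possible approach is to loop over a list of city folders using the same `../Data/{city}/reviews.csv` layout and tag each review with its market. `load_reviews` is a hypothetical helper, and skipping cities whose files are missing is a design choice for this sketch, not something InsideAirbnb requires.

```python
from pathlib import Path

import pandas as pd


def load_reviews(cities, base_dir='../Data'):
    """Load reviews.csv for every city that has been downloaded.

    Assumes the notebook's layout: <base_dir>/<city>/reviews.csv.
    Cities without a file are skipped instead of raising an error.
    """
    frames = []
    for city in cities:
        path = Path(base_dir) / city / 'reviews.csv'
        if not path.exists():
            print(f'skipping {city}: {path} not found')
            continue
        df = pd.read_csv(path, parse_dates=['date'])
        df['city'] = city  # remember which market each review belongs to
        frames.append(df)
    if not frames:
        return pd.DataFrame(columns=['listing_id', 'date', 'city'])
    return pd.concat(frames, ignore_index=True)
```

All later groupings can then add `'city'` as the first key, e.g. `groupby(['city', 'month'])`.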

In [None]:
# dfl stands for dataframe of listings
dfl = pd.read_csv(f'../Data/{city}/listings.csv')

# .size would be rows * columns; .shape gives (number of listings, number of attributes)
print(dfl.shape)

# l = qgrid.show_grid(dfl)
# l

In [None]:
# dfr stands for dataframe of reviews; parse 'date' as a datetime instead of slicing strings
dfr = pd.read_csv(f'../Data/{city}/reviews.csv', parse_dates=['date'])

# add year/month to get rid of the day-of-month granularity
dfr['year_month'] = dfr['date'].dt.strftime('%Y-%m')
dfr['year'] = dfr['date'].dt.year
dfr['month'] = dfr['date'].dt.month

# (number of reviews, number of columns)
print(dfr.shape)
# r = qgrid.show_grid(dfr)
# r

In [None]:
# number of reviews for each (year, month) pair
dfr_year_month_revs = dfr.groupby(['year', 'month']).size().reset_index(name='reviews')

dfr_year_month_revs
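Grouping by `['year', 'month']` only produces rows for months that have at least one review, so a month with no reviews silently disappears instead of showing up as zero. A minimal sketch of an alternative, assuming the `date` column can be parsed by `pd.to_datetime`, is to resample a time series into month-start (`'MS'`) bins, which emits every month in the range. `monthly_review_series` is a hypothetical helper name.

```python
import pandas as pd


def monthly_review_series(reviews: pd.DataFrame) -> pd.Series:
    """Number of reviews in every calendar month between the first and last review."""
    dates = pd.to_datetime(reviews['date'])
    ones = pd.Series(1, index=pd.DatetimeIndex(dates)).sort_index()
    # 'MS' = month-start bins; empty months get a sum of 0 instead of being dropped
    return ones.resample('MS').sum().rename('reviews')
```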

In [None]:
# number of reviews per calendar month, summed over all years
dfr_revs_per_month = dfr.groupby('month').size().to_frame('revs_per_month')

dfr_revs_per_month
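Summing reviews per calendar month across all years mixes seasonality with growth: if review volume grew over time, the most recent years dominate the totals, and the year in which the data was scraped is only covered up to the scrape date, which inflates the earlier months. A sketch of one way to separate the two, assuming the `year`/`month`/`reviews` columns of `dfr_year_month_revs`, is to compute each year's monthly shares and average them, optionally keeping only fully covered years. The function name and the `complete_years` parameter are hypothetical.

```python
import pandas as pd


def average_monthly_share(year_month_revs: pd.DataFrame, complete_years=None) -> pd.Series:
    """Average, over years, of the share of each year's reviews falling in each month.

    Expects columns 'year', 'month', 'reviews'. complete_years is an optional
    list of years with full 12-month coverage; partial years are excluded.
    """
    df = year_month_revs
    if complete_years is not None:
        df = df[df['year'].isin(complete_years)]
    table = df.pivot_table(index='year', columns='month', values='reviews',
                           aggfunc='sum', fill_value=0)
    table = table.reindex(columns=range(1, 13), fill_value=0)
    shares = table.div(table.sum(axis=1), axis=0)  # each year's row sums to 1
    return shares.mean(axis=0)  # every year weighted equally, regardless of volume
```

Because each year is normalised first, a year with ten times the reviews no longer outweighs the others.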



In [None]:
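sns.set_theme(rc={'figure.figsize': (11.7, 8.27)})  # A4 landscape, in inches
# bubble chart: one point per (year, month); bubble size and colour both encode the review count
ax = sns.scatterplot(data=dfr_year_month_revs, x='year', y='month', size='reviews', hue='reviews', sizes=(50, 2000))
sns.move_legend(ax, 'lower left')
plt.show()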
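The bubble chart encodes the same number twice (size and colour). An alternative, sketched here as an assumption rather than the author's chart, is a month-by-year matrix that can be passed to a heatmap; the helper name `year_month_matrix` is hypothetical.

```python
import pandas as pd


def year_month_matrix(year_month_revs: pd.DataFrame) -> pd.DataFrame:
    """Months as rows (1..12), years as columns, review counts as values (missing = 0)."""
    matrix = year_month_revs.pivot_table(index='month', columns='year',
                                         values='reviews', aggfunc='sum', fill_value=0)
    return matrix.reindex(index=range(1, 13), fill_value=0)
```

With the notebook's data this could be drawn with `sns.heatmap(year_month_matrix(dfr_year_month_revs))`, so the seasonal pattern of each year reads down its column.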

In [None]:
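# Both lines below work. Bracket notation (dfr['date']) works for any column name; attribute
# access (dfr.date) only works when the name is a valid Python identifier that doesn't clash
# with a DataFrame attribute or method.
# dfr_min_max = dfr.groupby('listing_id').date.agg(['min', 'max', 'count'])
# first review date, last review date and number of reviews for each listing
dfr_min_max = dfr.groupby('listing_id')['date'].agg(['min', 'max', 'count'])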

# w_min_max = qgrid.show_grid(dfr_min_max)
# w_min_max
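The first and last review dates per listing can be turned into a rough measure of how long each listing has been active. The sketch below is an illustration, not part of the original analysis: `listing_activity` and the `active_months` / `reviews_per_active_month` columns are hypothetical names, and counting months inclusively between the first and last review is an assumption (a listing may have been active before its first review).

```python
import pandas as pd


def listing_activity(reviews: pd.DataFrame) -> pd.DataFrame:
    """Per-listing first/last review date, review count and reviews per active month."""
    df = reviews.assign(date=pd.to_datetime(reviews['date']))
    out = df.groupby('listing_id')['date'].agg(['min', 'max', 'count'])
    # Months between first and last review, counted inclusively (same month -> 1)
    out['active_months'] = ((out['max'].dt.year - out['min'].dt.year) * 12
                            + (out['max'].dt.month - out['min'].dt.month) + 1)
    out['reviews_per_active_month'] = out['count'] / out['active_months']
    return out
```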

In [None]:
# total reviews per calendar month, summed across all years
sns.barplot(data=dfr_year_month_revs, x='month', y='reviews', estimator=np.sum)
plt.show()
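The bar chart shows the per-month totals but not the numbers themselves. A small sketch, assuming the `dfr_year_month_revs` columns, that returns the same sums as a Series so the peak month can be read off with `idxmax()`; `monthly_totals` is a hypothetical helper.

```python
import pandas as pd


def monthly_totals(year_month_revs: pd.DataFrame) -> pd.Series:
    """Total reviews per calendar month across all years, months 1..12 (missing = 0)."""
    return (year_month_revs.groupby('month')['reviews'].sum()
            .reindex(range(1, 13), fill_value=0))
```

For example, `monthly_totals(dfr_year_month_revs).idxmax()` gives the busiest calendar month.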

In [None]:
# total reviews per year
sns.barplot(data=dfr_year_month_revs, x='year', y='reviews', estimator=np.sum)
plt.show()
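The yearly bars are easier to interpret next to year-over-year growth. In the sketch below (`yearly_growth` is a hypothetical helper), growth is computed against the previous year present in the data. Note that the final year of an InsideAirbnb snapshot is only covered up to the scrape date, so an apparent drop in that year is not necessarily a real decline.

```python
import pandas as pd


def yearly_growth(year_month_revs: pd.DataFrame) -> pd.DataFrame:
    """Total reviews per year and the percentage change versus the previous year in the data."""
    totals = year_month_revs.groupby('year')['reviews'].sum().to_frame('reviews')
    # relative change vs the previous row, i.e. the previous year present in the data
    totals['yoy_pct'] = (totals['reviews'] / totals['reviews'].shift(1) - 1) * 100
    return totals
```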