# Case Study: Hotel Stays

**This case study illustrates the use of the staircase package to analyse data pertaining to bookings at a hotel.  The data is adapted from the [Hotel Booking Demand dataset](https://www.kaggle.com/jessemostipak/hotel-booking-demand) available on Kaggle.**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import seaborn as sns

import staircase as sc

**We begin by importing the data using the [pandas](https://pandas.pydata.org/) library, and take the opportunity to specify which columns of the resulting dataframe should be interpreted as dates, and which should be interpreted as categorical.**

In [None]:
data = pd.read_csv('./data/hotel_stays.csv', parse_dates=["check_in", "check_out"], dtype={"reserved_room_type":"category", "assigned_room_type":"category"})
data

**We will be interested in performing some analysis based on assigned room types.  The following code produces a pandas.Series instance, indexed by the various room types.  The values of this series are staircase.Stairs instances, each representing the number of rooms occupied over time.**

In [None]:
assigned_room_type_stairs = data.groupby(["assigned_room_type"]).apply(lambda df: sc.Stairs(use_dates=True).layer(df.check_in, df.check_out))
assigned_room_type_stairs
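
**As a rough illustration of what "layering" intervals means, the following pure-Python sketch (independent of staircase) builds the change points of an occupancy step function from check-in/check-out pairs. The helper names `layer_intervals` and `value_at`, and the three sample bookings, are hypothetical and chosen only for this example; this is not a description of how staircase is implemented.**

```python
from datetime import datetime


def layer_intervals(intervals):
    """Return sorted (time, occupancy) change points for [start, end) intervals."""
    deltas = {}
    for start, end in intervals:
        deltas[start] = deltas.get(start, 0) + 1  # a room becomes occupied
        deltas[end] = deltas.get(end, 0) - 1      # a room is vacated
    steps, level = [], 0
    for t in sorted(deltas):
        level += deltas[t]
        steps.append((t, level))
    return steps


def value_at(steps, t):
    """Occupancy at time t; a change occurring exactly at t is already in effect."""
    level = 0
    for change_time, new_level in steps:
        if change_time > t:
            break
        level = new_level
    return level


# Hypothetical bookings: (check_in, check_out)
bookings = [
    (datetime(2016, 1, 1), datetime(2016, 1, 4)),
    (datetime(2016, 1, 2), datetime(2016, 1, 3)),
    (datetime(2016, 1, 3), datetime(2016, 1, 5)),
]
steps = layer_intervals(bookings)
rooms_on_jan_2_noon = value_at(steps, datetime(2016, 1, 2, 12))  # two stays overlap here
```

**Each booking contributes +1 at check-in and -1 at check-out, so the running total of these changes is the number of rooms occupied at any moment.**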

**It will be useful to keep a reference to the room types which can be assigned.**

In [None]:
assigned_room_types = list(assigned_room_type_stairs.index)
assigned_room_types

## How many rooms were assigned over the course of the year?

**Let's start by looking at how many rooms, in total, were occupied over time.  The arithmetic operators that belong to the Stairs class are automatically applied when calling their counterparts belonging to the pandas.Series class.  For example the code block below adds the Stairs instances for each room type together, to produce a single Stairs instance representing total rooms.**

In [None]:
all_assigned_stairs = assigned_room_type_stairs.sum()
all_assigned_stairs
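
**The mechanism that lets a `sum` call combine objects can be sketched with a toy class. The `StepCount` class below is hypothetical and far simpler than `staircase.Stairs`; it only shows, under that assumption, how defining `+` on an object is enough for an aggregate such as `sum` to add step functions together.**

```python
from collections import Counter


class StepCount:
    """Toy step function stored as {time: change in level}. Not staircase.Stairs."""

    def __init__(self, deltas=None):
        self.deltas = Counter(deltas or {})

    def layer(self, start, end):
        self.deltas[start] += 1
        self.deltas[end] -= 1
        return self

    def __add__(self, other):
        merged = Counter(self.deltas)
        for t, change in other.deltas.items():
            merged[t] += change
        return StepCount(merged)

    def __radd__(self, other):
        # The built-in sum() starts from 0, so treat 0 as the empty step function.
        return self if other == 0 else NotImplemented

    def __call__(self, t):
        return sum(change for time, change in self.deltas.items() if time <= t)


# Hypothetical stays, with times given as day numbers
by_room_type = {
    "A": StepCount().layer(1, 4).layer(2, 3),
    "B": StepCount().layer(2, 5),
}
total = sum(by_room_type.values())
```

**Here `total(2)` evaluates to 3: two type A rooms plus one type B room are occupied on day 2.**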

**We can make a simple plot to get a quick feel for how the total bookings vary over the course of the year.  Plotting with the staircase package is compatible with the [matplotlib](https://matplotlib.org/) library (and therefore also [seaborn](https://seaborn.pydata.org/) which is built upon matplotlib).** 

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
all_assigned_stairs.plot(ax)
ax.set_xlim(pd.to_datetime('2016'), pd.to_datetime('2017'));

**A visual inspection of the graph suggests the hotel probably has approximately 180 rooms.  We can check the actual maximum number of simultaneous bookings observed with the Stairs.max method.**

In [None]:
all_assigned_stairs.max()
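
**For intuition, the maximum number of simultaneous bookings can also be found with a simple sweep over check-in and check-out events. This is a minimal sketch under the assumption that a room vacated at time t can be reused by a check-in at the same t; the function name and the sample stays are hypothetical.**

```python
def max_simultaneous(intervals):
    """Largest number of [start, end) intervals that overlap at any one time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # check-in
        events.append((end, -1))    # check-out
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so check-outs are processed before check-ins.
    events.sort()
    current = peak = 0
    for _, change in events:
        current += change
        peak = max(peak, current)
    return peak


# Hypothetical stays, with times given as day numbers
stays = [(1, 3), (2, 5), (3, 6), (4, 7)]
peak_occupancy = max_simultaneous(stays)
```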

In [None]:
midnights_2016 = pd.date_range('2016-01-01', '2016-12-31')
all_assigned_midnight = all_assigned_stairs.resample(midnights_2016)

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
all_assigned_stairs.plot(ax)
all_assigned_midnight.plot(ax=ax);

In [None]:
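midnights_2016 = pd.date_range('2016-01-01', '2016-12-31')
# Calling the Stairs instance samples it at each midnight; a pandas.Series gives access to methods such as rolling
all_assigned_midnight = pd.Series(all_assigned_stairs(midnights_2016), index=midnights_2016)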

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
all_assigned_stairs.plot(ax)
all_assigned_midnight.plot(ax=ax)

**To smooth out day-to-day variation we overlay a centred 7-day rolling mean of the midnight samples.**

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
all_assigned_stairs.plot(ax)
all_assigned_midnight.rolling(7,center=True).mean().plot(ax=ax, linewidth=3)
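
**The behaviour of a centred rolling window is easiest to see on a tiny series. The nine daily values below are made up for illustration; the point is that `rolling(7, center=True)` averages three days either side of each date, so the first and last three entries have no complete window and come out as NaN.**

```python
import pandas as pd

# Hypothetical daily values 1..9, indexed by date
daily = pd.Series(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    index=pd.date_range("2016-01-01", periods=9),
)

# Centred 7-day mean: each value averages the 3 days before, the day itself, and the 3 days after
smoothed = daily.rolling(7, center=True).mean()

n_missing = int(smoothed.isna().sum())  # incomplete windows at both ends
```

**This is why the smoothed line in the plot above starts and ends slightly inside the sampled date range.**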

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
for room_type in assigned_room_types:
    assigned_room_type_stairs[room_type].plot(ax, label=room_type)
ax.legend()

In [None]:
sc.sample(assigned_room_type_stairs, midnights_2016)

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
sns.lineplot(data=sc.sample(assigned_room_type_stairs, midnights_2016), x="points", y="value", hue="key", ax=ax)

DISCUSSION

## How many people stayed over the course of the year?

In [None]:
people_room_type_stairs = data.groupby(["assigned_room_type"]).apply(lambda df: sc.Stairs(use_dates=True).layer(df.check_in, df.check_out, df.adults+df.children))
people_stairs = people_room_type_stairs.sum()
people_midnight = pd.Series(people_stairs(midnights_2016), index=midnights_2016)

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
people_stairs.plot(ax)
people_midnight.rolling(7,center=True).mean().plot(ax=ax, linewidth=3)

In [None]:
print(people_midnight.mean())
print(all_assigned_midnight.mean())
print(people_midnight.mean()/all_assigned_midnight.mean())

In [None]:
year_start, year_end = pd.to_datetime('2016-01-01'), pd.to_datetime('2017-01-01')
people_mean_2016 = people_stairs.mean(year_start, year_end)
print(people_mean_2016)
print(people_mean_2016/all_assigned_midnight.mean())
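
**The two cells above average the data in two different ways: as the mean of midnight samples, and via Stairs.mean over an interval. Assuming the latter is a time-weighted mean (worth confirming against the staircase documentation), the two need not agree. The sketch below uses hypothetical helper names and a made-up step function, with time in hours, to show the difference.**

```python
def time_weighted_mean(steps, lower, upper):
    """Mean over [lower, upper) of a step function given as sorted (time, level) pairs.

    The level is taken to be 0 before the first change point.
    """
    total, level, previous = 0.0, 0, lower
    for t, new_level in steps:
        if t >= upper:
            break
        if t > lower:
            total += level * (t - previous)  # area accumulated at the old level
            previous = t
        level = new_level
    total += level * (upper - previous)      # area from the last change to upper
    return total / (upper - lower)


def sampled_mean(steps, points):
    """Mean of the step function evaluated at the given points."""
    def value_at(t):
        level = 0
        for change_time, new_level in steps:
            if change_time > t:
                break
            level = new_level
        return level
    return sum(value_at(p) for p in points) / len(points)


# Hypothetical: 2 guests present from hour 18 until hour 60 of a 72-hour window
steps = [(18, 2), (60, 0)]
weighted = time_weighted_mean(steps, 0, 72)   # 2 * 42 / 72
sampled = sampled_mean(steps, [0, 24, 48])    # samples are 0, 2, 2
```

**The sampled mean only sees the value at each sample point, whereas the time-weighted mean accounts for how long each level persisted.**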

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
sns.lineplot(data=sc.sample(people_room_type_stairs, midnights_2016), x="points", y="value", hue="key", ax=ax)

In [None]:
people_per_room_type_stairs = (people_room_type_stairs/(assigned_room_type_stairs + sc.Stairs(0.00001, use_dates=True)))
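
**The small constant added to the denominator above is presumably there to avoid dividing by zero whenever no rooms of a given type are occupied. A minimal sketch of the effect, using made-up occupancy numbers: intervals with no rooms yield a ratio of 0, while the ratio elsewhere is only marginally perturbed.**

```python
EPSILON = 0.00001  # same magnitude as the constant used in the cell above

# Hypothetical values of the two step functions over four consecutive intervals
people = [0, 4, 3, 0]
rooms = [0, 2, 1, 0]

# Without EPSILON the first and last ratios would raise ZeroDivisionError
people_per_room = [p / (r + EPSILON) for p, r in zip(people, rooms)]
```

**The trade-off is a slight downward bias in the non-zero ratios, which is negligible at this scale.**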

In [None]:
for room_type in people_per_room_type_stairs.index:
    fig, ax = plt.subplots(figsize=(20,8))
    people_per_room_type_stairs[room_type].plot(ax, label=room_type)
    ax.legend()

## How often were people upgraded from room type A to room type B?

In [None]:
reserved_room_type_stairs = data.groupby(["reserved_room_type"]).apply(lambda df: sc.Stairs(use_dates=True).layer(df.check_in, df.check_out))
reserved_room_types = list(reserved_room_type_stairs.index)
reserved_room_types

In [None]:
reserved_vs_assigned = reserved_room_type_stairs - assigned_room_type_stairs[reserved_room_type_stairs.index]

In [None]:
fig, ax = plt.subplots(figsize=(15,8))
sns.lineplot(data=sc.sample(reserved_vs_assigned, midnights_2016), x="points", y="value", hue="key", ax=ax)

In [None]:
reserved_vs_assigned_frac = assigned_room_type_stairs[reserved_room_type_stairs.index]/(reserved_room_type_stairs + sc.Stairs(0.0001, use_dates=True))
fig, ax = plt.subplots(figsize=(15,8))
ax.plot(midnights_2016, reserved_vs_assigned_frac['A'](midnights_2016))

## What percentage of rooms are booked for just one night (over time)?

In [None]:
data

In [None]:
one_night_condition = (data.check_out.dt.normalize() - data.check_in.dt.normalize()) == pd.Timedelta(days=1)
assigned_room_type_one_night_stairs = data[one_night_condition].groupby(["assigned_room_type"]).apply(lambda df: sc.Stairs(use_dates=True).layer(df.check_in, df.check_out))
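
**The one-night filter compares calendar dates rather than elapsed time, so that a stay from mid-afternoon to the following morning still counts as one night. One way to express this in pandas is shown below on a small, made-up dataframe; the column names mirror those in the data but the rows are hypothetical.**

```python
import pandas as pd

# Hypothetical stays with times of day included
stays = pd.DataFrame({
    "check_in": pd.to_datetime(
        ["2016-03-01 15:00", "2016-03-01 22:30", "2016-03-02 14:00"]
    ),
    "check_out": pd.to_datetime(
        ["2016-03-02 10:00", "2016-03-04 09:00", "2016-03-03 11:00"]
    ),
})

# normalize() resets each timestamp to midnight, so the difference counts calendar nights
nights = (stays["check_out"].dt.normalize() - stays["check_in"].dt.normalize()).dt.days
one_night = nights == 1
```

**The first stay lasts only 19 hours but spans one calendar night, so it is correctly flagged.**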

In [None]:
one_night_frac_stairs = assigned_room_type_one_night_stairs/(assigned_room_type_stairs + sc.Stairs(0.0001, use_dates=True))

In [None]:
for room_type, s in one_night_frac_stairs.items():
    fig, ax = plt.subplots(figsize=(20,8))
    s.plot(ax, label=room_type)
    ax.legend()

In [None]:
sc.sample(one_night_frac_stairs, midnights_2016).groupby('key').mean()