# Exploring the Hotel-Level Data

Here, we convert all reservations into nightly sales statistics for each hotel using the `parse_dates`, `add_res_columns`, and `res_to_stats` functions in `utils.py`.

We'll start by deriving some basic information about each hotel, including:
* **Capacity** (total number of rooms)
* **Occupancy** (rooms sold / capacity)

Then, we'll pull more statistics into the `stats` DataFrames:
* **Revenue and Rooms Sold by Customer Segment**
* **ADR by Customer Segment**

These stats will help us understand what kind of hotels we're working with.
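
For intuition, here is a rough sketch of what turning reservations into nightly statistics can look like. It is not the code in `utils.py`: the function name `nightly_stats_sketch`, the `ArrivalDate` and `StayDate` columns, and the `RoomRev` output column are assumptions made up for this illustration, and the toy values are invented.

```python
import pandas as pd

# Toy reservations. `ArrivalDate` is an assumed column name, not taken from utils.py.
res = pd.DataFrame({
    "ArrivalDate": pd.to_datetime(["2017-07-01", "2017-07-01", "2017-07-02"]),
    "LOS": [2, 1, 2],              # length of stay, in nights
    "ADR": [100.0, 80.0, 120.0],   # average daily rate
    "IsCanceled": [0, 0, 1],
})

def nightly_stats_sketch(res: pd.DataFrame) -> pd.DataFrame:
    """Expand each non-canceled reservation into one row per night, then aggregate."""
    kept = res[res["IsCanceled"] == 0]
    # Repeat each reservation once per night of its stay.
    nights = kept.loc[kept.index.repeat(kept["LOS"].to_numpy())].copy()
    # Offset within the stay: 0 for the first night, 1 for the second, and so on.
    offset = nights.groupby(level=0).cumcount().to_numpy()
    nights["StayDate"] = nights["ArrivalDate"] + pd.to_timedelta(offset, unit="D")
    # One row per stay date: rooms occupied and room revenue for that night.
    return nights.groupby("StayDate").agg(
        RoomsSold=("ADR", "size"),
        RoomRev=("ADR", "sum"),
    )

toy_stats = nightly_stats_sketch(res)
print(toy_stats)
```

Each row of the result describes one night rather than one reservation, which is the shape we rely on for the capacity and occupancy calculations.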

In [1]:
import pandas as pd
import numpy as np

from utils import parse_dates, add_res_columns, res_to_stats

In [2]:
df_h1 = pd.read_csv("../data/H1.csv")
df_h2 = pd.read_csv("../data/H2.csv")

In [3]:
df_h1 = parse_dates(df_h1)
df_h2 = parse_dates(df_h2)
df_h1 = add_res_columns(df_h1)
df_h2 = add_res_columns(df_h2)

In [4]:
h1_stats = res_to_stats(df_h1)
h2_stats = res_to_stats(df_h2)

In [None]:
h1_stats.head(2)

In [None]:
h2_stats.head(2)

In [None]:
h1_stats.describe()

In [None]:
h2_stats.describe()

## Capacity

Based on the `describe()` tables above, we take the maximum number of rooms sold on any single night as each hotel's capacity.

**H1 (Resort Hotel)'s capacity is 187 rooms.**

**H2 (City Hotel)'s capacity is 226 rooms.**
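
These figures could also be read off programmatically instead of being typed in by hand. The sketch below assumes that capacity equals the largest `RoomsSold` value observed, which only holds if the hotel sold out on at least one night. The `toy` DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Toy nightly stats standing in for a real `stats` DataFrame (values are invented).
toy = pd.DataFrame({"RoomsSold": [150, 187, 120, 175]})

# Assumption: the hotel was full on at least one night, so max(RoomsSold) == capacity.
capacity = int(toy["RoomsSold"].max())

# Occupancy is rooms sold divided by capacity, so it is at most 1.0 here.
toy["Occ"] = toy["RoomsSold"] / capacity

print(capacity)
print(toy["Occ"].round(3).tolist())
```

If a hotel never sold out during the sample period, this approach would understate its capacity and overstate occupancy.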

In [None]:
h1_stats["Occ"] = h1_stats.RoomsSold.astype(float) / 187
h2_stats["Occ"] = h2_stats.RoomsSold.astype(float) / 226

In [None]:
h1_stats.describe()

In [None]:
h2_stats.describe()

In [None]:
df_h1.head(3)

In [None]:
h1_res_nums = np.arange(len(df_h1))
h2_res_nums = np.arange(len(df_h2))
h1_res_nums, h2_res_nums

In [None]:
df_h1.insert(0, 'ResNum', h1_res_nums)
df_h2.insert(0, 'ResNum', h2_res_nums)


In [None]:
df_h1.CustomerType.value_counts()

In [None]:
df_h1['Revenue'] = df_h1.LOS * df_h1.ADR
df_h2['Revenue'] = df_h2.LOS * df_h2.ADR

In [None]:
# Keep only reservations that were not canceled, then total room nights and revenue per segment
mask = df_h1.IsCanceled == 0
df_h1.loc[mask, ['CustomerType', 'LOS', 'Revenue']].groupby("CustomerType").sum()
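
From totals like these, ADR by customer segment follows as segment revenue divided by segment room nights. The sketch below shows that calculation on a toy table. The segment labels and numbers are invented, and `SegmentADR` is a hypothetical column name.

```python
import pandas as pd

# Toy reservations (invented values) with the same columns used above.
toy_res = pd.DataFrame({
    "CustomerType": ["Transient", "Transient", "Contract", "Group"],
    "LOS": [2, 3, 5, 1],
    "ADR": [100.0, 120.0, 60.0, 90.0],
    "IsCanceled": [0, 0, 0, 1],
})
toy_res["Revenue"] = toy_res["LOS"] * toy_res["ADR"]

# Total room nights (LOS) and revenue per segment, excluding canceled reservations.
seg = (
    toy_res.loc[toy_res["IsCanceled"] == 0]
    .groupby("CustomerType")[["LOS", "Revenue"]]
    .sum()
)

# Segment ADR = revenue / room nights, i.e. an LOS-weighted average of reservation ADRs.
seg["SegmentADR"] = seg["Revenue"] / seg["LOS"]
print(seg)
```

Dividing total revenue by total room nights weights each reservation by its length of stay, which a plain mean of the `ADR` column would not do.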