# Welcome to the Revenue Management Simulation!

First, make sure you've read by blog post about the project, so you have an understanding of the steps. Then, follow along to reproduce the project on your local computer.

For more information about what's happening behind the scenes, take a look at the `.py` scripts in `code` folder.

As-of-date (abbreviated 'AOD') is 2017-08-01. We will be trying to predict cancellations and demand for each day in August 2017.

### Let's start with some imports:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from dbds import generate_hotel_dfs
from save_sims import save_historical_OTB
from agg import prep_demand_features
from demand import model_demand

pd.options.display.max_rows = 150
pd.options.display.max_columns = 250
pd.options.display.max_colwidth = None

aod = '2017-08-01'

### Let's also make sure we have a `code/pickle` folder.

In [2]:
!mkdir pickle # create an empty pickle folder if it doesn't exist (will not work on Windows)

mkdir: pickle: File exists


### Now, let's get our reservations ready for modeling, and calculate actual hotel statistics.

Time to execute: < 30 seconds

In [3]:
h1_res, h1_dbd = generate_hotel_dfs("../data/H1.csv", capacity=187)
h2_res, h2_dbd = generate_hotel_dfs("../data/H2.csv", capacity=226)

h1_res.to_pickle("pickle/h1_res.pick")
h1_dbd.to_pickle("pickle/h1_dbd.pick")
h2_res.to_pickle("pickle/h2_res.pick")
h2_dbd.to_pickle("pickle/h2_dbd.pick")

### Now let's calculate current OTB (on the books) statistics, as of every date from 2016-07-01 to 2017-08-30.

This is necessary to get demand features for each future arrival date, namely:
* On the books same-time-last year
* Recent booking trends (T-30, T-15, T-5)
* Pace

**Running the below (commented) code will take several hours the first time.** It trains and runs the predictive cancellation model (along with many other calculations), and saves a pickled file for each day, for each hotel. For information about exactly what's happening with each iteration, take a look at `sim.py`. 

In the real-world, this wouldn't be necessary, as we could simply save one file per day and access it whenever we need to. But given the situation, it's a necessary step.

**To avoid this, I've saved the aggregated output into one csv for each hotel.** These files are `../data/h1_stats.csv` and `../data/h2_stats.csv`.

In [4]:
# save_historical_otb(h1_dbd, h1_res, h2_dbd, h2_res) # this one takes several hours

In [5]:
# these process the resulting files from save_historical_otb

# h1_sim = prep_demand_features(1)
# h2_sim = prep_demand_features(2)

In [6]:
# take the faster route (use the CSVs in the repo):
date_cols = ['StayDate',
 'STLY_StayDate',
 'AsOfDate',
 'STLY_AsOfDate',
 'AsOfDate_STLY',
 'StayDate_STLY']

h1_sim = pd.read_csv("../data/h1_stats.csv", parse_dates=date_cols, infer_datetime_format=True)
h1_sim.drop(columns=["STLY_Stay_Date", "STLY_AsOfDate", "Unnamed: 0"], errors='ignore', inplace=True) # remove dup. columns
h1_sim.reset_index(inplace=True)

h2_sim = pd.read_csv("../data/h2_stats.csv", parse_dates=date_cols, infer_datetime_format=True)
h2_sim.drop(columns=["STLY_Stay_Date", "STLY_AsOfDate", "Unnamed: 0"], errors='ignore', inplace=True) # remove dup. columns
h2_sim.reset_index(inplace=True)


### Now we can predict demand for each day in August, using the below two cells.

Note: Ignore the pricing information. I attempted to use price as a feature to predict demand, but it wasn't working. The reason is because I don't have historical selling price data, nor competitor pricing data, so there was no way to teach the model to recognize that increasing price reduces demand. 

In [7]:
h1_demand = model_demand(1, h1_sim, aod)

Training Random Forest model to predict remaining transient demand...
Model ready.

R² score on test set (stay dates Aug 1 - Aug 31, 2017):                        0.743
MAE (Mean Absolute Error) score on test set (stay dates Aug 1 - Aug 31, 2017): 2.31
MSE (Mean Squared Error) score on test set (stay dates Aug 1 - Aug 31, 2017):  8.111

Calculating optimal selling prices...

Average recommended price change...                                            44.93
Estimated RN (Roomnight) growth after implementing price recommendations...    0.0
Estimated revenue growth after implementing price recommendations...           7086.34
Simulation ready.



In [8]:
h2_demand = model_demand(2, h1_sim, aod)

Training Random Forest model to predict remaining transient demand...
Model ready.

R² score on test set (stay dates Aug 1 - Aug 31, 2017):                        0.748
MAE (Mean Absolute Error) score on test set (stay dates Aug 1 - Aug 31, 2017): 2.288
MSE (Mean Squared Error) score on test set (stay dates Aug 1 - Aug 31, 2017):  7.979

Calculating optimal selling prices...

Average recommended price change...                                            43.23
Estimated RN (Roomnight) growth after implementing price recommendations...    -0.0
Estimated revenue growth after implementing price recommendations...           7077.21
Simulation ready.



## That's it! Now you're ready to look through all of these dataFrames we've created. 