Below are the instructions for the technical assessment. There are two parts: Data Science and Engineering. Plan to spend about 1-2 hours on the Data Science portion and 1-2 hours on the Engineering portion.

## Data Science

For the Data Science portion, your work won't be assessed on whether you build the best model, but on whether you understand the important concepts behind data analysis, feature engineering, and model development and evaluation. Keep this section simple, clear, and illustrative of how you would prototype a model.

## Engineering

In a separate set of files (i.e., not in this Jupyter notebook), take the model that you created and implement basic training and prediction pipelines that simulate what you would implement in production. These pipelines should broadly cover the following steps:
* Preprocessing
  * This will be based on the raw data provided at the beginning of the Data Science assignment.
* Model Training & Evaluation
* Predictions (in batch)

*Some Requirements*:
* The training and prediction pipelines should be independent of each other (though they can draw from the same base methods/classes if need be).
* The prediction job predicts with the latest "promoted" model.
* All model artifacts and outputs are stored by date partition or, where relevant, by version.
* The training job includes logic around "model promotion"
  * If there is a 10% increase in ROC AUC over the previous model, promote the model; otherwise, don't promote it.
* For both jobs, a user (human or machine) should be able to simply call the script or import a class to run the pipeline.
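
As one possible reading of the promotion and storage requirements above, the sketch below stores a pointer to the promoted model and updates it only when the new score clears the threshold. Everything here is an assumption made for illustration: the `date=.../version=...` directory layout, the `promoted.json` pointer file, the function names, treating the 10% as a relative (not absolute) increase, and promoting the first model automatically when no previous model exists.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def artifact_dir(root: Path, run_date: date, version: int) -> Path:
    """Hypothetical layout: <root>/models/date=YYYY-MM-DD/version=N."""
    return root / "models" / f"date={run_date.isoformat()}" / f"version={version}"


def read_promoted(root: Path) -> Optional[dict]:
    """Return the promoted-model pointer, or None if nothing is promoted yet."""
    pointer = root / "models" / "promoted.json"
    if not pointer.exists():
        return None
    return json.loads(pointer.read_text())


def should_promote(
    new_score: float,
    previous_score: Optional[float],
    min_relative_gain: float = 0.10,
) -> bool:
    """Assumption: the threshold is a relative gain over the promoted score."""
    if previous_score is None:
        return True  # assumption: the first trained model is promoted
    return new_score >= previous_score * (1.0 + min_relative_gain)


def promote_if_better(
    root: Path,
    run_date: date,
    version: int,
    new_score: float,
    min_relative_gain: float = 0.10,
) -> bool:
    """Update the pointer file when the new model clears the threshold."""
    current = read_promoted(root)
    previous_score = current["score"] if current else None
    if not should_promote(new_score, previous_score, min_relative_gain):
        return False
    pointer = root / "models" / "promoted.json"
    pointer.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "path": str(artifact_dir(root, run_date, version)),
        "score": new_score,
    }
    pointer.write_text(json.dumps(record))
    return True
```

With this layout, the prediction job would only need to call `read_promoted` and load whatever lives at the returned `path`, which keeps it independent of the training job.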

*Bonus*:

Parameterize the pipelines according to how a Data Scientist would use this.
* Allow an arbitrary set of features to be passed into the training (and prediction) job.
* Parameterize the percentage increase required to promote a model.
* Parameterize which evaluation metric is used. To keep it simple, stick with the most common evaluation metrics.
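
The parameters listed above could be exposed through a small command-line interface. The following is a minimal sketch using `argparse`; the flag names, defaults, and the `SUPPORTED_METRICS` tuple are hypothetical choices for illustration, not part of the assignment. The metric names happen to match scikit-learn's scoring strings, and `lead_time` and `adr` are used only as example column names.

```python
import argparse
from typing import List, Optional

# Hypothetical allow-list of common classification metrics.
SUPPORTED_METRICS = ("roc_auc", "accuracy", "f1", "precision", "recall")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for an illustrative training job."""
    parser = argparse.ArgumentParser(description="Training job (illustrative)")
    parser.add_argument(
        "--features",
        nargs="+",
        default=None,
        help="Feature columns to use; omit to use a default set.",
    )
    parser.add_argument(
        "--metric",
        choices=SUPPORTED_METRICS,
        default="roc_auc",
        help="Evaluation metric used for the promotion decision.",
    )
    parser.add_argument(
        "--promotion-threshold",
        type=float,
        default=0.10,
        help="Required improvement over the promoted model (0.10 = 10%%).",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

Assuming the parser lives in a hypothetical `train.py`, a run might look like `python train.py --features lead_time adr --metric f1 --promotion-threshold 0.05`. Passing `argv` explicitly also lets another program import `parse_args` and drive the job without a shell.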


Organize the files in a folder structure that emulates how you would organize the code in a GitHub repo. Zip up all files and send them back to the recruiter by the morning of your interview.

# Data Science Portion

## Imports

In [None]:
!pip install pandas 
!pip install numpy
!pip install -U scikit-learn

In [None]:
import pandas as pd
import numpy as np
from sklearn import preprocessing
# Add any other packages you would like to use here

## Dataset

The dataset in this notebook is representative of Vacasa's internal data.

In this notebook, we would like you to develop a model to predict whether a reservation will be canceled and describe what the model learned.

* The label in the dataset is given as `is_canceled`.
* For a complete description of the dataset, visit the link: https://www.sciencedirect.com/science/article/pii/S2352340918315191

In [None]:
df = pd.read_csv('train/hotel_bookings.csv')
df.head()

## Helpful EDA

In [None]:
df.info()

In [None]:
df['reservation_status'].unique()

In [None]:
df['is_canceled'].mean()

In [None]:
df.shape