# Forecasting of Staffing Needs

### Team Members:
- Iris Yang
- Marcelle Chiriboga
- Patrick Tung
- Weifeng (Davy) Guo

## Agenda
- Introduction
- The Analysis
    - Predicted Number of Exceptions
    - Predicted Number of Urgent Exception Groups
    - Exceptions Classification
- The Dashboard

# Introduction

## The Partner - Providence Health Care

- Providence Health Care (PHC) is a non-profit organization.
- Almost 9,000 people work at its 16 facilities: 6,000 staff, 1,000 medical staff/physicians, 200 researchers, and 1,600 volunteers.
- PHC is the provincial centre for the care of six groups of people with often-intensive health needs.

<div align="center"><img src="img/phc_logo.png"></div>

## The Problem

- In the healthcare business, staff absences must always be backfilled. 
- These absences, expected or not, are called **exceptions**.
- One way to minimize their impact is to predict future exceptions based on historical data.

<div align="center"><img src="img/phc_strategy.png" width="1000"></div>

## Objective

The purpose of this project was to predict short-term staffing needs, giving PHC insight into potential unexpected costs and staff shortages.

Specifically, we focused on building models for:

- Forecasting staffing needs on a weekly basis, allowing PHC to estimate how many backup staff are needed;
- Forecasting how many exceptions will fall under the urgent exception groups (i.e. overtime and relief not found);
- Forecasting the possible outcome of each exception submitted.

# The Analysis

We performed an exploratory data analysis (EDA) to identify the facilities, labour agreements, and job families we should focus on.


<div align="center"><img src="img/exceptions_per_facilites.png" width = "600"></div>

<div align="center"><img src="img/exceptions_by_labor_agreement.png" width = "1200"></div>

<div align="center"><img src="img/nurs_job_families_pareto.png" width = "1200"></div>

<div align="center"><img src="img/nurs_pareto_table.png" width = "1200"></div>

## Exception Count Prediction

Forecasting the number of exceptions for Providence Health Care

### Method

- Data
    * Training: 2013 - 2016
    * Validation: 2017
    * Testing: 2018
- Data Wrangling
    * Split data by SITE, JOB_FAMILY, and SUB_PROGRAM
    * e.g. St Paul's Hospital, Registered Nurse - DC1, Emergency
- Fit time series model for each “combination”
    * Facebook Prophet
- Predict the number of exceptions for the combinations
- Adjust the models based on Mean Absolute Error (MAE)
- Output a .csv file containing the forecasts
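
The wrangling and evaluation steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the record layout, the `weekly_counts` and `mean_absolute_error` helper names, the Monday week start, and the toy records are all assumptions made for this example. The time series model itself (Facebook Prophet) is not shown.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed week start)."""
    return day - timedelta(days=day.weekday())


def weekly_counts(records):
    """Count exceptions per combination and week.

    `records` is an iterable of (site, job_family, sub_program, shift_date)
    tuples; this layout is hypothetical.
    """
    counts = Counter()
    for site, job_family, sub_program, shift_date in records:
        counts[(site, job_family, sub_program), week_start(shift_date)] += 1
    return counts


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast counts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy records, made up for illustration only.
records = [
    ("St Paul's Hospital", "Registered Nurse - DC1", "Emergency", date(2017, 1, 2)),
    ("St Paul's Hospital", "Registered Nurse - DC1", "Emergency", date(2017, 1, 4)),
    ("St Paul's Hospital", "Registered Nurse - DC1", "Emergency", date(2017, 1, 10)),
]
counts = weekly_counts(records)
```

Each combination's weekly series would then be passed to the forecasting model, and `mean_absolute_error` compares its forecasts against the held-out validation year.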

### Product/Interface

<div align="center"><img src="img/exception_gui.png" width = "1200"></div>

### Output file

* `.csv` file containing all the predictions (on a weekly basis)

<div align="center"><img src="img/example_output.png" width = "1000"></div>

### Difficulties

* Certain combinations of data had very few exceptions
    * Little to no pattern
    * Predictions are not meaningful

* e.g. Youville Residence, Registered Nurse - DC2B - Parkview

<div align="center"><img src="img/Youville-DC2B-Parkview.png"></div>


### Solution

* Only fit combinations with enough data, using a threshold
    * Must have 300 exceptions within the past 4 years

* e.g. St Paul's Hospital, Registered Nurse - DC1, EMERG
    * MAE: 55.22

<div align="center"><img src="img/SPH-DC1-EMERGSPH.png"></div>
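
The 300-exception threshold could be applied along these lines. This is a sketch under stated assumptions: the input is assumed to be already restricted to the past 4 years, "must have 300" is read as "at least 300", and the `combinations_to_fit` helper and the toy counts are hypothetical.

```python
from collections import Counter

MIN_EXCEPTIONS = 300  # threshold stated in the Solution slide


def combinations_to_fit(combo_per_exception, min_exceptions=MIN_EXCEPTIONS):
    """Return the combinations that have enough exceptions to be modelled.

    `combo_per_exception` holds one (site, job_family, sub_program) tuple
    per exception recorded in the past 4 years (assumed pre-filtered).
    """
    totals = Counter(combo_per_exception)
    return sorted(combo for combo, n in totals.items() if n >= min_exceptions)


# Toy counts, made up for illustration only.
busy = ("St Paul's Hospital", "Registered Nurse - DC1", "EMERG")
sparse = ("Youville Residence", "Registered Nurse - DC2B", "Parkview")
history = [busy] * 350 + [sparse] * 12

kept = combinations_to_fit(history)
```

Here only the busy combination is kept; the sparse one is skipped rather than given a forecast that would not be meaningful.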

## Urgent Exception Prediction

Predicting the number of urgent exceptions

<div align="center"><img src="img/phc_strategy.png" width="700"></div>

## Urgent Exception

- Exceptions backfilled by **Overtime** and **Relief Not Found**
    - Overtime: a high cost that needs to be minimized
    - Relief Not Found: needs to be avoided
- Give HR insight so they can arrange on-call staff and other backfills

### Method

- Linear Regression

### Data

- Dates: Until 2018, excluding 2014
- Job Family: DC1000, DC2A00, DC2B00
- Earning Category: Overtime & Relief Not Found

### Difficulties

- Low correlation
- Randomness on a daily basis

### Variables

- Dates (One Hot Encoding)
    - Day of week, day of month, week of year, month of year
- Productive hours
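
The variables above could be turned into a feature vector roughly like this. It is a minimal sketch, not the project's code: the `date_features` helper, the use of ISO week numbers (1 to 53), and the ordering of the vector are assumptions made for illustration.

```python
from datetime import date


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def date_features(day: date, productive_hours: float):
    """One-hot encode the date parts and append productive hours.

    Layout: 7 (day of week) + 31 (day of month) + 53 (ISO week of year)
    + 12 (month of year) + 1 (productive hours) = 104 values.
    """
    iso_week = day.isocalendar()[1]  # ISO week number, 1..53
    return (
        one_hot(day.weekday(), 7)      # Monday = 0
        + one_hot(day.day - 1, 31)
        + one_hot(iso_week - 1, 53)
        + one_hot(day.month - 1, 12)
        + [productive_hours]
    )


features = date_features(date(2018, 3, 15), productive_hours=120.0)
```

One row like this per day and job family would form the input to the linear regression, with the urgent exception count as the target.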

### Input file

- Exception Hours for past years
- Productive Hours for past years
- Productive Hours for the period you want to predict (estimation)

### Output file

- `.csv` file with dates, job family, predicted count

<div align="center"><img src="img/urgent_3.png" width = 400></div>

<div align="center"><img src="img/urgent_1.png" width = 1000></div>

## Exception Classification

Forecasting possible outcome for each exception submitted

### Label Grouping

- EARNING_CATEGORY is the final outcome for an exception


- The original EARNING_CATEGORY has 12 values, which is too many for classification


- 3 labels are more reasonable for classification:

    - Straight Time: Regular Relief Utilized, Casual at Straight-Time, PT Over FTE, Miscellaneous Straight-Time, PT Employee Moved - Straight-Time, FT Employee Moved - Straight-Time
    
    - Overtime and Beyond: Overtime, Agency, Insufficient Notice, On-Call, Relief Not Found
    
    - Relief Not Needed: Relief Not Needed.
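
The grouping above amounts to a lookup from each original category to one of the three labels. A minimal sketch follows; the category names are taken from the list above, while the `LABEL_GROUPS` and `group_label` names are hypothetical and the exact spelling of the values in the real data is an assumption.

```python
# Three target labels and the original EARNING_CATEGORY values they cover.
LABEL_GROUPS = {
    "Straight Time": [
        "Regular Relief Utilized",
        "Casual at Straight-Time",
        "PT Over FTE",
        "Miscellaneous Straight-Time",
        "PT Employee Moved - Straight-Time",
        "FT Employee Moved - Straight-Time",
    ],
    "Overtime and Beyond": [
        "Overtime",
        "Agency",
        "Insufficient Notice",
        "On-Call",
        "Relief Not Found",
    ],
    "Relief Not Needed": ["Relief Not Needed"],
}

# Invert the grouping into a category -> label lookup.
CATEGORY_TO_LABEL = {
    category: label
    for label, categories in LABEL_GROUPS.items()
    for category in categories
}


def group_label(earning_category: str) -> str:
    """Map an original EARNING_CATEGORY value to its grouped label."""
    return CATEGORY_TO_LABEL[earning_category.strip()]
```

For example, `group_label("Agency")` gives `"Overtime and Beyond"`, and an unknown category raises a `KeyError` instead of being silently mislabelled.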

### Feature Selection

- Using `EXCEPTION_HOURS`, `EXCEPTION_CREATION_TO_SHIFTSTART_MINUTES`, and `NOTICE` as the accuracy baseline.


- Using forward selection, adding `SITE`, `PROGRAM`, `SUB_PROGRAM`, `EXCEPTION_GROUP`, `MONTH`, `DEPARTMENT`, `SHIFT`.
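
Forward selection can be sketched as a greedy loop that starts from the baseline features and repeatedly adds whichever candidate improves the score most. This is a generic illustration, not the project's implementation: the `forward_selection` helper, the stopping rule (stop when no candidate improves the score), and the toy scoring function with made-up gains are all assumptions. In practice `score` would train the classifier and return its validation accuracy.

```python
def forward_selection(baseline, candidates, score):
    """Greedily add the candidate feature that most improves `score`.

    `score` takes a list of feature names and returns a number where
    higher is better. Stops when no remaining candidate helps.
    """
    selected = list(baseline)
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected + [feature]), feature) for feature in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves on the current feature set
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy scoring function with made-up gains, NOT results from the project.
toy_gains = {"SITE": 0.05, "PROGRAM": 0.02, "MONTH": -0.01}


def toy_score(features):
    return 0.5 + sum(toy_gains.get(name, 0.0) for name in features)


baseline = ["EXCEPTION_HOURS", "EXCEPTION_CREATION_TO_SHIFTSTART_MINUTES", "NOTICE"]
selected, best = forward_selection(baseline, ["MONTH", "PROGRAM", "SITE"], toy_score)
```

With these toy gains, `SITE` is added first, then `PROGRAM`, and `MONTH` is rejected because it lowers the score.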

### Prediction Result Analysis



<div align="center"><img src="img/rf_1.png" width = "800"></div>

### Difficulties

- Imbalanced Data

<div align="center"><img src="img/rf_2.png" width = "800"></div>

### Prediction Accuracies After Adjustments

<div align="center"><img src="img/classification_accuracies.png" width = "800"></div>

### Output file
    
- `.csv` file with the prediction results
    
<div align="center"><img src="img/rf_4.png" width = "600"></div>

# Dashboard

## Exception Predictions

<div align="center"><img src="img/dashboard_predictions.png" width = "600"></div>

## Exceptions Classification

<div align="center"><img src="img/dashboard_classification.png" width = "600"></div>

## Comparison of Productive and Exception Hours

<div align="center"><img src="img/dashboard_history.png" width = "600"></div>

# Summary

- The data product contains the three models
- Results from the models can be shown in a Tableau dashboard
- HR can choose which models to use based on the data they have
- Insights from the predictions help support decision making

# Thank you!