# Forecasting of Staffing Needs

### Team Members:
- Davy Guo
- Iris Yang
- Marcelle Chiriboga
- Patrick Tung

## Agenda
- Introduction
- The Data
- Data Science Approaches
- Timeline

# Introduction

## The Partner - Providence Health Care

- Providence Health Care (PHC) is a non-profit organization.
- Almost 9,000 people work at its 16 facilities: 6,000 staff, 1,000 medical staff/physicians, 200 researchers, and 1,600 volunteers.
- PHC is the provincial centre for the care of six groups of people with often-intensive health needs:
  - heart and/or lung diseases;
  - kidney disease;
  - mental illnesses;
  - older British Columbians (residential care, seniors, & geriatrics);
  - HIV/AIDS; and
  - urban health issues (homelessness, drug & alcohol-related issues & malnutrition).

## Objective

The purpose of this project is to forecast staffing needs in healthcare on a weekly basis, providing insight into how many backup staff PHC needs in order to be fully staffed.

## Final Product

The final product will consist of:
- a dashboard (developed in R Shiny or Tableau);
- the scripts containing the code used to perform the analysis; and
- a report outlining the methodologies and findings.

# The Data

## Data Description

<img src="img/data_des.png" align=middle>

## Data Wrangling

- The raw data consists of exception records: one entry per record.
- To apply linear regression, we reshape the data so that each row is a day and summarise the related variables.
- Feature selection is needed.
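The reshaping step can be sketched as follows. This is a minimal illustration using only the Python standard library. The record layout (`shift_date`, `exception_hours`) and the values are hypothetical and are not taken from the PHC data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exception records: (shift_date, exception_hours).
# The field names and values are illustrative, not taken from the PHC data.
records = [
    (date(2018, 1, 1), 7.5),
    (date(2018, 1, 1), 4.0),
    (date(2018, 1, 2), 7.5),
    (date(2018, 1, 2), 11.0),
    (date(2018, 1, 2), 7.5),
]


def summarise_by_day(rows):
    """Collapse one-row-per-exception data into one row per day."""
    daily = defaultdict(lambda: {"n_exceptions": 0, "total_hours": 0.0})
    for shift_date, hours in rows:
        daily[shift_date]["n_exceptions"] += 1
        daily[shift_date]["total_hours"] += hours
    # Sort by date so the result can be used as a daily series.
    return dict(sorted(daily.items()))


daily_summary = summarise_by_day(records)
for day, stats in daily_summary.items():
    print(day, stats["n_exceptions"], stats["total_hours"])
```

Each day ends up with a count and a total of hours, which is the shape a per-day regression would expect.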

<table>
    <tr>
        <td>
            <img src="img/lr_1_adv.jpg" align=middle width="800" height="600">
        </td>
        <td>
            <img src="img/lr_2_adv30.jpg" align=middle width="800" height="600">
        </td>        
    </tr>
</table>

- Around 2/3 of the exceptions are created in advance, and around 1/4 to 1/3 are created at least one month in advance.
- The data for the past year includes exceptions that have already been created for shifts in the coming month.
- We can use this part of the data as predictors to estimate the actual exceptions.
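One way to compute shares like the ones above is to compare each exception's creation date with its shift date. The sketch below assumes each record carries both dates and treats "one month" as 30 days. The field names, the 30-day threshold, and the sample values are assumptions made for illustration.

```python
from datetime import date

# Hypothetical (created_on, shift_date) pairs; values are illustrative only.
exceptions = [
    (date(2018, 3, 1), date(2018, 4, 15)),   # created 45 days ahead
    (date(2018, 4, 10), date(2018, 4, 15)),  # created 5 days ahead
    (date(2018, 4, 15), date(2018, 4, 15)),  # created on the day of the shift
    (date(2018, 4, 16), date(2018, 4, 15)),  # recorded after the shift
]


def share_created_ahead(rows, min_days=1):
    """Fraction of exceptions created at least `min_days` before the shift."""
    if not rows:
        return 0.0
    ahead = sum(1 for created, shift in rows if (shift - created).days >= min_days)
    return ahead / len(rows)


print(share_created_ahead(exceptions))               # at least 1 day ahead
print(share_created_ahead(exceptions, min_days=30))  # at least 30 days ahead
```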

## Feature Selection

- Select variables correlated with the target from the 33 raw columns and the additional wrangled variables
- Some potentially related variables:
    - Number/hours of exceptions filed in advance
    - Exception Group
    - Job Family
    - Program
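A simple first pass at screening numeric candidates is to rank them by the absolute value of their Pearson correlation with the target. The sketch below is one possible approach, not necessarily the selection method the project will use. The column names and numbers are hypothetical, and categorical variables such as Job Family or Program would need to be encoded before a check like this applies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values, made up for illustration.
target = [20, 24, 31, 35, 40]  # actual exceptions per day
candidates = {
    "filed_ahead": [10, 12, 16, 18, 20],
    "constant_flag": [1, 1, 1, 1, 1],
}

# Rank candidates from strongest to weakest linear association.
ranked = sorted(
    ((name, pearson(values, target)) for name, values in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```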

# Data Science Approaches

- Time Series
- Linear Regression
- Neural Network

## Time Series

- To explore the pattern over time, we plot the number of exceptions for each day of 2018.
- The scatterplot shows two separate groups.

<img src="img/ts_1_day18.jpg" align=middle width="1000" height="800">

- Coloring the points by day of the week reveals a pattern.
- Days around the end of the week, Sunday and Monday, have far fewer exceptions than the other days.
- Based on this, we will try a week-based time series to analyze the trend within a week.

<img src="img/ts_2_day18.jpg" align=middle width="1000" height="800">
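The weekday pattern can also be checked numerically by averaging the daily counts per day of the week. The sketch below uses made-up counts for two weeks starting on Monday, 2018-01-01. The numbers are hypothetical and only mimic the shape described above.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical daily counts for two weeks starting Monday 2018-01-01.
start = date(2018, 1, 1)
counts = [12, 30, 31, 29, 33, 28, 10,
          14, 32, 30, 31, 35, 27, 11]
daily_counts = {start + timedelta(days=i): c for i, c in enumerate(counts)}


def mean_by_weekday(series):
    """Average count for each weekday name."""
    buckets = defaultdict(list)
    for day, count in series.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAYS[day.weekday()]].append(count)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


weekday_means = mean_by_weekday(daily_counts)
for name in WEEKDAYS:
    print(f"{name}: {weekday_means[name]:.1f}")
```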

Overall:

- For the year-round trend, we plot the number of exceptions in each week, using the data from all 9 years.
- Each year has 53 weeks in total, but the last "week" contains only one or two days. It is an outlier, so it is excluded from the graph.
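The short final week follows from simple arithmetic if weeks are counted in blocks of seven days from January 1: 365 = 52 × 7 + 1, and a leap year leaves two days over. The sketch below assumes that numbering scheme, which the slides do not state explicitly. ISO week numbers, for instance, behave differently.

```python
from collections import Counter
from datetime import date, timedelta


def week_of_year(day):
    """Week index where days 1-7 are week 1, days 8-14 are week 2, and so on."""
    day_of_year = day.timetuple().tm_yday
    return (day_of_year - 1) // 7 + 1


def days_per_week(year):
    """Count how many calendar days fall into each week index of `year`."""
    day = date(year, 1, 1)
    counts = Counter()
    while day.year == year:
        counts[week_of_year(day)] += 1
        day += timedelta(days=1)
    return counts


counts_2018 = days_per_week(2018)
full_weeks = [week for week, n in counts_2018.items() if n == 7]
print(len(full_weeks), counts_2018[53])
```

Dropping week 53 before plotting keeps a one-day or two-day total from appearing as a sharp dip at the end of each year.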

Trend:

- We can see there is a clear trend throughout a year.
- There is a small peak around April, a larger peak around August, and the highest peak at the end of the year.
- A year-based time series could be used to analyze this trend.
- We will try to separate out seasonality and noise to find a more stable year-round pattern.
- As the organization expands, the number of shifts and exceptions also increases year over year.
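As a rough illustration of separating a smooth pattern from noise, a centered moving average can stand in for the trend, with the remainder treated as the residual. This is a deliberately simple stand-in and not a full seasonal decomposition. The weekly counts below are hypothetical.

```python
def centered_moving_average(values, window):
    """Smooth a series with a centered moving average (odd `window` only)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i >= len(values) - half:
            smoothed.append(None)  # not enough neighbours at the edges
        else:
            chunk = values[i - half : i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical weekly exception counts, made up for illustration.
weekly_counts = [100, 130, 90, 120, 150, 110, 140]
trend = centered_moving_average(weekly_counts, window=3)

# What is left after removing the smoothed trend.
residual = [
    None if t is None else count - t
    for count, t in zip(weekly_counts, trend)
]
print(trend)
print(residual)
```

A wider window gives a smoother trend at the cost of losing more points at both ends of the series.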

Things to keep in mind:

- The graph also shows that 2019 and 2020 have clearly different patterns.
- This is because the data for those years only contains exceptions for future shifts that were filed in advance.
- We will be careful when using the 2019 and 2020 data in our analysis.

<img src="img/ts_3_week.jpg" align=middle width="1000" height="800">

Summary on time series:

- We will use the time series as a base pattern to predict the number of exceptions and exception hours for each day in a future period, for example, one month.
- The time series will also provide a general trend of the future period.

## Linear Regression

- Our goal is to use past data, for example one year of exception records, to predict operational exceptions one month ahead.
- Variables in the records can be used as predictors.
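A minimal sketch of the idea with a single predictor: fit an ordinary least squares line that maps the number of exceptions already filed ahead for a day to the number eventually recorded for that day. The choice of predictor, the variable names, and the data are hypothetical. The actual model would likely use several variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: exceptions already filed one month ahead (x)
# versus exceptions eventually recorded for the same day (y).
filed_ahead = [10, 12, 16, 18, 20]
actual = [31, 37, 49, 55, 61]  # constructed as 1 + 3 * x so the fit is easy to check

intercept, slope = fit_line(filed_ahead, actual)

# Forecast for a future day that already has 14 exceptions filed.
predicted = intercept + slope * 14
print(round(intercept, 2), round(slope, 2), round(predicted, 2))
```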

## Neural Network

# Timeline


| Time Period | Milestone |
|-----------------|-------------|
| Week 1 | Finalize the proposal report to send to our mentor and partner |
| Week 2 | Data wrangling and feature selection (and more EDA) |
| Weeks 3 - 4 | Explore different approaches to fit the models |
| Week 5 | Build, test, and adjust the algorithms |
| Week 6 | Improve the dashboard and wrap up |
| Week 7 | Presentations and reports |

# Thank you!