## June 16, 2021

Chin & Beard Suds Co is just like any other company: people take unscheduled time off. Whilst this is expected in any organisation, it can be difficult to manage. At C&BS Co, we have had a rough start to our financial year, with lots of people off through illness or accidents. How bad has it been, and do we have people off every single day?

In BI tools it can be tough to see the day-to-day picture when time off is recorded as just a start date and the number of days taken. This week's challenge is to produce a simple data set that gives us that view.

We are analysing the period 1st April to 31st May 2021.

### Input
One Excel workbook, two sheets:

1. Absence Table

![img](https://1.bp.blogspot.com/-FkXnQjO2k3s/YK-4ZBBpaWI/AAAAAAAACMQ/kgqvxAft_9IRE5mjJgLRlkVa-MSltYYYwCLcBGAsYHQ/s320/Screenshot%2B2021-05-27%2Bat%2B16.18.44.png)

2. Scaffold

![img](https://1.bp.blogspot.com/-cKnP_N19TVY/YK-4p6oCTCI/AAAAAAAACMY/QOLP0Bf44OImwaearNzbMAbBsmSyysbAgCLcBGAsYHQ/s320/Screenshot%2B2021-05-27%2Bat%2B16.19.55.png)

### Requirement
- Input data
- Build a data set that has each date listed out from 1st April to 31st May 2021
- Build a data set containing each date someone will be off work
- Merge these two data sets together
- Work out the number of people off each day
- Output the data
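
The four middle requirements can be sketched end to end in a few lines of pandas. This is a minimal sketch, not the solution worked through below. It assumes the absence records are typed in from the Absence Table rather than read from the workbook. It also flattens each record into daily rows with a list comprehension instead of using the Scaffold sheet.

```python
import pandas as pd

# Absence records typed in from the Absence Table (assumption: not read from Excel)
reasons = pd.DataFrame({
    "Name": ["Andy", "Carl", "Luke", "Tom", "Craig", "Lorna", "Pat", "Jenny", "Tom"],
    "Start Date": pd.to_datetime([
        "2021-04-01", "2021-04-04", "2021-04-05", "2021-04-07", "2021-04-08",
        "2021-04-10", "2021-05-11", "2021-05-14", "2021-05-18",
    ]),
    "Days Off": [4, 5, 7, 2, 3, 5, 10, 3, 5],
})

# 1. Every date in the analysis window (both ends included)
window = pd.date_range("2021-04-01", "2021-05-31", freq="D")

# 2. One row per person per day off
off_days = pd.DataFrame(
    [
        {"Name": name, "Date": day}
        for name, start, days in zip(reasons["Name"], reasons["Start Date"], reasons["Days Off"])
        for day in pd.date_range(start, periods=int(days), freq="D")
    ]
)
off_days["Date"] = pd.to_datetime(off_days["Date"])

# 3 and 4. Count people per date, then align to the full window (missing dates -> 0)
daily = (
    off_days["Date"].value_counts()
    .reindex(window, fill_value=0)
    .rename_axis("Date")
    .rename("Number of people off each day")
    .reset_index()
)
print(daily.head())
```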

### Output
Can you answer:
1. Which date had the most people off?
2. On how many days does no-one have time off?

![img](https://1.bp.blogspot.com/-EyRwPwV4hj4/YK-67lR2lKI/AAAAAAAACMg/-O3jOctb2aIwHeB6JAQ1FouUTo9-Q-WrACLcBGAsYHQ/w400-h194/Screenshot%2B2021-05-27%2Bat%2B16.29.28.png)

One table with two data fields:
- Date
- Number of people off each day

61 rows (62 including headers)

```python
import pandas as pd
```

### Input data

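```python
# The absence records are on the "Reasons" sheet; passing a list to sheet_name
# returns a dict of DataFrames keyed by sheet name
data = pd.read_excel("./data/Absenteeism Scaffold.xlsx", sheet_name=["Reasons", "Scaffold"])
reasons = data["Reasons"].copy()
scaffold = data["Scaffold"].copy()
```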

```python
reasons
```

|   | Name  | Start Date | Days Off | Reason   |
|---|-------|------------|----------|----------|
| 0 | Andy  | 2021-04-01 | 4.0      | Illness  |
| 1 | Carl  | 2021-04-04 | 5.0      | Illness  |
| 2 | Luke  | 2021-04-05 | 7.0      | Accident |
| 3 | Tom   | 2021-04-07 | 2.0      | Illness  |
| 4 | Craig | 2021-04-08 | 3.0      | Accident |
| 5 | Lorna | 2021-04-10 | 5.0      | Accident |
| 6 | Pat   | 2021-05-11 | 10.0     | Illness  |
| 7 | Jenny | 2021-05-14 | 3.0      | Illness  |
| 8 | Tom   | 2021-05-18 | 5.0      | Accident |


```python
scaffold.head()
```

|   | Scaffold |
|---|----------|
| 0 | 0.0      |
| 1 | 1.0      |
| 2 | 2.0      |
| 3 | 3.0      |
| 4 | 4.0      |
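
The Scaffold sheet is a column of consecutive integers starting at 0. A common way to use one is to cross join every absence with the scaffold, keep the offsets smaller than Days Off, and add each offset to the start date. This is sketched here as an alternative to the loop used later, not as the author's method. The sketch uses two records from the Absence Table and a hypothetical 31-row scaffold. It assumes the real scaffold runs at least as high as the longest absence, and it needs pandas 1.2 or later for `how="cross"`.

```python
import pandas as pd

# Two records from the Absence Table (subset, for brevity)
reasons = pd.DataFrame({
    "Name": ["Andy", "Tom"],
    "Start Date": pd.to_datetime(["2021-04-01", "2021-04-07"]),
    "Days Off": [4, 2],
})

# Hypothetical scaffold: integers 0..30, like the Scaffold sheet
scaffold = pd.DataFrame({"Scaffold": range(0, 31)})

# Pair every absence with every scaffold value
pairs = reasons.merge(scaffold, how="cross")

# Keep only offsets that fall inside the absence (0 .. Days Off - 1)
in_absence = pairs.loc[pairs["Scaffold"] < pairs["Days Off"]]

# Turn each offset into a calendar date
expanded = in_absence.assign(
    Date=in_absence["Start Date"] + pd.to_timedelta(in_absence["Scaffold"], unit="D")
)
print(expanded[["Name", "Date"]])
```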


### Build a data set that has each date listed out from 1st April to 31st May 2021

```python
# One row per calendar day in the analysis window
date = pd.DataFrame({"Date": pd.date_range(start="2021-04-01", end="2021-05-31", freq="D")})
date.head()
```

|   | Date       |
|---|------------|
| 0 | 2021-04-01 |
| 1 | 2021-04-02 |
| 2 | 2021-04-03 |
| 3 | 2021-04-04 |
| 4 | 2021-04-05 |
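
`pd.date_range` includes both endpoints by default, so the window should contain the 30 April dates plus the 31 May dates. A quick check, as a sketch:

```python
import pandas as pd

# Analysis window: 1st April to 31st May 2021, both ends included by default
dates = pd.date_range(start="2021-04-01", end="2021-05-31", freq="D")

print(len(dates))                          # 61 (30 April days + 31 May days)
print(dates[0].date(), dates[-1].date())   # 2021-04-01 2021-05-31
```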


### Build a data set containing each date someone will be off work


```python
# Expand the first absence record into one row per day off
# (Days Off is read as a float, so cast it to int for `periods`)
tmp = pd.DataFrame(
    {"Days Off": 1},
    index=pd.date_range(start=reasons["Start Date"][0], periods=int(reasons["Days Off"][0])),
)
tmp
```

|            | Days Off |
|------------|----------|
| 2021-04-01 | 1        |
| 2021-04-02 | 1        |
| 2021-04-03 | 1        |
| 2021-04-04 | 1        |


```python
# Repeat for every absence record (not a hard-coded 9), one frame per absence
results = []
for start, days in zip(reasons["Start Date"], reasons["Days Off"]):
    dates_off = pd.date_range(start=start, periods=int(days))
    results.append(pd.DataFrame({"Days Off": 1}, index=dates_off))
```

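```python
# Concatenating side by side aligns the frames on date; missing dates become 0,
# and summing across each row gives the number of people off on that date
days_off = pd.concat(results, axis=1).fillna(0).astype(int)
days_off = days_off.sum(axis=1)
days_off.name = "Days Off"
```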

```python
days_off
```

```
2021-04-01    1
2021-04-02    1
2021-04-03    1
2021-04-04    2
2021-04-05    2
2021-04-06    2
2021-04-07    3
2021-04-08    4
2021-04-09    2
2021-04-10    3
2021-04-11    2
2021-04-12    1
2021-04-13    1
2021-04-14    1
2021-05-11    1
2021-05-12    1
2021-05-13    1
2021-05-14    2
2021-05-15    2
2021-05-16    2
2021-05-17    1
2021-05-18    2
2021-05-19    2
2021-05-20    2
2021-05-21    1
2021-05-22    1
Name: Days Off, dtype: int64
```
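
A cheap sanity check on `days_off`: every absence starts and ends inside the analysis window, so the total of the daily counts should equal the total of the Days Off column (person-days are conserved). The values below are copied from the Absence Table and from the output above.

```python
import pandas as pd

# Days Off for each record in the Absence Table
days_off_per_record = pd.Series([4, 5, 7, 2, 3, 5, 10, 3, 5])

# Non-zero daily counts from the days_off output above
daily_counts = pd.Series(
    [1, 1, 1, 2, 2, 2, 3, 4, 2, 3, 2, 1, 1, 1,  # 1st to 14th April
     1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1]        # 11th to 22nd May
)

# Person-days in must equal person-days out
total_in = days_off_per_record.sum()
total_out = daily_counts.sum()
print(total_in, total_out)  # 44 44
assert total_in == total_out
```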

### Merge these two data sets together and work out the number of people off each day

```python
# Left-join the daily counts onto the full calendar; days with nobody off become 0
date = date.set_index("Date").join(days_off).fillna(0).astype(int)
date = date.rename(columns={"Days Off": "Number of people off each day"})
date
```

| Date       | Number of people off each day |
|------------|-------------------------------|
| 2021-04-01 | 1                             |
| 2021-04-02 | 1                             |
| 2021-04-03 | 1                             |
| 2021-04-04 | 2                             |
| 2021-04-05 | 2                             |
| ...        | ...                           |
| 2021-05-27 | 0                             |
| 2021-05-28 | 0                             |
| 2021-05-29 | 0                             |
| 2021-05-30 | 0                             |
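
Because `days_off` is already indexed by date, `Series.reindex` with `fill_value=0` is an alternative to `set_index`, `join` and `fillna`. Here is a minimal sketch that uses only three of the non-zero counts, for brevity:

```python
import pandas as pd

# Three of the non-zero daily counts from days_off above
days_off = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(["2021-04-01", "2021-04-04", "2021-04-08"]),
    name="Days Off",
)

# Reindex onto the full window; dates with nobody off get 0
window = pd.date_range("2021-04-01", "2021-05-31", freq="D", name="Date")
daily = days_off.reindex(window, fill_value=0).rename("Number of people off each day")

print(daily.head(8))
```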


### Question 1: Which date had the most people off?

```python
date.sort_values(by="Number of people off each day", ascending=False).head(1)
```

| Date       | Number of people off each day |
|------------|-------------------------------|
| 2021-04-08 | 4                             |
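
`sort_values(...).head(1)`, like `idxmax`, returns a single row, so a tie for the top spot would silently hide the other dates. In this data 8th April is the only date with four people off, but filtering on the maximum is safer in general. The counts below are made up purely to show a tie:

```python
import pandas as pd

# Made-up daily counts with a tie at the top (not the challenge data)
daily = pd.Series(
    [1, 3, 3, 0],
    index=pd.date_range("2021-04-01", periods=4, freq="D", name="Date"),
    name="Number of people off each day",
)

# idxmax returns only the first date holding the maximum
print(daily.idxmax())

# Filtering on the maximum returns every date that shares it
busiest = daily[daily == daily.max()]
print(busiest)
```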


### Question 2: On how many days does no-one have time off?

```python
print("{} Days".format(date.loc[date["Number of people off each day"] == 0, :].shape[0]))
```

```
35 Days
```
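
Comparing the column to 0 yields a boolean Series, and summing it counts the `True` values. This is a shorter way to get the same number as `.loc[...].shape[0]`, illustrated here on a small made-up frame shaped like `date`:

```python
import pandas as pd

# Small made-up frame shaped like `date` above
date = pd.DataFrame(
    {"Number of people off each day": [1, 0, 2, 0, 0]},
    index=pd.date_range("2021-04-01", periods=5, freq="D", name="Date"),
)

# True where nobody is off; summing the booleans counts those days
no_one_off = (date["Number of people off each day"] == 0).sum()
print(f"{no_one_off} Days")  # 3 Days
```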


### Output the data

```python
# Write the result; the Date index becomes the first CSV column
date.to_csv("./output/Week24_output.csv")
```
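
The output spec calls for 61 rows, or 62 lines including the header. One way to check this without touching the disk is to write to an in-memory buffer. The sketch below uses placeholder zero counts because only the shape matters here.

```python
import io

import pandas as pd

window = pd.date_range("2021-04-01", "2021-05-31", freq="D", name="Date")
# Placeholder counts (all zero); only the row count and header matter for this check
output = pd.DataFrame({"Number of people off each day": 0}, index=window)

buffer = io.StringIO()
output.to_csv(buffer)  # the named index is written as the Date column
lines = buffer.getvalue().splitlines()

print(len(lines))  # 62: one header line plus 61 dates
print(lines[0])    # Date,Number of people off each day
```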