## Week 17: Timesheet checks

This challenge grew out of a request Pat received once his friends and family began to learn about his data preparation skills. Pat has built out a detailed set of requirements, so this might be an easier challenge for some.

### The Challenge

My employees log their hours daily and are contracted to 8 hours per day, so I want to check their average number of hours worked per day over the last 2 weeks. I also allow 20% of their time (not including Chats) for their own special projects, meaning they should spend at least 80% of their time on Client work. To check that they are sticking to instructions, I want to calculate the % of total hours spent on Client work. The task has three sets of requirements as the stakeholder is quite specific.

### Input
One file but three people's data:

![img](https://1.bp.blogspot.com/-1QmUxFAp41Q/YIZ7-eVaQhI/AAAAAAAACJo/lhHLdDOU2nM_os8CkIgRqYYR2XPtvULZACLcBGAsYHQ/w640-h112/Screenshot%2B2021-04-26%2Bat%2B09.37.29.png)

### Requirements
- Remove the ‘Totals’ Rows
- Pivot Dates to rows and rename fields 'Date' and 'Hours'
- Split the ‘Name, Age, Area of Work’ field into 3 Fields and Rename
- Remove unnecessary fields
- Remove the row where Dan was on Annual Leave and check the data type of the Hours Field.
- Total up the number of hours spent on each area of work for each date by each employee.

- First we are going to work out the avg number of hours per day worked by each employee
- Calculate the total number of hours worked and days worked per person
- Calculate the avg hours and remove unnecessary fields.

- Now we are going to work out what % of their day (not including Chats) was spent on Client work.
- Filter out Work related to Chats.
- Calculate total number of hours spent working on each area for each employee
- Calculate total number of hours spent working on both areas together for each employee
- Join these totals together
- Calculate the % of total and remove unnecessary fields
- Filter the data to just show Client work
- Join to the table with Avg hours to create your final output

### Output
![img](https://lh4.googleusercontent.com/Vt38vFKMGTPe8GuzPjuxQSPkM6ccj3n3U3ASMgiYbAGPejkADismHasgHjRxm5xttLTHolhy1HFO6pKylD98H8_hgeU-xt2n8yGc-UPd5oYDcvEpQrGjgm-4pbpJjFp7vcKpqzQ)

One file: 

4 fields:
- Name
- Area of Work
- % of Total
- Avg Number of Hours worked per day

3 rows (4 rows including headers)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_excel("./data/Preppin Data Challenge.xlsx")
df

Unnamed: 0,"Name, Age, Area of Work",Project,2021-02-01 00:00:00,2021-02-02 00:00:00,2021-02-03 00:00:00,2021-02-04 00:00:00,2021-02-05 00:00:00,2021-02-08 00:00:00,2021-02-09 00:00:00,2021-02-10 00:00:00,2021-02-11 00:00:00,2021-02-12 00:00:00
0,"Dan, 28: Client",Client Meetings,,2.0,,1.0,,1.5,0.5,,,Annual Leave
1,"Dan, 28: Client",Client Issues,1.0,1.5,4.5,3.5,1.0,2.0,1.0,2.0,3.0,Annual Leave
2,"Dan, 28: Client",Monthly Reports,,,,,2.0,1.0,1.0,2.0,1.0,Annual Leave
3,"Dan, 28: Client",Client Emails,2.0,0.5,0.5,0.5,1.0,1.0,1.0,,,Annual Leave
4,"Dan, 28: Client",Client Communications,1.0,1.0,,,,0.5,,,,Annual Leave
5,"Dan, 28: Special Projects",Virtual Space,,,,0.5,1.0,1.5,0.5,1.5,,Annual Leave
6,"Dan, 28: Special Projects",Presentation for Exec,,,,,,,1.5,,1.0,Annual Leave
7,"Dan, 28: Special Projects",Webinar,,,,0.5,,,0.5,,,Annual Leave
8,"Dan, 28: Special Projects",Grad Scheme Organisation,1.0,1.0,,,,,,,,Annual Leave
9,"Dan, 28: Special Projects",Team Social,,,,,,,,0.5,,Annual Leave


In [3]:
### Remove the ‘Totals’ Rows

In [4]:
total_index = df.loc[df["Name, Age, Area of Work"].isna()].index
df = df.drop(total_index, axis=0)

In [5]:
### Pivot Dates to rows and rename fields 'Date' and 'Hours'

In [6]:
df = df.melt(id_vars=["Name, Age, Area of Work", "Project"],
             var_name="Date",
             value_name="Hour")
df

Unnamed: 0,"Name, Age, Area of Work",Project,Date,Hour
0,"Dan, 28: Client",Client Meetings,2021-02-01,
1,"Dan, 28: Client",Client Issues,2021-02-01,1.0
2,"Dan, 28: Client",Monthly Reports,2021-02-01,
3,"Dan, 28: Client",Client Emails,2021-02-01,2.0
4,"Dan, 28: Client",Client Communications,2021-02-01,1.0
...,...,...,...,...
505,"Sam, 45: Chats",Team Meetings,2021-02-12,
506,"Sam, 45: Chats",Minutes,2021-02-12,
507,"Sam, 45: Chats",Coffee Catch Ups,2021-02-12,1
508,"Sam, 45: Chats",Personal development,2021-02-12,
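
The wide-to-long pivot above can be checked on a small example. This is a minimal sketch with made-up values (the `wide` frame and its numbers are illustrative, not taken from the challenge file); it only shows how `melt` turns each date column into a `Date`/`Hour` pair.

```python
import pandas as pd

# Toy wide table: one column per date (values are illustrative only)
wide = pd.DataFrame({
    "Name, Age, Area of Work": ["Dan, 28: Client", "Sam, 45: Chats"],
    "Project": ["Client Issues", "Minutes"],
    "2021-02-01": [1.0, 0.5],
    "2021-02-02": [1.5, None],
})

# Every column not listed in id_vars is unpivoted into a (Date, Hour) pair
long_df = wide.melt(
    id_vars=["Name, Age, Area of Work", "Project"],
    var_name="Date",
    value_name="Hour",
)
print(long_df.shape)  # (4, 4): 2 rows x 2 date columns, 4 fields
```

Missing hours stay as `NaN` after the pivot, which is why the long table above contains many empty `Hour` values.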


In [7]:
### Split the ‘Name, Age, Area of Work’ field into 3 Fields and Rename
### Remove unnecessary fields

In [8]:
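# Strip the colon, then split on spaces: "Dan, 28: Client" -> ["Dan,", "28", "Client"].
# Note: splitting on spaces keeps only the first word of a multi-word area,
# so "Special Projects" appears as "Special" in the outputs that follow.
name_age_work = (df["Name, Age, Area of Work"]
                    .map(lambda x: x.replace(":", "").split(" "))
                    .apply(pd.Series)
                )
df["Name"] = name_age_work[0].str.replace(",", "")  # drop the trailing comma
df["Age"] = name_age_work[1]
df["Area of Work"] = name_age_work[2]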
df = df.drop(["Name, Age, Area of Work"], axis=1)
df = df.loc[:, ["Name", "Age", "Date", "Area of Work", "Hour"]]
df

Unnamed: 0,Name,Age,Date,Area of Work,Hour
0,Dan,28,2021-02-01,Client,
1,Dan,28,2021-02-01,Client,1.0
2,Dan,28,2021-02-01,Client,
3,Dan,28,2021-02-01,Client,2.0
4,Dan,28,2021-02-01,Client,1.0
...,...,...,...,...,...
505,Sam,45,2021-02-12,Chats,
506,Sam,45,2021-02-12,Chats,
507,Sam,45,2021-02-12,Chats,1
508,Sam,45,2021-02-12,Chats,
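
Because the split above works on spaces, the area "Special Projects" is shortened to "Special". One possible alternative, sketched here on a few sample strings, is a regular expression with `Series.str.extract`. The pattern assumes every value follows the layout `<Name>, <Age>: <Area of Work>`; that is an assumption based on the rows shown, not something verified against the whole file.

```python
import pandas as pd

# Sample values in the same layout as the combined field
raw = pd.Series(
    ["Dan, 28: Client", "Dan, 28: Special Projects", "Sam, 45: Chats"],
    name="Name, Age, Area of Work",
)

# Named groups become column names; group names cannot contain spaces,
# so the area column is renamed afterwards
parts = raw.str.extract(r"^(?P<Name>[^,]+),\s*(?P<Age>\d+):\s*(?P<Area>.+)$")
parts = parts.rename(columns={"Area": "Area of Work"})
parts["Age"] = parts["Age"].astype(int)
print(parts)
```

With this approach the full area name is preserved, so later filters would need to compare against "Special Projects" rather than "Special".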


In [9]:
### Remove the row where Dan was on Annual Leave and check the data type of the Hours Field.
dan_remove_index = df.loc[(df["Name"] == "Dan") & (df["Hour"] == "Annual Leave")].index
df = df.drop(dan_remove_index, axis=0)
df

Unnamed: 0,Name,Age,Date,Area of Work,Hour
0,Dan,28,2021-02-01,Client,
1,Dan,28,2021-02-01,Client,1.0
2,Dan,28,2021-02-01,Client,
3,Dan,28,2021-02-01,Client,2.0
4,Dan,28,2021-02-01,Client,1.0
...,...,...,...,...,...
505,Sam,45,2021-02-12,Chats,
506,Sam,45,2021-02-12,Chats,
507,Sam,45,2021-02-12,Chats,1
508,Sam,45,2021-02-12,Chats,
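
The requirement also asks to check the data type of the hours field. The later outputs report `dtype: object`, which suggests the column still holds mixed Python objects after the text rows are dropped. A minimal sketch of one way to check and convert it, using a stand-in Series rather than the real column:

```python
import pandas as pd

# Stand-in for the Hour column: floats, ints, missing values and a text marker
hours = pd.Series([1.0, None, "Annual Leave", 1], name="Hour")
print(hours.dtype)  # object

# Drop the text marker first, then convert what is left to a numeric dtype
hours = hours[hours != "Annual Leave"]
hours = pd.to_numeric(hours)
print(hours.dtype)  # float64
```

A numeric column makes the later `sum` and `mean` results consistent (for example `0.0` instead of `0`).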


In [10]:
### Total up the number of hours spent on each area of work for each date by each employee.
grouped = df.groupby(["Name", "Area of Work", "Date"])["Hour"].sum()
grouped

Name  Area of Work  Date      
Dan   Chats         2021-02-01     3.0
                    2021-02-02     1.5
                    2021-02-03     2.0
                    2021-02-04     1.5
                    2021-02-05    2.75
                                  ... 
Sam   Special       2021-02-08     2.0
                    2021-02-09       0
                    2021-02-10     1.0
                    2021-02-11       0
                    2021-02-12       1
Name: Hour, Length: 87, dtype: object

In [11]:
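### First we are going to work out the avg number of hours per day worked by each employee
# Note: this mean is taken over the individual project entries logged on each date,
# so it is an average per entry, not a daily total. The per-employee daily average
# is derived from total hours and days worked in the next cells.
avg_hours_per_day = df.groupby(["Name", "Date"])["Hour"].mean()
avg_hours_per_day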

Name    Date      
Dan     2021-02-01    1.333333
        2021-02-02    0.937500
        2021-02-03    1.875000
        2021-02-04    1.000000
        2021-02-05    1.031250
        2021-02-08    1.285714
        2021-02-09    0.833333
        2021-02-10    1.285714
        2021-02-11    1.250000
George  2021-02-01    1.285714
        2021-02-02    1.333333
        2021-02-03    1.666667
        2021-02-04    1.700000
        2021-02-05    1.600000
        2021-02-08    1.285714
        2021-02-09    1.000000
        2021-02-10    1.416667
        2021-02-11    1.400000
        2021-02-12    2.000000
Sam     2021-02-01    2.125000
        2021-02-02    1.300000
        2021-02-03    1.250000
        2021-02-04    1.250000
        2021-02-05    1.416667
        2021-02-08    2.125000
        2021-02-09    1.300000
        2021-02-10    1.250000
        2021-02-11    1.250000
        2021-02-12    1.416667
Name: Hour, dtype: float64
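
The values above are means over individual project entries, which is why they sit between 1 and 2 rather than near 8. A small sketch with made-up numbers shows the difference between that per-entry mean and an average of daily totals:

```python
import pandas as pd

# Toy data: two entries per day for one employee (values are illustrative)
toy = pd.DataFrame({
    "Name": ["Dan"] * 4,
    "Date": ["2021-02-01", "2021-02-01", "2021-02-02", "2021-02-02"],
    "Hour": [1.0, 3.0, 2.0, 6.0],
})

# Mean of the individual entries on each date
per_entry_mean = toy.groupby(["Name", "Date"])["Hour"].mean()

# Total hours per date, then the average of those totals per employee
daily_totals = toy.groupby(["Name", "Date"])["Hour"].sum()
avg_per_day = daily_totals.groupby(level="Name").mean()

print(per_entry_mean.tolist())  # [2.0, 4.0]
print(daily_totals.tolist())    # [4.0, 8.0]
print(avg_per_day["Dan"])       # 6.0
```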

In [23]:
### Calculate the total number of hours worked and days worked per person
group_1 = df.groupby(["Name"])["Hour"].sum()
group_2 = df.groupby(["Name"])["Date"].nunique()
total = pd.concat([group_1, group_2], axis=1).rename(columns={"Hour": "Total Hours", "Date": "Total Days"})
total

Name,Total Hours,Total Days
Dan,72.25,9
George,84.0,10
Sam,77.0,10


In [24]:
### Calculate the avg hours and remove unnecessary fields.
total["Avg Hours"] = total["Total Hours"] / total["Total Days"]
total

Name,Total Hours,Total Days,Avg Hours
Dan,72.25,9,8.027778
George,84.0,10,8.4
Sam,77.0,10,7.7
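
The two group-bys and the `concat` can also be written as a single named aggregation. This is a sketch on toy data (the `toy` frame and its values are made up for illustration); the dictionary unpacking is only there because the output column names contain spaces.

```python
import pandas as pd

# Toy data: Dan logs entries on two dates, Sam on one (values are illustrative)
toy = pd.DataFrame({
    "Name": ["Dan", "Dan", "Dan", "Dan", "Sam"],
    "Date": ["2021-02-01", "2021-02-01", "2021-02-02", "2021-02-02", "2021-02-01"],
    "Hour": [3.0, 5.0, 4.0, None, 8.0],
})

# One pass: total hours and number of distinct dates per employee
summary = toy.groupby("Name").agg(**{
    "Total Hours": ("Hour", "sum"),
    "Total Days": ("Date", "nunique"),
})
summary["Avg Hours"] = summary["Total Hours"] / summary["Total Days"]
print(summary)
```

Note that `nunique` counts every date present in the rows, including dates where all hours are missing, so rows for days not worked need to be removed beforehand, as was done for Dan's annual leave.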


In [26]:
### Now we are going to work out what % of their day (not including Chats) was spent on Client work.

In [30]:
### Filter out Work related to Chats.
without_chat = df.loc[df["Area of Work"] != "Chats"]
without_chat

Unnamed: 0,Name,Age,Date,Area of Work,Hour
0,Dan,28,2021-02-01,Client,
1,Dan,28,2021-02-01,Client,1.0
2,Dan,28,2021-02-01,Client,
3,Dan,28,2021-02-01,Client,2.0
4,Dan,28,2021-02-01,Client,1.0
...,...,...,...,...,...
500,Sam,45,2021-02-12,Special,
501,Sam,45,2021-02-12,Special,
502,Sam,45,2021-02-12,Special,
503,Sam,45,2021-02-12,Special,1


In [49]:
### Calculate total number of hours spent working on each area for each employee
group_1 = without_chat.groupby(["Name", "Area of Work"])["Hour"].sum().to_frame()
group_1 = group_1.reset_index()
group_1

Unnamed: 0,Name,Area of Work,Hour
0,Dan,Client,40.5
1,Dan,Special,13.5
2,George,Client,56.5
3,George,Special,13.0
4,Sam,Client,53.0
5,Sam,Special,8.0


In [50]:
### Calculate total number of hours spent working on both areas together for each employee
group_2 = without_chat.groupby(["Name"])["Hour"].sum()
group_2

Name
Dan       54.0
George    69.5
Sam       61.0
Name: Hour, dtype: object

In [56]:
### Join these totals together
area_total = group_1.merge(group_2, how="left", on="Name").rename(columns={"Hour_x": "Area Hours",
                                                                           "Hour_y": "Total Hours"})
area_total

Unnamed: 0,Name,Area of Work,Area Hours,Total Hours
0,Dan,Client,40.5,54.0
1,Dan,Special,13.5,54.0
2,George,Client,56.5,69.5
3,George,Special,13.0,69.5
4,Sam,Client,53.0,61.0
5,Sam,Special,8.0,61.0
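
The remaining requirements (calculate the % of total, keep only Client work, and join to the average hours) could be sketched as follows. The two frames are rebuilt by hand from the outputs shown above so the example runs on its own; expressing the share as an unrounded percentage is an assumption, since the expected output format is only given as an image.

```python
import pandas as pd

# Values copied from the area_total output above
area_total = pd.DataFrame({
    "Name": ["Dan", "Dan", "George", "George", "Sam", "Sam"],
    "Area of Work": ["Client", "Special", "Client", "Special", "Client", "Special"],
    "Area Hours": [40.5, 13.5, 56.5, 13.0, 53.0, 8.0],
    "Total Hours": [54.0, 54.0, 69.5, 69.5, 61.0, 61.0],
})

# Values copied from the Avg Hours column computed earlier
avg_hours = pd.DataFrame({
    "Name": ["Dan", "George", "Sam"],
    "Avg Number of Hours worked per day": [8.027778, 8.4, 7.7],
})

# Calculate the % of total (assumption: percentage scale, no rounding)
area_total["% of Total"] = area_total["Area Hours"] / area_total["Total Hours"] * 100

# Filter to Client work and keep only the fields needed for the output
client = area_total.loc[
    area_total["Area of Work"] == "Client",
    ["Name", "Area of Work", "% of Total"],
]

# Join to the table with average hours to create the final output
final = client.merge(avg_hours, on="Name", how="left")
print(final)
```

On these numbers Dan spends 75% of his non-Chats time on Client work, below the 80% target, while George and Sam are above it.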
