## 2021: Week 18 Prep Air Project Overruns

In January we ran a month of challenges that focused on letting people try Preppin' for the first time without jumping into some of the more challenging functions. This month, we want to build on that knowledge with a set of beginner / intermediate challenges. We'll keep going with the help links but clearly flag up the focus of the challenge. 

If you haven't tried the January challenges and wonder whether Preppin' might be too hard (it won't be), please go back and try them first. You will find the challenges and solution posts here.

### The Challenge
This week's challenge is focused on Dates and the calculation functions available to you. Here's a recent blog post that I wrote that might help you if you want a little extra support.

This week we would like you to prepare your data for building a Gantt chart and supporting information on a dashboard (you don't have to build the dashboard, but bonus points if you do!). Prep Air (our fake airline) has had a number of projects that have been over-running, and the leadership team want to know why:

![img](https://1.bp.blogspot.com/-btQGEM1M1Yk/YJEukngVUWI/AAAAAAAACKI/Eck9OHIkrsg8xRCYCglxgBluMOZGmTiogCLcBGAsYHQ/w640-h364/Prep%2BAir%2BProjects-2.png)

The data starts in a state I've seen in a few systems. Preparing the data to make analysis easier is the aim this week.

### Input
The input file is an Excel file with a single tab:

![img](https://1.bp.blogspot.com/-cAx51-KAeXg/YJEuvWaQTBI/AAAAAAAACKM/udGU0Z1w-R8tOl8Cyr9vFJj0ii21EJdwwCLcBGAsYHQ/w640-h190/Screenshot%2B2021-05-04%2Bat%2B12.23.27.png)

### Requirement
Here's what we're asking of you:
- Input the data
- Work out the 'Completed Date' by adding the number of days each task took to complete to the Scheduled Date
- Rename 'Completed In Days from Scheduled Date' to 'Days Difference to Schedule'
- Your workflow will likely branch into two at this point:

1. Pivot Task to become column headers with the Completed Date as the values in the column
    - You will need to remove some data fields to ensure the result of the pivot is a single row for each project, sub-project and owner combination. 
    - Calculate the Scope to Build Time (the difference between the Scope and Build dates)
    - Calculate the Build to Delivery Time (the difference between the Build and Deliver dates)
    - Pivot the Build, Deliver and Scope columns to re-create the 'Completed Date' field and Task field
        - You will need to rename these


2. You don't need to do anything else to this second flow

Now you will need to:
- Join Branch 1 and Branch 2 back together 
    - Hint: there are 3 join clauses for this join
- Calculate which weekday each task got completed on as we want to know whether these are during the weekend or not for the dashboard
- Clean up the data set to remove any fields that are not required.
- Output as a csv file
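
The branch, pivot and join steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the official solution: the six rows are copied from the input preview in this notebook, the names `keys`, `wide`, `branch1`, `branch2` and `joined` are made up for illustration, and the choice of Project, Sub-project and Task as the three join clauses is an assumption based on the hint.

```python
import pandas as pd

# Six rows copied from the input preview shown in this notebook.
df = pd.DataFrame({
    "Project": ["New Loyalty Scheme"] * 6,
    "Sub-project": ["Marketing"] * 3 + ["Operations"] * 3,
    "Task": ["Scope", "Build", "Deliver"] * 2,
    "Owner": ["Tom"] * 3 + ["Jenny"] * 3,
    "Scheduled Date": pd.to_datetime(
        ["2021-04-19", "2021-04-21", "2021-04-30",
         "2021-04-15", "2021-04-23", "2021-04-28"]),
    "Days Difference to Schedule": [0, 2, 5, 0, 3, 4],
})
df["Completed Date"] = df["Scheduled Date"] + pd.to_timedelta(
    df["Days Difference to Schedule"], unit="D")

keys = ["Project", "Sub-project", "Owner"]

# Branch 1: one row per project / sub-project / owner, one column per task.
wide = df.pivot(index=keys, columns="Task", values="Completed Date").reset_index()
wide["Scope to Build Time"] = (wide["Build"] - wide["Scope"]).dt.days
wide["Build to Delivery Time"] = (wide["Deliver"] - wide["Build"]).dt.days

# Pivot the task columns back to rows, restoring Task and Completed Date.
branch1 = wide.melt(
    id_vars=keys + ["Scope to Build Time", "Build to Delivery Time"],
    value_vars=["Scope", "Build", "Deliver"],
    var_name="Task",
    value_name="Completed Date",
)

# Branch 2: the untouched rows, trimmed to the fields Branch 1 lacks.
branch2 = df[["Project", "Sub-project", "Task",
              "Scheduled Date", "Days Difference to Schedule"]]

# Assumed join clauses: Project, Sub-project and Task.
joined = branch1.merge(branch2, on=["Project", "Sub-project", "Task"], how="inner")
print(joined)
```

Dropping Scheduled Date and Days Difference to Schedule before the pivot is what keeps the wide table to a single row per project, sub-project and owner.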

### Output
![img](https://1.bp.blogspot.com/-h0Fqp3adZTw/YJEyCFaFNvI/AAAAAAAACKY/YUuZDiIdGfwg08GzKGPadrjzYxub7MUfQCLcBGAsYHQ/w640-h142/Screenshot%2B2021-05-04%2Bat%2B12.37.31.png)

One file:

10 data fields:

- Project 
- Sub-Project
- Owner
- Scheduled Date
- Completed Date
- Completed Weekday
- Task
- Scope to Build Time
- Build to Delivery Time
- Days Difference to Schedule


18 rows of data (19 including headers) 
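
The Completed Weekday field listed above can be derived from the completed dates with the pandas datetime accessor. A small sketch, assuming the weekday is wanted as a full English day name; the three dates are completed dates that appear in this notebook, and `out` and `csv_text` are illustrative names only.

```python
import pandas as pd

# Three completed dates taken from the notebook output.
out = pd.DataFrame({
    "Completed Date": pd.to_datetime(["2021-04-19", "2021-05-02", "2021-05-05"]),
})

# Full weekday name for each completed date.
out["Completed Weekday"] = out["Completed Date"].dt.day_name()

# With no path given, to_csv returns the CSV text; pass a file path to write a file.
csv_text = out.to_csv(index=False)
print(csv_text)
```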

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [4]:
df = pd.read_excel("./data/PD 2021 Wk 18 Input.xlsx")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 6 columns):
 #   Column                                 Non-Null Count  Dtype         
---  ------                                 --------------  -----         
 0   Project                                18 non-null     object        
 1   Sub-project                            18 non-null     object        
 2   Task                                   18 non-null     object        
 3   Owner                                  18 non-null     object        
 4   Scheduled Date                         18 non-null     datetime64[ns]
 5   Completed In Days from Scheduled Date  18 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(4)
memory usage: 992.0+ bytes


In [5]:
df

Unnamed: 0,Project,Sub-project,Task,Owner,Scheduled Date,Completed In Days from Scheduled Date
0,New Loyalty Scheme,Marketing,Scope,Tom,2021-04-19,0.0
1,New Loyalty Scheme,Marketing,Build,Tom,2021-04-21,2.0
2,New Loyalty Scheme,Marketing,Deliver,Tom,2021-04-30,5.0
3,New Loyalty Scheme,Operations,Scope,Jenny,2021-04-15,0.0
4,New Loyalty Scheme,Operations,Build,Jenny,2021-04-23,3.0
5,New Loyalty Scheme,Operations,Deliver,Jenny,2021-04-28,4.0
6,Spring Sale,Marketing,Scope,Carl,2021-04-22,0.0
7,Spring Sale,Marketing,Build,Carl,2021-04-29,6.0
8,Spring Sale,Marketing,Deliver,Carl,2021-05-04,3.0
9,Spring Sale,Operations,Scope,Jonathan,2021-04-25,0.0


### Work out the 'Completed Date' by adding on how many days it took to complete each task from the Scheduled Date

In [14]:
df["Completed In Days from Scheduled Date"] = pd.to_timedelta(df["Completed In Days from Scheduled Date"], unit="D")
df["Completed Date"] = df["Scheduled Date"] + df["Completed In Days from Scheduled Date"]
df
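
The cell above overwrites the day count with a timedelta, so the column is displayed as `0 days`, `2 days` and so on from here onwards. If the final output should keep plain numbers in that field, one alternative is to convert only inside the addition. A minimal sketch on two rows from the input (`sample` is an illustrative name):

```python
import pandas as pd

# Two rows from the input preview (Tom's Build and Deliver tasks).
sample = pd.DataFrame({
    "Scheduled Date": pd.to_datetime(["2021-04-21", "2021-04-30"]),
    "Completed In Days from Scheduled Date": [2.0, 5.0],
})

# Convert to a timedelta only for the addition; the source column stays numeric.
sample["Completed Date"] = sample["Scheduled Date"] + pd.to_timedelta(
    sample["Completed In Days from Scheduled Date"], unit="D")
print(sample)
```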

### Rename 'Completed In Days from Scheduled Date' to 'Days Difference to Schedule'

In [16]:
df = df.rename(columns={"Completed In Days from Scheduled Date": "Days Difference to Schedule"})
df

Unnamed: 0,Project,Sub-project,Task,Owner,Scheduled Date,Days Difference to Schedule,Completed Date
0,New Loyalty Scheme,Marketing,Scope,Tom,2021-04-19,0 days,2021-04-19
1,New Loyalty Scheme,Marketing,Build,Tom,2021-04-21,2 days,2021-04-23
2,New Loyalty Scheme,Marketing,Deliver,Tom,2021-04-30,5 days,2021-05-05
3,New Loyalty Scheme,Operations,Scope,Jenny,2021-04-15,0 days,2021-04-15
4,New Loyalty Scheme,Operations,Build,Jenny,2021-04-23,3 days,2021-04-26
5,New Loyalty Scheme,Operations,Deliver,Jenny,2021-04-28,4 days,2021-05-02
6,Spring Sale,Marketing,Scope,Carl,2021-04-22,0 days,2021-04-22
7,Spring Sale,Marketing,Build,Carl,2021-04-29,6 days,2021-05-05
8,Spring Sale,Marketing,Deliver,Carl,2021-05-04,3 days,2021-05-07
9,Spring Sale,Operations,Scope,Jonathan,2021-04-25,0 days,2021-04-25


In [47]:
df.melt(id_vars="Task", value_vars=["Scheduled Date"])

Unnamed: 0,Task,variable,value
0,Scope,Scheduled Date,2021-04-19
1,Build,Scheduled Date,2021-04-21
2,Deliver,Scheduled Date,2021-04-30
3,Scope,Scheduled Date,2021-04-15
4,Build,Scheduled Date,2021-04-23
5,Deliver,Scheduled Date,2021-04-28
6,Scope,Scheduled Date,2021-04-22
7,Build,Scheduled Date,2021-04-29
8,Deliver,Scheduled Date,2021-05-04
9,Scope,Scheduled Date,2021-04-25


In [51]:
tmp = df.melt(id_vars="Task", value_vars=["Completed Date"])
tmp

Unnamed: 0,Task,variable,value
0,Scope,Completed Date,2021-04-19
1,Build,Completed Date,2021-04-23
2,Deliver,Completed Date,2021-05-05
3,Scope,Completed Date,2021-04-15
4,Build,Completed Date,2021-04-26
5,Deliver,Completed Date,2021-05-02
6,Scope,Completed Date,2021-04-22
7,Build,Completed Date,2021-05-05
8,Deliver,Completed Date,2021-05-07
9,Scope,Completed Date,2021-04-25


In [64]:
tmp["ending time"] = tmp["value"].shift(-1, fill_value=pd.Timestamp("2021-05-17"))
tmp

Unnamed: 0,Task,variable,value,ending time
0,Scope,Completed Date,2021-04-19,2021-04-23
1,Build,Completed Date,2021-04-23,2021-05-05
2,Deliver,Completed Date,2021-05-05,2021-04-15
3,Scope,Completed Date,2021-04-15,2021-04-26
4,Build,Completed Date,2021-04-26,2021-05-02
5,Deliver,Completed Date,2021-05-02,2021-04-22
6,Scope,Completed Date,2021-04-22,2021-05-05
7,Build,Completed Date,2021-05-05,2021-05-07
8,Deliver,Completed Date,2021-05-07,2021-04-25
9,Scope,Completed Date,2021-04-25,2021-04-30
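
Note that a plain `shift(-1)` runs across project boundaries: in the output above, the Deliver row at index 2 picks up 2021-04-15, which is the Scope date of the next sub-project. If the intent is the next task within the same project and sub-project, a grouped shift is one option. This is a sketch under that assumption; the rows are copied from the notebook output, and the last task in each group is left as `NaT` rather than filled with a chosen end date.

```python
import pandas as pd

# Completed dates for the first two sub-projects, copied from the output above.
sample = pd.DataFrame({
    "Project": ["New Loyalty Scheme"] * 6,
    "Sub-project": ["Marketing"] * 3 + ["Operations"] * 3,
    "Task": ["Scope", "Build", "Deliver"] * 2,
    "Completed Date": pd.to_datetime(
        ["2021-04-19", "2021-04-23", "2021-05-05",
         "2021-04-15", "2021-04-26", "2021-05-02"]),
})

# Shift within each project / sub-project so dates never leak between groups.
sample["ending time"] = (
    sample.groupby(["Project", "Sub-project"])["Completed Date"].shift(-1)
)
print(sample)
```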
