# Race Dashboard



This document describes the requirements and design decisions for the Race Dashboard for the Sanofi Asset Efficiency challenge. It explains what is to be included on the dashboard, the required data sources, and the assumptions made in the design. It also shows the steps for getting the data into the necessary format.

## Scope/objective

The objective is to present the documented metrics and categories on a dashboard that Sanofi staff can access.


## Metrics / categories
The metrics have been mapped onto sectors that mimic the different sectors of a race track. The metrics are:

- Race = 8 Laps = 8 Months
- Lap = Monthly Progress

Sectors 1-4 (bold) are calculated from the QS data; sectors 5-9 (italic) are scored through the nomination process.

- **Sector 1 = OEE Improvement**
- **Sector 2 = OEE Variability Improvement**
- **Sector 3 = Stoppage Reduction**
- **Sector 4 = Changeover Improvement**
- _Sector 5 = Most effective OEE application_
- _Sector 6 = Best Innovation_
- _Sector 7 = Most consistent OEE improvement progress_
- _Sector 8 = Collaboration_
- _Sector 9 = Team Spirit_

## Change log

| Date | Initials | Comments |
|------|:---------|:---------|
| 2021-06-23 | MC | In leaderboard, replace NaN values in the lap-time calc with the max lap time for that lap |
| 2021-06-23 | MC | Use race review dates for grouping data, rather than calendar months |
| 2021-06-24 | JB | Missing OEE_Diff figures should default to 'OEE %' - 'OEE start point', not just the 'OEE %' value |
| 2021-06-24 | JB | For sector 1, multiply the sum of OEE_Diff by -1. This should always have been done. |
| 2021-06-25 | MC | In leaderboard, change the prev_race_time calc to include all but the last 2 cols, to handle new race cols as they arrive |

```python
import pandas as pd
import numpy as np
import datetime


# Viz libs
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

# display options
# pd.options.display.float_format = "{:.2f}".format
```


## Read files and cleanse

- OEE.xlsx should contain the latest weekly OEE data from QS
- QSDashboard.xlsx lists the plants/sites taking part, with their original target OEE
- Unplanned_tech_loss.xlsx should contain the latest data from QS (my unplanned chart by line/week)
- changeover.xlsx contains the weekly changeover losses (% of OEE)

#### Cleaning required:
- OEE %, Unplanned tech loss and Changeover need converting to numeric, coercing the nulls to NaN values
- W53-2020 data looks unreliable: it causes duplicate indexes for date 2021-01-10, and we don't need 2020 data anyway, so only 2021 weeks are kept
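A minimal sketch of these two cleaning steps on a few made-up rows. The values and the `-` placeholder are hypothetical stand-ins for whatever non-numeric entries the QS export contains; `na=False` and `.copy()` are defensive additions, not something the source data is known to need.

```python
import pandas as pd

# Toy stand-in for the QS weekly export (hypothetical rows and values)
raw = pd.DataFrame({
    "Week": ["W53-2020", "W01-2021", "W01-2021", "W02-2021"],
    "Line": ["C2 Packaging Line", "C2 Packaging Line", "C9 Packaging Line", "C2 Packaging Line"],
    "OEE %": ["0.21", "0.16897", "-", "0.2"],
})

# Keep 2021 weeks only (drops W53-2020); na=False treats blank Week cells as non-matches
weekly = raw.loc[raw["Week"].str.contains("2021", na=False)].copy()

# Non-numeric entries such as "-" become NaN instead of raising an error
weekly["OEE %"] = pd.to_numeric(weekly["OEE %"], errors="coerce")
print(weekly)
```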

```python
dir = "C:/Users/mark_/McLaren Technology Group/McLaren Accelerator - Sanofi - Sanofi/Data Analysis/"
# dir = "C:/Users/mark_/Sanofi/Sanofi x McLaren sharing - General/Race Dashboard data/"
output_dir = "C:/Users/mark_/Documents/McLaren2021/Sanofi/"

# dir = 'C:/Users/james.blood/Documents/McLarenSanofi/McLarenSanofi/data/'

df_weekly = pd.read_excel(dir + 'OEE.xlsx')
df_dash = pd.read_excel(dir + 'QSDashboard.xlsx')
df_techloss = pd.read_excel(dir + 'Unplanned_tech_loss.xlsx')
df_changeover = pd.read_excel(dir + 'changeover.xlsx')

# keep 2021 weeks only (drops W53-2020); .copy() avoids SettingWithCopyWarning below
df_weekly = df_weekly.loc[df_weekly['Week'].str.contains('2021', na=False)].copy()
df_techloss = df_techloss.loc[df_techloss['Week'].str.contains('2021', na=False)].copy()

# coerce non-numeric entries to NaN
df_weekly['OEE %'] = pd.to_numeric(df_weekly['OEE %'], errors='coerce')
df_techloss = df_techloss.rename(columns={'Unplanned losses - %OEE': 'Unplanned_tech_loss'})
df_techloss['Unplanned_tech_loss'] = pd.to_numeric(df_techloss['Unplanned_tech_loss'], errors='coerce')
df_changeover = df_changeover.rename(columns={'Change over losses - %OEE': 'Changeover'})
df_changeover['Changeover'] = pd.to_numeric(df_changeover['Changeover'], errors='coerce')
# don't use their progress figure as it's a static val
# df_dash.rename(columns={'⇗ OEE% progress':'OEE% progress'}, inplace=True)
```

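Create a datetime from the week number: the 'Wnn-YYYY' label is split into week and year, and each week is dated to its Sunday.

```python
# 'W01-2021' -> week 1, year 2021
df_weekly['WeekOfYear'] = pd.to_numeric(df_weekly['Week'].str[1:3])
df_weekly['Year'] = pd.to_numeric(df_weekly['Week'].str[4:])
# e.g. '2021010' = year 2021, week 01 (%W, Monday-start weeks), weekday 0 (%w, Sunday)
dates = df_weekly.Year*100+df_weekly.WeekOfYear
df_weekly['Date'] = pd.to_datetime(dates.astype(str) + '0', format='%Y%W%w')
# df_weekly.drop(columns=['Year','WeekOfYear'], inplace=True)
df_weekly.head()
```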

|  | Week | Line | OEE % | WeekOfYear | Year | Date |
|---|---|---|---|---|---|---|
| 11 | W01-2021 | C2 Packaging Line | 0.16897 | 1 | 2021 | 2021-01-10 |
| 12 | W01-2021 | C9 Packaging Line | | 1 | 2021 | 2021-01-10 |
| 13 | W01-2021 | GAMMA1 | 0.406686 | 1 | 2021 | 2021-01-10 |
| 14 | W01-2021 | IMA C80/2 | 0.510044 | 1 | 2021 | 2021-01-10 |
| 15 | W01-2021 | L18 Packaging Line | 0.173736 | 1 | 2021 | 2021-01-10 |
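To check the week-to-date conversion in isolation, the same format string works with Python's standard `datetime.strptime`. This sketch (the helper name `week_label_to_date` is hypothetical) dates each week to its Sunday. Note that `%W` counts Monday-start weeks rather than ISO weeks: the two agree for 2021, but in years where 1 January falls on a Tuesday, Wednesday or Thursday they differ by one week, so `%G%V%u` (ISO year, week and weekday) may be safer if the QS labels turn out to be ISO weeks.

```python
from datetime import datetime


def week_label_to_date(label: str) -> datetime:
    """Convert a week label such as 'W01-2021' to the Sunday of that week.

    Mirrors the notebook's '%Y%W%w' format: %W counts Monday-start weeks,
    and weekday 0 (%w) is Sunday, the last day of a Monday-start week.
    """
    week = int(label[1:3])
    year = int(label[4:])
    return datetime.strptime(f"{year}{week:02d}0", "%Y%W%w")


print(week_label_to_date("W01-2021").date())  # 2021-01-10
print(week_label_to_date("W15-2021").date())  # 2021-04-18
```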


```python
# merge the 2 dataframes to get the start OEE
df_weekly = df_weekly.merge(df_dash[['Plant','Line', 'OEE  Start point','OEE% Target (2022)']], on='Line')
```

```python
# add the weekly unplanned tech loss for each line
df_weekly = df_weekly.merge(df_techloss[['Line', 'Week', 'Unplanned_tech_loss']], on=['Line','Week'])
```

```python
# add the weekly changeover loss for each line
df_weekly = df_weekly.merge(df_changeover[['Line','Week','Changeover']], on=['Line','Week'])
```

#### Start Changeover

A start changeover value isn't provided, so we calculate our own start point for each line, using its average changeover in 2021 up to 30 April 2021. This needs to be done before we drop the early 2021 rows.

The result is then merged into the df_weekly dataframe with a left join, so lines with no early data are kept (with a NaN start point, which is filled later).

```python
start_changeover_calc = (df_weekly.loc[df_weekly['Date'] < '2021-04-30', ['Plant','Line','Changeover']]
                         .groupby(['Plant','Line']).mean().reset_index())
start_changeover_calc.rename(columns={'Changeover':'start_changeover'}, inplace=True)
df_weekly = df_weekly.merge(start_changeover_calc[['Line','start_changeover']], on='Line', how='left')
```
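A sketch of the start-changeover step on toy data, assuming the intent is to keep every line. Line `B` (hypothetical) only starts reporting after the cut-off, so a left join keeps it with a NaN start point, whereas pandas' default inner merge would drop its rows entirely.

```python
import pandas as pd

# Hypothetical weekly rows: line "B" has no data before 30 April
weekly = pd.DataFrame({
    "Plant": ["P1", "P1", "P1", "P2"],
    "Line": ["A", "A", "A", "B"],
    "Date": pd.to_datetime(["2021-04-11", "2021-04-18", "2021-05-02", "2021-05-09"]),
    "Changeover": [0.04, 0.06, 0.03, 0.05],
})

# Mean changeover per line before the cut-off becomes that line's start point
start = (weekly.loc[weekly["Date"] < "2021-04-30"]
         .groupby(["Plant", "Line"], as_index=False)["Changeover"].mean()
         .rename(columns={"Changeover": "start_changeover"}))

# Left join: lines without pre-cut-off data are kept, with start_changeover = NaN
weekly = weekly.merge(start[["Line", "start_changeover"]], on="Line", how="left")
print(weekly)
```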

Should we turn decimals into percentages before doing any calcs? Not doing this at the moment, as keeping the values on similar scales is useful when calculating the sector times later.

```python
# df_weekly[['OEE %','OEE  Start point','OEE% Target (2022)','Unplanned_tech_loss','Changeover']] = df_weekly[['OEE %','OEE  Start point','OEE% Target (2022)','Unplanned_tech_loss','Changeover']] * 100
```

#### Dates for the Asset Challenge

The Start Date is fixed at 2021-04-01. Remove all rows from df_weekly up to this date.

The End Date will move, acting as a cutoff before each Race meeting.

```python
start_date = '2021-04-01'
df_weekly = df_weekly[df_weekly['Date'] > start_date].sort_values('Date')

# do we need this?  We now have race review dates
end_date = '2021-07-15'
df_weekly = df_weekly[df_weekly['Date'] < end_date].sort_values('Date')
```

### PCT_CHANGE
Use the pandas `pct_change` method with `periods=4` on each Line, which compares each week with the value 4 rows (4 weeks) earlier.
- I believe we are doing this rolling calculation within Tableau at the moment, so it isn't used here.
- Not sure whether this is required for all of the categories?
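A small illustration of `pct_change(periods=4)` on made-up values: each week is compared with the value 4 rows earlier for the same line, so the first 4 rows of every line are NaN. The toy series has no gaps; on real data, pandas' default forward-fills missing values before computing the change, and recent pandas versions warn that this default is deprecated.

```python
import pandas as pd

# Hypothetical OEE values for two lines, already sorted by Line and Date
df = pd.DataFrame({
    "Line": ["A"] * 6 + ["B"] * 2,
    "OEE %": [0.40, 0.42, 0.44, 0.46, 0.50, 0.55, 0.30, 0.33],
})

# Change relative to the value 4 rows (weeks) earlier within the same line
df["OEE_pct_chg"] = df.groupby("Line")["OEE %"].pct_change(periods=4)
print(df)
```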

```python
df_weekly.sort_values(['Line','Date'], inplace = True)
# % change vs. the value 4 rows (weeks) earlier, within each line
df_weekly['OEE_pct_chg'] = df_weekly.groupby('Line')['OEE %'].pct_change(periods=4)
df_weekly['techloss_pct_chg'] = df_weekly.groupby('Line')['Unplanned_tech_loss'].pct_change(periods=4)
df_weekly['Changeover_pct_chg'] = df_weekly.groupby('Line')['Changeover'].pct_change(periods=4)
df_weekly.head()
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | Changeover | start_changeover | OEE_pct_chg | techloss_pct_chg | Changeover_pct_chg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 286 | W13-2021 | AL5 Packaging 1 | | 13 | 2021 | 2021-04-04 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | | | |
| 287 | W14-2021 | AL5 Packaging 1 | | 14 | 2021 | 2021-04-11 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | | | |
| 288 | W15-2021 | AL5 Packaging 1 | 0.449745 | 15 | 2021 | 2021-04-18 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | |
| 289 | W16-2021 | AL5 Packaging 1 | 0.642652 | 16 | 2021 | 2021-04-25 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | |
| 290 | W17-2021 | AL5 Packaging 1 | 0.505804 | 17 | 2021 | 2021-05-02 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | |


## Standard Deviation
Calculate the standard deviation of OEE % for each line on a 4-week rolling basis.

Standard deviation is the square root of the variance, so there's no need to calculate both; variance has been left out.
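A toy check of the rolling standard deviation, reusing the first AL5 Packaging 1 OEE values from this notebook's output. With `min_periods=1` a window is computed as soon as one value exists, but pandas' rolling `std` uses `ddof=1` (sample standard deviation), so a window holding a single valid value still gives NaN. That is why W15 has no rolling_std.

```python
import numpy as np
import pandas as pd

# AL5 Packaging 1, W13-W17 2021; NaN marks weeks with no reported OEE
df = pd.DataFrame({
    "Line": ["AL5"] * 5,
    "OEE %": [np.nan, np.nan, 0.449745, 0.642652, 0.505804],
})

# 4-row window per line; transform keeps the result aligned with df's index
df["rolling_std"] = (df.groupby("Line")["OEE %"]
                       .transform(lambda s: s.rolling(4, min_periods=1).std()))
print(df)
```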

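```python
# 4-week window per line (min_periods=1); rolling std uses ddof=1, so a single value gives NaN
df_weekly['rolling_std'] = (df_weekly.groupby('Line')['OEE %']
                              .transform(lambda x: x.rolling(4, min_periods=1).std()))
df_weekly.head(50)
```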

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | Changeover | start_changeover | OEE_pct_chg | techloss_pct_chg | Changeover_pct_chg | rolling_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 286 | W13-2021 | AL5 Packaging 1 | | 13 | 2021 | 2021-04-04 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | | | | |
| 287 | W14-2021 | AL5 Packaging 1 | | 14 | 2021 | 2021-04-11 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | | | | |
| 288 | W15-2021 | AL5 Packaging 1 | 0.449745 | 15 | 2021 | 2021-04-18 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | | |
| 289 | W16-2021 | AL5 Packaging 1 | 0.642652 | 16 | 2021 | 2021-04-25 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | | 0.136405 |
| 290 | W17-2021 | AL5 Packaging 1 | 0.505804 | 17 | 2021 | 2021-05-02 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | | 0.099233 |
| 291 | W18-2021 | AL5 Packaging 1 | 0.443611 | 18 | 2021 | 2021-05-09 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | | | | 0.092469 |
| 292 | W19-2021 | AL5 Packaging 1 | 0.507565 | 19 | 2021 | 2021-05-16 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | 0.12856 | | | 0.083941 |
| 293 | W20-2021 | AL5 Packaging 1 | | 20 | 2021 | 2021-05-23 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | -0.210202 | | | 0.036426 |
| 294 | W21-2021 | AL5 Packaging 1 | | 21 | 2021 | 2021-05-30 | Frankfurt | 0.479693 | 0.5 | | | 0.0 | 0.003481 | | | 0.045222 |
| 295 | W22-2021 | AL5 Packaging 1 | 0.0 | 22 | 2021 | 2021-06-06 | Frankfurt | 0.479693 | 0.5 | 0.0 | 0.0 | 0.0 | -1.0 | | | 0.358903 |


### Calculating Sector times

The lap time is the sum of the calculated sector scores plus the pole-position time from the F1 race data (e.g. 88 s for Paul Ricard).

Sector [1-4] calculations

**Sector 1**
How much has your OEE increased or decreased? Sum the week-on-week differences and multiply the total by -1. An OEE increase therefore gives a negative figure, which reduces your lap time, so a larger OEE increase is rewarded with a bigger reduction in lap time.

```python
df_weekly['sector_1'] = df_weekly['OEE_Diff'].mul(-1)
```

OEE_Diff calculation
- Sort values by Line and Date
- Find the difference between consecutive weekly OEE figures for each Line
- Fill the NaN left on the first row of each Line with that week's OEE minus the OEE Start point for that line
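A worked example of the OEE_Diff and sector 1 logic on a handful of rows copied from this notebook's AL5 Packaging 1 and AL6 output (only the columns needed are kept). Date order within each Line matters because `diff` compares each row with the previous row of the same group.

```python
import pandas as pd

# Rows from the notebook output; 'OEE  Start point' keeps its double-space column name
df = pd.DataFrame({
    "Line": ["AL6", "AL5", "AL5", "AL5"],
    "Date": pd.to_datetime(["2021-04-04", "2021-04-04", "2021-04-11", "2021-04-18"]),
    "OEE %": [0.367897, 0.046124, 0.259071, 0.449745],
    "OEE  Start point": [0.332657, 0.479693, 0.479693, 0.479693],
})

df = df.sort_values(["Line", "Date"])
# Week-on-week change within each line; the first week of a line has no previous row...
df["OEE_Diff"] = df.groupby("Line")["OEE %"].diff()
# ...so it is measured against the line's OEE start point instead
df["OEE_Diff"] = df["OEE_Diff"].fillna(df["OEE %"] - df["OEE  Start point"])
# An OEE increase becomes a negative sector time (a faster lap)
df["sector_1"] = -df["OEE_Diff"]
print(df)
```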


**Sector 2**
How large was your rolling standard deviation of OEE over the previous 4 weeks?

```python
df_weekly['sector_2'] = df_weekly['rolling_std']
```

rolling_std = rolling standard deviation of OEE % over the past 4 weeks for each line


 
**Sector 3**
We want to reduce Unplanned tech loss (recorded as % of OEE). Unplanned tech loss is calculated within QlikSense, but values are sometimes missing. Fill the missing values, then use the weekly Unplanned tech loss as the sector score:

```python
df_weekly['sector_3'] = df_weekly['Unplanned_tech_loss']
```

Populate missing unplanned tech loss:
- Create weekly min/max columns for Unplanned tech loss across all sites
- Merge those columns into df_weekly
- Fill any NaN unplanned tech loss rows with the max unplanned tech loss calculated for that week (bigger is worse)

 
**Sector 4**
We're trying to reduce changeover time (recorded as % of OEE).
A start changeover value isn't provided, so we calculate our own start point for each Line, using its average changeover in 2021 up to 30 April 2021.

```python
start_changeover_calc = (df_weekly.loc[df_weekly['Date'] < '2021-04-30', ['Plant','Line','Changeover']]
                         .groupby(['Plant','Line']).mean().reset_index())
start_changeover_calc.rename(columns={'Changeover':'start_changeover'}, inplace=True)
df_weekly = df_weekly.merge(start_changeover_calc[['Line','start_changeover']], on='Line', how='left')
```

The sector score itself is the 4-week rolling mean of Changeover for each Line:

```python
Changeover_mean = df_weekly.sort_values(by=['Line', 'Date'])[['Line', 'Date', 'Changeover', 'start_changeover']]
Changeover_mean['Changeover_rolling_mean'] = (Changeover_mean.groupby('Line')['Changeover']
                                              .transform(lambda x: x.rolling(4, min_periods=1).mean()))
df_weekly = df_weekly.merge(Changeover_mean[['Line', 'Date', 'Changeover_rolling_mean']], on=['Line', 'Date'])

df_weekly['sector_4'] = df_weekly['Changeover_rolling_mean']
```



**Clean the sectors of NaN before summing them**
When there wasn't enough information for some calcs (e.g. pct_change), no value was coming through for the lap_time. Every sector must have a value, otherwise a line gains an unfair advantage by not having data available. So find all NaN values and replace them with the mean for that column (sector).
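A toy example of the mean-filling step, on made-up values: `DataFrame.mean()` skips NaN, and passing the resulting Series to `fillna` fills each sector column with its own mean across all rows (in the notebook, across all lines and weeks).

```python
import numpy as np
import pandas as pd

sectors = ["sector_1", "sector_2", "sector_3", "sector_4"]
# Hypothetical weekly sector values with gaps
df = pd.DataFrame({
    "sector_1": [-0.2, 0.1, np.nan],
    "sector_2": [np.nan, 0.05, 0.15],
    "sector_3": [0.0, 0.0, 0.3],
    "sector_4": [0.02, np.nan, np.nan],
})

# Column-wise means (NaN skipped), then fill each column's gaps with its own mean
df[sectors] = df[sectors].fillna(df[sectors].mean())
print(df)
```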

**Sectors 5 - 9**
These scores are taken from the nomination process. Read in the nominations spreadsheet, merge any values we find into df_weekly, replace all NaN (missing) values with 0, and scale the scores we find to 10% of their original value. This value is then subtracted from the lap_time, so the better you do in the nominations, the more your lap_time is reduced.
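A sketch of how the nomination scores turn into sectors 5-9, using made-up scores. The explicit loop is a version-independent alternative to assigning one DataFrame to a new list of columns. Each score is multiplied by -0.1, so a score of 5 takes 0.5 off the lap time, and a missing nomination contributes 0.

```python
import numpy as np
import pandas as pd

nom_cols = ["Best Solution", "Best Innovation", "Improvement Iterations",
            "Lessons and Sharing", "Team Contribution and Spirit"]
sector_cols = ["sector_5", "sector_6", "sector_7", "sector_8", "sector_9"]

# Hypothetical nomination scores for two rows; NaN means no nomination was found
df = pd.DataFrame([[5, 0, 3, np.nan, 2],
                   [np.nan] * 5], columns=nom_cols)

# 10% of each score, negated so it reduces the lap time; missing scores -> 0
for nom, sec in zip(nom_cols, sector_cols):
    df[sec] = (df[nom] * -0.1).fillna(0)

print(df[sector_cols].sum(axis=1))
```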

 
**lap_time**

```python
df_weekly['lap_time'] = df_weekly[['sector_1','sector_2','sector_3','sector_4',
                                   'sector_5','sector_6','sector_7','sector_8','sector_9']].sum(axis=1)
```
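Putting the sector definitions together, the lap time for line $\ell$ at race review $r$ can be summarised as follows. This only restates what the notebook cells compute; the notation is introduced here for illustration: $W(r)$ is the set of weeks whose Review_Date is $r$, $s_1, \dots, s_4$ are the weekly sector values after NaN filling, and $n_5, \dots, n_9$ are the nomination scores (0 when missing).

$$
t_{\text{lap}}(\ell, r) = 88 + \sum_{w \in W(r)} \left( \sum_{k=1}^{4} s_k(\ell, w) - 0.1 \sum_{k=5}^{9} n_k(\ell, w) \right)
$$

with $s_1 = -\,\text{OEE\_Diff}$, $s_2 = \text{rolling\_std}$, $s_3 = \text{Unplanned\_tech\_loss}$ and $s_4 = \text{Changeover\_rolling\_mean}$. The race time is then the sum of $t_{\text{lap}}(\ell, r)$ over all reviews held so far.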

```python
file = (dir + 'Nominations Category Scoring.xlsx')
df_nom_sectors = pd.read_excel(file, sheet_name='Nomination scoring', usecols="A:H", parse_dates=['Date'])
```

```python
# fill blank Dates from the row above
df_nom_sectors['Date'] = df_nom_sectors['Date'].ffill()
df_nom_sectors = df_nom_sectors.fillna(0)

df_weekly = df_weekly.merge(df_nom_sectors[['Line','Plant','Date','Best Solution','Best Innovation','Improvement Iterations','Lessons and Sharing','Team Contribution and Spirit']], how='outer', on=['Date','Plant','Line'])
df_weekly
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | ... | start_changeover | OEE_pct_chg | techloss_pct_chg | Changeover_pct_chg | rolling_std | Best Solution | Best Innovation | Improvement Iterations | Lessons and Sharing | Team Contribution and Spirit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W13-2021 | AL5 Packaging 1 | | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.479693 | 0.5 | | ... | 0.0 | | | | | | | | | |
| 1 | W14-2021 | AL5 Packaging 1 | | 14.0 | 2021.0 | 2021-04-11 | Frankfurt | 0.479693 | 0.5 | | ... | 0.0 | | | | | | | | | |
| 2 | W15-2021 | AL5 Packaging 1 | 0.449745 | 15.0 | 2021.0 | 2021-04-18 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | 0.0 | | | | | | | | | |
| 3 | W16-2021 | AL5 Packaging 1 | 0.642652 | 16.0 | 2021.0 | 2021-04-25 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | 0.0 | | | | 0.136405 | | | | | |
| 4 | W17-2021 | AL5 Packaging 1 | 0.505804 | 17.0 | 2021.0 | 2021-05-02 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | 0.0 | | | | 0.099233 | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 240 | | AL5 Packaging 1 | | | | 2021-09-30 | Frankfurt | | | | ... | | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 241 | | AL6 | | | | 2021-09-30 | Frankfurt | | | | ... | | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 242 | | M18 Filling | | | | 2021-09-30 | Frankfurt | | | | ... | | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 243 | | M21 Filling | | | | 2021-09-30 | Frankfurt | | | | ... | | | | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Create review dates
Create a Review_Date column for grouping the data later, so each review only gets the data we're interested in. Each review date is the Thursday of the week before the review week, and every weekly row is assigned to the next review date on or after its Date.

```python
def thurs_of_weekbefore(year, week):
    # Thursday (ISO weekday 4) of the ISO week before `week`
    return datetime.date.fromisocalendar(year, week - 1, 4)

review_weeks = [16, 20, 24, 28, 34, 38, 42, 47]
review_dates = [thurs_of_weekbefore(2021, i) for i in review_weeks]

df_review_dates = pd.DataFrame({'Review_Date': pd.to_datetime(review_dates)})

# assign each row to the first review date on or after its Date
df_weekly = pd.merge_asof(df_weekly.sort_values('Date'), df_review_dates,
                          left_on='Date', right_on='Review_Date', direction='forward')
```
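A toy run of the review-date logic, using only the first two review weeks and three hypothetical weekly dates. `datetime.date.fromisocalendar` needs Python 3.8 or later. With `direction='forward'`, `merge_asof` assigns each row to the first review date on or after its Date, so the week ending 2021-04-18 belongs to the 2021-05-13 review rather than the 2021-04-15 one.

```python
import datetime

import pandas as pd


def thurs_of_weekbefore(year, week):
    # Thursday (ISO weekday 4) of the ISO week before `week`
    return datetime.date.fromisocalendar(year, week - 1, 4)


review_weeks = [16, 20]  # subset of the notebook's list, for illustration
# Build both key columns from ISO strings so their datetime dtypes match
reviews = pd.DataFrame({"Review_Date": pd.to_datetime(
    [thurs_of_weekbefore(2021, w).isoformat() for w in review_weeks])})

# Hypothetical week-ending Sundays
weekly = pd.DataFrame({"Date": pd.to_datetime(["2021-04-11", "2021-04-18", "2021-05-09"])})

weekly = pd.merge_asof(weekly.sort_values("Date"), reviews,
                       left_on="Date", right_on="Review_Date", direction="forward")
print(weekly)
```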


#### Populate missing OEE %
- Find the weekly min/max OEE % across all sites
- Merge those columns into df_weekly
- Fill any NaN with the min OEE calculated for that week
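A compact alternative to the min/max table plus merge, sketched on made-up data: `groupby(...).transform` returns the weekly minimum or maximum aligned to every row, so it can be passed straight to `fillna`. It groups on the exact Date rather than `pd.Grouper(freq='W')`, which is equivalent assuming every Date is a week-ending Sunday, as the week-label conversion produces. The same pattern with `max` covers the Unplanned tech loss and Changeover fills, where bigger is worse.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for three lines over two weeks
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-04"] * 3 + ["2021-04-11"] * 3),
    "Line": ["A", "B", "C"] * 2,
    "OEE %": [0.46, np.nan, 0.37, 0.53, 0.26, np.nan],
    "Unplanned_tech_loss": [0.02, np.nan, 0.05, np.nan, 0.01, 0.03],
})

# Missing OEE (higher is better) -> lowest OEE reported by any line that week
df["OEE %"] = df["OEE %"].fillna(df.groupby("Date")["OEE %"].transform("min"))
# Missing unplanned tech loss (lower is better) -> highest loss reported that week
df["Unplanned_tech_loss"] = df["Unplanned_tech_loss"].fillna(
    df.groupby("Date")["Unplanned_tech_loss"].transform("max"))
print(df)
```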

```python
# weekly min/max OEE % across all lines (Grouper bins by week ending Sunday)
df_weekly_minmax = (df_weekly.groupby(pd.Grouper(key='Date', freq='W'))['OEE %']
                    .agg([('Min', 'min'), ('Max', 'max')])
                    .add_prefix('Week')
                    .reset_index())
df_weekly = df_weekly.merge(df_weekly_minmax[['Date', 'WeekMin', 'WeekMax']], on='Date')
# a missing OEE gets the lowest OEE reported that week
df_weekly['OEE %'] = df_weekly['OEE %'].fillna(df_weekly['WeekMin'])
df_weekly
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | ... | Changeover_pct_chg | rolling_std | Best Solution | Best Innovation | Improvement Iterations | Lessons and Sharing | Team Contribution and Spirit | Review_Date | WeekMin | WeekMax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W13-2021 | AL5 Packaging 1 | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.479693 | 0.500 | | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| 1 | W13-2021 | M18 Filling | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.443522 | 0.650 | 0.0 | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| 2 | W13-2021 | C2 Packaging Line | 0.458414 | 13.0 | 2021.0 | 2021-04-04 | Maisons-Alfort | 0.397503 | 0.470 | 0.0 | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| 3 | W13-2021 | TR200 Packaging Line | 0.596432 | 13.0 | 2021.0 | 2021-04-04 | Lisieux | 0.483505 | 0.650 | 0.0 | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| 4 | W13-2021 | AL6 | 0.367897 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.332657 | 0.450 | 0.0 | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | W26-2021 | SUPPO Packaging Line | 0.000000 | 26.0 | 2021.0 | 2021-07-04 | Lisieux | 0.353021 | 0.530 | | ... | | 0.099474 | | | | | | 2021-07-08 | 0.000000 | 0.695684 |
| 181 | W26-2021 | L18 Packaging Line | 0.397574 | 26.0 | 2021.0 | 2021-07-04 | Tours | 0.377683 | 0.547 | 0.0 | ... | | 0.083948 | | | | | | 2021-07-08 | 0.000000 | 0.695684 |
| 182 | W26-2021 | C9 Packaging Line | 0.208822 | 26.0 | 2021.0 | 2021-07-04 | Maisons-Alfort | 0.528518 | 0.530 | 0.0 | ... | | 0.173524 | | | | | | 2021-07-08 | 0.000000 | 0.695684 |
| 183 | W26-2021 | IMA C80/2 | 0.466914 | 26.0 | 2021.0 | 2021-07-04 | SCOPPITO | 0.451031 | 0.580 | 0.0 | ... | | 0.047104 | | | | | | 2021-07-08 | 0.000000 | 0.695684 |


#### Week-on-week diffs for OEE % and Changeover
We need something to calculate the OEE progress and Changeover progress, otherwise we'll have problems when we group and sum values later.
- Sort df_weekly by Line and Date
- Find the diff between consecutive OEE % values within each Line (OEE_Diff)
- Fill the NaN on the first row of each Line with OEE % minus the OEE Start point

Repeat the same logic for Changeover. There will be more NaN values, as start_changeover wasn't available for all lines; we populate these later.

```python
# this was calculating the wrong Diff - the first row of each site was looking at the previous site for all but the 1st calc
# needed to sort by Line and Date first 

# OEE_Diff = df_weekly.groupby(['Line',pd.Grouper(key='Date',freq='W')])['OEE %'].mean().reset_index()
# OEE_Diff["OEE_Diff"] = OEE_Diff["OEE %"].diff()
# df_weekly = df_weekly.merge(OEE_Diff[["Line","Date","OEE_Diff"]], on=(["Line","Date"]))

# df_weekly['OEE_Diff'].fillna(df_weekly['OEE %'] - df_weekly['OEE  Start point'], inplace=True)
# df_weekly[["Line","Date","OEE %","OEE_Diff"]].head(50).sort_values(by=['Line', 'Date'])
```

```python
df_weekly[df_weekly.Line.str.contains('AL5')]
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | ... | Changeover_pct_chg | rolling_std | Best Solution | Best Innovation | Improvement Iterations | Lessons and Sharing | Team Contribution and Spirit | Review_Date | WeekMin | WeekMax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W13-2021 | AL5 Packaging 1 | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.479693 | 0.5 | | ... | | | | | | | | 2021-04-15 | 0.046124 | 0.649336 |
| 25 | W14-2021 | AL5 Packaging 1 | 0.259071 | 14.0 | 2021.0 | 2021-04-11 | Frankfurt | 0.479693 | 0.5 | | ... | | | | | | | | 2021-04-15 | 0.259071 | 0.530707 |
| 37 | W15-2021 | AL5 Packaging 1 | 0.449745 | 15.0 | 2021.0 | 2021-04-18 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | | | | | | | 2021-05-13 | 0.187845 | 0.65547 |
| 40 | W16-2021 | AL5 Packaging 1 | 0.642652 | 16.0 | 2021.0 | 2021-04-25 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | 0.136405 | | | | | | 2021-05-13 | 0.0 | 0.703703 |
| 62 | W17-2021 | AL5 Packaging 1 | 0.505804 | 17.0 | 2021.0 | 2021-05-02 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | 0.099233 | | | | | | 2021-05-13 | 0.352339 | 0.707643 |
| 76 | W18-2021 | AL5 Packaging 1 | 0.443611 | 18.0 | 2021.0 | 2021-05-09 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | 0.092469 | | | | | | 2021-05-13 | 0.356017 | 0.769172 |
| 83 | W19-2021 | AL5 Packaging 1 | 0.507565 | 19.0 | 2021.0 | 2021-05-16 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | 0.083941 | | | | | | 2021-06-10 | 0.0 | 0.733994 |
| 92 | W20-2021 | AL5 Packaging 1 | 0.0 | 20.0 | 2021.0 | 2021-05-23 | Frankfurt | 0.479693 | 0.5 | | ... | | 0.036426 | | | | | | 2021-06-10 | 0.0 | 0.698169 |
| 108 | W21-2021 | AL5 Packaging 1 | 0.191951 | 21.0 | 2021.0 | 2021-05-30 | Frankfurt | 0.479693 | 0.5 | | ... | | 0.045222 | | | | | | 2021-06-10 | 0.191951 | 0.706192 |
| 119 | W22-2021 | AL5 Packaging 1 | 0.0 | 22.0 | 2021.0 | 2021-06-06 | Frankfurt | 0.479693 | 0.5 | 0.0 | ... | | 0.358903 | | | | | | 2021-06-10 | 0.0 | 0.769493 |


```python
OEE_Diff = df_weekly.sort_values(by=['Line', 'Date'])[['Line', 'Date', 'OEE %', 'OEE  Start point']]
# week-on-week change per line; the first row of each line is measured against its start point
OEE_Diff['OEE_Diff'] = (OEE_Diff.groupby('Line')['OEE %'].diff()
                        .fillna(OEE_Diff['OEE %'] - OEE_Diff['OEE  Start point']))
df_weekly = df_weekly.merge(OEE_Diff[['Line', 'Date', 'OEE_Diff']], on=['Line', 'Date'])
df_weekly[['Line', 'Date', 'OEE %', 'OEE_Diff']].head(50).sort_values(by=['Line', 'Date'])
```

|  | Line | Date | OEE % | OEE_Diff |
|---|---|---|---|---|
| 0 | AL5 Packaging 1 | 2021-04-04 | 0.046124 | -0.433569 |
| 25 | AL5 Packaging 1 | 2021-04-11 | 0.259071 | 0.212947 |
| 37 | AL5 Packaging 1 | 2021-04-18 | 0.449745 | 0.190675 |
| 40 | AL5 Packaging 1 | 2021-04-25 | 0.642652 | 0.192906 |
| 4 | AL6 | 2021-04-04 | 0.367897 | 0.03524 |
| 21 | AL6 | 2021-04-11 | 0.360681 | -0.007216 |
| 35 | AL6 | 2021-04-18 | 0.33414 | -0.026541 |
| 48 | AL6 | 2021-04-25 | 0.309545 | -0.024595 |
| 2 | C2 Packaging Line | 2021-04-04 | 0.458414 | 0.060911 |
| 20 | C2 Packaging Line | 2021-04-11 | 0.530707 | 0.072293 |


```python
# Changeover_Diff = df_weekly.groupby(['Line',pd.Grouper(key='Date',freq='W')])['Changeover'].mean().reset_index()
# Changeover_Diff["Changeover_Diff"] = Changeover_diff["Changeover"].diff()
# df_weekly = df_weekly.merge(Changeover_diff[["Line","Date","Changeover_Diff"]], on=(["Line","Date"]))

# df_weekly['Changeover_Diff'].fillna(df_weekly['start_changeover'] - df_weekly['Changeover'], inplace=True)
```

```python
Changeover_Diff = df_weekly.sort_values(by=['Line', 'Date'])[['Line','Date','Changeover','start_changeover']]
Changeover_Diff['Changeover_Diff'] = Changeover_Diff.groupby('Line')['Changeover'].diff().fillna(df_weekly['start_changeover'] - df_weekly['Changeover'])
df_weekly = df_weekly.merge(Changeover_Diff[["Line","Date","Changeover_Diff"]], on=(["Line","Date"]))
```

```python
Changeover_mean = df_weekly.sort_values(by=['Line', 'Date'])[['Line', 'Date', 'Changeover', 'start_changeover']]
# 4-week rolling mean of Changeover per line (min_periods=1)
Changeover_mean['Changeover_rolling_mean'] = (Changeover_mean.groupby('Line')['Changeover']
                                              .transform(lambda x: x.rolling(4, min_periods=1).mean()))
df_weekly = df_weekly.merge(Changeover_mean[['Line', 'Date', 'Changeover_rolling_mean']], on=['Line', 'Date'])
```

#### Populate missing Unplanned Tech Loss

- Create weekly min/max columns for Unplanned tech loss across all sites
- Merge those columns into df_weekly
- Fill any NaN rows with the max unplanned tech loss calculated for that week

**This might be flawed!**

```python
# weekly min/max Unplanned tech loss across all lines
df_weekly_minmax = (df_weekly.groupby(pd.Grouper(key='Date', freq='W'))['Unplanned_tech_loss']
                    .agg([('Min', 'min'), ('Max', 'max')])
                    .add_prefix('WeekUTL')
                    .reset_index())
df_weekly = df_weekly.merge(df_weekly_minmax[['Date', 'WeekUTLMin', 'WeekUTLMax']], on='Date')
# a missing value gets the highest (worst) loss reported that week
df_weekly['Unplanned_tech_loss'] = df_weekly['Unplanned_tech_loss'].fillna(df_weekly['WeekUTLMax'])
df_weekly
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | ... | Lessons and Sharing | Team Contribution and Spirit | Review_Date | WeekMin | WeekMax | OEE_Diff | Changeover_Diff | Changeover_rolling_mean | WeekUTLMin | WeekUTLMax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W13-2021 | AL5 Packaging 1 | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.479693 | 0.500 | 0.0 | ... | | | 2021-04-15 | 0.046124 | 0.649336 | -0.433569 | | | 0.0 | 0.0 |
| 1 | W13-2021 | M18 Filling | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.443522 | 0.650 | 0.0 | ... | | | 2021-04-15 | 0.046124 | 0.649336 | -0.397398 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | W13-2021 | C2 Packaging Line | 0.458414 | 13.0 | 2021.0 | 2021-04-04 | Maisons-Alfort | 0.397503 | 0.470 | 0.0 | ... | | | 2021-04-15 | 0.046124 | 0.649336 | 0.060911 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | W13-2021 | TR200 Packaging Line | 0.596432 | 13.0 | 2021.0 | 2021-04-04 | Lisieux | 0.483505 | 0.650 | 0.0 | ... | | | 2021-04-15 | 0.046124 | 0.649336 | 0.112928 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | W13-2021 | AL6 | 0.367897 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.332657 | 0.450 | 0.0 | ... | | | 2021-04-15 | 0.046124 | 0.649336 | 0.035240 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | W26-2021 | SUPPO Packaging Line | 0.000000 | 26.0 | 2021.0 | 2021-07-04 | Lisieux | 0.353021 | 0.530 | 0.0 | ... | | | 2021-07-08 | 0.000000 | 0.695684 | -0.374391 | | 0.0 | 0.0 | 0.0 |
| 181 | W26-2021 | L18 Packaging Line | 0.397574 | 26.0 | 2021.0 | 2021-07-04 | Tours | 0.377683 | 0.547 | 0.0 | ... | | | 2021-07-08 | 0.000000 | 0.695684 | 0.050852 | 0.0 | 0.0 | 0.0 | 0.0 |
| 182 | W26-2021 | C9 Packaging Line | 0.208822 | 26.0 | 2021.0 | 2021-07-04 | Maisons-Alfort | 0.528518 | 0.530 | 0.0 | ... | | | 2021-07-08 | 0.000000 | 0.695684 | -0.279919 | 0.0 | 0.0 | 0.0 | 0.0 |
| 183 | W26-2021 | IMA C80/2 | 0.466914 | 26.0 | 2021.0 | 2021-07-04 | SCOPPITO | 0.451031 | 0.580 | 0.0 | ... | | | 2021-07-08 | 0.000000 | 0.695684 | -0.110146 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Populate missing Changeover

Same approach as for Unplanned tech loss: fill any NaN Changeover with the max changeover reported for that week.

```python
# weekly min/max Changeover across all lines
df_weekly_minmax = (df_weekly.groupby(pd.Grouper(key='Date', freq='W'))['Changeover']
                    .agg([('Min', 'min'), ('Max', 'max')])
                    .add_prefix('WeekChangeover')
                    .reset_index())
df_weekly = df_weekly.merge(df_weekly_minmax[['Date', 'WeekChangeoverMin', 'WeekChangeoverMax']], on='Date')
# a missing value gets the highest (worst) changeover reported that week
df_weekly['Changeover'] = df_weekly['Changeover'].fillna(df_weekly['WeekChangeoverMax'])
df_weekly
```

|  | Week | Line | OEE % | WeekOfYear | Year | Date | Plant | OEE Start point | OEE% Target (2022) | Unplanned_tech_loss | ... | Review_Date | WeekMin | WeekMax | OEE_Diff | Changeover_Diff | Changeover_rolling_mean | WeekUTLMin | WeekUTLMax | WeekChangeoverMin | WeekChangeoverMax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W13-2021 | AL5 Packaging 1 | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.479693 | 0.500 | 0.0 | ... | 2021-04-15 | 0.046124 | 0.649336 | -0.433569 | | | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | W13-2021 | M18 Filling | 0.046124 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.443522 | 0.650 | 0.0 | ... | 2021-04-15 | 0.046124 | 0.649336 | -0.397398 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | W13-2021 | C2 Packaging Line | 0.458414 | 13.0 | 2021.0 | 2021-04-04 | Maisons-Alfort | 0.397503 | 0.470 | 0.0 | ... | 2021-04-15 | 0.046124 | 0.649336 | 0.060911 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | W13-2021 | TR200 Packaging Line | 0.596432 | 13.0 | 2021.0 | 2021-04-04 | Lisieux | 0.483505 | 0.650 | 0.0 | ... | 2021-04-15 | 0.046124 | 0.649336 | 0.112928 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | W13-2021 | AL6 | 0.367897 | 13.0 | 2021.0 | 2021-04-04 | Frankfurt | 0.332657 | 0.450 | 0.0 | ... | 2021-04-15 | 0.046124 | 0.649336 | 0.035240 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | W26-2021 | SUPPO Packaging Line | 0.000000 | 26.0 | 2021.0 | 2021-07-04 | Lisieux | 0.353021 | 0.530 | 0.0 | ... | 2021-07-08 | 0.000000 | 0.695684 | -0.374391 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 181 | W26-2021 | L18 Packaging Line | 0.397574 | 26.0 | 2021.0 | 2021-07-04 | Tours | 0.377683 | 0.547 | 0.0 | ... | 2021-07-08 | 0.000000 | 0.695684 | 0.050852 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 182 | W26-2021 | C9 Packaging Line | 0.208822 | 26.0 | 2021.0 | 2021-07-04 | Maisons-Alfort | 0.528518 | 0.530 | 0.0 | ... | 2021-07-08 | 0.000000 | 0.695684 | -0.279919 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 183 | W26-2021 | IMA C80/2 | 0.466914 | 26.0 | 2021.0 | 2021-07-04 | SCOPPITO | 0.451031 | 0.580 | 0.0 | ... | 2021-07-08 | 0.000000 | 0.695684 | -0.110146 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Populate missing start_changeover (NaN)
If a line still has no start_changeover at this point, it's a new site and gets that week's Changeover value as its start point.

Check which rows still have NaN Changeover values:

```python
df_weekly[['Line','Date','Changeover','start_changeover']][df_weekly['Changeover'].isna()]
```

|  | Line | Date | Changeover | start_changeover |
|---|---|---|---|---|


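```python
# new lines: use this week's Changeover as their start point
df_weekly['start_changeover'] = df_weekly['start_changeover'].fillna(df_weekly['Changeover'])
# no OEE start point: use the current OEE
df_weekly['OEE  Start point'] = df_weekly['OEE  Start point'].fillna(df_weekly['OEE %'])
# default 2022 OEE target where none is provided
df_weekly['OEE% Target (2022)'] = df_weekly['OEE% Target (2022)'].fillna(0.65)
```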

#### Sector times

```python
# df_weekly['sector_1'] = (df_weekly['WeekMax'] - df_weekly['OEE %'])
# df_weekly['sector_1'] = (df_weekly['OEE  Start point'] - df_weekly['OEE %'])
df_weekly['sector_1'] = df_weekly['OEE_Diff'].mul(-1)
df_weekly['sector_2'] = df_weekly['rolling_std']
df_weekly['sector_3'] = df_weekly['Unplanned_tech_loss']
# df_weekly['sector_4'] = (df_weekly['start_changeover'] - df_weekly['Changeover'])
# df_weekly['sector_4'] = (df_weekly['Changeover'] - df_weekly['start_changeover'])
df_weekly['sector_4'] = df_weekly['Changeover_rolling_mean']
# take 10% of the sector 5-9 nomination scores, negated so they reduce the lap time
df_weekly[['sector_5','sector_6','sector_7','sector_8','sector_9']] = df_weekly[['Best Solution','Best Innovation','Improvement Iterations','Lessons and Sharing','Team Contribution and Spirit']] * -0.1
df_weekly[['sector_5','sector_6','sector_7','sector_8','sector_9']] = df_weekly[['sector_5','sector_6','sector_7','sector_8','sector_9']].fillna(0)

# fill missing sector 1-4 values with that sector's mean across all rows
df_weekly[['sector_1','sector_2','sector_3','sector_4']] = df_weekly[['sector_1','sector_2','sector_3','sector_4']].fillna(df_weekly[['sector_1','sector_2','sector_3','sector_4']].mean())

# sum(axis=1) skips any remaining NaN
df_weekly['lap_time'] = df_weekly[['sector_1','sector_2','sector_3','sector_4','sector_5','sector_6','sector_7','sector_8','sector_9']].sum(axis=1)

# now add the pole['LapTime'] from fastf1 to the lap_time adjustment we've created
# just use 88 secs rather than playing with timedeltas for now
# df_weekly['lap_time'] = pole['LapTime'] + pd.to_timedelta(df_weekly['lap_time'], unit='S')
# df_weekly['lap_time'] = 88 + df_weekly['lap_time']
df_weekly.groupby(['Line', pd.Grouper(key='Date', freq='W')])['lap_time'].sum()
```

```text
Line                  Date      
AL5 Packaging 1       2021-04-04    0.538653
                      2021-04-11   -0.107863
                      2021-04-18   -0.085591
                      2021-04-25   -0.056501
                      2021-05-02    0.236080
                                      ...   
TR200 Packaging Line  2021-06-06   -0.136638
                      2021-06-13    0.349383
                      2021-06-20    0.262920
                      2021-06-27    0.004328
                      2021-07-04    0.636541
Name: lap_time, Length: 185, dtype: float64
```

#### Write out df_weekly to Excel

```python
df_weekly.to_excel(output_dir + "df_weekly_with_calcs.xlsx")
```

#### Monthly Calcs

Repeat the process to build df_monthly, which we use to calculate the leaderboard. df_weekly is grouped by Review_Date so that each review meeting gets the right data.
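A minimal sketch of the review-level aggregation on made-up weekly values: each line's weekly lap-time adjustments are summed per Review_Date, then the 88 s pole time is added. `POLE_TIME_S` is a name introduced here for illustration.

```python
import pandas as pd

POLE_TIME_S = 88  # Paul Ricard pole time used by the notebook

# Hypothetical weekly lap-time adjustments, already tagged with their review date
weekly = pd.DataFrame({
    "Review_Date": pd.to_datetime(["2021-04-15"] * 2 + ["2021-05-13"] * 2),
    "Line": ["AL6", "AL6", "AL6", "GAMMA1"],
    "lap_time": [0.05, 0.03, -0.02, 0.10],
})

# One row per (review, line): the summed weekly adjustments
monthly = (weekly.groupby(["Review_Date", "Line"], as_index=False)["lap_time"].sum()
                 .rename(columns={"Review_Date": "Date"}))
monthly["lap_time"] += POLE_TIME_S
print(monthly)
```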

```python
# df_monthly = df_weekly.groupby([pd.Grouper(key='Date',freq='M'),'Line'])[['start_changeover','OEE  Start point','OEE %','Unplanned_tech_loss','Changeover','rolling_std','techloss_pct_chg','Changeover_pct_chg']].mean().reset_index()
# df_monthly = df_weekly.groupby([pd.Grouper(key='Date',freq='M'),'Line']).lap_time.sum().reset_index()
df_monthly = df_weekly.groupby(['Review_Date','Line']).lap_time.sum().reset_index()
# change the name of Review_Date to save renaming all references to Date later
df_monthly = df_monthly.rename(columns={'Review_Date':'Date'})
df_monthly
```

|  | Date | Line | lap_time |
|---|---|---|---|
| 0 | 2021-04-15 | AL5 Packaging 1 | 0.43079 |
| 1 | 2021-04-15 | AL6 | 0.082162 |
| 2 | 2021-04-15 | C2 Packaging Line | 0.022998 |
| 3 | 2021-04-15 | C9 Packaging Line | 0.454274 |
| 4 | 2021-04-15 | GAMMA1 | 0.163419 |
| 5 | 2021-04-15 | IMA C80/2 | 0.226898 |
| 6 | 2021-04-15 | L18 Packaging Line | 0.110771 |
| 7 | 2021-04-15 | L25 Packaging Line | 0.142717 |
| 8 | 2021-04-15 | M18 Filling | 0.380376 |
| 9 | 2021-04-15 | M21 Filling | 0.550768 |


```python
# df_monthly_minmax = (df_weekly.assign(Data_Value=df_weekly['OEE %'].abs())
#        .groupby(pd.Grouper(key='Date',freq='M'))['OEE %'].agg([('Min' , 'min'), ('Max', 'max')])
#        .add_prefix('Month'))
# df_monthly_minmax.reset_index(inplace=True)
# df_monthly = df_monthly.merge(df_monthly_minmax[['Date','MonthMin','MonthMax']])
# add the 88 s pole-position time (Paul Ricard) to every lap
df_monthly['lap_time'] = df_monthly['lap_time'] + 88
df_monthly
```

|  | Date | Line | lap_time |
|---|---|---|---|
| 0 | 2021-04-15 | AL5 Packaging 1 | 88.43079 |
| 1 | 2021-04-15 | AL6 | 88.082162 |
| 2 | 2021-04-15 | C2 Packaging Line | 88.022998 |
| 3 | 2021-04-15 | C9 Packaging Line | 88.454274 |
| 4 | 2021-04-15 | GAMMA1 | 88.163419 |
| 5 | 2021-04-15 | IMA C80/2 | 88.226898 |
| 6 | 2021-04-15 | L18 Packaging Line | 88.110771 |
| 7 | 2021-04-15 | L25 Packaging Line | 88.142717 |
| 8 | 2021-04-15 | M18 Filling | 88.380376 |
| 9 | 2021-04-15 | M21 Filling | 88.550768 |


### Leaderboard table
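A toy version of the leaderboard ranking, with hypothetical lap times for three lines over two reviews (line C has no data for the second one, so it gets the slowest lap of that review). It uses `rank(method='first')` in place of the notebook's sort plus `np.arange`; the two give the same positions when race times are distinct, and `method='first'` breaks ties by row order.

```python
import numpy as np
import pandas as pd

# Hypothetical lap times: one column per review, one row per line
laps = pd.DataFrame({
    "Line": ["A", "B", "C"],
    "R1": [88.2, 88.1, 88.4],
    "R2": [88.0, 88.3, np.nan],
})
lap_cols = ["R1", "R2"]

# A missing lap gets the slowest time recorded for that review
laps[lap_cols] = laps[lap_cols].fillna(laps[lap_cols].max())
laps["race_time"] = laps[lap_cols].sum(axis=1)
laps["prev_race_time"] = laps[lap_cols[:-1]].sum(axis=1)  # all laps except the latest

# Positions (1 = fastest) now and before the latest lap
laps["position"] = laps["race_time"].rank(method="first").astype(int)
laps["prev_position"] = laps["prev_race_time"].rank(method="first").astype(int)
laps["Gain/Loss"] = laps["prev_position"] - laps["position"]
laps["gap_to_leader"] = laps["race_time"] - laps["race_time"].min()
print(laps.sort_values("position"))
```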

```python
# filter using end_date to stop picking up future-dated nomination rows of zero created when joining the spreadsheet
pivot = df_monthly[df_monthly['Date'] < end_date].pivot(index='Line', columns='Date', values='lap_time')
pivot.reset_index(inplace=True)
# pivot creates NaN for lines with no data for a race review date;
# populate each NaN with the max for that column, so they get the slowest lap time for that race.
# column 0 is Line, so iloc[:, 1:] selects the race review date columns
pivot.iloc[:,1:] = pivot.iloc[:,1:].fillna(pivot.iloc[:,1:].max())

# sum the lap columns (not Line) to get a race_time
pivot['race_time'] = pivot.iloc[:,1:].sum(axis=1)
# sum all but the last 2 cols (this lap and the race_time) to calc prev_race_time

# pivot['prev_race_time'] = pivot[pivot.columns[2]] + pivot[pivot.columns[3]]
pivot['prev_race_time'] = pivot.iloc[:,1:-2].sum(axis=1)

pivot = pivot.merge(df_dash[['Plant','Line']], on='Line')
pivot.sort_values('race_time', inplace=True)
pivot['position'] = np.arange(1,len(pivot) + 1)
pivot['gap_to_leader'] = pivot['race_time'] - pivot['race_time'].iloc[0]
pivot.sort_values('prev_race_time', inplace=True)
pivot['prev_position'] = np.arange(1,len(pivot) + 1)
pivot['Gain/Loss'] = pivot.prev_position - pivot.position
pivot.sort_values('race_time', inplace=True)
pivot['interval'] = pivot.race_time.diff()
pivot = pivot.merge(df_dash[['Line','OEE  Start point', '⇗ OEE% progress', 'OEE% Target (2022)']], on='Line')
pivot
```

| | Line | 2021-04-15 00:00:00 | 2021-05-13 00:00:00 | 2021-06-10 00:00:00 | 2021-07-08 00:00:00 | race_time | prev_race_time | Plant | position | gap_to_leader | prev_position | Gain/Loss | interval | OEE Start point | ⇗ OEE% progress | OEE% Target (2022) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GAMMA1 | 88.163419 | 88.084194 | 88.209689 | 67.03777 | 331.495072 | 264.457302 | SCOPPITO | 1 | 0.0 | 4 | 3 | NaN | 0.418683 | 0.085148 | 0.57 |
| 1 | IMA C80/2 | 88.226898 | 88.194613 | 88.202379 | 73.717674 | 338.341565 | 264.623891 | SCOPPITO | 2 | 6.846493 | 6 | 4 | 6.846493 | 0.451031 | 0.043365 | 0.58 |
| 2 | AL6 | 88.082162 | 88.050678 | 88.197412 | 82.988464 | 347.318715 | 264.330251 | Frankfurt | 3 | 15.823643 | 1 | -2 | 8.97715 | 0.332657 | 0.078541 | 0.45 |
| 3 | M22 Filling | 88.364827 | 88.309584 | 88.649614 | 83.684813 | 349.008838 | 265.324025 | Frankfurt | 4 | 17.513766 | 9 | 5 | 1.690123 | 0.530068 | 0.12028 | 0.65 |
| 4 | M21 Filling | 88.550768 | 88.510135 | 88.824939 | 84.188437 | 350.074279 | 265.885842 | Frankfurt | 5 | 18.579207 | 12 | 7 | 1.065441 | 0.599671 | 0.022006 | 0.65 |
| 5 | M18 Filling | 88.380376 | 88.489084 | 89.117409 | 84.102993 | 350.089862 | 265.986869 | Frankfurt | 6 | 18.59479 | 13 | 7 | 0.015583 | 0.443522 | -0.010057 | 0.65 |
| 6 | L18 Packaging Line | 88.110771 | 88.114026 | 88.128288 | 88.378179 | 352.731264 | 264.353085 | Tours | 7 | 21.236192 | 2 | -5 | 2.641403 | 0.377683 | 0.086173 | 0.547 |
| 7 | L25 Packaging Line | 88.142717 | 88.396446 | 88.17466 | 88.105085 | 352.818908 | 264.713823 | Tours | 8 | 21.323835 | 7 | -1 | 0.087643 | 0.351564 | 0.001613 | 0.478 |
| 8 | C2 Packaging Line | 88.022998 | 88.2866 | 88.258832 | 88.612671 | 353.181101 | 264.56843 | Maisons-Alfort | 9 | 21.686029 | 5 | -4 | 0.362193 | 0.397503 | 0.012193 | 0.47 |
| 9 | C9 Packaging Line | 88.454274 | 88.082285 | 88.347275 | 88.639319 | 353.523154 | 264.883834 | Maisons-Alfort | 10 | 22.028081 | 8 | -2 | 0.342053 | 0.528518 | -0.019492 | 0.53 |

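A minimal sketch of the leaderboard steps on made-up data (three hypothetical lines, three review dates, dates kept as strings for brevity). It mirrors the logic of the cell above, but selects the lap columns by name rather than by position, and it leaves out the `end_date` filter and the `df_dash` merges, which depend on data not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical review-level lap times (values made up).
# Line C has no row for 2021-05-13, so the pivot below produces a NaN for it.
df_monthly = pd.DataFrame({
    'Date': ['2021-04-15', '2021-04-15', '2021-04-15',
             '2021-05-13', '2021-05-13',
             '2021-06-10', '2021-06-10', '2021-06-10'],
    'Line': ['Line A', 'Line B', 'Line C',
             'Line A', 'Line B',
             'Line A', 'Line B', 'Line C'],
    'lap_time': [88.5, 88.25, 88.75,
                 88.0, 89.0,
                 90.0, 88.5, 88.25],
})

# One row per line, one column per lap (review date)
pivot = df_monthly.pivot(index='Line', columns='Date', values='lap_time').reset_index()
laps = list(pivot.columns[1:])                 # every column after 'Line'

# A missing lap gets the slowest (max) time recorded on that lap
pivot[laps] = pivot[laps].fillna(pivot[laps].max())

pivot['race_time'] = pivot[laps].sum(axis=1)
pivot['prev_race_time'] = pivot[laps[:-1]].sum(axis=1)   # every lap except the latest

# Current position and gap to the leader (lowest race_time leads)
pivot = pivot.sort_values('race_time')
pivot['position'] = np.arange(1, len(pivot) + 1)
pivot['gap_to_leader'] = pivot['race_time'] - pivot['race_time'].iloc[0]

# Position before the latest lap; Gain/Loss > 0 means the line moved up
pivot = pivot.sort_values('prev_race_time')
pivot['prev_position'] = np.arange(1, len(pivot) + 1)
pivot['Gain/Loss'] = pivot['prev_position'] - pivot['position']

# Interval: gap to the line directly ahead (NaN for the leader)
pivot = pivot.sort_values('race_time').reset_index(drop=True)
pivot['interval'] = pivot['race_time'].diff()

print(pivot[['Line', 'race_time', 'prev_race_time', 'position',
             'prev_position', 'Gain/Loss', 'gap_to_leader', 'interval']])
```

Because `np.arange` is assigned positionally, each `position` column reflects whichever sort was applied just before it, which is why the frame is re-sorted by `race_time` before computing `interval`.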

#### Write this out for Tableau

In [50]:
pivot.to_csv(output_dir + "leaderboard.csv")

END OF PROCESSING - Sanity checks below

In [45]:
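# sanity check: rank the lines on each individual lap (cols 1-4 are the four race review dates)
# note: this adds columns to pivot in place, so re-run the leaderboard cell before writing leaderboard.csv again
pivot.sort_values(pivot.columns[1], inplace=True)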
pivot['apr_position'] = np.arange(1,len(pivot) + 1)
pivot.sort_values(pivot.columns[2], inplace=True)
pivot['may_position'] = np.arange(1,len(pivot) + 1)
pivot.sort_values(pivot.columns[3], inplace=True)
pivot['jun_position'] = np.arange(1,len(pivot) + 1)
pivot.sort_values(pivot.columns[4], inplace=True)
pivot['jly_position'] = np.arange(1,len(pivot) + 1)

pivot[pivot.columns[[0,-4,-3,-2,-1]]]

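| | Line | apr_position | may_position | jun_position | jly_position |
|---|---|---|---|---|---|
| 0 | GAMMA1 | 7 | 3 | 6 | 1 |
| 1 | IMA C80/2 | 8 | 5 | 5 | 2 |
| 2 | AL6 | 3 | 1 | 4 | 3 |
| 3 | M22 Filling | 9 | 9 | 9 | 4 |
| 5 | M18 Filling | 10 | 12 | 12 | 5 |
| 4 | M21 Filling | 13 | 13 | 10 | 6 |
| 7 | L25 Packaging Line | 6 | 10 | 3 | 7 |
| 12 | LINE 01 - UHLMANN 1880 | 14 | 14 | 14 | 8 |
| 13 | MEDISEAL PURAN | 15 | 15 | 15 | 9 |
| 6 | L18 Packaging Line | 4 | 4 | 2 | 10 |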

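The sanity-check cell sorts once per lap and hardcodes one month name per lap. A sketch of an alternative, on made-up data with hypothetical line names: `DataFrame.rank` produces the per-lap positions for every lap column in one call and leaves the source frame untouched. This assumes no two lines share exactly the same lap time; with `method='first'`, ties are broken by row order, which `sort_values` does not guarantee.

```python
import pandas as pd

# Hypothetical lap times per line (columns = race review dates; values made up)
laps = pd.DataFrame({
    'Line': ['Line A', 'Line B', 'Line C'],
    '2021-04-15': [88.5, 88.25, 88.75],
    '2021-05-13': [88.0, 89.0, 89.5],
}).set_index('Line')

# Position on each individual lap: 1 = fastest (lowest) lap time
lap_positions = laps.rank(method='first').astype(int)
print(lap_positions)
```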

In [47]:
df_weekly[['Week','Line','Changeover_Diff','Changeover_rolling_mean']][df_weekly['Line'].str.contains('AL6')]

| | Week | Line | Changeover_Diff | Changeover_rolling_mean |
|---|---|---|---|---|
| 4 | W13-2021 | AL6 | 0.0 | 0.0 |
| 21 | W14-2021 | AL6 | 0.0 | 0.0 |
| 35 | W15-2021 | AL6 | 0.0 | 0.0 |
| 48 | W16-2021 | AL6 | 0.0 | 0.0 |
| 64 | W17-2021 | AL6 | 0.0 | 0.0 |
| 73 | W18-2021 | AL6 | 0.0 | 0.0 |
| 89 | W19-2021 | AL6 | 0.0 | 0.0 |
| 94 | W20-2021 | AL6 | 0.0 | 0.0 |
| 107 | W21-2021 | AL6 | 0.0 | 0.0 |
| 124 | W22-2021 | AL6 | 0.0 | 0.0 |


In [48]:
# weekly lap_time per line; select lap_time before summing so text columns such as Plant are not aggregated
df_weekly.groupby(['Line',pd.Grouper(key='Date', freq='W')])['lap_time'].sum().reset_index()

| | Line | Date | lap_time |
|---|---|---|---|
| 0 | AL5 Packaging 1 | 2021-04-04 | 0.538653 |
| 1 | AL5 Packaging 1 | 2021-04-11 | -0.107863 |
| 2 | AL5 Packaging 1 | 2021-04-18 | -0.085591 |
| 3 | AL5 Packaging 1 | 2021-04-25 | -0.056501 |
| 4 | AL5 Packaging 1 | 2021-05-02 | 0.236080 |
| ... | ... | ... | ... |
| 180 | TR200 Packaging Line | 2021-06-06 | -0.136638 |
| 181 | TR200 Packaging Line | 2021-06-13 | 0.349383 |
| 182 | TR200 Packaging Line | 2021-06-20 | 0.262920 |
| 183 | TR200 Packaging Line | 2021-06-27 | 0.004328 |

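`freq='W'` is the pandas alias for weekly bins anchored on Sunday (`'W-SUN'`), which is why every date in the output above is a Sunday: each bin runs Monday to Sunday and is labelled with its closing Sunday. A small sketch on made-up rows for one hypothetical line:

```python
import pandas as pd

# Hypothetical dated rows for one line (values made up)
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-04-01', '2021-04-04', '2021-04-05', '2021-04-09']),
    'Line': ['Line A'] * 4,
    'lap_time': [0.25, 0.5, 0.125, 0.125],
})

# Thu 1 Apr and Sun 4 Apr fall in the week labelled 2021-04-04;
# Mon 5 Apr and Fri 9 Apr fall in the week labelled 2021-04-11
weekly = (df.groupby(['Line', pd.Grouper(key='Date', freq='W')])['lap_time']
            .sum()
            .reset_index())
print(weekly)
```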

In [34]:
df_weekly[['Plant','Line','lap_time']][df_weekly['Date'] == '2021-06-13']

| | Plant | Line | lap_time |
|---|---|---|---|
| 130 | Frankfurt | M21 Filling | -4.816726 |
| 131 | SUZANO | MEDISEAL PURAN | 0.693566 |
| 132 | Lisieux | IWK Packaging Line | 0.693566 |
| 133 | SUZANO | LINE 01 - UHLMANN 1880 | 0.693566 |
| 134 | Tours | L18 Packaging Line | 0.364237 |
| 135 | Lisieux | TR200 Packaging Line | 0.704211 |
| 136 | Frankfurt | M18 Filling | -4.296866 |
| 137 | Frankfurt | AL5 Packaging 1 | 0.294557 |
| 138 | SCOPPITO | GAMMA1 | -21.344741 |
| 139 | Maisons-Alfort | C9 Packaging Line | 0.404967 |
