## How to run the script

1. Run the pre-intervention script before the post-intervention script.
2. Backlog is not time-dependent, so this script can be run straight away.

In [1]:
output_file_name = 'BT_Workbank_KPIs_TD_Pre_2021-2022_P04.csv'
column_name2 = 'Pre-intervention (2021-2022 P04)'

## Purpose of script

The aim of this script is to reduce the number of outstanding jobs (Workbank) related to a specific competence using the strategic framework for benefit and assessment strategy. This is done in two stages:

First stage indicators:
1. Number of jobs associated to retrained competences delivered (i.e. number of completed jobs).
2. Number of jobs delivered on a per-team-member basis (i.e. number of completed jobs / team size).
3. Successfully completed jobs as a proportion of the total required jobs (i.e. completed jobs / required jobs).

Second stage indicators:
1. Workbank backlog associated to retrained competences.
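
As a minimal sketch of how the three first-stage indicators relate to each other, the arithmetic can be written as below. The helper name `first_stage_indicators` and the example figures are illustrative assumptions, not values taken from the data.

```python
def first_stage_indicators(completed_jobs, required_jobs, team_size):
    """Return the three first-stage indicators for one team (illustrative helper)."""
    return {
        "1.1 completed jobs": completed_jobs,
        "1.2 completed jobs per team member": completed_jobs / team_size,
        "1.3 completed / required jobs": completed_jobs / required_jobs,
    }

# Hypothetical team: 18 completed jobs out of 24 required, 6 team members
kpis = first_stage_indicators(completed_jobs=18, required_jobs=24, team_size=6)
print(kpis)
```

The script itself derives these figures from grouped DataFrames rather than scalars, so this only illustrates the definitions.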

All of this is answered for pre- and post-intervention, with the date of intervention taken as September 2021, using the following steps in the script.

1. Get all file paths and read files
2. Pre process the data
3. Join Backlog data to plan vs completed
4. Merge the competence held 
5. Join team size
6. Recommendations taken up to date
7. Workbank KPIs

## 1. Get all file paths and read files

#### 1.A. Import relevant libraries

In [2]:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x) # Suppress scientific notation
import numpy as np
from azure.storage.blob import BlobServiceClient
import os
import pyodbc
from io import StringIO, BytesIO
import io

#### 1.B Define connections to SQL tables and Azure Storage blobs

In [3]:
# Define the connection to SQL database
#conn = pyodbc.connect( # SQL server
                       # name of SQL data base
                        # UID =
                       # Password
#                      ) # databasename

# To connect to Azure devp environment
#STORAGEACCOUNTURL= ''
#STORAGEACCOUNTKEY= ''
#CONTAINERNAME= ''
#blob_service_client_instance_devp = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)

# To connect to Azure test environment
#STORAGEACCOUNTURL= ''
#STORAGEACCOUNTKEY= ''
#CONTAINERNAME= ''
#blob_service_client_instance_test = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)

# To connect to Azure staging environment
#STORAGEACCOUNTURL= ''
#STORAGEACCOUNTKEY= ''
#CONTAINERNAME= ''
#blob_service_client_instance_staging = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)

# To connect to Azure prod environment
#STORAGEACCOUNTURL= ''
#STORAGEACCOUNTKEY= ''
#CONTAINERNAME= ''
#blob_service_client_instance_prod = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)


#### 1.C Define all the Functions used in the script

In [4]:
# Define a function to read csv files directly from azure data storage
def download_csv(BLOBNAME,header_row):
    blob_client_instance = blob_service_client_instance_devp.get_blob_client(CONTAINERNAME, BLOBNAME, snapshot=None)
    blob_data = blob_client_instance.download_blob()
    with BytesIO() as f:
        blob_data.readinto(f)
        f.seek(0)
        data = pd.read_csv(f,header=header_row)   
    return data

# Rename columns
def rename_column(table_name):
    table_name.columns = table_name.columns.str.replace(' ', '')
    table_name.columns = table_name.columns.str.lower()

# Save csv files and upload to azure blob storage
def save_outputs_azure_blobs(output_name1,output_name2,azure_environment):
    output_name1.to_csv(output_name2, index = False)
    # Create a blob client using the local file name as the name for the blob
    blob_client = azure_environment.get_blob_client(container='output', blob='static/benefitTracking/Workbank/{tmp2}'.format(tmp2=output_name2))
    # Upload the created file
    with open(output_name2, "rb") as data:
        blob_client.upload_blob(data,overwrite=True)
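
The reading and renaming helpers can be tried without an Azure connection. The sketch below assumes a small in-memory file (the `raw` bytes are hypothetical) in place of a downloaded blob; it shows how `header=1` skips a leading title line and how the column names are normalised in the same way as `rename_column`.

```python
from io import BytesIO

import pandas as pd

# Hypothetical file content: one title line, then the real header on row 1
raw = b"Exported report\nWork Group Set,Units Required\nA,1200\nB,35\n"

with BytesIO(raw) as f:
    demo = pd.read_csv(f, header=1)  # header=1 uses the second line as the header

# Same normalisation as rename_column: remove spaces, then lower-case
demo.columns = demo.columns.str.replace(' ', '').str.lower()
print(list(demo.columns))
```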

#### 1.D Read the paths to all files/download blobs/read SQL tables

In [5]:
# Backlog
backlog= download_csv('static/benefitsBaseline/backlog/Backlog Baseline.csv',0)
rename_column(backlog)

# Planned vs completed 
plan_completed= download_csv('static/benefitsBaseline/plannedVcompleted/Planned v Completed Baseline.csv',1)
rename_column(plan_completed)

# Master competence mapping 
BLOBNAME = 'static/benefitsBaseline/competenceLinks/Master Competence Links Baseline.xlsx'
blob_client_instance = blob_service_client_instance_devp.get_blob_client(CONTAINERNAME, BLOBNAME, snapshot=None)
blob_data = blob_client_instance.download_blob()
with BytesIO() as f:
    blob_data.readinto(f)
    f.seek(0)
    competences_mapping = pd.read_excel(f,sheet_name='Workbank')   
rename_column(competences_mapping)

# Team Size (AZURE BLOB)
comp_teamsize =  download_csv('static/benefitsBaseline/teamSize/team_size_PreInter.csv',0)
rename_column(comp_teamsize)

# Workbank training delivered (AZURE BLOB)
workbank_trainning_demand =  download_csv('static/benefitsBaseline/trainning_delivered_recommendations/workbank_recomen_course_trainning.csv',0)
rename_column(workbank_trainning_demand)

## 2. Pre process the data

#### 2.A. Planned vs completed

In [6]:
# Strip thousands separators so the columns can be converted to float below
plan_completed[['unitsrequired', 'unitscompleted']] = \
    plan_completed[['unitsrequired', 'unitscompleted']].applymap(lambda x: str(x).replace(',',''))
plan_completed[['unitsrequired', 'unitscompleted']] = plan_completed[['unitsrequired', 'unitscompleted']].astype(float)

# Drop rows where both unitsrequired and unitscompleted are NaN, as these are not useful to us
plan_completed = plan_completed.dropna(subset=['unitsrequired', 'unitscompleted'], how='all').reset_index()
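
An alternative way to do the same cleaning, sketched on made-up values, is to use `pd.to_numeric` with `errors='coerce'` on each column. This is an assumption about what the cell is meant to achieve, not a replacement the authors chose; the `raw` frame is hypothetical.

```python
import pandas as pd

# Hypothetical raw values as they might arrive from the CSV
raw = pd.DataFrame({
    'unitsrequired': ['1,200', '35', None],
    'unitscompleted': ['900', None, None],
})

cols = ['unitsrequired', 'unitscompleted']
clean = raw.copy()
for col in cols:
    # Remove thousands separators, then convert; unparseable values become NaN
    clean[col] = pd.to_numeric(clean[col].str.replace(',', ''), errors='coerce')

# Keep a row if at least one of the two columns has a value
clean = clean.dropna(subset=cols, how='all').reset_index(drop=True)
print(clean)
```

Passing `drop=True` to `reset_index` avoids carrying the old index along as an extra `index` column.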

In [7]:
# Summarise the plan vs completed data
plan_completed_summary = pd.DataFrame(plan_completed[['workgroupset', 
                                                      'standardjobnumber&desc',
                                                      'unitsrequired', 
                                                      'unitscompleted']].groupby(['workgroupset',
                                                                                   'standardjobnumber&desc']).agg({'unitsrequired': 'sum','unitscompleted': 'sum'}))
# Move the group keys back into regular columns
plan_completed_summary.reset_index(inplace=True)
plan_completed_summary.shape

(177146, 4)

#### 2.B. Backlog

In [8]:
# Only keep records which are actually in backlog filter using the "backlog?" column
backlog = backlog.loc[backlog['backlog?']=='Y'].reset_index(drop=True)

# Strip thousands separators so the columns can be converted to float below
backlog[['backloghoursrequired', 'backlogunitsrequired']] = \
    backlog[['backloghoursrequired', 'backlogunitsrequired']].applymap(lambda x: str(x).replace(',',''))

# Convert columns to number format
backlog[['backloghoursrequired', 'backlogunitsrequired']] = backlog[['backloghoursrequired', 'backlogunitsrequired']].astype(float)

In [9]:
# Summarise the backlog data
backlog_summary = pd.DataFrame(backlog[['workgroupset',
                                        'standardjobnumber&desc',
                                        'backloghoursrequired',
                                        'backlogunitsrequired']].groupby(['workgroupset',
                                                                            'standardjobnumber&desc']).agg({'backloghoursrequired': 'sum',
                                                                                                    'backlogunitsrequired': 'sum'}))
# Move the group keys back into regular columns
backlog_summary.reset_index(inplace=True)
backlog_summary.shape

(23817, 4)

## 3. Join Backlog data to plan vs completed

In [10]:
backlog_p_c = pd.merge(plan_completed_summary,backlog_summary, 
                       on=['standardjobnumber&desc','workgroupset'],
                       how='left')

# In this dataset, rows where backloghoursrequired and backlogunitsrequired are both NaN are planned vs completed jobs that were not found in the backlog data.


backlog_p_c["foundinbacklog?"] = backlog_p_c['backloghoursrequired'].isnull().map({True: "Not Found", False: "Found"})
backlog_p_c = backlog_p_c.dropna(subset=['backloghoursrequired', 'backlogunitsrequired'], how='all').reset_index()
backlog_p_c.head(3)

Unnamed: 0,index,workgroupset,standardjobnumber&desc,unitsrequired,unitscompleted,backloghoursrequired,backlogunitsrequired,foundinbacklog?
0,5,ATME IRVINE,009032 - TEF3034. SURVEY PLATFORM,110.0,367.0,36.0,12.0,Found
1,6,ATME IRVINE,009040 - TEF3020. MONITOR SIDEWEAR ON RAIL,361.35,776.46,14.93,14.93,Found
2,7,ATME IRVINE,009050 - TEF3041. RECORD TRACK GEOMETRY MANUALLY,209550.0,291135.73,51.47,5147.0,Found
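
One way to make the "found in backlog" flag explicit is the `indicator=True` option of `merge`, which records where each row came from. The sketch below uses two small hypothetical frames (`planned`, `backlog_demo`) rather than the real data.

```python
import pandas as pd

planned = pd.DataFrame({
    'workgroupset': ['A', 'A', 'B'],
    'standardjobnumber&desc': ['J1', 'J2', 'J1'],
    'unitsrequired': [10.0, 5.0, 8.0],
})
backlog_demo = pd.DataFrame({
    'workgroupset': ['A'],
    'standardjobnumber&desc': ['J1'],
    'backloghoursrequired': [3.5],
})

merged = planned.merge(backlog_demo,
                       on=['workgroupset', 'standardjobnumber&desc'],
                       how='left', indicator=True)

# '_merge' is 'both' for matched jobs and 'left_only' for jobs missing from the backlog
merged['foundinbacklog?'] = (merged['_merge'] == 'both').map({True: 'Found', False: 'Not Found'})

# Keep only the jobs that were found in the backlog
found_only = merged.loc[merged['_merge'] == 'both'].drop(columns='_merge')
print(found_only)
```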


## 4. Merge the jobs competence mapping to data 

#### 4.A. Map backlog/plannedVScompleted data to competences

In [11]:
# Only keep columns of interest from mapping
columns_keep_competence = ['standardjobnumber&desc', 'sjtaskdescription',
                           'discipline', 'relevantcapabilities(gang)']
competences_mapping = competences_mapping[columns_keep_competence]

# Merge the two datasets
backlog_p_c_comp = backlog_p_c.merge(competences_mapping, how='left', on='standardjobnumber&desc')
backlog_p_c_comp['relevantcapabilities(gang)'] = backlog_p_c_comp['relevantcapabilities(gang)'].fillna('Unknown')
backlog_p_c_comp['missingmapping']  = np.where(backlog_p_c_comp['relevantcapabilities(gang)']=='Unknown', 'Unknown', 'Mapped')
#backlog_p_c_comp.to_csv('backlog_p_c_comp.csv')
missing_competences = backlog_p_c_comp.groupby('missingmapping')[['unitscompleted']].count()

# Drop jobs that do not map to any competences
backlog_p_c_comp = backlog_p_c_comp.loc[backlog_p_c_comp['relevantcapabilities(gang)'] != 'Unknown'].reset_index(drop=True)
backlog_p_c_comp.head(2)

Unnamed: 0,index,workgroupset,standardjobnumber&desc,unitsrequired,unitscompleted,backloghoursrequired,backlogunitsrequired,foundinbacklog?,sjtaskdescription,discipline,relevantcapabilities(gang),missingmapping
0,561,Aberdeen SM(SIGNALS),006289 - SIGNALLING LOCKOUT SYSTEM,34.0,26.0,1.14,2.0,Found,NR/SMS/SW01 - A,S&T,Sig 07,Mapped
1,563,Aberdeen SM(SIGNALS),006292 - S/BOX LC CONTROL / INDICATION UNIT,32.0,135.0,0.72,3.0,Found,NR/SMS/SB11 - A,S&T,Sig 07,Mapped


In [12]:
# Determine how much of the data was mapped and not mapped to competences
missing_competences['Percentage Units Mapped (%)'] = round(missing_competences['unitscompleted']/np.sum(missing_competences['unitscompleted'])*100,1)
missing_competences

Unnamed: 0_level_0,unitscompleted,Percentage Units Mapped (%)
missingmapping,Unnamed: 1_level_1,Unnamed: 2_level_1
Mapped,11092,46.6
Unknown,12725,53.4
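
Note that the percentage above is based on row counts, not on the sum of units. A compact way to get the same kind of split, sketched here on a hypothetical `flags` Series, is `value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical mapping flags for five jobs
flags = pd.Series(['Mapped', 'Unknown', 'Unknown', 'Mapped', 'Unknown'],
                  name='missingmapping')

# Share of rows in each category, as a percentage rounded to one decimal place
share = (flags.value_counts(normalize=True) * 100).round(1)
print(share)
```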


#### 4.B. Where a backlog job maps to more than one competence, split the competences and distribute the units and hours equally across them

In [13]:
# Split competences into a new column
backlog_p_c_comp['competencealias'] = backlog_p_c_comp['relevantcapabilities(gang)'].str.split('/')

# Add the count of competences into a new column
backlog_p_c_comp['numberofcompetences'] = backlog_p_c_comp['competencealias'].apply(lambda x: len(x))

# Divide the required, completed and backlog figures by the numberofcompetences.
# Once each record is expanded into one row per competence, the attributed values sum back to the original totals (up to rounding)
backlog_p_c_comp['attributedunitsrequired'] = np.round(backlog_p_c_comp['unitsrequired'] / backlog_p_c_comp['numberofcompetences'],1)
backlog_p_c_comp['attributedunitscompleted'] = np.round(backlog_p_c_comp['unitscompleted'] / backlog_p_c_comp['numberofcompetences'],1)
backlog_p_c_comp['attributedbacklogunitsrequired'] = np.round(backlog_p_c_comp['backlogunitsrequired'] / backlog_p_c_comp['numberofcompetences'],1)
backlog_p_c_comp['attributedbackloghoursrequired'] = np.round(backlog_p_c_comp['backloghoursrequired'] / backlog_p_c_comp['numberofcompetences'],1)
# Now we can expand the competence column to the numberofcompetences
backlog_p_c_comp_2 = backlog_p_c_comp.explode('competencealias').reset_index(drop=True)

# Drop old Competence column
backlog_p_c_comp_2.drop(['numberofcompetences', 'relevantcapabilities(gang)'], axis=1, inplace=True)

# remove whitespaces from the columns 
backlog_p_c_comp_2['competencealias'] = backlog_p_c_comp_2['competencealias'].str.strip()

backlog_p_c_comp_2.head(5)

Unnamed: 0,index,workgroupset,standardjobnumber&desc,unitsrequired,unitscompleted,backloghoursrequired,backlogunitsrequired,foundinbacklog?,sjtaskdescription,discipline,missingmapping,competencealias,attributedunitsrequired,attributedunitscompleted,attributedbacklogunitsrequired,attributedbackloghoursrequired
0,561,Aberdeen SM(SIGNALS),006289 - SIGNALLING LOCKOUT SYSTEM,34.0,26.0,1.14,2.0,Found,NR/SMS/SW01 - A,S&T,Mapped,Sig 07,34.0,26.0,2.0,1.1
1,563,Aberdeen SM(SIGNALS),006292 - S/BOX LC CONTROL / INDICATION UNIT,32.0,135.0,0.72,3.0,Found,NR/SMS/SB11 - A,S&T,Mapped,Sig 07,32.0,135.0,3.0,0.7
2,567,Aberdeen SM(SIGNALS),006298 - LC MCB - MTCE SEQUENCE TEST,27.0,139.0,0.48,2.0,Found,NR/SMS/LC10 - A,S&T,Mapped,Sig 15,27.0,139.0,2.0,0.5
3,571,Aberdeen SM(SIGNALS),006308 - LCROAD LIGHTS & AUDIBLE WARNINGS,127.0,279.0,3.48,6.0,Found,NR/SMS/LC11 - A,S&T,Mapped,Sig 15,127.0,279.0,6.0,3.5
4,608,Aberdeen SM(SIGNALS),006476 - GROUND FRAME SWITCH PANEL,9.0,155.0,0.4,1.0,Found,NR/SMS/GF01 - SERVICE B,S&T,Mapped,Sig 07,9.0,155.0,1.0,0.4
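
The split-and-attribute step can be checked on a toy example. The sketch below uses a hypothetical `jobs` frame and skips the rounding to one decimal place, so the attributed hours sum back exactly to the original total.

```python
import pandas as pd

jobs = pd.DataFrame({
    'standardjobnumber&desc': ['J1', 'J2'],
    'relevantcapabilities(gang)': ['Sig 07', 'Sig 07 / Sig 15'],
    'backloghoursrequired': [4.0, 6.0],
})

# One list of competences per job, and how many there are
jobs['competencealias'] = jobs['relevantcapabilities(gang)'].str.split('/')
jobs['numberofcompetences'] = jobs['competencealias'].str.len()

# Give each competence an equal share of the job's backlog hours
jobs['attributedbackloghoursrequired'] = (
    jobs['backloghoursrequired'] / jobs['numberofcompetences']
)

# One row per (job, competence); strip the spaces left around the '/'
expanded = jobs.explode('competencealias').reset_index(drop=True)
expanded['competencealias'] = expanded['competencealias'].str.strip()

# Totals are preserved because each share was divided before the rows were repeated
assert expanded['attributedbackloghoursrequired'].sum() == jobs['backloghoursrequired'].sum()
print(expanded[['standardjobnumber&desc', 'competencealias', 'attributedbackloghoursrequired']])
```

With the rounding used in the script, the totals can differ slightly from the originals when a value does not divide evenly.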


In [14]:
backlog_p_c_comp_2.columns

Index(['index', 'workgroupset', 'standardjobnumber&desc', 'unitsrequired',
       'unitscompleted', 'backloghoursrequired', 'backlogunitsrequired',
       'foundinbacklog?', 'sjtaskdescription', 'discipline', 'missingmapping',
       'competencealias', 'attributedunitsrequired',
       'attributedunitscompleted', 'attributedbacklogunitsrequired',
       'attributedbackloghoursrequired'],
      dtype='object')

## 5. Merge team Size with backlog/plannedVScompleted data

In [15]:
#comp_teamsize = comp_teamsize.drop('Unnamed: 0',axis=1)
comp_teamsize['workgroupset'] = comp_teamsize['workgroupset'].str.strip()
#comp_teamsize = comp_teamsize.rename(columns={'masterdeliveryunit':'master-deliveryunit'})
# Merge team size to the previous dataset (inner join, so work group sets without a team size are dropped)
backlog_p_c_comp_ts = pd.merge(backlog_p_c_comp_2,comp_teamsize, 
                               left_on=['workgroupset'],
                               right_on=['workgroupset'])
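
Because `merge` defaults to an inner join, work group sets with no team size silently disappear at this step. A small sketch, on hypothetical `jobs` and `teams` frames, of how to guard the join with `validate` and list what was dropped:

```python
import pandas as pd

jobs = pd.DataFrame({'workgroupset': ['A', 'A', 'B', 'C'],
                     'job': ['J1', 'J2', 'J1', 'J3']})
teams = pd.DataFrame({'workgroupset': ['A', 'B'], 'teamsize': [34, 27]})

# how defaults to 'inner'; validate raises if a workgroupset repeats in `teams`
joined = jobs.merge(teams, on='workgroupset', validate='many_to_one')

# Work group sets that were dropped by the inner join
dropped = sorted(set(jobs['workgroupset']) - set(joined['workgroupset']))
print(dropped)
```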

## 6. Merge Workbank training delivered recommendations

In [16]:
# NEW METHOD
#workbank_trainning_demand = workbank_trainning_demand.rename(columns={'master-deliveryunit':'master-deliveryunit'})
print(workbank_trainning_demand['workgroupset'].nunique())

## Merge the workbank data with the training recommendations
backlog_p_c_comp_ts_br = pd.merge(backlog_p_c_comp_ts,workbank_trainning_demand,how='left',
                                                    left_on = ['masterdeliveryunit','workgroupset','competencealias'],
                                                    right_on = ['masterdeliveryunit','workgroupset','competencealias'])
backlog_p_c_comp_ts_br = backlog_p_c_comp_ts_br.loc[~backlog_p_c_comp_ts_br['recommendednumberofpeopletotrain'].isna()]

## Number of unique workgroupsets in pre and post intervention
print(backlog_p_c_comp_ts_br['workgroupset'].nunique())


117
115


## 7. Workbank KPIs

#### 7.A. Work group Set and Competence level

In [17]:
# Drop rows where the team size is NaN
backlog_p_c_comp_ts_br = backlog_p_c_comp_ts_br.dropna(subset=['teamsize']).reset_index(drop=True)

# A job counts as delivered when unitscompleted is greater than or equal to unitsrequired, otherwise it is outstanding; jobs with nothing completed are then relabelled as planned
backlog_p_c_comp_ts_br['Completed?'] = np.where(backlog_p_c_comp_ts_br['attributedunitscompleted']>=backlog_p_c_comp_ts_br['attributedunitsrequired'],'Delivered jobs','Outstanding jobs')
backlog_p_c_comp_ts_br['Completed?'] = np.where(backlog_p_c_comp_ts_br['attributedunitscompleted']==0,'Planned jobs',backlog_p_c_comp_ts_br['Completed?'])
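
The two chained `np.where` calls can also be expressed as a single `np.select`, where the first matching condition wins. This is a sketch on a hypothetical `demo` frame, intended to produce the same three labels.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'attributedunitsrequired': [10.0, 10.0, 10.0, 0.0],
    'attributedunitscompleted': [12.0, 4.0, 0.0, 0.0],
})

conditions = [
    demo['attributedunitscompleted'] == 0,  # nothing completed yet (checked first)
    demo['attributedunitscompleted'] >= demo['attributedunitsrequired'],
]
labels = ['Planned jobs', 'Delivered jobs']

# Rows matching neither condition fall through to the default label
demo['Completed?'] = np.select(conditions, labels, default='Outstanding jobs')
print(demo)
```

Putting the zero-completed check first reproduces the override in the second `np.where`, including the case where both required and completed units are zero.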

In [18]:
completed_jobs_1 = backlog_p_c_comp_ts_br.groupby(['routelookup','masterdeliveryunit','workgroupset','Completed?','competencealias'])['standardjobnumber&desc'].count().reset_index()
completed_jobs_2 = backlog_p_c_comp_ts_br.groupby(['routelookup','masterdeliveryunit','workgroupset','Completed?','competencealias'])['teamsize'].max().reset_index()
completed_jobs_3 = backlog_p_c_comp_ts_br.groupby(['routelookup','masterdeliveryunit','workgroupset','Completed?','competencealias'])['attributedbackloghoursrequired'].sum().reset_index()
completed_jobs = pd.merge(completed_jobs_1,completed_jobs_2,how='left',on=['routelookup','masterdeliveryunit','workgroupset','Completed?','competencealias'])
completed_jobs = pd.merge(completed_jobs,completed_jobs_3,how='left',on=['routelookup','masterdeliveryunit','workgroupset','Completed?','competencealias'])

completed_jobs['completed job/Team Size'] = completed_jobs['standardjobnumber&desc']/completed_jobs['teamsize']
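# NOTE: this ratio is also divided by team size, whereas the "Summary of KPIs" cell computes 1.3 as delivered jobs / len(backlog_p_c_comp_ts_br)
completed_jobs['completed job/total jobs'] = completed_jobs['standardjobnumber&desc']/completed_jobs['teamsize']/len(backlog_p_c_comp_ts_br)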


# rename columns
completed_jobs.rename(columns={'workgroupset':'Work Group Set',
                                'routelookup':'Route',
                                'masterdeliveryunit':'Master Delivery Unit',
                                'competencealias':'Competence Alias',
                                'teamsize':'Team Size',
                                'standardjobnumber&desc':'1.1 Number of jobs associated to retrained competences delivered',
                                'attributedbackloghoursrequired':'2.1 Workbank backlog associated to retrained competences',
                                'completed job/Team Size':'1.2 Number of jobs delivered on per team member basis',
                                'completed job/total jobs':'1.3 Successfully completed jobs as a proportion of the total required jobs'},inplace = True)
completed_jobs.head(3)


Unnamed: 0,Route,Master Delivery Unit,Work Group Set,Completed?,Competence Alias,1.1 Number of jobs associated to retrained competences delivered,Team Size,2.1 Workbank backlog associated to retrained competences,1.2 Number of jobs delivered on per team member basis,1.3 Successfully completed jobs as a proportion of the total required jobs
0,Anglia,Ipswich,Colchester SM(TRACK),Delivered jobs,Tr 01,1,34,31.7,0.029,0.0
1,Anglia,Ipswich,Colchester SM(TRACK),Delivered jobs,Tr 01.01,18,34,966.5,0.529,0.0
2,Anglia,Ipswich,Colchester SM(TRACK),Delivered jobs,Tr 07.01,1,34,8.6,0.029,0.0
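
The three separate group-bys followed by two merges could be collapsed into one `groupby(...).agg(...)` call with named aggregations. The sketch below uses a hypothetical `demo` frame with fewer grouping keys, and the output column names (`jobs`, `backloghours`, `jobs_per_member`) are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    'workgroupset': ['A', 'A', 'A', 'B'],
    'competencealias': ['Tr 01', 'Tr 01', 'Tr 07', 'Sig 10'],
    'standardjobnumber&desc': ['J1', 'J2', 'J3', 'J4'],
    'teamsize': [34, 34, 34, 27],
    'attributedbackloghoursrequired': [1.5, 2.0, 8.6, 0.4],
})

# One pass: job count, team size and backlog hours per group
summary = (demo.groupby(['workgroupset', 'competencealias'], as_index=False)
               .agg(jobs=('standardjobnumber&desc', 'count'),
                    teamsize=('teamsize', 'max'),
                    backloghours=('attributedbackloghoursrequired', 'sum')))

# Jobs per team member for each group
summary['jobs_per_member'] = summary['jobs'] / summary['teamsize']
print(summary)
```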


In [19]:
# The format needs to change to match what Kabita requires for creating the pipelines in SQL 
completed_jobs_SQL = pd.melt(completed_jobs, id_vars=['Route','Master Delivery Unit', 'Work Group Set','Competence Alias','Team Size','Completed?'], var_name='KPIs', value_name='KPI value')
completed_jobs_SQL

Unnamed: 0,Route,Master Delivery Unit,Work Group Set,Competence Alias,Team Size,Completed?,KPIs,KPI value
0,Anglia,Ipswich,Colchester SM(TRACK),Tr 01,34,Delivered jobs,1.1 Number of jobs associated to retrained com...,1.000
1,Anglia,Ipswich,Colchester SM(TRACK),Tr 01.01,34,Delivered jobs,1.1 Number of jobs associated to retrained com...,18.000
2,Anglia,Ipswich,Colchester SM(TRACK),Tr 07.01,34,Delivered jobs,1.1 Number of jobs associated to retrained com...,1.000
3,Anglia,Ipswich,Ipswich SM(SIGNALS),Sig 10,27,Delivered jobs,1.1 Number of jobs associated to retrained com...,1.000
4,Anglia,Ipswich,Ipswich SM(TRACK),Tr 07,34,Delivered jobs,1.1 Number of jobs associated to retrained com...,2.000
...,...,...,...,...,...,...,...,...
1179,Western,Reading (East),West Ealing ENG(SIGNALS),Sig 13,8,Outstanding jobs,1.3 Successfully completed jobs as a proportio...,0.000
1180,Western,Reading (East),West Ealing SM(SIGNALS),Sig 10,29,Delivered jobs,1.3 Successfully completed jobs as a proportio...,0.000
1181,Western,Reading (East),West Ealing SM(TRACK),Tr 01,25,Delivered jobs,1.3 Successfully completed jobs as a proportio...,0.000
1182,Western,Reading (East),West Ealing SM(TRACK),Tr 01.01,25,Delivered jobs,1.3 Successfully completed jobs as a proportio...,0.000
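
For reference, `pd.melt` turns each KPI column into its own set of rows, so the long table has one row per original row and KPI. A minimal sketch on a hypothetical `wide` frame with two KPI columns:

```python
import pandas as pd

wide = pd.DataFrame({
    'Work Group Set': ['A', 'B'],
    'Team Size': [34, 27],
    'KPI one': [1.0, 2.0],
    'KPI two': [0.5, 0.25],
})

# Identifier columns are repeated; every other column becomes a (KPIs, KPI value) pair
long = pd.melt(wide, id_vars=['Work Group Set', 'Team Size'],
               var_name='KPIs', value_name='KPI value')
print(long)
```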


In [20]:
# save output as csv and upload to azure blob storage containers
save_outputs_azure_blobs(completed_jobs_SQL,output_file_name,blob_service_client_instance_devp)
save_outputs_azure_blobs(completed_jobs_SQL,output_file_name,blob_service_client_instance_test)
save_outputs_azure_blobs(completed_jobs_SQL,output_file_name,blob_service_client_instance_prod)
save_outputs_azure_blobs(completed_jobs_SQL,output_file_name,blob_service_client_instance_staging)

#### 7.B. Summary of KPIs

In [21]:
# Number of completed jobs
tmp_1 = completed_jobs.groupby(['Completed?'])['1.1 Number of jobs associated to retrained competences delivered'].sum()

# Number of completed jobs on a per team member basis
tmp_2 = completed_jobs.groupby(['Completed?'])['1.2 Number of jobs delivered on per team member basis'].sum()

data={'KPIs':['No. of WGS in Workbank recommendations',
              'No. of WGS found in Workbank data',
              '1.1 Number of jobs associated to retrained competences delivered',
              '1.2 Number of jobs delivered on per team member basis ',
              '1.3 Successfully completed jobs as a proportion of the total required jobs ',
              '2.1 Workbank backlog associated to retrained competences'],
      column_name2:[workbank_trainning_demand['workgroupset'].nunique(),
                          backlog_p_c_comp_ts_br['workgroupset'].nunique(),
                         tmp_1['Delivered jobs'],
                         tmp_2['Delivered jobs'],
                         (tmp_1['Delivered jobs']/len(backlog_p_c_comp_ts_br))*100,
                         backlog_p_c_comp_ts_br['attributedbackloghoursrequired'].sum()]}

workbank_KPIs=pd.DataFrame(data)
workbank_KPIs

Unnamed: 0,KPIs,Pre-intervention (2021-2022 P04)
0,No. of WGS in Workbank recommendations,117.0
1,No. of WGS found in Workbank data,115.0
2,1.1 Number of jobs associated to retrained com...,970.0
3,1.2 Number of jobs delivered on per team membe...,36.959
4,1.3 Successfully completed jobs as a proportio...,88.665
5,2.1 Workbank backlog associated to retrained c...,99793.7


In [22]:
# Write the output to csv
workbank_KPIs.to_csv('BT_Workbank_KPIs_TD_PrePost_Summary.csv', index = False)

#### 7.C. Data info

In [23]:
data={'':['Backlog',
              'Planned vs completed',
              'Competences',
              'Team size',
              'Training delivered and recommendations',
              'Backlog + Planned vs completed',
              'Backlog + Planned vs completed + competences',
              'Backlog + Planned vs completed + competences + teamsize',
              'Backlog + Planned vs completed + competences + teamsize + training delivered'],
              'WGS':[backlog['workgroupset'].nunique(),
              plan_completed['workgroupset'].nunique(),
              '-',
              comp_teamsize['workgroupset'].nunique(),
              workbank_trainning_demand['workgroupset'].nunique(),
              backlog_p_c['workgroupset'].nunique(),
              backlog_p_c_comp['workgroupset'].nunique(),
              backlog_p_c_comp_ts['workgroupset'].nunique(),
              backlog_p_c_comp_ts_br['workgroupset'].nunique()],
              # Lengths are listed in the same order as the row labels above
              'length':[len(backlog),
              len(plan_completed),
              len(competences_mapping),
              len(comp_teamsize),
              len(workbank_trainning_demand),
              len(backlog_p_c),
              len(backlog_p_c_comp),
              len(backlog_p_c_comp_ts),
              len(backlog_p_c_comp_ts_br)]}

workbank_data_info=pd.DataFrame(data)
workbank_data_info

Unnamed: 0,Unnamed: 1,WGS,length
0,Backlog,814,23817
1,Planned vs completed,917,177150
2,Competences,-,2084
3,Team size,1045,1045
4,Training delivered and recommendations,117,270
5,Backlog + Planned vs completed,814,23817
6,Backlog + Planned vs completed + competences,643,11092
7,Backlog + Planned vs completed + competences +...,521,10192
8,Backlog + Planned vs completed + competences +...,115,1094
