# Analysis of Process Mining Benchmarking Data
This notebook analyzes prominent publicly available datasets for process mining, including the BPI Challenge datasets and other widely used datasets from several industries. For each dataset, it examines key elements of the XES standard: lifecycle information (whether it is included and to what extent), activity names, and any variables specified at the case, organization, lifecycle, and event levels. The results show significant heterogeneity even among datasets that are treated as standard benchmarks for process mining tasks. Each dataset is summarized in a consistent manner with at-a-glance quantitative descriptors, including the number of cases, events, unique activities, and unique lifecycle transitions.

# Imports & Config

In [1]:
from lxml import etree as ET
import pm4py
import numpy as np
import pandas as pd
import os
from tqdm import tqdm

In [2]:
data_path = '../data/BPIC/'

# Helper Functions

In [3]:
def load_df_from_log(log_path):
  """
  Return a dataframe from a given XES or CSV log filepath
  """
  path = log_path.lower()
  if path.endswith(('.xes', '.xes.gz')):
    log = pm4py.read_xes(log_path)
    df = pm4py.convert_to_dataframe(log)
  elif path.endswith('.csv'):
    # handle alternate separators: retry with ';' if ',' yields a single column
    df = pd.read_csv(log_path)
    if df.shape[1] == 1:
      df = pd.read_csv(log_path, sep=';')
  else:
    # fail loudly instead of reaching `return df` with df undefined
    raise ValueError(f'Unsupported log format: {log_path}')

  return df
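
The CSV fallback above only covers comma and semicolon. As a minimal sketch of an alternative, pandas can detect the delimiter itself when `sep=None` is combined with `engine='python'` (the slower Python parser). The helper name `read_csv_any_sep` and the in-memory sample are illustrative assumptions, not part of this notebook.

```python
import io
import pandas as pd

def read_csv_any_sep(source):
    """Hypothetical helper: let pandas sniff the delimiter (python engine only)."""
    return pd.read_csv(source, sep=None, engine='python')

# Illustrative in-memory data standing in for a semicolon-separated log file
sample = io.StringIO(
    "case_id;timestamp;activity\n"
    "IM0000001;01-01-2013 10:00:00;Open\n"
    "IM0000001;01-01-2013 10:05:00;Update\n"
)
sniffed_df = read_csv_any_sep(sample)
print(sniffed_df.shape)
print(list(sniffed_df.columns))
```

The explicit two-step fallback used in `load_df_from_log` is faster on large files, so sniffing is mainly useful when the set of possible separators is not known in advance.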

In [4]:
def get_columns_with_prefix(df, prefix):
    """
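    Get the names of the columns in the DataFrame that start with the specified
    prefix, with the leading prefix segment removed.

    Parameters:
    - df: pandas DataFrame
    - prefix: str, the prefix to search for in column names (e.g. 'case:')

    Returns:
    - List of column names with the prefix stripped (e.g. 'case:AMOUNT_REQ' -> 'AMOUNT_REQ')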
    """
    return [col.split(':', 1)[1] if len(col.split(':', 1)) > 1 else "" for col in df.columns if col.startswith(prefix)]


def get_prefix_columns(df):
    """
    Get a dictionary where keys are prefixes and values are lists of columns with those prefixes.

    Parameters:
    - df: pandas DataFrame

    Returns:
    - Dictionary with prefixes as keys and lists of columns with those prefixes as values
    """
    result_dict = {}
    prefixes = ['case:', 'time:', 'org:', 'concept:', 'lifecycle:']

    for prefix in prefixes:
        columns_with_prefix = get_columns_with_prefix(df, prefix)
        if columns_with_prefix:
            result_dict[prefix] = columns_with_prefix
    return result_dict
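
To make the behaviour of the two helpers concrete, here is a self-contained sketch that applies the same prefix grouping to a toy frame. The column names follow the style pm4py produces; the values and the variable names `toy_df` and `grouped` are illustrative only.

```python
import pandas as pd

# Toy frame with XES-style column names (values are illustrative)
toy_df = pd.DataFrame({
    'case:concept:name': ['c1', 'c1'],
    'case:AMOUNT_REQ': [27500, 27500],
    'concept:name': ['A_SUBMITTED', 'A_PARTLYSUBMITTED'],
    'time:timestamp': pd.to_datetime(['2012-01-02 16:22:49', '2012-01-02 16:22:50']),
    'org:resource': ['112', '112'],
    'impact': ['Low', 'Low'],  # no XES prefix, so it is not reported
})

prefixes = ['case:', 'time:', 'org:', 'concept:', 'lifecycle:']
grouped = {
    p: [c.split(':', 1)[1] for c in toy_df.columns if c.startswith(p)]
    for p in prefixes
}
# Drop prefixes with no matching columns, as get_prefix_columns does
grouped = {p: cols for p, cols in grouped.items() if cols}
print(grouped)
```

Note that `case:concept:name` is reported under `case:` only, and that columns without one of the five prefixes (here `impact`) are silently omitted from the summary.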

In [5]:
def create_summary_row(df, name):
  """
  Creates a one-row summary for an event log dataframe: XES variable names, lifecycle transitions, case count, event count, trace lengths, and activity names.
  Parameters:
  - df : pandas DataFrame containing XES event log formatted data
  - name : name for the given dataset
  Returns:
  - single-row pandas DataFrame; counts are -1 when the required column is missing
  """
  # summary columns of the returned row (kept for reference; not used below)
  cols = ['name',
          'case vars',
          'time vars',
          'org vars',
          'event vars',
          'lifecycle vars',
          'lifecycle transitions',
          'unique lifecycle transition count',
          'case count',
          'trace lengths',
          'event count',
          'activity names',
          'unique activity count']
  # init row to return
  row = {}
  # give df a name
  row['name'] = name
  # get all the XES variable column names and populate columns in return row
  prefix_dict = get_prefix_columns(df)
  row['case vars'] = prefix_dict.get('case:', [])
  row['time vars'] = prefix_dict.get('time:', [])
  row['org vars'] = prefix_dict.get('org:', [])
  row['event vars'] = prefix_dict.get('concept:', [])
  row['lifecycle vars'] = prefix_dict.get('lifecycle:', [])
  # get lifecycle transition info if it exists
  if ('lifecycle:transition' in df.columns):
    row['lifecycle transitions'] = list(df['lifecycle:transition'].unique())
    row['unique lifecycle transition count'] = len(row['lifecycle transitions'])
  else:
    row['lifecycle transitions'] = []
    row['unique lifecycle transition count'] = -1
  # get case count and trace lengths
  if ('case:concept:name' in df.columns):
    row['case count'] = df['case:concept:name'].nunique()
    # number of events per case, counted directly rather than via non-null cells
    row['trace lengths'] = df.groupby('case:concept:name').size().tolist()
  else:
    row['case count'] = -1
    row['trace lengths'] = []
  # get event count
  row['event count'] = len(df)
  # get unique activity count
  if ('concept:name' in df.columns):
    row['activity names'] = df['concept:name'].unique().tolist()
    row['unique activity count'] = len(row['activity names'])
  else:
    row['activity names'] = []
    row['unique activity count'] = -1

  # convert to a dataframe row
  s = pd.Series(row)

  return pd.DataFrame([s.tolist()], columns=s.index)


In [6]:
def sample_activity_names(df, size=5):
  '''
  Prints a sample of the unique activity names present in df and returns all unique names
  '''
  if ('concept:name' in df.columns):

    activity_names = df['concept:name'].unique()

    if len(activity_names) > size:
      # sample without replacement so no name is printed twice
      sample = np.random.choice(activity_names, size, replace=False)
    else:
      sample = activity_names

    for s in sample:
      print(s)

    return activity_names
  else:
    print('Non-standard activity column')
    return []

In [7]:
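def get_random_trace(df):
  """
  Return all events of one randomly chosen trace, or False if df has no 'case:concept:name' column
  """
  # pick a random event row and take its case ID (longer traces are more likely to be chosen)
  if 'case:concept:name' in df.columns:
    trace = df['case:concept:name'].sample().iloc[0]
  else:
    return False
  # keep only the events belonging to that trace
  trace_rows = df[df['case:concept:name'] == trace]
  return trace_rows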
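
Because `get_random_trace` samples an event row rather than a case ID, a case's chance of being selected is proportional to its number of events. If uniform selection over cases is preferred, a minimal sketch could look like the following. The function name `get_random_trace_uniform`, its `seed` parameter, and the toy log are assumptions made for illustration.

```python
import pandas as pd

def get_random_trace_uniform(df, case_col='case:concept:name', seed=None):
    """Hypothetical variant: pick a case ID uniformly, then return its events."""
    if case_col not in df.columns:
        return None
    # sample from the distinct case IDs so every case is equally likely
    case_ids = pd.Series(df[case_col].unique())
    chosen = case_ids.sample(n=1, random_state=seed).iloc[0]
    return df[df[case_col] == chosen]

# Illustrative log: case 'a' has four events, case 'b' has one
toy_log = pd.DataFrame({
    'case:concept:name': ['a', 'a', 'a', 'a', 'b'],
    'concept:name': ['w', 'x', 'y', 'z', 'w'],
})
trace = get_random_trace_uniform(toy_log, seed=0)
print(trace)
```

For a quick visual check of a log, either sampling scheme is adequate; the distinction matters mainly when sampled traces are used to estimate properties of the log.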

# Data Exploration
For each dataset, a one-row summary is created with details on the column kinds common to event logs. Additionally, a random trace is printed to give a sense of what the log looks like, and a sample of the activity names is printed to give a sense of the verbiage used to describe event kinds.

## BPIC 2011

### Loading

In [8]:
bpi_2011_df = load_df_from_log(data_path + 'BPI_2011_Hospital_log.xes.gz')

parsing log, completed traces :: 100%|██████████| 1143/1143 [00:10<00:00, 108.94it/s]


### EDA

In [9]:
row_2011 = create_summary_row(bpi_2011_df, 'BPIC 2011 hospital data')
row_2011

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2011 hospital data,"[End date, Age, Treatment code:2, Treatment co...",[timestamp],[group],[name],[transition],[complete],1,1143,"[75, 239, 25, 22, 185, 59, 36, 365, 189, 8, 26...",150291,"[1e consult poliklinisch, administratief tarie...",624


In [10]:
get_random_trace(bpi_2011_df)

Unnamed: 0,org:group,Number of executions,Specialism code,concept:name,Producer code,Section,Activity code,time:timestamp,lifecycle:transition,case:End date,...,case:Treatment code:14,case:Treatment code:15,case:Diagnosis:15,case:Diagnosis:14,case:Diagnosis:11,case:Diagnosis:13,case:Diagnosis:12,case:Diagnosis code:14,case:Diagnosis code:13,case:Diagnosis code:15
142672,General Lab Clinical Chemistry,1,20,e.c.g. - elektrocardiografie,PLAB,Section 4,330001B,2007-11-14 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142673,General Lab Clinical Chemistry,1,86,aanname laboratoriumonderzoek,CRPO,Section 4,370000,2007-11-14 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142674,General Lab Clinical Chemistry,1,86,aanname laboratoriumonderzoek,CRLA,Section 4,370000,2007-11-14 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142675,General Lab Clinical Chemistry,1,86,aanname laboratoriumonderzoek,CRLA,Section 4,370000,2007-11-14 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142676,General Lab Clinical Chemistry,1,86,bilirubine -geconjugeerd,CHE2,Section 4,370401,2007-11-14 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
142995,General Lab Clinical Chemistry,1,86,leukocyten tellen elektronisch,HAEM,Section 4,370712B,2008-03-10 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142996,General Lab Clinical Chemistry,1,86,trombocyten tellen - elektronisch,HAEM,Section 4,370715A,2008-03-10 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142997,General Lab Clinical Chemistry,1,86,ca-125 mbv meia,CHE2,Section 4,378619A,2008-03-10 00:00:00+00:00,complete,NaT,...,,,,,,,,,,
142998,General Lab Clinical Chemistry,1,86,ordertarief,CRLA,Section 4,379999,2008-03-10 00:00:00+00:00,complete,NaT,...,,,,,,,,,,


## BPIC 2012

### Loading

In [11]:
bpi_2012_df = load_df_from_log(data_path + 'BPI_Challenge_2012.xes.gz')

parsing log, completed traces :: 100%|██████████| 13087/13087 [00:09<00:00, 1409.64it/s]


### EDA

In [12]:
row_2012 = create_summary_row(bpi_2012_df, 'BPIC 2012 loan data')
row_2012

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2012 loan data,"[REG_DATE, concept:name, AMOUNT_REQ]",[timestamp],[resource],[name],[transition],"[COMPLETE, SCHEDULE, START]",3,13087,"[26, 39, 59, 3, 3, 9, 14, 12, 14, 24, 77, 35, ...",262200,"[A_SUBMITTED, A_PARTLYSUBMITTED, A_PREACCEPTED...",24


In [13]:
get_random_trace(bpi_2012_df)

Unnamed: 0,org:resource,lifecycle:transition,concept:name,time:timestamp,case:REG_DATE,case:concept:name,case:AMOUNT_REQ
152560,112,COMPLETE,A_SUBMITTED,2012-01-02 16:22:49.990000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152561,112,COMPLETE,A_PARTLYSUBMITTED,2012-01-02 16:22:50.095000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152562,112,SCHEDULE,W_Afhandelen leads,2012-01-02 16:23:21.791000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152563,10910,START,W_Afhandelen leads,2012-01-02 16:28:06.094000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152564,10910,COMPLETE,A_PREACCEPTED,2012-01-02 16:29:14.543000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152565,10910,SCHEDULE,W_Completeren aanvraag,2012-01-02 16:29:14.727000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152566,10910,COMPLETE,W_Afhandelen leads,2012-01-02 16:29:16.739000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152567,10863,START,W_Completeren aanvraag,2012-01-02 18:28:46.612000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152568,10863,COMPLETE,A_ACCEPTED,2012-01-02 18:36:26.924000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
152569,10863,COMPLETE,O_SELECTED,2012-01-02 18:40:27.112000+00:00,2012-01-02 16:22:49.989000+00:00,197240,27500
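
Because this log records `SCHEDULE`, `START`, and `COMPLETE` transitions, waiting and working time can be derived for the `W_` work items. The sketch below does this for `W_Afhandelen leads`, using the three rows copied from the trace shown above. It assumes each activity occurs at most once per transition within the trace, which does not hold in general for this log; repeated work items would need to be paired up first.

```python
import pandas as pd

# Rows copied from the trace shown above (case 197240), one work item only
events = pd.DataFrame({
    'concept:name': ['W_Afhandelen leads'] * 3,
    'lifecycle:transition': ['SCHEDULE', 'START', 'COMPLETE'],
    'time:timestamp': pd.to_datetime([
        '2012-01-02 16:23:21.791+00:00',
        '2012-01-02 16:28:06.094+00:00',
        '2012-01-02 16:29:16.739+00:00',
    ]),
})

# One column per transition; pivot raises if an (activity, transition) pair repeats
by_transition = events.pivot(index='concept:name',
                             columns='lifecycle:transition',
                             values='time:timestamp')
waiting = by_transition['START'] - by_transition['SCHEDULE']
working = by_transition['COMPLETE'] - by_transition['START']
print(waiting)
print(working)
```

Logs that record only `complete` events, such as BPIC 2011, do not support this kind of calculation, which is one practical consequence of the lifecycle heterogeneity examined here.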


## BPIC 2013 Data

### Loading Data

In [14]:
# obtain log and df for incidents
df_2013_inc = load_df_from_log(data_path+'BPI_Challenge_2013_incidents.xes.gz')
# obtain log and df for open issues
df_2013_open = load_df_from_log(data_path+'BPI_Challenge_2013_open_problems.xes.gz')
# obtain log and df for closed issues
df_2013_closed = load_df_from_log(data_path+'BPI_Challenge_2013_closed_problems.xes.gz')

parsing log, completed traces :: 100%|██████████| 7554/7554 [00:04<00:00, 1520.19it/s]
parsing log, completed traces :: 100%|██████████| 819/819 [00:00<00:00, 4729.75it/s]
parsing log, completed traces :: 100%|██████████| 1487/1487 [00:00<00:00, 2618.65it/s]


### EDA

In [15]:
row_2013_0 = create_summary_row(df_2013_inc, 'BPIC 2013 incidents')
row_2013_0

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2013 incidents,[concept:name],[timestamp],"[group, resource, role]",[name],[transition],"[In Progress, Awaiting Assignment, Resolved, A...",13,7554,"[17, 40, 17, 19, 62, 32, 21, 14, 8, 19, 52, 31...",65533,"[Accepted, Queued, Completed, Unmatched]",4


In [16]:
row_2013_1 = create_summary_row(df_2013_open, 'BPIC 2013 open issues')
row_2013_1

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2013 open issues,[concept:name],[timestamp],"[group, resource, role]",[name],[transition],"[In Progress, Wait, Awaiting Assignment, Assig...",5,819,"[4, 3, 3, 7, 5, 7, 3, 5, 6, 2, 2, 2, 2, 8, 4, ...",2351,"[Accepted, Queued, Completed]",3


In [17]:
row_2013_2 = create_summary_row(df_2013_closed, 'BPIC 2013 closed issues')
row_2013_2

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2013 closed issues,[concept:name],[timestamp],"[group, resource, role]",[name],[transition],"[Awaiting Assignment, In Progress, Assigned, C...",7,1487,"[5, 6, 5, 5, 7, 7, 7, 8, 5, 3, 9, 3, 10, 3, 2,...",6660,"[Queued, Accepted, Completed, Unmatched]",4
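
The `lifecycle:transition` values in the 2013 logs are status labels rather than transitions of the standard XES lifecycle model. A small check along the following lines can flag such values. The set below lists the standard transitions as we understand them from the XES lifecycle extension, and the helper name `nonstandard_transitions` is hypothetical; the input values are taken from the summary outputs above.

```python
# Transitions of the standard XES lifecycle model (our reading of the lifecycle extension)
XES_STANDARD_TRANSITIONS = {
    'schedule', 'assign', 'withdraw', 'reassign', 'start', 'suspend', 'resume',
    'pi_abort', 'ate_abort', 'complete', 'autoskip', 'manualskip', 'unknown',
}

def nonstandard_transitions(values):
    """Hypothetical helper: return values outside the standard set, ignoring case."""
    return sorted(v for v in set(values)
                  if str(v).lower() not in XES_STANDARD_TRANSITIONS)

# Values visible in the summary outputs above
bpic_2012 = ['COMPLETE', 'SCHEDULE', 'START']
bpic_2013_sample = ['In Progress', 'Awaiting Assignment', 'Resolved', 'Wait', 'Assigned']

flagged_2012 = nonstandard_transitions(bpic_2012)
flagged_2013 = nonstandard_transitions(bpic_2013_sample)
print(flagged_2012)
print(flagged_2013)
```

The comparison is case-insensitive because the logs differ in capitalization (`COMPLETE` in BPIC 2012, `complete` in BPIC 2011).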


### Exploring Traces

In [18]:
get_random_trace(df_2013_inc)

Unnamed: 0,org:group,resource country,organization country,org:resource,organization involved,org:role,concept:name,impact,product,lifecycle:transition,time:timestamp,case:concept:name
5260,N16 2nd,USA,us,Michael,Org line A2,A2_3,Accepted,Low,PROD485,In Progress,2012-03-27 19:57:22+00:00,1-703547921
5261,N16 2nd,USA,us,Michael,Org line A2,A2_3,Accepted,Low,PROD485,In Progress,2012-03-27 19:57:40+00:00,1-703547921
5262,N16 2nd,USA,us,Michael,Org line A2,A2_3,Accepted,Low,PROD485,Wait - User,2012-04-09 20:04:51+00:00,1-703547921
5263,N16 2nd,USA,us,Michael,Org line A2,A2_3,Accepted,Low,PROD485,In Progress,2012-05-01 22:22:36+00:00,1-703547921
5264,N16 2nd,USA,us,Michael,Org line A2,A2_3,Completed,Low,PROD485,Resolved,2012-05-04 20:21:56+00:00,1-703547921
5265,N16 2nd,0,us,Siebel,Org line A2,A2_3,Completed,Low,PROD485,Closed,2012-05-05 01:17:51+00:00,1-703547921


In [19]:
get_random_trace(df_2013_inc)

Unnamed: 0,org:group,resource country,organization country,org:resource,organization involved,org:role,concept:name,impact,product,lifecycle:transition,time:timestamp,case:concept:name
13311,G96,POLAND,fr,Malgorzata,Org line C,V3_2,Accepted,Low,PROD267,In Progress,2012-04-19 14:08:53+00:00,1-726829050
13312,G96,POLAND,fr,Malgorzata,Org line C,V3_2,Accepted,Low,PROD267,In Progress,2012-04-19 14:09:11+00:00,1-726829050
13313,G96,POLAND,fr,Malgorzata,Org line C,V3_2,Accepted,Low,PROD267,In Progress,2012-04-19 14:34:56+00:00,1-726829050
13314,V41 2nd,POLAND,fr,Malgorzata,Org line C,V3_3,Queued,Low,PROD267,Awaiting Assignment,2012-04-19 14:54:31+00:00,1-726829050
13315,V41 2nd,France,fr,Louis,Org line C,V3_3,Accepted,Low,PROD267,In Progress,2012-04-19 15:24:10+00:00,1-726829050
13316,V41 2nd,France,fr,Louis,Org line C,V3_3,Accepted,Low,PROD267,Wait - User,2012-04-19 15:24:40+00:00,1-726829050
13317,G97,France,fr,Louis,Org line C,V3_2,Queued,Low,PROD267,Awaiting Assignment,2012-04-19 17:16:19+00:00,1-726829050
13318,G97,POLAND,fr,Nina,Org line C,V3_2,Accepted,Low,PROD267,In Progress,2012-04-20 16:28:47+00:00,1-726829050
13319,G97,POLAND,fr,Nina,Org line C,V3_2,Accepted,Low,PROD267,Assigned,2012-04-20 16:28:54+00:00,1-726829050
13320,G97,POLAND,fr,Malgorzata,Org line C,V3_2,Accepted,Low,PROD267,In Progress,2012-04-23 10:42:46+00:00,1-726829050


In [20]:
get_random_trace(df_2013_closed)

Unnamed: 0,org:group,resource country,organization country,org:resource,organization involved,org:role,concept:name,impact,product,lifecycle:transition,time:timestamp,case:concept:name
1913,Org line G3,USA,us,Carolyn,G199 3rd,,Accepted,Medium,PROD98,In Progress,2011-05-03 17:14:07+00:00,1-522734472
1914,Org line G3,USA,us,Carolyn,G199 3rd,,Queued,Medium,PROD98,Awaiting Assignment,2011-05-03 17:16:39+00:00,1-522734472
1915,Org line G3,Sweden,us,Padmanabha,G199 3rd,,Accepted,Medium,PROD98,In Progress,2011-05-04 06:46:05+00:00,1-522734472
1916,Org line G3,USA,us,Carolyn,G199 3rd,,Accepted,Medium,PROD98,Assigned,2011-05-04 06:46:17+00:00,1-522734472
1917,Org line G3,POLAND,us,Ewa,G199 3rd,,Accepted,Medium,PROD98,In Progress,2012-02-06 21:35:55+00:00,1-522734472
1918,Org line G3,POLAND,us,Ewa,G199 3rd,,Completed,Medium,PROD98,Closed,2012-02-06 21:37:18+00:00,1-522734472


## BPIC 2014 Data

### Loading Data

In [21]:
df_2014_incident = load_df_from_log(data_path+'BPI_2014_Detail_Incident.csv')
df_2014_change = load_df_from_log(data_path+'BPI_2014_Detail_Change.csv')
df_2014_incident_activity = load_df_from_log(data_path+'BPI_2014_Detail_Incident_Activity.csv')
df_2014_interaction = load_df_from_log(data_path+'BPI_2014_Detail_Interaction.csv')



In [22]:
# rename the main event log columns to the standard XES names for ingestion
df_2014_incident_activity.rename(columns={
    'Incident ID': 'case:concept:name',
    'IncidentActivity_Type': 'concept:name',
    'DateStamp': 'time:timestamp',
    'Assignment Group': 'org:resource',
}, inplace=True)

### EDA

In [23]:
row_2014 = create_summary_row(df_2014_incident_activity, 'BPIC 2014 incident activity')
row_2014

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2014 incident activity,[concept:name],[timestamp],[resource],[name],[],[],-1,46616,"[10, 28, 2, 20, 6, 6, 8, 18, 6, 6, 14, 14, 14,...",466737,"[Reassignment, Update from customer, Operator ...",39


In [24]:
# look at the events of one randomly chosen incident
get_random_trace(df_2014_incident_activity)

Unnamed: 0,case:concept:name,time:timestamp,IncidentActivity_Number,concept:name,org:resource,KM number,Interaction ID
50233,IM0003706,08-10-2013 15:27:59,001A5645486,Update,TEAM0018,KM0001115,SD0008754
50234,IM0003706,08-10-2013 15:26:54,001A5645475,Open,TEAM0018,KM0001115,SD0008754
50235,IM0003706,09-10-2013 10:54:55,001A5650378,Assignment,TEAM0019,KM0001115,SD0008754
50236,IM0003706,09-10-2013 10:46:57,001A5649747,Reassignment,TEAM0018,KM0001115,SD0008754
50237,IM0003706,20-10-2013 21:02:38,001A5728508,Operator Update,TEAM0053,KM0001115,SD0008754
50238,IM0003706,18-10-2013 08:28:41,001A5714395,Update,TEAM0018,KM0001115,SD0008754
50239,IM0003706,11-10-2013 10:44:28,001A5666036,Reassignment,TEAM0035,KM0001115,SD0008754
50240,IM0003706,11-10-2013 11:22:05,001A5667278,Assignment,TEAM0053,KM0001115,SD0008754
50241,IM0003706,11-10-2013 10:44:28,001A5666037,Operator Update,TEAM0035,KM0001115,SD0008754
50242,IM0003706,23-10-2013 09:12:08,001A5761092,Operator Update,TEAM0035,KM0001115,SD0008754
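
In the trace above, `time:timestamp` is still a string and the events are not in chronological order (the `Update` at 15:27:59 precedes the `Open` at 15:26:54). A minimal sketch of parsing and ordering follows, using the first four rows of that trace. It assumes the strings are day-first (`DD-MM-YYYY`), which values such as `20-10-2013` suggest.

```python
import pandas as pd

# First four rows of the trace shown above
incident = pd.DataFrame({
    'case:concept:name': ['IM0003706'] * 4,
    'time:timestamp': ['08-10-2013 15:27:59', '08-10-2013 15:26:54',
                       '09-10-2013 10:54:55', '09-10-2013 10:46:57'],
    'concept:name': ['Update', 'Open', 'Assignment', 'Reassignment'],
})

# Assumption: day-first format (DD-MM-YYYY HH:MM:SS)
incident['time:timestamp'] = pd.to_datetime(incident['time:timestamp'],
                                            format='%d-%m-%Y %H:%M:%S')
# Order events chronologically within each case
ordered = incident.sort_values(['case:concept:name', 'time:timestamp'])
print(ordered['concept:name'].tolist())
```

Giving an explicit `format` avoids pandas guessing month-first for ambiguous dates such as `08-10-2013`.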


## BPIC 2015 Data


### Loading

In [25]:
df_2015_A = load_df_from_log(data_path+'BPIC15_1.xes')
df_2015_B = load_df_from_log(data_path+'BPIC15_2.xes')
df_2015_C = load_df_from_log(data_path+'BPIC15_3.xes')
df_2015_D = load_df_from_log(data_path+'BPIC15_4.xes')
df_2015_E = load_df_from_log(data_path+'BPIC15_5.xes')

parsing log, completed traces :: 100%|██████████| 1199/1199 [00:04<00:00, 285.20it/s]
parsing log, completed traces :: 100%|██████████| 832/832 [00:03<00:00, 239.24it/s]
parsing log, completed traces :: 100%|██████████| 1409/1409 [00:04<00:00, 298.35it/s]
parsing log, completed traces :: 100%|██████████| 1053/1053 [00:03<00:00, 290.10it/s]
parsing log, completed traces :: 100%|██████████| 1156/1156 [00:04<00:00, 238.63it/s]


### EDA

In [26]:
row_2015_A = create_summary_row(df_2015_A, 'BPIC 2015 municipality A')
row_2015_A

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2015 municipality A,"[endDate, caseStatus, SUMleges, last_phase, ca...",[timestamp],[resource],[name],[transition],[complete],1,1199,"[45, 57, 57, 58, 46, 56, 58, 47, 71, 55, 34, 5...",52217,"[01_HOOFD_010, 01_HOOFD_011, 01_HOOFD_020, 02_...",398


In [27]:
row_2015_B = create_summary_row(df_2015_B, 'BPIC 2015 municipality B')
row_2015_B

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2015 municipality B,"[Includes_subCases, concept:name, Responsible_...",[timestamp],[resource],[name],[transition],[complete],1,832,"[84, 79, 60, 40, 43, 46, 38, 51, 62, 44, 56, 6...",44354,"[01_HOOFD_010, 01_HOOFD_011, 01_HOOFD_020, 01_...",410


In [28]:
row_2015_C = create_summary_row(df_2015_C, 'BPIC 2015 municipality C')
row_2015_C

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2015 municipality C,"[Includes_subCases, concept:name, Responsible_...",[timestamp],[resource],[name],[transition],[complete],1,1409,"[36, 18, 39, 61, 38, 60, 35, 37, 48, 42, 47, 2...",59681,"[01_HOOFD_010, 01_HOOFD_030_2, 01_HOOFD_015, 0...",383


In [29]:
row_2015_D = create_summary_row(df_2015_D, 'BPIC 2015 municipality D')
row_2015_D

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2015 municipality D,"[concept:name, Responsible_actor, endDate, cas...",[timestamp],[resource],[name],[transition],[complete],1,1053,"[46, 13, 13, 116, 53, 41, 42, 43, 55, 41, 42, ...",47293,"[01_HOOFD_010, 04_BPT_005, 01_HOOFD_065_0, 01_...",356


In [30]:
row_2015_E = create_summary_row(df_2015_E, 'BPIC 2015 municipality E')
row_2015_E

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,BPIC 2015 municipality E,"[endDate, caseStatus, SUMleges, last_phase, ca...",[timestamp],[resource],[name],[transition],[complete],1,1156,"[51, 100, 12, 61, 59, 75, 59, 42, 50, 71, 9, 3...",59083,"[01_HOOFD_010, 01_HOOFD_011, 01_HOOFD_020, 03_...",389
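
The five municipality summaries are easier to compare side by side. The sketch below rebuilds a small comparison table from the counts shown in the outputs above and adds a derived mean trace length. In the notebook itself, the same table could presumably be assembled with `pd.concat` over the `row_2015_*` frames; the name `summary_2015` and the derived column are illustrative.

```python
import pandas as pd

# Counts copied from the five summary rows above
summary_2015 = pd.DataFrame({
    'name': ['municipality A', 'municipality B', 'municipality C',
             'municipality D', 'municipality E'],
    'case count': [1199, 832, 1409, 1053, 1156],
    'event count': [52217, 44354, 59681, 47293, 59083],
    'unique activity count': [398, 410, 383, 356, 389],
}).set_index('name')

# Derived descriptor: average number of events per case
summary_2015['mean events per case'] = (
    summary_2015['event count'] / summary_2015['case count']
).round(1)
print(summary_2015)
```

Even though all five logs describe the same permit process, the number of distinct activities and the average trace length differ noticeably between municipalities.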


### Exploring Traces

In [31]:
get_random_trace(df_2015_A)

Unnamed: 0,question,dateFinished,dueDate,action_code,activityNameEN,planned,time:timestamp,monitoringResource,org:resource,activityNameNL,...,case:parts,case:termName,case:endDatePlanned,case:startDate,case:requestComplete,case:IDofConceptCase,case:landRegisterID,case:caseProcedure,case:Includes_subCases,dateStop
17775,EMPTY,2011-12-12 12:19:35,2011-07-08 16:24:08+00:00,01_HOOFD_010,register submission date request,2011-07-06 16:24:08+00:00,2011-12-07 00:00:00+00:00,560890,560912,registratie datum binnenkomst aanvraag,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17776,EMPTY,2011-12-12 12:19:35,NaT,01_HOOFD_015,phase application received,NaT,2011-12-12 12:16:36+00:00,4901428,560912,fase aanvraag ontvangen,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17777,True,2011-12-12 12:19:35,NaT,01_HOOFD_020,send confirmation receipt,NaT,2011-12-12 12:16:40+00:00,560890,560912,versturen ontvangstbevestiging,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17778,EMPTY,2011-12-12 12:19:34,NaT,01_HOOFD_030_1,send confirmation receipt,NaT,2011-12-12 12:18:00+00:00,560890,560912,versturen ontvangstbevestiging,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17779,EMPTY,2011-12-12 12:19:34,2011-12-14 12:18:01+00:00,01_HOOFD_030_2,enter senddate acknowledgement,2011-12-13 12:18:01+00:00,2011-12-12 12:18:00+00:00,560890,560912,invoeren verzenddatum ontvangstbevestiging,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17780,False,2011-12-12 12:19:35,2011-12-17 12:18:42+00:00,02_DRZ_010,forward to the competent authority,2011-12-13 12:18:42+00:00,2011-12-12 12:18:47+00:00,560890,560912,doorsturen aan bevoegd gezag,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17781,True,2011-12-12 12:19:35,NaT,04_BPT_005,regular procedure without MER,2011-12-13 12:18:48+00:00,2011-12-12 12:18:49+00:00,560890,560912,reguliere procedure zonder MER,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17782,EMPTY,2011-12-12 12:19:34,NaT,01_HOOFD_065_1,send procedure confirmation,2011-12-13 12:18:49+00:00,2011-12-12 12:18:53+00:00,560890,560912,procedurebevestiging versturen,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17783,EMPTY,2011-12-12 12:19:34,NaT,01_HOOFD_065_2,enter senddate procedure confirmation,2011-12-13 12:18:53+00:00,2011-12-12 12:18:53+00:00,560890,560912,invoeren verzenddatum procedurebevestiging,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,
17784,True,2011-12-12 12:19:35,NaT,01_HOOFD_050,inform BAG administrator,NaT,2011-12-12 12:19:05+00:00,560890,560912,BAG beheerder informeren,...,Bouw,Termijn bezwaar en beroep 1,NaT,2011-12-07 00:00:00+00:00,False,,,,N,


In [32]:
get_random_trace(df_2015_B)

Unnamed: 0,monitoringResource,org:resource,activityNameNL,concept:name,question,dateFinished,action_code,activityNameEN,planned,lifecycle:transition,...,case:last_phase,case:case_type,case:startDate,case:requestComplete,case:SUMleges,case:IDofConceptCase,case:termName,case:landRegisterID,dueDate,dateStop
263,560530,560530,registratie datum binnenkomst aanvraag,01_HOOFD_010,EMPTY,2012-04-03 11:50:48,01_HOOFD_010,register submission date request,2012-04-04 10:42:05+00:00,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
264,560521,560530,OLO berichtenverkeer actief,01_HOOFD_011,True,2012-04-03 11:50:48,01_HOOFD_011,OLO messaging active,2012-04-04 11:20:10+00:00,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
265,560521,560530,aanvraag via OLO ingediend,01_HOOFD_012,True,2012-04-03 11:50:48,01_HOOFD_012,application submitted through OLO,2012-04-04 11:20:14+00:00,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
266,560521,560530,versturen ontvangstbevestiging,01_HOOFD_020,True,2012-04-03 11:50:48,01_HOOFD_020,send confirmation receipt,NaT,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
267,560521,560530,aanvrager is belanghebbende,03_GBH_005,True,2012-04-03 11:50:48,03_GBH_005,applicant is stakeholder,NaT,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
268,560521,560530,beeindigen op verzoek,05_EIND_010,False,2012-04-03 11:50:48,05_EIND_010,terminate on request,NaT,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
269,560521,560530,fase aanvraag ontvangen,01_HOOFD_015,EMPTY,2012-04-03 11:50:48,01_HOOFD_015,phase application received,NaT,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
270,560521,560530,invoeren verzenddatum ontvangstbevestiging,01_HOOFD_030_2,EMPTY,2012-04-03 11:50:47,01_HOOFD_030_2,enter senddate acknowledgement,2012-04-04 11:20:20+00:00,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
271,560521,560530,versturen ontvangstbevestiging,01_HOOFD_030_1,EMPTY,2012-04-03 11:50:47,01_HOOFD_030_1,send confirmation receipt,2012-04-04 11:20:17+00:00,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,
272,560521,560530,doorsturen aan bevoegd gezag,02_DRZ_010,False,2012-04-03 11:50:48,02_DRZ_010,forward to the competent authority,NaT,complete,...,Zaak afgehandeld,557669,2012-04-03 10:42:05+00:00,True,1814.5575,12606445,Termijn bezwaar en beroep 1,,NaT,


In [33]:
get_random_trace(df_2015_C)

Unnamed: 0,question,dateFinished,dueDate,action_code,activityNameEN,planned,time:timestamp,monitoringResource,org:resource,activityNameNL,...,case:startDate,case:requestComplete,case:endDate,case:parts,case:SUMleges,case:caseProcedure,case:IDofConceptCase,case:endDatePlanned,dateStop,case:landRegisterID
10431,EMPTY,2011-07-22 11:10:40,2011-06-17 08:39:03+00:00,01_HOOFD_010,register submission date request,2011-06-15 08:39:03+00:00,2011-06-07 00:00:00+00:00,560741,560741,registratie datum binnenkomst aanvraag,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10432,EMPTY,2011-07-22 11:10:40,2011-06-16 09:01:40+00:00,01_HOOFD_030_2,enter senddate acknowledgement,2011-06-15 09:01:40+00:00,2011-06-14 00:00:00+00:00,560696,560741,invoeren verzenddatum ontvangstbevestiging,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10433,EMPTY,2011-07-22 11:10:40,NaT,01_HOOFD_015,phase application received,NaT,2011-06-14 08:59:55+00:00,560696,560741,fase aanvraag ontvangen,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10434,True,2011-07-22 11:10:40,2011-06-17 08:59:55+00:00,01_HOOFD_020,reception through OLO,2011-06-15 08:59:55+00:00,2011-06-14 08:59:58+00:00,560696,560741,ontvangst via OLO,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10435,EMPTY,2011-07-22 11:10:40,2011-06-16 08:59:59+00:00,01_HOOFD_030_1,send confirmation receipt,2011-06-15 08:59:59+00:00,2011-06-14 09:01:40+00:00,560696,560741,versturen ontvangstbevestiging,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10436,EMPTY,2011-07-22 11:10:40,NaT,01_HOOFD_490_3,register date environmental permit decision,2011-07-23 11:00:52+00:00,2011-07-21 00:00:00+00:00,560696,560741,registreren datum besluit omgevingsvergunning,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10437,False,2011-07-21 11:55:16,2011-06-19 09:01:44+00:00,01_HOOFD_040,forward to the competent authority,2011-06-15 09:01:44+00:00,2011-07-21 11:54:08+00:00,560696,3122446,doorsturen aan bevoegd gezag,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10438,True,2011-07-21 11:55:16,NaT,01_HOOFD_060,regular procedure without MER,2011-07-22 11:54:08+00:00,2011-07-21 11:54:08+00:00,560696,3122446,reguliere procedure zonder MER,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10439,EMPTY,2011-07-21 11:55:15,NaT,01_HOOFD_065_1,send procedure confirmation,NaT,2011-07-21 11:54:10+00:00,560696,3122446,procedurebevestiging versturen,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,
10440,EMPTY,2011-07-21 11:55:15,NaT,01_HOOFD_065_2,enter senddate procedure confirmation,2011-07-22 11:54:10+00:00,2011-07-21 11:54:10+00:00,560696,3122446,invoeren verzenddatum procedurebevestiging,...,2011-06-07 00:00:00+00:00,True,2011-07-21 00:00:00+00:00,Bouw,,,,NaT,,


In [34]:
get_random_trace(df_2015_D)

Unnamed: 0,question,dateFinished,dueDate,action_code,activityNameEN,planned,time:timestamp,monitoringResource,org:resource,activityNameNL,...,case:case_type,case:startDate,case:requestComplete,case:IDofConceptCase,case:termName,case:caseProcedure,case:landRegisterID,case:Includes_subCases,dateStop,case:endDatePlanned
17562,EMPTY,2011-12-13 09:43:08,NaT,01_HOOFD_010,register submission date request,2011-12-14 09:36:29+00:00,2011-12-13 09:36:29+00:00,560812,560781,registratie datum binnenkomst aanvraag,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17563,False,2011-12-13 09:43:08,NaT,01_HOOFD_020,send confirmation receipt,NaT,2011-12-13 09:42:45+00:00,560812,560781,versturen ontvangstbevestiging,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17564,True,2011-12-13 09:43:08,NaT,03_GBH_005,applicant is stakeholder,NaT,2011-12-13 09:42:45+00:00,560812,560781,aanvrager is belanghebbende,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17565,False,2011-12-13 09:43:08,NaT,01_HOOFD_040,forward to the competent authority,NaT,2011-12-13 09:42:45+00:00,560812,560781,doorsturen aan bevoegd gezag,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17566,False,2011-12-13 09:43:08,NaT,01_HOOFD_050,inform BAG administrator,NaT,2011-12-13 09:42:45+00:00,560812,560781,BAG beheerder informeren,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17567,EMPTY,2011-12-13 09:43:08,NaT,01_HOOFD_015,phase application received,NaT,2011-12-13 09:42:45+00:00,560812,560781,fase aanvraag ontvangen,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17568,True,2011-12-13 09:43:08,NaT,01_HOOFD_060,regular procedure without MER,2011-12-14 09:42:45+00:00,2011-12-13 09:42:51+00:00,560812,560781,reguliere procedure zonder MER,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17569,EMPTY,2011-12-13 09:43:07,NaT,01_HOOFD_065_2,enter senddate procedure confirmation,2011-12-14 09:42:54+00:00,2011-12-13 09:42:54+00:00,560812,560781,invoeren verzenddatum procedurebevestiging,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17570,EMPTY,2011-12-13 09:43:07,NaT,01_HOOFD_065_1,send procedure confirmation,2011-12-14 09:42:51+00:00,2011-12-13 09:42:54+00:00,560812,560781,procedurebevestiging versturen,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT
17571,False,2011-12-13 15:12:00,NaT,14_VRIJ_010,no permit needed or only notification needed,NaT,2011-12-13 15:09:51+00:00,560812,560812,vergunningvrij of meldingplichtig,...,557669,2011-12-13 09:36:29+00:00,True,,,,,N,,NaT


In [35]:
get_random_trace(df_2015_E)

Unnamed: 0,question,dateFinished,dueDate,action_code,activityNameEN,planned,time:timestamp,monitoringResource,org:resource,activityNameNL,...,case:landRegisterID,case:parts,case:termName,case:startDate,case:requestComplete,case:IDofConceptCase,case:caseProcedure,case:Includes_subCases,case:endDatePlanned,dateStop
40935,EMPTY,2012-08-09 15:44:01,2012-08-12 15:40:39+00:00,01_HOOFD_010,register submission date request,2012-08-10 15:40:39+00:00,2012-08-07 00:00:00+00:00,1254625,1254625,registratie datum binnenkomst aanvraag,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40936,False,2012-08-09 15:44:01,NaT,01_HOOFD_011,OLO messaging active,NaT,2012-08-09 15:43:57+00:00,560600,1254625,OLO berichtenverkeer actief,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40937,True,2012-08-09 15:44:01,NaT,01_HOOFD_020,send confirmation receipt,NaT,2012-08-09 15:43:57+00:00,560600,1254625,versturen ontvangstbevestiging,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40938,True,2012-08-09 15:44:01,NaT,03_GBH_005,applicant is stakeholder,NaT,2012-08-09 15:43:57+00:00,560600,1254625,aanvrager is belanghebbende,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40939,False,2012-08-09 15:44:01,NaT,05_EIND_010,terminate on request,NaT,2012-08-09 15:43:57+00:00,560600,1254625,beeindigen op verzoek,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40996,EMPTY,2012-10-09 12:24:49,NaT,01_BB_775,phase decision irrevocable,NaT,2012-10-09 12:24:35+00:00,560600,560602,fase besluit onherroepelijk,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40997,EMPTY,2012-10-09 12:24:49,NaT,01_BB_770,set phase: phase permitting irrevocable,NaT,2012-10-09 12:24:35+00:00,560600,560602,instellen besluitfase: oorspronkelijk besluit,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40998,EMPTY,2012-10-09 12:24:49,NaT,01_HOOFD_815,phase case handled,NaT,2012-10-09 12:24:39+00:00,560600,560602,fase zaak afgehandeld,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,
40999,EMPTY,2012-10-09 12:24:49,NaT,01_HOOFD_814,phase archived case,NaT,2012-10-09 12:24:39+00:00,560600,560602,fase zaak gearchiveerd,...,6820701,Bouw,,2012-08-07 00:00:00+00:00,FALSE,,,J,NaT,


## BPIC 2016 Data
Given the significant differences in data format and the complexity of the BPIC 2016 data, it is omitted from this analysis: including it would require assumptions that make comparison with the other years' data unhelpful. The BPIC 2016 data is recorded at the click level, resulting in significantly larger and more granular files. It also contains multiple unlinked data sources with no clear case identifiers, for example support messages linked only by a customer ID.
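
To illustrate the kind of assumption this would require, here is a minimal sketch on made-up data. It is not BPIC 2016 data, and the column names `customer_id` and `page` are hypothetical. With only a customer ID available, a case identifier has to be synthesized, for example by splitting each customer's clicks at inactivity gaps, and the resulting number of cases depends entirely on the chosen gap.

```python
import pandas as pd

# Hypothetical click-level records: a customer ID but no case identifier.
clicks = pd.DataFrame({
    'customer_id': ['c1', 'c1', 'c1', 'c2'],
    'page': ['home', 'benefits', 'contact', 'home'],
    'timestamp': pd.to_datetime([
        '2016-01-04 09:00:00', '2016-01-04 09:02:00',
        '2016-03-17 14:30:00', '2016-01-05 10:00:00',
    ]),
})


def assign_sessions(df, gap=pd.Timedelta(minutes=30)):
    """
    Add a synthetic case id by starting a new session whenever a customer's
    consecutive clicks are more than `gap` apart (an assumed heuristic).
    """
    df = df.sort_values(['customer_id', 'timestamp']).copy()
    # Time since the previous click of the same customer (NaT for the first click)
    gaps = df.groupby('customer_id')['timestamp'].diff()
    new_session = gaps.isna() | (gaps > gap)
    df['case:concept:name'] = new_session.cumsum()
    return df


sessions_30min = assign_sessions(clicks)
sessions_1year = assign_sessions(clicks, gap=pd.Timedelta(days=365))
```

On this toy input the 30-minute gap yields three cases and the one-year gap yields two, so any summary statistic such as case count or trace length would reflect the heuristic rather than the log itself.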

## BPIC 2017

### Loading

In [36]:
df_2017 = load_df_from_log(data_path+'BPI Challenge 2017.xes.gz')

parsing log, completed traces :: 100%|██████████| 31509/31509 [01:03<00:00, 493.79it/s]


### EDA

In [37]:
row_2017 = create_summary_row(df_2017, '2017 loan application')
row_2017

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2017 loan application,"[LoanGoal, ApplicationType, concept:name, Requ...",[timestamp],[resource],[name],[transition],"[complete, schedule, withdraw, start, suspend,...",7,31509,"[22, 25, 18, 40, 51, 55, 46, 37, 27, 23, 54, 3...",1202267,"[A_Create Application, A_Submitted, W_Handle l...",26


### Exploring a trace

In [None]:
get_random_trace(df_2017)

Unnamed: 0,Action,org:resource,concept:name,EventOrigin,EventID,lifecycle:transition,time:timestamp,case:LoanGoal,case:ApplicationType,case:concept:name,case:RequestedAmount,FirstWithdrawalAmount,NumberOfTerms,Accepted,MonthlyCost,Selected,CreditScore,OfferedAmount,OfferID
43847,Created,User_1,A_Create Application,Application,Application_300912987,complete,2016-01-16 15:42:29.563000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43848,statechange,User_1,A_Submitted,Application,ApplState_1383275233,complete,2016-01-16 15:42:29.606000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43849,Created,User_1,W_Handle leads,Workflow,Workitem_1883965185,schedule,2016-01-16 15:42:29.813000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43850,Deleted,User_1,W_Handle leads,Workflow,Workitem_1981057597,withdraw,2016-01-16 15:43:23.713000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43851,Created,User_1,W_Complete application,Workflow,Workitem_1791571867,schedule,2016-01-16 15:43:23.720000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43852,statechange,User_1,A_Concept,Application,ApplState_1206565341,complete,2016-01-16 15:43:23.725000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43853,Obtained,User_85,W_Complete application,Workflow,Workitem_141416980,start,2016-01-18 09:04:18.480000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43854,Released,User_85,W_Complete application,Workflow,Workitem_867841654,suspend,2016-01-18 09:04:35.044000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43855,Obtained,User_28,W_Complete application,Workflow,Workitem_1564074110,resume,2016-01-18 09:33:55.999000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,
43856,Released,User_28,W_Complete application,Workflow,Workitem_136493353,suspend,2016-01-18 09:34:15.736000+00:00,Car,New credit,Application_300912987,5000.0,,,,,,,,



## BPIC 2018

### Loading

In [8]:
df_2018 = load_df_from_log(data_path+'BPI Challenge 2018.xes.gz')

parsing log, completed traces :: 100%|██████████| 43809/43809 [05:57<00:00, 122.45it/s]



### EDA

In [None]:
row_2018 = create_summary_row(df_2018, '2018 European agricultural guarantee fund')
row_2018

### Random trace sample

In [None]:
get_random_trace(df_2018)

## BPIC 2019

### Loading

In [9]:
df_2019 = load_df_from_log(data_path+'BPI_Challenge_2019.xes')

parsing log, completed traces :: 100%|██████████| 251734/251734 [01:21<00:00, 3104.63it/s]


### EDA

In [10]:
row_2019 = create_summary_row(df_2019, '2019 Purchase order handling')
row_2019

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2019 Purchase order handling,"[Spend area text, Company, Document Type, Sub ...",[timestamp],[resource],[name],[],[],-1,251734,"[12, 15, 18, 12, 12, 12, 12, 16, 8, 15, 13, 13...",1595923,"[SRM: Created, SRM: Complete, SRM: Awaiting Ap...",42


### Random trace sample

In [11]:
get_random_trace(df_2019)

Unnamed: 0,User,org:resource,concept:name,Cumulative net worth (EUR),time:timestamp,case:Spend area text,case:Company,case:Document Type,case:Sub spend area text,case:Purchasing Document,...,case:Vendor,case:Item Type,case:Item Category,case:Spend classification text,case:Source,case:Name,case:GR-Based Inv. Verif.,case:Item,case:concept:name,case:Goods Receipt
495039,user_042,user_042,Create Purchase Order Item,492.0,2018-04-11 05:45:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True
495040,NONE,NONE,Vendor creates invoice,492.0,2018-04-13 21:59:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True
495041,user_012,user_012,Record Invoice Receipt,492.0,2018-04-16 13:53:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True
495042,user_029,user_029,Record Goods Receipt,492.0,2018-04-16 16:09:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True
495043,batch_02,batch_02,Remove Payment Block,492.0,2018-04-17 00:10:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True
495044,user_002,user_002,Clear Invoice,492.0,2018-06-07 11:11:00+00:00,Sales,companyID_0000,Standard PO,Products for Resale,4507022098,...,vendorID_0671,Standard,"3-way match, invoice before GR",NPR,sourceSystemID_0000,vendor_0646,False,30,4507022098_00030,True


## BPIC 2020

### Loading

In [12]:
df_2020_domestic = load_df_from_log(data_path+'BPI_2020_DomesticDeclarations.xes.gz')
df_2020_international = load_df_from_log(data_path+'BPI_2020_InternationalDeclarations.xes.gz')
df_2020_permits = load_df_from_log(data_path+'BPI_2020_PermitLog.xes.gz')
df_2020_rfp = load_df_from_log(data_path+'BPI_2020_RequestForPayment.xes.gz')

parsing log, completed traces :: 100%|██████████| 10500/10500 [00:04<00:00, 2468.02it/s]
parsing log, completed traces :: 100%|██████████| 6449/6449 [00:03<00:00, 1870.99it/s]
parsing log, completed traces :: 100%|██████████| 7065/7065 [00:03<00:00, 1970.27it/s]
parsing log, completed traces :: 100%|██████████| 6886/6886 [00:01<00:00, 4094.13it/s]


### EDA

In [13]:
row_2020_domestic = create_summary_row(df_2020_domestic, '2020 domestic declarations')
row_2020_domestic

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2020 domestic declarations,"[id, concept:name, BudgetNumber, DeclarationNu...",[timestamp],"[resource, role]",[name],[],[],-1,10500,"[5, 5, 5, 6, 6, 5, 5, 5, 6, 5, 6, 5, 5, 5, 5, ...",56437,"[Declaration SUBMITTED by EMPLOYEE, Declaratio...",17


In [14]:
row_2020_international = create_summary_row(df_2020_international, '2020 international declarations')
row_2020_international

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2020 international declarations,"[Permit travel permit number, DeclarationNumbe...",[timestamp],"[resource, role]",[name],[],[],-1,6449,"[10, 12, 12, 10, 10, 10, 13, 13, 10, 12, 10, 1...",72151,"[Start trip, End trip, Permit SUBMITTED by EMP...",34


In [15]:
row_2020_permits = create_summary_row(df_2020_permits, '2020 travel permits')
row_2020_permits

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2020 travel permits,"[OrganizationalEntity, ProjectNumber, TaskNumb...",[timestamp],"[resource, role]",[name],[],[],-1,7065,"[18, 18, 8, 10, 10, 13, 19, 7, 10, 12, 10, 8, ...",86581,"[Start trip, End trip, Permit SUBMITTED by EMP...",51


In [16]:
row_2020_rfp = create_summary_row(df_2020_rfp, '2020 requests for payment')
row_2020_rfp

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,2020 requests for payment,"[Rfp_id, Project, Task, concept:name, Organiza...",[timestamp],"[resource, role]",[name],[],[],-1,6886,"[4, 5, 5, 4, 4, 5, 4, 5, 4, 5, 4, 5, 5, 4, 4, ...",36796,"[Request For Payment SUBMITTED by EMPLOYEE, Re...",19


#### Random trace samples

In [17]:
get_random_trace(df_2020_domestic)

Unnamed: 0,id,org:resource,concept:name,time:timestamp,org:role,case:id,case:concept:name,case:BudgetNumber,case:DeclarationNumber,case:Amount
7392,st_step 96558_0,STAFF MEMBER,Declaration SUBMITTED by EMPLOYEE,2017-10-13 08:40:34+00:00,EMPLOYEE,declaration 96555,declaration 96555,budget 86566,declaration number 96556,38.418227
7393,st_step 96557_0,STAFF MEMBER,Declaration FINAL_APPROVED by SUPERVISOR,2017-10-13 08:41:29+00:00,SUPERVISOR,declaration 96555,declaration 96555,budget 86566,declaration number 96556,38.418227
7394,dd_declaration 96555_19,SYSTEM,Request Payment,2017-10-19 09:16:30+00:00,UNDEFINED,declaration 96555,declaration 96555,budget 86566,declaration number 96556,38.418227
7395,dd_declaration 96555_20,SYSTEM,Payment Handled,2017-10-23 17:30:48+00:00,UNDEFINED,declaration 96555,declaration 96555,budget 86566,declaration number 96556,38.418227


In [18]:
get_random_trace(df_2020_international)

Unnamed: 0,id,org:resource,concept:name,time:timestamp,org:role,case:Permit travel permit number,case:DeclarationNumber,case:Amount,case:RequestedAmount,case:Permit TaskNumber,...,case:concept:name,case:Permit OrganizationalEntity,case:travel permit number,case:Permit RequestedBudget,case:id,case:Permit ID,case:Permit id,case:BudgetNumber,case:Permit ActivityNumber,case:AdjustedAmount
23581,st_step 31482_0,STAFF MEMBER,Permit SUBMITTED by EMPLOYEE,2018-02-12 14:59:14+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23582,st_step 31481_0,STAFF MEMBER,Permit APPROVED by ADMINISTRATION,2018-02-12 14:59:38+00:00,ADMINISTRATION,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23583,st_step 31483_0,STAFF MEMBER,Permit FINAL_APPROVED by SUPERVISOR,2018-02-13 15:30:17+00:00,SUPERVISOR,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23584,rv_travel permit 31477_6,STAFF MEMBER,Start trip,2018-04-15 00:00:00+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23585,rv_travel permit 31477_7,STAFF MEMBER,End trip,2018-04-20 00:00:00+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23586,st_step 31491_0,STAFF MEMBER,Declaration SUBMITTED by EMPLOYEE,2018-05-15 18:40:28+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23587,st_step 31490_0,STAFF MEMBER,Declaration REJECTED by ADMINISTRATION,2018-05-15 18:54:32+00:00,ADMINISTRATION,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23588,st_step 31492_0,STAFF MEMBER,Declaration REJECTED by EMPLOYEE,2018-05-17 12:22:59+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23589,st_step 31487_0,STAFF MEMBER,Declaration SUBMITTED by EMPLOYEE,2018-05-18 16:53:53+00:00,EMPLOYEE,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771
23590,st_step 31488_0,STAFF MEMBER,Declaration REJECTED by ADMINISTRATION,2018-05-18 16:58:57+00:00,ADMINISTRATION,travel permit number 31478,declaration number 31480,1862.236771,1862.236771,UNKNOWN,...,declaration 31479,organizational unit 65454,travel permit number 31478,1717.266995,declaration 31479,travel permit 31477,travel permit 31477,budget 146810,UNKNOWN,1862.236771


In [19]:
get_random_trace(df_2020_permits)

Unnamed: 0,id,org:resource,concept:name,time:timestamp,org:role,case:OrganizationalEntity,case:ProjectNumber,case:TaskNumber,case:dec_id_0,case:ActivityNumber,...,case:Cost Type_14,case:Cost Type_10,case:Cost Type_11,case:Cost Type_12,case:Task_5,case:Task_4,case:Task_9,case:Task_8,case:Task_7,case:Task_6
31473,st_step 3660_0,STAFF MEMBER,Permit SUBMITTED by EMPLOYEE,2018-03-06 20:36:52+00:00,EMPLOYEE,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31474,st_step 3662_0,STAFF MEMBER,Permit APPROVED by ADMINISTRATION,2018-03-06 20:37:01+00:00,ADMINISTRATION,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31475,st_step 3661_0,STAFF MEMBER,Permit REJECTED by BUDGET OWNER,2018-03-07 11:44:02+00:00,BUDGET OWNER,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31476,st_step 3663_0,STAFF MEMBER,Permit REJECTED by EMPLOYEE,2018-03-08 09:28:35+00:00,EMPLOYEE,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31477,st_step 3657_0,STAFF MEMBER,Permit SUBMITTED by EMPLOYEE,2018-04-03 12:43:39+00:00,EMPLOYEE,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31478,st_step 3656_0,STAFF MEMBER,Permit APPROVED by ADMINISTRATION,2018-04-03 12:43:41+00:00,ADMINISTRATION,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31479,st_step 3658_0,STAFF MEMBER,Permit APPROVED by BUDGET OWNER,2018-04-03 14:04:20+00:00,BUDGET OWNER,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31480,st_step 3659_0,STAFF MEMBER,Permit FINAL_APPROVED by SUPERVISOR,2018-04-05 12:02:27+00:00,SUPERVISOR,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31481,st_step 3670_0,STAFF MEMBER,Request For Payment SUBMITTED by EMPLOYEE,2018-05-01 10:50:19+00:00,EMPLOYEE,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,
31482,st_step 3671_0,STAFF MEMBER,Request For Payment APPROVED by ADMINISTRATION,2018-05-01 10:50:37+00:00,ADMINISTRATION,organizational unit 65466,UNKNOWN,UNKNOWN,,UNKNOWN,...,,,,,,,,,,


In [20]:
get_random_trace(df_2020_rfp)

Unnamed: 0,id,org:resource,concept:name,time:timestamp,org:role,case:Rfp_id,case:Project,case:Task,case:concept:name,case:OrganizationalEntity,case:Cost Type,case:RequestedAmount,case:Activity,case:RfpNumber
18420,st_step 169797_0,STAFF MEMBER,Request For Payment SUBMITTED by EMPLOYEE,2018-06-08 17:01:04+00:00,EMPLOYEE,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18421,st_step 169796_0,STAFF MEMBER,Request For Payment REJECTED by ADMINISTRATION,2018-06-08 17:02:00+00:00,ADMINISTRATION,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18422,st_step 169795_0,STAFF MEMBER,Request For Payment REJECTED by EMPLOYEE,2018-06-13 09:33:18+00:00,EMPLOYEE,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18423,st_step 169792_0,STAFF MEMBER,Request For Payment SUBMITTED by EMPLOYEE,2018-07-10 13:26:52+00:00,EMPLOYEE,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18424,st_step 169793_0,STAFF MEMBER,Request For Payment APPROVED by ADMINISTRATION,2018-07-10 13:27:02+00:00,ADMINISTRATION,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18425,st_step 169794_0,STAFF MEMBER,Request For Payment FINAL_APPROVED by SUPERVISOR,2018-07-11 11:26:24+00:00,SUPERVISOR,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18426,rp_request for payment 169790_15,SYSTEM,Request Payment,2018-08-23 09:29:07+00:00,UNDEFINED,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791
18427,rp_request for payment 169790_16,SYSTEM,Payment Handled,2018-08-27 17:31:18+00:00,UNDEFINED,request for payment 169790,project 503,UNKNOWN,request for payment 169790,organizational unit 65456,0,138.063445,UNKNOWN,request for payment number 169791


## Helpdesk Log

### Loading

In [31]:
df_helpdesk = load_df_from_log(data_path+'helpdesk_log.csv')
# Map the CSV headers onto the XES-style column names used by the other logs
df_helpdesk.rename(
    columns={
        'Case ID': 'case:concept:name',
        'Activity': 'concept:name',
        'Complete Timestamp': 'time:timestamp',
        'Resource': 'org:resource',
    },
    inplace=True,
)
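
Unlike the XES logs, the helpdesk CSV appears to keep its timestamps as text in the form `2013/05/13 15:17:52.000`, because `pd.read_csv` does not parse dates by default. The sketch below shows one way to convert them. The helper name `parse_timestamps` and the format string are assumptions inferred from that sample value, not part of the original notebook. No timezone is attached, since the CSV does not state one.

```python
import pandas as pd


def parse_timestamps(df, column='time:timestamp', fmt='%Y/%m/%d %H:%M:%S.%f'):
    """
    Return a copy of df with `column` converted from text to datetime.
    The default format is an assumption based on values like
    '2013/05/13 15:17:52.000'.
    """
    out = df.copy()
    out[column] = pd.to_datetime(out[column], format=fmt)
    return out


# Small stand-in frame with the same timestamp layout as the helpdesk log
sample = pd.DataFrame({
    'case:concept:name': ['Case 696', 'Case 696'],
    'concept:name': ['Assign seriousness', 'Take in charge ticket'],
    'time:timestamp': ['2013/05/13 15:17:52.000', '2013/05/29 07:10:27.000'],
})
parsed = parse_timestamps(sample)
```

Applied to the real log, this would be `df_helpdesk = parse_timestamps(df_helpdesk)`, after which time-based operations such as sorting events within a case or computing durations work on proper datetimes.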

### EDA

In [32]:
row_helpdesk = create_summary_row(df_helpdesk, 'Helpdesk')
row_helpdesk

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,Helpdesk,[concept:name],[timestamp],[resource],[name],[],[],-1,4580,"[5, 4, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, ...",21348,"[Assign seriousness, Take in charge ticket, Re...",14


#### Random trace

In [33]:
get_random_trace(df_helpdesk)

Unnamed: 0,case:concept:name,concept:name,org:resource,time:timestamp,Variant,Variant index,Variant.1,seriousness,customer,product,responsible_section,seriousness_2,service_level,service_type,support_section,workgroup
3200,Case 696,Assign seriousness,Value 14,2013/05/13 15:17:52.000,Variant 1,1,Variant 1,Value 1,Value 217,Value 1,Value 1,Value 1,Value 2,Value 1,Value 1,Value 1
3201,Case 696,Take in charge ticket,Value 2,2013/05/29 07:10:27.000,Variant 1,1,Variant 1,Value 1,Value 217,Value 1,Value 1,Value 1,Value 2,Value 1,Value 1,Value 1
3202,Case 696,Resolve ticket,Value 4,2013/05/29 13:32:23.000,Variant 1,1,Variant 1,Value 1,Value 217,Value 1,Value 1,Value 1,Value 2,Value 1,Value 1,Value 1
3203,Case 696,Closed,Value 3,2013/06/13 13:32:42.000,Variant 1,1,Variant 1,Value 1,Value 217,Value 1,Value 1,Value 1,Value 2,Value 1,Value 1,Value 1


## Road Traffic Fine Management

### Loading

In [34]:
df_road_traffic = load_df_from_log(data_path+'Road_Traffic_Fine_Management_Process.xes.gz')

parsing log, completed traces :: 100%|██████████| 150370/150370 [00:22<00:00, 6644.89it/s]


### EDA

In [36]:
row_rtfm = create_summary_row(df_road_traffic, 'Road Traffic Fine Management')
row_rtfm

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,Road Traffic Fine Management,[concept:name],[timestamp],[resource],[name],[transition],[complete],1,150370,"[2, 5, 5, 6, 5, 2, 2, 5, 6, 5, 5, 5, 2, 5, 5, ...",561470,"[Create Fine, Send Fine, Insert Fine Notificat...",11


#### Random trace

In [39]:
get_random_trace(df_road_traffic)

Unnamed: 0,amount,org:resource,dismissal,concept:name,vehicleClass,totalPaymentAmount,lifecycle:transition,time:timestamp,article,points,case:concept:name,expense,notificationType,lastSent,paymentAmount,matricola
345787,38.0,62.0,NIL,Create Fine,A,0.0,complete,2009-08-01 00:00:00+00:00,7.0,0.0,S139725,,,,,
345788,,,,Send Fine,,,complete,2010-03-04 00:00:00+00:00,,,S139725,13.5,,,,
345789,,,,Insert Fine Notification,,,complete,2010-03-20 00:00:00+00:00,,,S139725,,P,P,,
345790,77.5,,,Add penalty,,,complete,2010-05-19 00:00:00+00:00,,,S139725,,,,,
345791,,,,Send for Credit Collection,,,complete,2012-03-26 00:00:00+00:00,,,S139725,,,,,


## Sepsis Cases

### Loading

In [42]:
df_sepsis = load_df_from_log(data_path+'Sepsis Cases - Event Log.xes.gz')

parsing log, completed traces :: 100%|██████████| 1050/1050 [00:00<00:00, 1629.60it/s]


### EDA

In [43]:
row_sepsis = create_summary_row(df_sepsis, 'Sepsis Cases')
row_sepsis

Unnamed: 0,name,case vars,time vars,org vars,event vars,lifecycle vars,lifecycle transitions,unique lifecycle transition count,case count,trace lengths,event count,activity names,unique activity count
0,Sepsis Cases,[concept:name],[timestamp],[group],[name],[transition],[complete],1,1050,"[22, 8, 11, 8, 17, 13, 8, 29, 24, 18, 9, 17, 1...",15214,"[ER Registration, Leucocytes, CRP, LacticAcid,...",16


#### Random trace

In [44]:
get_random_trace(df_sepsis)

Unnamed: 0,InfectionSuspected,org:group,DiagnosticBlood,DisfuncOrg,SIRSCritTachypnea,Hypotensie,SIRSCritHeartRate,Infusion,DiagnosticArtAstrup,concept:name,...,DiagnosticLacticAcid,lifecycle:transition,Diagnose,Hypoxie,DiagnosticUrinarySediment,DiagnosticECG,case:concept:name,Leucocytes,CRP,LacticAcid
11248,True,A,True,False,True,False,True,True,True,ER Registration,...,True,complete,VD,False,False,True,XCA,,,
11249,,C,,,,,,,,ER Triage,...,,complete,,,,,XCA,,,
11250,,A,,,,,,,,ER Sepsis Triage,...,,complete,,,,,XCA,,,
11251,,B,,,,,,,,LacticAcid,...,,complete,,,,,XCA,,,2.3
11252,,B,,,,,,,,Leucocytes,...,,complete,,,,,XCA,14.7,,
11253,,B,,,,,,,,CRP,...,,complete,,,,,XCA,,12.0,
11254,,A,,,,,,,,IV Liquid,...,,complete,,,,,XCA,,,
11255,,A,,,,,,,,IV Antibiotics,...,,complete,,,,,XCA,,,
11256,,I,,,,,,,,Admission NC,...,,complete,,,,,XCA,,,
11257,,F,,,,,,,,Admission NC,...,,complete,,,,,XCA,,,
