# Managing Event Logs

This tutorial walks through the steps needed to import and manage an event log.

## The `D4PyEventLog` class

The `Declare4Py.D4PyEventLog.D4PyEventLog` class is responsible for managing `.xes` event logs. Its methods provide utilities for importing an event log, retrieving useful information from it, exporting it in `.xes` format or converting it into a Pandas dataframe, and computing the frequent itemsets of activities or other attributes.

The following cell shows how to instantiate a `D4PyEventLog`. Note that the name of the case id attribute is required.

In [1]:
import os
from Declare4Py.D4PyEventLog import D4PyEventLog

event_log: D4PyEventLog = D4PyEventLog(case_name="case:concept:name")

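The case id attribute is what ties individual events together into traces. As a plain-Python illustration of that idea (not the Declare4Py code), the sketch below groups a few hypothetical event records by their `case:concept:name` value; the flat record layout is an assumption made only for this example.

```python
from collections import defaultdict

# Hypothetical flat event records; the "case:concept:name" key plays the role of
# the case_name passed to D4PyEventLog.
events = [
    {"case:concept:name": "A", "concept:name": "ER Registration"},
    {"case:concept:name": "B", "concept:name": "ER Registration"},
    {"case:concept:name": "A", "concept:name": "ER Triage"},
    {"case:concept:name": "B", "concept:name": "CRP"},
]


def group_by_case(events, case_name="case:concept:name"):
    """Group activity names into traces keyed by case id, keeping event order."""
    traces = defaultdict(list)
    for event in events:
        traces[event[case_name]].append(event["concept:name"])
    return dict(traces)


print(group_by_case(events))
# {'A': ['ER Registration', 'ER Triage'], 'B': ['ER Registration', 'CRP']}
```

Without a case id, the events above would be an unordered bag with no notion of which activities belong to the same process instance.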


The next step is parsing the log with the `parse_xes_log` method. Logs can be passed in either the `.xes` or the `.xes.gz` format.

In [6]:
log_path = os.path.join("../../../", "tests", "test_logs", "Sepsis Cases.xes.gz")

# Parse the .xes.gz log into the event log object
event_log.parse_xes_log(log_path)

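To give an idea of what parsing involves, the following sketch reads a tiny hand-written XES document with Python's standard `xml.etree.ElementTree` and `gzip` modules and builds a list of traces with `attributes` and `events`, loosely mirroring the structure printed below. It is a simplified illustration, not how `parse_xes_log` is implemented: it ignores the XES namespace and extension declarations and keeps every value as a string instead of converting dates, numbers and booleans.

```python
import gzip
import xml.etree.ElementTree as ET

# Hand-written XES fragment (illustrative only; real files also declare the
# XES namespace and extensions, which this sketch ignores).
XES = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="A"/>
    <event>
      <string key="concept:name" value="ER Registration"/>
      <date key="time:timestamp" value="2014-10-22T09:15:41.000+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="Leucocytes"/>
    </event>
  </trace>
</log>"""


def parse_xes(root):
    """Build [{'attributes': {...}, 'events': [{...}, ...]}, ...]; values stay strings."""
    log = []
    for trace in root.iter("trace"):
        # Trace-level attributes are the non-event children of <trace>
        attributes = {c.get("key"): c.get("value") for c in trace if c.tag != "event"}
        events = [{a.get("key"): a.get("value") for a in event} for event in trace.iter("event")]
        log.append({"attributes": attributes, "events": events})
    return log


def load_xes(path):
    """Open plain .xes or gzip-compressed .xes.gz files and parse them."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_xes(ET.parse(f).getroot())


log = parse_xes(ET.fromstring(XES))
print(len(log), [e["concept:name"] for e in log[0]["events"]])
```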


Once the event log has been successfully parsed, basic information is available, such as the log itself, its length (the number of cases), the case name, the concept name and the timestamp name.

In [7]:
# Print the parsed log
print("This is the log:")
print(event_log.get_log())
print("--------------------------------------")

# Print the number of cases in the log
print("Number of cases:")
print(event_log.get_length())
print("--------------------------------------")

# Print the name of the case id attribute
print("Case name:")
print(event_log.get_case_name())
print("--------------------------------------")

# Print the name of the activity attribute
print("Concept name:")
print(event_log.get_concept_name())
print("--------------------------------------")

# Print the name of the timestamp attribute
print("Timestamp name:")
print(event_log.get_timestamp_name())

This is the log:
[{'attributes': {'concept:name': 'A'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': True, 'SIRSCritTachypnea': True, 'Hypotensie': True, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 85.0, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': Timestamp('2014-10-22 09:15:41+0000', tz='UTC'), 'DiagnosticUrinaryCulture': True, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'A', 'Hypoxie': False, 'DiagnosticUrinarySediment': True, 'DiagnosticECG': True, 'Leucocytes': nan, 'CRP': nan, 'LacticAcid': nan}, '..', {'InfectionSuspected': nan, 'org:group': 'E', 'DiagnosticBlood': nan, 'DisfuncOrg': nan, 'SIRSCritTachypnea': nan, 'Hypotens

### The `get_trace` method

The `get_trace` method returns a trace given a numeric index.

In [8]:
event_log.get_trace(3)

{'attributes': {'concept:name': 'D'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': False, 'SIRSCritTachypnea': True, 'Hypotensie': False, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 70.0, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': Timestamp('2014-07-10 09:52:00+0000', tz='UTC'), 'DiagnosticUrinaryCulture': False, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'D', 'Hypoxie': False, 'DiagnosticUrinarySediment': False, 'DiagnosticECG': True, 'Leucocytes': nan, 'CRP': nan, 'LacticAcid': nan}, '..', {'InfectionSuspected': nan, 'org:group': '?', 'DiagnosticBlood': nan, 'DisfuncOrg': nan, 'SIRSCritTachypnea': nan, 'Hypotensie': nan, 'SIR

### The `get_event_attribute_values` method

The `get_event_attribute_values` method returns all the values of an attribute that occur in an event log, along with their number of occurrences.

In [9]:
# Print the set of activity values that are in the log along with their number of occurrences
print("Activity names:")
print(event_log.get_event_attribute_values(event_log.get_concept_name()))
print("--------------------------------------")

# Print the set of resource values that are in the log along with their number of occurrences
print("Resources names:")
print(event_log.get_event_attribute_values('org:group'))

Activity names:
{'ER Registration': 1050, 'Leucocytes': 3383, 'CRP': 3262, 'LacticAcid': 1466, 'ER Triage': 1053, 'ER Sepsis Triage': 1049, 'IV Liquid': 753, 'IV Antibiotics': 823, 'Admission NC': 1182, 'Release A': 671, 'Return ER': 294, 'Admission IC': 117, 'Release B': 56, 'Release C': 25, 'Release D': 24, 'Release E': 6}
--------------------------------------
Resources names:
{'A': 3462, 'B': 8111, 'C': 1053, 'D': 47, 'E': 782, 'F': 216, 'G': 148, 'H': 55, '?': 294, 'I': 126, 'J': 26, 'K': 18, 'L': 213, 'M': 84, 'N': 46, 'O': 186, 'P': 59, 'Q': 63, 'R': 57, 'S': 33, 'T': 35, 'U': 18, 'V': 25, 'W': 55, 'X': 1, 'Y': 1}
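Conceptually, this amounts to counting events per attribute value. A minimal sketch with `collections.Counter`, assuming a hypothetical toy log represented as a list of traces, each a list of event dictionaries; skipping events that lack the attribute is a choice of this sketch, not necessarily the library's behavior.

```python
from collections import Counter

# Hypothetical toy log: a list of traces, each trace a list of event dictionaries.
log = [
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "CRP", "org:group": "B"}],
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "Leucocytes", "org:group": "B"},
     {"concept:name": "CRP", "org:group": "B"}],
]


def attribute_value_counts(log, attribute):
    """Count how many events carry each value of `attribute` (events lacking it are skipped)."""
    return Counter(event[attribute] for trace in log for event in trace if attribute in event)


print(dict(attribute_value_counts(log, "concept:name")))
# {'ER Registration': 2, 'CRP': 2, 'Leucocytes': 1}
```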


### The `get_start_activities` method

The `get_start_activities` method returns all the activities that start the traces in the log. The method returns a dictionary where each starting activity is paired with the number of traces that start with that activity.

In [10]:
event_log.get_start_activities()

{'ER Registration': 995,
 'IV Liquid': 14,
 'ER Triage': 6,
 'CRP': 10,
 'ER Sepsis Triage': 7,
 'Leucocytes': 18}
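Because every trace has exactly one first activity, the counts above add up to the number of cases (995 + 14 + 6 + 10 + 7 + 18 = 1050). The same computation can be sketched in plain Python, assuming each trace is reduced to a list of activity names; the toy traces are hypothetical, not taken from the Sepsis log.

```python
from collections import Counter

# Hypothetical toy log: each trace reduced to its sequence of activity names.
traces = [
    ["ER Registration", "ER Triage", "CRP"],
    ["ER Registration", "Leucocytes"],
    ["IV Liquid", "ER Registration"],
    [],  # an empty trace has no start activity and is skipped
]


def start_activities(traces):
    """Map each first activity to the number of traces that begin with it."""
    return Counter(trace[0] for trace in traces if trace)


print(dict(start_activities(traces)))  # {'ER Registration': 2, 'IV Liquid': 1}
```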

### The `get_end_activities` method

The `get_end_activities` method returns all the activities that end the traces in the log. The method returns a dictionary where each ending activity is paired with the number of traces that end with that activity.

In [11]:
event_log.get_end_activities()

{'Release A': 393,
 'Return ER': 291,
 'IV Antibiotics': 87,
 'Release B': 55,
 'ER Sepsis Triage': 49,
 'Leucocytes': 44,
 'IV Liquid': 12,
 'Release C': 19,
 'CRP': 41,
 'LacticAcid': 24,
 'Release D': 14,
 'Admission NC': 14,
 'Release E': 5,
 'ER Triage': 2}
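As with start activities, these counts sum to 1050, one per trace. The sketch below counts the last activity of each trace in a hypothetical toy log where every trace is a list of activity names.

```python
from collections import Counter

# Hypothetical toy log: each trace reduced to its sequence of activity names.
traces = [
    ["ER Registration", "ER Triage", "Release A"],
    ["ER Registration", "CRP", "Return ER"],
    ["ER Registration", "Release A"],
]


def end_activities(traces):
    """Map each last activity to the number of traces that end with it (empty traces skipped)."""
    return Counter(trace[-1] for trace in traces if trace)


print(dict(end_activities(traces)))  # {'Release A': 2, 'Return ER': 1}
```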

### The `attribute_log_projection` method

A log is a complex data structure that can be explored along several dimensions. The `attribute_log_projection` method projects the cases in the log onto the given input attribute. A projection is a list (the log) of lists (the single cases) containing the value that the attribute takes in each event of the case.

In [12]:
# Activity projection
for idx, trace in enumerate(event_log.attribute_log_projection(event_log.get_concept_name())):
    print(f"{idx}- {trace}")
print("--------------------------------------")

# Resource projection
for idx, trace in enumerate(event_log.attribute_log_projection("org:group")):
    print(f"{idx}- {trace}")

0- ['ER Registration', 'Leucocytes', 'CRP', 'LacticAcid', 'ER Triage', 'ER Sepsis Triage', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'CRP', 'Leucocytes', 'Leucocytes', 'CRP', 'Leucocytes', 'CRP', 'CRP', 'Leucocytes', 'Leucocytes', 'CRP', 'CRP', 'Leucocytes', 'Release A']
1- ['ER Registration', 'ER Triage', 'CRP', 'LacticAcid', 'Leucocytes', 'ER Sepsis Triage', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'CRP', 'CRP', 'Release A']
2- ['ER Registration', 'ER Triage', 'ER Sepsis Triage', 'Leucocytes', 'CRP', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'Admission NC', 'Leucocytes', 'CRP', 'Leucocytes', 'CRP', 'Release A']
3- ['ER Registration', 'ER Triage', 'ER Sepsis Triage', 'CRP', 'LacticAcid', 'Leucocytes', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'Leucocytes', 'CRP', 'Release A', 'Return ER']
4- ['ER Registration', 'ER Triage', 'ER Sepsis Triage', 'IV Liquid', 'CRP', 'Leucocytes', 'LacticAcid', 'IV Antibiotics']
5- ['ER Registration', 'ER Triage', 'ER Sepsis Triage', 
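A projection can be sketched as a nested list comprehension. The toy log below (a list of traces, each a list of event dictionaries) is hypothetical, and returning `None` for events that lack the attribute is an assumption of this sketch, not necessarily what `attribute_log_projection` does.

```python
# Hypothetical toy log: a list of traces, each a list of event dictionaries.
log = [
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "ER Triage", "org:group": "C"}],
    [{"concept:name": "ER Registration", "org:group": "A"},
     {"concept:name": "CRP", "org:group": "B"},
     {"concept:name": "Release A", "org:group": "E"}],
]


def project(log, attribute):
    """Replace every event with the value of `attribute` (None when the event lacks it)."""
    return [[event.get(attribute) for event in trace] for trace in log]


print(project(log, "concept:name"))
print(project(log, "org:group"))  # [['A', 'C'], ['A', 'B', 'E']]
```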

### The `get_variants` method

The `get_variants` method returns all the variants of an event log, that is, the distinct sequences of activities followed by its traces. It returns a dictionary where each key is a tuple of activity names expressing the variant and the value is a list containing all the traces that follow that variant. The following snippet prints the keys, i.e. the variants themselves.

In [13]:
for idx, variant in enumerate(event_log.get_variants().keys()):
    print(f"{idx}- {variant}")

0- ('ER Registration', 'Leucocytes', 'CRP', 'LacticAcid', 'ER Triage', 'ER Sepsis Triage', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'CRP', 'Leucocytes', 'Leucocytes', 'CRP', 'Leucocytes', 'CRP', 'CRP', 'Leucocytes', 'Leucocytes', 'CRP', 'CRP', 'Leucocytes', 'Release A')
1- ('ER Registration', 'ER Triage', 'CRP', 'LacticAcid', 'Leucocytes', 'ER Sepsis Triage', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'CRP', 'CRP', 'Release A')
2- ('ER Registration', 'ER Triage', 'ER Sepsis Triage', 'Leucocytes', 'CRP', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'Admission NC', 'Leucocytes', 'CRP', 'Leucocytes', 'CRP', 'Release A')
3- ('ER Registration', 'ER Triage', 'ER Sepsis Triage', 'CRP', 'LacticAcid', 'Leucocytes', 'IV Liquid', 'IV Antibiotics', 'Admission NC', 'Leucocytes', 'CRP', 'Release A', 'Return ER')
4- ('ER Registration', 'ER Triage', 'ER Sepsis Triage', 'IV Liquid', 'CRP', 'Leucocytes', 'LacticAcid', 'IV Antibiotics')
5- ('ER Registration', 'ER Triage', 'ER Sepsis Triage', 
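Since a variant is a distinct activity sequence, grouping cases by the tuple of their activities is enough to find the variants. Tuples, unlike lists, are hashable and can serve as dictionary keys, which is consistent with the tuple keys printed above. In this hypothetical sketch the values are case ids rather than full traces, to keep the example short.

```python
from collections import defaultdict

# Hypothetical toy log: case id -> sequence of activity names.
traces = {
    "A": ["ER Registration", "ER Triage", "CRP"],
    "B": ["ER Registration", "ER Triage", "CRP"],
    "C": ["ER Registration", "Leucocytes"],
}


def group_variants(traces):
    """Map each distinct activity sequence (as a tuple) to the ids of the cases that follow it."""
    variants = defaultdict(list)
    for case_id, activities in traces.items():
        variants[tuple(activities)].append(case_id)
    return dict(variants)


for idx, variant in enumerate(group_variants(traces)):
    print(f"{idx}- {variant}")
```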

### The `to_dataframe` method

The event log can be converted into a Pandas dataframe with the `to_dataframe` method. After the conversion, `get_log()` returns the dataframe, as the cell below shows.

In [14]:
event_log.to_dataframe()
event_log.get_log().head()

| | InfectionSuspected | org:group | DiagnosticBlood | DisfuncOrg | SIRSCritTachypnea | Hypotensie | SIRSCritHeartRate | Infusion | DiagnosticArtAstrup | concept:name | ... | DiagnosticLacticAcid | lifecycle:transition | Diagnose | Hypoxie | DiagnosticUrinarySediment | DiagnosticECG | Leucocytes | CRP | LacticAcid | case:concept:name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | A | True | True | True | True | True | True | True | ER Registration | ... | True | complete | A | False | True | True | | | | A |
| 1 | | B | | | | | | | | Leucocytes | ... | | complete | | | | | 9.6 | | | A |
| 2 | | B | | | | | | | | CRP | ... | | complete | | | | | | 21.0 | | A |
| 3 | | B | | | | | | | | LacticAcid | ... | | complete | | | | | | | 2.2 | A |
| 4 | | C | | | | | | | | ER Triage | ... | | complete | | | | | | | | A |
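Converting to a dataframe essentially flattens the log into one row per event, with the case id copied onto each row. The sketch below does this in plain Python for a hypothetical toy log; prefixing trace attributes with `case:` is an assumption chosen to match the `case:concept:name` column shown above.

```python
# Hypothetical toy log with the same trace/event shape as the printed log.
log = [
    {"attributes": {"concept:name": "A"},
     "events": [{"concept:name": "ER Registration", "org:group": "A"},
                {"concept:name": "Leucocytes", "org:group": "B", "Leucocytes": 9.6}]},
    {"attributes": {"concept:name": "B"},
     "events": [{"concept:name": "ER Registration", "org:group": "A"}]},
]


def flatten(log):
    """Return one dictionary per event, with trace attributes copied under a 'case:' prefix."""
    rows = []
    for trace in log:
        case_attributes = {f"case:{key}": value for key, value in trace["attributes"].items()}
        for event in trace["events"]:
            rows.append({**event, **case_attributes})
    return rows


rows = flatten(log)
print(len(rows), rows[1])
```

Passing `rows` to `pandas.DataFrame` would yield a table like the one above, with `NaN` wherever an event lacks an attribute.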


### The `to_eventlog` method

The `to_eventlog` method converts the log back into an `EventLog` object, the same representation produced by `parse_xes_log`.

In [15]:
event_log.to_eventlog()
event_log.get_log()



[{'attributes': {'concept:name': 'A'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': True, 'SIRSCritTachypnea': True, 'Hypotensie': True, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 85.0, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': Timestamp('2014-10-22 09:15:41+0000', tz='UTC'), 'DiagnosticUrinaryCulture': True, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'A', 'Hypoxie': False, 'DiagnosticUrinarySediment': True, 'DiagnosticECG': True, 'Leucocytes': nan, 'CRP': nan, 'LacticAcid': nan}, '..', {'InfectionSuspected': nan, 'org:group': 'E', 'DiagnosticBlood': nan, 'DisfuncOrg': nan, 'SIRSCritTachypnea': nan, 'Hypotensie': nan, 'SIRSCr
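Going the other way means regrouping the rows by case id while preserving event order. A hypothetical plain-Python sketch of that inverse step, assuming rows shaped like a flattened log with `case:`-prefixed trace attributes and already sorted by event order within each case:

```python
# Hypothetical flattened rows: one dict per event, trace attributes under a "case:" prefix.
rows = [
    {"concept:name": "ER Registration", "case:concept:name": "A"},
    {"concept:name": "ER Registration", "case:concept:name": "B"},
    {"concept:name": "Leucocytes", "Leucocytes": 9.6, "case:concept:name": "A"},
]


def regroup(rows, case_column="case:concept:name"):
    """Rebuild [{'attributes': ..., 'events': [...]}, ...] in first-seen case order."""
    traces = {}  # dicts keep insertion order (Python 3.7+)
    for row in rows:
        case_id = row[case_column]
        if case_id not in traces:
            # Strip the "case:" prefix to recover the trace-level attributes
            attributes = {k[len("case:"):]: v for k, v in row.items() if k.startswith("case:")}
            traces[case_id] = {"attributes": attributes, "events": []}
        event = {k: v for k, v in row.items() if not k.startswith("case:")}
        traces[case_id]["events"].append(event)
    return list(traces.values())


print(regroup(rows))
```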

### The `save_xes` method

The event log can be saved in `.xes` format with the `save_xes` method.

In [16]:
event_log.save_xes("saved_log.xes")

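For reference, an XES file nests `<event>` elements inside `<trace>` elements inside a root `<log>`, with each attribute written as a typed element carrying `key` and `value`. The sketch below serializes a hypothetical toy log that way using `xml.etree.ElementTree`. It is deliberately simplified: every attribute is written as `<string>`, and the XES namespace and extension declarations are omitted, so it is not a substitute for `save_xes`.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy log with the trace/event shape used in this tutorial.
log = [
    {"attributes": {"concept:name": "A"},
     "events": [{"concept:name": "ER Registration"}, {"concept:name": "CRP"}]},
]


def to_xes_tree(log):
    """Build a simplified XES tree: every attribute becomes <string key=... value=...>."""
    root = ET.Element("log", {"xes.version": "1.0"})
    for trace in log:
        trace_el = ET.SubElement(root, "trace")
        for key, value in trace["attributes"].items():
            ET.SubElement(trace_el, "string", {"key": key, "value": str(value)})
        for event in trace["events"]:
            event_el = ET.SubElement(trace_el, "event")
            for key, value in event.items():
                ET.SubElement(event_el, "string", {"key": key, "value": str(value)})
    return ET.ElementTree(root)


tree = to_xes_tree(log)
print(ET.tostring(tree.getroot(), encoding="unicode"))
# To write a file instead: tree.write("toy_log.xes", encoding="utf-8", xml_declaration=True)
```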

### The `compute_frequent_itemsets` method

The `D4PyEventLog` class offers support for computing the frequent itemsets of attributes in the log. The `compute_frequent_itemsets` method takes as input the minimum support of the itemsets (`min_support`), the name of the case id attribute (`case_id_col`), a list with the names of the attributes on which to discover the itemsets (`categorical_attributes`), the `algorithm` used for the computation (either `fpgrowth` or `apriori`), and `len_itemset`, the maximum length of the itemsets (default `None`).

In [17]:
frequent_itemsets = event_log.compute_frequent_itemsets(
    min_support=0.8,
    case_id_col=event_log.get_case_name(),
    categorical_attributes=['concept:name'],
    algorithm='fpgrowth',
    len_itemset=3,
)
frequent_itemsets

| | support | itemsets | length |
|---|---|---|---|
| 0 | 1.0 | (concept:name_ER Triage) | 1 |
| 1 | 1.0 | (concept:name_ER Registration) | 1 |
| 2 | 0.999048 | (concept:name_ER Sepsis Triage) | 1 |
| 3 | 0.96381 | (concept:name_Leucocytes) | 1 |
| 4 | 0.959048 | (concept:name_CRP) | 1 |
| 5 | 0.819048 | (concept:name_LacticAcid) | 1 |
| 6 | 1.0 | (concept:name_ER Triage, concept:name_ER Regis... | 2 |
| 7 | 0.999048 | (concept:name_ER Sepsis Triage, concept:name_E... | 2 |
| 8 | 0.999048 | (concept:name_ER Triage, concept:name_ER Sepsi... | 2 |
| 9 | 0.999048 | (concept:name_ER Triage, concept:name_ER Sepsi... | 3 |
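To make the `min_support` and `len_itemset` parameters concrete, here is a small Apriori sketch. It assumes, as the `concept:name_...` item names above suggest, that each case is one-hot encoded into a set of `<attribute>_<value>` items and that support is the fraction of cases containing an itemset. The toy transactions are hypothetical, and this illustrates the technique rather than the library's implementation; FP-growth finds the same frequent itemsets with a different search strategy.

```python
from itertools import combinations

# Hypothetical toy log: one set of one-hot items per case, named "<attribute>_<value>".
transactions = [
    {"concept:name_ER Registration", "concept:name_ER Triage", "concept:name_CRP"},
    {"concept:name_ER Registration", "concept:name_ER Triage"},
    {"concept:name_ER Registration", "concept:name_ER Triage", "concept:name_CRP"},
    {"concept:name_ER Registration", "concept:name_Leucocytes"},
]


def apriori(transactions, min_support, max_len=None):
    """Return {itemset: support}, support being the fraction of transactions containing it."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    frequent, k = {}, 1
    while level and (max_len is None or k <= max_len):
        current = {s: support(s) for s in level}
        current = {s: sup for s, sup in current.items() if sup >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any candidate
        # with an infrequent k-subset (the Apriori property).
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k + 1}
        level = [c for c in candidates
                 if all(frozenset(sub) in current for sub in combinations(c, k))]
        k += 1
    return frequent


result = apriori(transactions, min_support=0.5, max_len=3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sup, sorted(itemset), len(itemset))
```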
