## Demonstrating Object-centric Causal Nets by Merging the C-nets of Different Object Types: A Running Example

This notebook walks through a running example of discovering and visualizing Object-centric Causal nets, as presented in the paper 'Discovering Minimal Object-Centric Causal-Nets in Process Mining'.

### Pre-requisites

Please install the libraries listed in the file requirements.txt.
<br>You can do so by running `pip install -r requirements.txt`.

In [1]:
import pm4py

from discover_occnets import *
from view_occnets_jupyter import *

In [2]:
file_path = './order-management.json'

In [3]:
ocel, ot_activities, event_to_obj, obj_to_obj = import_log(file_path)

### Model Discovery

The approach introduced in the paper discovers the Causal nets of each object type separately and then merges them. This running example is based on an order management process recorded in an OCEL 2.0 event log. First, let us view the process model when all object types are included.

In [4]:
ot_activities, subgraphs, ot_edges = subgraphs_dict(file_path, dependency_threshold=0.99)

In [7]:
grouped_df = ot_edges.groupby('original_edge')
for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")

                   original_edge         source         target  \
0    confirm order confirm order  confirm order            o_2   
1    confirm order confirm order            o_2            i_2   
2    confirm order confirm order            i_2  confirm order   
6    confirm order confirm order  confirm order            o_8   
7    confirm order confirm order            o_8            i_8   
8    confirm order confirm order            i_8  confirm order   
112  confirm order confirm order  confirm order           o_36   
113  confirm order confirm order           o_36           i_42   
114  confirm order confirm order           i_42  confirm order   

                        label           type  color intensity width  length  \
0                              visualization  black      None  None     1.0   
1     freq = 505 / dep = 1.00  visualization  black      None  None     4.0   
2                              visualization  black      None  None     1.0   
6                      

In [6]:
OCCN_model = all_ot_visualization(ot_activities, subgraphs, profile=['orders','items','packages'])

Now, we illustrate how the Causal nets of a single object type are discovered. Here, we choose the object type 'items'. The following cell prints the activities related to this object type.

In [None]:
print(ot_activities['items'])

First, we flatten and read the log. Flattening groups all activities related to the object type by the 'item' object ID, which lets us discover the traces: a trace is the sequence of activities recorded for one object in the log.

In [None]:
ocel, ot_activities, event_to_obj, obj_to_obj = import_log(file_path)
flt = flatten_log(ocel, ot_activities)
flt_items = flt['items']
logs = read_log(flt)
logs_items = logs['items']
all_traces = traces(flt)
ot_traces = all_traces['items']
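
To make the flattening step concrete, here is a minimal sketch on toy data. It does not use the notebook's `flatten_log`, `read_log`, or `traces` functions; the event tuples, the `toy_` names, and the object IDs are assumptions made up for illustration only.

```python
from collections import defaultdict

# Toy object-centric events: (timestamp, activity, {object type: [object ids]}).
# All values are invented for illustration; they are not taken from the log.
toy_events = [
    (1, "place order", {"orders": ["o-1"], "items": ["i-1", "i-2"]}),
    (3, "pick item", {"items": ["i-2"]}),
    (2, "pick item", {"items": ["i-1"]}),
    (4, "create package", {"items": ["i-1", "i-2"], "packages": ["p-1"]}),
]

def toy_flatten_to_traces(events, object_type):
    """Return {object id: [activities in timestamp order]} for one object type."""
    result = defaultdict(list)
    # Sort by timestamp so each trace follows the order in which events happened.
    for _, activity, objects in sorted(events, key=lambda e: e[0]):
        # An event contributes to the trace of every object of this type it touches.
        for obj_id in objects.get(object_type, []):
            result[obj_id].append(activity)
    return dict(result)

toy_item_traces = toy_flatten_to_traces(toy_events, "items")
print(toy_item_traces["i-1"])
```

Note that an event involving several items appears once in the trace of each of those items, which is why flattened logs can repeat the same event across traces.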

Let us see what the trace for item 'i-880001' looks like.

In [None]:
print(ot_traces['i-880001'])

The trace lists all activities performed for item 'i-880001' in the order in which they happened, according to the activity timestamps in the log. Next, we count some frequencies. As shown in the model, each activity has a total count. We also count how often one activity is directly followed by another, because these counts are needed to calculate the dependency measure between the two activities. In addition, we discover the start and end activities for the object type 'items'.

In [None]:
act_total = activity_total(logs_items)
activities = activity_frequencies(logs_items)
or_start = original_start(act_total, activities)
or_end = original_end(act_total, activities)
freq = frequencies(activities)
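
The counting described above can be sketched as follows. This is an illustrative stand-in, not the implementation behind `activity_total`, `activity_frequencies`, `original_start`, or `original_end`; the function name and the toy traces are hypothetical.

```python
from collections import Counter

# Toy traces for one object type (invented for illustration).
toy_traces = {
    "i-1": ["place order", "pick item", "create package"],
    "i-2": ["place order", "pick item", "pick item", "create package"],
}

def toy_count_frequencies(traces):
    """Count activity totals, directly-follows pairs, and start/end activities."""
    totals, follows = Counter(), Counter()
    starts, ends = Counter(), Counter()
    for trace in traces.values():
        if not trace:
            continue
        totals.update(trace)
        # Pair each activity with the one that directly follows it.
        follows.update(zip(trace, trace[1:]))
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
    return totals, follows, starts, ends

toy_totals, toy_follows, toy_starts, toy_ends = toy_count_frequencies(toy_traces)
print(toy_follows)
```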

Let us check what the frequency matrix for object type 'items' looks like.

In [None]:
print()
print('FREQUENCY MATRIX')
print()
print(freq)

Now, we can calculate the dependencies between activities. If only frequencies were considered, noise in the log could distort the model, causing underfitting, that is, a model that admits more behavior than it should. Dependency measures prevent this and are the fundamental quantities on which Causal nets rely. We use three formulas to calculate them: the first relates two different activities to one another; the second relates an activity to itself (loops); and the third relates two non-consecutive activities and helps determine whether there is a long-distance causal relation between them. Let us see the dependency matrix for the object type 'items'.

In [None]:
dep = dependency_matrix(freq)
dep_dict = dependency_dict(dep)
long = long_distance_dependency(act_total, ot_traces, or_start, or_end)
print()
print('DEPENDENCY MATRIX')
print()
print(dep)
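
As a hedged illustration of the first two measures, the sketch below uses the classic Heuristics Miner formulas. That choice is an assumption: the exact formulas behind `dependency_matrix` are defined in the paper and may differ. The long-distance measure is omitted here, and the function name and toy counts are hypothetical.

```python
def toy_dependency(follows, a, b):
    """Dependency measure in the style of the classic Heuristics Miner.

    follows maps (x, y) to how often x is directly followed by y.
    Assumed formulas (not confirmed against the notebook's code):
      a != b: (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)
      a == b: |a>a| / (|a>a| + 1)
    """
    if a == b:
        loops = follows.get((a, a), 0)
        return loops / (loops + 1)
    ab = follows.get((a, b), 0)
    ba = follows.get((b, a), 0)
    return (ab - ba) / (ab + ba + 1)

# Toy directly-follows counts (invented for illustration).
toy_counts = {
    ("pick item", "create package"): 9,
    ("create package", "pick item"): 1,
    ("pick item", "pick item"): 3,
}

print(toy_dependency(toy_counts, "pick item", "create package"))
print(toy_dependency(toy_counts, "pick item", "pick item"))
```

The `+ 1` in the denominator keeps the measure strictly below 1 and penalizes pairs observed only a few times, so a rarely seen pair cannot reach a high dependency value through noise alone.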

We keep arcs with a dependency measure of 0.95 or higher to build the dependency graph that serves as the basis for the model. The dependency graph is composed of nodes and edges: the nodes are the activities, and the edges are the arcs that connect one activity to another, annotated with the frequency and dependency measure of that relation. When building the dependency graph, we need to avoid disconnected activities in the Causal nets model. This is achieved for each object type, but it is not always possible when the model contains more than one object type, because each object type has different start and end activities. The predecessor and successor activities of each activity are selected using the dependency threshold, so an activity whose candidate arcs all fall below the threshold can end up disconnected in the Causal nets of an object type. To solve this, the next best predecessor is mined.

In [None]:
depgraph = dependency_graph(act_total, or_start, or_end, freq, dep, dep_dict, long, dependency_threshold=0.95)
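
The thresholding and the fallback for disconnected activities can be sketched roughly as below. This is one possible reading of "the next best predecessor is mined", not the logic of `dependency_graph`; the function name, its parameters, and the toy measures are assumptions.

```python
def toy_dependency_graph(dep, activities, start_activities, threshold):
    """Keep arcs at or above the threshold, then reconnect orphaned activities.

    dep maps (source, target) to a dependency measure.
    """
    edges = {pair for pair, value in dep.items() if value >= threshold}
    for act in activities:
        if act in start_activities:
            continue  # start activities legitimately have no predecessor
        if any(target == act for _, target in edges):
            continue  # already connected
        # Fallback (assumed): take the predecessor with the highest measure,
        # even though it is below the threshold.
        candidates = [
            (value, source)
            for (source, target), value in dep.items()
            if target == act and source != act
        ]
        if candidates:
            _, best_source = max(candidates)
            edges.add((best_source, act))
    return edges

# Toy dependency measures (invented for illustration).
toy_dep = {("a", "b"): 0.98, ("a", "c"): 0.80, ("b", "c"): 0.90, ("c", "d"): 0.97}
toy_edges = toy_dependency_graph(toy_dep, ["a", "b", "c", "d"], {"a"}, threshold=0.95)
print(sorted(toy_edges))
```

In this toy input, activity `c` has no incoming arc at the 0.95 threshold, so the sketch attaches it to `b`, its strongest candidate predecessor.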

Next, we use the edges of the dependency graph to derive the input and output arcs, that is, the predecessor and successor activities of each activity. By replaying the log on these arcs, we find the input and output bindings of each activity. This is the last step needed to discover the Causal nets of an object type.

In [None]:
# Generate the arcs based on the edges of the dep_graph
in_arcs = input_arcs(depgraph)
out_arcs = output_arcs(depgraph)

# Find the bindings in the incoming and outgoing arcs of the graph, using replay
cnet_outbindings = output_bindings(ot_traces, out_arcs, in_arcs)
cnet_inbindings = input_bindings(ot_traces, out_arcs, in_arcs) 
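
To give an intuition for replay, the following sketch counts output bindings with a simple nearest-predecessor heuristic: each event is attributed to the closest earlier event whose activity is one of its allowed predecessors. This heuristic is an assumption chosen for brevity and is not necessarily what `output_bindings` and `input_bindings` do; the names and toy data are hypothetical.

```python
from collections import Counter

def toy_output_binding_counts(traces, in_arcs):
    """Count (activity, set of activated successors) pairs over all traces.

    in_arcs maps an activity to the set of its predecessor activities.
    """
    counts = Counter()
    for trace in traces:
        activated = [set() for _ in trace]
        for j, act in enumerate(trace):
            preds = in_arcs.get(act, set())
            # Walk backwards to the nearest event that may have caused this one.
            for i in range(j - 1, -1, -1):
                if trace[i] in preds:
                    activated[i].add(act)
                    break
        for act, successors in zip(trace, activated):
            if successors:
                counts[(act, frozenset(successors))] += 1
    return counts

# Toy arcs and traces (invented for illustration).
toy_in_arcs = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
toy_replay_traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "b", "d"]]
toy_bindings = toy_output_binding_counts(toy_replay_traces, toy_in_arcs)
print(toy_bindings)
```

A limitation of this simplification is that each event is credited to only one predecessor, so joins where several predecessors are required together are not captured; a full replay has to handle those cases.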

Let us discover the Causal nets of the object type 'items'. 

In [None]:
activity_counts, ot_counts, mean_dict, median_dict, min_dict, max_dict = ot_act_stats(event_to_obj)
ot_nodes, ot_edges, i_seq, o_seq = ot_graph(depgraph, act_total, act_total, ot_counts, mean_dict, median_dict, min_dict, max_dict, activities, dep_dict, cnet_inbindings, cnet_outbindings, seq_i=1, seq_o=1)
ot_subgraphs_dict = {'items': (ot_nodes, ot_edges)}
print(ot_subgraphs_dict)

However, this tabular representation is not ideal. So, in the next section, we will graphically visualize the complete model for selected object types.

### Object-centric Model Visualization

We start by visualizing the 'items' object type Causal nets discovered in the previous section.

In [None]:
profile=['items']
cnets_items = all_ot_visualization(ot_activities, subgraphs, profile=profile)


In object-centric process mining, however, we are interested in the relationships between different object types. If all object types are included, the model can become cluttered and hinder the analysis. For this reason, only some object types are chosen; their Causal nets are discovered and then merged into a single model. The object types of the current process and their respective activities are listed here.

In [None]:
for k,v in ot_activities.items():
    print(f"OT '{k}': activities {v}")  
print()

To generate the object-centric model, we can define a profile, which is the list of object types to include in the visualization. If no profile is defined, the model includes all object types found in the OCEL file. In our case, the profile is ['items', 'packages']. Note that the activities {'package delivered', 'send package', 'create package', 'failed delivery'} are common to both object types. Without further processing, two such activities would be connected by more than one edge, one for each object type. To solve this, the edges are merged into a single edge in the model, and the frequency and dependency measures of each object type are shown in the color assigned to that object type, for clarity. This is done only for single bindings, as explained in the paper. Next, the algorithm removes the nodes of the merged edges from the graph. The final step is to generate the model and the legend for visualization. The chosen profile generates the following model.

In [None]:
profile = ['items', 'packages']

In [None]:
OCCN_model = all_ot_visualization(ot_activities, subgraphs, profile=profile)
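
The edge merging described above can be pictured with a small sketch that groups edges sharing the same source and target and keeps one label per object type. It is an illustration only: the edge tuples, the frequencies, and the function name are made up, and it does not reproduce the single-binding condition or the node removal performed by `all_ot_visualization`.

```python
from collections import defaultdict

# Toy edges per object type: (source, target, frequency). Values are invented.
toy_edges_by_type = {
    "items": [("create package", "send package", 10)],
    "packages": [
        ("create package", "send package", 4),
        ("send package", "package delivered", 3),
    ],
}

def toy_merge_edges(edges_by_type, profile):
    """Return {(source, target): {object type: frequency}} for the profile."""
    merged = defaultdict(dict)
    for object_type in profile:
        for source, target, frequency in edges_by_type.get(object_type, []):
            # One merged edge per activity pair, with one label per object type.
            merged[(source, target)][object_type] = frequency
    return dict(merged)

toy_merged = toy_merge_edges(toy_edges_by_type, ["items", "packages"])
for pair, labels in toy_merged.items():
    print(pair, labels)
```

Keeping the per-type labels on the merged edge is what allows the visualization to show each object type's measures in its own color.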