<a href="https://colab.research.google.com/github/matthiasweidlich/promi_course/blob/master/conformance/rules_replay.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hands-On Exercise: Rule Checking and Replay-based Conformance

In this exercise, you will again work with the real-life event log of a Dutch financial institute, already known from the earlier notebook. The code in the next cell should fetch the example data. If it does not work, you can download the event log (XES format, please unzip) [here](http://www.win.tue.nl/bpi/doku.php?id=2012:challenge) and then either copy it to your Google Drive, mount the drive, and read the file from there, or upload it directly using your browser.

Further details can be found in the [description of the dataset](http://www.win.tue.nl/bpi/doku.php?id=2012:challenge).

In [None]:
# basic configuration
%config InlineBackend.figure_format = 'svg'
%matplotlib inline

# import data from google drive
#from google.colab import drive
#drive.mount('/content/drive')

# direct data upload
#from google.colab import files
#files.upload()

# fetch the data file
! wget -O financial_log.xes https://github.com/matthiasweidlich/promi_course/blob/master/log_exploration/financial_log.xes?raw=true

## Import Event Log
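The following function imports the log file and returns it as a list of traces. Each trace is a dictionary with a trace id and the list of its events; only events with the lifecycle transition `COMPLETE` are kept.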

In [None]:
import xml.etree.ElementTree as et

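def load_xes(file, event_filter=()):
    """Parse an XES file into a list of traces.

    Only COMPLETE events are kept. If event_filter is non-empty, only
    events whose name is contained in it are kept.
    """
    log = []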
    
    tree = et.parse(file)
    data = tree.getroot()
    
    # find all traces
    traces = data.findall('{http://www.xes-standard.org/}trace')
    
    for t in traces:
        trace_id = None
        
        # get trace id
        for a in t.findall('{http://www.xes-standard.org/}string'):
            if a.attrib['key'] == 'concept:name':
                trace_id = a.attrib['value']
        
        events = []
        # events
        for event in t.iter('{http://www.xes-standard.org/}event'):
            
            e = {'name': None, 'timestamp': None, 'resource': None, 'transition': None}
            
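            for a in event:
                # keys look like 'concept:name'; keep the part after the prefix
                # ([-1] also copes with keys that carry no prefix)
                e[a.attrib['key'].split(':')[-1]] = a.attrib['value']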
            
            if e['transition'] == 'COMPLETE' and (e['name'] in event_filter or len(event_filter) == 0):
                events.append(e)
        
        # add trace to log
        if len(events) > 0:
            log.append({'trace_id': trace_id, 'events': events})
        
    return log
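
To see what the namespace-qualified lookups in `load_xes` operate on, the following sketch parses a tiny hand-written XES fragment from a string. The fragment, the trace id `case_1`, and the helper name `event_names` are made up for illustration; only the tag and attribute layout follows the function above.

```python
import xml.etree.ElementTree as et

NS = '{http://www.xes-standard.org/}'

# Hypothetical trace with two events, written by hand for illustration only.
XES_SNIPPET = """
<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case_1"/>
    <event>
      <string key="concept:name" value="A_SUBMITTED"/>
      <string key="lifecycle:transition" value="COMPLETE"/>
    </event>
    <event>
      <string key="concept:name" value="W_Afhandelen leads"/>
      <string key="lifecycle:transition" value="START"/>
    </event>
  </trace>
</log>
"""

def event_names(xml_text):
    """Return (trace id, names of COMPLETE events) for each trace."""
    root = et.fromstring(xml_text)
    result = []
    for trace in root.findall(NS + 'trace'):
        trace_id = None
        # direct children only, so the attributes of the events are skipped
        for attr in trace.findall(NS + 'string'):
            if attr.attrib['key'] == 'concept:name':
                trace_id = attr.attrib['value']
        names = []
        for event in trace.iter(NS + 'event'):
            values = {a.attrib['key']: a.attrib['value'] for a in event}
            if values.get('lifecycle:transition') == 'COMPLETE':
                names.append(values['concept:name'])
        result.append((trace_id, names))
    return result

print(event_names(XES_SNIPPET))
```

The `START` event is dropped, so the single trace contains only `A_SUBMITTED`.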

Now import the given log and compute the trace variants of the log along with their frequencies.

In [None]:
from pprint import pprint

log_file = './financial_log.xes'
log = load_xes(log_file)

print('Loaded log with %s traces.' % len(log))

trace_variants = {}
for trace in log:
    events = []
    for event in trace['events']:
        events.append(event['name'])
    trace_variants[tuple(events)] = trace_variants.get(tuple(events), 0) + 1
    
# print the four most frequent variants
trace_variants_sorted_by_freq = sorted(trace_variants.items(), key=lambda kv: kv[1], reverse=True)
pprint(trace_variants_sorted_by_freq[:4])
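
As a side note, the same counting can be sketched with `collections.Counter` from the standard library. The toy log below is hypothetical and only mimics the shape of the list returned by `load_xes`; the names `toy_log`, `toy_variants`, and `toy_sorted` are not part of the exercise.

```python
from collections import Counter

# Hypothetical toy log in the same shape as the output of load_xes.
toy_log = [
    {'trace_id': 't1', 'events': [{'name': 'a'}, {'name': 'b'}]},
    {'trace_id': 't2', 'events': [{'name': 'a'}, {'name': 'c'}]},
    {'trace_id': 't3', 'events': [{'name': 'a'}, {'name': 'b'}]},
]

# A variant is the tuple of activity names of a trace.
toy_variants = Counter(
    tuple(event['name'] for event in trace['events']) for trace in toy_log
)

# most_common() yields (variant, frequency) pairs, most frequent first.
toy_sorted = toy_variants.most_common()
print(toy_sorted)
```

Here the variant `('a', 'b')` occurs twice and `('a', 'c')` once, so `toy_sorted` has the same structure as `trace_variants_sorted_by_freq`.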


## Import Process Model

For the log, a process model is given in the form of a Petri net. Such a process model is typically created manually. For this particular example, however, the model has been discovered automatically using the Inductive Miner with a noise filtering threshold.

In [None]:
# fetch some python files to model Petri nets
! wget -O pn.py https://github.com/matthiasweidlich/promi_course/blob/master/conformance/pn.py?raw=true
%run pn.py

# fetch the actual definition file of the discovered Petri net 
! wget -O financial_log_80_noise.pnml https://github.com/matthiasweidlich/promi_course/blob/master/conformance/financial_log_80_noise.pnml?raw=true
net = PetriNet()
load(net, "./financial_log_80_noise.pnml")

# mark the initial place
net.add_marking(1,1)
# visualise it 
draw_petri_net(net)

Set up some helper dictionaries to relate transition IDs (from the Petri net) and activity labels to each other. Observe that an activity label is only assigned to a single transition. However, multiple transitions may carry a _tau_ label, representing a silent transition.



In [None]:
from pprint import pprint

# helper mappings between ids and labels
mapping = net.get_mapping()
rev_mapping = {}
for label, transition_ids in mapping.items():
    for transition_id in transition_ids:
        rev_mapping[transition_id] = label

# mapping from labels to lists of transition ids
pprint(mapping)

# mapping from transition id to label
pprint(rev_mapping)


Next, we illustrate how, given an initial marking, the currently enabled transitions may be identified, how the marking is changed by firing a transition, and how the marking may be adapted to enable a transition. 

In [None]:
# keep a copy of the initial marking so that it can be restored at the end of this cell
marking = list(net.get_marking())

print("Initial marking: ", net.get_marking())

enabled = net.all_enabled_transitions()
print("Enabled transitions in initial marking: ", 
      list(map((lambda k: rev_mapping[k]), enabled)))

# Fire enabled transition (take the first, but there is only one)
net.fire_transition(enabled[0])
enabled = net.all_enabled_transitions()
print("Enabled transitions after firing first transition: ", 
      list(map((lambda k: rev_mapping[k]), enabled)))

# Check whether the transition with label 'O_CREATED' is enabled 
# (there is only one transition carrying this label)
print("Is transition 'O_CREATED' enabled?", 
      net.is_enabled(net.get_mapping()['O_CREATED'][0]))

# Enable the transition by changing the marking and adding tokens to the input 
# places of the transition with label 'O_CREATED' 
input_places = net.get_input_places(net.get_mapping()['O_CREATED'][0])

for p in input_places:
    net.add_marking(p,1)

# Again, check whether the transition with label 'O_CREATED' is enabled 
print("Is transition 'O_CREATED' enabled after tokens have been added to the places in its preset?", 
      net.is_enabled(net.get_mapping()['O_CREATED'][0]))

# Check whether further transitions have been enabled by adding the token to 
# the places in the preset of the transition with label 'O_CREATED'
enabled = net.all_enabled_transitions()
print("Enabled transitions after adapting the marking: ", 
      list(map((lambda k: rev_mapping[k]), enabled)))

print("Current marking: ", net.get_marking())

# restore the marking that was saved at the beginning of this cell
for k, v in net.places.items():
    net.add_marking(v, marking[k])
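
The token game played in the cell above can also be sketched without `pn.py`. The following toy net is hypothetical (two transitions in sequence, all arc weights equal to 1), and the names `toy_pre`, `toy_post`, `toy_enabled`, and `toy_fire` are assumptions made for this illustration, not part of the `PetriNet` class.

```python
# Hypothetical toy net, independent of pn.py: p1 -> [t1] -> p2 -> [t2] -> p3
toy_pre = {'t1': ['p1'], 't2': ['p2']}    # input places of each transition
toy_post = {'t1': ['p2'], 't2': ['p3']}   # output places of each transition

def toy_enabled(marking, transition):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking[p] >= 1 for p in toy_pre[transition])

def toy_fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not toy_enabled(marking, transition):
        raise ValueError('transition %s is not enabled' % transition)
    successor = dict(marking)
    for p in toy_pre[transition]:
        successor[p] -= 1      # consume one token per input place
    for p in toy_post[transition]:
        successor[p] += 1      # produce one token per output place
    return successor

toy_marking = {'p1': 1, 'p2': 0, 'p3': 0}
print('Enabled initially:', [t for t in toy_pre if toy_enabled(toy_marking, t)])

toy_marking = toy_fire(toy_marking, 't1')
print('Marking after firing t1:', toy_marking)
print('Enabled afterwards:', [t for t in toy_pre if toy_enabled(toy_marking, t)])
```

Firing `t1` moves the token from `p1` to `p2`, which disables `t1` and enables `t2`.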


## Rule Checking

First, we assess the conformance of the given event log and process model using rules that are derived from the model. Specifically, we consider a cardinality rule that checks a lower and an upper bound for the number of executions of an activity for a particular trace, as well as an ordering rule that checks whether executions of one activity happen only after executions of another activity.

**Task:** Complete the following functions to check the respective rules in a rather generic manner. 

In [None]:
def check_lower_bound(trace: list, act: str, bound: int) -> bool:
    """Return True if `act` occurs at least `bound` times in `trace`."""

    ###########################
    # Your code here
    ###########################

    return False

def check_upper_bound(trace: list, act: str, bound: int) -> bool:
    """Return True if `act` occurs at most `bound` times in `trace`."""

    ###########################
    # Your code here
    ###########################

    return False

def check_order_after(trace: list, act_1: str, act_2: str) -> bool:
    """Return True if `act_1` occurs only after `act_2` in `trace`."""

    ###########################
    # Your code here
    ###########################

    return False


Check whether the five most frequent trace variants actually satisfy the following rules:


*   The application is completed at least once (activity "W_Completeren aanvraag").
*   The application is submitted at most once (activity "A_SUBMITTED").
*   The incoming lead ("W_Afhandelen leads") is handled only after the preacceptance ("A_PREACCEPTED"), never before.



In [None]:
for k in range(5):
    trace_k = list(trace_variants_sorted_by_freq[k][0])
    print("Checking trace: %s" % trace_k)
    print("Application completed at least once? ", check_lower_bound(trace_k, 'W_Completeren aanvraag', 1))
    print("Application submitted at most once? ", check_upper_bound(trace_k, 'A_SUBMITTED', 1))
    print("Incoming lead handled only after preacceptance? ", check_order_after(trace_k, 'W_Afhandelen leads', 'A_PREACCEPTED'))

## Replay-based Conformance

Next, consider replay-based conformance checking. 

**Task:** The following function shall take a Petri net and a trace and replay it. It shall return the numbers of produced, consumed, missing, and remaining tokens. 

In [None]:

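def replay_trace(net: PetriNet, trace) -> tuple:
    """Replay `trace` (a sequence of activity labels) on `net`.

    Returns the numbers of produced, consumed, missing, and remaining tokens.
    """
    # counters start at 1 to account for the initial token (produced)
    # and the final token (consumed)
    produced = 1
    consumed = 1
    missing = 0
    remaining = 0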

    ###########################
    # Your code here
    ###########################
    
    return produced, consumed, missing, remaining


def fitness(net: PetriNet, log_freq: dict) -> float:
    sum_prod = 0
    sum_cons = 0
    sum_miss = 0
    sum_rema = 0

    for trace_var, freq in log_freq.items():
        # keep copy of marking
        marking = list(net.get_marking())
        # replay trace
        produced, consumed, missing, remaining = replay_trace(net, trace_var)
        # weight the token counts by the frequency of the trace variant
        sum_prod += freq * produced
        sum_cons += freq * consumed
        sum_miss += freq * missing
        sum_rema += freq * remaining
        # restore marking
        for k,v in net.places.items():
            net.add_marking(v, marking[k])

    return 0.5 * (1 - sum_miss / sum_cons) + 0.5 * (1 - sum_rema / sum_prod)
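
Written as a formula, the value returned by `fitness` reads as follows. The symbols $p$, $c$, $m$, and $r$ are shorthand introduced here for the frequency-weighted sums of produced, consumed, missing, and remaining tokens (`sum_prod`, `sum_cons`, `sum_miss`, `sum_rema` in the code).

$$
\text{fitness} = \frac{1}{2}\left(1 - \frac{m}{c}\right) + \frac{1}{2}\left(1 - \frac{r}{p}\right)
$$

As a made-up example, $p = c = 10$ and $m = r = 1$ give $\frac{1}{2} \cdot 0.9 + \frac{1}{2} \cdot 0.9 = 0.9$. A replay without missing or remaining tokens yields a fitness of 1.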

Measure fitness of the most frequent trace variant: 

In [None]:
log_1 = {t[0]:t[1] for t in trace_variants_sorted_by_freq[0:1]}
fitness_value = fitness(net, log_1)
print("Fitness value of most frequent trace variant:", fitness_value)

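Now, see how the fitness value changes when considering the _k_ most frequent trace variants. For each _k_, the cell reports the fitness of the _k_-th most frequent variant alone and the fitness of the _k_ most frequent variants taken together.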

In [None]:
fitness_value = 0
for k in range(30):
    log_k = {t[0]:t[1] for t in trace_variants_sorted_by_freq[k:k+1]}
    log_x = {t[0]:t[1] for t in trace_variants_sorted_by_freq[0:k+1]}
    fitness_value_k = fitness(net, log_k)
    fitness_value = fitness(net, log_x)
    print("Fitness value of the single %s-most frequent trace variant: %f" % (k+1, fitness_value_k))
    print("Fitness value of %s-most frequent trace variants: %f" % (k+1, fitness_value))
