# CC4MCSLAEnriched

This notebook describes our approach to state machine abstraction and compliance checking based on the alignment technique. The approach takes an event log and a defined state machine as inputs and consists of three components: annotation, abstraction, and checker.

The first component, the annotation component, annotates the events in the event log using a knowledge base. Each event is tagged as relating to either a state or a transition, so that the next component can abstract a state machine from the log.

The second component, the abstraction component, is responsible for discovering a state machine in the event log. This component uses the annotated event log to identify the states and transitions in the log and to construct a state machine model on the basis of patterns.

The last component, the checker, compares the discovered state machine with the defined state machine to identify deviations between what happened and what is defined. This component checks for compliance between the defined state machine and the actual behavior captured in the event log.

Overall, the approach presented in this notebook provides a way to automatically abstract state machines from event logs and to check their compliance with a defined state machine. 

### Annotation

The first component uses an ontology to describe the knowledge base and to annotate the events in the event log. We use Python and the owlready2 library to define and read the ontology.

In [1]:
from owlready2 import * 
onto = get_ontology('http://test.org/onto.owl')

##### Ontology Classes Definition ##### 
with onto: 
    class stateMachine(Thing):
        pass
    class State(stateMachine):
        pass
    class Start(State):
        pass
    class Execute(State):
        pass
    class Complete(State):
        pass
    class Transition(stateMachine):
        pass
    class eventType(Thing):
        pass
    class isRelatedTo(ObjectProperty, FunctionalProperty):
        domain = [eventType]
        range  = [stateMachine]

##### Declare event types ##### Individuals
Service_Create = eventType('Service_Create', isRelatedTo=Start)
Service_Remove = eventType('Service_Remove', isRelatedTo=Start)
Service_Update = eventType('Service_Update', isRelatedTo=Start)
Container_Create = eventType('Container_Create', isRelatedTo=Execute)
Container_Destroy = eventType('Container_Destroy', isRelatedTo=Execute)
Container_Start = eventType('Container_Start', isRelatedTo=Complete)
Container_Stop = eventType('Container_Stop', isRelatedTo=Complete)
Ressource_Usage = eventType('Ressource_Usage', isRelatedTo=Transition)

onto.save(file='onto.owl', format="rdfxml")



In [2]:
print(list(onto.classes()))
print(list(onto.individuals()))
print(list(onto.object_properties()))

[onto.stateMachine, onto.State, onto.Start, onto.Execute, onto.Complete, onto.Transition, onto.eventType]
[onto.Service_Create, onto.Service_Remove, onto.Service_Update, onto.Container_Create, onto.Container_Destroy, onto.Container_Start, onto.Container_Stop, onto.Ressource_Usage]
[onto.isRelatedTo]


We then define a function that looks up an event type found in the event log and returns its ancestors in the ontology.

In [3]:
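##### Ancestors Calling #####
def search_ancestors(onto, ask):
    """Return [smElt, lcStep] for the event type named `ask`."""
    # Find the individual whose IRI ends with the event-type name
    result = onto.search(iri = "*{}".format(ask))
    # Lifecycle step: the class the event type is related to (e.g. 'Start')
    lcStep = str(result[0].isRelatedTo).split('.')[1]
    # State-machine element: the parent class of that step (e.g. 'State')
    smElt = str(result[0].isRelatedTo.is_a[0]).split('.')[1]
    # Transition events carry no lifecycle step
    if lcStep == 'Transition':
        smElt = 'Transition'
        lcStep = 'N/A'
    return [smElt, lcStep]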
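
The lookup performed by `search_ancestors` can be pictured without owlready2. The sketch below is an illustration only: the dictionaries `RELATED_TO` and `PARENT` and the function `annotate` are hypothetical stand-ins that mirror the ontology defined above; they are not part of the notebook.

```python
# Hypothetical plain-Python mirror of the ontology above (illustration only).
# RELATED_TO plays the role of the isRelatedTo property,
# PARENT plays the role of the class hierarchy (is_a).
RELATED_TO = {
    "Service_Create": "Start",
    "Service_Remove": "Start",
    "Service_Update": "Start",
    "Container_Create": "Execute",
    "Container_Destroy": "Execute",
    "Container_Start": "Complete",
    "Container_Stop": "Complete",
    "Ressource_Usage": "Transition",
}
PARENT = {
    "Start": "State",
    "Execute": "State",
    "Complete": "State",
    "State": "stateMachine",
    "Transition": "stateMachine",
}


def annotate(event_type):
    """Return [smElt, lcStep] for an event type, like search_ancestors."""
    lc_step = RELATED_TO[event_type]
    if lc_step == "Transition":
        # Transition events carry no lifecycle step
        return ["Transition", "N/A"]
    return [PARENT[lc_step], lc_step]


print(annotate("Service_Create"))   # ['State', 'Start']
print(annotate("Ressource_Usage"))  # ['Transition', 'N/A']
```

Keeping the mapping in an ontology rather than in dictionaries like these lets the knowledge base be extended or queried without touching the annotation code.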

Finally, the annotation step adds the state-machine element and the lifecycle step to each event in the event log. The result is exported as an XES file containing the annotated events with this additional metadata.

In [4]:
#### Pre-processing based on ontology ####
import pandas as pd
import pm4py

#### Import event-logs from CSV to pandas dataframe
dataframe = pd.read_csv('logs.csv', sep=',')
pd.set_option('display.width',1000)
dataframe = pm4py.format_dataframe(dataframe, case_id='Resource Name', activity_key='Event-Type', timestamp_key='Timestamp')

### Print Raw event logs
print("Raw event-logs")
print(dataframe[['Timestamp', 'Source', 'Resource Name', 'Event-Type', 'Metric', 'Value']])



Raw event-logs
                   Timestamp     Source Resource Name         Event-Type     Metric Value
0  2023-03-13 00:00:03+00:00   Provider          Auth     Service_Create   replicas     2
1  2023-03-13 00:00:04+00:00   Provider          Auth   Container_Create          /     /
2  2023-03-13 00:00:05+00:00   Provider          Auth    Container_Start          /     /
3  2023-03-13 00:00:05+00:00  Ressource          Auth    Ressource_Usage  Cpu Usage    15
4  2023-03-13 00:01:05+00:00  Ressource          Auth    Ressource_Usage  Cpu Usage    15
..                       ...        ...           ...                ...        ...   ...
73 2023-03-13 00:03:05+00:00  Ressource            UI    Ressource_Usage  Cpu Usage    15
74 2023-03-13 00:03:15+00:00  Ressource            UI    Ressource_Usage  Cpu Usage    15
75 2023-03-13 00:03:45+00:00   Provider            UI     Service_Update   replicas     0
76 2023-03-13 00:03:46+00:00   Provider            UI     Container_Stop          /  



In [5]:
#### Iterate through event logs ####
for idx, row in dataframe.iterrows():
    # Search event type in ontology and returns ancestors 
    smElt, lcStep = search_ancestors(onto, row['Event-Type'])
    dataframe.loc[[idx],'smElt'] = smElt
    dataframe.loc[[idx],'lcStep'] = lcStep

### Export as XES ###
event_log = pm4py.convert_to_event_log(dataframe)
pm4py.write_xes(event_log, 'exported.xes')

### Print Dataframe
print("Annotated logs with state-machine element (smElt) and lifecycle step (lcStep)")
print(dataframe[['Timestamp', 'Source', 'Resource Name', 'Event-Type', 'Metric', 'Value', 'smElt', 'lcStep']])


exporting log, completed traces :: 100%|██████████| 3/3 [00:00<00:00, 285.64it/s]

Annotated logs with state-machine element (smElt) and lifecycle step (lcStep)
                   Timestamp     Source Resource Name         Event-Type     Metric Value       smElt    lcStep
0  2023-03-13 00:00:03+00:00   Provider          Auth     Service_Create   replicas     2       State     Start
1  2023-03-13 00:00:04+00:00   Provider          Auth   Container_Create          /     /       State   Execute
2  2023-03-13 00:00:05+00:00   Provider          Auth    Container_Start          /     /       State  Complete
3  2023-03-13 00:00:05+00:00  Ressource          Auth    Ressource_Usage  Cpu Usage    15  Transition       N/A
4  2023-03-13 00:01:05+00:00  Ressource          Auth    Ressource_Usage  Cpu Usage    15  Transition       N/A
..                       ...        ...           ...                ...        ...   ...         ...       ...
73 2023-03-13 00:03:05+00:00  Ressource            UI    Ressource_Usage  Cpu Usage    15  Transition       N/A
74 2023-03-13 00:03:15+00:




The annotation component thus generates an annotated XES file that includes the state-machine element (smElt) and lifecycle step (lcStep) of each event. This file is then used by the next component, which is responsible for state-machine abstraction.

### Abstraction

Based on the annotated XES event log, we abstract a state machine using our defined patterns. We define several functions to identify the state-machine elements, along with a generic function that identifies a pattern of consecutive events in the rows of a pandas DataFrame.

In [6]:
#### Function Definition ####
import pm4py
import pandas as pd

from typing import List
from StateMachine import StateMachine

from datetime import timedelta

def pattern_identification(log: pd.DataFrame, pattern: List, attribute: str):
    """ 
        Return the index labels in log where pattern starts as a run of consecutive rows
        To Do: 
            - Add functionality to define patterns across several attributes
            - Enable "eventually follows" patterns 
    """
    # Rows whose attribute matches the first item of the pattern
    mask = log[attribute] == pattern[0]

    # Each following item must match the row `offset` positions further down
    for offset, item in enumerate(pattern[1:], start=1):
        mask &= log[attribute].shift(-offset) == item

    # Index labels of the rows where the whole pattern starts
    return log.index[mask]

def state_abstraction(log: pd.DataFrame):
    """
        Return states identified
    """
    ## Declare discovered state machine
    SM_Discovered = StateMachine(
        name=''
    )
    pattern=['Start', 'Execute', 'Complete']

    states_index = pattern_identification(log, pattern, 'lcStep')
    states_name = []

    for i, s in enumerate(states_index, 1):
        S_name = 'S'+str(i)
        states_name.append(S_name)
        SM_Discovered.add_state(StateMachine.state(
            name= S_name,
            type='',
            Resourcerequirements={
                log.loc[s]['Metric'] : log.loc[s]['Value']
            }
        ))
    return SM_Discovered, states_index, states_name

def state_type_abstraction(log: pd.DataFrame, State_Machine_Discovered: StateMachine):
    """
        Apply State-Type Abstraction
    """
    state_nb = len(State_Machine_Discovered.states)
    for state in State_Machine_Discovered.states:
        if state.name == 'S1':
            state.set_type('isInitial')
        elif int(state.name[1:]) < state_nb:
            state.set_type('isNormal')
        elif int(state.name[1:]) == state_nb:
            state.set_type('isFinal')
    return State_Machine_Discovered

def transition_abstraction(log: pd.DataFrame, State_Machine_Discovered: StateMachine, states_index):
    """
        Abstract transitions by combining the reconfiguration actions and the triggering events associated with the state machine
    """
    states = State_Machine_Discovered.states
    for idx, state in enumerate(states):
        if idx < (len(states) - 1):
            diff_state = int(states[idx+1].Resourcerequirements['replicas']) - int(states[idx].Resourcerequirements['replicas'])

            if diff_state > 0:
                type = 'Scale-out'
            elif diff_state < 0:
                type = 'Scale-in'
            else:
                type = 'Error'
                print('Error: State Equivalent')

            #### Get states Event
            # Set time window selected
            time_window = timedelta(minutes=1)

            # Select events in the time window before state execution
            pattern_ts = log.loc[states_index[idx+1]]['time:timestamp']
            pattern_ts_minus_tw = (pattern_ts - time_window).isoformat()
            transition_Window = log[ ( log['time:timestamp'] > pattern_ts_minus_tw) & \
                (log['time:timestamp'] < pattern_ts) & (log['smElt'] == 'Transition' )].astype({'Value': int})
            
            # Return for each metric observed a consumption average
            avg = transition_Window.groupby('Metric')['Value'].mean().to_dict()

            # Only add a transition when CPU usage was observed in the time window
            if 'Cpu Usage' in avg:
                State_Machine_Discovered.add_transition(
                    StateMachine.transition(
                        name=f"T{idx+1}",
                        source=state.name,
                        target=states[idx+1].name,
                        events=[StateMachine.event(
                                    id = 'E1',
                                    type = 'ResourceRelatedEvent',
                                    predicate = {
                                        'metric': 'Cpu Usage',
                                        'operator': '>=',
                                        'refValue': avg['Cpu Usage'],
                                        'time': str(time_window.total_seconds()) + 's'
                                    })],
                        actions=[StateMachine.action(
                            id = 'A1',
                            type = type,
                            attributes= {
                                'replicas' : abs(diff_state)
                            }
                        )]
                ))

    return State_Machine_Discovered
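
The consecutive-pattern search behind `pattern_identification` can be pictured without pandas. The following is a minimal sketch under stated assumptions: `find_consecutive_pattern` is a hypothetical helper, the list of lifecycle steps is made up for illustration, and the function returns positions in a list rather than DataFrame index labels.

```python
from typing import List, Sequence


def find_consecutive_pattern(values: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return the start positions where `pattern` occurs as consecutive items."""
    n = len(pattern)
    return [
        start
        for start in range(len(values) - n + 1)
        if list(values[start:start + n]) == list(pattern)
    ]


# Lifecycle steps of a made-up trace (illustrative values only)
lc_steps = ["Start", "Execute", "Complete", "N/A", "N/A",
            "Start", "Execute", "Complete", "N/A"]

print(find_consecutive_pattern(lc_steps, ["Start", "Execute", "Complete"]))  # [0, 5]
```

Each returned position marks the first event of one `Start`, `Execute`, `Complete` run, which is what `state_abstraction` turns into a state.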


#### Execution of Abstraction 

Using these abstraction patterns, we identify a state machine for each case in the event log and print it.


In [7]:
##### Importation of annotated event logs #####
file_path = 'exported.xes'
event_log = pm4py.read_xes(file_path)

# Filter by case
events = event_log.groupby('@@case_index')
for i, case_event_log in events:
    ##### State abstraction : Pattern 3.1 #####
    SM_Discovered, states_index, states_name = state_abstraction(case_event_log)

    ##### State-Type abstraction : Pattern 3.2 #####
    SM_Discovered = state_type_abstraction(case_event_log, SM_Discovered)

    ##### Transition abstraction : Pattern 3.3 + 3.4 #####
    SM_Discovered = transition_abstraction(case_event_log, SM_Discovered, states_index)

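    # Use a distinct name so the json module is not shadowed
    sm_json = SM_Discovered.to_json()

    print(SM_Discovered)

    # The same file is rewritten for every case, so it ends up holding the last case only
    with open("SM_discovered/SM_.json", "w") as outfile:
        outfile.write(sm_json)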

parsing log, completed traces :: 100%|██████████| 3/3 [00:00<00:00, 341.27it/s]


States: [ (S1, isInitial, {'replicas': '2'}), (S2, isNormal, {'replicas': '4'}), (S3, isNormal, {'replicas': '6'}), (S4, isFinal, {'replicas': '8'}) ]
Transitions: [ ('T1':'S1'->'S2',[(E1,ResourceRelatedEvent,{'metric': 'Cpu Usage', 'operator': '>=', 'refValue': 15.0, 'time': '60.0s'})],[(A1,Scale-out,{'replicas': 2})]), ('T2':'S2'->'S3',[(E1,ResourceRelatedEvent,{'metric': 'Cpu Usage', 'operator': '>=', 'refValue': 15.0, 'time': '60.0s'})],[(A1,Scale-out,{'replicas': 2})]), ('T3':'S3'->'S4',[(E1,ResourceRelatedEvent,{'metric': 'Cpu Usage', 'operator': '>=', 'refValue': 11.666666666666666, 'time': '60.0s'})],[(A1,Scale-out,{'replicas': 2})]) ]
States: [ (S1, isInitial, {'replicas': '2'}), (S2, isFinal, {'replicas': '8'}) ]
Transitions: [ ('T1':'S1'->'S2',[(E1,ResourceRelatedEvent,{'metric': 'Cpu Usage', 'operator': '>=', 'refValue': 15.0, 'time': '60.0s'})],[(A1,Scale-out,{'replicas': 6})]) ]
States: [ (S1, isInitial, {'replicas': '2'}), (S2, isNormal, {'replicas': '4'}), (S3, isFinal,
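
The action type of each transition in the output above follows from the difference in replicas between consecutive states. Below is a minimal sketch of that rule, assuming the replica counts are already available as a list of integers. `classify_transitions` is a hypothetical helper; unlike `transition_abstraction`, it ignores the resource usage events observed in the time window.

```python
from typing import List, Tuple


def classify_transitions(replicas: List[int]) -> List[Tuple[str, int]]:
    """Label each step between consecutive states by its replica difference."""
    result = []
    for current, following in zip(replicas, replicas[1:]):
        diff = following - current
        if diff > 0:
            kind = "Scale-out"
        elif diff < 0:
            kind = "Scale-in"
        else:
            kind = "Error"  # two consecutive states with the same replicas
        result.append((kind, abs(diff)))
    return result


# Replica counts taken from the first two state machines printed above
print(classify_transitions([2, 4, 6, 8]))  # [('Scale-out', 2), ('Scale-out', 2), ('Scale-out', 2)]
print(classify_transitions([2, 8]))        # [('Scale-out', 6)]
```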

With the discovered state machine in hand, we can now check its conformance against the defined state machine.

### Alignment

This final section implements an alignment algorithm that identifies deviations between the defined state machine and the discovered state machine.

In [8]:
""" 
    Checker 
""" 
import matplotlib.pyplot as plt
import json
import networkx as nx

def to_graph(SM):
    SM_disc = nx.DiGraph()
    for state in SM['_StateMachine__states']:
        SM_disc.add_node(state['_state__name'], replicas = state['_state__Resourcerequirements']['replicas'], type=state['_state__type'])
    for transition in SM['_StateMachine__transitions']:
        SM_disc.add_edge(transition['_transition__source'], transition['_transition__target'], 
                         name=transition['_transition__name'], events=transition['_transition__events'], actions=transition['_transition__actions'])
    return SM_disc

def get_initial_nodes(graph):
    return [n for n,d in graph.in_degree() if d==0]

def get_final_nodes(graph):
    return [n for n,d in graph.out_degree() if d==0]

### Discovered state-machine
with open("SM_discovered/SM_.json") as f:
    SM_Disc = to_graph(json.load(f))

### Defined state-machine
with open("SM_Defined.json") as f:
    SM_Def = to_graph(json.load(f))

print("### Discovered state-machine ###")
print(SM_Disc.nodes)
print(SM_Disc.edges)

print("### Defined state-machine ###")
print(SM_Def.nodes)
print(SM_Def.edges)


### Discovered state-machine ###
['S1', 'S2', 'S3']
[('S1', 'S2'), ('S2', 'S3')]
### Defined state-machine ###
['S1', 'S2', 'S3']
[('S1', 'S2'), ('S2', 'S3')]


#### Search Space Construction

The alignment technique is utilized to identify deviations by attempting to align the defined state-machine and the discovered state-machine. The process of searching for the optimal alignment is carried out in two steps: first, constructing a search space that represents all possible moves; and second, searching for the optimal path within this search space. 

In [13]:
### Search Space construction
SS = nx.DiGraph()
SS.add_node(0, weight=0)

for i, (eltx, elty) in enumerate(zip(SM_Disc.nodes, SM_Def.nodes)):
    temp_last_nodes = get_final_nodes(SS)
    # e = epsilon to guarantee end
    e = i*0.1
    # Compare the discovered state (eltx) with the defined state (elty)
    if SM_Disc.nodes[eltx]['replicas'] == SM_Def.nodes[elty]['replicas']:
        # State equivalent: synchronous move
        sync = str([eltx,elty])
        SS.add_node(sync, weight=1+e)
        SS.add_edges_from((node, sync) for node in temp_last_nodes)
    else:
        # States differ: one move on each side ('>>' marks the skipped side)
        move_disc, move_def = str([eltx,'>>']), str(['>>',elty])
        SS.add_node(move_disc, weight=5+e)
        SS.add_node(move_def, weight=5+e)
        SS.add_edges_from((node, move_disc) for node in temp_last_nodes)
        SS.add_edges_from((node, move_def) for node in temp_last_nodes)

print(SS.nodes)
print(SS.edges)

[0, "['S1', 'S1']", "['S2', 'S2']", "['S3', 'S3']"]
[(0, "['S1', 'S1']"), ("['S1', 'S1']", "['S2', 'S2']"), ("['S2', 'S2']", "['S3', 'S3']")]
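
In the run above, all states match, so only synchronous moves appear. To show what the search space offers when a state differs, here is a small sketch without networkx. It is an illustration under assumptions: `build_moves` is a hypothetical helper, states are reduced to their replica counts, and the replica values are made up. The costs 1 and 5 and the epsilon `i * 0.1` are the ones used in the cell above.

```python
SYNC_COST = 1      # cost used above for a matching pair of states
MISMATCH_COST = 5  # cost used above for a move on one side only
SKIP = ">>"        # marks the side that does not move


def build_moves(discovered, defined):
    """For each position, list the possible moves as ((left, right), cost)."""
    steps = []
    for i, (rep_disc, rep_def) in enumerate(zip(discovered, defined)):
        e = i * 0.1  # same epsilon as in the cell above
        name = f"S{i + 1}"
        if rep_disc == rep_def:
            # Both machines agree: one synchronous move
            steps.append([((name, name), SYNC_COST + e)])
        else:
            # Disagreement: a move on the discovered side or on the defined side
            steps.append([((name, SKIP), MISMATCH_COST + e),
                          ((SKIP, name), MISMATCH_COST + e)])
    return steps


# Made-up replica counts: the third state differs between the two machines
for step in build_moves([2, 4, 6], [2, 4, 8]):
    print(step)
```

With these inputs the first two positions each offer a single synchronous move, while the third offers two alternative moves at the higher cost.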


In [14]:
### Identify starting and ending nodes of the search space
starting_nodes = get_initial_nodes(SS)
ending_nodes = get_final_nodes(SS)

# Compute the worst possible alignment
y_worst_sum = ((len(SM_Def.nodes) * 2 ) * 5)

### Compute the cost of an identified alignment 
results_path = []
for s in starting_nodes:
    for e in ending_nodes:
        y_optimal_cost = 0
        path = nx.astar_path(SS, s, e)
        for elt in path: y_optimal_cost+=SS.nodes[elt]['weight']
        results_path.append((path, y_optimal_cost))
        fitnessValue = 1 - y_optimal_cost / y_worst_sum

print("Report : ")
print(f"Alignment : {results_path}")
print(f"Y_Optimal : {y_optimal_cost}")
print(f"FitnessValue : {fitnessValue}")

Report : 
Alignment : [([0, "['S1', 'S1']", "['S2', 'S2']", "['S3', 'S3']"], 3.3)]
Y_Optimal : 3.3
FitnessValue : 0.89
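
The reported values can be reproduced by hand from the node weights. The sketch below restates the arithmetic of the cell above; `fitness` is a hypothetical helper introduced only for this illustration.

```python
def fitness(path_weights, n_defined_states, mismatch_cost=5):
    """Fitness = 1 - (cost of the found path) / (cost of the worst alignment)."""
    optimal_cost = sum(path_weights)
    # Worst case as computed above: two mismatch moves per defined state
    worst_cost = n_defined_states * 2 * mismatch_cost
    return 1 - optimal_cost / worst_cost


# Weights along the reported path: node 0, then three matching pairs (1 + i * 0.1)
weights = [0, 1 + 0 * 0.1, 1 + 1 * 0.1, 1 + 2 * 0.1]

print(round(sum(weights), 2))         # 3.3
print(round(fitness(weights, 3), 2))  # 0.89
```

A path made only of synchronous moves still has a non-zero cost, so with this definition the fitness stays below 1 even when the two state machines agree on every state.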
