# CC4MCSLAEnriched

This notebook presents the implementation of our approach, which is based on three components: the annotation, the abstraction, and the checker.

## Annotation ##

First, we define the ontology using owlready2.

In [1]:
from owlready2 import * 
onto = get_ontology('http://test.org/onto.owl')

##### Ontology Definition #####
with onto: 
    class stateMachine(Thing):
        pass
    class State(stateMachine):
        pass
    class Start(State):
        pass
    class Execute(State):
        pass
    class Complete(State):
        pass
    class Transition(stateMachine):
        pass
    class eventType(Thing):
        pass
    class isRelatedTo(ObjectProperty, FunctionalProperty):
        domain = [eventType]
        range  = [stateMachine]

##### Declare event types
Service_Create = eventType('Service_Create', isRelatedTo=Start)
Service_Remove = eventType('Service_Remove', isRelatedTo=Start)
Service_Update = eventType('Service_Update', isRelatedTo=Start)
Container_Create = eventType('Container_Create', isRelatedTo=Execute)
Container_Destroy = eventType('Container_Destroy', isRelatedTo=Execute)
Container_Start = eventType('Container_Start', isRelatedTo=Complete)
Container_Stop = eventType('Container_Stop', isRelatedTo=Complete)
Ressource_Usage = eventType('Ressource_Usage', isRelatedTo=Transition)

onto.save(file='onto.owl', format="rdfxml")





Then, we define a function that finds the ancestors of the event type identified in the event logs.

In [2]:
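##### Ancestors Calling #####
def search_ancestors(onto, ask):
    """Return [state-machine element, lifecycle step] for the event type `ask`."""
    # Find the individual whose IRI ends with the event type name
    result = onto.search(iri = "*{}".format(ask))
    # Class the event type is related to, e.g. 'onto.Start' -> 'Start'
    lcStep = str(result[0].isRelatedTo).split('.')[1]
    # Direct parent of that class, e.g. 'onto.State' -> 'State'
    smElt = str(result[0].isRelatedTo.is_a[0]).split('.')[1]
    # A transition has no lifecycle step: report the element itself
    if lcStep == 'Transition':
        smElt = 'Transition'
        lcStep = 'N/A'
    return [smElt, lcStep]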
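
To make the expected return values concrete, the lookup can be mimicked with plain dictionaries. This is only an illustrative stand-in, not the owlready2 query: `EVENT_TO_STEP`, `STEP_PARENT` and `lookup_ancestors` are hypothetical names, and the mapping is copied from the event types declared in the ontology.

```python
# Illustrative stand-in for the ontology lookup (all names are hypothetical).
# Mapping copied from the event-type declarations of the ontology.
EVENT_TO_STEP = {
    "Service_Create": "Start",
    "Service_Remove": "Start",
    "Service_Update": "Start",
    "Container_Create": "Execute",
    "Container_Destroy": "Execute",
    "Container_Start": "Complete",
    "Container_Stop": "Complete",
    "Ressource_Usage": "Transition",
}

# Direct parent class of each related class in the ontology
STEP_PARENT = {
    "Start": "State",
    "Execute": "State",
    "Complete": "State",
    "Transition": "stateMachine",
}

def lookup_ancestors(event_type):
    """Return [state-machine element, lifecycle step] for an event type."""
    lc_step = EVENT_TO_STEP[event_type]
    sm_elt = STEP_PARENT[lc_step]
    # Same special case as in the notebook: transitions have no lifecycle step
    if lc_step == "Transition":
        sm_elt, lc_step = "Transition", "N/A"
    return [sm_elt, lc_step]

print(lookup_ancestors("Service_Create"))   # ['State', 'Start']
print(lookup_ancestors("Ressource_Usage"))  # ['Transition', 'N/A']
```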

Finally, we annotate each event of the log with its state-machine element and lifecycle step, and export the annotated events as an XES file.

In [4]:
#### Pre-processing based on ontology ####
import pandas as pd
import pm4py

#### Import event-logs from CSV to pandas dataframe
dataframe = pd.read_csv('Simulated_logs.csv', sep=',')
pd.set_option('display.width',1000)
dataframe = pm4py.format_dataframe(dataframe, case_id='Resource Name', activity_key='Event-Type', timestamp_key='Timestamp')

### Print Raw event logs
print("Raw event-logs")
print(dataframe)

Raw event-logs
       Unnamed: 0                 Timestamp     Source Resource Name         Event-Type     Metric Value case:concept:name       concept:name            time:timestamp  @@index  @@case_index
0           28412 2023-03-17 17:14:16+00:00   Provider   Service-105     Service_Create   replicas     1       Service-105     Service_Create 2023-03-17 17:14:16+00:00        0             0
1           28413 2023-03-17 17:14:17+00:00   Provider   Service-105   Container_Create          /     /       Service-105   Container_Create 2023-03-17 17:14:17+00:00        1             0
2           28414 2023-03-17 17:14:18+00:00   Provider   Service-105    Container_Start          /     /       Service-105    Container_Start 2023-03-17 17:14:18+00:00        2             0
3           28415 2023-03-17 17:14:50+00:00  Ressource   Service-105    Ressource_Usage  Cpu Usage    81       Service-105    Ressource_Usage 2023-03-17 17:14:50+00:00        3             0
4           28416 2023-03-17 1



In [5]:
#### Iterate through event logs ####
for idx, row in dataframe.iterrows():
    # Search event type in ontology and returns ancestors 
    smElt, lcStep = search_ancestors(onto, row['Event-Type'])
    dataframe.loc[[idx],'smElt'] = smElt
    dataframe.loc[[idx],'lcStep'] = lcStep

### Export as XES ###
event_log = pm4py.convert_to_event_log(dataframe)
pm4py.write_xes(event_log, 'exported.xes')

### Print Dataframe
print(dataframe)

exporting log, completed traces :: 100%|██████████| 45/45 [00:03<00:00, 13.17it/s]

       Unnamed: 0                 Timestamp     Source Resource Name         Event-Type     Metric Value case:concept:name       concept:name            time:timestamp  @@index  @@case_index       smElt    lcStep
0           28412 2023-03-17 17:14:16+00:00   Provider   Service-105     Service_Create   replicas     1       Service-105     Service_Create 2023-03-17 17:14:16+00:00        0             0       State     Start
1           28413 2023-03-17 17:14:17+00:00   Provider   Service-105   Container_Create          /     /       Service-105   Container_Create 2023-03-17 17:14:17+00:00        1             0       State   Execute
2           28414 2023-03-17 17:14:18+00:00   Provider   Service-105    Container_Start          /     /       Service-105    Container_Start 2023-03-17 17:14:18+00:00        2             0       State  Complete
3           28415 2023-03-17 17:14:50+00:00  Ressource   Service-105    Ressource_Usage  Cpu Usage    81       Service-105    Ressource_Usage 2023-0
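
Because the log contains only a handful of distinct event types, the ontology search can be resolved once per type and reused, rather than repeated for every row. The sketch below shows the idea on plain Python rows. `slow_lookup` is a hypothetical stand-in for `search_ancestors`, and the sample rows are illustrative.

```python
from functools import lru_cache

calls = []  # records how often the underlying lookup really runs

def slow_lookup(event_type):
    """Hypothetical stand-in for search_ancestors(onto, event_type)."""
    calls.append(event_type)
    table = {
        "Service_Create": ("State", "Start"),
        "Container_Create": ("State", "Execute"),
        "Container_Start": ("State", "Complete"),
        "Ressource_Usage": ("Transition", "N/A"),
    }
    return table[event_type]

@lru_cache(maxsize=None)
def cached_lookup(event_type):
    # The first call per event type runs the lookup, later calls hit the cache
    return slow_lookup(event_type)

rows = [{"Event-Type": t} for t in [
    "Service_Create", "Container_Create", "Container_Start",
    "Ressource_Usage", "Ressource_Usage", "Service_Create",
]]

for row in rows:
    row["smElt"], row["lcStep"] = cached_lookup(row["Event-Type"])

print(rows[0])
print(len(calls))  # 4 distinct event types, so 4 real lookups for 6 rows
```

With a pandas dataframe, a similar effect could be obtained by resolving the unique values of the `Event-Type` column first and mapping the result back onto the column.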




## Abstraction ##

Based on the annotated event logs, we abstract the state machine using our defined patterns. We define several functions that identify the state-machine elements, as well as a generic function that identifies a pattern of events across the rows of a pandas dataframe.
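
Pattern identification amounts to finding where a sequence of values occurs on consecutive rows. A minimal sketch on a plain list, where `find_pattern` is a hypothetical helper and the lifecycle steps are illustrative:

```python
def find_pattern(values, pattern):
    """Return the positions where `pattern` starts as a run of consecutive items."""
    n = len(pattern)
    return [i for i in range(len(values) - n + 1) if values[i:i + n] == pattern]

# Illustrative lifecycle steps of one case, in event order
lc_steps = ["Start", "Execute", "Complete", "N/A", "N/A",
            "Start", "Execute", "Complete", "N/A"]

print(find_pattern(lc_steps, ["Start", "Execute", "Complete"]))  # [0, 5]
```

Each returned position marks the first event of a matched state, which is the role played by the index values returned for the dataframe.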

In [6]:
#### Function Definition ####
import pm4py
import pandas as pd

from typing import List
from StateMachine import StateMachine

from datetime import timedelta

def pattern_identification(log: pd.DataFrame, pattern: List, attribute: str):
    """ 
        Return the index of the rows of log where the pattern starts.
        The pattern must match consecutive rows of the given attribute.
        To Do: 
            - Add functionality to define patterns across several attributes
            - Enable "eventually follows" patterns
    """
    # Rows matching the first item of the pattern
    mask = log[attribute] == pattern[0]

    # Each following item must match the row located `offset` positions later
    for offset in range(1, len(pattern)):
        mask &= log[attribute].shift(-offset) == pattern[offset]

    # Index labels of the rows where the whole pattern starts
    return log.index[mask]

def state_abstraction(log: pd.DataFrame):
    """
        Return states identified
    """
    ## Declare discovered state machine
    SM_Discovered = StateMachine(
        name=''
    )
    pattern=['Start', 'Execute', 'Complete']

    states_index = pattern_identification(log, pattern, 'lcStep')
    states_name = []

    for i, s in enumerate(states_index, 1):
        S_name = 'S'+str(i)
        states_name.append(S_name)
        SM_Discovered.add_state(StateMachine.state(
            name= S_name,
            type='',
            Resourcerequirements={
                log.loc[s]['Metric'] : log.loc[s]['Value']
            }
        ))
    return SM_Discovered, states_index, states_name

def state_type_abstraction(log: pd.DataFrame, State_Machine_Discovered: StateMachine):
    """
        Apply State-Type Abstraction
    """
    state_nb = len(State_Machine_Discovered.states)
    for state in State_Machine_Discovered.states:
        if state.name == 'S1':
            state.set_type('isInitial')
        elif int(state.name[1:]) < state_nb:
            state.set_type('isNormal')
        elif int(state.name[1:]) == state_nb:
            state.set_type('isFinal')
    return State_Machine_Discovered

def transition_abstraction(log: pd.DataFrame, State_Machine_Discovered: StateMachine, states_index):
    """
        Abstract the transitions by combining the reconfiguration action and the triggering event associated with the state machine
    """
    states = State_Machine_Discovered.states
    for idx, state in enumerate(states):
        if idx < (len(states) - 1):
            diff_state = int(states[idx+1].Resourcerequirements['replicas']) - int(states[idx].Resourcerequirements['replicas'])

            if diff_state > 0:
                type = 'Scale-out'
            elif diff_state < 0:
                type = 'Scale-in'
            else:
                type = 'Error'
                print('Error: State Equivalent')

            #### Get states Event
            # Set time window selected
            time_window = timedelta(minutes=1)

            # Select events in the time window before state execution
            pattern_ts = log.loc[states_index[idx+1]]['time:timestamp']
            pattern_ts_minus_tw = (pattern_ts - time_window).isoformat()
            transition_Window = log[ ( log['time:timestamp'] > pattern_ts_minus_tw) & \
                (log['time:timestamp'] < pattern_ts) & (log['smElt'] == 'Transition' )].astype({'Value': int})
            
            # Return for each metric observed a consumption average
            avg = transition_Window.groupby('Metric')['Value'].mean().to_dict()

            # Only add a transition when a CPU usage average was observed in the window
            if 'Cpu Usage' in avg:
                State_Machine_Discovered.add_transition(
                    StateMachine.transition(
                        name=f"T{idx+1}",
                        source=state.name,
                        target=states[idx+1].name,
                        events=[StateMachine.event(
                                    id = 'E1',
                                    type = 'ResourceRelatedEvent',
                                    predicate = {
                                        'metric': 'Cpu Usage',
                                        'operator': '>=',
                                        'refValue': avg['Cpu Usage'],
                                        'time': str(time_window.total_seconds()) + 's'
                                    })],
                        actions=[StateMachine.action(
                            id = 'A1',
                            type = type,
                            attributes= {
                                'replicas' : abs(diff_state)
                            }
                        )]
                ))

    return State_Machine_Discovered
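
The two ingredients of a transition, the scaling action and the averaged triggering metric, can be sketched without pandas. `classify_action` and `window_average` are hypothetical helpers, and the timestamps and values are illustrative. The window uses strict bounds, as in the function above.

```python
from datetime import datetime, timedelta

def classify_action(replicas_before, replicas_after):
    """Return the action type and the number of replicas added or removed."""
    diff = replicas_after - replicas_before
    if diff > 0:
        return "Scale-out", diff
    if diff < 0:
        return "Scale-in", -diff
    return "Error", 0  # both states require the same number of replicas

def window_average(events, end, window=timedelta(minutes=1)):
    """Average the values of the events with end - window < timestamp < end."""
    values = [value for ts, value in events if end - window < ts < end]
    return sum(values) / len(values) if values else None

# Illustrative CPU usage events observed before the target state starts
end = datetime(2023, 3, 17, 17, 15, 30)
cpu_events = [
    (datetime(2023, 3, 17, 17, 14, 20), 40),  # outside the one-minute window
    (datetime(2023, 3, 17, 17, 14, 50), 81),
    (datetime(2023, 3, 17, 17, 15, 10), 79),
]

print(classify_action(1, 20))           # ('Scale-out', 19)
print(window_average(cpu_events, end))  # 80.0
```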


Execution of Abstraction 

In [7]:
##### Importation of annotated event logs #####
file_path = 'exported.xes'
event_log = pm4py.read_xes(file_path)

# Filter by case
events = event_log.groupby('@@case_index')
for i, case_event_log in events:
    ##### State abstraction : Pattern 3.1 #####
    SM_Discovered, states_index, states_name = state_abstraction(case_event_log)
    
    print("##### State abstraction : Pattern 3.1 #####")
    print(SM_Discovered)
    print(" ")

    ##### State-Type abstraction : Pattern 3.2 #####
    SM_Discovered = state_type_abstraction(case_event_log, SM_Discovered)

    print("##### State-Type abstraction : Pattern 3.2 #####")
    print(SM_Discovered)
    print(" ")

    ##### Transition abstraction : Pattern 3.3 + 3.4 #####
    SM_Discovered = transition_abstraction(case_event_log, SM_Discovered, states_index)

    print("##### Transition abstraction : Pattern 3.3 + 3.4 #####")
    print(SM_Discovered)
    print(" ")

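    # Use a distinct name to avoid shadowing the json module
    sm_json = SM_Discovered.to_json()

    # Every case writes to the same file, so only the last case is kept on disk
    with open("SM_discovered/SM_.json", "w") as outfile:
        outfile.write(sm_json)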

parsing log, completed traces :: 100%|██████████| 45/45 [00:04<00:00,  9.52it/s]


##### State abstraction : Pattern 3.1 #####
States: [ (S1, , {'replicas': '1'}), (S10, , {'replicas': '24'}), (S11, , {'replicas': '0'}), (S12, , {'replicas': '7'}), (S13, , {'replicas': '20'}), (S14, , {'replicas': '15'}), (S15, , {'replicas': '8'}), (S16, , {'replicas': '22'}), (S17, , {'replicas': '0'}), (S18, , {'replicas': '25'}), (S19, , {'replicas': '20'}), (S2, , {'replicas': '20'}), (S20, , {'replicas': '19'}), (S21, , {'replicas': '10'}), (S22, , {'replicas': '0'}), (S23, , {'replicas': '6'}), (S24, , {'replicas': '19'}), (S25, , {'replicas': '7'}), (S26, , {'replicas': '15'}), (S27, , {'replicas': '7'}), (S28, , {'replicas': '14'}), (S29, , {'replicas': '24'}), (S3, , {'replicas': '0'}), (S30, , {'replicas': '15'}), (S31, , {'replicas': '19'}), (S32, , {'replicas': '21'}), (S33, , {'replicas': '24'}), (S34, , {'replicas': '0'}), (S35, , {'replicas': '17'}), (S36, , {'replicas': '5'}), (S37, , {'replicas': '14'}), (S38, , {'replicas': '2'}), (S39, , {'replicas': '6'}), (S4, ,

## Alignment ##

In this last section, we implement an alignment algorithm that identifies the deviations between the defined state machine and the observed (discovered) state machine.

In [8]:
""" 
    Checker 
""" 
import matplotlib.pyplot as plt
import json
import networkx as nx

def to_graph(SM):
    SM_disc = nx.DiGraph()
    for state in SM['_StateMachine__states']:
        SM_disc.add_node(state['_state__name'], replicas = state['_state__Resourcerequirements']['replicas'], type=state['_state__type'])
    for transition in SM['_StateMachine__transitions']:
        SM_disc.add_edge(transition['_transition__source'], transition['_transition__target'], 
                         name=transition['_transition__name'], events=transition['_transition__events'], actions=transition['_transition__actions'])
    return SM_disc

def get_initial_nodes(graph):
    return [n for n,d in graph.in_degree() if d==0]

def get_final_nodes(graph):
    return [n for n,d in graph.out_degree() if d==0]

### Discovered state-machine
with open("SM_discovered/SM_.json") as f:
    SM_Disc = to_graph(json.load(f))

### Defined state-machine
with open("SM_Defined.json") as f:
    SM_Def = to_graph(json.load(f))

print("### Discovered state-machine ###")
print(SM_Disc.nodes)
print(SM_Disc.edges)

print("### Defined state-machine ###")
print(SM_Def.nodes)
print(SM_Def.edges)


### Discovered state-machine ###
['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S20', 'S21', 'S22', 'S23', 'S24', 'S25', 'S26', 'S27', 'S28', 'S29', 'S30', 'S31', 'S32', 'S33', 'S34', 'S35', 'S36', 'S37', 'S38', 'S39', 'S40', 'S41', 'S42', 'S43', 'S44', 'S45', 'S46', 'S47', 'S48', 'S49', 'S50', 'S51', 'S52', 'S53', 'S54', 'S55', 'S56', 'S57', 'S58', 'S59', 'S60', 'S61', 'S62', 'S63', 'S64', 'S65', 'S66', 'S67', 'S68', 'S69', 'S70', 'S71', 'S72', 'S73', 'S74', 'S75', 'S76', 'S77', 'S78', 'S79', 'S80', 'S81', 'S82', 'S83', 'S84', 'S85', 'S86', 'S87', 'S88', 'S89', 'S90', 'S91', 'S92', 'S93', 'S94', 'S95', 'S96', 'S97', 'S98', 'S99', 'S100', 'S101', 'S102', 'S103', 'S104', 'S105', 'S106', 'S107', 'S108', 'S109', 'S110', 'S111', 'S112', 'S113', 'S114', 'S115', 'S116', 'S117', 'S118', 'S119']
[('S2', 'S3'), ('S3', 'S4'), ('S4', 'S5'), ('S5', 'S6'), ('S6', 'S7'), ('S7', 'S8'), ('S8', 'S9'), ('S9', 'S10'), ('S11', 
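
Initial and final nodes are derived purely from node degrees, so a state without any abstracted transition counts as both. In the edges printed above, the first edge listed starts at `S2`, which suggests that no edge leaves `S1`. The sketch below reproduces this situation on plain lists; `initial_and_final` is a hypothetical helper and the small graph is illustrative.

```python
def initial_and_final(nodes, edges):
    """Return the nodes without incoming edges and the nodes without outgoing edges."""
    has_incoming = {target for _, target in edges}
    has_outgoing = {source for source, _ in edges}
    initial = [n for n in nodes if n not in has_incoming]
    final = [n for n in nodes if n not in has_outgoing]
    return initial, final

nodes = ["S1", "S2", "S3", "S4"]
# Assumption: no transition was abstracted between S1 and S2
edges = [("S2", "S3"), ("S3", "S4")]

initial, final = initial_and_final(nodes, edges)
print(initial)  # ['S1', 'S2']
print(final)    # ['S1', 'S4']
```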

Search Space Construction

In [9]:
### Search Space construction
SS = nx.DiGraph()
SS.add_node(0, weight=0)

for i, (eltx, elty) in enumerate(zip(SM_Disc.nodes, SM_Def.nodes)):
    temp_last_nodes = get_final_nodes(SS)
    # e = epsilon to guarantee end
    e = i*0.1
    # Compare the discovered state (eltx) with the defined state (elty)
    if SM_Disc.nodes[eltx]['replicas'] == SM_Def.nodes[elty]['replicas']:
        # State equivalent 
        SS.add_node(str([eltx,elty]), weight=1+e)
        for node in temp_last_nodes:
            SS.add_edge(node, str([eltx,elty]))
    else:
        # Mismatch: one move on the discovered side and one on the defined side
        SS.add_node(str([eltx,'>>']), weight=5+e)
        SS.add_node(str(['>>',elty]), weight=5+e)
        for node in temp_last_nodes:
            SS.add_edge(node, str([eltx,'>>']))
            SS.add_edge(node, str(['>>',elty]))

### Identify starting and ending nodes of the search space
starting_nodes = get_initial_nodes(SS)
ending_nodes = get_final_nodes(SS)

# Compute the worst possible alignment
y_worst_sum = ((len(SM_Def.nodes) * 2 ) * 5)

### Compute the cost of an identified alignment 
# The edges carry no 'weight' attribute, so A* counts every edge as 1;
# the node weights only enter through the sum below.
results_path = []
for s in starting_nodes:
    for e in ending_nodes:
        path = nx.astar_path(SS, s, e)
        y_optimal_cost = sum(SS.nodes[elt]['weight'] for elt in path)
        results_path.append((path, y_optimal_cost))
        fitnessValue = 1 - y_optimal_cost / y_worst_sum

print("Report : ")
print(f"Path : {results_path}")
print(f"Y_Optimal : {y_optimal_cost}")
print(f"FitnessValue : {fitnessValue}")

Report : 
Path : [([0, "['S1', '>>']", "['S2', '>>']", "['S3', '>>']"], 15.3), ([0, "['S1', '>>']", "['S2', '>>']", "['>>', 'S3']"], 15.3)]
Y_Optimal : 15.3
FitnessValue : 0.49
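
The reported values can be checked by hand. The sketch below assumes three aligned steps that are all mismatches, as in the reported paths, and a defined state machine with three states. The names `MISMATCH_COST` and `n_defined_states` are introduced here for illustration only.

```python
# Assumptions taken from the report above: three steps, all of them mismatches
MISMATCH_COST = 5
n_defined_states = 3

# Cost of step i is 5 + epsilon, with epsilon = i * 0.1
step_costs = [MISMATCH_COST + i * 0.1 for i in range(n_defined_states)]

# The root node of the search space has weight 0
y_optimal = 0 + sum(step_costs)

# Worst case as computed in the notebook: (number of defined states * 2) * 5
y_worst = n_defined_states * 2 * MISMATCH_COST

fitness = 1 - y_optimal / y_worst

print(round(y_optimal, 2))  # 15.3
print(y_worst)              # 30
print(round(fitness, 2))    # 0.49
```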
