# Simple Log Analysis with Declare4Py

This tutorial shows how to perform simple log analysis with Declare4Py.

After importing the Declare4Py package and specifying the path of the log, instantiate a `Declare4Py` object.

In [None]:
import os
from declare4py.declare4py import Declare4Py


log_path = os.path.join("..", "tests", "Sepsis Cases.xes.gz")

d4py = Declare4Py()

The next step is to parse the log with the `parse_xes_log` function. Logs can be passed in either the `.xes` or the `.xes.gz` format.

In [None]:
d4py.parse_xes_log(log_path)

Declare4Py offers several facilities for simple log indexing and analysis:

In [None]:
# Return the number of cases in the log
print(f"Number of cases: {d4py.get_log_length()}")
print("--------------------------------------")

# Return the ids of the cases in the log
print(f"Cases ids:\n{d4py.get_trace_keys()}")
print("--------------------------------------")

# Return the names of the activities in the log
print(f"Activity alphabet:\n{d4py.get_log_alphabet_activities()}")
print("--------------------------------------")

# Return the names of the resources in the log
print(f"Resource alphabet:\n{d4py.get_log_alphabet_payload()}")
print("--------------------------------------")

A log is a complex data structure that can be explored along several dimensions. The functions `activities_log_projection` and `resources_log_projection` project the cases in the log onto the activity and resource dimensions, respectively. Each projection is a list (the log) of lists (the individual cases) containing the names of the activities or resources, in the order in which the events occur in the case.

In [None]:
# Activity projection
for idx, trace in enumerate(d4py.activities_log_projection()):
    print(f"{idx}- {trace}")
print("--------------------------------------")

# Resource projection
for idx, trace in enumerate(d4py.resources_log_projection()):
    print(f"{idx}- {trace}")
print("--------------------------------------")
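To make the shape of a projection concrete, the sketch below builds one by hand from a toy log. It is not Declare4Py's implementation: the `toy_log` structure, the `project` helper, and the sample values are hypothetical. Only the attribute keys `concept:name` and `org:group` are taken from the XES standard extensions, and using `org:group` as the resource attribute is an assumption.

```python
# Toy log: each case is a list of events, each event a dict of attributes.
# The structure and values are illustrative, not Declare4Py's internal format.
toy_log = [
    [
        {"concept:name": "ER Registration", "org:group": "A"},
        {"concept:name": "ER Triage", "org:group": "C"},
    ],
    [
        {"concept:name": "ER Registration", "org:group": "A"},
        {"concept:name": "Leucocytes", "org:group": "B"},
        {"concept:name": "ER Triage", "org:group": "C"},
    ],
]


def project(log, key):
    """Return a list (the log) of lists (the cases) with one attribute per event."""
    return [[event[key] for event in case] for case in log]


activities = project(toy_log, "concept:name")
resources = project(toy_log, "org:group")  # assumed resource attribute

for idx, trace in enumerate(activities):
    print(f"{idx}- {trace}")
```

Each inner list keeps the event order of its case, so the same activity or resource can appear more than once in a trace.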

A useful utility for logs is one-hot encoding along the `act` or `payload` dimension. These encodings can be useful for statistical analysis or machine learning tasks. The returned data type is a pandas `DataFrame`.

In [None]:
# One hot encoding for activities
d4py.log_encoding(dimension='act')

In [None]:
# One hot encoding for payload
d4py.log_encoding(dimension='payload')
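As a rough illustration of what a one-hot encoding of a log looks like, the following sketch encodes a small, hypothetical activity projection using only the standard library. It assumes the encoding marks presence or absence of each item per case (one row per case, one column per item); the data and variable names are made up, and this is not Declare4Py's code.

```python
# Hypothetical activity projection: one list of activity names per case.
projection = [
    ["ER Registration", "ER Triage"],
    ["ER Registration", "Leucocytes", "ER Triage"],
    ["Leucocytes"],
]

# Columns are the sorted alphabet of items seen anywhere in the log.
columns = sorted({item for case in projection for item in case})

# One row per case: True if the item occurs at least once in that case.
encoded = [[item in set(case) for item in columns] for case in projection]

print(columns)
for row in encoded:
    print(row)
```

Passing `encoded` and `columns` to `pandas.DataFrame(encoded, columns=columns)` would give a tabular view comparable in shape to the one returned above.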

## Frequent Itemsets

Declare4Py supports computing the frequent itemsets of activities/resources in the log. The function `compute_frequent_itemsets` takes as input the `min_support` of the itemsets, the `algorithm` used for the computation (`fpgrowth` or `apriori`), and `len_itemset`, the maximum length of the itemsets (default `None`). The result is then available in the `frequent_item_sets` attribute.

In [None]:
d4py.compute_frequent_itemsets(min_support=0.8, algorithm='fpgrowth', len_itemset=3)
d4py.frequent_item_sets
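To clarify what `min_support` and a maximum itemset length mean, here is a minimal brute-force sketch on toy data. It deliberately uses plain enumeration rather than `fpgrowth` or `apriori`, so it only illustrates the definition of support (the fraction of cases containing every item of the set). The `frequent_itemsets` helper, its `max_len` parameter, and the sample cases are hypothetical and not part of Declare4Py.

```python
from itertools import combinations


def frequent_itemsets(cases, min_support, max_len=None):
    """Brute-force search: keep every itemset contained in at least
    min_support (a fraction between 0 and 1) of the cases."""
    case_sets = [set(case) for case in cases]
    alphabet = sorted(set().union(*case_sets))
    limit = len(alphabet) if max_len is None else min(max_len, len(alphabet))
    result = {}
    for size in range(1, limit + 1):
        for candidate in combinations(alphabet, size):
            # Count the cases that contain every item of the candidate.
            hits = sum(1 for items in case_sets if items.issuperset(candidate))
            support = hits / len(case_sets)
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


# Hypothetical cases, each reduced to the activities it contains.
cases = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "C"],
    ["B", "C"],
]

found = frequent_itemsets(cases, min_support=0.6, max_len=2)
for itemset, support in found.items():
    print(sorted(itemset), support)
```

Enumerating all combinations grows exponentially with the alphabet size, which is why dedicated algorithms such as FP-Growth and Apriori are used on real logs.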