### Welcome to the LogView Icicle & Pie Function Evaluation Test

Welcome! This test is designed to evaluate the **icicle()** and **pie()** functions, new additions to the LogView framework, with a focus on **when and where they are useful** and **how they support realistic process analysis scenarios**.

### Purpose of This Test

This test aims to assess the **effectiveness, clarity, and practical utility** of the new visualization functions:

- **`icicle()`**: Offers a hierarchical visualization of filter application sequences. It is designed to help users **understand the structure, depth, and interactions** of sequential filters.
- **`pie()`**: Provides a breakdown of how a specific filter applies across the log and its branches. It can help in **assessing filter selectivity, relevance, and spread**.
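To make the idea of a hierarchy of sequential filters concrete, here is a hand-built version on a few toy cases. Each filter splits the current subset into cases that pass and cases that do not, and every node records how many cases it holds. This is a conceptual illustration only, not how `icicle()` is implemented; the case data, the filter labels, and the pass/fail split are all assumptions made for the example.

```python
# Conceptual sketch only: NOT the icicle() implementation.
# Toy cases with hypothetical attribute values.
CASES = {
    "c1": {"amount": 15000, "type": "New credit", "pending": True},
    "c2": {"amount": 8000, "type": "New credit", "pending": False},
    "c3": {"amount": 20000, "type": "Limit raise", "pending": True},
    "c4": {"amount": 12000, "type": "New credit", "pending": False},
    "c5": {"amount": 30000, "type": "New credit", "pending": True},
}

# Hypothetical filters, applied in this order.
FILTERS = [
    ("amount > 10000", lambda c: c["amount"] > 10000),
    ("type == 'New credit'", lambda c: c["type"] == "New credit"),
    ("has A_Pending", lambda c: c["pending"]),
]


def lineage(case_ids, depth=0, label="all cases"):
    """Return (label, n_cases, children) for the subset reached at this depth."""
    if depth == len(FILTERS):
        return (label, len(case_ids), [])
    name, pred = FILTERS[depth]
    passed = [cid for cid in case_ids if pred(CASES[cid])]
    failed = [cid for cid in case_ids if not pred(CASES[cid])]
    # Each non-empty split becomes a child node one level deeper.
    children = [
        lineage(subset, depth + 1, child_label)
        for subset, child_label in ((passed, name), (failed, f"NOT {name}"))
        if subset
    ]
    return (label, len(case_ids), children)


def show(node, depth=0):
    """Print the tree with one indentation level per filter step."""
    label, n_cases, children = node
    print(f"{'  ' * depth}{label}: {n_cases} cases")
    for child in children:
        show(child, depth + 1)


tree = lineage(list(CASES))
show(tree)
```

Reading the printed tree top to bottom resembles how a typical icicle chart stacks one level per step, with each node's size reflecting how many cases reach it.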

### What This Test Covers

You will explore **realistic log filtering scenarios** and analyze how each function supports the following:

- Making filtering decisions more transparent
- Understanding the impact of sequential filters
- Identifying when each visualization is **appropriate, insufficient, or complementary**

The accompanying **tutorial notebook**, which contains all examples, is available as a reference.

Let’s begin!


### Using `icicle()` and `pie()` During the Test

Throughout this test, you may use the `icicle()` and `pie()` functions at any point to visualize and reflect on your filtering logic. These visualizations are meant to support your analysis not only as final steps but also as ongoing aids for exploring and validating your queries.

You can call either function **as soon as you have created a result set** using `log_view.evaluate_query(...)`, although both are most informative after a sequence of filters has been applied. Any result set name listed in the output of `log_view.get_summary()` can be used to generate a visualization.

These functions are designed to help you:
- Understand how your filters were applied
- Compare different branches and filtering paths
- Assess the quality and impact of each filtering step

### How to Use the Visualization Functions

#### `icicle()` Function Parameters

| Parameter         | Type      | Description                                                                                                                                                                  |
|------------------|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `result_set_name`| `str`     | The name of the result set you want to visualize. This must match the name used when calling `log_view.evaluate_query(...)`; any result set name listed in the output of `log_view.get_summary()` can be used. |
| `log_view`       | `LogView` | A `LogView` object that stores the original event log along with all filtering steps applied. Created using `LogViewBuilder.build_log_view(log)`.                           |
| `metric`         | `str`     | The metric to visualize in the chart. Supported options:<br> • `"avg_case_duration_seconds"` (default)<br> • `"avg_events_per_case"`<br> • `"avg_time_between_events"`      |
| `show_time`      | `bool`    | If `True`, prints the time taken for each internal step (lineage extraction, filtering, plotting). Default is `False`.                                                      |
| `details`        | `bool`    | If `True` (default), prints a summary showing all filter steps, number of cases per subset, and computed metric values.                                                     |
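The three `metric` options are not defined further here. A minimal sketch, assuming the names mean what they suggest (mean case duration in seconds, mean number of events per case, and mean gap between consecutive events, here pooled over all cases and also in seconds), computed on a hypothetical toy log with the standard library:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy log: (case id, activity, timestamp).
EVENTS = [
    ("c1", "A_Create Application", datetime(2016, 8, 3, 10, 0, 0)),
    ("c1", "A_Submitted", datetime(2016, 8, 3, 10, 0, 30)),
    ("c1", "A_Pending", datetime(2016, 8, 3, 10, 2, 0)),
    ("c2", "A_Create Application", datetime(2016, 8, 4, 9, 0, 0)),
    ("c2", "A_Cancelled", datetime(2016, 8, 4, 9, 10, 0)),
]


def timestamps_by_case(events):
    """Group timestamps per case and sort them chronologically."""
    by_case = defaultdict(list)
    for case_id, _activity, ts in events:
        by_case[case_id].append(ts)
    return {case_id: sorted(ts_list) for case_id, ts_list in by_case.items()}


def compute_metrics(events):
    """Assumed definitions of the three metric options (all times in seconds)."""
    cases = timestamps_by_case(events)
    durations = [(ts[-1] - ts[0]).total_seconds() for ts in cases.values()]
    counts = [len(ts) for ts in cases.values()]
    # Gaps between consecutive events, pooled over all cases (an assumption).
    gaps = [
        (later - earlier).total_seconds()
        for ts in cases.values()
        for earlier, later in zip(ts, ts[1:])
    ]
    return {
        "avg_case_duration_seconds": mean(durations),
        "avg_events_per_case": mean(counts),
        "avg_time_between_events": mean(gaps) if gaps else 0.0,
    }


print(compute_metrics(EVENTS))
```

The actual `icicle()` and `pie()` functions may define these metrics differently (for example, by averaging gaps per case first), so treat this only as a way to sanity-check the magnitudes you see.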

#### `pie()` Function Parameters

| Parameter         | Type      | Description |
|------------------|-----------|-------------|
| `result_set_name`| `str`     | The name of the result set you want to visualize. This must correspond to a result set created via `log_view.evaluate_query(...)`. |
| `log_view`       | `LogView` | The `LogView` object containing the original event log and all registered filtering steps. |
| `metric`         | `str`     | The metric to visualize across the subsets. Supported values:<br> • `"avg_case_duration_seconds"` (default)<br> • `"avg_events_per_case"`<br> • `"avg_time_between_events"` |
| `details`        | `bool`    | If `True` (default), prints a summary listing all filter paths with number of cases and the average metric per path. |
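The `details` summary of `pie()` lists each filter path with its number of cases and average metric. A sketch of that kind of breakdown, assuming per-case metric values have already been grouped by path (the path labels and durations below are invented for illustration), might look like this:

```python
from statistics import mean

# Hypothetical case durations in seconds, already grouped by filter path.
PATHS = {
    "amount > 10000 -> New credit -> A_Pending": [86400.0, 172800.0],
    "amount > 10000 -> New credit -> NOT A_Pending": [43200.0],
    "amount <= 10000": [3600.0, 7200.0, 10800.0],
}


def pie_breakdown(paths):
    """Return one row per path: case count, share of all cases, and mean metric."""
    total = sum(len(values) for values in paths.values())
    return [
        {
            "path": path,
            "cases": len(values),
            "share": len(values) / total,
            "avg_metric": mean(values),
        }
        for path, values in paths.items()
    ]


for row in pie_breakdown(PATHS):
    print(f"{row['path']}: {row['cases']} cases ({row['share']:.0%}), "
          f"avg = {row['avg_metric']:.0f} s")
```

The share column is what a pie chart would show as slice sizes; the average metric per path is the extra context that the `details` summary reports alongside it.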


### Imports & Setup

Import all necessary libraries and custom functions; feel free to add to this list:

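```python
# Standard library, pandas, and PM4Py
import os
import zipfile
import pandas as pd
import pm4py

# LogView: log builder and filter predicates (e.g. EqToConstant, GreaterThanConstant)
from logview.utils import LogViewBuilder
from logview.predicate import *

# Visualization functions under evaluation
from filter_visualization import icicle, pie

# PM4Py modules for XES import and directly-follows graph (DFG) discovery/visualization
from pm4py.objects.log.importer.xes import importer as xes_importer
from pm4py.algo.discovery.dfg import algorithm as dfg_discovery
from pm4py.visualization.dfg import visualizer as dfg_visualizer
```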

### Load and Prepare Event Log

Then load your event log and format it for analysis with PM4Py. The code below does this for the BPI Challenge 2017 dataset:

```python
# Load data: extract the CSV from the ZIP archive if it is not already present
csv_file = "BPI_Challenge_2017.csv"
zip_file = "BPI_Challenge_2017.zip"

if not os.path.exists(csv_file):
    if os.path.exists(zip_file):
        print(f"Extracting {csv_file} from {zip_file}...")
        with zipfile.ZipFile(zip_file, 'r') as zip_ref:
            zip_ref.extract(csv_file)
    else:
        raise FileNotFoundError(f"Neither '{csv_file}' nor '{zip_file}' was found")

# Column names in the BPI Challenge 2017 CSV
CASE_ID_COL = 'case'
TIMESTAMP_COL = 'time'
ACTIVITY_COL = 'event'

# Read the CSV, strip stray whitespace from headers, and parse timestamps
bpi_data = pd.read_csv(csv_file, sep=',', quotechar='"')
bpi_data.columns = bpi_data.columns.str.strip()
bpi_data[TIMESTAMP_COL] = pd.to_datetime(bpi_data[TIMESTAMP_COL], format='%Y/%m/%d %H:%M:%S.%f')

# Add the standard PM4Py columns (case:concept:name, concept:name, time:timestamp)
log = pm4py.format_dataframe(bpi_data, case_id=CASE_ID_COL, activity_key=ACTIVITY_COL, timestamp_key=TIMESTAMP_COL)

display(log)
```

```
Unnamed: 0,case,event,time,lifecycle:transition,ApplicationType,LoanGoal,RequestedAmount,MonthlyCost,org:resource,Selected,...,Accepted,CreditScore,NumberOfTerms,EventOrigin,OfferedAmount,case:concept:name,concept:name,time:timestamp,@@index,@@case_index
0,Application_1000086665,A_Create Application,2016-08-03 17:57:21.673000+00:00,COMPLETE,New credit,"Other, see explanation",5000.0,,User_1,,...,,,,Application,,Application_1000086665,A_Create Application,2016-08-03 17:57:21.673000+00:00,0,0
1,Application_1000086665,A_Submitted,2016-08-03 17:57:21.734000+00:00,COMPLETE,New credit,"Other, see explanation",5000.0,,User_1,,...,,,,Application,,Application_1000086665,A_Submitted,2016-08-03 17:57:21.734000+00:00,1,0
2,Application_1000086665,W_Handle leads,2016-08-03 17:57:21.963000+00:00,SCHEDULE,New credit,"Other, see explanation",5000.0,,User_1,,...,,,,Workflow,,Application_1000086665,W_Handle leads,2016-08-03 17:57:21.963000+00:00,2,0
3,Application_1000086665,W_Handle leads,2016-08-03 17:58:28.286000+00:00,WITHDRAW,New credit,"Other, see explanation",5000.0,,User_1,,...,,,,Workflow,,Application_1000086665,W_Handle leads,2016-08-03 17:58:28.286000+00:00,3,0
4,Application_1000086665,W_Complete application,2016-08-03 17:58:28.293000+00:00,SCHEDULE,New credit,"Other, see explanation",5000.0,,User_1,,...,,,,Workflow,,Application_1000086665,W_Complete application,2016-08-03 17:58:28.293000+00:00,4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1202262,Application_999993812,W_Call incomplete files,2016-10-20 10:19:28.812000+00:00,RESUME,New credit,Caravan / Camper,30000.0,,User_41,,...,,,,Workflow,,Application_999993812,W_Call incomplete files,2016-10-20 10:19:28.812000+00:00,1202262,31508
1202263,Application_999993812,W_Call incomplete files,2016-10-20 10:21:59.667000+00:00,SUSPEND,New credit,Caravan / Camper,30000.0,,User_41,,...,,,,Workflow,,Application_999993812,W_Call incomplete files,2016-10-20 10:21:59.667000+00:00,1202263,31508
1202264,Application_999993812,O_Accepted,2016-10-24 08:24:30.056000+00:00,COMPLETE,New credit,Caravan / Camper,30000.0,,User_68,,...,,,,Offer,,Application_999993812,O_Accepted,2016-10-24 08:24:30.056000+00:00,1202264,31508
1202265,Application_999993812,A_Pending,2016-10-24 08:24:30.059000+00:00,COMPLETE,New credit,Caravan / Camper,30000.0,,User_68,,...,,,,Application,,Application_999993812,A_Pending,2016-10-24 08:24:30.059000+00:00,1202265,31508
```
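The `format` string passed to `pd.to_datetime` must match the raw CSV timestamps exactly. The standard-library sketch below shows what `'%Y/%m/%d %H:%M:%S.%f'` accepts; the raw strings are hypothetical, since the unparsed CSV values are not shown here.

```python
from datetime import datetime

# Same format string as in the loading cell above.
TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M:%S.%f"

# Hypothetical raw value in the layout the format expects.
parsed = datetime.strptime("2016/08/03 17:57:21.673", TIMESTAMP_FORMAT)
print(parsed.isoformat())  # 2016-08-03T17:57:21.673000

# A value with dashes instead of slashes does not match and raises ValueError.
try:
    datetime.strptime("2016-08-03 17:57:21.673", TIMESTAMP_FORMAT)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("mismatch raised ValueError:", mismatch_raised)
```

If pandas raises a `ValueError` while parsing, comparing a few raw values against the format string in the same way is a quick way to find the mismatch.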


```python
# Basic structure
print("Log shape:", log.shape)
print("\nColumn names:", log.columns.tolist())

# Missing values
print("\nMissing values per column:")
print(log.isnull().sum())
```

```
Log shape: (1202267, 24)

Column names: ['case', 'event', 'time', 'lifecycle:transition', 'ApplicationType', 'LoanGoal', 'RequestedAmount', 'MonthlyCost', 'org:resource', 'Selected', 'EventID', 'OfferID', 'FirstWithdrawalAmount', 'Action', 'Accepted', 'CreditScore', 'NumberOfTerms', 'EventOrigin', 'OfferedAmount', 'case:concept:name', 'concept:name', 'time:timestamp', '@@index', '@@case_index']

Missing values per column:
case                           0
event                          0
time                           0
lifecycle:transition           0
ApplicationType                0
LoanGoal                       0
RequestedAmount                0
MonthlyCost              1159272
org:resource                   0
Selected                 1159272
EventID                        0
OfferID                  1051413
FirstWithdrawalAmount    1159272
Action                         0
Accepted                 1159272
CreditScore              1159272
NumberOfTerms            1159272
EventOrigin
```

```python
# Unique case count
print("\nNumber of unique cases:", log['case'].nunique())

# Number of unique events (activities)
print("Number of unique activities:", log['event'].nunique())

# Distribution of events per case
print("\nEvents per case (summary):")
print(log.groupby('case').size().describe())
```

```
Number of unique cases: 31509
Number of unique activities: 26

Events per case (summary):
count    31509.000000
mean        38.156305
std         16.715308
min         10.000000
25%         25.000000
50%         35.000000
75%         47.000000
max        180.000000
dtype: float64
```


```python
# Distribution of activity frequency
print("\nActivity frequency:")
print(log['event'].value_counts())
```

```
Activity frequency:
event
W_Validate application        209496
W_Call after offers           191092
W_Call incomplete files       168529
W_Complete application        148900
W_Handle leads                 47264
O_Create Offer                 42995
O_Created                      42995
O_Sent (mail and online)       39707
A_Validating                   38816
A_Create Application           31509
A_Accepted                     31509
A_Concept                      31509
A_Complete                     31362
O_Returned                     23305
A_Incomplete                   23055
O_Cancelled                    20898
A_Submitted                    20423
O_Accepted                     17228
A_Pending                      17228
A_Cancelled                    10431
O_Refused                       4695
A_Denied                        3753
W_Assess potential fraud        3282
O_Sent (online only)            2026
W_Shortened completion           238
W_Personal Loan collection        22
Name: count
```
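The counts above are event counts, not case counts: `W_Validate application`, for example, occurs 209,496 times in a log of 31,509 cases, so it must repeat within cases. When a filter selects cases by the presence of an activity (such as `A_Pending` in the first analysis question), the number of distinct cases is the more relevant figure. A toy sketch of the difference, using invented events:

```python
from collections import Counter, defaultdict

# Hypothetical (case id, activity) pairs.
EVENTS = [
    ("c1", "A_Create Application"),
    ("c1", "W_Validate application"),
    ("c1", "W_Validate application"),
    ("c1", "A_Pending"),
    ("c2", "A_Create Application"),
    ("c2", "W_Validate application"),
    ("c2", "A_Cancelled"),
]

# Event-level frequency (what value_counts() reports).
event_counts = Counter(activity for _case, activity in EVENTS)

# Case-level frequency: number of distinct cases containing each activity.
cases_per_activity = defaultdict(set)
for case_id, activity in EVENTS:
    cases_per_activity[activity].add(case_id)
case_counts = {activity: len(ids) for activity, ids in cases_per_activity.items()}

for activity, n_events in event_counts.most_common():
    print(f"{activity}: {n_events} events in {case_counts[activity]} case(s)")
```

On the real log, the pandas equivalent of the case-level count is `log.groupby('event')['case'].nunique()`.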

```python
# Time range
print("\nTimestamp range:")
print(log['time'].min(), "to", log['time'].max())

# Data types
print("\nData types:")
print(log.dtypes)
```

```
Timestamp range:
2016-01-01 10:51:15.304000+00:00 to 2017-02-01 15:11:03.499000+00:00

Data types:
case                                  object
event                                 object
time                     datetime64[ns, UTC]
lifecycle:transition                  object
ApplicationType                       object
LoanGoal                              object
RequestedAmount                      float64
MonthlyCost                          float64
org:resource                          object
Selected                              object
EventID                               object
OfferID                               object
FirstWithdrawalAmount                float64
Action                                object
Accepted                              object
CreditScore                          float64
NumberOfTerms                        float64
EventOrigin                           object
OfferedAmount                        float64
case:concept:name             string[python]
```


### Explore and Apply Filters

In this step, you're going to explore **one** of the analysis questions below. Each question is designed to help you uncover **differences in process behavior** using specific filtering conditions.
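Each question's filter hints are combined into a single selection. As a rough mental model, and assuming each predicate keeps the cases that contain at least one matching event (check the tutorial notebook for LogView's actual semantics), applying them in sequence narrows the set of cases step by step. The sketch uses plain Python stand-ins named `greater_than` and `eq`; these are hypothetical helpers, not the LogView predicate classes.

```python
# Hypothetical stand-ins for GreaterThanConstant / EqToConstant (not LogView's classes).
def greater_than(attr, value):
    return lambda event: event[attr] > value


def eq(attr, value):
    return lambda event: event[attr] == value


# Toy events; case attributes are repeated on every event, as in the BPI log.
EVENTS = [
    {"case": "c1", "event": "A_Create Application", "RequestedAmount": 15000.0, "ApplicationType": "New credit"},
    {"case": "c1", "event": "A_Pending", "RequestedAmount": 15000.0, "ApplicationType": "New credit"},
    {"case": "c2", "event": "A_Create Application", "RequestedAmount": 15000.0, "ApplicationType": "Limit raise"},
    {"case": "c2", "event": "A_Pending", "RequestedAmount": 15000.0, "ApplicationType": "Limit raise"},
    {"case": "c3", "event": "A_Create Application", "RequestedAmount": 5000.0, "ApplicationType": "New credit"},
    {"case": "c3", "event": "A_Pending", "RequestedAmount": 5000.0, "ApplicationType": "New credit"},
]


def select_cases(events, predicates):
    """Apply predicates in order; each keeps cases with at least one matching event.

    Returns the surviving case ids and the number of cases left after each step.
    """
    case_ids = {e["case"] for e in events}
    sizes = [len(case_ids)]
    for pred in predicates:
        case_ids = {e["case"] for e in events if e["case"] in case_ids and pred(e)}
        sizes.append(len(case_ids))
    return case_ids, sizes


selected, sizes = select_cases(EVENTS, [
    greater_than("RequestedAmount", 10000),
    eq("ApplicationType", "New credit"),
    eq("event", "A_Pending"),
])
print(selected, sizes)  # {'c1'} [3, 2, 1, 1]
```

Under this assumption the final set equals the intersection of the per-predicate selections, so the filter order does not change which cases remain, only the intermediate counts. Those intermediate steps are the kind of sequential structure `icicle()` is described as visualizing.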

#### 1. Do high-value new applications that lead to a successful loan tend to follow longer or more complex processes than those that do not?

**Filter hints:**

```python
GreaterThanConstant('RequestedAmount', 10000)    # High-value application
EqToConstant('ApplicationType', 'New credit')    # New application (value as it appears in the log)
EqToConstant('event', 'A_Pending')               # Successful outcome
```

#### 2. Do car loan applications submitted in person perform differently from those submitted digitally, especially when the requested amount exceeds €5,000?

**Filter hints:**

```python
EqToConstant('event', 'A_Submitted')             # Submission event
EqToConstant('LoanGoal', 'Car')                  # Car loan
GreaterThanConstant('RequestedAmount', 5000)     # Requested amount over €5,000
```

#### Getting Started

Make sure you've instantiated your LogView object:

```python
log_view = LogViewBuilder.build_log_view(log)
```

You can retrieve the full summary of your session's filtering history at any point:

```python
summary = log_view.get_summary()
```

```python
# Your code goes here
```