# Google Colab Notebook for Fine-Tuning Large Language Models for Process Mining Tasks

This notebook replicates and extends the experiments from **“Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks”** on three semantics-aware tasks: **T‑SAD** (trace anomaly detection), **A‑SAD** (eventually-follows order validation), and **S‑NAP** (next-activity prediction). It reports both **in‑context learning (ICL)** and **fine‑tuning (FT)** settings, with comparisons to reference scores from the paper.

[Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks](https://arxiv.org/pdf/2407.02310)

**ICL results** show performance close to random baselines on classification tasks (T‑SAD/A‑SAD), while S‑NAP benefits more noticeably from ICL. Among ICL models, **Llama** tends to outperform **Mistral**, especially on **S‑NAP**, indicating that next-activity prediction is more responsive to prompt-based reasoning than anomaly/order classification.

**Fine‑tuning results** substantially improve performance across all tasks. FT Llama and FT Mistral achieve **high F1 scores** on T‑SAD and A‑SAD, and **marked gains on S‑NAP** compared to ICL. These results align with the paper’s conclusion that semantics-aware process mining tasks benefit strongly from supervised fine‑tuning rather than purely prompt-based inference.

Overall, the replication confirms that **ICL offers limited gains for semantic validation tasks**, while **fine‑tuning is consistently effective**, especially for tasks requiring structured understanding of process behavior.

In [None]:
!nvidia-smi

Wed Jan 15 21:35:01 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   30C    P0              40W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                    

# Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

Replication Study

ICL resources
- [Original Paper](https://arxiv.org/abs/2407.02310)
- [llms4pm-icl GitHub](https://github.com/a-rebmann/llms4pm)

Own FT replication repository
- [llms4pm-ft GitHub](https://github.com/luciendgolden/llms4pm-ft)

In [None]:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
!ls "/content/drive/MyDrive"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
'Colab Notebooks'   data   Forms   llms4pm-ft   llms4pm-icl   Presentations


In [None]:
!apt install firefox firefox-geckodriver
!pip install dataframe_image selenium

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package firefox-geckodriver is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  firefox

E: Package 'firefox-geckodriver' has no installation candidate


# In-Context Learning (ICL)

## Replication Step 0: Data collection

The corpus and datasets can be downloaded from here: [datasets](https://zenodo.org/records/14273161)

- T-SAD: Given a trace σ, decide whether σ is a valid execution of the underlying process, without knowing the behavior allowed in the process. Each row contains a trace (column trace) with a corresponding label (column anomalous) indicating whether the trace is anomalous (True) or a valid execution of the underlying process (False). The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,trace,label,unique_activities,anomalous,id
c78bef3bc4f043e880c51a5de86f7b33,cf03653c9c664b55a18da5b53ca9cee5,"['Take comprehensive exam', 'Submit course form (at least ECTS)', 'Complete courses', 'Get an international publication', 'Follow seminar on research methodology', 'Give first doctoral seminar', 'Participate in international conference', 'Give second doctoral seminar']",False,"{'Take comprehensive exam', 'Submit course form (at least ECTS)', 'Complete courses', 'Follow seminar on research methodology', 'Give first doctoral seminar', 'Get an international publication', 'Give second doctoral seminar', 'Participate in international conference'}",False,c78bef3bc4f043e880c51a5de86f7b33_cf03653c9c664b55a18da5b53ca9cee5
```

- A-SAD: Given an eventually-follows relation ef = a ≺ b of a trace σ, decide whether ef represents a valid execution order of the two activities a and b in the process, without knowing the behavior allowed in the process.
Each row contains an eventually-follows relation (column eventually_follows) with a corresponding label (column out_of_order) indicating whether the two activities of the relation were executed in an invalid order (True) or in a valid order (False) according to the underlying process model. The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,out_of_order,unique_activities,eventually_follows,id
2b4e4aca49ef4694a290b956fe18eb9b,f9f65a9604b4434996eede7b550b8f8a,True,"{'Register claim', 'Perform assessment', 'Phone garage to authorise repairs', 'Send letter', 'Checks insurance claim', 'Reject claim', 'Schedule payment', 'Check document'}","('Phone garage to authorise repairs', 'Reject claim')",2b4e4aca49ef4694a290b956fe18eb9b_f9f65a9604b4434996eede7b550b8f8a
```
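
To make the eventually-follows relation concrete, the following minimal sketch enumerates all pairs (a, b) in which a occurs somewhere before b in a trace. The helper name `eventually_follows_pairs` and the three-activity example trace are illustrative assumptions and are not taken from the llms4pm code base.

```python
from itertools import combinations


def eventually_follows_pairs(trace):
    """Return the set of (a, b) pairs where a occurs before b in the trace."""
    # combinations() keeps positional order, so each pair is (earlier, later).
    return set(combinations(trace, 2))


# Hypothetical trace built from activities in the sample row above.
trace = ["Register claim", "Check document", "Schedule payment"]
pairs = eventually_follows_pairs(trace)

for a, b in sorted(pairs):
    print(f"{a} ≺ {b}")
```

An A-SAD instance presents one such pair and asks whether its order is valid for the underlying process.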

- S-NAP: Given an event log L and a prefix p_k of length k, with 1 < k, predict the next activity a_{k+1}.
Each row contains a trace prefix (column prefix) with a corresponding next activity (column next), i.e. the activity that follows the last activity of the prefix in the trace from which the prefix was generated. The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,trace,prefix,next,unique_activities,id
f59a5a5a07b64916bcbd843e48485c0e,11c2f63f1f684c9dabbdb18d5e47bcca,"['mold upper and lower part of the enginge', 'bend front defender', 'wield parts together', 'bend bars for the frame', 'insert outlets and cylinders', 'make seat', 'bend rear defender', 'weld bars together', 'assemble parts']","['mold upper and lower part of the enginge', 'bend front defender', 'wield parts together']",bend bars for the frame,"{'bend bars for the frame', 'weld bars together', 'insert outlets and cylinders', 'bend rear defender', 'wield parts together', 'bend front defender', 'mold upper and lower part of the enginge', 'assemble parts', 'make seat'}",f59a5a5a07b64916bcbd843e48485c0e_11c2f63f1f684c9dabbdb18d5e47bcca
```
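
The relationship between the `trace`, `prefix`, and `next` columns can be illustrated with a short sketch that derives prefix/next pairs from a single trace. The function name `prefix_next_pairs` and its `min_len` parameter are assumptions for illustration; the published dataset may have been generated with different rules.

```python
def prefix_next_pairs(trace, min_len=2):
    """Yield (prefix, next_activity) for every prefix with at least min_len events."""
    for k in range(min_len, len(trace)):
        # The prefix holds the first k events; the target is the event right after it.
        yield list(trace[:k]), trace[k]


# First four activities of the sample trace above (spelling kept as in the data).
trace = [
    "mold upper and lower part of the enginge",
    "bend front defender",
    "wield parts together",
    "bend bars for the frame",
]

pairs = list(prefix_next_pairs(trace))
for prefix, next_activity in pairs:
    print(len(prefix), "->", next_activity)
```

The last pair corresponds to the sample row: a prefix of three activities followed by `bend bars for the frame`.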

## Replication Step 1: Data Exploration & Preprocessing

**Connect to Google Drive, then load the data under `data/` and the two repositories listed above.**

In [None]:
!pip install -r "/content/drive/MyDrive/llms4pm-icl/requirements.txt"

Collecting jupyter (from -r /content/drive/MyDrive/llms4pm-icl/requirements.txt (line 1))
  Downloading jupyter-1.1.1-py2.py3-none-any.whl.metadata (2.0 kB)
Collecting pm4py (from -r /content/drive/MyDrive/llms4pm-icl/requirements.txt (line 5))
  Downloading pm4py-2.7.13-py3-none-any.whl.metadata (4.4 kB)
Collecting func_timeout (from -r /content/drive/MyDrive/llms4pm-icl/requirements.txt (line 6))
  Downloading func_timeout-4.3.5.tar.gz (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.3/44.3 kB 1.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting bitsandbytes (from -r /content/drive/MyDrive/llms4pm-icl/requirements.txt (line 9))
  Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl.metadata (2.9 kB)
Collecting langdetect (from -r /content/drive/MyDrive/llms4pm-icl/requirements.txt (line 11))
  Downloading langdetect-1.0.9.tar.gz (981 kB)

In [None]:
DATA_PATH = "/content/drive/MyDrive/data/"

corpus_df = pd.read_csv(DATA_PATH + "process_behavior_corpus.csv")
T_SAD_df = pd.read_csv(DATA_PATH + "T_SAD.csv")
A_SAD_df = pd.read_csv(DATA_PATH + "A_SAD.csv")
S_NAP_df = pd.read_csv(DATA_PATH + "S_NAP.csv")
S_PMD_df = pd.read_csv(DATA_PATH + "S-PMD.csv")

In [None]:
T_total = len(T_SAD_df)
T_valid = len(T_SAD_df[T_SAD_df['anomalous'] == False])
T_anomalous = len(T_SAD_df[T_SAD_df['anomalous'] == True])

A_total = len(A_SAD_df)
A_valid = len(A_SAD_df[A_SAD_df['out_of_order'] == False])
A_anomalous = len(A_SAD_df[A_SAD_df['out_of_order'] == True])

SN_total = len(S_NAP_df)
SN_valid = SN_total
SN_anomalous = '-'

summary_table = pd.DataFrame({
    'Task Dataset': ['T-SAD', 'A-SAD', 'S-NAP'],
    'Total': [T_total, A_total, SN_total],
    'Valid': [T_valid, A_valid, SN_valid],
    'Anomalous': [T_anomalous, A_anomalous, SN_anomalous]
})

summary_table['Total'] = summary_table['Total'].apply(lambda x: f"{x:,}")
summary_table['Valid'] = summary_table['Valid'].apply(lambda x: f"{x:,}")
summary_table['Anomalous'] = summary_table['Anomalous'].apply(lambda x: f"{x:,}" if isinstance(x, int) else x)

summary_table

Unnamed: 0,Task Dataset,Total,Valid,Anomalous
0,T-SAD,291251,150301,140950
1,A-SAD,316308,158154,158154
2,S-NAP,1289081,1289081,-


In [None]:
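import ast

# Parse the stringified collections with ast.literal_eval, which only accepts
# Python literals and is therefore safer than eval on file contents.
corpus_df["num_unique"] = corpus_df["unique_activities"].apply(lambda x: len(ast.literal_eval(x)))
corpus_df["num_variants"] = corpus_df["string_traces"].apply(lambda x: len(ast.literal_eval(x)))

num_process_models = len(corpus_df)

# Distinct activity labels across all process models
ua_total = len(set.union(*corpus_df["unique_activities"].apply(ast.literal_eval)))
ua_avg = corpus_df["num_unique"].mean()
ua_med = corpus_df["num_unique"].median()
ua_min = corpus_df["num_unique"].min()
ua_max = corpus_df["num_unique"].max()

# Reuse the per-model variant counts computed above
uv_total = corpus_df["num_variants"].sum()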
uv_avg = corpus_df["num_variants"].mean()
uv_med = corpus_df["num_variants"].median()
uv_min = corpus_df["num_variants"].min()
uv_max = corpus_df["num_variants"].max()

summary_df = pd.DataFrame([
    {
        "Characteristic": "# Process models",
        "Total": f"{num_process_models:,}",
        "Avg":  "-",
        "Med":  "-",
        "Min":  "-",
        "Max":  "-"
    },
    {
        "Characteristic": "# Unique activities",
        "Total": f"{ua_total:,}",
        "Avg":  f"{ua_avg:.2f}",
        "Med":  f"{ua_med:.0f}",
        "Min":  f"{ua_min:,}",
        "Max":  f"{ua_max:,}"
    },
    {
        "Characteristic": "# Unique sequences",
        "Total": f"{uv_total:,}",
        "Avg":  f"{uv_avg:.2f}",
        "Med":  f"{uv_med:.0f}",
        "Min":  f"{uv_min:,}",
        "Max":  f"{uv_max:,}"
    }
])

summary_df

Unnamed: 0,Characteristic,Total,Avg,Med,Min,Max
0,# Process models,15902,-,-,-,-
1,# Unique activities,49108,4.69,4,1,21
2,# Unique sequences,163976,10.31,1,1,10080


In [None]:
import pickle
from typing import Optional
from sklearn.model_selection import train_test_split
from datasets.arrow_dataset import Dataset
import pandas as pd
import ast
from tqdm import tqdm

def stratified_sample(df, label_col, frac, random_state=42) -> pd.DataFrame:
    """
    Performs stratified sampling to reduce the dataset size by a given fraction.
    """
    stratified_df, _ = train_test_split(
        df,
        stratify=df[label_col],
        test_size=1-frac,
        random_state=random_state
    )
    return stratified_df

def split_by_model(df, task, pkl_path="/content/drive/MyDrive/data/train_val_test.pkl", frac: Optional[float] = None) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Splits a DataFrame into train, validation, and test subsets based on IDs.
    Only includes rows with more than one unique activity.
    """
    df["id"] = df["model_id"].astype(str) + "_" + df["revision_id"].astype(str)
    df["num_unique_activities"] = df["unique_activities"].apply(len)

    df = df[df["num_unique_activities"] > 1]

    with open(pkl_path, "rb") as file:
        train_ids, val_ids, test_ids = pickle.load(file)

    train_df = df[df["id"].isin(train_ids)]
    val_df = df[df["id"].isin(val_ids)]
    test_df = df[df["id"].isin(test_ids)]

    if frac is not None and 0 < frac < 1:
        if task in ["TRACE_ANOMALY", "OUT_OF_ORDER"]:
            train_df = stratified_sample(train_df, label_col="ds_labels", frac=frac)
            val_df = stratified_sample(val_df, label_col="ds_labels", frac=frac)
            test_df = stratified_sample(test_df, label_col="ds_labels", frac=frac)
        else:
            train_df = train_df.sample(frac=frac, random_state=42)
            val_df = val_df.sample(frac=frac, random_state=42)
            test_df = test_df.sample(frac=frac, random_state=42)

    return train_df, val_df, test_df

def remove_duplicates(pair_df):
    """
    Removes duplicate rows in the DataFrame based on specific columns.
    Additional columns like 'trace', 'eventually_follows', and 'prefix'
    are considered if present in the DataFrame.
    """
    columns = ["revision_id", "model_id", "unique_activities"]
    if "trace" in pair_df.columns:
        columns.append("trace")
    if "eventually_follows" in pair_df.columns:
        columns.append("eventually_follows")
    if "prefix" in pair_df.columns:
        columns.append("prefix")
    pair_df = pair_df.drop_duplicates(subset=columns)
    return pair_df

def setify(x: str):
    """
    Converts a string representation of a set into an actual Python set.
    Ensures the result is a set, otherwise raises an AssertionError.
    """
    set_: set[str] = ast.literal_eval(x)
    assert isinstance(set_, set), f"Conversion failed for {x}"
    return set_

def parse_tuple(x: str):
    """
    Converts a string representation of a tuple into an actual Python tuple.
    Ensures the result is a tuple, otherwise raises an AssertionError.
    """
    tuple_ = ast.literal_eval(x) if isinstance(x, str) else x
    assert isinstance(tuple_, tuple), f"Conversion failed for {x}"
    return tuple_

def load_dataset(file_name: str, task: str, frac: Optional[float]) -> tuple[Dataset, Dataset, Dataset]:
    """
    Dynamically loads and processes a dataset based on the file name and task.
    """
    df = pd.read_csv(file_name)

    if task == "TRACE_ANOMALY":
        # T-SAD
        df["ds_labels"] = (~df["anomalous"]).astype(bool)  # Invert labels: True = valid trace, False = anomalous
        df["trace"] = df["trace"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
        df["trace"] = df["trace"].apply(lambda x: tuple(x))
        #df = remove_duplicates(df)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        columns = ["model_id", "revision_id", "unique_activities", "trace", "ds_labels"]
        df = df.loc[:, columns]
        #print(df.head())
    elif task == "OUT_OF_ORDER":
        # A-SAD
        df["ds_labels"] = (~df["out_of_order"]).astype(bool)  # Invert labels: True = valid order, False = out of order
        #df = remove_duplicates(df)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        df["eventually_follows"] = df["eventually_follows"].apply(parse_tuple)
        columns = ["model_id", "revision_id", "unique_activities", "ds_labels", "eventually_follows"]
        df = df.loc[:, columns]
        #print(df.head())
    elif task == "NEXT_ACTIVITY":
        # S-NAP
        #df = remove_duplicates(df)
        df["prefix"] = df["prefix"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        columns = ["model_id", "revision_id", "prefix", "next", "unique_activities"]
        df = df.loc[:, columns]
        #print(df.head())
    else:
        raise ValueError(f"Unsupported task: {task}")

    train_df, val_df, test_df = split_by_model(df, task=task, frac=frac)

    return (
        Dataset.from_pandas(train_df.reset_index(drop=True)),
        Dataset.from_pandas(val_df.reset_index(drop=True)),
        Dataset.from_pandas(test_df.reset_index(drop=True)),
    )

task_to_dataset = {
    "T-SAD": ("/content/drive/MyDrive/data/T_SAD.csv", "TRACE_ANOMALY"),
    "A-SAD": ("/content/drive/MyDrive/data/A_SAD.csv", "OUT_OF_ORDER"),
    "S-NAP": ("/content/drive/MyDrive/data/S_NAP.csv", "NEXT_ACTIVITY"),
}

train_val_test_datasets = {
    "T-SAD": {},
    "A-SAD": {},
    "S-NAP": {},
}

for task, (file_name, task_name) in tqdm(task_to_dataset.items(), desc="Processing tasks"):
    train_df, val_df, test_df = load_dataset(file_name, task_name, frac=None)
    train_val_test_datasets[task]["train"] = train_df
    train_val_test_datasets[task]["val"] = val_df
    train_val_test_datasets[task]["test"] = test_df
    train_val_test_datasets[task]["train_len"] = len(train_df)
    train_val_test_datasets[task]["val_len"] = len(val_df)
    train_val_test_datasets[task]["test_len"] = len(test_df)

summary_df = pd.DataFrame([
    {
        "Task": task,
        "Total": train_val_test_datasets[task]["train_len"] +
                 train_val_test_datasets[task]["val_len"] +
                 train_val_test_datasets[task]["test_len"],
        "Train": train_val_test_datasets[task]["train_len"],
        "Validation": train_val_test_datasets[task]["val_len"],
        "Test": train_val_test_datasets[task]["test_len"],
    }
    for task in task_to_dataset
])

summary_df

Processing tasks: 100%|██████████| 3/3 [01:31<00:00, 30.60s/it]


Unnamed: 0,Task,Total,Train,Validation,Test
0,T-SAD,290811,227602,43509,19700
1,A-SAD,316308,229402,56154,30752
2,S-NAP,1288965,1071453,166785,50727
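
Because rows are assigned to subsets by their `id` (model and revision), all instances derived from one process model land in the same subset, which helps avoid leakage between training and evaluation data. A minimal sketch on toy data, with made-up IDs standing in for the contents of `train_val_test.pkl`:

```python
import pandas as pd

# Toy data: three rows from model "m1_r1", two from "m2_r1", one from "m3_r1".
df = pd.DataFrame({
    "id": ["m1_r1", "m1_r1", "m1_r1", "m2_r1", "m2_r1", "m3_r1"],
    "ds_labels": [True, False, True, False, True, False],
})

# Hypothetical ID lists; the notebook loads the real ones from train_val_test.pkl.
train_ids, val_ids, test_ids = ["m1_r1"], ["m2_r1"], ["m3_r1"]

train_df = df[df["id"].isin(train_ids)]
val_df = df[df["id"].isin(val_ids)]
test_df = df[df["id"].isin(test_ids)]

# Check that no process model contributes rows to more than one subset.
overlap = (set(train_df["id"]) & set(val_df["id"])) | (set(train_df["id"]) & set(test_df["id"]))
print(len(train_df), len(val_df), len(test_df), overlap)
```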


## Replication Step 2: Run the ICL approach from "Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks"

In [None]:
import os

BASE_PATH = "/content/drive/MyDrive/llms4pm-icl"
EVAL_PATH = os.path.join(BASE_PATH, "eval")
DATA_ROOT = os.path.join(BASE_PATH, "data")

!python3 "/content/drive/MyDrive/llms4pm-icl/random_baseline.py"

tqdm_auto.pandas() has been initialized.
pandas version: 2.2.3
tqdm version: 4.67.1
pandas DataFrame has progress_apply: True
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24
column_view::get_data: Unsupported type: 24


In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" trace_anomaly cuda:0 mistralai/Mistral-7B-Instruct-v0.2 "[3,5]" 3 10000

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" out_of_order cuda:0 mistralai/Mistral-7B-Instruct-v0.2 "[3,5]" 3 10000

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" next_activity cuda:0 mistralai/Mistral-7B-Instruct-v0.2 "[3,5]" 3 10000

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" trace_anomaly cuda:0 meta-llama/Meta-Llama-3-8B-Instruct "[3,5]" 3 10000

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" out_of_order cuda:0 meta-llama/Meta-Llama-3-8B-Instruct "[3,5]" 3 10000

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py" next_activity cuda:0 meta-llama/Meta-Llama-3-8B-Instruct "[3,5]" 3 10000
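
The six invocations above differ only in the task name and the model identifier. As a sketch, they could be generated in a loop. The trailing arguments `"[3,5]" 3 10000` are copied unchanged from the cells above, and the `RUN` flag is an assumption added here so that the snippet only prints the commands unless explicitly enabled.

```python
import itertools
import shlex
import subprocess

SCRIPT = "/content/drive/MyDrive/llms4pm-icl/evaluate_llm.py"
TASKS = ["trace_anomaly", "out_of_order", "next_activity"]
MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Meta-Llama-3-8B-Instruct",
]
RUN = False  # assumption: set to True to actually launch the evaluations

# One argument list per (model, task) combination, in the same order as the cells above.
commands = [
    ["python3", SCRIPT, task, "cuda:0", model, "[3,5]", "3", "10000"]
    for model, task in itertools.product(MODELS, TASKS)
]

for cmd in commands:
    print(shlex.join(cmd))
    if RUN:
        subprocess.run(cmd, check=True)
```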

## Replication Step 3: Evaluate Model

In [None]:
import pandas as pd

data = {
    "Metric": ["F1_Mac", "F1_Mac", "F1_Mac"],
    "Approach": ["Random", "ICL Mistral", "ICL Llama"],
    "T-SAD": ["0.50 ± 0.000", "0.49 ± 0.022", "0.51 ± 0.015"],
    "A-SAD": ["0.50 ± 0.000", "0.44 ± 0.011", "0.53 ± 0.021"],
    "S-NAP": ["0.13 ± 0.000", "0.18 ± 0.018", "0.32 ± 0.054"]
}

df_ref = pd.DataFrame(data)

import dataframe_image as dfi
from google.colab import files

df_ref.dfi.export("icl-results-paper.png", table_conversion="selenium")
files.download("icl-results-paper.png")

df_ref

Unnamed: 0,Metric,Approach,T-SAD,A-SAD,S-NAP
0,F1_Mac,Random,0.50 ± 0.000,0.50 ± 0.000,0.13 ± 0.000
1,F1_Mac,ICL Mistral,0.49 ± 0.022,0.44 ± 0.011,0.18 ± 0.018
2,F1_Mac,ICL Llama,0.51 ± 0.015,0.53 ± 0.021,0.32 ± 0.054
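
`F1_Mac` denotes the macro-averaged F1 score, i.e. the unweighted mean of the per-class F1 scores. The sketch below computes it with scikit-learn on a small set of labels; the label values are made up purely for illustration.

```python
from sklearn.metrics import f1_score

# Made-up binary labels (1 = valid, 0 = anomalous).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]

per_class = f1_score(y_true, y_pred, average=None)  # one F1 value per class
macro = f1_score(y_true, y_pred, average="macro")   # unweighted mean over classes

print(per_class, round(macro, 3))
```

Because every class contributes equally, macro F1 does not reward a model for simply predicting the majority class.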


In [None]:
import pandas as pd
import os

map_to_approach = {
    "Random": {
        "T-SAD": "random_t_sad_results.csv",
        "A-SAD": "random_a_sad_results.csv",
        "S-NAP": "random_nap_results.csv",
    },
    "ICL Mistral": {
        "T-SAD": "mistralai-Mistral-7B-Instruct-v0.2_trace_anomaly_checkpoints.csv",
        "A-SAD": "mistralai-Mistral-7B-Instruct-v0.2_out_of_order_checkpoints.csv",
        "S-NAP": "mistralai_Mistral_7B_Instruct_v0_2_2024_12_01_10_14_20_next_activity.csv"
    },
    "ICL Llama": {
        "T-SAD": "meta-llama-Meta-Llama-3-8B-Instruct_trace_anomaly_checkpoints.csv",
        "A-SAD": "meta-llama-Meta-Llama-3-8B-Instruct_out_of_order_checkpoints.csv",
        "S-NAP": "meta_llama_Meta_Llama_3_8B_Instruct_2024_11_30_20_00_04_next_activity.csv"
    }
}

BASE_PATH = "/content/drive/MyDrive/llms4pm-icl/results"

table_data = []

for approach, task_files in map_to_approach.items():  # avoid shadowing google.colab's `files`
    row = [approach]
    for task, file_name in task_files.items():
        file_path = os.path.join(BASE_PATH, file_name)
        if os.path.exists(file_path):
            df = pd.read_csv(file_path)
            if 'f1 mac' in df.columns:
                highest_f1 = df['f1 mac'].max()
                row.append(f"{highest_f1:.2f} ± {df['f1 mac'].std():.3f}")
            else:
                row.append("N/A")
        else:
            row.append("N/A")
    table_data.append(row)

df_new = pd.DataFrame(table_data, columns=["Approach", "T-SAD", "A-SAD", "S-NAP"])
df_new.insert(0,"Metric","F1_Mac")

import dataframe_image as dfi
from google.colab import files

df_new.dfi.export("icl-results-rep.png", table_conversion="selenium")
files.download("icl-results-rep.png")

df_new

Unnamed: 0,Metric,Approach,T-SAD,A-SAD,S-NAP
0,F1_Mac,Random,0.50 ± 0.002,0.51 ± 0.002,0.13 ± 0.005
1,F1_Mac,ICL Mistral,0.51 ± 0.042,0.49 ± 0.069,0.14 ± nan
2,F1_Mac,ICL Llama,0.51 ± 0.038,0.48 ± 0.046,0.27 ± nan
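
The `± nan` entries are consistent with a results file that contains a single row: pandas computes the sample standard deviation (`ddof=1`) by default, which is undefined for one value. A small sketch with made-up scores; the helper `summarize` is hypothetical and mirrors the `max ± std` formatting of the cell above.

```python
import math

import pandas as pd


def summarize(scores: pd.Series) -> str:
    """Format scores as 'max ± std', as done in the cell above."""
    return f"{scores.max():.2f} ± {scores.std():.3f}"


several_runs = pd.Series([0.49, 0.51, 0.47])  # made-up values
single_run = pd.Series([0.14])                # made-up value

print(summarize(several_runs))
print(summarize(single_run))  # the std of a single value is NaN
```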


In [None]:
def parse_f1(s: str) -> float:
    if s == "N/A":
        return float("nan")
    return float(s.split(" ± ")[0])

df_combined = df_ref.merge(df_new, on=["Metric", "Approach"], suffixes=("_ref", "_new"))

for col_base in ["T-SAD", "A-SAD", "S-NAP"]:
    col_ref = f"{col_base}_ref"
    col_new = f"{col_base}_new"
    col_delta = f"{col_base}_delta"

    df_combined[col_delta] = df_combined.apply(
        lambda row: parse_f1(row[col_new]) - parse_f1(row[col_ref]),
        axis=1
    )


cols_to_show = [
    "Metric", "Approach",
    "T-SAD_ref", "T-SAD_new", "T-SAD_delta",
    "A-SAD_ref", "A-SAD_new", "A-SAD_delta",
    "S-NAP_ref", "S-NAP_new", "S-NAP_delta"
]

df_combined[cols_to_show]

Unnamed: 0,Metric,Approach,T-SAD_ref,T-SAD_new,T-SAD_delta,A-SAD_ref,A-SAD_new,A-SAD_delta,S-NAP_ref,S-NAP_new,S-NAP_delta
0,F1_Mac,Random,0.50 ± 0.000,0.50 ± 0.002,0.0,0.50 ± 0.000,0.51 ± 0.002,0.01,0.13 ± 0.000,0.13 ± 0.005,0.0
1,F1_Mac,ICL Mistral,0.49 ± 0.022,0.51 ± 0.042,0.02,0.44 ± 0.011,0.49 ± 0.069,0.05,0.18 ± 0.018,0.07 ± nan,-0.11
2,F1_Mac,ICL Llama,0.51 ± 0.015,0.51 ± 0.038,0.0,0.53 ± 0.021,0.48 ± 0.046,-0.05,0.32 ± 0.054,0.11 ± 0.013,-0.21


In [None]:
# https://docs.google.com/spreadsheets/d/1z3io2WXt2qnWlmIZ95-AcQ2aYWCAY55mCtpiqsSEHHw/edit?gid=0#gid=0
data_runtime = {
    "Approach": ["ICL Mistral", "ICL Llama"],
    "T-SAD": ["1:00:00", "1:00:00"],
    "A-SAD": ["1:55:00", "1:20:00"],
    "S-NAP": ["1:30:00", "1:25:00"]
}

df_runtime_detailed = pd.DataFrame(data_runtime)

import dataframe_image as dfi
from google.colab import files

df_runtime_detailed.dfi.export("icl-runtime-rep.png", table_conversion="selenium")
files.download("icl-runtime-rep.png")

df_runtime_detailed

Unnamed: 0,Approach,T-SAD,A-SAD,S-NAP
0,ICL Mistral,1:00:00,1:55:00,1:30:00
1,ICL Llama,1:00:00,1:20:00,1:25:00


# Fine-Tuning LLMs

## Replication Step 0: Data collection

The corpus and datasets can be downloaded from here: [datasets](https://zenodo.org/records/14273161)

- T-SAD: Given a trace σ, decide whether σ is a valid execution of the underlying process, without knowing the behavior allowed in the process. Each row contains a trace (column trace) with a corresponding label (column anomalous) indicating whether the trace is anomalous (True) or a valid execution of the underlying process (False). The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,trace,label,unique_activities,anomalous,id
c78bef3bc4f043e880c51a5de86f7b33,cf03653c9c664b55a18da5b53ca9cee5,"['Take comprehensive exam', 'Submit course form (at least ECTS)', 'Complete courses', 'Get an international publication', 'Follow seminar on research methodology', 'Give first doctoral seminar', 'Participate in international conference', 'Give second doctoral seminar']",False,"{'Take comprehensive exam', 'Submit course form (at least ECTS)', 'Complete courses', 'Follow seminar on research methodology', 'Give first doctoral seminar', 'Get an international publication', 'Give second doctoral seminar', 'Participate in international conference'}",False,c78bef3bc4f043e880c51a5de86f7b33_cf03653c9c664b55a18da5b53ca9cee5
```

- A-SAD: Given an eventually-follows relation ef = a ≺ b of a trace σ, decide whether ef represents a valid execution order of the two activities a and b in the process, without knowing the behavior allowed in the process.
Each row contains an eventually-follows relation (column eventually_follows) with a corresponding label (column out_of_order) indicating whether the two activities of the relation were executed in an invalid order (True) or in a valid order (False) according to the underlying process model. The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,out_of_order,unique_activities,eventually_follows,id
2b4e4aca49ef4694a290b956fe18eb9b,f9f65a9604b4434996eede7b550b8f8a,True,"{'Register claim', 'Perform assessment', 'Phone garage to authorise repairs', 'Send letter', 'Checks insurance claim', 'Reject claim', 'Schedule payment', 'Check document'}","('Phone garage to authorise repairs', 'Reject claim')",2b4e4aca49ef4694a290b956fe18eb9b_f9f65a9604b4434996eede7b550b8f8a
```

- S-NAP: Given an event log L and a prefix p_k of length k, with 1 < k, predict the next activity a_{k+1}.
Each row contains a trace prefix (column prefix) with a corresponding next activity (column next), i.e. the activity that follows the last activity of the prefix in the trace from which the prefix was generated. The set of activities that can occur in the process is also given (column unique_activities).

```csv
model_id,revision_id,trace,prefix,next,unique_activities,id
f59a5a5a07b64916bcbd843e48485c0e,11c2f63f1f684c9dabbdb18d5e47bcca,"['mold upper and lower part of the enginge', 'bend front defender', 'wield parts together', 'bend bars for the frame', 'insert outlets and cylinders', 'make seat', 'bend rear defender', 'weld bars together', 'assemble parts']","['mold upper and lower part of the enginge', 'bend front defender', 'wield parts together']",bend bars for the frame,"{'bend bars for the frame', 'weld bars together', 'insert outlets and cylinders', 'bend rear defender', 'wield parts together', 'bend front defender', 'mold upper and lower part of the enginge', 'assemble parts', 'make seat'}",f59a5a5a07b64916bcbd843e48485c0e_11c2f63f1f684c9dabbdb18d5e47bcca
```

## Replication Step 1: Data Exploration & Preprocessing

**Connect to Google Drive, then load the data under `data/` and the two repositories listed above.**

In [None]:
!pip install -r "/content/drive/MyDrive/llms4pm-ft/requirements.txt"



In [None]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-xk694dvg/unsloth_c9ba6f1f86054dd8ab85b0dd0f1c4d4c
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-xk694dvg/unsloth_c9ba6f1f86054dd8ab85b0dd0f1c4d4c
  Resolved https://github.com/unslothai/unsloth.git to commit 5dddf27f3ba94506c48251e907031039eecd40d1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2025.1.5-py3-none-any.whl size=176838 sha256=9b77cf6792467a73e1b3f9aa889288e3a8c1b39ac064aefcc549b14b6477a6cf
  Stored in directory: /tmp/pip-ep

**Trial run: train the model on 10% of the dataset**

In [None]:
import pickle
from typing import Optional
from sklearn.model_selection import train_test_split
from datasets.arrow_dataset import Dataset
import pandas as pd
import ast
from tqdm import tqdm

def stratified_sample(df, label_col, frac, random_state=42) -> pd.DataFrame:
    """
    Performs stratified sampling to reduce the dataset size by a given fraction.
    """
    stratified_df, _ = train_test_split(
        df,
        stratify=df[label_col],
        test_size=1-frac,
        random_state=random_state
    )
    return stratified_df

def split_by_model(df, task, pkl_path="/content/drive/MyDrive/data/train_val_test.pkl", frac: Optional[float] = None) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Splits a DataFrame into train, validation, and test subsets based on IDs.
    Only includes rows with more than one unique activity.
    """
    df["id"] = df["model_id"].astype(str) + "_" + df["revision_id"].astype(str)
    df["num_unique_activities"] = df["unique_activities"].apply(len)

    df = df[df["num_unique_activities"] > 1]

    with open(pkl_path, "rb") as file:
        train_ids, val_ids, test_ids = pickle.load(file)

    train_df = df[df["id"].isin(train_ids)]
    val_df = df[df["id"].isin(val_ids)]
    test_df = df[df["id"].isin(test_ids)]

    if frac is not None and 0 < frac < 1:
        if task in ["TRACE_ANOMALY", "OUT_OF_ORDER"]:
            train_df = stratified_sample(train_df, label_col="ds_labels", frac=frac)
            val_df = stratified_sample(val_df, label_col="ds_labels", frac=frac)
            test_df = stratified_sample(test_df, label_col="ds_labels", frac=frac)
        else:
            train_df = train_df.sample(frac=frac, random_state=42)
            val_df = val_df.sample(frac=frac, random_state=42)
            test_df = test_df.sample(frac=frac, random_state=42)

    return train_df, val_df, test_df

def remove_duplicates(pair_df):
    """
    Removes duplicate rows in the DataFrame based on specific columns.
    Additional columns like 'trace', 'eventually_follows', and 'prefix'
    are considered if present in the DataFrame.
    """
    columns = ["revision_id", "model_id", "unique_activities"]
    if "trace" in pair_df.columns:
        columns.append("trace")
    if "eventually_follows" in pair_df.columns:
        columns.append("eventually_follows")
    if "prefix" in pair_df.columns:
        columns.append("prefix")
    pair_df = pair_df.drop_duplicates(subset=columns)
    return pair_df

def setify(x: str):
    """
    Converts a string representation of a set into an actual Python set.
    Ensures the result is a set, otherwise raises an AssertionError.
    """
    set_: set[str] = ast.literal_eval(x)
    assert isinstance(set_, set), f"Conversion failed for {x}"
    return set_

def parse_tuple(x: str):
    """
    Converts a string representation of a tuple into an actual Python tuple.
    Ensures the result is a tuple, otherwise raises an AssertionError.
    """
    tuple_ = ast.literal_eval(x) if isinstance(x, str) else x
    assert isinstance(tuple_, tuple), f"Conversion failed for {x}"
    return tuple_

def load_dataset(file_name: str, task: str, frac: Optional[float]) -> tuple[Dataset, Dataset, Dataset]:
    """
    Dynamically loads and processes a dataset based on the file name and task.
    """
    df = pd.read_csv(file_name)

    if task == "TRACE_ANOMALY":
        # T-SAD
        df["ds_labels"] = (~df["anomalous"]).astype(bool)  # True = valid trace, False = anomalous
        df["trace"] = df["trace"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
        df["trace"] = df["trace"].apply(lambda x: tuple(x))
        df = remove_duplicates(df)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        columns = ["model_id", "revision_id", "unique_activities", "trace", "ds_labels"]
        df = df.loc[:, columns]
        #print(df.head())
    elif task == "OUT_OF_ORDER":
        # A-SAD
        df["ds_labels"] = (~df["out_of_order"]).astype(bool)  # True = valid order, False = out of order
        df = remove_duplicates(df)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        df["eventually_follows"] = df["eventually_follows"].apply(parse_tuple)
        columns = ["model_id", "revision_id", "unique_activities", "ds_labels", "eventually_follows"]
        df = df.loc[:, columns]
        #print(df.head())
    elif task == "NEXT_ACTIVITY":
        # S-NAP
        df = remove_duplicates(df)
        df["prefix"] = df["prefix"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
        df["unique_activities"] = df["unique_activities"].apply(setify)
        columns = ["model_id", "revision_id", "prefix", "next", "unique_activities"]
        df = df.loc[:, columns]
        #print(df.head())
    else:
        raise ValueError(f"Unsupported task: {task}")

    train_df, val_df, test_df = split_by_model(df, task=task, frac=frac)

    return (
        Dataset.from_pandas(train_df.reset_index(drop=True)),
        Dataset.from_pandas(val_df.reset_index(drop=True)),
        Dataset.from_pandas(test_df.reset_index(drop=True)),
    )

task_to_dataset = {
    "T-SAD": ("/content/drive/MyDrive/data/T_SAD.csv", "TRACE_ANOMALY"),
    "A-SAD": ("/content/drive/MyDrive/data/A_SAD.csv", "OUT_OF_ORDER"),
    "S-NAP": ("/content/drive/MyDrive/data/S_NAP.csv", "NEXT_ACTIVITY"),
}

train_val_test_datasets = {
    "T-SAD": {},
    "A-SAD": {},
    "S-NAP": {},
}

frac = 0.1

for task, (file_name, task_name) in tqdm(task_to_dataset.items(), desc="Processing tasks"):
    # load_dataset returns Hugging Face Dataset objects, not DataFrames.
    train_ds, val_ds, test_ds = load_dataset(file_name, task_name, frac=frac)
    train_val_test_datasets[task]["train"] = train_ds
    train_val_test_datasets[task]["val"] = val_ds
    train_val_test_datasets[task]["test"] = test_ds
    train_val_test_datasets[task]["train_len"] = len(train_ds)
    train_val_test_datasets[task]["val_len"] = len(val_ds)
    train_val_test_datasets[task]["test_len"] = len(test_ds)

summary_df = pd.DataFrame([
    {
        "Task": task,
        "Total": train_val_test_datasets[task]["train_len"] +
                 train_val_test_datasets[task]["val_len"] +
                 train_val_test_datasets[task]["test_len"],
        "Train": train_val_test_datasets[task]["train_len"],
        "Validation": train_val_test_datasets[task]["val_len"],
        "Test": train_val_test_datasets[task]["test_len"],
    }
    for task in task_to_dataset
])

summary_df

Processing tasks: 100%|██████████| 3/3 [01:27<00:00, 29.12s/it]


|   | Task  | Total  | Train  | Validation | Test |
|---|-------|--------|--------|------------|------|
| 0 | T-SAD | 18899  | 15120  | 2697       | 1082 |
| 1 | A-SAD | 31630  | 22940  | 5615       | 3075 |
| 2 | S-NAP | 128896 | 107145 | 16678      | 5073 |
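
The `frac` argument keeps only a fraction of each split, and for the two classification tasks the sample is stratified on `ds_labels` so that the label ratio is preserved. The following is a minimal, dependency-free sketch of that idea; it is not the notebook's `train_test_split` call, and `stratified_fraction` as well as the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_fraction(rows, label_key, frac, seed=42):
    """Keep roughly `frac` of the rows from every label group (illustrative helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)

    rng = random.Random(seed)
    sample = []
    for label in sorted(groups, key=str):
        members = groups[label]
        k = max(1, round(len(members) * frac))  # never drop a class entirely
        sample.extend(rng.sample(members, k))
    return sample

# Toy data: 80 rows labelled True and 20 labelled False (an 80/20 label ratio).
rows = [{"id": i, "ds_labels": i >= 20} for i in range(100)]
subset = stratified_fraction(rows, "ds_labels", frac=0.1)
label_counts = Counter(row["ds_labels"] for row in subset)
print(label_counts)  # 8 rows with True, 2 with False: the 80/20 ratio is preserved
```

Plain random sampling, as used for S-NAP, gives no such guarantee for small or imbalanced classes, which is presumably why the classification tasks are sampled per label.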


## Replication Step 2: Run the Fine-Tuning (FT) Approach from "Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks"

```
Fine-Tuning. We fine-tune Llama and Mistral in batches of two instances
with gradient accumulation over 16 batches, resulting in an effective
batch size of 32. We fine-tune RoBERTa also in batches of 32 instances.
All models are trained using the AdamW algorithm [24], with an initial
learning rate of 1e-5. We fine-tune the LLMs for three epochs and RoBERTa
for ten epochs. We run each combination of task and model three times
using different random seeds, corresponding to different random
initialization of model parameters and shuffling of training data in
each run.
```

| **Parameter**               | **Value for Llama & Mistral** | **Value for RoBERTa** |
|-----------------------------|-------------------------------|-----------------------|
| **Optimizer**               | AdamW                         | AdamW                 |
| **Initial Learning Rate**   | 1e-5                          | 1e-5                  |
| **Number of Runs**          | 3                             | 3                     |
| **Batch Size (Instances)**  | 2                             | 32                    |
| **Gradient Accumulation**   | 16                            | N/A                   |
| **Effective Batch Size**    | 32                            | 32                    |
| **Number of Epochs**        | 3                             | 10                    |
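
The effective batch size in the table is the per-batch instance count multiplied by the number of gradient-accumulation steps. The arithmetic can be sketched as follows. The helper name `optimizer_steps` is hypothetical, and the sample count 15120 is the T-SAD training size from the 10% sample summarized earlier in this notebook, used purely for illustration. Training frameworks may round the number of steps differently.

```python
import math

def optimizer_steps(num_samples, per_step_batch, grad_accum_steps, epochs):
    """Return (effective batch size, optimizer updates per epoch, total updates)."""
    effective_batch = per_step_batch * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

# Llama/Mistral settings from the table: batches of 2, accumulated over 16 batches, 3 epochs.
effective, per_epoch, total = optimizer_steps(
    15120, per_step_batch=2, grad_accum_steps=16, epochs=3
)
print(effective, per_epoch, total)  # 32 473 1419
```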

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-ft/llama_hf_script.py" T-SAD

2025-01-12 18:19:08.225937: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-12 18:19:08.242628: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-12 18:19:08.262972: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-12 18:19:08.269134: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-12 18:19:08.283679: I tensorflow/core/platform/cpu_feature_guar

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-ft/llama_hf_script.py" A-SAD

In [None]:
!python3 "/content/drive/MyDrive/llms4pm-ft/llama_hf_script.py" S-NAP

## Replication Step 3: Evaluate Model

In [None]:
import pandas as pd

data = {
    "Metric": ["F1_Mac", "F1_Mac", "F1_Mac"],
    "Approach": ["FT RoBERTa", "FT Mistral", "FT Llama"],
    "T-SAD": ["0.77 ± 0.006", "0.79 ± 0.010", "0.79 ± 0.011"],
    "A-SAD": ["0.85 ± 0.003", "0.88 ± 0.002", "0.88 ± 0.000"],
    "S-NAP": ["0.63 ± 0.048", "0.68 ± 0.039", "0.69 ± 0.049"]
}

df_ref = pd.DataFrame(data)

import dataframe_image as dfi
from google.colab import files

df_ref.dfi.export("ft-results-paper.png", table_conversion="selenium")
files.download("ft-results-paper.png")

df_ref

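|   | Metric | Approach   | T-SAD        | A-SAD        | S-NAP        |
|---|--------|------------|--------------|--------------|--------------|
| 0 | F1_Mac | FT RoBERTa | 0.77 ± 0.006 | 0.85 ± 0.003 | 0.63 ± 0.048 |
| 1 | F1_Mac | FT Mistral | 0.79 ± 0.010 | 0.88 ± 0.002 | 0.68 ± 0.039 |
| 2 | F1_Mac | FT Llama   | 0.79 ± 0.011 | 0.88 ± 0.000 | 0.69 ± 0.049 |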


In [None]:
import pandas as pd
import os

map_to_approach = {
    "FT RoBERTa": {
        "T-SAD": "finetune_eval_T_SAD_roberta_base_T_SAD_samples_151206_epochs_5_lr.csv",
        "A-SAD": "finetune_eval_A_SAD_outputs_roberta_base_A_SAD_samples_229402_epochs.csv",
        "S-NAP": "finetune_eval_S-NAP_roberta_large_samples_187178_epochs_10.csv",
    },
    "FT Mistral": {
        "T-SAD": "N/A",
        "A-SAD": "N/A",
        "S-NAP": "N/A"
    },
    "FT Llama": {
        "T-SAD": "finetune_eval_T-SAD_meta-llama_Meta-Llama-3-8B-Instruct.csv",
        "A-SAD": "finetune_eval_A-SAD_meta-llama_Meta-Llama-3-8B-Instruct_A-SAD_samples-all_epochs-3_lr-1e-05_batch-2x16_time-2025-01-08_09-03-01_results.csv",
        "S-NAP": "finetune_eval_S-NAP_meta-llama_Meta-Llama-3-8B-Instruct_S-NAP_samples-all_epochs-3_lr-1e-05_batch-2x16_time-2025-01-12_09-17-08_results.csv"
    }
}

BASE_PATH = "/content/drive/MyDrive/llms4pm-ft/eval"

table_data = []

# Each cell reports the *highest* macro-F1 found in the results file, followed by the
# standard deviation over its rows. With a single row, pandas returns NaN for the std.
for approach, task_files in map_to_approach.items():  # avoid shadowing google.colab's `files`
    row = [approach]
    for task, file_name in task_files.items():
        file_path = os.path.join(BASE_PATH, file_name)
        if os.path.exists(file_path):
            df = pd.read_csv(file_path)
            # Result files name the metric column either "f1 mac" or "f1_mac".
            f1_col = next((c for c in ("f1 mac", "f1_mac") if c in df.columns), None)
            if f1_col is not None:
                row.append(f"{df[f1_col].max():.2f} ± {df[f1_col].std():.3f}")
            else:
                row.append("N/A")
        else:
            row.append("N/A")
    table_data.append(row)

df_new = pd.DataFrame(table_data, columns=["Approach", "T-SAD", "A-SAD", "S-NAP"])
df_new.insert(0,"Metric","F1_Mac")

import dataframe_image as dfi
from google.colab import files

df_new.dfi.export("ft-results-rep.png", table_conversion="selenium")
files.download("ft-results-rep.png")

df_new

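|   | Metric | Approach   | T-SAD        | A-SAD      | S-NAP      |
|---|--------|------------|--------------|------------|------------|
| 0 | F1_Mac | FT RoBERTa | 0.70 ± nan   | 0.80 ± nan | 0.11 ± nan |
| 1 | F1_Mac | FT Mistral | N/A          | N/A        | N/A        |
| 2 | F1_Mac | FT Llama   | 0.72 ± 0.026 | 0.81 ± nan | 0.51 ± nan |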
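
The `± nan` entries in the table above arise when a results file contains a single row: the sample standard deviation is undefined for one observation, and pandas reports it as NaN. Since the quoted setup runs each configuration with three seeds, a mean ± standard deviation over runs is a natural summary once several runs are available. A minimal sketch of such a summary, assuming one macro-F1 score per run, is shown below; `summarize_runs` is a hypothetical helper and the scores are made-up placeholders, not results from this notebook.

```python
import math
import statistics

def summarize_runs(scores):
    """Format macro-F1 scores from repeated runs as 'mean ± std' (sample std, ddof=1)."""
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two observations.
    std = statistics.stdev(scores) if len(scores) > 1 else math.nan
    return f"{mean:.2f} ± {std:.3f}"

print(summarize_runs([0.70, 0.72, 0.74]))  # 0.72 ± 0.020
print(summarize_runs([0.70]))              # 0.70 ± nan
```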


In [None]:
def parse_f1(s: str) -> float:
    """Extract the F1 value from a 'value ± std' cell; 'N/A' maps to NaN."""
    if s == "N/A":
        return float("nan")
    return float(s.split(" ± ")[0])

df_combined = df_ref.merge(df_new, on=["Metric", "Approach"], suffixes=("_ref", "_new"))

for col_base in ["T-SAD", "A-SAD", "S-NAP"]:
    col_ref = f"{col_base}_ref"
    col_new = f"{col_base}_new"
    col_delta = f"{col_base}_delta"

    df_combined[col_delta] = df_combined.apply(
        lambda row: parse_f1(row[col_new]) - parse_f1(row[col_ref]),
        axis=1
    )


cols_to_show = [
    "Metric", "Approach",
    "T-SAD_ref", "T-SAD_new", "T-SAD_delta",
    "A-SAD_ref", "A-SAD_new", "A-SAD_delta",
    "S-NAP_ref", "S-NAP_new", "S-NAP_delta"
]

df_combined[cols_to_show]

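|   | Metric | Approach   | T-SAD_ref    | T-SAD_new    | T-SAD_delta | A-SAD_ref    | A-SAD_new  | A-SAD_delta | S-NAP_ref    | S-NAP_new  | S-NAP_delta |
|---|--------|------------|--------------|--------------|-------------|--------------|------------|-------------|--------------|------------|-------------|
| 0 | F1_Mac | FT RoBERTa | 0.77 ± 0.006 | 0.70 ± nan   | -0.07       | 0.85 ± 0.003 | 0.80 ± nan | -0.05       | 0.63 ± 0.048 | 0.11 ± nan | -0.52       |
| 1 | F1_Mac | FT Mistral | 0.79 ± 0.010 | N/A          | NaN         | 0.88 ± 0.002 | N/A        | NaN         | 0.68 ± 0.039 | N/A        | NaN         |
| 2 | F1_Mac | FT Llama   | 0.79 ± 0.011 | 0.72 ± 0.026 | -0.07       | 0.88 ± 0.000 | 0.81 ± nan | -0.07       | 0.69 ± 0.049 | 0.51 ± nan | -0.18       |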


In [None]:
import pandas as pd

data = {
    "Approach": ["FT Llama"],
    "T-SAD": ["11.1h"],
    "A-SAD": ["15.0h"],
    "S-NAP": ["23.0h"]
}

df_runtime = pd.DataFrame(data)

import dataframe_image as dfi
from google.colab import files

df_runtime.dfi.export("ft-runtime-paper.png", table_conversion="selenium")
files.download("ft-runtime-paper.png")

df_runtime

|   | Approach | T-SAD | A-SAD | S-NAP |
|---|----------|-------|-------|-------|
| 0 | FT Llama | 11.1h | 15.0h | 23.0h |


In [None]:
import pandas as pd

data = {
    "Approach": ["FT Llama"],
    "T-SAD": ["7.36h"],
    "A-SAD": ["N/A"],
    "S-NAP": ["4.23h"]
}

df_runtime = pd.DataFrame(data)

import dataframe_image as dfi
from google.colab import files

df_runtime.dfi.export("ft-runtime-rep.png", table_conversion="selenium")
files.download("ft-runtime-rep.png")

df_runtime

|   | Approach | T-SAD | A-SAD | S-NAP |
|---|----------|-------|-------|-------|
| 0 | FT Llama | 7.36h | N/A   | 4.23h |
