attribute driven trace clustering algorithm does not work - empty distance matrix #494

Closed
PhoenixRising93 opened this issue Jul 3, 2024 · 1 comment

@PhoenixRising93

Hi,
I have a problem with the trace clustering algorithm provided in: https://github.com/caoyukun0430/pm4py-source/tree/yukun_paper
I copied the folder to \anaconda3\Lib\site-packages\pm4py\algo\trace_cluster

Then I tried to use the apply function described here: https://pm4py.fit.fraunhofer.de/static/assets/api/2.7.11/pm4py.algo.clustering.trace_attribute_driven.html

In Spyder it showed the error:

  File ~\anaconda3\lib\site-packages\pm4py\algo\clustering\trace_attribute_driven\algorithm.py:114 in apply
    Z = linkage(y, method='average')

  File ~\anaconda3\lib\site-packages\scipy\cluster\hierarchy.py:1068 in linkage
    n = int(distance.num_obs_y(y))

  File ~\anaconda3\lib\site-packages\scipy\spatial\distance.py:2572 in num_obs_y
    raise ValueError("The number of observations cannot be determined on "

ValueError: The number of observations cannot be determined on an empty distance matrix.

At first I thought it was a problem with my project-specific EventLog because I selected a categorical feature as an attribute. I also sliced the dataframe because I thought I had too many events in the log - namely 400,000 - but this did not change the error outcome.

So I tried the Receipt.xes file from the trace_cluster folder. This is the code:

import pm4py
from pm4py.algo.clustering.trace_attribute_driven import algorithm 
from pm4py.algo.clustering.trace_attribute_driven.algorithm import Variants

log = pm4py.read_xes(r"...\anaconda3\Lib\site-packages\pm4py\algo\trace_cluster\example\real_log\Receipt.xes")

#variant = trace_clustering.Variants.DMM_LEVEN

variant = Variants.VARIANT_DMM_LEVEN

pm4py.algo.clustering.trace_attribute_driven.algorithm.apply(log, 'case:responsible', variant)

This error showed again:

ValueError: The number of observations cannot be determined on an empty distance matrix.

I also tried different attributes, but that did not work either. In my initial attempt with the project-specific data, I converted the dataframe with 'log = pm4py.convert_to_event_log(dataframe)'.

I looked into the source code of num_obs_y, and this error occurs when k == 0. It seems that no distance matrix is calculated at all.
What is the issue here?
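
For context on the k == 0 case: a condensed distance vector over n observations holds n(n-1)/2 pairwise distances, so it is empty whenever fewer than two observations reach the linkage step. The sketch below only illustrates that arithmetic; `condensed_length` is a hypothetical helper, not part of SciPy or pm4py, and the idea that the attribute lookup produced fewer than two groups is an assumption, not something confirmed in this thread.

```python
def condensed_length(n_observations: int) -> int:
    """Number of pairwise distances among n observations (hypothetical helper)."""
    return n_observations * (n_observations - 1) // 2


# With zero or one observation there are no pairs, so the vector is empty.
for n in (0, 1, 2, 5):
    print(n, condensed_length(n))
# prints:
# 0 0
# 1 0
# 2 1
# 5 10
```

If the chosen attribute is not found on any trace, the clustering step may end up with no groups to compare, which would match the empty-matrix error above.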

Could you please provide an example code in the documentation of how the implemented trace clustering works?

Thank you.

@fit-alessandro-berti
Contributor

Dear @PhoenixRising93

The dataframe is internally converted to an EventLog. Therefore, you should call the function with 'responsible' instead of 'case:responsible'.
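
A minimal sketch of that suggestion, reusing the calls from the original post. The helper names `to_trace_attribute` and `cluster_by_attribute` are hypothetical and only illustrate dropping the dataframe-style 'case:' prefix before calling `apply`; the pm4py import is done lazily, and the call itself has not been verified against a specific pm4py release.

```python
CASE_PREFIX = "case:"


def to_trace_attribute(column_name: str) -> str:
    """Hypothetical helper: drop the dataframe-style 'case:' prefix, if present."""
    if column_name.startswith(CASE_PREFIX):
        return column_name[len(CASE_PREFIX):]
    return column_name


def cluster_by_attribute(log, column_name: str):
    """Call the entry point used in the original post with the unprefixed name."""
    # Imported lazily so the helper above works without pm4py installed.
    from pm4py.algo.clustering.trace_attribute_driven import algorithm
    from pm4py.algo.clustering.trace_attribute_driven.algorithm import Variants

    return algorithm.apply(
        log, to_trace_attribute(column_name), Variants.VARIANT_DMM_LEVEN
    )


print(to_trace_attribute("case:responsible"))  # responsible
```

With the log from the original post, this would be called as `cluster_by_attribute(log, 'case:responsible')`, which passes 'responsible' to `apply`.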
