
Data Object and Data Object Reference name vs ID #214

Closed
sharky98 opened this issue Sep 1, 2022 · 8 comments · Fixed by #325

Comments

@sharky98

sharky98 commented Sep 1, 2022

Hello,

I am facing multiple issues regarding the data object and its reference. I am using either Camunda or Modelio to model my BPMN, which is a simple user task that outputs a decision DataObject, and an exclusive gateway that either goes to the end or redoes the user task based on the Decision DataObject.

I checked both examples (cli and ducks) and the tests, but everything is based on Camunda Forms, which isn't a route I want to take. Sticking with plain BPMN, I was thinking of using a DataObject as a trigger telling my Python app that something must be provided to the process.

I currently use a standalone script based on your CLI example (hardcoded the BPMN files, forced debug output, and changed the user task handler to set the value of the data output) to figure out how to work with BPMN and your library. You'll find the relevant parts of the code below. To summarize, here are my issues.

  1. The parser and specs use the BPMN ID as the name, a value that is auto-generated and not exposed in either Modelio or Camunda. This is not very friendly, as it requires the end user modelling the process to open the raw XML, find the BPMN ID of the DataObject, and either change it everywhere in the XML to some meaningful value, or go back to the conditional expression and use the non-meaningful BPMN ID as the condition.
  2. Even after changing the conditional expression to match the name (the BPMN ID), I still get an error saying this name is not defined (maybe the PythonScriptEngine does not consider those DataObjects?).

Thanks for reading, and hopefully you'll have some idea of how to sort this out!

BTW, I am using an editable, from-source installation of the package, since the latest released version does not have support for those data objects.

Here is where you use the ID as the name.

class BpmnDataSpecification:

    def __init__(self, name, description=None):
        """
        :param name: the name of the task (the BPMN ID)
        :param description: the task description (the BPMN name)
        """
        self.name = name
        self.description = description or name
        # In the future, we can add schemas defining the objects here.
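
For the Camunda process below, for example, this mapping presumably boils down to something like the following (a hypothetical illustration based only on the class above, not actual parser output):

    # Hypothetical: the auto-generated BPMN ID becomes the spec's name, and since
    # the bpmn:dataObject element carries no name, the description falls back to it too.
    spec = BpmnDataSpecification('DataObject_17prm6c')
    assert spec.name == 'DataObject_17prm6c'
    assert spec.description == 'DataObject_17prm6c'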

Here is where you use the same ID to check whether it exists as a data object.

def parse_incoming_data_references(self):
    specs = []
    for name in self.xpath('.//bpmn:dataInputAssociation/bpmn:sourceRef'):
        ref = first(self.doc_xpath(f".//bpmn:dataObjectReference[@id='{name.text}']"))
        if ref is not None and ref.get('dataObjectRef') in self.process_parser.spec.data_objects:
            specs.append(self.process_parser.spec.data_objects[ref.get('dataObjectRef')])
        else:
            raise ValidationException(f'Cannot resolve dataInputAssociation {name}', self.node, self.filename)
    return specs

def parse_outgoing_data_references(self):
    specs = []
    for name in self.xpath('.//bpmn:dataOutputAssociation/bpmn:targetRef'):
        ref = first(self.doc_xpath(f".//bpmn:dataObjectReference[@id='{name.text}']"))
        if ref is not None and ref.get('dataObjectRef') in self.process_parser.spec.data_objects:
            specs.append(self.process_parser.spec.data_objects[ref.get('dataObjectRef')])
        else:
            raise ValidationException(f'Cannot resolve dataOutputAssociation {name}', self.node, self.filename)
    return specs

Here is how I changed complete_user_task(task). It is hardcoded for now, assuming only one output is requested.

def complete_user_task(task):

    display_task(task)
    if task.data is None:
        task.data = {}

    if len(task.task_spec.data_output_associations) > 0:
        task.data[task.task_spec.data_output_associations[0].name] = "redo"
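
A less hardcoded variant would loop over every declared output instead of assuming a single one; this is only a sketch under that assumption, using input() purely for illustration:

    def complete_user_task(task):
        display_task(task)
        if task.data is None:
            task.data = {}
        # Ask for a value for each declared data output instead of hardcoding one.
        for association in task.task_spec.data_output_associations:
            task.data[association.name] = input(f"Value for {association.name}: ")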

Here is the error I get saying the name is not defined.

Set the value of `DataObject_17prm6c` to `redo`
{'DataObject_17prm6c': 'redo'}
Traceback (most recent call last):
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\bpmn\PythonScriptEngine.py", line 196, in evaluate
    return self._evaluate(expression, task.data, task=task)
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\bpmn\PythonScriptEngine.py", line 223, in _evaluate
    return eval(expression, globals, lcls)
  File "<string>", line 1, in <module>
NameError: name 'DataObject_17prm6c' is not defined

During handling of the above exception, another exception occurred:

  File "C:\Users\sa830268\Documents\dev\bin\pyenv\pyenv-win\versions\3.10.5\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sa830268\Documents\dev\bin\pyenv\pyenv-win\versions\3.10.5\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\app\workflow\workflow.py", line 188, in <module>
    run(myWorkflow, True)
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\app\workflow\workflow.py", line 167, in run
    workflow.do_engine_steps()
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\bpmn\workflow.py", line 187, in do_engine_steps
    task.complete()
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\task.py", line 783, in complete
    return self.task_spec._on_complete(self)
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\specs\base.py", line 405, in _on_complete
    self._on_complete_hook(my_task)
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\specs\ExclusiveChoice.py", line 82, in _on_complete_hook
    if condition is None or condition._matches(my_task):
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\bpmn\specs\BpmnSpecMixin.py", line 33, in _matches
    return task.workflow.script_engine.evaluate(task, self.args[0])
  File "C:\Users\sa830268\Documents\dev\projects\be_tracker.backend\.venv\src\spiffworkflow\SpiffWorkflow\bpmn\PythonScriptEngine.py", line 198, in evaluate
    raise WorkflowTaskExecException(task,
SpiffWorkflow.bpmn.exceptions.WorkflowTaskExecException: Gateway_1i6bkpl: Error evaluating expression 'DataObject_17prm6c=='redo'', name 'DataObject_17prm6c' is not defined

Here is my BPMN process from Camunda. It will fail to evaluate decision=='redo' because the value is stored under DataObject_17prm6c.

  <bpmn:process id="process" name="process" isExecutable="true">
    <bpmn:startEvent id="StartEvent_1">
      <bpmn:extensionElements />
      <bpmn:outgoing>Flow_0xkz2ar</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:sequenceFlow id="Flow_0xkz2ar" sourceRef="StartEvent_1" targetRef="Activity_1repqrz" />
    <bpmn:userTask id="Activity_1repqrz" name="Init">
      <bpmn:extensionElements />
      <bpmn:incoming>Flow_0xkz2ar</bpmn:incoming>
      <bpmn:incoming>Flow_0zwwzf9</bpmn:incoming>
      <bpmn:outgoing>Flow_08d7vel</bpmn:outgoing>
      <bpmn:dataOutputAssociation id="DataOutputAssociation_1s559c4">
        <bpmn:targetRef>DataObjectReference_05w9q7s</bpmn:targetRef>
      </bpmn:dataOutputAssociation>
    </bpmn:userTask>
    <bpmn:exclusiveGateway id="Gateway_1i6bkpl" name="Check Decision?" default="Flow_0tgoifw">
      <bpmn:incoming>Flow_08d7vel</bpmn:incoming>
      <bpmn:outgoing>Flow_0tgoifw</bpmn:outgoing>
      <bpmn:outgoing>Flow_0zwwzf9</bpmn:outgoing>
    </bpmn:exclusiveGateway>
    <bpmn:sequenceFlow id="Flow_08d7vel" sourceRef="Activity_1repqrz" targetRef="Gateway_1i6bkpl" />
    <bpmn:endEvent id="Event_0v8tu84">
      <bpmn:incoming>Flow_0tgoifw</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="Flow_0tgoifw" name="Go" sourceRef="Gateway_1i6bkpl" targetRef="Event_0v8tu84" />
    <bpmn:sequenceFlow id="Flow_0zwwzf9" name="Redo" sourceRef="Gateway_1i6bkpl" targetRef="Activity_1repqrz">
      <bpmn:conditionExpression xsi:type="bpmn:tFormalExpression">decision=='redo'</bpmn:conditionExpression>
    </bpmn:sequenceFlow>
    <bpmn:dataObjectReference id="DataObjectReference_05w9q7s" name="Decision" dataObjectRef="DataObject_17prm6c" />
    <bpmn:dataObject id="DataObject_17prm6c" />
  </bpmn:process>

Here is the same process as generated by Modelio.

    <process isClosed="false" isExecutable="true" processType="Public" name="CR_Process" id="bbc9568e-4664-4a24-929e-bd3854857707">
        <dataObject name="decision" id="REF-af032a66-519a-4e2f-93db-b85804d3dd47"/>
        <startEvent isInterrupting="false" parallelMultiple="false" name="" id="modelio-80b31935-5e9b-4607-b08d-34fca404aa05">
            <ns5:outgoing xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-6af02fe6-8c7c-4ce6-b581-ef5b0217d60f</ns5:outgoing>
        </startEvent>
        <sequenceFlow sourceRef="modelio-80b31935-5e9b-4607-b08d-34fca404aa05" targetRef="modelio-6331c10a-3991-4fe9-9153-8b705ca1484e" name="" id="modelio-6af02fe6-8c7c-4ce6-b581-ef5b0217d60f"/>
        <userTask implementation="##WebService" isForCompensation="false" startQuantity="1" completionQuantity="1" name="Init" id="modelio-6331c10a-3991-4fe9-9153-8b705ca1484e">
            <ns5:incoming xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-6af02fe6-8c7c-4ce6-b581-ef5b0217d60f</ns5:incoming>
            <ns5:incoming xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-15cba2c4-c161-4612-b70a-0d30521ecd5e</ns5:incoming>
            <ns5:outgoing xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-1641545c-6422-4639-9ad9-8b4fbff74f28</ns5:outgoing>
            <ioSpecification>
                <dataOutput id="modelio-6331c10a-3991-4fe9-9153-8b705ca1484e-af032a66-519a-4e2f-93db-b85804d3dd47"/>
                <inputSet id="InputSet_modelio-6331c10a-3991-4fe9-9153-8b705ca1484e"/>
                <outputSet id="OutputSet_modelio-6331c10a-3991-4fe9-9153-8b705ca1484e">
                    <dataOutputRefs>modelio-6331c10a-3991-4fe9-9153-8b705ca1484e-af032a66-519a-4e2f-93db-b85804d3dd47</dataOutputRefs>
                </outputSet>
            </ioSpecification>
            <dataOutputAssociation id="modelio-64e43a05-949f-44f0-9912-7f0cb6712af3">
                <sourceRef>modelio-6331c10a-3991-4fe9-9153-8b705ca1484e-af032a66-519a-4e2f-93db-b85804d3dd47</sourceRef>
                <targetRef>af032a66-519a-4e2f-93db-b85804d3dd47</targetRef>
            </dataOutputAssociation>
        </userTask>
        <sequenceFlow sourceRef="modelio-6331c10a-3991-4fe9-9153-8b705ca1484e" targetRef="f5552950-c5a9-43c8-b203-b81ab0ddb8ec" name="Sequence Flow" id="modelio-1641545c-6422-4639-9ad9-8b4fbff74f28"/>
        <exclusiveGateway default="e38cee12-2744-48a0-85f6-81280c4b4aa6" gatewayDirection="Unspecified" name="Check Decision?" id="f5552950-c5a9-43c8-b203-b81ab0ddb8ec">
            <ns5:incoming xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-1641545c-6422-4639-9ad9-8b4fbff74f28</ns5:incoming>
            <ns5:outgoing xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">e38cee12-2744-48a0-85f6-81280c4b4aa6</ns5:outgoing>
            <ns5:outgoing xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">modelio-15cba2c4-c161-4612-b70a-0d30521ecd5e</ns5:outgoing>
        </exclusiveGateway>
        <endEvent name="" id="e3f80147-cf6f-47f4-92f0-9f94e83e1455">
            <ns5:incoming xmlns:ns5="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns="">e38cee12-2744-48a0-85f6-81280c4b4aa6</ns5:incoming>
        </endEvent>
        <dataObjectReference dataObjectRef="REF-af032a66-519a-4e2f-93db-b85804d3dd47" name="decision" id="af032a66-519a-4e2f-93db-b85804d3dd47"/>
        <sequenceFlow sourceRef="f5552950-c5a9-43c8-b203-b81ab0ddb8ec" targetRef="e3f80147-cf6f-47f4-92f0-9f94e83e1455" name="Sequence Flow1" id="e38cee12-2744-48a0-85f6-81280c4b4aa6"/>
        <sequenceFlow sourceRef="f5552950-c5a9-43c8-b203-b81ab0ddb8ec" targetRef="modelio-6331c10a-3991-4fe9-9153-8b705ca1484e" name="Sequence Flow2" id="modelio-15cba2c4-c161-4612-b70a-0d30521ecd5e">
            <conditionExpression>decision=="redo"</conditionExpression>
        </sequenceFlow>
    </process>
@sharky98
Author

sharky98 commented Sep 1, 2022

Even after changing the conditional expression to match the name (the BPMN ID), I still get an error saying this name is not defined (maybe the PythonScriptEngine does not consider those DataObjects?).

So it seems that part of this issue is really that the PythonScriptEngine does not consider this data when it is put back into the workflow data dict.

Could the following method be updated to add the workflow data? The current method is shown first, followed by my proposed change. I'll let you check whether this is wanted and the right way to implement it; otherwise, I'll look into overriding this method myself.

def evaluate(self, task, expression):
    """
    Evaluate the given expression, within the context of the given task and
    return the result.
    """
    try:
        if isinstance(expression, Operator):
            # I am assuming that this takes care of some kind of XML
            # expression judging from the contents of operators.py
            return expression._matches(task)
        else:
            return self._evaluate(expression, task.data, task=task)
    except Exception as e:
        raise WorkflowTaskExecException(task,
                                        "Error evaluating expression "
                                        "'%s', %s" % (expression, str(e)))

    def evaluate(self, task, expression):
        """
        Evaluate the given expression, within the context of the given task and
        return the result.
        """
        try:
            if isinstance(expression, Operator):
                # I am assuming that this takes care of some kind of XML
                # expression judging from the contents of operators.py
                return expression._matches(task)
            else:
+                context_data = {**task.data, **task.workflow.data}
+                return self._evaluate(expression, context=context_data, task=task)
-                return self._evaluate(expression, task.data, task=task)
        except Exception as e:
            raise WorkflowTaskExecException(task,
                                            "Error evaluating expression "
                                            "'%s', %s" % (expression, str(e)))

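One detail worth flagging in the proposed merge: in a dict merge the right-hand side wins on key collisions, so {**task.data, **task.workflow.data} silently lets process data shadow task data, and swapping the order would flip the precedence. A minimal illustration with hypothetical values:

    task_data = {'decision': 'go', 'other': 1}
    workflow_data = {'decision': 'redo'}
    merged = {**task_data, **workflow_data}
    # merged['decision'] == 'redo' -- the workflow value shadows the task value
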
That still leaves the question of your stance on the BPMN ID vs. name.

@essweine
Contributor

essweine commented Sep 1, 2022

There is a quick answer to your point about bpmn id -> spiff name and bpmn name -> spiff description. Spiff tasks have auto-generated ids, so we don't overwrite those and we use this mapping instead. I agree it is confusing, but it is the same across parsing of all bpmn elements, so we are not going to deviate from it because consistency is important too.

I don't think there is an easy fix for having to edit the XML. As you note, the Camunda UI only gives you access to the reference, not the underlying data object, but it's the underlying object that we really care about.

We're working on our own version of bpmn js (repo here: https://github.com/sartography/bpmn-js-spiffworkflow) that handles data objects a little better (we expose the underlying objects and allow multiple references to point to the same object, which Camunda does not let you do -- every time you create a data object reference, it generates a new object). You might take a look at that and see if it's useful to you.

We should be adding the requested workflow data into the task data automatically -- you should not have to do it yourself. This may be a bug and I will look into it.

@sharky98
Author

sharky98 commented Sep 1, 2022

Thank you very much. I understand that it's your internal tool, so I'll bend to your mapping. No issues there then.

As for the modeler, it took me a while to understand how to set it up (I thought it was a React component at first!), but it seems to be working as expected with the DataObjects.

My only issue is that I needed "portable" desktop modeler software, so I would have liked a plugin for the desktop version. But that'll do for now; I might build the modeler into my React app or into a desktop plugin one day.

One last thing, if you can help me figure it out (I am new to this whole BPMN thing and it's been driving me crazy this week!): is there a way to define, in the DataObject attributes, what the underlying Python or TypeScript class is? Say I want this DataObject "decision" to be of type "string" or "integer" or "MyCustomClass" or "MyCustomEnum", so that both my backend and my frontend know what to expect and provide as input/output. I was thinking of using DataObjectReference, but clearly this is not it!

Thanks again! :)

@essweine
Contributor

essweine commented Sep 2, 2022

I believe the BPMN way of setting types would be to use an ItemDefinition (https://www.omg.org/spec/BPMN/2.0.2/PDF#G11.91155), but unfortunately we don't support it (we are considering adding it later).

Also, the issue with the process-level variables not being recognized by the gateways is definitely a bug, so I wanted to confirm that.

I was initially going to include your proposed fix, but then I asked myself "if there is a conflict between a task and a process variable, which takes precedence?". One solution would be to give one or the other precedence. Another would be to add process variables as a dictionary with a standard name, so they could be distinguished by using e.g. process['var']. Of course, you still run into the problem of a conflict with a task variable named process. We have had a LOT of internal discussions on how to handle process vs task data; this is one reason why we have a minimal data object implementation -- we haven't been able to come up with good answers to these questions. We would be interested to hear your take on what you think would be an intuitive solution.
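
A minimal sketch of that second option, assuming the evaluation context is ultimately just a dict handed to eval (the variable names here are hypothetical):

    # Expose process variables under a reserved 'process' key, so a condition
    # can be written as process['decision'] == 'redo'. This assumes no task
    # variable is itself named 'process'.
    context = {**task.data, 'process': dict(task.workflow.data)}
    result = eval("process['decision'] == 'redo'", {}, context)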

But we will definitely include some fix that allows process variables to be used by gateways (and other elements that rely on evaluate, such as DMN tables).

@essweine
Contributor

essweine commented Sep 2, 2022

I also should have mentioned that we use the updated bpmn js in a React app (repo here: https://github.com/sartography/spiffworkflow-frontend). It is also in development, but potentially useful to you.

@sharky98
Author

sharky98 commented Sep 3, 2022

Thank you for your answer on this! I'll check on Tuesday when I am back on my project.

We would be interested to hear your take on what you think would be an intuitive solution.

As a dev user, my intuitive solution would be to have the task variables and the process variables each inside a dictionary of their own (process_data: {} and task_data: {}). See the "Dictionary" section below.

However, my perspective as a BPMN user (of which I don't have a deep understanding) would be to keep it as simple as possible. I would put the burden on SpiffWorkflow to figure out which variable to use in the PythonScriptEngine, such that the BPMN user only needs to write myVar, whether it is a process or a task variable. So the PythonScriptEngine would have to look in both dictionaries.

However, to deal with collisions, I would have the default PythonScriptEngine throw an exception, with an explicit message along the lines of "myVar is present in both the process and the task". See the PythonScriptEngine section below.

From there, I see two options.

  • Either SpiffWorkflow deals with explicit process vs. task data when provided, so that the BPMN user above, when the exception is thrown, could change the expression to something like process_data['myVar'] to explicitly say to take the process version of the variable.
  • Or we leave the burden on the dev user to implement how to deal with collisions by extending the PythonScriptEngine. In that case, it would be ideal to put the collision detection that throws the error in a method of its own, so it is easy to just override it, leaving the rest of the PythonScriptEngine as-is.

Dictionary

The current object would look like this if we put both in the same dictionary. Duplicate keys are not possible, so the 'value3' of 'TaskProcess_DataObject' would be overwritten by 'value4'.

{
    [... task other properties]
    'data':
    {
        'Task_DataObject': 'value1',
        'Process_DataObject': 'value2',
        'TaskProcess_DataObject': 'value3',
        'TaskProcess_DataObject': 'value4',
    }
}

My idea would be for the new object to look like this, so no collision is possible.

{
    [... task other properties]
    'process_data':
    {
        'Process_DataObject': 'value2',
        'TaskProcess_DataObject': 'value4',
    },
    'task_data':
    {
        'Task_DataObject': 'value1',
        'TaskProcess_DataObject': 'value3',
    },
}

PythonScriptEngine

In here, we send a context that contains the symmetric difference of both sets (basically, keeping only the variables that are not in both), plus the process_data and task_data dictionaries under their own keys of the context, and use that dictionary to evaluate the expression. That should cover both the implicit case without collision and explicit call-outs of variables by the BPMN user. (A note should be given to the BPMN user that process_data and task_data are reserved names and cannot be used for any DataObject id.)

If the expression cannot be evaluated because the variable is not in the merged dict, instead of just throwing that at the user, a check should be made as to whether the variable name is in the task data dictionary. If yes, that means it is a data collision, and we send this to a new method data_collision() that would just throw the collision exception in base SpiffWorkflow. Having this in an explicit method allows any dev user to override it with their own logic.
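
Here is a rough sketch of that idea. PythonScriptEngine, _evaluate, and WorkflowTaskExecException come from SpiffWorkflow (import paths taken from the traceback above); everything else, including the data_collision() hook, is hypothetical, and the Operator branch of the original evaluate() is omitted for brevity:

    from SpiffWorkflow.bpmn.PythonScriptEngine import PythonScriptEngine
    from SpiffWorkflow.bpmn.exceptions import WorkflowTaskExecException

    class CollisionAwarePythonScriptEngine(PythonScriptEngine):

        def evaluate(self, task, expression):
            task_data = task.data or {}
            process_data = task.workflow.data or {}
            shared = set(task_data) & set(process_data)
            # Expose only unambiguous variables directly; the two full dicts
            # stay reachable under the reserved names process_data and task_data.
            context = {k: v for k, v in {**process_data, **task_data}.items() if k not in shared}
            context['process_data'] = process_data
            context['task_data'] = task_data
            try:
                return self._evaluate(expression, context, task=task)
            except NameError as exc:
                name = str(exc).split("'")[1]
                if name in shared:
                    return self.data_collision(task, name)
                raise

        def data_collision(self, task, name):
            # Default behaviour: fail loudly; a dev user can override this method.
            raise WorkflowTaskExecException(
                task, f"'{name}' is present in both the process and the task data")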

So, that is my long answer... my brain is now going to sleep! I'll check on this on Tuesday.

@essweine
Contributor

essweine commented Sep 6, 2022

Trying to keep things simple is definitely one of our priorities. We decided early on that sticking with task data was simpler than process data but we really wanted to be able to define inputs to subprocesses, so that was the main impetus behind adding data object support. A secondary concern was the ability to temporarily remove objects from the task data, in order to not have to copy them over and over again if they are not actively being used.

We expected people to just draw arrows from data objects to the bpmn elements that use them in cases where a task needs access to the process data, so I was initially going to suggest drawing an arrow to the gateway or sequence flow that uses the data object, but I discovered that Camunda doesn't let you do that. We're going to update bpmn js to allow this, as the bpmn spec indicates that it is perfectly valid.

An alternative solution to your issue would be to leave the data object data in the task until you're done manipulating it and store it in the data object at that point.

Implementing a custom script engine is an interesting idea and this is what I'd advise you to do if our solutions don't meet your needs.
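
For reference, a custom engine like that would presumably be passed in when constructing the workflow. A minimal sketch, assuming BpmnWorkflow and its script_engine argument (module path taken from the traceback above) and reusing the hypothetical CollisionAwarePythonScriptEngine sketched earlier in this thread:

    from SpiffWorkflow.bpmn.workflow import BpmnWorkflow

    # Assumption: the workflow constructor accepts a script_engine argument.
    workflow = BpmnWorkflow(spec, script_engine=CollisionAwarePythonScriptEngine())
    workflow.do_engine_steps()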

@danfunk
Collaborator

danfunk commented Sep 14, 2022

FWIW: I loved the idea of having a Data Object connected to a Gateway, and I put a few hours into making that possible this morning, only to realize that it is prohibited by the BPMN 2.0 schema, and so we can't model it in a valid diagram.

I did some digging in the official specification and found some details on page 345. It says: "The data used for Gateway Conditions MUST have been in a Message that was sent prior to (upstream from) the Gateway." But then how that data is passed into the Gateway is unclear.

One of the really great features of Data Objects is the ability to dictate which tasks can access that data and which cannot. But that isn't in any way connected to the BPMN standard. So I am inclined to say that we should provide a way to reference process-level variables (Data Objects) from within a Gateway using some key, as @sharky98 proposed, making such access possible and standard across the execution engine. At this point, I, for one, would be willing to accept such a pull request.

essweine linked a pull request on Apr 28, 2023 that will close this issue