Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Usage #127

Closed
IAsmodai opened this issue Sep 13, 2021 · 12 comments
Closed

Memory Usage #127

IAsmodai opened this issue Sep 13, 2021 · 12 comments

Comments

@IAsmodai
Copy link

Hi. Thanks very much for maintaining this amazing library!
I'm currently using SpiffWorkflow to generate logs for a somewhat big BPMN (~1.5 MB). Unfortunately, creating a Workflow from the spec uses way too much memory (exceeding 100 GB easily).

I figured that the problem lies within the gateways, since every node with multiple children creates several distinct paths throughout the workflow (whilst also creating mulitple task instances of the same task-spec) creating a total of >10 bil. instances of tasks in just one Workflow.

Now that I haven't understood the deepest of functionings in SpiffWorkflow: Is it necessary to create those distinct paths from a Gateway up until the end or is there a way to prevent this while maintaining the main functioning?

Thanks you very much!

@danfunk
Copy link
Collaborator

danfunk commented Sep 13, 2021

goodness gracious me. 100GB is a lot of memory.

I'd like to see this BPMN, or some BPMN that demonstrates this problem. We've put a lot of effort in this last year to accommodate some level of recursion within the BPMN diagram. Given this, I don't think every conceivable route is loaded into memory at all times, and we are able to load dozens of diagrams into memory simultaneously in our core applications, using very little (<1gb) of memory.

I wonder, when you open a gateway, do you close it, as in the example below, where the streams come back together?
image

It sounds like you have a very different workflow from what we have tested in the past. I'd love to see something that I could test out and examine how it is being handled.

@IAsmodai
Copy link
Author

Hi Dan,
thanks for your quick response.

We are "closing" gateways somewhen, but mostly pretty far at the end. Our parallel gateways clamp huge brackets in which more brackets are clamped whose contained subflows can interact with subflows of other opened brackets, hence the concept of closing gateways and aligning the streams became obsolete or at least really opaque.

The workflow contains proprietary data, so I will have to censor all descriptions before I can upload it, but I'll be happy to provide it as soon as I've found a way!

@calexh-sar
Copy link

@IAsmodai I have had to edit a BPMN that may similar to what you need to do. I opened it in the desktop version of the Camunda modeler, copied the XML, pasted it into my text editor, did a Search & Replace and then pasted it back into the Modeler. Your situation likely would be more complicated, but just in case.

@IAsmodai
Copy link
Author

Finally back at work.

@calexh-sar Thanks for the idea, I basically did this and regex-deleted all occurences of name="[.*]"

@danfunk model.txt contains a small part of our bpmn (I unfortunately cannot attach bpmns, but renaming should seal the deal), which demonstrates the problem. Even this one, which only contains a fifth of the big model, crushes my memory.
I'd be very grateful, if you can look into this, thank you very much for your interest!

Some explanation on why the bpmn follows unordinary paradigms in design: We are researching the flow of information in a collaborative setting (partners modeled in different lanes) where every piece of data is modeled as a task (so we can use SpiffWorkflow to analyse it). Every colored node represents some piece of information which in some combination flow into a white task representing an actual task.

@calexh-sar
Copy link

@IAsmodai I did a bit of reformatting of a small bit of your .bpmn file to get a better picture of what was going on in there and found a few places where there may be problems caused by the diagram. I attached a callout to each instance with my potential concern. I suspect some may be due to the fact that you only sent part of the total diagram. Regardless, we discussed your situation in our scrum this morning and concluded that fixing any issues like this might be just the tip of the iceberg. Dan is going to respond as well with an update on the more technical side of that discussion.

model.txt

@danfunk
Copy link
Collaborator

danfunk commented Sep 17, 2021

Foundational to SpiffWorkflow is it's ability to look ahead and predict what tasks will likely happen next, and what tasks can be completed in parallel. The intention is to parse well formed and logically grounded BPMN diagrams in order to drive a software application. We are targeting audiences that would use this to build online workflow systems that step people through complex choices and to allow passing control between different types of users. We are also working to support complex data pipelines where each task can manipulate data as it courses through the system.

What you appear to be modeling aren't tasks, but the relationship between different pieces of data. I am not familiar with using BPMN in this way, and I'm not sure BPMN is the right tool for modeling relationships between difference pieces of data. I would tend to model as tasks, the things that happen /to/ data, while the data is passed into and then out of that task.

Currently we parse the complete BPMN model, and look out across all possible paths, and then allow you to progress through this with a constant understanding of what nodes are active and available at any point.

If I understand correctly, what you are looking to do is parse the XML from this diagram, and log what is happening in it, to make observations about the structure and what it means. I think you might be best served by writing your own XML Parser that will run through this diagram and just spit out what is happening, making whatever observations are appropriate to your use case.

@danfunk danfunk closed this as completed Sep 17, 2021
@IAsmodai
Copy link
Author

Hi guys,

many thanks for the extensive support!

@calexh-sar Unfortunately, due to our purpose with the graph, we are not able to make any changes to the bpmn, since it wouldn't be an adequate representation of the reality.

@danfunk XML parsing myself is exactly what I tried to avoid :D

After some time, I finally found a way of modifying SpiffWorkflow such that it fits my purpose and isn't too RAM consuming.
Basically, I modified the way a workflow is being built by creating tasks exactly once and connecting them in a way just like the workflow spec is doing it instead of creating multiple threads and a task tree (only requiring another change in a task to be able to have multiple parents):

class Workflow(object):
    def __init__(self, workflow_spec, **kwargs):
        assert workflow_spec is not None
        self.spec = workflow_spec
        
        # create an instance of every task
        self.tasks = []
        for task_spec in list(self.spec.task_specs.values()):
            self.tasks.append(Task(self, task_spec))
        
        #connect the tasks
        for t in self.tasks:
            for c in t.task_spec.outputs:
                t.children.append(self.get_task_by_task_spec(c))
            for p in t.task_spec.inputs:
                t.parents.append(self.get_task_by_task_spec(p))

All I then have to modify is how the state of a task has to be updated. For example, if one task gets completed, it just tells all of its children that it got completed and all of their children check based on all of their parents, which are then set to ready iff all of their parents are either completed or cancelled:

class Task(object):
    def complete(self):
        self.set_state(self.COMPLETED)
        
        for c in self.children:
            c._update_state()

    def _update_state(self):
        ready = True
        
        for c in self.parents:
            if c.state < self.COMPLETED:
                ready = False
                break
        
        if ready:
            self.set_state(self.READY)

note: _update_state only contains READY as a prestate of COMPLETED and CANCELLED as I am not using LIKELY, MAYBE or WAITING, but they could be extended.

As far as it concerns my purposes, I can still look ahead as much as I need. I don't know if this logic is viable for everything you are implementing in SpiffWorkflow, but I thought I'd leave it here as:

  • memory usage down from >> 100 TB to a nice 400 MB (I actually calculated an estimate by counting the needed task instances and multiplying them with the memory of one instance)
  • run time in smaller workflows down by ~90%

@danfunk
Copy link
Collaborator

danfunk commented Sep 28, 2021

@IAsmodai - we fully empathize with the issue you are seeing. You are absolutely right that the tree structure is out of hand, and needs to be refactored. I would love to see the kind of improvements you are touting. It would help a lot of if you could please submit this as a pull request. I wonder if any of the automated tests pass when you make this change? It is possible to serialize and deserialize the workflow? Even if it was partially broken, a pull request could help me see the ramifications and we could work together on it.

@danfunk danfunk reopened this Sep 28, 2021
@calexh-sar
Copy link

@IAsmodai to make sure I was clear in my comments, from what I am seeing I suspect there are some errors in your BPMN that may be causing some, but definitely not all, of the issues. Understand you do not want to make any changes that would alter the "adequate representation of the reality", but I suspect you may have some alterations going on in some places currently due to incorrect BPMN that is not executing as you might be expecting it to do.

@IAsmodai
Copy link
Author

IAsmodai commented Sep 30, 2021

@calexh-sar I'm sorry, my last response didn't really give a holistic explanation on why the bpmn has to stay the way it is.
You are absolutely right that many points in the bpmn aren't designed in the usual paradigms. Our model has two purposes:

  1. Give a graphical representation of what's actually happening in the whole process for everybody to take a look
  2. Giving the basis for analysing the process with SpiffWorkflow and Process Mining

I don't want to interfere with the design as long as I can use it in Spliff. Unusual design aspects don't make any sense in bpmn-language, but have to be there for the primary purpose.
For example, a task without an ingoing flow may describe an information needed for an actual task, which doesn't "really" have a source, it might just be some information coming from openly accessible databases or internal calculations that every employee can access and doesn't have to be transfered from another person. A node without an outgoing flow might be a secondary result of a task that isn't required anywhere else (or a connection is yet to be added).
Parallel gateways with a single outgoing node however are simply a product of my unprecise reduction of the complete model.

As far as it concerns SpiffWorkflow, parsing this bpmn is "working". Mulitple outgoing for example nodes are a more serious problem which had to (and fortunately could) be corrected before. The current model can be imported. The only known problem is that nodes without ingoing flows aren't included in the workflow spec and can't be regarded in any analysis, which however doesn't concern my purposes.

@danfunk I'd love to help enhancing SpiffWorkflow!
I will have to look deeper in the current working and understand the full extent of changing the basic implementation.
For now, you can surely take a look at the current state, but those are really slimmed versions as I only implemented the functions I require (a pull request would be nothing but a farce):
workflow.txt
task.txt

Additionally, as I found a solution for our purposes, it is no longer part of my job and I will have to do it in my currently sparse free time. Hence, please don't worry if some months pass, I will get back to you as soon as I have questions or I have a product with the proposed working capable of handling the tests :)

@danfunk danfunk closed this as completed Oct 4, 2021
@calexh-sar
Copy link

@IAsmodai we have done some recent refactoring that might have helped with your situation. Would be interested to know if you see any improvement.

@IAsmodai
Copy link
Author

IAsmodai commented Feb 8, 2022

Hi @calexh-sar,

thanks for the update!
Unfortunately, bigger bpmns still crush my memory. As far as I can see, the problem lies within the workflow creating multiple task instances for single task_specs. The task_mapping (new?) is really helpful for looking into this. Using more basic bpmns with only 60 BPMN-elements, there are up to 270 task instances per task_spec.

I still did not find the time to look into expanding my proposed way of creating workflows and testing it, but will get back to you when I did find the time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants