Memory Usage #127
Comments
Hi Dan, we do "close" gateways eventually, but mostly quite far toward the end. Our parallel gateways open huge brackets that contain further nested brackets, and the subflows inside them can interact with subflows of other open brackets, so the concept of closing gateways and aligning the streams became obsolete, or at least really opaque. The workflow contains proprietary data, so I will have to censor all descriptions before I can upload it, but I'll be happy to provide it as soon as I've found a way! |
@IAsmodai I have had to edit a BPMN that may be similar to what you need to do. I opened it in the desktop version of the Camunda Modeler, copied the XML, pasted it into my text editor, did a Search & Replace, and then pasted it back into the Modeler. Your situation would likely be more complicated, but just in case. |
Finally back at work. @calexh-sar Thanks for the idea, I basically did this and regex-deleted all occurrences of name="[.*]" @danfunk model.txt contains a small part of our bpmn (I unfortunately cannot attach bpmns, but renaming the file should do the trick), which demonstrates the problem. Even this one, which only contains a fifth of the big model, exhausts my memory. Some explanation of why the bpmn follows unusual design paradigms: We are researching the flow of information in a collaborative setting (partners modeled in different lanes) where every piece of data is modeled as a task (so we can use SpiffWorkflow to analyse it). Every colored node represents some piece of information which, in some combination, flows into a white task representing an actual task. |
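The censoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script that was actually used: the function name `censor_names` and the sample element are hypothetical. Read literally, the quoted pattern `name="[.*]"` would match only a single `.` or `*` between the quotes, so the sketch assumes the intent was "any attribute value" and uses `[^"]*` instead.

```python
import re

# Assumption: every description lives in a plain name="..." attribute.
# [^"]* stops at the closing quote, so each attribute is matched on its own.
NAME_ATTR = re.compile(r'(\sname=")[^"]*(")')

def censor_names(bpmn_xml: str) -> str:
    """Blank out every name attribute while keeping ids and structure intact."""
    return NAME_ATTR.sub(r"\1\2", bpmn_xml)

# Hypothetical sample element, only for illustration.
sample = '<bpmn:task id="Task_1" name="Secret step"><bpmn:incoming>Flow_1</bpmn:incoming></bpmn:task>'
print(censor_names(sample))
```

Keeping an empty `name=""` rather than deleting the attribute is a design choice of this sketch; either variant leaves the ids and sequence flows untouched.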
@IAsmodai I did a bit of reformatting of a small bit of your .bpmn file to get a better picture of what was going on in there and found a few places where there may be problems caused by the diagram. I attached a callout to each instance with my potential concern. I suspect some may be due to the fact that you only sent part of the total diagram. Regardless, we discussed your situation in our scrum this morning and concluded that fixing any issues like this might be just the tip of the iceberg. Dan is going to respond as well with an update on the more technical side of that discussion. |
Foundational to SpiffWorkflow is its ability to look ahead and predict what tasks will likely happen next, and what tasks can be completed in parallel. The intention is to parse well-formed and logically grounded BPMN diagrams in order to drive a software application. We are targeting audiences that would use this to build online workflow systems that step people through complex choices and to allow passing control between different types of users. We are also working to support complex data pipelines where each task can manipulate data as it courses through the system. What you appear to be modeling aren't tasks, but the relationships between different pieces of data. I am not familiar with using BPMN in this way, and I'm not sure BPMN is the right tool for modeling relationships between different pieces of data. I would tend to model as tasks the things that happen /to/ data, while the data is passed into and then out of that task. Currently we parse the complete BPMN model, look out across all possible paths, and then allow you to progress through this with a constant understanding of what nodes are active and available at any point. If I understand correctly, what you are looking to do is parse the XML from this diagram and log what is happening in it, to make observations about the structure and what it means. I think you might be best served by writing your own XML parser that will run through this diagram and just spit out what is happening, making whatever observations are appropriate to your use case. |
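As a rough idea of what such a standalone parser could look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the file uses the standard BPMN 2.0 model namespace and that only `sequenceFlow` elements matter; the function name `read_flow_graph` and the sample diagram are hypothetical and not part of SpiffWorkflow.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard BPMN 2.0 model namespace; the prefix used in the file does not matter.
BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

def read_flow_graph(bpmn_xml: str):
    """Return (successors, node_ids) built from the sequenceFlow elements only."""
    root = ET.fromstring(bpmn_xml)
    successors = defaultdict(list)
    nodes = set()
    for flow in root.iterfind(".//bpmn:sequenceFlow", BPMN_NS):
        src, dst = flow.get("sourceRef"), flow.get("targetRef")
        successors[src].append(dst)
        nodes.update((src, dst))
    return dict(successors), nodes

# Hypothetical miniature diagram, only for illustration.
SAMPLE = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <bpmn:process id="P">
    <bpmn:startEvent id="Start"/>
    <bpmn:parallelGateway id="Split"/>
    <bpmn:task id="A"/>
    <bpmn:task id="B"/>
    <bpmn:sequenceFlow id="f1" sourceRef="Start" targetRef="Split"/>
    <bpmn:sequenceFlow id="f2" sourceRef="Split" targetRef="A"/>
    <bpmn:sequenceFlow id="f3" sourceRef="Split" targetRef="B"/>
  </bpmn:process>
</bpmn:definitions>"""

successors, nodes = read_flow_graph(SAMPLE)
for source, targets in successors.items():
    print(source, "->", ", ".join(targets))
```

Because the result is a plain adjacency mapping with one entry per node, its size grows with the number of nodes and flows in the file rather than with the number of paths through it.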
Hi guys, many thanks for the extensive support! @calexh-sar Unfortunately, due to our purpose with the graph, we are not able to make any changes to the bpmn, since it would then no longer be an adequate representation of reality. @danfunk Parsing the XML myself is exactly what I tried to avoid :D After some time, I finally found a way of modifying SpiffWorkflow such that it fits my purpose and isn't too RAM-consuming.
All I then have to modify is how the state of a task is updated. For example, when a task gets completed, it just tells all of its children that it got completed, and each child checks the states of all of its parents; a child is set to ready iff all of its parents are either completed or cancelled:
Note: _update_state only contains READY as a prestate of COMPLETED and CANCELLED, as I am not using LIKELY, MAYBE or WAITING, but it could be extended. As far as my purposes are concerned, I can still look ahead as much as I need. I don't know if this logic is viable for everything you are implementing in SpiffWorkflow, but I thought I'd leave it here as:
|
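The update rule described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the commenter's code and not SpiffWorkflow's API: the `Node` class, the `PENDING` state and all method names are hypothetical. It assumes one node per task spec, with each node holding references to its parents and children.

```python
from enum import Enum

class State(Enum):
    PENDING = "pending"      # hypothetical name for "not yet ready"
    READY = "ready"
    COMPLETED = "completed"
    CANCELLED = "cancelled"

class Node:
    """One instance per task spec; hypothetical, not a SpiffWorkflow class."""

    def __init__(self, name):
        self.name = name
        self.state = State.PENDING
        self.parents = []
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

    def _finish(self, new_state):
        # Only READY may move to COMPLETED or CANCELLED in this sketch.
        if self.state is not State.READY:
            raise ValueError(f"{self.name} is {self.state.name}, not READY")
        self.state = new_state
        for child in self.children:
            child.check_parents()

    def complete(self):
        self._finish(State.COMPLETED)

    def cancel(self):
        self._finish(State.CANCELLED)

    def check_parents(self):
        # A child becomes READY iff every parent is COMPLETED or CANCELLED.
        done = (State.COMPLETED, State.CANCELLED)
        if self.state is State.PENDING and all(p.state in done for p in self.parents):
            self.state = State.READY

# Diamond: start -> (a, b) -> join
start, a, b, join = Node("start"), Node("a"), Node("b"), Node("join")
start.add_child(a)
start.add_child(b)
a.add_child(join)
b.add_child(join)

start.state = State.READY
start.complete()   # a and b become READY
a.complete()       # join stays PENDING because b is not finished
b.cancel()         # join becomes READY
print(join.state.name)
```

Each task spec exists exactly once here, so memory grows with the number of nodes and flows, not with the number of distinct paths through the gateways.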
@IAsmodai - we fully empathize with the issue you are seeing. You are absolutely right that the tree structure is out of hand and needs to be refactored. I would love to see the kind of improvements you are touting. It would help a lot if you could submit this as a pull request. I wonder if any of the automated tests pass when you make this change? Is it possible to serialize and deserialize the workflow? Even if it were partially broken, a pull request could help me see the ramifications and we could work together on it. |
@IAsmodai to make sure I was clear in my comments: from what I am seeing, I suspect there are some errors in your BPMN that may be causing some, but definitely not all, of the issues. I understand you do not want to make any changes that would alter the "adequate representation of the reality", but I suspect some alterations are already happening in places, because incorrect BPMN is not executing the way you might expect it to. |
@calexh-sar I'm sorry, my last response didn't really give a holistic explanation of why the bpmn has to stay the way it is.
I don't want to interfere with the design as long as I can use it in Spiff. The unusual design aspects don't make any sense in BPMN terms, but they have to be there for the primary purpose. As far as SpiffWorkflow is concerned, parsing this bpmn is "working". Nodes with multiple outgoing flows, for example, were a more serious problem, which had to (and fortunately could) be corrected beforehand. The current model can be imported. The only known problem is that nodes without incoming flows aren't included in the workflow spec and can't be considered in any analysis, which however doesn't concern my purposes. @danfunk I'd love to help enhance SpiffWorkflow! However, since I found a solution for our purposes, it is no longer part of my job and I will have to do it in my currently sparse free time. Hence, please don't worry if some months pass; I will get back to you as soon as I have questions or a version of the proposed approach that can handle the tests :) |
@IAsmodai we have done some recent refactoring that might have helped with your situation. Would be interested to know if you see any improvement. |
Hi @calexh-sar, thanks for the update! I still haven't found the time to look into expanding and testing my proposed way of creating workflows, but I will get back to you once I do! |
Hi. Thanks very much for maintaining this amazing library!
I'm currently using SpiffWorkflow to generate logs for a somewhat big BPMN (~1.5 MB). Unfortunately, creating a Workflow from the spec uses way too much memory (exceeding 100 GB easily).
I figured that the problem lies within the gateways, since every node with multiple children creates several distinct paths throughout the workflow (whilst also creating multiple task instances of the same task spec), creating a total of more than 10 billion task instances in just one Workflow.
Since I haven't fully understood the inner workings of SpiffWorkflow: is it necessary to create those distinct paths from a gateway up until the end, or is there a way to prevent this while maintaining the main functionality?
Thank you very much!
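To illustrate why the instance count grows so quickly, the following sketch counts how many task instances result if every path through a graph gets its own copy of each downstream node, and compares that with the number of nodes. It is a simplified model of the effect described above, not a measurement of SpiffWorkflow; `diamond_chain` and `tree_size` are hypothetical helper names, and the chain of two-way splits is an assumed toy topology.

```python
from functools import lru_cache

def diamond_chain(n):
    """DAG of n consecutive split/join diamonds: s0 -> (a0, b0) -> s1 -> ... -> sn."""
    graph = {}
    for i in range(n):
        graph[f"s{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"s{i + 1}"]
        graph[f"b{i}"] = [f"s{i + 1}"]
    graph[f"s{n}"] = []
    return graph

def tree_size(graph, root):
    """Number of task instances if every path from root gets its own copies."""
    @lru_cache(maxsize=None)
    def size(node):
        return 1 + sum(size(child) for child in graph[node])
    return size(root)

for n in (1, 2, 10, 30):
    graph = diamond_chain(n)
    print(n, "diamonds:", len(graph), "nodes,", tree_size(graph, "s0"), "tree instances")
```

In this toy model the node count grows linearly (3n + 1), while the expanded tree roughly doubles with every additional diamond (4 * 2**n - 3 instances), so a few dozen sequential splits are already enough to reach billions of instances.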