
limiter_node fails to decrement/open with continue_msg yet works with int #419

Closed
diablodale opened this issue May 27, 2021 · 12 comments

@diablodale

Using the default limiter_node, sending continue_msg() via an edge to the limiter's decrementer() does not decrement and reopen the limiter node. It stays closed and no messages pass.

I made a single code change, switching to limiter_node<T, int>, and sent (int)1 via the edges. Now I have no issues; the limiter_node works as I expect. 🤔

Related to discussion in #342 (comment)

Setup

  • Windows 10 64-bit, Debug and Release compiles
  • Microsoft VS2019 v16.8.4 through v16.10.0 Community
  • oneAPI TBB v2021.1.1 and v2021.2.0

Repro

I acknowledge a full compilable code example is best. My bandwidth is limited at the moment. Until something else exists...

  1. Setup as above
  2. Create graph with various node types including multifunction nodes, and put a limiter node near the entrance of the flowgraph.
  3. create edges connecting the nodes and on the multifunction nodes some edges going back to the limiter's decrementer with associated try_put() in their lambda.
  4. Set limiter_node<myClass>(flowgraph, 2)
  5. Set some nodes' concurrency to tbb::flow::serial and others to 3
  6. Run your graph

Result

About 2 or 3 messages will traverse the graph. Then no more.

Expected

Never ending flow of messages.

Workaround

The following workaround has been used for 3 months with 100% success, across multiple VS versions and both tbb v2021.1.1 and v2021.2.0.

  1. Change to limiter_node<myClass, int>(flowgraph, 2)
  2. Update the multifunction_node output tuples, and change all try_put(continue_msg()) calls to try_put(1)

Notes

I continue to be suspicious about the TBB threshold_regulator code in _flow_graph_body_impl. The specialization for continue_msg has execute() but doesn't have try_put_task() with a decrement like its sibling specialization. Missing code? Race condition?

@aleksei-fedotov
Contributor

aleksei-fedotov commented May 31, 2021

Hi @diablodale. Thank you for opening up a separate issue to track this!

The specialization for continue_msg inherits continue_receiver, which has try_put_task() that calls the execute() method, which, in turn, calls limiter_node<>::decrement_counter() so that it adjusts the gate accordingly.

<..> on the multifunction nodes some edges going back to the limiter's decrementer with associated try_put() in their lambda.

Please make sure multifunction nodes indeed send messages to the output ports that are connected to the limiter_node's decrementer.

I agree that this looks suspicious, since with the integral specialization of the limiter_node everything works on your side.
Could you please provide a reproducer then?

@diablodale
Author

I have a reproducer with bulk private code removed. The framework/flow is still there which I would prefer not be public.
Do you have a process by which I can provide the reproducer privately to a verified Intel employee and Intel to keep it confidential ala NDA?

@aleksei-fedotov
Contributor

I have an idea of what might be going wrong: have you considered the buffering policy of the limiter_node's predecessor(s) in your graph? Perhaps some of the messages rejected by the limiter_node are simply lost?

Regarding the reproducer, you can use the official inteltbbdevelopers@intel.com address or send it directly to me; my email address should be listed on my GitHub account page. Please let me know if it is not. Whichever you choose, I am sure your IP will be treated appropriately, since we have strict guidelines and trainings about working with third-party code so as not to leak its IP, be it customer- or company-owned.

@diablodale
Author

Intel asks if I considered buffering policy?

Yes. I want to drop messages. My scenario is a graph that processes sensor frame data. If n frames are still within the graph, then the limiter should drop incoming frames. It is more important to have current frames, dropping the excess until the graph completes the frames already in flight.

While further reducing, I reached a point where I could generate a trace for Flow Graph Analyzer. The graph chart looks correct. But the graph rule-check is reporting errors that I do not see in the code, and the node inspector is reporting incorrect types on outlets even though they are clearly correct in the template params. These seem to be bugs in the Flow Graph collector and analyzer; I won't discuss them further in this bug thread.

I'm sending an email to inteltbbdevelopers@intel.com with a ZIP that contains this setup, cmake, and source files.

Setup

  • Intel i7-10875h (8 cores/16 threads)
  • Windows 10 64-bit
  • Microsoft VS2019 v16.10.0 Community
  • oneAPI TBB v2021.2.0, installed from the oneAPI web installer. Please note: your installer, the folders the installer creates, GitHub, and the documentation all call this 2021.2.0. That is inconsistent with the PACKAGE_VERSION 2021.2.2.0 inside C:\Program Files (x86)\Intel\oneAPI\tbb\2021.2.0\lib\cmake\tbb\TBBConfigVersion.cmake that the installer writes to the drive. I recommend future versions of TBB correct this inconsistency so that find_package(TBB x.y.z) works correctly. For example, given this existing version, TBBConfigVersion.cmake should contain PACKAGE_VERSION 2021.2.0 or PACKAGE_VERSION 2021.2.0.0.

Repro steps

  1. Setup as above. I recommend you do it on an 8-core/16-thread CPU so that we have the same thread pools
  2. cmake config
  3. copy tbb12.* and tbb12_debug.* to your build location
  4. At the top of pipeline_oneapi_tbb.h, set LIMIT_WITH_INT to your desired value. Start with the default.
  5. cmake build
  6. run onetbb-419.exe
  7. Wait 10 seconds for the exe to stop itself.

Good Result 🙂

Output showing a cascade of capture -> put -> get. These cascades should continue for the full 10 seconds.

Bad Result 😢

Output showing a cascade of capture -> put, but fewer than 20 get.

Notes

Run the above repro steps with both the default LIMIT_WITH_INT 1 and LIMIT_WITH_INT 0. This controls whether the limiter_node uses an int or a continue_msg. The behaviors are demonstrably different! 🤔

LIMIT_WITH_INT 1 often produces the good result. However, with this simplified code, I now occasionally experience the bad result; in that bad scenario, only one get occurs.

LIMIT_WITH_INT 0 produces the bad result. In this bad result, there are usually 1-25 gets and then no more until the program ends.

These behaviors make me suspect a race condition somewhere. And the simplified repro code exposes more bad scenarios than my real code, with its heavier/longer computation in the nodes, does.

@diablodale
Author

Hi. Checking back here to confirm that you received my email with the repro code. I sent it on the same day as above, 2 June 2021.

@aleksei-fedotov
Contributor

Yes, we received it and did the analysis. We found two issues:

  1. The documentation of the limiter_node is vague about the behavior of its specialization with continue_msg as the decrementer's template parameter. We are working on improving that part in this PR.
  2. The calls to the graph::reset() method are made in a concurrent environment, which clears all the buffers and hence leaves the graph in an uncertain condition with respect to which messages have been processed by which node and which have not and will not be. Therefore, graph::reset() should be called only in a serial part of the algorithm.

@diablodale
Author

diablodale commented Jun 18, 2021

In the sample code I provided, there is only one call to graph::reset(). It is in the Pipeline() constructor, which is naturally serial itself. The constructor is triggered at the start of main(), which is naturally also serial.

And am I right to assume that your (2) above is alluding to the fact that within that constructor I call taskArena.enqueue( []{ flowgraph.reset(); } )? That happens before any other tasks or messages are created. Could there be a race condition if the first line of that enqueued task is somehow run out of sequence, due to new tasks being created by code that runs after this constructor?

The intention of the reset() is to (re)attach the graph to a specific task arena (for independent threads) and a specific task group (for graph shutdown). I'm not aware of any other constructor arguments or APIs that will associate a flow graph with a specific arena. This is why I call taskArena.enqueue( []{ flowgraph.reset(); } ).

A long time ago, I looked at task_arena::execute(). But the documentation states that it first tries to execute directly, and if that fails it "wraps the functor into a task, enqueues it into the arena...". This behavior was indeterminate, and therefore I chose task_arena::enqueue() so that I always get the same behavior.

If I can guarantee execute() never returns until after the reset(), then I could split the lambda and run just the reset() in execute() and run the rest of the lambda in an enqueue(). But if execute() uses its internal enqueue() code, then we are back to the same bug scenario.

Is there some other way to achieve the same goal? To have a unique arena, and to create a flowgraph associated with it?

I understand what you are describing in (1). That approach would add unneeded complexity in my solution. So I will continue to use an integer decrementer.

@aleksei-fedotov
Contributor

The enqueuing happens only if all available slots are occupied by other threads. Since task_arena::execute() is called in a serial environment here, you can be sure it won't fall back to enqueuing.

@diablodale
Author

diablodale commented Jun 18, 2021

ok. I read the (1) doc update in the PR. That looks good.

I think there is value in adding a sentence or two, somewhere, describing the one and only way to assign a flow graph to a specific arena: the reset() approach must be done within task_arena::execute(). Since this is the only reliable way, I believe clear documentation will help those like me in the future. Since it involves both the arena and the flow graph, it might be best placed on the page for one or the other of those classes.

@aleksei-fedotov
Contributor

Yeah, this information should definitely be written somewhere. I'm not sure about the specification, but in the docs for sure.

BTW, what do you think about letting the graph constructor accept an arbitrary task arena to attach its execution to? Would that be useful and less error-prone for you?

@diablodale
Author

Perhaps "Flow Graph Tips and Tricks" section?

I also was thinking this one-and-only-way seemed an API itself rather than a documented strict set of API calls. 😏

I (re)read some articles, now having somewhat more TBB experience, and I can imagine three approaches I would explore to achieve my goal. I can also imagine the generic desire of "I want to run my flow graph in an explicit grouping of arena, task group, or isolation".

  1. Explicit task_arena. This is the brute-force approach I currently use. The docs state this is the low-level scheduler API. Yes, it would be helpful to have a task_arena param on a graph constructor. I would use graph(task_group_context, task_arena)
  2. Explicit task_group. This is documented as the high-level scheduler API. Makes me wonder if I might be able to use this.
  3. this_task_arena::isolate(). This is documented as an isolation approach to use for correctness vs. performance (via task_arena).

I do not have enough experience to know if 2. and 3. make sense with a flow graph. But it is something I would explore given time and an API.

My specific scenario is to have some TLS isolation; preventing accidents, not malicious attacks. I use OpenCL via the OpenCV SDK. The OpenCV SDK has an API to choose the "current OpenCL gpu"; the implementation stores it in TLS. Now imagine my code and some 3rd-party code that both use OpenCV, and perhaps also TBB. My code calls the OpenCV API in the task_scheduler_observer to set the "current OpenCL gpu" to my Intel gpu. Sometime later the 3rd party calls the same OpenCV API to set the "current OpenCL gpu" to my Nvidia gpu. We live in a heterogeneous world. 🙂

The OpenCV API to select the gpu is expensive... and it carries a tremendous amount of state. It is important that the tasks in my flow graph don't have their gpu changed. When threads run tasks in my flow graph, it must be guaranteed that the OpenCV gpu choice stored in TLS was not accidentally changed by a 3rd party calling the same OpenCV APIs. I used the brute-force approach of a dedicated task_arena so that I am guaranteed separate threads, and therefore no TLS data can possibly be shared between my code and any 3rd party.

@isaevil
Contributor

isaevil commented Oct 6, 2022

@diablodale please take a look at this documentation page: https://oneapi-src.github.io/oneTBB/main/tbb_userguide/attach_flow_graph_to_arena.html. This page was added as a result of #785.

We have also created #932 to track the request for a usability improvement of flow_graph-to-task_arena attachment.
I am closing this issue. Feel free to reopen if any questions remain.

@isaevil isaevil closed this as completed Oct 6, 2022