Even by adding bcast to this struct, there is no way for the user to configure it, except by configuring service colls themselves. In my opinion, the features you suggest deal with service collectives in general and should be implemented at core ucc level.
I'm currently facing a very similar problem in the collective TL I'm working on. I want to try a design with a low-latency barrier synchronization across the team, so that all collective participants have posted their buffers before the senders start sending. Since this synchronization will happen in the collective progress function, it has to be fast in terms of latency.
I wonder whether it is possible to use the service collectives layer from src/core/ucc_service_coll.h (assuming that I also add a barrier API there), so that I don't need to rewrite my own barrier in my TL and can just use some other TL for that?
E.g., I would imagine that during the team initialization phase, a service fast-path context/team is configured so that it is backed by the UCT TL with a fast-path UCT transport (e.g., IB RC) and a specific barrier algorithm (e.g., recursive doubling).
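To make the recursive-doubling idea concrete, here is a minimal sketch of the peer schedule such a barrier would follow, plus a single-process simulation that checks every rank has transitively heard from every other rank after log2(size) rounds. All names here (`rd_barrier_rounds`, `rd_barrier_peer`, `rd_barrier_simulate`) are hypothetical and not part of UCC or UCT, and the sketch assumes a power-of-two team size of at most 64; a real implementation would need extra steps for other sizes and would exchange actual messages over the transport.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of UCC: number of exchange rounds for a
 * team of `size` ranks. Assumes size is a power of two, at most 64. */
static int rd_barrier_rounds(int size)
{
    int rounds = 0;

    assert(size > 0 && size <= 64 && (size & (size - 1)) == 0);
    while ((1 << rounds) < size) {
        rounds++;
    }
    return rounds;
}

/* Peer that `rank` exchanges a zero-byte message with in `round`. */
static int rd_barrier_peer(int rank, int round)
{
    return rank ^ (1 << round);
}

/* Single-process simulation. seen[r] is a bitmask of the ranks whose
 * arrival rank r has learned about so far. Returns 1 if, after all
 * rounds, every rank knows that every rank has arrived, 0 otherwise. */
static int rd_barrier_simulate(int size)
{
    uint64_t seen[64], next[64];
    uint64_t all    = (size == 64) ? ~0ULL : ((1ULL << size) - 1);
    int      rounds = rd_barrier_rounds(size);
    int      r, k;

    for (r = 0; r < size; r++) {
        seen[r] = 1ULL << r; /* initially each rank knows only itself */
    }
    for (k = 0; k < rounds; k++) {
        for (r = 0; r < size; r++) {
            /* pairwise exchange: merge what the peer knew before this round */
            next[r] = seen[r] | seen[rd_barrier_peer(r, k)];
        }
        for (r = 0; r < size; r++) {
            seen[r] = next[r];
        }
    }
    for (r = 0; r < size; r++) {
        if (seen[r] != all) {
            return 0;
        }
    }
    return 1;
}
```

In a progress-function setting, each round would presumably be a non-blocking send/receive pair with `rd_barrier_peer(rank, round)`, and the progress call would advance to the next round only once both have completed, so the latency is roughly log2(size) small-message round trips.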
So far I've only skimmed the current implementation of the service collectives, and from what I've (mis?)understood, they are designed and configured internally for slow-path OOB communication, e.g., connection management within a newly created team.
I would appreciate any advice and ideas here! In case this functionality is somehow missing I would be happy to discuss/contribute 😊
The comment quoted at the top of this issue was made by @samnordmann in the discussion of PR #900 (comment).