-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
redesign/fix flux_reduce_t handle #298
Conversation
By the way, as I go through the |
Thanks, sorry I noticed that as well and was fairly surprised that this was broken! I've been experimenting today with a bit different interface that is based on messages as "items", and transparently installs a request handler so that passing messages upstream can happen behind the scenes. Still not quite working but this is what the header file looks like:
|
Interesting idioms! So, this should allow flux_reduce_t users not to have to worry about installing communication callbacks and etc -- hiding that level of details from the users? Now that I think about this, making reduction message-passing-style can also allow you to define the reduction function even outside the calling context... Is this intended? One concern is, of course, requiring the reduction function message-passing-style (only) may incur higher overhead than necessary when in particular it is used within the calling context. In that case, passing IN/OUT of reduction as function arguments would presumably be more efficient. BTW, I am implementing something like this in my collective named reduction on top of the current flux_red_t support. I hid communication details and simplified reduction function callback signature. As a module, the named reduction installs its own control reduction callback which then ends up calling the user-provided (well now hardwared) reduction function. It takes two items (left-hand side and right-hand side) as the inputs and return the reduced item to the left-hand side. I also hid the sink function from the user by this module's control sink callback automatically takes care of this. This probably was possible because the reduction of this style is more or less deterministic. But base reduction support like this, you probably won't be able to rid the need for the sink function... BTW, if you expect significant churns on your core reduction support, I am happy to hold my investigation a bit until you are satisfied with revamping the core reduction support. |
Thanks for having a look @dongahn. I could take a step back and build on this PR adding a manual page and some tests, then after we've got that and your stuff in, take a look again (together) at a new interface? |
I think my plan will be to back off the above message oriented version, and return to something closer to what was there originally, with the following changes:
|
9f7ec3b
to
68736f7
Compare
This works for me as well. I will continue to look at the name reduction support then, and yes these experienced can be combined to generate a new interface! |
The changes look good to me. Adjusting the named reduction to that changes should be straightforward. A question what is the "register callbacks at creation" support though? Is this to hide communication needed for reduction from the users?
In any case, I think hiding communication details for reduction is an excellent direction to pursue. |
I've backed out the changes that hide the communications details from the user for now, but in the recent set of changes, there is a new callback called "forward" which, if defined, is called during a flush to forward items upstream in the TBON. "sink" used to do double duty to consume items on rank 0 and forward them on rank > 0. I thought this way made it a bit clearer. What I have now is pretty easy to port to from the old interface. I've converted modules/live and modules/mon with no trouble. I'll have a manual page shortly - once that's in it would be a good time to ask you to review briefly to make sure I'm on the right track. |
Cool! Thanks @garlick |
OK, just forced a push with the following changes:
Still todo:
|
@garlick: I was wondering about the uniformed timeout issue as well. Because the timer won't start exactly, timeout could be naturally staggered, but still I thought this could be improved. Can I ask what you have in mind as a solution? I was thinking along the line of providing a mechanism to set the timeout value as some linear function of the TBON level. (e.g., 1/level * value) and/or to start the timer when the module gets the first input remotely. |
BTW, just to clarify, do you want to merge this PR once you are done with the unit tests or do you consider this still a PR for review only? IMHO this should be considered to be merged once all is said and done. In any case, if this is to be merged, I will take a look at the PR more closely. |
Yeah, I was pondering two ideas:
Maybe both of these could be implemented, say as FLUX_REDUCE_TIMEDFLUSH and FLUX_REDUCE_ADAPTFLUSH? Ideally I'll complete this sometime today and ask for review. For now, I mainly was asking for a (cursory) review of the API and user semantics described in the man page. |
Also:
|
Just rebased on current master with the following changes:
Still todo - unit test. |
f87a4d2
to
cafbbb5
Compare
I think this is the first iteration I'd call a merge candidate. Added unit tests that use the loop connector, and cover the no-flush-policy and HWMFLUSH policy pretty well. The TIMEDFLUSH needs coverage but maybe I could add that after my week off? The reduce.c stuff was really broken before, and now it is (actually!) working, with improved API, and with documentation. The "mon" module and In case it wasn't too clear from my previous comment, I changed the signature of a public API function A couple of memory leak fixes discussed in #161 were tossed in here as well. I will keep an eye on this tomorrow; after that I will not be able to respond until monday Aug 10. |
Just rebased on current master, updated dictionary, fixed a man page typo, and squashed some commits down further. |
Would you actually be willing to split the unrelated leak fixes and |
Sure will do that. |
Both |
Sorry -- reading top to bottom. Perhaps you already have this with |
OK, here's a rebase on current master plus the following changes:
|
@dongahn, any feedback on the recent changes? I was just thinking maybe the I think what's left is a clear improvement over the reduce.[ch] in the tree now and be merged once @dongahn feels OK with it. |
This looks good to me, a great improvement. I think I would be inclined to merge, if @dongahn gives an ACK, since this is a clear improvement over what is in-tree already. One question, does the |
I think the loop coverage is probably sufficient for now, although I think what you're proposing would be complimentary to the unit testy stuff in the loop test. Maybe a TBD? I could open an issue on that. |
Ok, then I'm inclined to merge if I don't hear any objections |
Sorry folks, I have been very busy with guests and etc. I will try to get to this this afternoon. |
Yes, this looks great to me! I just have one comment -- which I should have raised before, but this just dawned on me. For Perhaps, we can provide an option to set the timeout as (H-h) * timeout instead of the current formula? Come to think of it, though, as we are using linear scaling, either formula can have opposite issues as the tree height gets high. If the user uses a constant timeout regardless of the scale, the passed timeout can become too small at the last tree level with the current formula, and with the inverse formula it can become too large at the root. A better practice may be for the users themselves to adjust the timeout value in accordance with the tree-height parameters, H and h. We can just document this (i.e., encouraging the users to use a height-aware timeout scheme) in the |
Thanks @dongahn! I did provide a getter/setter for the timeout so the user can override the calculated timeout at each level with their own. Also, in the man page for I think the next step is to make timeouts adaptive as in issue #307 but my feeling is that can wait for another time as we have other priorities right now. |
@garlick, want to quickly rebase on top of current master and then I'll push the button? (Just to keep linear history) |
Instead of returning the (now fairly useless) boolean 'treeroot', return the TBON branching factor under "arity".
Return "arity" instead of treeroot. Change type of "rank" and "size" to uint32_t.
It turns out that starting and stopping a timer watcher doesn't reset the timeout back to the original. This is needed for timeouts that need to be started and stopped as is needed in reduce.c
Change flux_red_t to flux_reduce_t and make it refer to an incomplete struct rather than a pointer to one (RFC 7). Rename FLUX_RED_* flags to FLUX_REDUCE_*. Instead of individual setters for reduction ops, create a structure of function pointers that is passed into flux_reduce_create(). Make reduce and sink ops have the same prototype. Add destroy op so that items can be disposed of if sink is not defined. Add forward op. Sink used to do double duty as sink and forward and was always implemented as a conditional with two branches. Ignore "forward" on rank 0, and "sink" on rank != 0. Drop flux_reduce_flush() from public API. Drop flux_redstack_t type and add functions that operate directly on the flux_reduce_t type: flux_reduce_push(), flux_reduce_pop(). Added itemweight op that allows an item to be interrogated as to how many original items it represents. If using the HWM flush policy, this is required (or EINVAL). Only call reduce if >1 items are queued Change timeout to seconds and make the type a double. Stagger the timeouts based on the level in the tree, scaled down at the leaves, full timeout at the root. Since there may be multiple items to reduce at the leaves, avoid setting TIMEDFLUSH timer there to zero. Use the tree height plus one to calculate it. Ensure that the reactor timer is reset between calls. Add flux_reduce_opt_get/set() for accessing - learned high water mark - calculated timeout values - unweighted item count - weighted item count
The live module uses flux_reduce_t to store nodeset into the kvs at the root. Update for the new API.
The loop test is able to exercise all the flush policies without actually instantiating a flux session. It does this by creating a single flux_reduce_t queue, putting items in it, and testing which callbacks are made. For TIMEDWAIT policy, it starts a reactor loop so that timers can fire. Edge cases such as tardy items from earlier batches are covered.
Create a somewhat detailed manual page for flux_reduce_create().
Add a man page for flux_info() that covers the new arguments. Create an example (included in man page) that is compiled as part of make check. The example demonstrates how to calculate tree height from info returned by flux_info().
Add a few words that came up after flux_reduce_create(3) and flux_info(3) were added.
OK, done. I did some minor editing of commit messages also. |
redesign/fix flux_reduce_t handle
This PR is preliminary, not for merging. It's posted as a PR mainly to keep @dongahn up to date with what I'm doing to stuff his work depends on. (He should feel free to redirect me if I am making life harder rather than easier):
flux_red_t
toflux_reduce_t
and make opaque type conformant to flux styleflux_reduce_
flux_redstack_t
and add a standaloneflux_list_t
class instead.flux-mon
command andmon
module (update to modern interfaces)flux_reduce_t
names.See comment at the top of
src/modules/mon/mon.c
for how to demo the mon module/program.Coming soon: man page for reduce functions.