modules/kvs: Fix cut & paste error #969
Initialize check watcher with flux_check_watcher_create() instead of flux_prepare_watcher_create().
Saw this while looking at #813; seems like a cut & paste error? |
I take it that means the KVS's check watcher was being executed at prepare time. Ugh, I guess that would mean every time the KVS wakes up to do anything, it enters the commit handler at prepare time. Then when there was commit work to do, it also enters the commit handler at check time. Potentially a very good catch, if I understand! |
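As a rough illustration of the difference being discussed, here is a minimal sketch in C. It is a simplified model only, not flux-core or libev code: `enum phase`, `struct trace`, `trace_add()`, and `run_iteration()` are hypothetical names, and the loop is reduced to three steps (prepare phase, poll, check phase). It records where the commit handler runs relative to the poll, depending on which phase it was registered for.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not the flux-core or libev API. */
enum phase { PHASE_PREPARE, PHASE_CHECK };

struct trace {
    char buf[128];      /* space-separated list of steps, in order */
};

/* Append one step name to the trace. */
static void trace_add (struct trace *t, const char *step)
{
    assert (t != NULL && step != NULL);
    if (t->buf[0] != '\0')
        strncat (t->buf, " ", sizeof (t->buf) - strlen (t->buf) - 1);
    strncat (t->buf, step, sizeof (t->buf) - strlen (t->buf) - 1);
}

/* Run one simplified loop iteration and record the order of steps.
 * commit_phase is the phase the commit handler was registered for:
 * PHASE_PREPARE models the cut & paste error, PHASE_CHECK the fix. */
static void run_iteration (enum phase commit_phase, struct trace *t)
{
    t->buf[0] = '\0';
    if (commit_phase == PHASE_PREPARE)
        trace_add (t, "commit");    /* runs before the loop polls */
    trace_add (t, "poll");
    if (commit_phase == PHASE_CHECK)
        trace_add (t, "commit");    /* runs after the loop polls */
}
```

Calling `run_iteration (PHASE_PREPARE, &t)` leaves the commit step ahead of the poll in `t.buf`, while `PHASE_CHECK` places it after the poll, which is the behavior the name "check" suggests.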
Codecov Report
@@            Coverage Diff            @@
##           master    #969     +/-    ##
=========================================
+ Coverage   75.89%   75.9%   +0.01%
=========================================
  Files         152     152
  Lines       25937   25937
=========================================
+ Hits        19684   19687      +3
+ Misses       6253    6250      -3
|
Tests are passing so I say this one goes in! Thanks @chu11 for finding a nasty one. |
Wouldn't it be that it enters the commit handler after the KVS wakes up to do anything? i.e. after the reactor has processed recently incoming KVS messages and starts the next event-loop cycle, at which point the prepare callback is called?
Not sure what you mean by this, sort of conflicts with the prior sentence? Actually, I'm a little confused now looking at the code again. From my understanding, in the primary event loop (via Since the event callbacks would process an actual kvs commit message, the check callback wouldn't accomplish anything until the next time through the event loop. So perhaps we want this to be a prepare watcher so it gets executed a little bit earlier? Renaming the functions would help to clear this up (I saw the "check" variable names and assumed it was supposed to be a check watcher). Hmm, if the primary commit handler is a "prepare" watcher, I suppose we don't need both prepare watchers. |
Oops, my response was sitting here when I went to lunch, then I came back and clicked comment. Didn't know you already merged :-) I should say that I think it can go either way. What I said above may only be a minor possible optimization. |
I may have misspoken and/or been unclear above. We do need both a prepare and a check watcher. The way it is supposed to work is
1. prepare watcher (executing at "end" of event loop, just before blocking) checks if a commit is ready and if so starts idle watcher
2. if idle watcher is enabled, event loop immediately unblocks and starts processing events again
3. check watcher (executing after unblocking) processes the commit and stops idle watcher
This prepare/check/idle pattern is described in the libev manual (http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che). I think what you are proposing is to do the commit handling in one prepare watcher, which is effectively what we had going on with "check" initialized as a prepare watcher, so it clearly works. It seems like it doesn't really matter whether the commit is handled before or after the unblocking but I think I'm forgetting something important (pertaining to keeping the event loop responsive to other events) and don't have time to delve in right now. Maybe an experiment could shed some light? |
Did this have something to do with the natural "commit batching" that was tackled a while back as part of an effort to increase job throughput?
On Fri, Feb 3, 2017 at 2:54 PM, Jim Garlick wrote:
> I may have misspoken and/or been unclear above. We do need both a prepare and a check watcher.
|
Related, since the code was restructured in this way to make that happen. I think the question is just whether it makes any difference to finalize the commit in a prepare versus a check watcher. It would be a little simpler to do it the way @chu11 is proposing and since (due to error) that's how it was working anyway when throughput was increased, maybe it's OK? In fact now that it's been "fixed" I wonder if that's affected performance at all? That might be an experiment to try with the soak test or something. |
@garlick I think we're on the same page but perhaps looking at it differently. Here's the loop logic from libev's documentation about how So a kvs message comes in and is noticed at As an experiment I commented out all the other watchers and set the check watcher to be a prepare one and flux did pass the unit tests. So it does seem to work in principle.
|
Uh oh, that's not what I thought was happening. It seems like if prepare comes first then there would be a tendency to get blocked in the event loop before the commit happens... Since kvs subscribes to the heartbeat and is otherwise generally busy, maybe this wouldn't be noticed? Still, that's quite bad. Almost seems like the test that was in prepare ought to be in the code called from message handlers (e.g. start idle process) and then check should finalize the commit as was originally designed? I need some time to stare at the code before I say that's right and I'm about to head out, but it does seem like you may have uncovered a problem. |
@garlick On the next loop iteration the prepare callback will set the idle watcher because it notices that a kvs commit message came in earlier (on the So I think everything works per design, it's just a matter of whether the erroneous way I fixed happened to be faster/better. I'll start a new issue so we can track this from there. |