modules/kvs: Fix cut & paste error #969

chu11 · 2017-02-03T19:16:36Z

Initialize check watcher with flux_check_watcher_create() instead
of flux_prepare_watcher_create().

Initialize check watcher with flux_check_watcher_create() instead of flux_prepare_watcher_create().

chu11 · 2017-02-03T19:25:18Z

Saw this while looking at #813 , seems like a cut & paste error?

garlick · 2017-02-03T19:30:25Z

I take it that means the KVS's check watcher was being executed at prepare time. Ugh, I guess that would mean every time the KVS wakes up to do anything, it enters the commit handler at prepare time. Then when there was commit work to do, it also enters the commit handler at check time.

Potentially a very good catch, if I understand!

coveralls · 2017-02-03T19:35:39Z

Coverage increased (+0.01%) to 76.175% when pulling 5e5ee99 on chu11:kvswatchertypo into 2163e13 on flux-framework:master.

codecov-io · 2017-02-03T19:36:47Z

Codecov Report

Merging #969 into master will increase coverage by 0.01%.

@@            Coverage Diff            @@
##           master    #969      +/-   ##
=========================================
+ Coverage   75.89%   75.9%   +0.01%     
=========================================
  Files         152     152              
  Lines       25937   25937              
=========================================
+ Hits        19684   19687       +3     
+ Misses       6253    6250       -3

Impacted Files	Coverage Δ
src/modules/kvs/kvs.c	`80.09% <100%> (+0.12%)`	✅
src/common/libflux/message.c	`83.57% <ø> (ø)`	✅
src/common/libflux/handle.c	`85.67% <ø> (+0.26%)`	✅
src/common/libutil/veb.c	`98.86% <ø> (+0.56%)`	✅

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2163e13...5e5ee99. Read the comment docs.

garlick · 2017-02-03T21:26:21Z

Tests are passing so I say this one goes in! Thanks @chu11 for finding a nasty one.

chu11 · 2017-02-03T21:56:51Z

Ugh, I guess that would mean every time the KVS wakes up to do anything, it enters the commit handler at prepare time.

Wouldn't it be that it enters the commit handler after the KVS wakes up to do anything? i.e. after the reactor has processed recently incoming KVS messages and starts the next event-loop cycle, at which the prepare callback is called?

Then when there was commit work to do, it also enters the commit handler at check time.

Not sure what you mean by this, sort of conflicts with the prior sentence?

Actually, I'm a little confused now looking at the code again. From my understanding, in the primary event loop (via flux_reactor_run/ev_run) prepare callbacks are invoked before waiting for events. Check callbacks are invoked after waiting for events, but before event callbacks are invoked (assuming all equal priority, which I think that's what the reactor code is doing).

Since the event callbacks would process an actual kvs commit message, the check callback wouldn't accomplish anything until the next time through the event loop. So perhaps we want this to be a prepare watcher so it gets executed a little bit earlier? Renaming the functions would help to clear this up (I saw the "check" variables names and assumed it was supposed to be a check watcher).

Hmm, if the primary commit handler is a "prepare" watcher, I suppose we don't need both prepare watchers.

chu11 · 2017-02-03T21:59:45Z

Oops, my response was sitting here when I went to lunch, then I came back and clicked comment. Didn't know you already merged :-) I should say that I think it can go either way. What I said above may only be a minor possible optimization.

garlick · 2017-02-03T22:54:15Z

I may have misspoken and/or been unclear above. We do need both a prepare and a check watcher.

The way it is supposed to work is

prepare watcher (executing at "end" of event loop, just before blocking) checks if a commit is ready and if so starts idle watcher
if idle watcher is enabled, event loop immediately unblocks and starts processing events again
check watcher (executing after unblocking) processes the commit and stops idle watcher

This prepare/check/idle pattern is described in the libev manual.

I think what you are proposing is to do the commit handling in one prepare watcher, which is effectively what we had going on with "check" initialized as a prepare watcher, so it clearly works. It seems like it doesn't really matter whether the commit is handled before or after the unblocking but I think I'm forgetting something important (pertaining to keeping the event loop responsive to other events) and don't have time to delve in right now.

Maybe an experiment could shed some light?

grondo · 2017-02-03T23:07:00Z

Did this have something to do with natural "commit batching" that was tackled as part of an effort in increasing job throughput awhile back?

…

On Fri, Feb 3, 2017 at 2:54 PM, Jim Garlick ***@***.***> wrote: I may have misspoken and/or been unclear above. We do need both a prepare and a check watcher. The way it is supposed to work is 1. prepare watcher (executing at "end" of event loop, just before blocking) checks if a commit is ready and if so starts idle watcher 2. if idle watcher is enabled, event loop immediately unblocks and starts processing events again 3. check watcher (executing after unblocking) processes the commit and stops idle watcher This prepare/check/idle pattern is described in the libev manual <http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che> . I think what you are proposing is to do the commit handling in one prepare watcher, which is effectively what we had going on with "check" initialized as a prepare watcher, so it clearly works. It seems like it doesn't really matter whether the commit is handled before or after the unblocking but I think I'm forgetting something important (pertaining to keeping the event loop responsive to other events) and don't have time to delve in right now. Maybe an experiment could shed some light? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#969 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAtSUsf-PVdPrEiYznwZEoi6C-iHOMNyks5rY7AYgaJpZM4L2s2f> .

garlick · 2017-02-03T23:46:24Z

Related, since the code was restructured in this way to make that happen. I think the question is just whether it makes any difference to finalize the commit in a prepare versus a check watcher. It would be a little simpler to do it the way @chu11 is proposing and since (due to error) that's how it was working anyway when throughput was increased, maybe it's OK?

In fact now that it's been "fixed" I wonder if that's affected performance at all? That might be an experiment to try with the soak test or something.

chu11 · 2017-02-03T23:49:36Z

@garlick I think we're on the same page but perhaps looking at it differently. Here's the loop logic from libev's documentation about how ev_run() works. From my perspective I viewed the prepare function called at the beginning of the loop and not the end.

So a kvs message comes in and is noticed at [A] and the callback handler for it is queued at [B]. The check watcher is called at [C] before the kvs message has been processed. Because of that, we need to go through one more loop iteration for the check watcher to complete the processing.

As an experiment I commented out all the other watchers and set the check watcher to be a prepare one and flux did pass the unit tests. So it does seem to work in principle.

   - Increment loop depth.
   - Reset the ev_break status.
   - Before the first iteration, call any pending watchers.
   LOOP:
   - If EVFLAG_FORKCHECK was used, check for a fork.
   - If a fork was detected (by any means), queue and call all fork watchers.
   - Queue and call all prepare watchers.
   - If ev_break was called, goto FINISH.
   - If we have been forked, detach and recreate the kernel state
     as to not disturb the other process.
   - Update the kernel state with all outstanding changes.
   - Update the "event loop time" (ev_now ()).
   - Calculate for how long to sleep or block, if at all
     (active idle watchers, EVRUN_NOWAIT or not having
     any active watchers at all will result in not sleeping). [D]
   - Sleep if the I/O and timer collect interval say so.
   - Increment loop iteration counter.
   - Block the process, waiting for any events. [A]
   - Queue all outstanding I/O (fd) events. [B]
   - Update the "event loop time" (ev_now ()), and do time jump adjustments.
   - Queue all expired timers.
   - Queue all expired periodics.
   - Queue all idle watchers with priority higher than that of pending events.
   - Queue all check watchers.
   - Call all queued watchers in reverse order (i.e. check watchers first). [C]
     Signals and child watchers are implemented as I/O watchers, and will
     be handled here by queueing them when their watcher gets executed. 
   - If ev_break has been called, or EVRUN_ONCE or EVRUN_NOWAIT
     were used, or there are no active watchers, goto FINISH, otherwise
     continue with step LOOP.
   FINISH:
   - Reset the ev_break status iff it was EVBREAK_ONE.
   - Decrement the loop depth.
   - Return.

garlick · 2017-02-04T00:00:52Z

Uh oh, that's not what I thought was happening. It seems like if prepare comes first then there would be a tendency to get blocked in the event loop before the commit happens... Since kvs subscribes to the heartbeat and is otherwise generally busy, maybe this wouldn't be noticed? Still, that's quite bad.

Almost seems like the test that was in prepare ought to be in the code called from message handlers (e.g. start idle process) and then check should finalize the commit as was originally designed?

I need some time to stare at the code before I say that's right and I'm about to head out, but it does seem like you may have uncovered a problem.

chu11 · 2017-02-04T00:14:32Z

@garlick On the next loop iteration the prepare callback will set the idle watcher because it notices that a kvs commit message came in earlier (on the ready list), so we're guaranteed not to block (per point [D] which I've added above).

So I think everything works per design, it's just a matter of whether the errored way I fixed happened to be faster/better. I'll start a new issue we can track this from there.

modules/kvs: Fix cut & paste error

5e5ee99

Initialize check watcher with flux_check_watcher_create() instead of flux_prepare_watcher_create().

garlick merged commit c44b98b into flux-framework:master Feb 3, 2017

chu11 mentioned this pull request Feb 4, 2017

Investigate if KVS prepare/check watchers should be redone #971

Closed

grondo mentioned this pull request Mar 28, 2017

0.7.0 Release Notes #1019

Closed

chu11 deleted the kvswatchertypo branch June 5, 2021 18:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

modules/kvs: Fix cut & paste error #969

modules/kvs: Fix cut & paste error #969

chu11 commented Feb 3, 2017

chu11 commented Feb 3, 2017

garlick commented Feb 3, 2017

coveralls commented Feb 3, 2017

codecov-io commented Feb 3, 2017 •

edited

garlick commented Feb 3, 2017

chu11 commented Feb 3, 2017 •

edited

chu11 commented Feb 3, 2017

garlick commented Feb 3, 2017

grondo commented Feb 3, 2017 via email

garlick commented Feb 3, 2017

chu11 commented Feb 3, 2017 •

edited

garlick commented Feb 4, 2017

chu11 commented Feb 4, 2017

modules/kvs: Fix cut & paste error #969

modules/kvs: Fix cut & paste error #969

Conversation

chu11 commented Feb 3, 2017

chu11 commented Feb 3, 2017

garlick commented Feb 3, 2017

coveralls commented Feb 3, 2017

codecov-io commented Feb 3, 2017 • edited

Codecov Report

garlick commented Feb 3, 2017

chu11 commented Feb 3, 2017 • edited

chu11 commented Feb 3, 2017

garlick commented Feb 3, 2017

grondo commented Feb 3, 2017 via email

garlick commented Feb 3, 2017

chu11 commented Feb 3, 2017 • edited

garlick commented Feb 4, 2017

chu11 commented Feb 4, 2017

codecov-io commented Feb 3, 2017 •

edited

chu11 commented Feb 3, 2017 •

edited

chu11 commented Feb 3, 2017 •

edited