Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVS more oom() fixes #1128

Merged
merged 6 commits into from Jul 28, 2017

Conversation

Projects
None yet
5 participants
@chu11
Copy link
Contributor

chu11 commented Jul 26, 2017

These are fixups for two remaining oom() paths left in the KVS. These were trickier and required some additional logical changes, so I wanted to put them in another PR.

In wait_runqueue(), the primary change that had to occur to return an error to the user is that either all of the elements would be copied off of the queue, or none of them. We can't have an error occurring in the middle while we remove elements off one queue and put them on another. There's no way to get back to what you had before. Performance wise, this is effectively a change from 2 loops to 3 loops.

There are several catastrophic-like scenarios in the main KVS code if wait_runqueue() fails. I chose not to assert() them. But maybe I should have. I went back and forth on it. Dunno what others might think.

In fence_merge() there was a similar issue of how to deal with an error in the middle. In fence_merge(), originally fence names & operations were modified while iterating through fences. So if an error occurred at some point, how do you get back to what was there before? The function has been modified to make copies, modify those copies, and if everything succeeds, then store the result. So there are some extra allocations now.

In commit_mgr_merge_ready_commits(), there was an issue of how to deal with a commit entry on a queue that was popped off, but what if you can't push it back on due to ENOMEM. Through use of the zlist iterators, I can just remove the pop & push altogether, which removes any chance of an oom(). I think using the iterators and removing elements on the zlist was not allowed before (which is probably why the function was written that way), but in newer zlists it is.

I ran this through soak.sh -j 1000 on master and the branch on hype2, and the performance was actually faster on the branch. So I think we can consider it basically a wash. The numbers are comparable to what was in PR #1105.

And with that, no more oom() in the KVS.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.2%) to 77.925% when pulling aaa2a33 on chu11:kvs_oom_advanced_try1 into 5f48548 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 26, 2017

Codecov Report

Merging #1128 into master will decrease coverage by 0.26%.
The diff coverage is 48.38%.

@@            Coverage Diff             @@
##           master    #1128      +/-   ##
==========================================
- Coverage   77.63%   77.36%   -0.27%     
==========================================
  Files         184      184              
  Lines       31698    31881     +183     
==========================================
+ Hits        24608    24666      +58     
- Misses       7090     7215     +125
Impacted Files Coverage Δ
src/modules/kvs/kvs_util.c 76.92% <100%> (+3.01%) ⬆️
src/modules/kvs/lookup.c 87.08% <33.33%> (+3.39%) ⬆️
src/modules/kvs/kvs.c 63.19% <38.84%> (-6.23%) ⬇️
src/modules/kvs/commit.c 76.55% <55.76%> (+3.46%) ⬆️
src/modules/kvs/waitqueue.c 83.49% <66.66%> (+2.36%) ⬆️
src/modules/kvs/fence.c 81.48% <66.66%> (+3.7%) ⬆️
src/modules/kvs/cache.c 91.07% <70.58%> (-0.24%) ⬇️
t/mpi/hello.c 93.54% <0%> (-6.46%) ⬇️
src/common/libkvs/kvs.c 65.08% <0%> (-6.01%) ⬇️
src/common/libflux/info.c 75% <0%> (-2.78%) ⬇️
... and 13 more
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

Great! OK, I've run through these changes once and I would like to run through them again in more detail when I get to work. One thing to do is to walk through these paths keeping in mind the semantic guarantees we make to KVS users - see 3 items at the top of this wiki page:

https://github.com/flux-framework/flux-core/wiki/kvs

and see if any invariants are violated in the error cases. It would be desirable to return an error to the user, or exit the module's main, or assert rather than silently violate one of these guarantees.

As @grondo noted earlier, the comments are very helpful.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

@garlick Ahh, I had forgotten about the guarantees. I'll go through those to think about if asserting would be better in certain error cases.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

I'll want to go through things line by line, but going through the code, I think the current error handling maintains the guarantees (although the comments need to be clarified or outright fixed).

setroot_event_cb() - for the most part, if anything cache related fails, it's not a big deal. We have the new root and will still set the new root (via setroot()). If the new root is not cached, later lookups will have to load the root from the content store. Perhaps that root load could fail and the user could get an error, but consistency is still maintained.

error while commiting - I think virtually any error (including where the user will never get a response or error) is safe. setroot() is only called at the end of a successful commit, so the root would never updated. So consistency is maintained. Worst is some data has been stuffed in the cache, but if those entries aren't dirty, they will be reclaimed later.

error processing watchlist - again I think consistency is maintained. While callers may miss values, they never get an older one.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

Thanks for that @chu11 - just looked through again and didn't spot any problems. LMK when you think this is ready.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

Cool. Found one additional error path that I had missed before, so trying to work it into this PR as wait_runqueue() effects it.

@chu11 chu11 force-pushed the chu11:kvs_oom_advanced_try1 branch from aaa2a33 to 85894ec Jul 26, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

repushed, cleaned up/fixed a lot of comments on tricky parts. Main difference from original PR set of patches is I allow an error from content_store_completion() to propagate to the caller when a synchronous call is made.

In several locations, when a cache entry is missing or invalid, I had to come up with an errno setting to propagate to a caller so they can output an error message. I decided on EFAULT, which seems like the right one to set. But maybe there's a better one?

Also, acknowledging issue #1130 remains in this PR. I will cleanup in a later PR everywhere in KVS.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

On EFAULT: well, it might be good to punt with something that doesn't imply memory problems so someone doesn't begin chasing one of those unnecessarily. I dunno, maybe EINVAL works here? It's hopefully not going to be a common code path so probably not worth wasting too much energy on.

Regarding #1130 IMHO we should fix those up in this PR before merging if there are any. That will avoid some extra churn in the history.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

I think EINVAL is used in some places in flux-core, so I wanted to avoid that one. I think it's basically down to using a half-sensible one that could be confused for being a real problem (e.g. EFAULT) or using one that makes no sense (e.g. ENODEV). Guess it'll be confusing either way.

I'll go ahead and cleanup #1130 then.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

Yeah... hmm, how about EBADSLT (Bad Slot). That sort of vaguely fits.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.07%) to 77.995% when pulling 85894ec on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

Ahh, I only looked in errno-base.h. EBADSLT sounds pretty good.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 26, 2017

What about ENOTRECOVERABLE? You just need to be able to document what exactly went wrong for a given errno returned by function call failure (and differentiate between different failure states). Just a suggestion, I'm not advocating either way.

BTW, I did notice in this place and others a pattern of

if (failure) {
   errno = EBADBADNOTGOOD;
   goto fail;
...
fail:
   free_something();
   return -1;

If the function(s) between the fail: label and return reset errno then the actual errno set in the failed conditional will not actually get returned to the caller.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

ENOTRECOVERABLE is nicely generic, yet laden with sadness. I like it!

But EBADBADNOTGOOD would be better.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 26, 2017

@grondo Doh! You're right, I totally missed that. Seems in most places we do saved_errno.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.949% when pulling 119ed80 on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@chu11 chu11 force-pushed the chu11:kvs_oom_advanced_try1 branch from 119ed80 to f14746c Jul 27, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 27, 2017

Just pushed an update. Chose ENOTRECOVERABLE over EFAULT, but also used a ENODATA in one location, b/c I think it made more sense there.

Fixed up the bad assert usage per #1130.

Fixed up the potential loss of errno when calling cleanup/error paths. There are a few locations where it may have not been 100% necessary to do a saved_errno fix, but did so for consistency with other functions.

I suspect the diff coverage will even be worse now, with a ton of extra saved_errno = errno lines all over the place now.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 27, 2017

Coverage Status

Coverage decreased (-0.2%) to 77.904% when pulling f14746c on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 27, 2017

Fixed up the potential loss of errno when calling cleanup/error paths. There are a few locations where it may have not been 100% necessary to do a saved_errno fix, but did so for consistency with other functions.

I don't mean to be nit picky, but this is not a style I would want to propagate for the sake of consistency, especially in cases where saved_errno has to be set in all error legs of a long function because it's unconditionally restored in a single return path. Best to minimize and localize its use, if possible.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 27, 2017

@garlick Fair enough. I did this in only one place, and it ended up I could just remove the "goto done" path in its entirety. Just re-pushed.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 27, 2017

actually, what's people's opinions on when to use saved_errno. I was programming under the "you can't trust any other function to not change errno" assumption. I suppose we can trust free(), so I added one more fix (edit: and we can trust flux_log, but that's internal to flux). But what about something like json_decref(). It doesn't change errno today, but this isn't a posix function, I can't speak for tomorrow.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 27, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.943% when pulling 3094f25 on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 27, 2017

Coverage Status

Coverage decreased (-0.2%) to 77.897% when pulling 0c4e038 on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 27, 2017

I'm fine with what you have here, especially if that one idiom mentioned above is avoided. Just wanted to put in $0.02 for not going overboard.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 27, 2017

Sorry if my comment had you unnecessarily running around fixing things that don't need to be fixed!

If you want to remove some code duplication, a common idiom I've seen is to set saved_errno or errnum before a set of conditionals that would all return the same errno, (e.g. for ENOMEM, you'd do

saved_errno = ENOMEM;
if (alloc failed)
   goto out;
/* more code */
if (another alloc failed)
   goto out;

This approach could reduce total LOC and duplication in fence_create and cache_get_stats for example. It is not a big deal at all and I certainly don't have a problem with the current code.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 27, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.948% when pulling 7a30d6b on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 27, 2017

hmmm, hit a stalled build after PASS: t0017-security.t 35 - dispatcher delivers kvs.setroot event to owner connection. First time I've seen it on this PR. I'm assuming the issue is as discussed in PR #1125, restarting it.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 27, 2017

@chu11, sorry about that, a timeout as suggested by @garlick has been added to #1125 -- however I'm still hitting timeouts in other places in that PR, so it is stuck for now...

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 27, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.95% when pulling 7a30d6b on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

chu11 added some commits Jul 26, 2017

modules/kvs: Add cache_entry_force_clear_dirty()
Function is to be used in emergency error handling situations.

Add unit tests appropriately.
modules/kvs: Update fence_merge() to not oom()
Removal of fence_merge() oom()'s required slight logic change. Previously
fences were modified while iterating, so if an error occurred midway,
there was no way to go back to the original point before the function was
called.  The function now duplicates some json objects and ensures that
everything succeeds before completing the merge.

Update callers of fence_merge() appropriately.  Of note,
commit_mgr_merge_ready_commits()'s internal algorithm is also changed, so
that it no longer pops and pushes entries onto the list.  Only using
list iterators, it manipulates ready commits.

Update unit tests appropriately throughout.
modules/kvs: Refactor content_load_completion()
Split into synchronous content_load_get() call to allow errors
to be passed back on synchronous calls, and content_load_completion()
for asynchronous rpcs.

After failed cache_lookup, need to choose an errno to indiciate
missing cache entry, used errno = ENOTRECOVERABLE to represent this.
modules/kvs: Remove oom() from wait_runqueue()
Re-work wait_runqueue() to not oom() on error.  Algorithm within the
function has been altered to be safe against errors which requires
that all elements on the waitqueue be copied off or none.  This change
effectively converts the function from having two loops to having
three.

Update callers to check for error.  Notably adjust cache_entry_set_dirty()
and cache_entry_set_json() to return -1 on error, and update callers of
these functions.

In some locations, where an errno had to be set to indicate a missing
cache entry, errno = ENOTRECOVERABLE was chosen.  In another location where
a cache entry was missing data, errno = ENODATA was chosen.
modules/kvs: Fix assert usage
Asserts should only perform sanity checks and not call any real code,
b/c code within asserts can be removed during compilation.

Fixes #1130
modules/kvs: Fix errno return in cleanup paths
When cleaning up in a function, it is possible for an errno value to
be cleared via functions called in the cleanup.  Save errno appropriately
so that it is not lost before returning to caller.  Adjust code appropriately
in some functions if not necessary to save errno.

@chu11 chu11 force-pushed the chu11:kvs_oom_advanced_try1 branch from 7a30d6b to b12f711 Jul 28, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 28, 2017

re-pushed, squashed minor fixes, rebased to master

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 28, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.943% when pulling b12f711 on chu11:kvs_oom_advanced_try1 into e047dd4 on flux-framework:master.

@garlick garlick merged commit 38fc4a6 into flux-framework:master Jul 28, 2017

2 of 4 checks passed

codecov/patch 48.38% of diff hit (target 77.63%)
Details
codecov/project 77.36% (-0.27%) compared to e047dd4
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.1%) to 77.943%
Details

@grondo grondo referenced this pull request Aug 23, 2017

Closed

0.8.0 Release #1160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.