Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libflux: use common code for aux_set/aux_get #1797

Merged
merged 16 commits into from Nov 6, 2018

Conversation

Projects
None yet
5 participants
@garlick
Copy link
Member

garlick commented Nov 4, 2018

This is a cleanup of the aux get/set functions in theflux_t, flux_mrpc_t, flux_future_t, and flux_message_t objects to normalize their semantics, as called out in #1796.

I think this is a good idea as we stabilize our API, though I'm less sure that the switch from a zhash to a one-off singly linked list is justified, though it should save some memory in those objects.

The header from the shared implementation summarizes the semantics:

/* aux list - associate auxiliary data with a host object
 *
 * The object declares a 'struct aux_item *aux', initialized to NULL.
 * The object's destructor calls aux_destroy (&aux).
 * The object's aux_get/aux_set functions call aux_get ()/aux_set () below.
 *
 * An empty aux list is represented by a NULL pointer.
 *
 * It is legal to aux_set (key=NULL, value!=NULL).  aux_get (key=NULL) fails,
 * but aux_destroy () calls the anonymous value's destructor, if any.
 *
 * It is legal to aux_set () a duplicate key.  The new value replaces the old,
 * after calling its destructor, if any.
 *
 * It is legal to aux_set (key!=NULL, value=NULL).  Any value previously
 * stored under key is deleted, calling its destructor, if any.
 */
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 4, 2018

Codecov Report

Merging #1797 into master will increase coverage by <.01%.
The diff coverage is 83.07%.

@@            Coverage Diff             @@
##           master    #1797      +/-   ##
==========================================
+ Coverage   79.64%   79.65%   +<.01%     
==========================================
  Files         186      187       +1     
  Lines       34550    34614      +64     
==========================================
+ Hits        27519    27571      +52     
- Misses       7031     7043      +12
Impacted Files Coverage Δ
src/common/libflux/security.c 76.47% <ø> (ø) ⬆️
src/cmd/builtin/proxy.c 74.75% <100%> (ø) ⬆️
src/broker/runlevel.c 80.89% <100%> (ø) ⬆️
src/common/libflux/handle.c 83.99% <100%> (+0.3%) ⬆️
src/modules/connector-local/local.c 74.96% <100%> (+1.02%) ⬆️
src/common/libjob/job.c 60.46% <100%> (ø) ⬆️
src/common/libflux/barrier.c 81.08% <100%> (-1.78%) ⬇️
src/bindings/lua/flux-lua.c 82.15% <100%> (+0.03%) ⬆️
src/common/libflux/message.c 81.64% <100%> (+0.49%) ⬆️
src/modules/barrier/barrier.c 76.55% <100%> (-2.07%) ⬇️
... and 32 more
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

though I'm less sure that the switch from a zhash to a one-off singly linked list is justified, though it should save some memory in those objects.

Was there a memory issue that this was intended to fix, and if so a benchmark that shows that the performance and interface tradeoff is acceptable due to the memory usage savings? I guess I'd feel more comfortable if I knew where the sudden focus on memory usage above anything else is coming from (not that I'm arguing against the current approach, it just seemed like the change was coming from some experience someone had with usage.)

I've only quickly looked at the struct aux_item interface, but my first question was why the "aux list" interface was so unique that it now requires its own implementation of a linked list and a "different" interface. I also wonder if that choice of interface would make it more difficult to switch to a different underlying data structure in the future (maybe a leaner hash?), since the sort of "head of list" interface presented here is very list-specific.

Sorry about all the questions! I don't mean to sound like I'm dumping on this PR, but for my own benefit wanted to understand the choices made here.

(and BTW, very good idea to unify the "aux" interfaces across objects)

Finally, libsubprocess also has a similar interface flux_subprocess_get_context(3) and flux_subprocess_set_context(3). I guess those need to be changed as well.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

Sorry, I didn't mean to get us off in the weeds talking about saving a few bytes of memory here and there. The main justification of this PR is the interface normalization. I did that by factoring out common code, and didn't mean to create a new and weird generic container.

I could change the implementation to use a zlist or a zhash and it wouldn't really matter. My thought in doing it the way I did was that it was probably about the same amount of code, would have the side benefit of being lighter weight (yeah prob not justified), and wouldn't have the czmq dependency so it could be copied into flux-security if desired. But after I got done, it was a little more complicated than I was comfortable with hence the hedging in the PR description about its justification.

@@ -10,7 +10,7 @@

int aux_destroy_called;
void *aux_destroy_arg;
void aux_destroy (void *arg)
void aux_destroy_fun (void *arg)

This comment has been minimized.

@chu11

chu11 Nov 5, 2018

Contributor

i don't know how to comment on a commit message, but noticed you have at typo wiht in the commit message for this change :-)

This comment has been minimized.

@garlick

garlick Nov 5, 2018

Author Member

Thanks!

zhash_delete (mrpc->aux, name);
if (zhash_insert (mrpc->aux, name, aux) < 0) {
errno = ENOMEM;
if (!mrpc) {

This comment has been minimized.

@chu11

chu11 Nov 5, 2018

Contributor

don't know if it should be in this PR or perhaps we can just create a new issue for it. But there are several other places in mrpc.c that do assert (mrpc->magic == MRPC_MAGIC) instead of if (!mrpc). It should probably be consistent.

This comment has been minimized.

@garlick

garlick Nov 5, 2018

Author Member

I'd say that's a different issue.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

Finally, libsubprocess also has a similar interface flux_subprocess_get_context(3) and flux_subprocess_set_context(3). I guess those need to be changed as well.

/*
 *  Set arbitrary context `ctx` with name `name` on subprocess object `p`.
 *
 *  Returns 0 on success
 */
int flux_subprocess_set_context (flux_subprocess_t *p,
                                 const char *name, void *ctx);

/*
 *  Return pointer to any context associated with `p` under `name`. If
 *   no such context exists, then NULL is returned.
 */
void *flux_subprocess_get_context (flux_subprocess_t *p, const char *name);

Did you want to change these to:

void *flux_subprocess_aux_get (flux_future_t *f, const char *name);

int flux_subprocess_aux_set (flux_future_t *f, const char *name,
                             void *aux, flux_free_f destroy);

or keep the names, but add a destructor to aux_set, and use the common implementation?

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 5, 2018

One extra comment:

 * It is legal to aux_set (key=NULL, value!=NULL).  aux_get (key=NULL) fails,
 * but aux_destroy () calls the anonymous value's destructor, if any.

I see this as a convenient optional way to stash something away for destruction later on. But should this convenience be noted more explicitly in the manpage? When seeing an aux_set, function, I wouldn't think this behavior is allowed.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

Actually I did a quick test and, though the test is a bit contrived, this PR is actually kind of a big win. Here's the output from valgrind's massif of a run before this PR and one after. The test code creates 1024 futures using flux_rpc, then calls flux_rpc_get on each one before exiting (without calling flux_future_destroy)

Before pr #1797:
massif-before

After pr #1797:
massif-after

This PR saves 4MiB, or over 20% of heap memory on this testcase.

Just for my own interest, I may retry this testcase again with a reactor instead of synchronous gets. I'm wondering about the vast majority of heap memory which is coming from msg_handler.c:fastpath_set.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

But should this convenience be noted more explicitly in the manpage?

It is already documented in the flux_future_create(3) man page -- good idea that this should be copied to other manpages with aux_set/get

Objects may be stored anonymously under a NULL name to be scheduled for destruction without the option of retrieval.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

Did you want to change these to:

@chu11 should probably comment, but I think your idea to develop a common interface applies also to libsubprocess, so I'd go with changing the function names.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 5, 2018

keep the names, but add a destructor to aux_set, and use the common implementation?

My vote would be to change to aux_set/aux_get, add the destructor, and use common implementation.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

I also wonder if that choice of interface would make it more difficult to switch to a different underlying data structure in the future (maybe a leaner hash?), since the sort of "head of list" interface presented here is very list-specific.

Although the parameter name head could have been better chosen not imply implementation details, I think the function prototypes where the "aux thing" is passed in by reference whenever it can be modified would work equally well for creating a container like zhash or zlist on first use.

The "empty list represented by NULL pointer" invariant was to avoid creating the container unconditionally, in case it wasn't used. You could probably argue this is also premature optimization, although 3 of 4 implementations already included (duplicated) logic for this.

Would it be better if this was reimplemented in terms of a zhash? Since there are now tests on the generic implementation, that should be quite easy.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

This PR saves 4MiB, or over 20% of heap memory on this testcase.

WTF?!?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

Although the parameter name head could have been better chosen not imply implementation details, I think the function prototypes where the "aux thing" is passed in by reference whenever it can be modified would work equally well for creating a container like zhash or zlist on first use.

Ok, that satisfies my concern on that count! (like I said I just quickly looked at the interface, sorry!)

WTF?!?

Yes, this simple testcase which basically creates 1024 futures seems to use 4MiB just in zhash_new, which is eliminated by your improvements. I'm totally convinced that your suspicion that zhash is overkill was completely correct. In fact, now I'm wondering about zhash in other places...

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

Would it be better if this was reimplemented in terms of a zhash? Since there are now tests on the generic implementation, that should be quite easy.

oh, no! I'm totally convinced now that zhash is not meant for use except when many hundereds of items are guaranteed to be stored, and certainly nowhere where it will be a short-lived data structure! I guess we get spoiled by the easy access to associative arrays as an interface in other languages, but have to be more careful in straight C.

Sorry that I caused so much noise in what should have been a simple PR.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

Sorry that I caused so much noise in what should have been a simple PR.

Not at all, that was interesting.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

For the curious, here's a run of the same testcase using flux_future_then and the reactor instead of synchronous flux_future_get:

Before #1797:
massif-before

After #1797:
massif-after

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

However, note that the second test above runs 2x slower than the synchronous test. Probably due to the overhead of exiting/entering the reactor repeatedly?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 5, 2018

I'm totally convinced now that zhash is not meant for use except when many hundereds of items are guaranteed to be stored, and certainly nowhere where it will be a short-lived data structure!

zhashx might be a little nicer, as I think it is more sophisticated about how it manages growth.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Nov 5, 2018

zhashx might be a little nicer, as I think it is more sophisticated about how it manages growth.

Yeah, based on a quick glance at the source code, it looks like zhashx is initialized to 3 items to start (i.e., the first prime in this prime list).

When expanding, it increases its "prime limit" by 5 primes.....and my knowledge of primes is nowhere good enough to estimate what that means in terms of a percent increase. But it does start small, grow quickly, and then converge to approx doubling (which becomes really apparent when you look at the prime list which has the powers of two nicely labeled, and the approx gap between the powers of two is 5 primes): 3 -> 17 -> 43 -> 103 -> 229 -> 487 ...

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 5, 2018

zhashx might be a little nicer, as I think it is more sophisticated about how it manages growth.

Which begs the question, how many places should we be using zhashx_t instead of zhash_t? Skimming code, I can see several places in the libsubprocess code that the change would probably be positive (number of channels and number of options in a command should be small on average).

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 5, 2018

Which begs the question, how many places should we be using zhashx_t instead of zhash_t? Skimming code, I can see several places in the libsubprocess code that the change would probably be positive (number of channels and number of options in a command should be small on average).

Yeah, though it may be premature, we should probably open an issue, and optimize our search for sites where zhash is used in a data structure that might be rapidly used many times in a single address space -- e.g. flux_future_t [probably largely addressed in this pr], subprocess is another good one [before and after flux-exec max rss would be interesting], etc.)

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 6, 2018

I added on some commits to

  • check flux_aux_set() in a few more places
  • make the proposed change to libsubprocess and update users
  • try to normalize documentation for aux functions across objects
  • added to unit test

I'll fix that commit typo when I eventually squash. LMK if I should do that now and rebase on current master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 6, 2018

Great, thanks! Since we have a couple smaller PRs going in first you can probably wait until those go in (#1805, #1798), but I'd be fine with a squash+rebase when you're ready. Thanks for taking this one on. Good improvement!

garlick added some commits Nov 3, 2018

libutil/aux: build common object aux methods
Problem: there are several implementations of
aux_get/aux_set methods in objects, each of which
has subtly different semantics.

Create a common implementation that can replace
all the one-offs

Fixes #1796
testsuite: eliminate duplicate aux_destroy function
Problem: future unit test had a function called aux_destroy()
which conflicts with new aux_destroy() in libutil/aux.c.

Rename the test function.
bingings/lua: preserve errno in l_flux_reactor_start
Problem: the iowatcher 'error is ENOTDIR' test started
failing after flux_aux_get() started setting errno to
ENOENT on key fetch failure.

The lua bindings were depending on errno being preserved
if the "lua::reason" key was not set.  Add a bit of code
around the the flux_aux_get() to save and restore errno.

@garlick garlick force-pushed the garlick:aux_list branch from 1ebf3e8 to 1bbf0b1 Nov 6, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 6, 2018

Rebased on master, fixed commit typo, squashed incremental development, and combined the API changes with updates to API users to avoid breaking git bisect.

garlick added some commits Nov 3, 2018

libflux/handle: switch to common aux implementation
Change flux_aux_set() and flux_aux_get() to use
libutil/aux rather than a zhash.

Also, change the return value of flux_aux_set()
from void to int, to match other aux get/set
functions.  It can return -1 with errno=ENOMEM.

Update users to check aux_set return value:
- libflux/attr
- libflux/barrier
- libflux/flog
- libflux/info
- libflux/msg_handler
- libflux/reactor
- libjob
- libkvs
- libjsc
- modules/barrier
- modules/connector-local
- modules/content-sqlite
- modules/kvs
- modules/userdb
testsuite: cover flux_msg_aux_get(), _aux_set()
Cover flux_msg_aux_get() and flux_msg_aux_set() with
NULL message param.
testsuite: add mrpc unit test
Cover the aux_get/aux_set functions that were just modified
for (locally handled) invalid arguments.
testsuite: add handle unit test
Cover the aux_get/aux_set functions that were just modified
for (locally handled) invalid arguments.
libflux/security: drop extraneous munge.h include
Problem: although flux-core's direct dependency on libmunge
was dropped, one include of munge.h was left.

Remove it.

Fixes #1794
libsubprocess: switch to common aux implementation
Replace a zhash implementation with the common one.

Rename the functions from:
  flux_subprocess_get_context() -> flux_subprocess_aux_get()
  flux_subprocess_set_context() -> flux_subprocess_aux_set()

Add flux_free_f argument to the set function.

Update the following users:
- testsuite
- modules/wreck/job
- modules/cron
- broker
- cmd/flux-proxy
- cmd/flux-start
- cmd/flux-exec

@garlick garlick force-pushed the garlick:aux_list branch from 1bbf0b1 to ddd7a88 Nov 6, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 6, 2018

Oops, one more time.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 6, 2018

Great, thanks!

@grondo grondo merged commit 31ce43a into flux-framework:master Nov 6, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.