Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pull in json-c #835

Merged
merged 23 commits into from Oct 11, 2016

Conversation

Projects
None yet
5 participants
@garlick
Copy link
Member

garlick commented Oct 6, 2016

This PR is the missing piece that would have allowed #319, an attempt to convert part of the project to the jansson JSON library, to have been completed. It pulls in a copy of the json-c library and renames all of its public symbols to have have a json_c_ prefix. The library is built as part of libflux-internal.la. Any part of flux-core can now dump json-c simply by dropping its include of src/common/libjson-c/json.h or src/common/libutil/shortjson.h.

If this approach is acceptable, next I'd like to revisit bringing back the really cool formatted rpc functions that were parked in #319.

Note to flux-sched: this doesn't affect external interfaces, and you can still track shortjson.h as the only change to it here is to include json.h via a different path.

Note on tests: I did add a small TAP test here but I did not pull in the tests that ship with json-c. We could do that but it will take some work and I didn't feel it was justified here, given that json-c releases are pretty well tested.

@garlick garlick added the review label Oct 6, 2016

@lipari

This comment has been minimized.

Copy link
Contributor

lipari commented Oct 6, 2016

Note to flux-sched: this doesn't affect external interfaces, and you can still track shortjson.h as the only change to it here is to include json.h via a different path.

@garlick, this sounds like something sched is going to want to adopt if we are to maintain as much consistency with the flux-core as possible. Do you agree?

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Oct 6, 2016

Current coverage is 71.48% (diff: 13.43%)

Merging #835 into master will decrease coverage by 3.57%

@@             master       #835   diff @@
==========================================
  Files           146        157     +11   
  Lines         25415      26688   +1273   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          19075      19077      +2   
- Misses         6340       7611   +1271   
  Partials          0          0           
Diff Coverage File Path
0% src/modules/live/liblive.c
0% src/modules/pymod/py_mod.c
0% new src/common/libjson-c/printbuf.c
0% src/cmd/builtin/heaptrace.c
0% new src/common/libjson-c/json_tokener.c
0% new src/common/libjson-c/arraylist.c
0% src/common/libflux/panic.c
0% new src/common/libjson-c/json_object_iterator.c
0% new src/common/libjson-c/linkhash.c
0% new src/common/libjson-c/json_c_version.c

Review all 58 files changed

Powered by Codecov. Last update 7eb19d0...cb354c2

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 6, 2016

Coverage Status

Coverage decreased (-0.4%) to 74.939% when pulling 70e4ff5 on garlick:json_c into dc58fcb on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 6, 2016

this sounds like something sched is going to want to adopt if we are to maintain as much consistency with the flux-core as possible. Do you agree?

Pulling in json-c was kind of a desperation move to control its namespace and I hope sched won't need that, certainly not until you want to migrate part of sched to jansson.

The specific problems are discussed in #319 where libflux was converted to use jansson. It's a dynamic library externally where the usual transitive dep rules apply, but internally libtool through its wrapper scripts arranges for anything that links against libflux to pick up its dependencies, e.g. jansson. We then found that the kvs module was mixing jansson and json-c APIs where they overlapped and caused segfaults.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 6, 2016

I should add, we already got rid of json_object * in all the public flux-core interfaces, so flux-sched and flux-core can use completely different JSON implementations without conflict. Where JSON is passed, it is passed as a string (whether that be API parameter or a message).

@lipari

This comment has been minimized.

Copy link
Contributor

lipari commented Oct 6, 2016

Ok thanks. If more and more of flux-core starts to convert to jansson, then I would want to convert all of flux-sched as well. There's no point in forcing developers to work in two different json c libraries when they could work more efficiently using just one.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 7, 2016

Coverage Status

Coverage decreased (-0.2%) to 75.153% when pulling c76d90e on garlick:json_c into dc58fcb on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 7, 2016

if more and more of flux-core starts to convert to jansson, then I would want to convert all of flux-sched as well. There's no point in forcing developers to work in two different json c libraries when they could work more efficiently using just one.

Although that sounds like a supportable position, I'm personally not hard over on using one implementation throughout the project, if different situations call for a different one, e.g. see pr #833.

@garlick garlick force-pushed the garlick:json_c branch from c76d90e to 7199586 Oct 7, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 7, 2016

Hmm, possibly there is something wrong here. It seems hard to get all of the travis-ci tests to run through without hitting some problem in a seemingly random spot.

Here's a segfault in the lua kvs test:

ok 125 - but key value has been updated
1..125
2016-10-07T16:08:22.091405Z broker.err[0]: Run level 2 Segmentation fault (rc=139) 1.9s
flux-start: 0 (pid 106093) exited with rc=139 (Segmentation fault)
FAIL: lua/t1002-kvs.t

Another travis-ci test hung in the crontab tests.

I did some poking with valgrind to see if any obvious memory problems popped up - nothing found.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 7, 2016

Or it could just be travis this morning. One of my builds hung in configure - pretty sure that's not a problem introduced here.

@garlick garlick force-pushed the garlick:json_c branch from 7199586 to 243a2ca Oct 7, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 7, 2016

I was thinking this PR doesn't really do anything useful and so brought over the flux_rpcf(), flux_rpc_getf() functions from #319. In retrospect, maybe those belong in another PR?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 7, 2016

Coverage Status

Coverage decreased (-0.3%) to 75.068% when pulling 243a2ca on garlick:json_c into c94fdd3 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 7, 2016

FWIW, since the getf() functions require jansson to work, I don't mind them being part of this PR. They look really useful!

@garlick garlick force-pushed the garlick:json_c branch from 243a2ca to ba297dc Oct 7, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 7, 2016

This is rebased on current master, and I weaned libflux off json-c.

@garlick garlick force-pushed the garlick:json_c branch from d61d436 to 40e9154 Oct 10, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage decreased (-0.2%) to 75.126% when pulling 40e9154 on garlick:json_c into 2df89fd on flux-framework:master.

@garlick garlick force-pushed the garlick:json_c branch from 40e9154 to 305aa5d Oct 10, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 10, 2016

I'd like to propose this as a merge candidate.

As discussed in our project meeting today, there is one caveat here: the new RPC functions can expose json_t objects through the varargs interface, e.g. this example from libflux/attr.c:

    flux_rpc_t *r;
    json_t *array;
    if (!(r = flux_rpc (ctx->h, "cmb.attrlist", NULL, FLUX_NODEID_ANY, 0)))
        goto done;
    if (flux_rpc_getf (r, NULL, "{s:o}", "names", &array) < 0)
        goto done;

However, this doesn't require external API users to use jansson (include it or link with it) if they are willing to avoid the o format specifier. On the other hand, o is actually pretty useful in places where it's OK to adopt jansson, for example all over flux-core - the fact that the json_t object remains owned by the flux_rpc_t reduces the potential for JSON leaks, especially on error paths, and generally results in more compact code.

I've convinced myself that the pragmatic benefits of this PR outweigh the slight backsliding on API design principles. Any objections?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage decreased (-0.2%) to 75.13% when pulling 305aa5d on garlick:json_c into 2df89fd on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 10, 2016

I've convinced myself that the pragmatic benefits of this PR outweigh the slight backsliding on API design principles. Any objections?

I think this seems fine, and since the format specifier to pull a json_t out of an rpc will be very useful in flux internals I think you are right that it seems worth it. This is going to remove a lot of annoying boilerplate code in rpc handlers.

Some quick questions, I'm assuming it is ok to call flux_rpc_getf() multiple times on the same rpc_t, correct?

In your example above

if (flux_rpc_getf (r, NULL, "{s:o}", "names", &array) < 0)
        goto done;

Does caller add a reference to the json_t array, or does rpc_t still own the memory?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 10, 2016

I'm assuming it is ok to call flux_rpc_getf() multiple times on the same rpc_t, correct?

No, because the flux_rpc_t was designed to also be used with flux_rpc_multi() where each flux_rpc_get() would fetch the next response. That is a bit annoying - should we try to address that? Maybe add an explicit call to invalidate the current response before getting the next one in the multi case?

Does caller add a reference to the json_t array, or does rpc_t still own the memory?

The flux_rpc_t still owns the memory.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 10, 2016

should we try to address that? Maybe add an explicit call to invalidate the current response before getting the next one in the multi case?

Yeah, I'm thinking that will trip users of the API up -- it may not be practical or even good programming style to always call flux_rpc_getf() only once per rpc object (e.g. for anything but the most trivial of messages)

garlick added a commit to garlick/flux-core that referenced this pull request Oct 10, 2016

libflux/rpc: make flux_rpc_get() idempotent
Problem: flux_rpc_get() can only be called once on a
flux_rpc_t, which may be inconvenient or lead to poorly
structured code as discussed in pr flux-framework#835.

In the single-response case, there is really no reason
to invalidate a received message while the flux_rpc_t is
valid.  The origin of this semantic is the multi-response
case, where a flux_rpc_get() would "consume" a message and
make way for the next one.

Replace flux_rpc_completed(), a simple test whether all
responses to an rpc have been received, with flux_rpc_next(),
which invalidates the current received message and returns 0
if another is expected, -1 if not.

Update flux-ping, flux-module, and flux-hwloc to reflect
this API change.

Update multrpc and kvs_watch sharness tests.

@garlick garlick force-pushed the garlick:json_c branch from 4a7a7c3 to a99bf33 Oct 11, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 11, 2016

Just pushed a change that allows flux_rpc_get() to be called as many times as you like on a flux_rpc_t. The multi-RPC usage changes to:

do {
    flux_rpc_get (rpc, ...);
} while flux_rpc_next (rpc) == 0);

from

while (!flux_rpc_completed (rpc)) {
    flux_rpc_get (rpc, ...)
}
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 11, 2016

Oh I like it!

garlick added some commits Oct 7, 2016

libflux/rpc: add flux_rpcf(), flux_rpc_getf()
Add variants of flux_rpc(), flux_rpc_get() that support
building and parsing of JSON payloads via printf-style
arguments, without having to build a JSON object on
the side, or even interact with a JSON library.

The objects, and therefore any strings returned by
flux_rpc_get(), are owned by the flux_rpc_t.

These functions are based on jansson json_pack and
json_uppack() internally.
libflux/rpc: make flux_rpc_get() idempotent
Problem: flux_rpc_get() can only be called once on a
flux_rpc_t, which may be inconvenient or lead to poorly
structured code as discussed in pr #835.

In the single-response case, there is really no reason
to invalidate a received message while the flux_rpc_t is
valid.  The origin of this semantic is the multi-response
case, where a flux_rpc_get() would "consume" a message and
make way for the next one.

Replace flux_rpc_completed(), a simple test whether all
responses to an rpc have been received, with flux_rpc_next(),
which invalidates the current received message and returns 0
if another is expected, -1 if not.

Update flux-ping, flux-module, and flux-hwloc to reflect
this API change.

Update multrpc and kvs_watch sharness tests.
libflux/rpc: add flux_rpc_get_nodeid()
Now that flux_rpc_get() no longer "consumes" the received
message, we can have separate accessors for the payload
and the nodeid.  Drop the nodeid pointer argument from
the flux_rpc_get() function family and add flux_rpc_get_nodeid().

If an error was returned in the RPC response, flux_rpc_get_nodeid()
will succeed, thus it is possible in a flux_rpc_multi() use case
to determine which remote service returned an error.

Update users.
libflux/rpc: fix problem with lost errno
Problem: if flux_rpc_check() calls flux_recv() and gets an
error other than EWOULDBLOCK, a subsequent call to flux_rpc_get()
will ignore the fact that an error has occurred and retry the
flux_recv().

If an error occurs in the receive path, cache the errno in the
rpc object, and don't retry the receive, simply return the
cached errno.
libflux/attr: use flux_rpcf()
Switch rpcs to flux_rpcf() and flux_rpc_getf().
Convert array manipulation in lsattr to jansson.

Fix typo in Usage output of "flux setattr", noticed while testing.
libflux: drop extra includes
Drop includes of shortjson.h, xzmalloc.h, log.h, nodeset.h
that served no purpose.

@garlick garlick force-pushed the garlick:json_c branch from a99bf33 to cb354c2 Oct 11, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 11, 2016

Coverage Status

Coverage decreased (-0.3%) to 75.105% when pulling cb354c2 on garlick:json_c into 7eb19d0 on flux-framework:master.

doc/flux_rpc(3): document API changes
Add
  flux_rpc_next()
  flux_rpc_get_nodeid()
  flux_rpcf()
  flux_rpc_getf()

Drop
  flux_rpc_completed().

Drop nodeid argument from flux_rpc_get().

@garlick garlick force-pushed the garlick:json_c branch from cb354c2 to ded4a18 Oct 11, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 11, 2016

Coverage Status

Coverage decreased (-0.4%) to 74.939% when pulling ded4a18 on garlick:json_c into 7eb19d0 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 11, 2016

I think this might be near to a merge candidate. It's rebased on current master, and in addition to the change to flux_rpc suggested by @grondo, I dropped the nodeid argument from flux_rpc_get() and added a flux_rpc_get_nodeid() function for the relatively few places that this is useful.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 11, 2016

Great! I think this is really nice and should go in after Travis is green. I think any more discussion or tweaks could be future PRs (not that I can think of any at this time)

@garlick garlick referenced this pull request Oct 11, 2016

Closed

make distcheck failing #842

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 11, 2016

Ok, @garlick. Nothing else to add here? I think I'll merge this so I can move forward with rebasing my WIP.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 11, 2016

Works for me! Thanks.

@grondo grondo merged commit 286b6e7 into flux-framework:master Oct 11, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.2%) to 75.135%
Details

@grondo grondo removed the review label Oct 11, 2016

@garlick garlick deleted the garlick:json_c branch Oct 11, 2016

@garlick garlick referenced this pull request Oct 26, 2016

Closed

0.5.0 release notes #879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.