Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libjsc: convert to jansson and cleanup #1524

Merged
merged 27 commits into from May 18, 2018

Conversation

Projects
None yet
6 participants
@garlick
Copy link
Member

garlick commented May 17, 2018

This PR converts libjsc from json-c to jansson by pulling in shortjansson.h from flux-sched.

It then begins to simplify the code a bit using json_pack(), json_unpack(), etc..

Along the way some back to back KVS RPCs are able to be parallelized.

I'm still working on this - just posting a checkpoint in case @dongahn wants to have an early peek and see if I'm going off track in any way.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 17, 2018

@garlick: I just gave a cursory look and found lots of neat changes!

json_pack () does improve readability a lot.

In terms of optimization, b23d23c seems to show the general direction of using non-blocking asynchronous communications (kvs in this case). This particular change exposes and exploits KVS parallelism within one function.

This makes me wonder if you also plan to expose parallelism across different handlers as part of JSC refactoring? I haven't thought much about this and not sure if JSC has enough logic to benefit from this way, but I was wondering if you are thinking about this.

@garlick garlick force-pushed the garlick:libjsc_convert branch from 316e432 to 66bef09 May 17, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 17, 2018

Just rebased on current master and fixed a couple of uninitialized variables in error paths that travis caught.

@dongahn - in this PR I mainly wanted to convert to jansson, and simplify the code a bit by using some of the higher level jansson API calls that we didn't have in json-c. The parallelism in the JSC_RDESC query, and the a similar opportunity in the JSC_PDESC were about as far as I thought was appropriate in this PR since it allows the API to remain the same.

In other words when you call jsc_query_jcb() it still blocks internally waiting for KVS RPC(s).

In a future PR (or in the update for new exec system) we should definitely look at having the query/update functions return futures IMHO.

@coveralls

This comment has been minimized.

Copy link

coveralls commented May 17, 2018

Coverage Status

Coverage increased (+0.7%) to 79.077% when pulling fdc8bb4 on garlick:libjsc_convert into c1a649a on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 17, 2018

Restarted a build that stalled after the issues tests.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented May 17, 2018

Codecov Report

Merging #1524 into master will increase coverage by 0.68%.
The diff coverage is 85.37%.

@@            Coverage Diff            @@
##           master   #1524      +/-   ##
=========================================
+ Coverage   78.11%   78.8%   +0.68%     
=========================================
  Files         165     164       -1     
  Lines       30734   30308     -426     
=========================================
- Hits        24009   23885     -124     
+ Misses       6725    6423     -302
Impacted Files Coverage Δ
src/common/libjsc/jstatctl.c 88.36% <85.37%> (+32.53%) ⬆️
src/broker/module.c 83.79% <0%> (-1.68%) ⬇️
src/broker/content-cache.c 73.43% <0%> (-1.3%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/modules/kvs/kvs.c 66.03% <0%> (+0.16%) ⬆️
src/bindings/lua/flux-lua.c 82.28% <0%> (+0.17%) ⬆️
src/common/libflux/message.c 81.41% <0%> (+0.23%) ⬆️
src/common/libflux/future.c 89.86% <0%> (+0.44%) ⬆️
src/broker/modservice.c 80.58% <0%> (+0.97%) ⬆️
src/common/libflux/content.c 90% <0%> (+3.33%) ⬆️
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 17, 2018

This is great cleanup, LGTM. What else remains to be done before we can merge it?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 17, 2018

That sounds good. I was just wondering your future plan. Thanks.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 17, 2018

Well there is one more big one, the JSC_PDESC query, but actually this could be merged as is to save me from getting distracted for another day :-)

It does need a test with flux-sched first though.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 17, 2018

Sched 0.5.0 does pass make check when built against this PR's branch, once the unrelated patch in flux-framework/flux-sched#345 was applied to address a 32-bit build system issue.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 17, 2018

Sched 0.5.0 does pass make check when built against this PR's branch

Nice job!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 17, 2018

I notice a couple static functions don't have sanity coverage in the flux-core tests. Probably can't do anything about the rdl query function, but wreckrun does set a ersatz R_lite when running without the scheduler, so this addition to t2001-jsc.t might be warranted:

diff --git a/t/t2001-jsc.t b/t/t2001-jsc.t
index ea2109c..e88ab87 100755
--- a/t/t2001-jsc.t
+++ b/t/t2001-jsc.t
@@ -199,6 +199,10 @@ test_expect_success 'jstat 7.5: basic query works: pdesc' '
     flux jstat query 1 pdesc
 '
 
+test_expect_success 'jstat 7.6: basic query works: R_lite' '
+    flux jstat query 1 R_lite
+'
+
 test_expect_success 'jstat 8: query detects bad inputs' '
     test_expect_code 42 flux jstat query 0 jobid &&
     test_expect_code 42 flux jstat query 99999 state-pair &&

@garlick garlick force-pushed the garlick:libjsc_convert branch 2 times, most recently from 87364da to b12b1d1 May 17, 2018

garlick added some commits May 16, 2018

libjsc: pull in shortjansson.h from sched
To prepare for conversion of libjsc from json-c
to jansson, pull in shortjansson.h wrappers
from flux-sched project and adapt to local
environment.

This allows libjsc to be converted in several
steps;  otherwise, it's got to be done in one
giant patch to avoid breaking git bisect.

Drop Jtostr() - we will convert those directly
to json_dumps () since a free() is required now.
libjsc: convert to jansson
Make a first pass converting jstatctl.c from
json-c to jansson.
libjsc: [cleanup] use json_pack in get_update_jcb
Replace several shortjansson.h calls with one json_pack()
in get_update_jcb() to improve clarity.
libjsc: [cleanup] use json_pack in get_submit_jcb
Replace several shortjansson.h calls with one json_pack()
in get_submit_jcb() to improve clarity.

Use a fixed size buffer for numeric hash key instead of
allocating dynamically.
libjsc: [cleanup] use json_pack in get_reserve_jcb
Replace several shortjansson.h calls with one json_pack()
in get_reserve_jcb() to improve clarity.

Use a fixed size buffer for numeric hash key instead of
allocating dynamically.
libjsc: drop extra KVS lookup in JSC_JOBID query
Problem: jsc_query_jcb() calls jobid_exists(),
which looks up the job directory in the KVS,
then does it again for the JSC_JOBID query.

Replace query_jobid() with a json_pack() call,
since without the additional KVS lookup all the
qeury does is put the jobid in jcb object form.

@garlick garlick force-pushed the garlick:libjsc_convert branch 2 times, most recently from 60a30ed to 6d3806d May 17, 2018

garlick added some commits May 16, 2018

libjsc: [cleanup] simplify JSC_STATE_PAIR query
Simplify query_state_pair() by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
libjsc: add ngpus to jsc_query_rdesc_efficiently()
Change public API function to include (optionally)
the number of gpus requested by job.
libjsc: parallelize KVS lookups in JSC_RDESC query
Refactor query_rdesc() and jsc_query_rdesc_efficiently()
so that they make the five KVS lookups in parallel rather
than serially.

Change query_rdesc() so that it returns jcb object on
success, NULL on failure.

Eliminate calls to flux_log() that are perhaps not advisable
in library context.
libjsc: [cleanup] simplify JSC_RDL query
Simplify query_rdl () by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
libjsc: [cleanup] simplify JSC_R_LITE query
Simplify query_r_lite () by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
libjsc: drop JSC_PDESC query/update support
Problem: procdesc debugging support was removed
in wreck.

Drop JSC_PDESC query and update support.
libjsc: [cleanup] restructure jsc_query_jcb()
Now that all sub-functions return a JSON object,
simplify the logic in the main jsc_query_jcb()
function.

garlick added some commits May 17, 2018

libjsc: make active_jobs a hash of int pointers
Problem: ctx->active_jobs stores an integer in the zhash
item pointer to avoid a memory allocation, which makes
the code confusing.

Allocate an integer sized chunk of memory and insert that
into the hash.

Plus get rid of a temporary memory allocation for the hash
key name since it's just an integer and can predictably
fit in a small static buffer.  Create functions for
insert, lookup, and delete and use those rather than
directly manipulating the hash at points of usage.

Also use int rather than int64_t for state enum values.
libjsc: consult active_jobs hash in jobid_exist()
Problem: every jsc query first calls jobid_exist(),
which performs a synchronous KVS lookup on the job's
KVS directory.  This is unnecessary if the job
is already in the active_jobs hash.

If job is in the active_jobs hash, return success
immediately.

Also, if the KVS lookup is performed, call
flux_future_get() rather than flux_kvs_lookup_get_dir()
since the latter decodes the directory, and we are
only going to throw it away.
libjsc: [cleanup] update_state() use json_unpack()
Clean up the update_state() function to utilize
json_unpack(), avoid shortjansson.h macros and
drop extraneous flux_log() calls that probably
shouldn't be there.
libjsc: [cleanup] update_rdesc() accepts jcb arg
Modify update_rdesc() to accept the jcb parameter
directly so it can be extracted with the same
json_unpack() used to parse its innards.
libjsc: [cleanup] update_rdl() accepts jcb arg
Modify update_rdl() to accept the jcb parameter
directly.
libjsc: [cleanup] update_r_lite() accepts jcb arg
Modify update_r_lite() to accept the jcb parameter
directly.
libjsc: [cleanup] simplify jsc_update_jcb()
Simplify the jsc_update_jcb() function now
that all the functions it calls can accept
the jcb directly.
libjsc: [cleanup] streamline job_state_cb()
Consolidate event parsing into one flux_event_unpack().
Drop calls to flux_log() that probably shouldn't be there.
libjsc: drop shortjansson.h
Now that libjsc has been converted to use
the native jansson API, we can drop shortjansson.h.
libutil/shortjson.h: drop json-c macros
We have finally migrated flux-core completely to
jansson, so these convenience macros are no long needed.
libjson-c: drop json-c library
We have finally migrated flux-core completely to
jansson, so the captive libjson-c is no longer
required.

Fixes #1326
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 17, 2018

Rebased and also dropped shortjson.h and libjson-c now that we're free of it.

That ought to reduce my lines-of-code output for the year!

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 17, 2018

I don't know if we want to handle the python jsc binding as part of this PR: https://github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/jsc.py. But this binding is a bit out of date -- e.g., deprecated keys are still there etc.

I am okay if we want to handle python in a separate PR. But at some point we need to revisit this as it looks to me like the initial MLSI users are python users.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 18, 2018

Sounds like a separate PR to me, since the libjsc interfaces did not change in the PR (except for adding ngpus to the one function that was added for @trws.

IT would be good to find out what interfaces are required for this particular application and see if we can provide it in a way that can transition to the new exec system. New issue?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 18, 2018

Agreed.

IT would be good to find out what interfaces are required for this particular application and see if we can provide it in a way that can transition to the new exec system. New issue?

I should let this user explore the basic functionality first. Then we can move to advanced feature, I will create an issue or issues to track this. Compatibility with the new exec system is a decent goal!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented May 18, 2018

Nice work!

I restarted the coverage builder because for some reason it is reporting that the build failed. However, Travis did report green in all builds. Is this ready to merge?

@trws

This comment has been minimized.

Copy link
Member

trws commented May 18, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented May 18, 2018

Thanks!

Yes merge please.

@grondo grondo merged commit 302d9ff into flux-framework:master May 18, 2018

4 checks passed

codecov/patch 85.37% of diff hit (target 78.11%)
Details
codecov/project 78.8% (+0.68%) compared to c1a649a
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.7%) to 79.077%
Details

@garlick garlick deleted the garlick:libjsc_convert branch May 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.