Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libjsc: convert to jansson and cleanup #1524

Merged
merged 27 commits into from May 18, 2018

Conversation

garlick
Copy link
Member

@garlick garlick commented May 17, 2018

This PR converts libjsc from json-c to jansson by pulling in shortjansson.h from flux-sched.

It then begins to simplify the code a bit using json_pack(), json_unpack(), etc..

Along the way some back to back KVS RPCs are able to be parallelized.

I'm still working on this - just posting a checkpoint in case @dongahn wants to have an early peek and see if I'm going off track in any way.

@dongahn
Copy link
Member

dongahn commented May 17, 2018

@garlick: I just gave a cursory look and found lots of neat changes!

json_pack () does improve readability a lot.

In terms of optimization, b23d23c seems to show the general direction of using non-blocking asynchronous communications (kvs in this case). This particular change exposes and exploits KVS parallelism within one function.

This makes me wonder if you also plan to expose parallelism across different handlers as part of JSC refactoring? I haven't thought much about this and not sure if JSC has enough logic to benefit from this way, but I was wondering if you are thinking about this.

@garlick
Copy link
Member Author

garlick commented May 17, 2018

Just rebased on current master and fixed a couple of uninitialized variables in error paths that travis caught.

@dongahn - in this PR I mainly wanted to convert to jansson, and simplify the code a bit by using some of the higher level jansson API calls that we didn't have in json-c. The parallelism in the JSC_RDESC query, and the a similar opportunity in the JSC_PDESC were about as far as I thought was appropriate in this PR since it allows the API to remain the same.

In other words when you call jsc_query_jcb() it still blocks internally waiting for KVS RPC(s).

In a future PR (or in the update for new exec system) we should definitely look at having the query/update functions return futures IMHO.

@coveralls
Copy link

coveralls commented May 17, 2018

Coverage Status

Coverage increased (+0.7%) to 79.077% when pulling fdc8bb4 on garlick:libjsc_convert into c1a649a on flux-framework:master.

@grondo
Copy link
Contributor

grondo commented May 17, 2018

Restarted a build that stalled after the issues tests.

@codecov-io
Copy link

codecov-io commented May 17, 2018

Codecov Report

Merging #1524 into master will increase coverage by 0.68%.
The diff coverage is 85.37%.

@@            Coverage Diff            @@
##           master   #1524      +/-   ##
=========================================
+ Coverage   78.11%   78.8%   +0.68%     
=========================================
  Files         165     164       -1     
  Lines       30734   30308     -426     
=========================================
- Hits        24009   23885     -124     
+ Misses       6725    6423     -302
Impacted Files Coverage Δ
src/common/libjsc/jstatctl.c 88.36% <85.37%> (+32.53%) ⬆️
src/broker/module.c 83.79% <0%> (-1.68%) ⬇️
src/broker/content-cache.c 73.43% <0%> (-1.3%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/modules/kvs/kvs.c 66.03% <0%> (+0.16%) ⬆️
src/bindings/lua/flux-lua.c 82.28% <0%> (+0.17%) ⬆️
src/common/libflux/message.c 81.41% <0%> (+0.23%) ⬆️
src/common/libflux/future.c 89.86% <0%> (+0.44%) ⬆️
src/broker/modservice.c 80.58% <0%> (+0.97%) ⬆️
src/common/libflux/content.c 90% <0%> (+3.33%) ⬆️

@grondo
Copy link
Contributor

grondo commented May 17, 2018

This is great cleanup, LGTM. What else remains to be done before we can merge it?

@dongahn
Copy link
Member

dongahn commented May 17, 2018

That sounds good. I was just wondering your future plan. Thanks.

@garlick
Copy link
Member Author

garlick commented May 17, 2018

Well there is one more big one, the JSC_PDESC query, but actually this could be merged as is to save me from getting distracted for another day :-)

It does need a test with flux-sched first though.

@garlick
Copy link
Member Author

garlick commented May 17, 2018

Sched 0.5.0 does pass make check when built against this PR's branch, once the unrelated patch in flux-framework/flux-sched#345 was applied to address a 32-bit build system issue.

@dongahn
Copy link
Member

dongahn commented May 17, 2018

Sched 0.5.0 does pass make check when built against this PR's branch

Nice job!

@grondo
Copy link
Contributor

grondo commented May 17, 2018

I notice a couple static functions don't have sanity coverage in the flux-core tests. Probably can't do anything about the rdl query function, but wreckrun does set a ersatz R_lite when running without the scheduler, so this addition to t2001-jsc.t might be warranted:

diff --git a/t/t2001-jsc.t b/t/t2001-jsc.t
index ea2109c..e88ab87 100755
--- a/t/t2001-jsc.t
+++ b/t/t2001-jsc.t
@@ -199,6 +199,10 @@ test_expect_success 'jstat 7.5: basic query works: pdesc' '
     flux jstat query 1 pdesc
 '
 
+test_expect_success 'jstat 7.6: basic query works: R_lite' '
+    flux jstat query 1 R_lite
+'
+
 test_expect_success 'jstat 8: query detects bad inputs' '
     test_expect_code 42 flux jstat query 0 jobid &&
     test_expect_code 42 flux jstat query 99999 state-pair &&

@garlick garlick force-pushed the libjsc_convert branch 2 times, most recently from 87364da to b12b1d1 Compare May 17, 2018 21:11
To prepare for conversion of libjsc from json-c
to jansson, pull in shortjansson.h wrappers
from flux-sched project and adapt to local
environment.

This allows libjsc to be converted in several
steps;  otherwise, it's got to be done in one
giant patch to avoid breaking git bisect.

Drop Jtostr() - we will convert those directly
to json_dumps () since a free() is required now.
Make a first pass converting jstatctl.c from
json-c to jansson.
Replace several shortjansson.h calls with one json_pack()
in get_update_jcb() to improve clarity.
Replace several shortjansson.h calls with one json_pack()
in get_submit_jcb() to improve clarity.

Use a fixed size buffer for numeric hash key instead of
allocating dynamically.
Replace several shortjansson.h calls with one json_pack()
in get_reserve_jcb() to improve clarity.

Use a fixed size buffer for numeric hash key instead of
allocating dynamically.
Problem: jsc_query_jcb() calls jobid_exists(),
which looks up the job directory in the KVS,
then does it again for the JSC_JOBID query.

Replace query_jobid() with a json_pack() call,
since without the additional KVS lookup all the
qeury does is put the jobid in jcb object form.
@garlick garlick force-pushed the libjsc_convert branch 2 times, most recently from 60a30ed to 6d3806d Compare May 17, 2018 23:37
Simplify query_state_pair() by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
Change public API function to include (optionally)
the number of gpus requested by job.
Refactor query_rdesc() and jsc_query_rdesc_efficiently()
so that they make the five KVS lookups in parallel rather
than serially.

Change query_rdesc() so that it returns jcb object on
success, NULL on failure.

Eliminate calls to flux_log() that are perhaps not advisable
in library context.
Simplify query_rdl () by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
Simplify query_r_lite () by having it directly
perform the KVS lookup and create the JCB object
with json_pack().

It now returns jcb on success, NULL on failure.
Problem: procdesc debugging support was removed
in wreck.

Drop JSC_PDESC query and update support.
Now that all sub-functions return a JSON object,
simplify the logic in the main jsc_query_jcb()
function.
garlick added 14 commits May 17, 2018 16:39
Problem: ctx->active_jobs stores an integer in the zhash
item pointer to avoid a memory allocation, which makes
the code confusing.

Allocate an integer sized chunk of memory and insert that
into the hash.

Plus get rid of a temporary memory allocation for the hash
key name since it's just an integer and can predictably
fit in a small static buffer.  Create functions for
insert, lookup, and delete and use those rather than
directly manipulating the hash at points of usage.

Also use int rather than int64_t for state enum values.
Problem: every jsc query first calls jobid_exist(),
which performs a synchronous KVS lookup on the job's
KVS directory.  This is unnecessary if the job
is already in the active_jobs hash.

If job is in the active_jobs hash, return success
immediately.

Also, if the KVS lookup is performed, call
flux_future_get() rather than flux_kvs_lookup_get_dir()
since the latter decodes the directory, and we are
only going to throw it away.
Clean up the update_state() function to utilize
json_unpack(), avoid shortjansson.h macros and
drop extraneous flux_log() calls that probably
shouldn't be there.
Modify update_rdesc() to accept the jcb parameter
directly so it can be extracted with the same
json_unpack() used to parse its innards.
Modify update_rdl() to accept the jcb parameter
directly.
Modify update_r_lite() to accept the jcb parameter
directly.
Simplify the jsc_update_jcb() function now
that all the functions it calls can accept
the jcb directly.
Consolidate event parsing into one flux_event_unpack().
Drop calls to flux_log() that probably shouldn't be there.
Now that libjsc has been converted to use
the native jansson API, we can drop shortjansson.h.
We have finally migrated flux-core completely to
jansson, so these convenience macros are no long needed.
We have finally migrated flux-core completely to
jansson, so the captive libjson-c is no longer
required.

Fixes flux-framework#1326
@garlick
Copy link
Member Author

garlick commented May 17, 2018

Rebased and also dropped shortjson.h and libjson-c now that we're free of it.

That ought to reduce my lines-of-code output for the year!

@dongahn
Copy link
Member

dongahn commented May 17, 2018

I don't know if we want to handle the python jsc binding as part of this PR: https://github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/jsc.py. But this binding is a bit out of date -- e.g., deprecated keys are still there etc.

I am okay if we want to handle python in a separate PR. But at some point we need to revisit this as it looks to me like the initial MLSI users are python users.

@garlick
Copy link
Member Author

garlick commented May 18, 2018

Sounds like a separate PR to me, since the libjsc interfaces did not change in the PR (except for adding ngpus to the one function that was added for @trws.

IT would be good to find out what interfaces are required for this particular application and see if we can provide it in a way that can transition to the new exec system. New issue?

@dongahn
Copy link
Member

dongahn commented May 18, 2018

Agreed.

IT would be good to find out what interfaces are required for this particular application and see if we can provide it in a way that can transition to the new exec system. New issue?

I should let this user explore the basic functionality first. Then we can move to advanced feature, I will create an issue or issues to track this. Compatibility with the new exec system is a decent goal!

@grondo
Copy link
Contributor

grondo commented May 18, 2018

Nice work!

I restarted the coverage builder because for some reason it is reporting that the build failed. However, Travis did report green in all builds. Is this ready to merge?

@trws
Copy link
Member

trws commented May 18, 2018 via email

@garlick
Copy link
Member Author

garlick commented May 18, 2018

Thanks!

Yes merge please.

@grondo grondo merged commit 302d9ff into flux-framework:master May 18, 2018
@garlick garlick deleted the libjsc_convert branch May 18, 2018 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants