libjsc: convert to jansson and cleanup #1524
@garlick: I just gave a cursory look and found lots of neat changes!
In terms of optimization, b23d23c seems to show the general direction of using non-blocking asynchronous communication (KVS in this case). This particular change exposes and exploits KVS parallelism within one function. This makes me wonder whether you also plan to expose parallelism across different handlers as part of the JSC refactoring. I haven't thought much about this, and I'm not sure JSC has enough logic to benefit from it, but I was wondering if you are considering it.
Just rebased on current master and fixed a couple of uninitialized variables in error paths that travis caught. @dongahn - in this PR I mainly wanted to convert to jansson, and simplify the code a bit by using some of the higher level jansson API calls that we didn't have in json-c. The parallelism in the JSC_RDESC query, and a similar opportunity in JSC_PDESC, were about as far as I thought was appropriate in this PR since it allows the API to remain the same. In other words when you call In a future PR (or in the update for new exec system) we should definitely look at having the query/update functions return futures IMHO.
Restarted a build that stalled after the issues tests.
Codecov Report
@@ Coverage Diff @@
## master #1524 +/- ##
=========================================
+ Coverage 78.11% 78.8% +0.68%
=========================================
Files 165 164 -1
Lines 30734 30308 -426
=========================================
- Hits 24009 23885 -124
+ Misses 6725 6423 -302
This is great cleanup, LGTM. What else remains to be done before we can merge it?
That sounds good. I was just wondering about your future plans. Thanks.
Well there is one more big one, the JSC_PDESC query, but actually this could be merged as is to save me from getting distracted for another day :-) It does need a test with flux-sched first though.
Sched 0.5.0 does pass
Nice job!
I notice a couple of static functions don't have sanity coverage in the flux-core tests. Probably can't do anything about the rdl query function, but wreckrun does set an ersatz
diff --git a/t/t2001-jsc.t b/t/t2001-jsc.t
index ea2109c..e88ab87 100755
--- a/t/t2001-jsc.t
+++ b/t/t2001-jsc.t
@@ -199,6 +199,10 @@ test_expect_success 'jstat 7.5: basic query works: pdesc' '
flux jstat query 1 pdesc
'
+test_expect_success 'jstat 7.6: basic query works: R_lite' '
+ flux jstat query 1 R_lite
+'
+
test_expect_success 'jstat 8: query detects bad inputs' '
test_expect_code 42 flux jstat query 0 jobid &&
test_expect_code 42 flux jstat query 99999 state-pair &&
Force-pushed from 87364da to b12b1d1.
To prepare for conversion of libjsc from json-c to jansson, pull in shortjansson.h wrappers from flux-sched project and adapt to local environment. This allows libjsc to be converted in several steps; otherwise, it's got to be done in one giant patch to avoid breaking git bisect. Drop Jtostr() - we will convert those directly to json_dumps () since a free() is required now.
Make a first pass converting jstatctl.c from json-c to jansson.
Replace several shortjansson.h calls with one json_pack() in get_update_jcb() to improve clarity.
Replace several shortjansson.h calls with one json_pack() in get_submit_jcb() to improve clarity. Use a fixed size buffer for numeric hash key instead of allocating dynamically.
Replace several shortjansson.h calls with one json_pack() in get_reserve_jcb() to improve clarity. Use a fixed size buffer for numeric hash key instead of allocating dynamically.
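The fixed-size key buffer mentioned in the two commits above can be sketched as follows. This is a minimal illustration rather than the actual jstatctl.c code: the name `jobid_to_key` and the constant `JOBID_KEY_SIZE` are hypothetical. It relies only on the fact that a decimal 64-bit integer has a small, known maximum length.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical size: a decimal int64_t needs at most 20 characters
 * (sign plus 19 digits) and a terminating NUL, so 32 is ample. */
#define JOBID_KEY_SIZE 32

/* Format jobid as a decimal string into a caller-supplied buffer.
 * Returns 0 on success, -1 if the buffer is too small.
 * (jobid_to_key is an illustrative name, not a libjsc function.) */
static int jobid_to_key (int64_t jobid, char *buf, size_t size)
{
    int n;

    assert (buf != NULL);
    n = snprintf (buf, size, "%" PRId64, jobid);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

A caller would declare `char key[JOBID_KEY_SIZE];` on the stack and pass it in, so there is nothing to allocate or free on any code path.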
Problem: jsc_query_jcb() calls jobid_exists(), which looks up the job directory in the KVS, then does it again for the JSC_JOBID query. Replace query_jobid() with a json_pack() call, since without the additional KVS lookup all the query does is put the jobid in jcb object form.
Force-pushed from 60a30ed to 6d3806d.
Simplify query_state_pair() by having it directly perform the KVS lookup and create the JCB object with json_pack(). It now returns jcb on success, NULL on failure.
Change public API function to include (optionally) the number of gpus requested by job.
Refactor query_rdesc() and jsc_query_rdesc_efficiently() so that they make the five KVS lookups in parallel rather than serially. Change query_rdesc() so that it returns jcb object on success, NULL on failure. Eliminate calls to flux_log() that are perhaps not advisable in library context.
Simplify query_rdl () by having it directly perform the KVS lookup and create the JCB object with json_pack(). It now returns jcb on success, NULL on failure.
Simplify query_r_lite () by having it directly perform the KVS lookup and create the JCB object with json_pack(). It now returns jcb on success, NULL on failure.
Problem: procdesc debugging support was removed in wreck. Drop JSC_PDESC query and update support.
Now that all sub-functions return a JSON object, simplify the logic in the main jsc_query_jcb() function.
Problem: ctx->active_jobs stores an integer in the zhash item pointer to avoid a memory allocation, which makes the code confusing. Allocate an integer sized chunk of memory and insert that into the hash. Plus get rid of a temporary memory allocation for the hash key name since it's just an integer and can predictably fit in a small static buffer. Create functions for insert, lookup, and delete and use those rather than directly manipulating the hash at points of usage. Also use int rather than int64_t for state enum values.
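The shape of that refactoring (an allocated integer per entry, a fixed-size key buffer, and small insert/lookup/delete helpers) might look roughly like the sketch below. To stay self-contained it swaps the zhash container for a tiny linear table, so `struct job_table`, `MAX_JOBS`, `KEY_SIZE`, and the `active_*` function names are all assumptions for illustration, not the libjsc code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_JOBS 64     /* hypothetical capacity for this sketch */
#define KEY_SIZE 32     /* enough for a decimal 64-bit jobid */

/* Stand-in for the hash: a tiny linear table (NOT zhash). */
struct job_entry {
    char key[KEY_SIZE];
    int *state;         /* heap-allocated; NULL marks a free slot */
};
struct job_table {
    struct job_entry slots[MAX_JOBS];
};

static struct job_entry *find_slot (struct job_table *t, const char *key)
{
    assert (t != NULL);
    for (int i = 0; i < MAX_JOBS; i++) {
        if (t->slots[i].state && strcmp (t->slots[i].key, key) == 0)
            return &t->slots[i];
    }
    return NULL;
}

/* Insert or update the state for jobid. Returns 0 on success, -1 on error. */
static int active_insert (struct job_table *t, long long jobid, int state)
{
    char key[KEY_SIZE];
    struct job_entry *e;

    snprintf (key, sizeof (key), "%lld", jobid);
    if ((e = find_slot (t, key))) {
        *e->state = state;
        return 0;
    }
    for (int i = 0; i < MAX_JOBS; i++) {
        e = &t->slots[i];
        if (!e->state) {
            if (!(e->state = malloc (sizeof (int))))
                return -1;
            *e->state = state;
            strcpy (e->key, key);
            return 0;
        }
    }
    return -1;          /* table full */
}

/* Look up jobid. On success store its state in *state and return 0. */
static int active_lookup (struct job_table *t, long long jobid, int *state)
{
    char key[KEY_SIZE];
    struct job_entry *e;

    snprintf (key, sizeof (key), "%lld", jobid);
    if (!(e = find_slot (t, key)))
        return -1;
    *state = *e->state;
    return 0;
}

/* Remove jobid if present, releasing its allocated state. */
static void active_delete (struct job_table *t, long long jobid)
{
    char key[KEY_SIZE];
    struct job_entry *e;

    snprintf (key, sizeof (key), "%lld", jobid);
    if ((e = find_slot (t, key))) {
        free (e->state);
        e->state = NULL;
    }
}
```

One general consequence of storing a pointer to an allocated integer is that a state value of 0 is distinguishable from "not found", which is not the case when the integer itself is stuffed into the item pointer. The table must be zeroed (for example with `memset`) before first use.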
Problem: every jsc query first calls jobid_exist(), which performs a synchronous KVS lookup on the job's KVS directory. This is unnecessary if the job is already in the active_jobs hash. If job is in the active_jobs hash, return success immediately. Also, if the KVS lookup is performed, call flux_future_get() rather than flux_kvs_lookup_get_dir() since the latter decodes the directory, and we are only going to throw it away.
Clean up the update_state() function to utilize json_unpack(), avoid shortjansson.h macros and drop extraneous flux_log() calls that probably shouldn't be there.
Modify update_rdesc() to accept the jcb parameter directly so it can be extracted with the same json_unpack() used to parse its innards.
Modify update_rdl() to accept the jcb parameter directly.
Modify update_r_lite() to accept the jcb parameter directly.
Simplify the jsc_update_jcb() function now that all the functions it calls can accept the jcb directly.
Consolidate event parsing into one flux_event_unpack(). Drop calls to flux_log() that probably shouldn't be there.
Now that libjsc has been converted to use the native jansson API, we can drop shortjansson.h.
We have finally migrated flux-core completely to jansson, so these convenience macros are no longer needed.
We have finally migrated flux-core completely to jansson, so the captive libjson-c is no longer required. Fixes flux-framework#1326
Rebased and also dropped shortjson.h and libjson-c now that we're free of it. That ought to reduce my lines-of-code output for the year!
I don't know if we want to handle the python jsc binding as part of this PR: https://github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/jsc.py. But this binding is a bit out of date -- e.g., deprecated keys are still there etc. I am okay if we want to handle python in a separate PR. But at some point we need to revisit this as it looks to me like the initial MLSI users are python users.
Sounds like a separate PR to me, since the libjsc interfaces did not change in the PR (except for adding ngpus to the one function that was added for @trws). It would be good to find out what interfaces are required for this particular application and see if we can provide them in a way that can transition to the new exec system. New issue?
Agreed.
I should let this user explore the basic functionality first. Then we can move to advanced features. I will create an issue or issues to track this. Compatibility with the new exec system is a decent goal!
Nice work! I restarted the coverage builder because for some reason it is reporting that the build failed. However, Travis did report green in all builds. Is this ready to merge?
This looks really good to me. The python stuff we should be able to
clean up in another ticket. It may be possible to do substantially
better by the bindings now that it supports importing some defines.
Alternately, we could actually make it a global constant struct that
contains all those values then the bindings would just pick it up,
either way.
…On 17 May 2018, at 22:10, Mark Grondona wrote:
Nice work!
I restarted the coverage builder because for some reason it is
reporting that the build failed. However, Travis did report green in
all builds. Is this ready to merge?
Thanks! Yes merge please.
This PR converts libjsc from json-c to jansson by pulling in shortjansson.h from flux-sched. It then begins to simplify the code a bit using json_pack(), json_unpack(), etc. Along the way some back to back KVS RPCs are able to be parallelized.
I'm still working on this - just posting a checkpoint in case @dongahn wants to have an early peek and see if I'm going off track in any way.