
job-ingest: add module for accepting submitted jobs #1543

Closed
wants to merge 19 commits into flux-framework:master from garlick:flux_job

Conversation

@garlick (Member) commented Jun 1, 2018

This is a toy "front end" for the new exec system. Don't get your hopes up!

There are three parts here:

  • flux-manager module for job ingest
  • flux job run client for submitting jobs
  • libjob - a placeholder for a new API for job submission/control

The API at this point only includes

enum {
    FLUX_JOB_NOVERIFY_SIGNATURE = 1,    // don't verify signer == requestor
};

typedef uint64_t flux_jobid_t;

flux_future_t *flux_job_run (flux_t *h, const char *J, int flags);
int flux_job_run_get_id (flux_future_t *f, flux_jobid_t *id);

J is a signed payload, expected to be jobspec, although for now only the signature is verified; the payload is not parsed, so you could stuff anything in there.
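
For illustration, a minimal synchronous client of this API might look like the following. This is only a sketch: the error handling and the assumption that libjob's declarations come in via flux/core.h are mine, not part of this PR.

#include <stdio.h>
#include <stdint.h>
#include <flux/core.h>

/* Submit pre-signed payload J and wait for the jobid.
 * Assumes J was produced elsewhere (e.g. by flux-security signing).
 */
int submit_one (flux_t *h, const char *J)
{
    flux_future_t *f;
    flux_jobid_t id;

    if (!(f = flux_job_run (h, J, 0)))
        return -1;
    if (flux_job_run_get_id (f, &id) < 0) {   /* blocks until fulfilled */
        flux_future_destroy (f);
        return -1;
    }
    flux_future_destroy (f);
    printf ("%ju\n", (uintmax_t)id);
    return 0;
}

The flux job run command described below is built on the same calls, except it keeps --fanout futures in flight instead of waiting for each one in turn.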

The command line allows jobspec (files or stdin) to be signed and submitted, and prints jobid(s) on stdout.

Usage: flux-job run [OPTIONS] jobspec
Run job(s)
  -U, --no-verify-signature
                         job-module should defer signature verification
  -c, --security-config=pattern
                         Use non-default security config glob
  -f, --fanout=N         Run at most N RPCs in parallel
  -h, --help             Display this message.
  -r, --repeat=N         Run N instances of jobspec
  -w, --words            Print jobid as a six word phrase

The job-manager module accepts flux_job_run() requests, assigns jobids (FLUIDs), and creates a KVS hierarchy under job that uses the FLUID "dothex" encoding to spread jobids over the KVS keyspace. Thus far only the J-signed key specified in RFC 16 is populated.
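
For reference, the dothex mapping amounts to roughly the following (a sketch; the real encoder lives in libutil/fluid.c, and the helper name here is made up):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t flux_jobid_t;

/* Render a 64-bit FLUID as four dot-separated, zero-padded 16-bit
 * hex groups, e.g. 0x0000006732000005 -> "0000.0067.3200.0005".
 */
static void jobid_to_dothex (flux_jobid_t id, char *buf, size_t len)
{
    snprintf (buf, len, "%04x.%04x.%04x.%04x",
              (unsigned int)((id >> 48) & 0xffff),
              (unsigned int)((id >> 32) & 0xffff),
              (unsigned int)((id >> 16) & 0xffff),
              (unsigned int)(id & 0xffff));
}

So jobid 0x0000006732000005 lands at KVS key job.0000.0067.3200.0005, as in the listing below.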

I worked on performance a bit. In the job-manager module, KVS updates are batched for up to 10ms, so a burst of job requests may be committed together. In the client, the code tries to keep --fanout RPCs in flight simultaneously. For a test case of

flux job run  --repeat 100000 --no-verify-signature ./myjob

where "myjob" is example 2.1 from RFC 14, and default fanout of 256, the ingest rate is about 10K jobs per second. With signature verification (configured for munge), about 3K jobs per second. The KVS contains lots of entries like this:

job.0000.0067.3200.0005.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0006.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0007.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0008.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3300.0000.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3300.0001.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...

(the signed blob looks the same because I signed it once and submitted it --repeat times).
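
The 10ms batching mentioned above amounts to roughly this pattern. A sketch only: flux_timer_watcher_reset() and flux_watcher_start() are real flux-core calls, but the ingest_ctx struct and the batch_append()/batch_commit() helpers are illustrative, not the actual module internals.

/* Queue each incoming request; the first one arms a one-shot 10ms
 * timer, and when it fires the whole batch is written to the KVS in
 * a single commit, after which all queued requests are answered.
 */
static void timer_cb (flux_reactor_t *r, flux_watcher_t *w,
                      int revents, void *arg)
{
    struct ingest_ctx *ctx = arg;     /* hypothetical module context */
    batch_commit (ctx->batch);        /* one KVS commit for N jobs */
    ctx->timer_armed = false;
}

static void add_cb (flux_t *h, flux_msg_handler_t *mh,
                    const flux_msg_t *msg, void *arg)
{
    struct ingest_ctx *ctx = arg;
    batch_append (ctx->batch, msg);   /* respond after commit completes */
    if (!ctx->timer_armed) {
        flux_timer_watcher_reset (ctx->timer, 0.010, 0.);
        flux_watcher_start (ctx->timer);
        ctx->timer_armed = true;
    }
}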

TODO:

  • Discussion(!)
  • Add option to individually sign repeat requests so we can see performance impact
  • Figure out how to call the C++ jobspec parser from job-manager and then see how validation goes.

No attempt was made to write tests or documentation at this stage. Maybe we'll scrap the whole thing after we have a talk about it, so I didn't want to put too much into this!

@coveralls commented Jun 1, 2018

Coverage increased (+0.002%) to 79.459% when pulling a06a1f7 on garlick:flux_job into 3b6a456 on flux-framework:master.

@grondo (Contributor) commented Jun 1, 2018

One quick comment: it might be wise at this point to save the flux job run command for later, when we've designed the UI for interactive job running (flux job run) and job submission (flux job submit). That way we can continue to modify the test utility without worrying about perturbing a core user interface with test options, etc.

@grondo (Contributor) commented Jun 1, 2018

Really cool by the way! I really like the delayed kvs flush, and the implementation of that turned out so nice and simple.

In retrospect, it sure would be nice if there was an implementation of flux_sign_wrap that returned a future (though we have a circular dependency there sadly). I wonder if the fact that the module blocks when verifying has anything to do with the drastic drop in throughput for that case?

@garlick (Member, Author) commented Jun 2, 2018

I hadn't thought about it that way, but you're right: the munge/unmunge calls encapsulate a synchronous call to the MUNGE daemon, and so will stall our (single-threaded) reactor loop each time a job is processed. It would be interesting to see the performance of the curve signing method in comparison.

Thoughts on how to work around the MUNGE interface? Some kind of threaded adapter? A specialized MUNGE client that we could incorporate into our reactor loop?

Regarding the flux job run command, excellent point about needing to carefully design the user interface and not have it grow a bunch of options needed for testing or experimentation! I'll rename to flux job testrun or similar (or we could move it entirely out of the user facing command into t/job/testrun or similar).

@garlick (Member, Author) commented Jun 2, 2018

Oh just realized the curve method requires a synchronous call to the imp, so I see your point about needing a future or equivalent.

@grondo (Contributor) commented Jun 2, 2018

> I'll rename to flux job testrun or similar (or we could move it entirely out of the user facing command into t/job/testrun or similar).

I'd be fine with testrun or submitbench under the flux job command, could be useful in the future

> Thoughts on how to work around the MUNGE interface? Some kind of threaded adapter? A specialized MUNGE client that we could incorporate into our reactor loop?

Yeah, my only thought was to have a worker pool of flux_sign_unwrap threads, perhaps even behind a service which could offer unwrap over an RPC and farm out the work to its threads.

But I like your idea of a specialized MUNGE client much better!

However, even if MUNGE API had an interface that could be hooked into a reactor, the flux_sign_wrap/unwrap() interface would need to be replaced with something that could utilize it. Additionally, to avoid circular dependency between libflux-security and libflux-core, the security API can't directly return a flux_future_t, but would have to export lower level functions that could then be used to build a single function returning a flux_future_t within flux-core... (is that making sense?)

@garlick (Member, Author) commented Jun 2, 2018

Makes sense.

Although, heh, look in t/rpc/util.c and t/rpc/rpc.c. There is a "test server" implementation in there that starts a callback with a flux_t handle that is run on the other end of a shmem:// connector, similar to czmq's zactor. In that case it allowed the RPC client and server to be tested without a broker. The callback just registers message handlers and starts the reactor like a comms module.

We could avoid stalling the job-manager thread by implementing something similar that performs the blocking unwrap call in another thread. Maybe we just need an API in front of that general concept in flux-core? Then extend to allow a round-robin work crew of individual actors to gain speed?
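
To sketch the general pattern (everything here is illustrative; blocking_unwrap() stands in for the synchronous flux_sign_unwrap() call, and the reactor would watch the pipe's read end to collect completed requests):

#include <errno.h>
#include <pthread.h>
#include <unistd.h>

struct req {
    char *J;            /* signed input */
    int errnum;         /* result of unwrap */
    struct req *next;
};

struct workq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct req *head;
    int done_fd;        /* write end of a pipe watched by the reactor */
};

int blocking_unwrap (const char *J);   /* hypothetical stand-in */

static void *worker (void *arg)
{
    struct workq *q = arg;
    for (;;) {
        pthread_mutex_lock (&q->lock);
        while (!q->head)
            pthread_cond_wait (&q->cond, &q->lock);
        struct req *r = q->head;
        q->head = r->next;
        pthread_mutex_unlock (&q->lock);

        r->errnum = blocking_unwrap (r->J) < 0 ? errno : 0;
        if (write (q->done_fd, &r, sizeof (r)) < 0)   /* wake reactor */
            break;
    }
    return NULL;
}

A round-robin work crew would simply run several of these workers against the same queue.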

@grondo (Contributor) commented Jun 2, 2018

> We could avoid stalling the job-manager thread by implementing something similar that performs the blocking unwrap call in another thread. Maybe we just need an API in front of that general concept in flux-core? Then extend to allow a round-robin work crew of individual actors to gain speed?

Yes, that would seem to work as well, though of course it would be nicer if code we control worked non-blocking... But the general concept of shuttling a blocking function to a thread seems like a nice addition to flux-core.

@garlick (Member, Author) commented Jun 2, 2018

> Yes, that would seem to work as well, though of course it would be nicer if code we control worked non-blocking... But the general concept of shuttling a blocking function to a thread seems like a nice addition to flux-core.

Agreed, and I was just pleased to remember there was something already semi-worked out, not necessarily arguing that it was the better choice. Although I find it challenging to work out the right interface to make e.g. flux_sign_unwrap() non blocking without unduly complicating flux-security...

@grondo (Contributor) commented Jun 2, 2018

> Although I find it challenging to work out the right interface to make e.g. flux_sign_unwrap() non blocking without unduly complicating flux-security...

Yeah, that is a good point, and of course there are going to be other blocking functions libflux-core API users would like to use...

@codecov-io commented Jun 2, 2018

Codecov Report

Merging #1543 into master will decrease coverage by 0.01%.
The diff coverage is 76.45%.

@@            Coverage Diff             @@
##           master    #1543      +/-   ##
==========================================
- Coverage   79.16%   79.15%   -0.02%     
==========================================
  Files         169      175       +6     
  Lines       31035    31657     +622     
==========================================
+ Hits        24569    25057     +488     
- Misses       6466     6600     +134
Impacted Files Coverage Δ
src/common/libutil/fluid.c 100% <100%> (ø) ⬆️
src/common/libjob/job.c 100% <100%> (ø)
src/modules/job-ingest/job-ingest.c 64.67% <64.67%> (ø)
src/cmd/flux-job.c 88.96% <88.96%> (ø)
src/broker/module.c 83.79% <0%> (-1.68%) ⬇️
src/common/libflux/request.c 88.46% <0%> (-1.29%) ⬇️
src/broker/content-cache.c 72.88% <0%> (-1.28%) ⬇️
src/common/libflux/mrpc.c 86.11% <0%> (-1.2%) ⬇️
src/broker/modservice.c 80.58% <0%> (-0.98%) ⬇️
src/common/libflux/rpc.c 92.3% <0%> (-0.77%) ⬇️
... and 16 more

@garlick (Member, Author) commented Jun 7, 2018

I just pushed a (squashed) update that makes the following changes:

  • Add jobs to the KVS under jobs.active.<jobid> as planned in RFC 16 (where <jobid> is the hex FLUID as above). I had skipped the "active" part before.
  • Add jobs.active.<jobid>.jobspec containing unwrapped jobspec YAML
  • Add jobs.active.<jobid>.userid containing authenticated userid of submitter
  • In job-manager, call flux_sign_unwrap() with the FLUX_SIGN_NOVERIFY flag. We can verify that the security header userid matches the authenticated userid without cryptographically verifying the signature inside the instance. This avoids the performance bottleneck described above on the job ingest side anyway.
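
On the ingest side, the check described in the last bullet looks roughly like this. A sketch only: the exact flux_sign_unwrap() signature and the header name are assumptions on my part; FLUX_SIGN_NOVERIFY is the flag named above.

#include <errno.h>
#include <stdint.h>
#include <flux/security/sign.h>    /* assumed header name */

/* Accept J only if the userid claimed in its security header matches
 * the connector-authenticated userid; skip cryptographic verification
 * of the signature itself.
 */
static int check_submitter (flux_security_t *sec, const char *J,
                            int64_t authenticated_userid)
{
    const void *spec;
    int specsz;
    int64_t signer;

    if (flux_sign_unwrap (sec, J, &spec, &specsz, &signer,
                          FLUX_SIGN_NOVERIFY) < 0)
        return -1;
    if (signer != authenticated_userid) {
        errno = EPERM;
        return -1;
    }
    return 0;
}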

@garlick (Member, Author) commented Jun 8, 2018

Just pushed another (squashed) set of changes:

  • Rename job-manager to job-ingest per discussion with @grondo. We'll keep ingest separate from the management functionality, so that we can load job-ingest across an instance and possibly load job-manager (which may have a larger memory footprint) on a subset of nodes.
  • Issue an event containing a batch of new jobids when batch KVS commits complete
  • Reworked the code for committing batches of KVS data; it was a bit too simplistic before and may have responded to users before the KVS commit was complete in some overlapping cases.
  • Began to add a sharness test, but this is not yet working.

I'll keep going on this next week.

@garlick force-pushed the flux_job branch 4 times, most recently from 4f889be to a33835f on June 13, 2018 at 00:48
@garlick (Member, Author) commented Jun 13, 2018

Changes since the last comment:

  • Updated to track flux-security API changes
  • Copious inline comments in job-ingest module
  • Submitted jobspec is validated, with text error messages returned from the parser for invalid jobspec
  • Added flux job id command to convert between jobid representations
  • sharness test works (hopefully in Travis, after various issues)

@garlick changed the title from "experimental: flux job command, job-manager module" to "job-ingest: add module for accepting submitted jobs" on Jun 13, 2018
@garlick (Member, Author) commented Jun 13, 2018

Just pushed a squashed and cleaned up version of this PR, with more tests and better commit documentation. Hopefully the coverage will be a little better than it was.

@garlick (Member, Author) commented Jun 14, 2018

I think this PR now addresses the ingest portion of #1332 and is probably a candidate for merging.

One caveat: this module is not currently reloadable. For that to work, we will need some way for FLUID generator state to be saved/restored (#1545), and I propose we do that in another PR. That's the only impediment to making this reloadable that I can think of.

@grondo (Contributor) commented Jun 14, 2018

If we do consider merging this now, we should probably wrap up any minor fixes and improvements to flux-core first, merge those and then tag a release.

Once this PR is merged, flux-core will have a hard dependency on flux-security, and we'd have to package and release that project for use on our systems where flux-core and flux-sched are currently installed.

Alternatively, we could create a "stable" branch off v0.9.0, and fixes destined for "alpha-production" could go on that branch (hopefully not much) as v0.9.1, v0.9.2, etc., while master would become the unstable/experimental branch.

@garlick (Member, Author) commented Jun 14, 2018

Oh right, I forgot we don't have a package for flux-security yet. My vote would be to create a stable branch from 0.9. However, see "creating stable releases" in RFC 1.

It mandates one branch per GitHub project, that the main project's master is the unstable development branch, and that GitHub repo clones are the mechanism for creating stable branches. So if we follow this, we'd clone flux-core to flux-core-0.9 and its master would be the first stable production "branch".

I guess the advantage is that a stable repo can have different maintainers, and it's clearer to visitors of your organization site where the stable bits are?

(Ew, that will break our PKG_NAME/PKG_VERSION, which are derived from git describe.)

@garlick force-pushed the flux_job branch 3 times, most recently from 2ea229f to 2f3fb44 on June 15, 2018 at 04:07
@garlick (Member, Author) commented Jun 15, 2018

I went ahead and added the (opt-in) configure --with-flux-security option.

If this option is not provided, then only the instance owner is allowed to submit jobs, and the J-signed KVS key is left unpopulated (but otherwise everything works as before).

I tidied up a few other things, added some more inline documentation, and squashed.

Problem: DOTHEX-encoded FLUIDs can be used to map FLUIDs
onto the KVS namespace, but without zero padding,
flux-kvs ls doesn't display or sort them nicely.

Pad each dotted hex number out to four digits.

Problem: fluid_decode (type=MNEMONIC) does not fail if
presented with words not in its dictionary or a phrase
too short to represent a uint64_t.

mn_decode()'s inline documentation is incorrect:

/* Return value:
 *  This function may return all the value returned by mn_decode_word_index
 *  plus the following result code:
 *
 * MN_EWORD  - Unrecognized word.
 */

It actually returns the number of bytes successfully
written to the output.

Require mn_decode() to return 8 in order for fluid_decode() to
be successful.

Problem: t2000-wreck-env.t fails in cloud9 environment.

"C9_HOSTNAME" is filtered out of expected list of environment
variables because "grep -v HOSTNAME" matches it.  Change
the test to match variables using e.g. "^HOSTNAME=".

Problem: jobspec sharness test was not run.

Add jobspec inputs to EXTRA_DIST and run jobspec
test if ENABLE_JOBSPEC.

Problem: while there is an ENABLE_JOBSPEC Makefile
conditional, there is nothing that can be tested in
config.h to determine if jobspec is being built.

Add HAVE_JOBSPEC to config.h.

If the user requests it by specifying --with-flux-security,
have configure locate flux-security using pkg-config.

Defines HAVE_FLUX_SECURITY in config.h
Defines HAVE_FLUX_SECURITY for Makefile.am's.

Add flux-security to travis-dep-builder.sh's
checkouts, checking out a specific recent commit.

Since flux-security depends on libsodium (a tarball
download), and other downloads depend on checkouts,
move the checkout-building code from the main program
to build_checkouts(), then call build_checkouts()
once before downloads to build the original checkouts,
and once after to build flux-security.

Add --with-flux-security to some builders.
Leave it off of one gcc and one clang builder though
to be sure we don't break compilation of the very tiny
bit of #elsif code.

Add a library that provides an API for job creation,
monitoring, and control.  For now it contains only
an interface for job submission:

/* Submit signed jobspec J.  flags must be zero.
 * Future is fulfilled once job has a jobid and
 * exists in KVS.
 */
flux_future_t *flux_job_add (flux_t *h, const char *J, int flags);

/* Obtain jobid from future.
 * On error, use flux_rpc_get_error() to get detailed
 * failure message.
 */
int flux_job_add_get_id (flux_future_t *f, flux_jobid_t *id);

Add a module that handles job-ingest.add RPCs to
add new jobs to the KVS using a rudimentary form of
the RFC 16 job schema format.  The user is returned
a jobid based on the FLUID proposal, which allows
64-bit id generation to occur in parallel across ranks,
while retaining a loose ordering of ids based on the
time submitted.  KVS per-job directories are generated
using the DOTHEX FLUID encoding, e.g.

job.active.0000.011c.ae00.0002

The instance owner and any user with ROLE_USER may
submit jobs.  Jobs must be signed, and the user
authenticated as the submitter (by the connector) must
match the signature, but the job signature is not
authenticated at ingest time.  The connector-authenticated
userid is recorded in the KVS under the "userid" key.
The signed blob is recorded under the "J-signed" key.

The job submission consists of signed RFC 14 jobspec,
which is validated by temporarily instantiating a C++
Jobspec object and recording any parse errors.
The parsed Jobspec may be further validated in the
future, for example to find resource requests that cannot
be fulfilled, but for now we ingest all valid jobspec.
(Validation is performed in a standalone .cpp file linked
against libjobspec.la.  The standalone C++, which exports
a validate function callable from C, is compiled with the
C++ compiler; automake then knows to link job-ingest
against libstdc++).  The unwrapped jobspec is written
to the KVS under the "jobspec" key.

The module is completely event driven.  KVS overhead is
reduced and ingest rate increased by batching job-ingest.add
requests that arrive together within a 10ms window.

A "job-ingest.new" event is generated after the KVS commit
which contains an array of new jobids.  This can be consumed
by the job-manager module in the future, which will handle
listing jobs for users, and informing the scheduler when
new active jobs have been submitted.

Add a front end command that will eventually contain
the primary user interfaces for submitting and
managing jobs, such as list, run, or cancel.
For now, it contains two subcommands for testing
the job-ingest module:

flux job submitbench [OPTIONS] jobspec
    Submit a job some number of times, keeping a
    configurable number of RPCs in flight.

flux job id [--to=TYPE] [--from=TYPE] id ...
    Where TYPE is dec, words, or kvs-active.
    Converts jobids from one representation to another.

Add a test_under_flux "profile" for testing the
new execution system, starting with job-ingest module.
@garlick (Member, Author) commented Jun 25, 2018

Rebase on current master.

@garlick (Member, Author) commented Jul 7, 2018

I'll withdraw this PR for now until we figure out where the new pieces of the exec system should land.
