
job-ingest: add module for accepting submitted jobs #1543

Closed
wants to merge 19 commits into flux-framework:master from garlick:flux_job

Conversation

@garlick (Member) commented Jun 1, 2018

This is a toy "front end" for the new exec system. Don't get your hopes up!

There are three parts here:

  • flux-manager module for job ingest
  • flux job run client for submitting jobs
  • libjob - a placeholder for a new API for job submission/control

The API at this point only includes

enum {
    FLUX_JOB_NOVERIFY_SIGNATURE = 1,    // don't verify signer == requestor
};

typedef uint64_t flux_jobid_t;

flux_future_t *flux_job_run (flux_t *h, const char *J, int flags);
int flux_job_run_get_id (flux_future_t *f, flux_jobid_t *id);

J is a signed payload, expected to be jobspec, although for now only the signature is verified; the payload is not parsed, so you could stuff anything in there.
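
For illustration, a minimal synchronous client of this API might look like the following. This is only a sketch: the error handling and the assumption that libjob's declarations come in via flux/core.h are mine, not part of this PR.

#include <stdio.h>
#include <stdint.h>
#include <flux/core.h>

/* Submit pre-signed payload J and wait for the jobid.
 * Assumes J was produced elsewhere (e.g. by flux-security signing).
 */
int submit_one (flux_t *h, const char *J)
{
    flux_future_t *f;
    flux_jobid_t id;

    if (!(f = flux_job_run (h, J, 0)))
        return -1;
    if (flux_job_run_get_id (f, &id) < 0) {   /* blocks until fulfilled */
        flux_future_destroy (f);
        return -1;
    }
    flux_future_destroy (f);
    printf ("%ju\n", (uintmax_t)id);
    return 0;
}

The flux job run command described below is built on the same calls, except it keeps --fanout futures in flight instead of waiting for each one in turn.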

The command line allows jobspec (files or stdin) to be signed and submitted, and prints jobid(s) on stdout.

Usage: flux-job run [OPTIONS] jobspec
Run job(s)
  -U, --no-verify-signature
                         job-module should defer signature verification
  -c, --security-config=pattern
                         Use non-default security config glob
  -f, --fanout=N         Run at most N RPCs in parallel
  -h, --help             Display this message.
  -r, --repeat=N         Run N instances of jobspec
  -w, --words            Print jobid as a six word phrase

The job-manager module accepts flux_job_run() requests, assigns jobids (FLUIDs), and creates a KVS hierarchy under job that uses the FLUID "dothex" encoding to spread jobids over the KVS keyspace. Thus far only the J-signed key specified in RFC 16 is populated.
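
For reference, the dothex mapping amounts to roughly the following (a sketch; the real encoder lives in libutil/fluid.c, and the helper name here is made up):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t flux_jobid_t;

/* Render a 64-bit FLUID as four dot-separated, zero-padded 16-bit
 * hex groups, e.g. 0x0000006732000005 -> "0000.0067.3200.0005".
 */
static void jobid_to_dothex (flux_jobid_t id, char *buf, size_t len)
{
    snprintf (buf, len, "%04x.%04x.%04x.%04x",
              (unsigned int)((id >> 48) & 0xffff),
              (unsigned int)((id >> 32) & 0xffff),
              (unsigned int)((id >> 16) & 0xffff),
              (unsigned int)(id & 0xffff));
}

So jobid 0x0000006732000005 lands at KVS key job.0000.0067.3200.0005, as in the listing below.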

I worked on performance a bit. In the job-manager module, KVS updates are batched for up to 10ms, so a burst of job requests may be committed together. In the client, the code tries to keep --fanout RPCs in flight simultaneously. For a test case of

flux job run  --repeat 100000 --no-verify-signature ./myjob

where "myjob" is example 2.1 from RFC 14, and default fanout of 256, the ingest rate is about 10K jobs per second. With signature verification (configured for munge), about 3K jobs per second. The KVS contains lots of entries like this:

job.0000.0067.3200.0005.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0006.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0007.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3200.0008.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3300.0000.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...
job.0000.0067.3300.0001.J-signed = dmVyc2lvbgBpMQBtZWNoYW5pc20Ac211bmdlAHVzZX...

(the signed blob looks the same because I signed it once and submitted it --repeat times).
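
The 10ms batching mentioned above amounts to roughly this pattern. A sketch only: flux_timer_watcher_reset() and flux_watcher_start() are real flux-core calls, but the ingest_ctx struct and the batch_append()/batch_commit() helpers are illustrative, not the actual module internals.

/* Queue each incoming request; the first one arms a one-shot 10ms
 * timer, and when it fires the whole batch is written to the KVS in
 * a single commit, after which all queued requests are answered.
 */
static void timer_cb (flux_reactor_t *r, flux_watcher_t *w,
                      int revents, void *arg)
{
    struct ingest_ctx *ctx = arg;     /* hypothetical module context */
    batch_commit (ctx->batch);        /* one KVS commit for N jobs */
    ctx->timer_armed = false;
}

static void add_cb (flux_t *h, flux_msg_handler_t *mh,
                    const flux_msg_t *msg, void *arg)
{
    struct ingest_ctx *ctx = arg;
    batch_append (ctx->batch, msg);   /* respond after commit completes */
    if (!ctx->timer_armed) {
        flux_timer_watcher_reset (ctx->timer, 0.010, 0.);
        flux_watcher_start (ctx->timer);
        ctx->timer_armed = true;
    }
}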

TODO:

  • Discussion(!)
  • Add option to individually sign repeat requests so we can see performance impact
  • Figure out how to call the C++ jobspec parser from job-manager and then see how validation goes.

No attempt was made to write tests or documentation at this stage. Maybe we'll scrap the whole thing after we have a talk about it, so I didn't want to put too much into this!

@coveralls commented Jun 1, 2018

Coverage increased (+0.002%) to 79.459% when pulling a06a1f7 on garlick:flux_job into 3b6a456 on flux-framework:master.

@grondo (Contributor) commented Jun 1, 2018

One quick comment: it might be wise at this point to save the flux job run command for later, when we've designed the UI for interactive job running (flux job run) and job submission (flux job submit). That way we can continue to modify the test utility without worrying about perturbing a core user interface with test options, etc.

@grondo (Contributor) commented Jun 1, 2018

Really cool by the way! I really like the delayed kvs flush, and the implementation of that turned out so nice and simple.

In retrospect, it sure would be nice if there was an implementation of flux_sign_wrap that returned a future (though we have a circular dependency there sadly). I wonder if the fact that the module blocks when verifying has anything to do with the drastic drop in throughput for that case?

@garlick (Member, Author) commented Jun 2, 2018

I hadn't thought about it that way, but you're right: the munge/unmunge calls encapsulate a synchronous call to the MUNGE daemon, and so will stall our (single-threaded) reactor loop each time a job is processed. It would be interesting to see the performance of the curve signing method in comparison.

Thoughts on how to work around the MUNGE interface? Some kind of threaded adapter? A specialized MUNGE client that we could incorporate into our reactor loop?

Regarding the flux job run command, excellent point about needing to carefully design the user interface and not have it grow a bunch of options needed for testing or experimentation! I'll rename to flux job testrun or similar (or we could move it entirely out of the user facing command into t/job/testrun or similar).

@garlick (Member, Author) commented Jun 2, 2018

Oh just realized the curve method requires a synchronous call to the imp, so I see your point about needing a future or equivalent.

@grondo (Contributor) commented Jun 2, 2018

> I'll rename to flux job testrun or similar (or we could move it entirely out of the user facing command into t/job/testrun or similar).

I'd be fine with testrun or submitbench under the flux job command, could be useful in the future

> Thoughts on how to work around the MUNGE interface? Some kind of threaded adapter? A specialized MUNGE client that we could incorporate into our reactor loop?

Yeah, my only thought was to have a worker pool of flux_sign_unwrap threads, perhaps even behind a service which could offer unwrap over an RPC and farm out the work to its threads.

But I like your idea of a specialized MUNGE client much better!

However, even if MUNGE API had an interface that could be hooked into a reactor, the flux_sign_wrap/unwrap() interface would need to be replaced with something that could utilize it. Additionally, to avoid circular dependency between libflux-security and libflux-core, the security API can't directly return a flux_future_t, but would have to export lower level functions that could then be used to build a single function returning a flux_future_t within flux-core... (is that making sense?)

@garlick (Member, Author) commented Jun 2, 2018

Makes sense.

Although, heh, look in t/rpc/util.c and t/rpc/rpc.c. There is a "test server" implementation in there that starts a callback with a flux_t handle that is run on the other end of a shmem:// connector, similar to czmq's zactor. In that case it allowed the RPC client and server to be tested without a broker. The callback just registers message handlers and starts the reactor like a comms module.

We could avoid stalling the job-manager thread by implementing something similar that performs the blocking unwrap call in another thread. Maybe we just need an API in front of that general concept in flux-core? Then extend to allow a round-robin work crew of individual actors to gain speed?
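
To sketch the general pattern (everything here is illustrative; blocking_unwrap() stands in for the synchronous flux_sign_unwrap() call, and the reactor would watch the pipe's read end to collect completed requests):

#include <errno.h>
#include <pthread.h>
#include <unistd.h>

struct req {
    char *J;            /* signed input */
    int errnum;         /* result of unwrap */
    struct req *next;
};

struct workq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct req *head;
    int done_fd;        /* write end of a pipe watched by the reactor */
};

int blocking_unwrap (const char *J);   /* hypothetical stand-in */

static void *worker (void *arg)
{
    struct workq *q = arg;
    for (;;) {
        pthread_mutex_lock (&q->lock);
        while (!q->head)
            pthread_cond_wait (&q->cond, &q->lock);
        struct req *r = q->head;
        q->head = r->next;
        pthread_mutex_unlock (&q->lock);

        r->errnum = blocking_unwrap (r->J) < 0 ? errno : 0;
        if (write (q->done_fd, &r, sizeof (r)) < 0)   /* wake reactor */
            break;
    }
    return NULL;
}

A round-robin work crew would simply run several of these workers against the same queue.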

@grondo (Contributor) commented Jun 2, 2018

> We could avoid stalling the job-manager thread by implementing something similar that performs the blocking unwrap call in another thread. Maybe we just need an API in front of that general concept in flux-core? Then extend to allow a round-robin work crew of individual actors to gain speed?

Yes, that would seem to work as well, though of course it would be nicer if code we control worked non-blocking... But the general concept of shuttling a blocking function to a thread seems like a nice addition to flux-core.

@garlick (Member, Author) commented Jun 2, 2018

> Yes, that would seem to work as well, though of course it would be nicer if code we control worked non-blocking... But the general concept of shuttling a blocking function to a thread seems like a nice addition to flux-core.

Agreed, and I was just pleased to remember there was something already semi-worked out, not necessarily arguing that it was the better choice. Although I find it challenging to work out the right interface to make e.g. flux_sign_unwrap() non blocking without unduly complicating flux-security...

@grondo (Contributor) commented Jun 2, 2018

> Although I find it challenging to work out the right interface to make e.g. flux_sign_unwrap() non blocking without unduly complicating flux-security...

Yeah, that is a good point, and of course there are going to be other blocking functions libflux-core API users would like to use...

@codecov-io commented Jun 2, 2018

Codecov Report

Merging #1543 into master will decrease coverage by 0.01%.
The diff coverage is 76.45%.

@@            Coverage Diff             @@
##           master    #1543      +/-   ##
==========================================
- Coverage   79.16%   79.15%   -0.02%     
==========================================
  Files         169      175       +6     
  Lines       31035    31657     +622     
==========================================
+ Hits        24569    25057     +488     
- Misses       6466     6600     +134
Impacted Files Coverage Δ
src/common/libutil/fluid.c 100% <100%> (ø) ⬆️
src/common/libjob/job.c 100% <100%> (ø)
src/modules/job-ingest/job-ingest.c 64.67% <64.67%> (ø)
src/cmd/flux-job.c 88.96% <88.96%> (ø)
src/broker/module.c 83.79% <0%> (-1.68%) ⬇️
src/common/libflux/request.c 88.46% <0%> (-1.29%) ⬇️
src/broker/content-cache.c 72.88% <0%> (-1.28%) ⬇️
src/common/libflux/mrpc.c 86.11% <0%> (-1.2%) ⬇️
src/broker/modservice.c 80.58% <0%> (-0.98%) ⬇️
src/common/libflux/rpc.c 92.3% <0%> (-0.77%) ⬇️
... and 16 more

@garlick (Member, Author) commented Jun 7, 2018

I just pushed a (squashed) update that makes the following changes:

  • Add jobs to the KVS under jobs.active.<jobid> as planned in RFC 16 (where <jobid> is the hex FLUID as above). I had skipped the "active" part before.
  • Add jobs.active.<jobid>.jobspec containing unwrapped jobspec YAML
  • Add jobs.active.<jobid>.userid containing authenticated userid of submitter
  • In job-manager, call flux_sign_unwrap() with the FLUX_SIGN_NOVERIFY flag. We can verify that the security header userid matches the authenticated userid without cryptographically verifying the signature inside the instance. This avoids the performance bottleneck described above on the job ingest side anyway.
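
On the ingest side, the check described in the last bullet looks roughly like this. A sketch only: the exact flux_sign_unwrap() signature and the header name are assumptions on my part; FLUX_SIGN_NOVERIFY is the flag named above.

#include <errno.h>
#include <stdint.h>
#include <flux/security/sign.h>    /* assumed header name */

/* Accept J only if the userid claimed in its security header matches
 * the connector-authenticated userid; skip cryptographic verification
 * of the signature itself.
 */
static int check_submitter (flux_security_t *sec, const char *J,
                            int64_t authenticated_userid)
{
    const void *spec;
    int specsz;
    int64_t signer;

    if (flux_sign_unwrap (sec, J, &spec, &specsz, &signer,
                          FLUX_SIGN_NOVERIFY) < 0)
        return -1;
    if (signer != authenticated_userid) {
        errno = EPERM;
        return -1;
    }
    return 0;
}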

@garlick (Member, Author) commented Jun 8, 2018

Just pushed another (squashed) set of changes:

  • Rename job-manager to job-ingest per discussion with @grondo. We'll keep ingest separate from the management functionality, so that we can load job-ingest across an instance and possibly load job-manager (which may have a larger memory footprint) on a subset of nodes.
  • Issue an event containing a batch of new jobids when batch KVS commits complete
  • Reworked the code for committing batches of KVS data; it was a bit too simplistic before and may have responded to users before the KVS commit was complete in some overlapping cases.
  • Began to add a sharness test, but this is not yet working.

I'll keep going on this next week.

@garlick force-pushed the flux_job branch 4 times, most recently from 4f889be to a33835f on June 13, 2018 at 00:48
@garlick (Member, Author) commented Jun 13, 2018

Changes since the last comment:

  • Updated to track flux-security API changes
  • Copious inline comments in job-ingest module
  • Submitted jobspec is validated, with text error messages returned from the parser for invalid jobspec
  • Added flux job id command to convert between jobid representations
  • sharness test works (hopefully in Travis, after various issues)

@garlick changed the title from "experimental: flux job command, job-manager module" to "job-ingest: add module for accepting submitted jobs" on Jun 13, 2018
@garlick (Member, Author) commented Jun 13, 2018

Just pushed a squashed and cleaned up version of this PR, with more tests and better commit documentation. Hopefully the coverage will be a little better than it was.

@garlick (Member, Author) commented Jun 14, 2018

I think this PR now addresses the ingest portion of #1332 and is probably a candidate for merging.

One caveat: this module is not currently reloadable. For that to work, we will need some way for FLUID generator state to be saved/restored (#1545), and I propose we do that in another PR. That's the only impediment to making this reloadable that I can think of.

@grondo (Contributor) commented Jun 14, 2018

If we do consider merging this now, we should probably wrap up any minor fixes and improvements to flux-core first, merge those and then tag a release.

Once this PR is merged, flux-core will have a hard dependency on flux-security, and we'd have to package and release that project for use on our systems where flux-core and flux-sched are currently installed.

Alternatively, we could create a "stable" branch off v0.9.0, and fixes destined for "alpha-production" could go on that branch (hopefully not much) as v0.9.1, v0.9.2, etc., while master would become the unstable/experimental branch.

@garlick (Member, Author) commented Jun 14, 2018

Oh right, I forgot we don't have a package for flux-security yet. My vote would be to create a stable branch from 0.9. However, see "creating stable releases" in RFC 1.

It mandates one branch per GitHub project, that the main project's master is the unstable development branch, and that GitHub repo clones are the mechanism for creating stable branches. So if we follow this, we'd clone flux-core to flux-core-0.9 and its master would be the first stable production "branch".

I guess the advantage is that a stable repo can have different maintainers, and it's clearer to visitors of your organization site where the stable bits are?

(Ew, that will break our PKG_NAME/PKG_VERSION, which are derived from git describe.)

@garlick force-pushed the flux_job branch 3 times, most recently from 2ea229f to 2f3fb44 on June 15, 2018 at 04:07
@garlick (Member, Author) commented Jun 15, 2018

I went ahead and added the (opt-in) configure --with-flux-security option.

If this option is not provided, then only the instance owner is allowed to submit jobs, and the J-signed KVS key is left unpopulated (but otherwise everything works as before).

I tidied up a few other things, added some more inline documentation, and squashed.

Problem: DOTHEX-encoded FLUIDs can be used to map FLUIDs
onto the KVS namespace, but without zero padding,
flux-kvs ls doesn't display or sort them nicely.

Pad each dotted hex number out to four digits.

Problem: fluid_decode (type=MNEMONIC) does not fail if
presented with words not in its dictionary or a phrase
too short to represent a uint64_t.

mn_decode()'s inline documentation is incorrect:

/* Return value:
 *  This function may return all the value returned by mn_decode_word_index
 *  plus the following result code:
 *
 * MN_EWORD  - Unrecognized word.
 */

It actually returns the number of bytes successfully
written to the output.

Require mn_decode() to return 8 in order for fluid_decode() to
be successful.

Problem: t2000-wreck-env.t fails in cloud9 environment.

"C9_HOSTNAME" is filtered out of expected list of environment
variables because "grep -v HOSTNAME" matches it.  Change
the test to match variables using e.g. "^HOSTNAME=".

Problem: jobspec sharness test was not run.

Add jobspec inputs to EXTRA_DIST and run jobspec
test if ENABLE_JOBSPEC.

Problem: while there is an ENABLE_JOBSPEC Makefile
conditional, there is nothing that can be tested in
config.h to determine if jobspec is being built.

Add HAVE_JOBSPEC to config.h.

If the user requests it by specifying --with-flux-security,
have configure locate flux-security using pkg-config.

Defines HAVE_FLUX_SECURITY in config.h
Defines HAVE_FLUX_SECURITY for Makefile.am's.

Add flux-security to travis-dep-builder.sh's
checkouts, checking out a specific recent commit.

Since flux-security depends on libsodium (a tarball
download), and other downloads depend on checkouts,
move the checkout-building code from the main program
to build_checkouts(), then call build_checkouts()
once before downloads to build the original checkouts,
and once after to build flux-security.

Add --with-flux-security to some builders.
Leave it off of one gcc and one clang builder though
to be sure we don't break compilation of the very tiny
bit of #elsif code.

Add a library that provides an API for job creation,
monitoring, and control.  For now it contains only
an interface for job submission:

/* Submit signed jobspec J.  flags must be zero.
 * Future is fulfilled once job has a jobid and
 * exists in KVS.
 */
flux_future_t *flux_job_add (flux_t *h, const char *J, int flags);

/* Obtain jobid from future.
 * On error, use flux_rpc_get_error() to get detailed
 * failure message.
 */
int flux_job_add_get_id (flux_future_t *f, flux_jobid_t *id);

Add a module that handles job-ingest.add RPCs to
add new jobs to the KVS using a rudimentary form of
the RFC 16 job schema format.  The user is returned
a jobid based on the FLUID proposal, which allows
64-bit id generation to occur in parallel across ranks,
while retaining a loose ordering of ids based on the
time submitted.  KVS per-job directories are generated
using the DOTHEX FLUID encoding, e.g.

job.active.0000.011c.ae00.0002

The instance owner and any user with ROLE_USER may
submit jobs.  Jobs must be signed, and the user
authenticated as the submitter (by the connector) must
match the signature, but the job signature is not
authenticated at ingest time.  The connector-authenticated
userid is recorded in the KVS under the "userid" key.
The signed blob is recorded under the "J-signed" key.

The job submission consists of signed RFC 14 jobspec,
which is validated by temporarily instantiating a C++
Jobspec object and recording any parse errors.
The parsed Jobspec may be further validated in the
future, for example to find resource requests that cannot
be fulfilled, but for now we ingest all valid jobspec.
(Validation is performed in a standalone .cpp file linked
against libjobspec.la.  The standalone C++, which exports
a validate function callable from C, is compiled with the
C++ compiler; automake then knows to link job-ingest
against libstdc++).  The unwrapped jobspec is written
to the KVS under the "jobspec" key.

The module is completely event driven.  KVS overhead is
reduced and ingest rate increased by batching job-ingest.add
requests that arrive together within a 10ms window.

A "job-ingest.new" event is generated after the KVS commit
which contains an array of new jobids.  This can be consumed
by the job-manager module in the future, which will handle
listing jobs for users, and informing the scheduler when
new active jobs have been submitted.

Add a front end command that will eventually contain
the primary user interfaces for submitting and
managing jobs, such as list, run, or cancel.
For now, it contains two subcommands for testing
the job-ingest module:

flux job submitbench [OPTIONS] jobspec
    Submit a job some number of times, keeping a
    configurable number of RPCs in flight.

flux job id [--to=TYPE] [--from=TYPE] id ...
    Where TYPE is dec, words, or kvs-active.
    Converts jobids from one representation to another.

Add a test_under_flux "profile" for testing the
new execution system, starting with job-ingest module.
@garlick (Member, Author) commented Jun 25, 2018

Rebase on current master.

@garlick (Member, Author) commented Jul 7, 2018

I'll withdraw this PR for now until we figure out where the new pieces of the exec system should land.
