Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-ingest: add ingest module, flux job command, libjob library #1626

Merged
merged 15 commits into from Aug 28, 2018

Conversation

Projects
None yet
5 participants
@garlick
Copy link
Member

garlick commented Aug 22, 2018

This PR resurrects #1543.

The interface in job.h (under libjob) for submitting work is:

typedef uint64_t flux_jobid_t;

/* Submit a job to the system.
 * J should be RFC 14 jobspec signed by flux_sign_wrap(), provided
 * flux was built --with-flux-security.  If not, then J should be bare jobspec.
 * Currently the 'flags' parameter must be set to 0.
 * The system assigns a jobid and returns it in the response.
 */
flux_future_t *flux_job_add (flux_t *h, const char *J, int flags);

/* Parse jobid from response to flux_job_add() request.
 * Returns 0 on success, -1 on failure with errno set - and an extended
 * error message may be available with flux_future_error_string().
 */
int flux_job_add_get_id (flux_future_t *f, flux_jobid_t *id);

The flux job command currently has two subcommands:

Usage: flux-job id [OPTIONS] [id ...]
Convert jobid(s) to another form
  -f, --from=dec|kvs-active|words
                         Convert jobid from specified form
  -h, --help             Display this message.
  -t, --to=dec|kvs-active|words
                         Convert jobid to specified form

and

Usage: flux-job submitbench [OPTIONS] jobspec
Run job(s)
  -R, --reuse-signature  Sign jobspec once and reuse the result for multiple
                         RPCs
  -c, --security-config=pattern
                         Use non-default security config glob
  -f, --fanout=N         Run at most N RPCs in parallel
  -h, --help             Display this message.
  -r, --repeat=N         Run N instances of jobspec
  -s, --sign-type=TYPE   Use non-default mechanism type to sign J

There is an opt-in configure option for building with flux-security. If building without, jobs can be submitted and they are tagged with the submitting user, but they are not signed and thus will only be usable if we implement a short-circuit that allows the instance owner to launch its own work without signature verification.

  --with-flux-security    Build with flux-security
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 23, 2018

Just curious why not flux_job_submit() instead of flux_job_add(). Are you reserving flux_submit_submit() function for a future, higher-level interface? Apologies if I missed something obvious, I've only just started reviewing.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 23, 2018

Honestly I can't remember. Happy to change it if that makes more sense.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 23, 2018

Honestly I can't remember. Happy to change it if that makes more sense.

Well, it is only my opinion but a flux_job_submit() call would probably be what users of the API would be expecting...

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 23, 2018

flux_job_submit() call would probably be what users of the API would be expecting...

Yes will change. Not sure what I was on about there.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 23, 2018

Coverage Status

Coverage decreased (-0.3%) to 79.236% when pulling 8f405a3 on garlick:flux_job into 72a0550 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 23, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@72a0550). Click here to learn what that means.
The diff coverage is 76.14%.

@@           Coverage Diff            @@
##             master   #1626   +/-   ##
========================================
  Coverage          ?   79.4%           
========================================
  Files             ?     179           
  Lines             ?   32512           
  Branches          ?       0           
========================================
  Hits              ?   25816           
  Misses            ?    6696           
  Partials          ?       0
Impacted Files Coverage Δ
src/common/libutil/fluid.c 100% <100%> (ø)
src/common/libjob/job.c 100% <100%> (ø)
src/modules/job-ingest/job-ingest.c 64.67% <64.67%> (ø)
src/cmd/flux-job.c 87.91% <87.91%> (ø)

@garlick garlick force-pushed the garlick:flux_job branch from 37ded73 to d3ff474 Aug 23, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 23, 2018

OK, just renamed "add" to "submit" both in the API, the RPC request topic, and the event topic. I went ahead and squashed that down.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 24, 2018

The "program" to test for HAVE_FLUX_SECURITY won't work for the FLUX_TEST_INSTALLED case in general -- though it does work for the case we're concerned about, which is to install the built flux-core sources and then test against the resulting installation directory.

It does feel like maybe we'd want a way outside of compiling a C program to determine if an installed flux-core was built with flux-security support though... maybe we should annotate flux version output with compile time information like some programs do? e.g.

$ flux version --verbose
flux-core-0.9.0
+security
-caliper

or similar?

Edit: meant to add this can be done in a later PR. Just wanted to make a general comment here.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 24, 2018

Also, if built without flux-security, should the flux job submitbench --reuse-signature and --security-config options be hidden?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 24, 2018

Error from flux job submitbench when job-ingest module is not loaded could be improved

$ src/cmd/flux start  flux job submitbench t/jobspec/valid/basic.yaml 
flux-job: (null): Function not implemented

Sorry about these trivial comments coming in disjoint like this, just trying to note small issues before I leave for the day.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 27, 2018

Thanks! I opened #1632 to track the flux version --verbose suggestion and I'll address the other two shortly.

@SteVwonder
Copy link
Member

SteVwonder left a comment

LGTM! Thanks @garlick!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 27, 2018

Now that options are suppressed in flux job submitbench when built without flux-security, the built program that returns success (or failure) if flux-core is built with security (or not) may replaced by a grep of the flux job submitbench --help output.

garlick added some commits May 23, 2018

build: add --with-flux-security (opt-in)
If user requests it by specifying --with-flux-security,
have configure locate flux-security using pkg-config.

Defines HAVE_FLUX_SECURITY in config.h
Defines HAVE_FLUX_SECURITY for Makefile.am's
libutil/fluid: 0-pad DOTHEX encoding
Problem: DOTHEX-encoded FLUIDs can be used to map FLUIDs
onto the KVS namespace, but without zero padding,
flux-kvs ls doesn't display or sort them nicely.

Pad each dotted hex number out to four digits.
build: add jobspec sharness test to Makefile.am
Problem: jobspec sharness test was not run.

Add jobspec inputs to EXTRA_DIST and run jobspec
test if ENABLE_JOBSPEC.
build: define HAVE_JOBSPEC in config.h
Problem: while there is an ENABLE_JOBSPEC Makefile
conditional, there is nothing that can be tested in
config.h to determine if jobspec is being built.

Add HAVE_JOBSPEC to config.h
libutil/fluid: detect bad mnemonic input
Problem: fluid_decode (type=MNEMONIC) does not fail if
presented with words not in its dictionary or a phrase
too short to represent a uint64_t.

mn_decode()'s inline documentation is incorrect:

/* Return value:
 *  This function may return all the value returned by mn_decode_word_index
 *  plus the following result code:
 *
 * MN_EWORD  - Unrecognized word.
 */

It actually returns the number of bytes successfully
written to the output.

Require mn_decode() to return 8 in order for fluid_decode() to
be successful.
modules/job-ingest: add module for job ingest
Add a module that handles job-ingest.add RPCs to
add new jobs to the KVS using a rudimentary form of
the RFC 16 job schema foramt.  The user is returned
a jobid based on the FLUID proposal, which allows
64-bit id generation to occur in parallel across ranks,
while retaining a loose ordering of ids based in the
time submitted.  KVS per-job directories are generated
using the DOTHEX FLUID encoding, e.g.

job.active.0000.011c.ae00.0002

The instance owner and any user with ROLE_USER may
submit jobs.  Jobs must be signed, and the user
authenticated as the submitter (by the connector) must
match the signature, but the job signature is not
authenticated at ingest time.  The connector-authenticated
userid is recored in the KVS under the "userid" key.
The signed blob is recorded under the "J-signed" key.

The job submission consists of signed RFC 14 jobspec,
which is validated by temporarily instantiating a C++
Jobspec object and recording any parse errors.
The parsed Jobspec may be further validated in the
future, for example to find resource requests that can
not be fulfilled, but for now we ingest all valid jobspec.
(Validation is performed in a standalone .cpp file linked
against libjobspec.la.  The standalone C++, which exports
a validate function callable from C, is compiled with the
C++ compiler; automake then knows to link job-ingest
against libstdc++).  The unwrapped jobspec is written
to the KVS under the "jobspec" key.

The module is completely event driven, and KVS overhead
is reduced and ingest rate increased by batching job-ingest.submit
requests that arrive toether within 10ms.

A "job-ingest.submit" event is generated after the KVS commit
which contains an array of new jobids.  This can be consumed
by the job-manager module in the future, which will handle
listing jobs for users, and informing the scheduler when
new active jobs have been submitted.
libjob: add flux job library
Add a library will provides an API for job creation,
monitoring, and control.  For now it contains only
an interface for job submission.

@garlick garlick force-pushed the garlick:flux_job branch from 2153618 to 3bc86bb Aug 28, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 28, 2018

Good point! OK, I dropped the little proggie and updated the sharness test to look at the output of flux job --help instead.

I also took the opportunity to rebase on current master.

If this is looking close, LMK and I'll squash the incremental development.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 28, 2018

I'm going to go ahead and squash.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 28, 2018

oops, just remembered there are two sharness scripts that need updating to check flux-job --help. I'll do that and then squash.

garlick added some commits May 22, 2018

cmd/flux-job: add flux-job command
Add a front end command that will eventually contain
the primary user interfaces for submitting and
managing jobs such as list, run, or cancel.
For now, it contains two subcommands  for testing
the job-ingest module:

submitbench - test ingest throughput, maintaining
   a minimum number of outstanding RPCs

id - convert jobid's between representations.
sharness: add "job" rc1/rc3 scripts
Ad a test_under_flux "profile" for testing the
new execution system, starting with job-ingest module.
travis-ci: build --with-flux-security
Pull in flux-security-0.2.0 via the travis-dep-builder script,
then add --with-flux-security to some builders in the travis
build matrix.

@garlick garlick force-pushed the garlick:flux_job branch from 3bc86bb to 8f405a3 Aug 28, 2018

@grondo

grondo approved these changes Aug 28, 2018

Copy link
Contributor

grondo left a comment

Went through this quickly and gave it a spin w/ and w/out flux-security and I don't have any other comments. LGTM.
Thanks, @garlick!

@garlick garlick requested review from grondo and removed request for chu11 Aug 28, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 28, 2018

OK, I think this can go in - could somebody press the button?

@SteVwonder SteVwonder merged commit bf5af45 into flux-framework:master Aug 28, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.3%) to 79.236%
Details
@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Aug 28, 2018

Thanks @garlick!

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 28, 2018

Thanks!

@garlick garlick deleted the garlick:flux_job branch Oct 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.