Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-in-flux: set jobid attribute #2362

merged 9 commits into from Sep 11, 2019


Copy link

commented Sep 11, 2019

As discussed in #2344, the session-id attribute no longer works properly with flux-in-flux. This PR replaces it with a jobid attribute, passed in from the enclosing instance (the shell) using the PMI KVS rather than the PMI appnum, which was restricted to 32 bits. The implementation tracks that of instance-level, added in #2326.

If flux is at instance level 0, the jobid attribute is unset.

This also drops the moribund flux-proxy feature that allowed a jobid to be specified instead of a URI, and derived the URI from the jobid by assuming a particular socket path. That approach proved fragile, and thus I removed it rather than try to fix it now that the untruncated jobid is available. We will need better ways of finding URI's for running flux instances anyway.

garlick added 9 commits Sep 10, 2019
Problem: flux-proxy JOBID doesn't work anymore.

The heuristic that translates a JOBID argument to a local://
URI no longer works because job ID's are now 64-bit, yet they
are still passed through the 32-bit PMI appnum.

Drop this support as it is fragile, and we can find other
ways to look up URI's for jobs.

Update flux-proxy(1) man page.

Update proxy sharness test.
Set PMI appnum to zero.

Use flux-<pid>-XXXXXX as tmp directory.
Problem: the 'session-id' attribute is intended to
be the job ID of the flux instance, but it is passed
through the 32-bit PMI appnum, so it gets mangled.

Drop support for the 'session-id' attribute.

Use the pid of the broker instead of the session id
to construct the rundir and persistdir paths.

Drop session-id from default boot.conf file.

Update config file sharness test and its inputs.
Problem: if flux is booted via config file, the instance-level
attribute is unset.

Set it to zero if booted from a config file.
The appnum has insufficient range to reliably transmit
a jobid, so set it to zero instead.
If running at instance-level > 0, the flux-shell makes
the instance's jobid in the enclosing instance available
via PMI.  If available, set the 'jobid' broker attribute
to reflect this value, else set it to NULL to prevent it
from being set by users.

Fixes #2344

This comment has been minimized.

Copy link

commented Sep 11, 2019

Codecov Report

Merging #2362 into master will decrease coverage by <.01%.
The diff coverage is 88.57%.

@@            Coverage Diff             @@
##           master    #2362      +/-   ##
- Coverage   80.82%   80.82%   -0.01%     
  Files         218      218              
  Lines       34530    34501      -29     
- Hits        27909    27884      -25     
+ Misses       6621     6617       -4
Impacted Files Coverage Δ
src/shell/pmi.c 77.11% <100%> (+0.58%) ⬆️
src/broker/broker.c 73.72% <100%> (+0.19%) ⬆️
src/cmd/flux-start.c 78.4% <100%> (-0.08%) ⬇️
src/broker/boot_pmi.c 65.67% <100%> (+1.05%) ⬆️
src/broker/boot_config.c 83.13% <33.33%> (-2.41%) ⬇️
src/cmd/builtin/proxy.c 74.12% <71.42%> (-0.3%) ⬇️
src/broker/modservice.c 70.67% <0%> (-0.76%) ⬇️
src/cmd/flux-job.c 85.31% <0%> (-0.3%) ⬇️
src/common/libflux/message.c 80.7% <0%> (-0.26%) ⬇️
... and 1 more

This comment has been minimized.

Copy link

commented Sep 11, 2019

Very nice!

While working on this did you have any insight into how to propagate jobid from non-flux enclosing RM, e.g. to satisfy the strategy for libyogrt described in #2358?

Seems like this PR could go in as-is if our strategy for other RMs would be to place enclosing jobid into namespaced attribute, e.g. slurm.jobid or lsf.jobid, and jobid is always assumed to be a flux jobid.


This comment has been minimized.

Copy link
Member Author

commented Sep 11, 2019

Thanks for referencing that issue. It does seem like we would need something more than a number if jobid represented a "foreign" jobid. Capturing it as lsf.jobid=N or foreign-jobid=lsf-N makes some sense to me.

I guess I'm for doing that in a separate PR.


This comment has been minimized.

Copy link

commented Sep 11, 2019

Ok, seems simple enough to go in then!

@grondo grondo merged commit 9391b20 into flux-framework:master Sep 11, 2019
4 checks passed
4 checks passed
Summary 1 potential rule
codecov/patch 88.57% of diff hit (target 80.82%)
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +7.74% compared to a3645e6
continuous-integration/travis-ci/pr The Travis CI build passed

This comment has been minimized.

Copy link
Member Author

commented Sep 11, 2019


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
3 participants
You can’t perform that action at this time.