Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add flux --parent option and new 'instance-level' attribute #2326

Merged
merged 9 commits into from Aug 28, 2019

Conversation

@grondo
Copy link
Contributor

commented Aug 19, 2019

I was playing with an options to improve interaction with parent instance this weekend and decided to push one option as a WIP PR to further discussion.

This PR adds a flux --parent option to simplify interaction with parent instance as discussed in #2325. This way people can play around with the proposed interface and see if it could be workable.

The general idea is that adding flux --parent would avoid the need to potentially add --parent to every required subcommand, and is also more compact to type than if we added it to flux env, e.g. instead of

flux env --parent flux kvs put

we just have

flux --parent kvs put

It also seemed useful to have a way to determine the "level" of the current Flux instance (e.g. similar to the shell SHLVL variable). Though maybe the level of the current instance in a hierarchy should not be exposed? I'm not sure.

Anyway, in this experimental PR a variable FLUX_INSTANCE_LEVEL env var and level broker attribute were added in case it is useful.

One issue that came up is that, currently, a Flux instance assumes it is a child (i.e. parent-uri is set) whether the instance is started under an existing instance with flux start directly or as an actual job with flux srun flux start (or flux submit).

We might want to discuss if flux start used within an existing instance is really a "child" instance, or if it is the start of a new instance, and only flux launching flux is properly considered a "child".

@garlick

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

Well this came out nice IMHO!

My vote would be that only a Flux instance running as a job in another Flux instance should count as a child, since otherwise it won't have allocated resources in the enclosing instance?

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 21, 2019

My vote would be that only a Flux instance running as a job in another Flux instance should count as a child, since otherwise it won't have allocated resources in the enclosing instance?

I think I agree. How does an instance reliably determine that it is running as a job in another Flux instance? We can check for FLUX_JOB_ID, but this variable could be set and the instance still might not be a job since flux start -s might have been used. Even if flux start -s wasn't used, the instance still might not be a job if it was bootstrapped from some other method (though maybe that is a non-issue for now).

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

How does an instance reliably determine that it is running as a job in another Flux instance?

I'm not sure there is a reliable way to do this. With BOOTSTRAP_SELFPMI we could have flux-start sanitize some environment variables like FLUX_URI FLUX_INSTANCE_LEVEL and FLUX_JOB_ID to "reset" the instance level to zero. However, this doesn't help when running a singleton like flux start or flux broker. I'm not sure how the broker can tell in this case whether it was launched as a "job" or as a singleton within a job.

@garlick

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

I'm not sure how the broker can tell in this case whether it was launched as a "job" or as a singleton within a job.

Can it compare FLUX_JOB_SIZE to its own size?

I was going to suggest FLUX_JOB_ID vs session-id but (and I should open a bug on this), currently this gets truncated when passing through the 32-bit PMI-1 appnum.

@grondo grondo force-pushed the grondo:flux-parent branch from bdb8551 to 33a8ebd Aug 22, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

Added an experiment here to have flux-shell set FLUX_INSTANCE_NEWLEVEL if it is running flux start or flux broker. The broker now only increments FLUX_INSTANCE_LEVEL if FLUX_INSTANCE_NEWLEVEL is set, since it was presumably launched by a flux shell.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

Can it compare FLUX_JOB_SIZE to its own size?

That would break for flux start -s ${FLUX_JOB_SIZE} or even flux srun -n 1 flux start flux start (I think).

I was going to suggest FLUX_JOB_ID vs session-id but (and I should open a bug on this), currently this gets truncated when passing through the 32-bit PMI-1 appnum.

I like the idea of some indicator that only gets set when a session is boostrapped via another flux instance PMI. Could we just add some local PMI key in FLUX PMI only that only flux-broker would access which would indicate this instance was bootstrapped as part of a job?

If not then I like your idea of FLUX_JOB_ID vs session-id if it is fool-proof.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

Opened #2331 on a new failure in t2204-job-info.t that one the latest builds hit.

@grondo grondo force-pushed the grondo:flux-parent branch from 33a8ebd to ad98676 Aug 28, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Swapped out the FLUX_INSTANCE_LEVEL nonsense for an experimental idea:

In boot_pmi.c the broker now checks for a pre-set PMI key flux.instance-level and if set, uses this as the value for an instance-level attribute. If not found the attribute is initialized to 0.

In flux-shell, when not running standalone, fetch the current instance-level attribute, increment it, and pre-set in the shell's PMI kvs.

End result: an instance can now reliably determine if it is a proper "subinstance" by checking for instance-level > 0, and instance level is only incremented for flux launching flux.

I'm not sure if "level" is the right term here. I was thinking similar to shell level, but in "tree" terminology perhaps depth would be the correct term for this case.

Also, I wasn't sure if there is a naming convention in PMI for pre-set keys, so just went with flux.instance-level for now.

@codecov-io

This comment has been minimized.

Copy link

commented Aug 28, 2019

Codecov Report

Merging #2326 into master will increase coverage by 0.02%.
The diff coverage is 83.63%.

@@            Coverage Diff             @@
##           master    #2326      +/-   ##
==========================================
+ Coverage   80.82%   80.84%   +0.02%     
==========================================
  Files         215      215              
  Lines       34254    34302      +48     
==========================================
+ Hits        27686    27732      +46     
- Misses       6568     6570       +2
Impacted Files Coverage Δ
src/broker/boot_pmi.c 64.61% <77.77%> (+0.97%) ⬆️
src/shell/pmi.c 75.86% <84.61%> (+1.32%) ⬆️
src/cmd/flux.c 84.14% <84.84%> (+0.06%) ⬆️
src/modules/connector-local/local.c 73.41% <0%> (+0.14%) ⬆️
src/broker/modservice.c 72.18% <0%> (+0.75%) ⬆️
src/common/libutil/environment.c 93% <0%> (+3%) ⬆️
@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Nice!

There is only one pre-defined key in PMI-1, PMI_process_mapping, so not much help there. Your choice seems fine.

You'll want to add this key to the special case in shell_pmi_barrier_enter() that omits PMI_process_mapping from the KVS transaction. Since kvs.instance-level is added to the hash, it will be flushed to the KVS unnecessarily (although harmlessly) when the shell initiates the PMI barrier.

I'd appreciate it if you'd tack on the comment I should have added, something like

        /* Special case:
         * The following keys are not added to the KVS transaction
         * because they were locally generated and need not be
         * shared with the other shells.
         */

(Sorry for neglecting to comment on that when I added it).

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Thanks, I was sure I was forgetting something! I'll tackle that in the morning.

grondo and others added 5 commits Aug 19, 2019
Problem: It is inconvenient to connect to a parent Flux instance
and access kvs namespace since both FLUX_URI and FLUX_KVS_NAMESPACE
have to be set from the parent-uri and parent-kvs-namespace attributes.

Since this might be a common use case, and confusing for users, add
a `--parent` option to the flux(1) command driver to fetch these
attributes from the current instance on behalf of the user and set
them appropriately in the current environment.

If a connection to the parent instance is successful, also prepend
the parent's configured FLUX_EXEC_PATH and FLUX_MODULE_PATH to the
current env, so that subcommands (besides builtins) will use those
from the parent not the current instance, if they differ.

Allow the --parent option to be used multiple times, to access
granparent, its parent, and so on. Use of --parent from a root instance
does nothing (akin to `cd ..` when you are in the root directory)
Add basic tests to ensure flux --parent works.
Add a short description of the flux --parent option.
Check for an optional PMI key "flux.instance-level" which may be
set by the flux-shell of an enclosing instance to tell a child
instance at what "level" it is being started in a hiearachy, i.e.
the number of parent instances up to the root/system instance.

If "flux.instance-level" is found, set this as the "instance-level"
attribute, o/w initialize the instance-level attribute to zero.
In shell_pmi_barrier_enter(), add a note about locally generated
keys that are not added to the KVS transaction.
@grondo grondo force-pushed the grondo:flux-parent branch 2 times, most recently from b11a3b7 to a878513 Aug 28, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Ok, I added the comment above as well as adding flux.instance-level to the local-only kvs keys.

As an experiment, I tacked one more commit on the end to convert the list of "local keys" into a pmi->locals hash, along with a pmi_kvs_put_local() function that puts the key in both the pmi->kvs and locals hashes.

I was thinking that any other broker attrs that should only be set for proper subinstances could be shared from parent to child via special PMI keys, since this ends up being much more predictable than environment variables. (e.g. should parent-uri and parent-kvs-namespace be set this way?).

Anyway, I was planning for the possibility of more local-only PMI kvs keys. However, I'd be fine dropping this last commit.

Edit: oh, btw, wasn't sure of "best practice" for using a zhashx just as a lookup table (hash with no values). Sorry if the result is distasteful.

Copy link
Member

left a comment

Looks good though I called out a couple of minor things.

No objection to the second hash. We'll want something like that to implement #1790 (if still needed), and soon we'll have a way to use plugin options to let keys be pre-populated programmatically rather than hardwiring all of them. So useful!

One idea, since you mentioned parent-uri. Should we just put that in the PMI kvs and use flux_attr_get() to fetch any additional attributes? We could potentially add a bulk attr get interface later that would be much faster than individual PMI_KVS_Get()s.

if (!level)
return 0;

l = strtol (level, &p, 10);

This comment has been minimized.

Copy link
@garlick

garlick Aug 28, 2019

Member

from strtol(3):

Since strtol() can legitimately return 0, LONG_MAX, or LONG_MIN
(LLONG_MAX or LLONG_MIN for strtoll()) on both success and failure, the
calling program should set errno to 0 before the call, and then deter‐
mine if an error occurred by checking whether errno has a nonzero value
after the call.

might want to treat < 0 as an error too.

This comment has been minimized.

Copy link
@garlick

garlick Aug 28, 2019

Member

Did you see this comment @grondo?

This comment has been minimized.

Copy link
@grondo

grondo Aug 28, 2019

Author Contributor

Hm, yeah I thought I had added that check? Let me make sure I committed what I thought I committed.

And sorry for the sloppy work!

This comment has been minimized.

Copy link
@grondo

grondo Aug 28, 2019

Author Contributor

Maybe because I went ahead and squashed you missed the updated code. I now have the below. Does that address your comments, or did I mess it up again?

    errno = 0;
    l = strtol (level, &p, 10);
    if (errno != 0 || *p != '\0' || l < 0) {
        log_msg ("set_flux_instance_level level=%s invalid", level);
        goto out;
    }

This comment has been minimized.

Copy link
@garlick

garlick Aug 28, 2019

Member

Oh sorry. That looks great!

log_msg ("set_flux_instance_level level=%s invalid", level);
goto out;
}
n = snprintf (val, sizeof (val) - 1, "%lu", l+1);

This comment has been minimized.

Copy link
@garlick

garlick Aug 28, 2019

Member

snprintf size should be sizeof (val) not minus one.

This comment has been minimized.

Copy link
@grondo

grondo Aug 28, 2019

Author Contributor

Ugh, embarassing!

@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Edit: oh, btw, wasn't sure of "best practice" for using a zhashx just as a lookup table (hash with no values). Sorry if the result is distasteful.

Works for me, although another approach would be to keep "local" and global keys in separate hashes and just have "get" look in the local one before the global one.

@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Feel free to add merge-when-passing to this and drop the WIP when you're happy. I think the significant addition here is flux --parent, and how the attr info gets passed around could be changed later.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

One idea, since you mentioned parent-uri. Should we just put that in the PMI kvs and use flux_attr_get() to fetch any additional attributes? We could potentially add a bulk attr get interface later that would be much faster than individual PMI_KVS_Get()s.

So if parent-uri is found in the PMI kvs, the broker would connect to that URI, and fetch the parent's instance-level attribute (along with any future attributes, though not sure what those would be)?

Unfortunately parent-kvs-namespace would still have to be handled via the environment in this scheme.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Works for me, although another approach would be to keep "local" and global keys in separate hashes and just have "get" look in the local one before the global one.

I considered this, but since there are many more non-local keys than local keys it seemed more optimal to not require the double-lookup for the common case (Though for all I know the impact of missed lookups in the "local" hash would be trivial)

@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Unfortunately parent-kvs-namespace would still have to be handled via the environment in this scheme.

True, and on further consideration, I think your proposal is a lot cleaner and in keeping with what PMI is for.

I considered this, but since there are many more non-local keys than local keys it seemed more optimal to not require the double-lookup for the common case (Though for all I know the impact of missed lookups in the "local" hash would be trivial)

Yeah, the way it is now is fine.

If not running in standalone mode, fetch the current instance-level
attribute set in the broker, increment it, and set this as a preset
PMI key 'flux.instance-level'.

A subinstance booting from the shell's PMI implementation will
check for this flux-only PMI key to initialize its own instance-level
attribute.
grondo added 3 commits Aug 28, 2019
Add basic tests that instance-level is incremented for subinstances,
and reset to 0 for standalone instance.
Add a 'locals' hash to the pmi object, used to store local-only
keys; i.e. those keys that should not be published in the Flux
kvs transaction.

This should make adding new local-only keys in the future trivial.
Problem: t3100-flux-in-flux.t sometime fails when running make -j N
for N of some large-ish value, system dependent.

Extend the timeout for running the first flux launching flux test
to compensate for possible slow or busy system.
@grondo grondo force-pushed the grondo:flux-parent branch from a878513 to 6fb4289 Aug 28, 2019
@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Thanks for extending the flux-in-flux timeout. I was hitting that one on my slow test system and had done the same thing in my local branch.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Thanks for extending the flux-in-flux timeout. I was hitting that one on my slow test system and had done the same thing in my local branch.

Did I get it long enough?

@grondo grondo changed the title WIP: add flux --parent option and instance 'level' attribute add flux --parent option and new 'instance-level' attribute Aug 28, 2019
@garlick

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

restarting a builder that hit

ERROR: lua/t0001-send-recv.t - exited with status 137 (terminated by signal 9?)

must be a slow to exit broker...

@garlick garlick merged commit 5931d4f into flux-framework:master Aug 28, 2019
2 checks passed
2 checks passed
Summary 1 potential rule
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Cool. Thanks!

@grondo grondo deleted the grondo:flux-parent branch Aug 28, 2019
@SteVwonder

This comment has been minimized.

Copy link
Member

commented Sep 5, 2019

Making my way through my Github notification backlog and just saw this. Very cool!!! Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.