flux session-id attribute == -1 when running under slurm #1034
Yes, that's explicitly from slurm (at least in the version installed on our clusters).
For background: https://github.com/flux-framework/rfc/blob/master/spec_13.adoc
Rereading the MPI forum docs, it seems we are misusing appnum when we use it to create the "session id" and compute the epgm port.
So it seems that, for the use case above, this is known, accepted behavior? But being known and accepted behavior makes it unwise to rely on for the epgm port case.
Early on I had assumed that the "appnum" was supposed to be the job id, since that is what the libpmi.so provided by slurm on TOSS2 returns. -1 is going to be a problem for both the session-id and the epgm port calculation. My suggestion would be to require the epgm endpoint (including port) to be set on the command line via an attribute. Probably the main use will be for the system instance, where we can just hardwire one endpoint in the config. Not sure what to do about the session id here. I guess if it comes back as -1 we could (choke choke) look for a slurm environment variable? Bleh...
Does the session-id have to be the same across all brokers?
It's the jobid when flux runs flux, or when slurm on TOSS2 runs flux, so yes, it should be the same. We're using it as part of the "rundir", so it's needed early in startup.
Sorry, I didn't mean to ask whether it was the same in existing scenarios, but whether anything requires the session-id to be the same across all brokers. Since FLUX_URI is set per rank in flux-exec and flux-wreckrun, I thought perhaps not. It does seem like we're depending on some job-wide variable being set before PMI initialization, which is completely dependent on whatever we're using to launch brokers (i.e., there is no clean solution). To avoid a cascade of checking various known "jobid" environment variables by hand, maybe we should have a FLUX_SESSION_ID_FALLBACK_VARIABLE, which could be set to "SLURM_JOB_ID" for this case, and would also handle other resource managers and parallel launch scenarios. Also, if the session-id is always the job id, would it be clearer if it were renamed 'job-id' or something? (Not being glib, honestly just asking.)
To clarify: today we get this from PMI. We could share anything we want across the session using PMI (for example a generated uuid). Maybe we should pause for a rethink here about what we really need? And yes, of course we should rename it from session-id to job-id. :-)
It would be neat to have a "job name" based on mnemonicode that would let us give flux instances memorable names; however, I'm not sure we have a good motivating use case for that yet, or even a good reason to know the jobid assigned by the enclosing instance. We should probably focus on getting our PMI implementation right and unwinding the historical baggage that's accrued here. From the MPI Forum link describing the use of appnum (linked from our PMI RFC), it would appear that appnum is not intended to be valid outside of an SPMD launch scenario. I checked mpich's hydra [1], and it assigns an appnum of 0 in the non-spmd case. In our RFC we've defined a valid appnum as a
[1] hydra test
Great summary. I agree.
Thanks. One correction: I meant mpmd, not spmd, above. Here's how hydra sets appnum for mpmd:
Yeah, I never knew the real use of appnum; I guess it is meant to be an index indicating which application you are within a multi-program job? Hm, what about the kvsname, is that a source of uniqueness? Probably that is taking us backwards, sorry...
I'm not sure there are any guarantees that kvsname is unique in the enclosing instance. It would depend on the implementation of the PMI KVS (there's no real need for a unique kvsname prefix if each job already gets its own KVS namespace, for example).
Problem: flux-list-instances no longer works, has no test coverage, and uses a fairly ad-hoc and fragile method to find instances. As discussed in flux-framework#1034, this command probably should be replaced with a scheme that uses the system instance as the point of contact. Drop the command and its helper function over in the python bindings.
I think this issue was "fixed" by #2362.
On TOSS3 opal & catalyst I get the following: inside `pmi_simple_client_get_appnum()`, the string read from `dgetline()` is `cmd=appnum rc=0 appnum=-1`. I don't know PMI perfectly, but would this be coming from the PMI setup by slurm?