
flux session-id attribute == -1 when running under slurm #1034

Closed
chu11 opened this issue Apr 10, 2017 · 14 comments

chu11 (Member) commented Apr 10, 2017

On TOSS3 opal & catalyst I get the following:

$ srun -N4 -n4 --pty src/cmd/flux start flux getattr session-id
srun: job 1182761 queued and waiting for resources
srun: job 1182761 has been allocated resources
-1

Inside pmi_simple_client_get_appnum(), the string read from dgetline() is:

cmd=appnum rc=0 appnum=-1

I don't know PMI perfectly, but would this be coming from the PMI setup provided by slurm?
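
For context, the simple PMI wire protocol exchanges newline-terminated key=value lines, so the client ends up parsing appnum out of a response like the one above. A rough sketch of that parsing (hypothetical code, not the actual flux-core keyval parser):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the appnum value from a simple-PMI response line such as
 * "cmd=appnum rc=0 appnum=-1".  Hypothetical sketch; only a missing
 * key is treated as an error. */
static int parse_appnum (const char *line, int *appnum)
{
    /* Match " appnum=" with a leading space so "cmd=appnum" is skipped */
    const char *p = strstr (line, " appnum=");
    if (!p)
        return -1;
    *appnum = (int)strtol (p + strlen (" appnum="), NULL, 10);
    return 0;
}

int main (void)
{
    int appnum;
    if (parse_appnum ("cmd=appnum rc=0 appnum=-1", &appnum) == 0)
        printf ("appnum=%d\n", appnum);   /* prints -1, as seen under slurm */
    return 0;
}

Under slurm's PMI that value is -1, which is what then shows up in the session-id attribute above.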

grondo (Contributor) commented Apr 10, 2017

Yes, that's explicitly from slurm (at least in the version installed on our clusters).

garlick (Member) commented Apr 10, 2017

For background:

https://github.com/flux-framework/rfc/blob/master/spec_13.adoc

Rereading the MPI Forum docs, it seems we are misusing appnum when we use it to create the "session id" and compute the epgm port.

chu11 (Member, Author) commented Apr 10, 2017

So for the use case above, this seems to be known, accepted behavior? But if it is known and accepted behavior, that makes it unwise to use for the epgm port case.

garlick (Member) commented Apr 10, 2017

Early on I had assumed that the "appnum" was supposed to be the job id since that is what libpmi.so provided by slurm on TOSS2 returns. -1 is going to be a problem for both the session-id and the epgm port calculation.

My suggestion would be to require the epgm endpoint (including port) to be set on the command line via attribute. Probably the main use will be for the system instance where we can just hardwire one endpoint in the config.

Not sure what to do about the session id here. I guess if it comes back as -1 we could (choke choke) look for a slurm environment variable? Bleh...

grondo (Contributor) commented Apr 10, 2017

Does the session-id have to be the same across all brokers?
If so could you generate one on rank 0 and share to other brokers with PMI_Kvs_put?
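
For reference, that kind of rank-0-generates, everyone-reads exchange maps directly onto the standard PMI-1 KVS calls. A minimal sketch (the key name "session-id" and the generated value are placeholders, not anything flux-core defines):

#include <stdio.h>
#include <pmi.h>

/* Rank 0 publishes a value under an agreed key; all ranks read it back
 * after a barrier.  Hypothetical sketch using the PMI-1 API; error
 * checking omitted. */
int share_session_id (char *buf, int len)
{
    int spawned, rank;
    char kvsname[256];

    PMI_Init (&spawned);
    PMI_Get_rank (&rank);
    PMI_KVS_Get_my_name (kvsname, sizeof (kvsname));

    if (rank == 0) {
        snprintf (buf, len, "%d", 12345);          /* e.g. a generated uuid */
        PMI_KVS_Put (kvsname, "session-id", buf);
        PMI_KVS_Commit (kvsname);
    }
    PMI_Barrier ();
    return PMI_KVS_Get (kvsname, "session-id", buf, len);
}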

garlick (Member) commented Apr 10, 2017

It's the jobid when flux runs flux, or when slurm on TOSS2 runs flux, so yes it should be the same.

We're using it as part of the "rundir" so it's needed early in startup.

grondo (Contributor) commented Apr 11, 2017

> It's the jobid when flux runs flux, or when slurm on TOSS2 runs flux, so yes it should be the same.

Sorry, I didn't mean to ask whether it is the same in existing scenarios, but whether there is anything that requires session-id to be the same across all brokers. Since FLUX_URI is set per rank in flux-exec and flux-wreckrun, I thought perhaps not.

It does seem like we're depending on some job-wide variable being set before PMI initialization, which is completely dependent on whatever we're using to launch brokers (i.e., no clean solution).

To avoid a cascade of checking various known "jobid" environment variables by hand, maybe we should have a FLUX_SESSION_ID_FALLBACK_VARIABLE, which could be set to "SLURM_JOB_ID" for this case, and would also handle other resource managers and parallel launch scenarios?
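
A minimal sketch of that fallback (FLUX_SESSION_ID_FALLBACK_VARIABLE is only the proposal above, not an existing broker attribute or environment variable):

#include <stdlib.h>

/* If PMI returned appnum == -1, fall back to the environment variable
 * whose name is given by FLUX_SESSION_ID_FALLBACK_VARIABLE, e.g.
 * "SLURM_JOB_ID".  Hypothetical sketch of the proposal; not flux-core code. */
static long session_id_fallback (int pmi_appnum)
{
    const char *varname, *value;

    if (pmi_appnum >= 0)
        return pmi_appnum;
    varname = getenv ("FLUX_SESSION_ID_FALLBACK_VARIABLE");
    value = varname ? getenv (varname) : NULL;
    return value ? strtol (value, NULL, 10) : -1;
}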

Also, if the session-id is always the job id, would it be clearer if it was renamed 'job-id' or something? (not being glib, honestly just asking)

garlick (Member) commented Apr 11, 2017

To clarify: today we get this from PMI. We could share anything we want across the session using PMI (for example a generated uuid).

Maybe we should pause for a rethink here about what we really need?

And yes of course we should rename it from session-id to job-id. :-)

garlick (Member) commented Apr 11, 2017

It would be neat to have a "job name" based on mnemonicode that would let us give flux instances memorable names; however, I'm not sure we have a good motivating use case for that yet, or even a good reason to know the jobid assigned by the enclosing instance. We should probably focus on getting our PMI implementation right and unwinding the historical baggage that's accrued here.

From the MPI Forum link describing the use of appnum (linked from our PMI RFC), it would appear that appnum is not intended to be valid outside of an SPMD launch scenario. I checked mpich's hydra[1], and it assigns an appnum of 0 in the non-spmd case. In our RFC we've defined a valid appnum as a uint (based on code spelunking). So I propose we do the following:

  1. Start returning 0 for appnum in our PMI server implementations (second arg to pmi_simple_server_create () in cmd/flux-start.c and modules/wreck/wrexecd.c).

  2. See if we can expunge session-id completely from the broker and flux-start, including as part of the rundir and persistdir default paths.

  3. I think there may be fallout for flux-list-instances (hmm, not working for me currently, and no test coverage?). Possibly this should be dropped now, and going forward we should use the system instance as the point of contact for locating running instances.

  4. There will be minor fallout in flux-proxy (see findjob()) - maybe just delete the legs that try to interpret the job in the uri path, e.g. ssh://host/jobid.


[1] hydra test

$ mpirun  -n 2 ./test_pminfo
0: size=2 appnum=0 maxes=256:64:1024 kvsname=kvs_27262_0
1: size=2 appnum=0 maxes=256:64:1024 kvsname=kvs_27262_0
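
For reference, a test program printing in that format could be written against the PMI-1 API roughly like this (a reconstruction of what test_pminfo presumably does, with the maxes assumed to be name:key:value lengths; not the actual program):

#include <stdio.h>
#include <pmi.h>

/* Print rank, size, appnum, KVS limits, and kvsname, one line per rank.
 * Hypothetical reconstruction of test_pminfo; error checking omitted. */
int main (void)
{
    int spawned, rank, size, appnum;
    int name_max, key_max, val_max;
    char kvsname[1024];

    PMI_Init (&spawned);
    PMI_Get_rank (&rank);
    PMI_Get_size (&size);
    PMI_Get_appnum (&appnum);
    PMI_KVS_Get_name_length_max (&name_max);
    PMI_KVS_Get_key_length_max (&key_max);
    PMI_KVS_Get_value_length_max (&val_max);
    PMI_KVS_Get_my_name (kvsname, sizeof (kvsname));

    printf ("%d: size=%d appnum=%d maxes=%d:%d:%d kvsname=%s\n",
            rank, size, appnum, name_max, key_max, val_max, kvsname);
    PMI_Finalize ();
    return 0;
}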

grondo (Contributor) commented Apr 11, 2017

Great summary. I agree.

garlick (Member) commented Apr 11, 2017

Thanks. One correction: I meant mpmd, not spmd, above. Here's how hydra sets appnum for mpmd:

mpirun  -n 2 ./test_pminfo : -n 2 ./test_pminfo
0: size=4 appnum=0 maxes=256:64:1024 kvsname=kvs_27846_0
1: size=4 appnum=0 maxes=256:64:1024 kvsname=kvs_27846_0
2: size=4 appnum=1 maxes=256:64:1024 kvsname=kvs_27846_0
3: size=4 appnum=1 maxes=256:64:1024 kvsname=kvs_27846_0

grondo (Contributor) commented Apr 11, 2017

Yeah, I never knew the real use of appnum; I guess it is meant to be an index indicating which application you are within a multi-program job?

Hm, what about the kvsname? Is that a source of uniqueness? Probably that is taking us backwards, sorry...

garlick (Member) commented Apr 11, 2017

I'm not sure there are any guarantees that kvsname is unique in the enclosing instance. It would depend on the implementation of the PMI KVS (there is no real need for a unique kvsname prefix if each job already gets its own KVS namespace, for example).

garlick added a commit to garlick/flux-core that referenced this issue Apr 28, 2017
Problem: flux-list-instances no longer works, has no test
coverage, and uses a fairly ad-hoc and fragile method
to find instances.

As discussed in flux-framework#1034, this command probably should be
replaced with a scheme that uses the system instance
as the point of contact.

Drop the command and its helper function over in the
python bindings.
grondo (Contributor) commented Sep 13, 2019

I think this issue was "fixed" by #2362.

grondo closed this as completed on Sep 13, 2019