|
That's great - thanks! |
|
I thought this was still in development to address
|
|
Hi Jim!
Correct - Howard noted that (indirectly) in the files themselves. Problem isn't defining a VNI - issue is that CXI requires privilege to load the VNI into it. We would need a Flux PLM component to resolve this, but my sense is that this work isn't targeting such systems.
Yep - with Slurm and PALS, for example, we use their launcher to start the daemons. However, one must note that even there, the base environment has no knowledge of nor visibility into the application procs being executed. They only see the PRRTE daemons.
Didn't know that, but I assume Howard has at least tested it. This is a pretty minimal functionality and has zero impact on the rest of the PRRTE community, so I'm not concerned from our perspective. I'll leave it to you and Howard to work that one out. |
|
Hi Ralph! The VNI issue on the coral2 systems is easily resolved by running the PRRTE daemons as a Flux job, since Flux takes care of the privileged CXI setup and we just need the SLINGSHOT_* environment variables that it sets to be passed through to MPI. Those are set by the flux shell, which is part of Flux job launch. When PRRTE daemons are launched with ssh outside of any flux job, those variables are not set and MPI (libfabric cxi provider actually) will try to use the default CXI service, which is disabled.
This is the same model that Flux uses when launching Flux (which is how we do batch jobs), so no problem there. Only the top level Flux does the CXI service setup. It is inherited and shared by everything launched under that, which differs from Slurm where each job step is isolated with its own CXI allocation. The Slurm way is useful for NIC resource isolation but less so for RDMA security since all steps run as the same user. (Sorry if that was TMI)
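To make the pass-through described above concrete, here is a minimal sketch of selecting the `SLINGSHOT_*` variables from a launcher's environment and overlaying them on the environment handed to a child process. The helper names `slingshot_env` and `forwarding_env` are hypothetical, and the variable names in the usage note are only illustrative. This is not how PRRTE or Flux implement forwarding.

```python
import os

# Prefix of the variables the flux shell is said to set for Slingshot/CXI.
PREFIX = "SLINGSHOT_"


def slingshot_env(environ=None):
    """Return only the SLINGSHOT_* entries of an environment mapping.

    Hypothetical helper: defaults to the current process environment.
    """
    env = os.environ if environ is None else environ
    return {key: value for key, value in env.items() if key.startswith(PREFIX)}


def forwarding_env(base_env, launcher_env):
    """Copy base_env and overlay the SLINGSHOT_* entries from launcher_env.

    Hypothetical helper: neither input mapping is modified.
    """
    merged = dict(base_env)
    merged.update(slingshot_env(launcher_env))
    return merged
```

For example, `forwarding_env({"HOME": "/home/u"}, {"SLINGSHOT_VNIS": "1234", "PATH": "/usr/bin"})` keeps `HOME`, adds `SLINGSHOT_VNIS`, and leaves `PATH` out. When the launcher environment has no such variables, as with daemons started over ssh outside a Flux job, nothing is added and the child falls back to whatever default the provider uses.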
OK. |
Agreed - and eventually something that should probably be added. Easy enough to do for someone who knows the Flux API for spawning one proc per node. I think it was left out here because the target audience are folks like the European research project(s) that (a) don't use Slingshot and (b) want PRRTE as a shim while working on more PMIx-Flux integration.
Yeah, that's exactly what happens when people use "mpirun" under Slurm as well. Since Slurm only sees the daemons, there is only one VNI assignment made and all applications inherit it. People seem content since it's the same user running the apps. Besides, nobody has yet demonstrated any real value from VNI use...but that's a personal rant 😄 |
|
I like doing things in stages. So RAS component comes first. |
OK, just be aware that it is using the wrong resource set and will need to change when the next stage is done. |
This component enables prte/prterun to be used in a Flux environment without the need for explicit hostfiles, among other things.
Tested using the ssh PLM.
Thanks to Flux developers @grondo and @garlick for helpful suggestions!