ras: add Flux component#2407

Merged
rhc54 merged 1 commit intoopenpmix:masterfrom
hppritcha:ras_flux_upstream
Feb 18, 2026
Conversation

@hppritcha
Contributor

This component enables prte/prterun to be used in a Flux environment without the need for explicit hostfiles, among other things.

Tested using the ssh PLM.

Thanks to Flux developers @grondo and @garlick for helpful suggestions!
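For context on how a RAS component can avoid hostfiles: Flux describes an instance's resources as an R object (RFC 20) whose nodelist uses hostlist ranges (RFC 29), e.g. `node[0-3]`, so the component can expand that instead of reading a hostfile. A minimal Python sketch of the expansion, with an illustrative R object and a simplified hostlist parser (not the component's actual code):

```python
import json
import re

def expand_hostlist(hosts):
    """Expand a simple hostlist like 'node[0-3]' into individual names.

    Handles only the single-bracket range form; real RFC 29 hostlists
    also allow comma lists and nested ranges.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\](\w*)", hosts)
    if not m:
        return [hosts]
    prefix, lo, hi, suffix = m.groups()
    width = len(lo)  # preserve zero padding, e.g. node[01-04]
    return [f"{prefix}{str(i).zfill(width)}{suffix}"
            for i in range(int(lo), int(hi) + 1)]

# Illustrative R object in the general shape Flux reports (version 1, R_lite).
R = json.loads("""
{"version": 1,
 "execution": {"R_lite": [{"rank": "0-3", "children": {"core": "0-7"}}],
               "nodelist": ["node[0-3]"]}}
""")

nodes = [name for entry in R["execution"]["nodelist"]
         for name in expand_hostlist(entry)]
print(nodes)  # ['node0', 'node1', 'node2', 'node3']
```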


Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@rhc54
Contributor

rhc54 commented Feb 18, 2026

That's great - thanks!

@rhc54 rhc54 merged commit 8836ea0 into openpmix:master Feb 18, 2026
17 checks passed
@garlick

garlick commented Feb 18, 2026

I thought this was still in development to address:

> it won't work on the CORAL2 systems with VNI tagging
> it doesn't work if prterun wants to run the DVM as a Flux job
> no tests for this in flux or otherwise

@rhc54
Contributor

rhc54 commented Feb 18, 2026

Hi Jim!

> it won't work on the CORAL2 systems with VNI tagging

Correct - Howard noted that (indirectly) in the files themselves. Problem isn't defining a VNI - issue is that CXI requires privilege to load the VNI into it. We would need a Flux PLM component to resolve this, but my sense is that this work isn't targeting such systems.

> it doesn't work if prterun wants to run the DVM as a Flux job

Yep - with Slurm and PALS, for example, we use their launcher to start the daemons. However, one must note that even there, the base environment has no knowledge of nor visibility into the application procs being executed. They only see the PRRTE daemons.

> no tests for this in flux or otherwise

Didn't know that, but I assume Howard has at least tested it. This is a pretty minimal functionality and has zero impact on the rest of the PRRTE community, so I'm not concerned from our perspective. I'll leave it to you and Howard to work that one out.

@garlick

garlick commented Feb 18, 2026

Hi Ralph!

The VNI issue on the coral2 systems is easily resolved by running the PRRTE daemons as a Flux job, since Flux takes care of the privileged CXI setup and we just need the SLINGSHOT_* environment variables that it sets to be passed through to MPI. Those are set by the flux shell, which is part of Flux job launch. When PRRTE daemons are launched with ssh outside of any flux job, those variables are not set and MPI (libfabric cxi provider actually) will try to use the default CXI service, which is disabled.
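To illustrate the pass-through requirement described above: whatever launches the application procs has to forward the `SLINGSHOT_*` variables the flux shell sets, or the libfabric cxi provider falls back to the (disabled) default service. A minimal Python sketch of that filter; the specific variable names in the example environment are assumptions for illustration, the prefix is what matters:

```python
SLINGSHOT_PREFIX = "SLINGSHOT_"

def slingshot_env(env):
    """Return the SLINGSHOT_* variables a launcher must forward to
    application procs so the libfabric cxi provider picks up the
    Flux-allocated CXI service rather than the default one."""
    return {k: v for k, v in env.items() if k.startswith(SLINGSHOT_PREFIX)}

# A Flux-shell-like environment vs. a bare ssh-launched environment.
flux_env = {"SLINGSHOT_VNIS": "1024", "SLINGSHOT_DEVICES": "cxi0",
            "PATH": "/usr/bin"}
ssh_env = {"PATH": "/usr/bin"}

print(slingshot_env(flux_env))
print(slingshot_env(ssh_env))  # empty -> cxi provider uses the default service
```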

> Yep - with Slurm and PALS, for example, we use their launcher to start the daemons. However, one must note that even there, the base environment has no knowledge of nor visibility into the application procs being executed. They only see the PRRTE daemons.

This is the same model that Flux uses when launching Flux (which is how we do batch jobs), so no problem there. Only the top level Flux does the CXI service setup. It is inherited and shared by everything launched under that, which differs from Slurm where each job step is isolated with its own CXI allocation. The Slurm way is useful for NIC resource isolation but less so for RDMA security since all steps run as the same user.

(Sorry if that was TMI)

> This is a pretty minimal functionality and has zero impact on the rest of the PRRTE community, so I'm not concerned from our perspective.

OK.

@rhc54
Contributor

rhc54 commented Feb 18, 2026

> The VNI issue on the coral2 systems is easily resolved by running the PRRTE daemons as a Flux job,

Agreed - and eventually something that should probably be added. Easy enough to do for someone who knows the Flux API for spawning one proc per node. I think it was left out here because the target audience are folks like the European research project(s) that (a) don't use Slingshot and (b) want PRRTE as a shim while working on more PMIx-Flux integration.
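For a future Flux PLM stage, "one proc per node" maps naturally onto `flux run -N <nnodes> -n <nnodes>`, which places one task per node. A hedged sketch of building that command line; the daemon arguments here are placeholders, not the real prted options:

```python
def flux_daemon_launch_cmd(nnodes, daemon_argv):
    """Build a `flux run` command line that starts one copy of the
    given daemon per node: -N requests the node count and -n the
    task count, so nnodes tasks on nnodes nodes means one each."""
    return ["flux", "run", "-N", str(nnodes), "-n", str(nnodes)] + list(daemon_argv)

cmd = flux_daemon_launch_cmd(4, ["prted"])
print(cmd)  # ['flux', 'run', '-N', '4', '-n', '4', 'prted']
```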

> inherited and shared by everything launched under that

Yeah, that's exactly what happens when people use "mpirun" under Slurm as well. Since Slurm only sees the daemons, there is only one VNI assignment made and all applications inherit it. People seem content since it's the same user running the apps.

Besides, nobody has yet demonstrated any real value from VNI use...but that's a personal rant 😄

@hppritcha
Contributor Author

I like doing things in stages. So RAS component comes first.

@garlick

garlick commented Feb 18, 2026

> I like doing things in stages. So RAS component comes first.

OK just be aware that it is using the wrong resource set and will need to change when the next stage is done.
