Questions about Parsl integration #235

Open
christopherwharrop-noaa opened this issue May 1, 2023 · 6 comments

christopherwharrop-noaa commented May 1, 2023

I apologize if this is the wrong forum, but I haven't been able to locate another mechanism for asking questions about Flux. If there is a Slack workspace or other help forum please point me to it and I'll be happy to post there.

I am testing use of Flux with Parsl. I have the latest Flux and the latest Parsl and am trying to understand requirements for connectivity and how environments are propagated.

  1. I use one machine where users are not permitted to ssh to compute nodes, even when they own jobs currently running on them, and another machine where ssh access is allowed to compute nodes running the user's own jobs. I cannot get Parsl/Flux to work on the former, but I was successful on the latter. What exactly are the port/protocol access requirements for establishing communication between a Parsl program running on a login node with the FluxExecutor and the pool of resources being managed by Flux inside the Parsl pilot on the compute nodes?

  2. In my test program it appears that the Parsl "worker_init" environment is not propagated to the jobs that run under Flux via Parsl's FluxExecutor. Is that expected? In practice it means I have to run "module load ..." commands both in "worker_init" and in the actual command used by the Parsl Bash Apps that run my MPI programs (roughly the setup sketched below). I'd like to confirm whether that is normal, or whether I have something misconfigured.
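
For reference, this is roughly the shape of my setup, reduced to a minimal sketch (the SlurmProvider, partition, and module names here are placeholders, not my actual configuration):

import parsl
from parsl import bash_app
from parsl.config import Config
from parsl.executors import FluxExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        FluxExecutor(
            provider=SlurmProvider(
                partition="compute",                   # placeholder partition
                nodes_per_block=2,
                worker_init="module load intel impi",  # placeholder modules
            ),
        )
    ]
)
parsl.load(config)

@bash_app
def run_model():
    # The same module loads are repeated here, because worker_init does not
    # appear to reach the jobs launched through the FluxExecutor.
    return "module load intel impi && ./my_mpi_program"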

jameshcorbett (Member) commented:

I'm not sure what the best forum is for this issue, but I think I am (along with some Parsl developers) the best person to ask, since I wrote the Parsl integration, although there may have been some changes to it since I last worked on it.

For 1), I believe the executor does assume that users are able to ssh to compute nodes they have allocated. I think that was a known limitation when I wrote it, but I'll confirm and circle back. We should at least make that known as a limitation.

(For fellow Flux developers, the implementation gets the URI of the remote instance and tries to communicate with it, which requires ssh if I remember correctly?)

For 2), I'll need to investigate further.

grondo (Contributor) commented May 1, 2023

the implementation gets the URI of the remote instance and tries to communicate with it, which requires ssh if I remember correctly?

Yes, currently the remote URI for jobs is an ssh connector URI, e.g. ssh://HOST/path/to/socket, which uses ssh to connect to the remote unix domain socket.
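
For illustration, opening one of these URIs from the Python bindings looks roughly like this (the host and socket path are placeholders); the ssh connector works by running ssh to the remote host, which is why login access to the node is required:

import flux

# Placeholder URI; for a job, the real one can be obtained with `flux uri --remote JOBID`.
uri = "ssh://host123/tmp/flux-abc123/local"

h = flux.Flux(uri)         # opens the handle via the ssh connector (runs ssh under the hood)
print(h.attr_get("size"))  # e.g. query the broker count of the remote instance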

If there is a use case where users do not have login access to compute nodes, we may need to create another kind of connector, perhaps one that proxies through the enclosing instance (e.g. there have been a couple of prototypes of a job shell execution server). I'm not sure how all the tools would know to provide the alternate connector URI -- perhaps we should open an issue on this, and I bet @garlick has thoughts.

vsoch (Member) commented May 1, 2023

I have an example using Flux and Parsl alongside the Flux Operator: https://github.com/flux-framework/flux-operator/blob/main/examples/launchers/parsl/molecular-design/minicluster.yaml

and that uses the container (with the docs and full example) here: https://github.com/rse-ops/flux-hpc/tree/main/molecular-design-parsl

vsoch (Member) commented May 1, 2023

TLDR:

import parsl
from parsl.config import Config
from parsl.executors import FluxExecutor

executor = FluxExecutor(working_dir=args.working_dir)  # args comes from the script's argparse setup

# Workaround for a bug where launch_cmd is not applied when passed to the constructor
executor.launch_cmd = "{flux} submit {python} {manager} {protocol} {hostname} {port}"
config = Config(executors=[executor])
parsl.load(config)

https://github.com/rse-ops/flux-hpc/blob/8a98124842994795ec0eb2f61500fb5f61ff241d/molecular-design-parsl/scripts/0_molecular-design-with-parsl.py#LL92C1-L98C1

I converted that from their example notebooks. They had a few more complex examples using something called Colmena, but I never got that working with Flux. Note that I changed the command to flux submit too - I think the command in their example was something else and it didn't make sense to me at the time!

christopherwharrop-noaa (Author) commented:

Thank you for the quick responses. In my case, many of our on-prem HPC systems do allow ssh access to nodes if they are running a job owned by the user. Unfortunately, local policies on other machines (which I have no control over, of course) disallow that. I don't know how to express the use case other than to say that ssh access is not always permitted. I'm going to test whether ssh between the nodes of a running job is permitted on that machine. My guess is no, but I'll check.

The propagation of environments isn't a show stopper, but it makes the shell commands for the Parsl Bash Apps a bit messy.

@vsoch - I'm not quite following what you were getting at. Were you just sharing an example of using the FluxExecutor for me to look at? I don't see how it relates to my questions. Am I missing something?

vsoch (Member) commented May 3, 2023

Yes, it's just an example of Flux + Parsl integration - there are so few out there that I thought it might help to have some means of comparison.
