Questions about Parsl integration #235

Open
christopherwharrop-noaa opened this issue May 1, 2023 · 6 comments

christopherwharrop-noaa commented May 1, 2023

I apologize if this is the wrong forum, but I haven't been able to locate another mechanism for asking questions about Flux. If there is a Slack workspace or other help forum please point me to it and I'll be happy to post there.

I am testing use of Flux with Parsl. I have the latest Flux and the latest Parsl and am trying to understand requirements for connectivity and how environments are propagated.

  1. I use one machine where users are not permitted to ssh to compute nodes, even when they own jobs currently running on them, and another machine where ssh access is allowed to compute nodes running the user's own jobs. I cannot get Parsl/Flux to work on the former, but I was successful on the latter. What exactly are the port/protocol access requirements for establishing communication between a Parsl program running on a login node with the FluxExecutor and the pool of resources being managed by Flux inside the Parsl pilot on the compute nodes?

  2. In my test program it appears that the Parsl "worker_init" environment is not propagated to the jobs that run under Flux via Parsl's FluxExecutor. Is that expected? In practice it means I have to run "module load ..." commands both in "worker_init" and in the actual command used by the Parsl Bash Apps that run my MPI programs (roughly the setup sketched below). I'd like to confirm whether that is normal, or whether I have something misconfigured.
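
For reference, this is roughly the shape of my setup, reduced to a minimal sketch (the SlurmProvider, partition, and module names here are placeholders, not my actual configuration):

import parsl
from parsl import bash_app
from parsl.config import Config
from parsl.executors import FluxExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        FluxExecutor(
            provider=SlurmProvider(
                partition="compute",                   # placeholder partition
                nodes_per_block=2,
                worker_init="module load intel impi",  # placeholder modules
            ),
        )
    ]
)
parsl.load(config)

@bash_app
def run_model():
    # The same module loads are repeated here, because worker_init does not
    # appear to reach the jobs launched through the FluxExecutor.
    return "module load intel impi && ./my_mpi_program"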

jameshcorbett (Member) commented:

I'm not sure what the best forum is for this issue, but I think I am (along with some Parsl developers) the best person to ask, since I wrote the Parsl integration, although there may have been some changes to it since I last worked on it.

For 1), I believe the executor does assume that users are able to ssh to compute nodes they have allocated. I think that was a known limitation when I wrote it, but I'll confirm and circle back. We should at least make that known as a limitation.

(For fellow Flux developers, the implementation gets the URI of the remote instance and tries to communicate with it, which requires ssh if I remember correctly?)

For 2), I'll need to investigate further.

grondo (Contributor) commented May 1, 2023

the implementation gets the URI of the remote instance and tries to communicate with it, which requires ssh if I remember correctly?

Yes, currently the remote URI for jobs is an ssh connector URI, e.g. ssh://HOST/path/to/socket, which uses ssh to connect to the remote unix domain socket.
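
For illustration, opening one of these URIs from the Python bindings looks roughly like this (the host and socket path are placeholders); the ssh connector works by running ssh to the remote host, which is why login access to the node is required:

import flux

# Placeholder URI; for a job, the real one can be obtained with `flux uri --remote JOBID`.
uri = "ssh://host123/tmp/flux-abc123/local"

h = flux.Flux(uri)         # opens the handle via the ssh connector (runs ssh under the hood)
print(h.attr_get("size"))  # e.g. query the broker count of the remote instance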

If there is a use case where users do not have login access to compute nodes, we may need to create another kind of connector, perhaps one that proxies through the enclosing instance (e.g. there have been a couple of prototypes of a job shell execution server). I'm not sure how all the tools would know to provide the alternate connector URI -- perhaps we should open an issue on this, and I bet @garlick has thoughts.

vsoch (Member) commented May 1, 2023

I have an example using Flux and Parsl alongside the Flux Operator: https://github.com/flux-framework/flux-operator/blob/main/examples/launchers/parsl/molecular-design/minicluster.yaml

and that uses the container (with the docs and full example) here: https://github.com/rse-ops/flux-hpc/tree/main/molecular-design-parsl

vsoch (Member) commented May 1, 2023

TLDR:

import parsl
from parsl.config import Config
from parsl.executors import FluxExecutor

executor = FluxExecutor(working_dir=args.working_dir)  # args comes from the script's argparse setup

# Workaround for a bug where launch_cmd is not applied when passed to the constructor
executor.launch_cmd = "{flux} submit {python} {manager} {protocol} {hostname} {port}"
config = Config(executors=[executor])
parsl.load(config)

https://github.com/rse-ops/flux-hpc/blob/8a98124842994795ec0eb2f61500fb5f61ff241d/molecular-design-parsl/scripts/0_molecular-design-with-parsl.py#LL92C1-L98C1

I converted that from their example notebooks. They had a few more complex examples using something called Colmena, but I never got that working with Flux. Note that I changed the command to flux submit too - I think the command in their example was something else and it didn't make sense to me at the time!

christopherwharrop-noaa (Author) commented:

Thank you for the quick responses. In my case, many of our on-prem HPC systems do allow ssh access to nodes if they are running a job owned by the user. Unfortunately, local policies on other machines (which I have no control over, of course) disallow that. I don't know how to express the use case other than to say that ssh access is not always permitted. I'm going to test whether ssh between the nodes of a running job is permitted on that machine. My guess is no, but I'll check.

The propagation of environments isn't a show stopper, but it makes the shell commands for the Parsl Bash Apps a bit messy.

@vsoch - I'm not quite following what you were getting at. Were you just sharing an example of using the FluxExecutor for me to look at? I don't see how it relates to my questions. Am I missing something?

vsoch (Member) commented May 3, 2023

Yes, it's just an example of Flux + Parsl integration - there are so few out there that I thought it might help to have some means of comparison.
