Python JobspecV1: Can I use flux batch rather than flux run? #5220

Closed
jan-janssen opened this issue May 30, 2023 · 6 comments

@jan-janssen

For my density functional theory calculations, I typically have a Python process running next to an MPI-parallel Fortran code. The Python process watches the output file and interrupts the execution when the Fortran code is not converging. Ideally all of the convergence detection would be handled inside Fortran directly, but the Python wrapper offers rapid prototyping and tuning specific to the material system. When I use flux on the command line, I do something like this:

flux batch --flags=waitable -n 4 batch.sh

And then in the batch.sh script I would have:

python custodian.py &
flux run -n 4 dft_mpi_gpu

I can translate the flux run call to a JobspecV1 call in Python, but I would prefer to move the flux batch call to the Python level as well. In particular, I like the concurrent.futures representation of the JobspecV1 class, which simplifies integration into other Python projects.

@grondo
Contributor

grondo commented May 30, 2023

Are you asking if there is a JobspecV1 constructor that creates the equivalent of the flux batch command? If so, I believe what you are looking for is from_batch_command().

Let me know if that is not what you need.
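
A minimal sketch of how the batch.sh workflow from the original post might be moved to the Python level with from_batch_command(). It assumes the Flux Python bindings are installed and that the code runs inside a Flux instance. The helper names build_batch_script and submit_batch and the job name "dft-batch" are hypothetical. The script is passed as text and is given a "#!" line here on the assumption that from_batch_command() expects the script contents to start with one.

```python
import os


def build_batch_script(mpi_tasks: int = 4) -> str:
    """Return the contents of the batch script from the original post."""
    lines = [
        "#!/bin/sh",
        "python custodian.py &",                # watcher runs in the background
        f"flux run -n {mpi_tasks} dft_mpi_gpu",  # MPI job inside the allocation
    ]
    return "\n".join(lines) + "\n"


def submit_batch(script: str, num_slots: int = 4) -> int:
    """Submit the script as a batch job and wait for its return code."""
    # Imported lazily so build_batch_script() works without Flux installed.
    from flux.job import FluxExecutor, JobspecV1

    jobspec = JobspecV1.from_batch_command(
        script=script,
        jobname="dft-batch",  # hypothetical job name
        num_slots=num_slots,
    )
    # Run the batch script with the submitter's working directory and environment.
    jobspec.cwd = os.getcwd()
    jobspec.environment = dict(os.environ)

    with FluxExecutor() as executor:
        future = executor.submit(jobspec)
        return future.result()  # blocks until the batch job has finished
```

Calling submit_batch(build_batch_script(4), num_slots=4) would then be roughly comparable to `flux batch -n 4 batch.sh`, with the result delivered through a concurrent.futures-style future.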

@jan-janssen
Author

Thanks a lot - from_batch_command() was exactly what I was looking for. Now I am just a little surprised that the names change. How does num_slots compare to num_tasks? In my understanding they are identical, but maybe I am missing something.

@grondo
Contributor

grondo commented May 31, 2023

> How does num_slots compare to num_tasks? In my understanding they are identical but maybe I am missing something.

As far as resource allocation goes, they are the same. The terminology is different for from_batch_command (and flux batch) because the resulting job does not run any user tasks itself. It allocates resources, then runs one broker per node in the resulting allocation, with the batch script (aka initial program) executing on broker rank 0.

They are called task slots because you're requesting to allocate slots of a given resource size (e.g. cores and gpus) for eventual placement of tasks.
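
To make the naming parallel concrete, here is a small sketch that maps one resource request onto the keyword names of either constructor. The helpers resource_kwargs and make_jobspecs and the job name "dft-batch" are hypothetical; the keyword names (num_tasks, cores_per_task, gpus_per_task for from_command, and num_slots, cores_per_slot, gpus_per_slot for from_batch_command) are assumed to match the installed Flux Python bindings.

```python
def resource_kwargs(count, cores=1, gpus=None, batch=False):
    """Express the same resource request in task or slot terminology."""
    unit = "slot" if batch else "task"
    return {
        f"num_{unit}s": count,       # num_slots or num_tasks
        f"cores_per_{unit}": cores,  # cores_per_slot or cores_per_task
        f"gpus_per_{unit}": gpus,    # gpus_per_slot or gpus_per_task
    }


def make_jobspecs(script, command, count=4):
    """Build a batch jobspec and a run jobspec with the same resource shape."""
    # Imported lazily so resource_kwargs() works without Flux installed.
    from flux.job import JobspecV1

    batch = JobspecV1.from_batch_command(
        script, "dft-batch", **resource_kwargs(count, batch=True)
    )
    run = JobspecV1.from_command(command, **resource_kwargs(count))
    return batch, run
```

Both jobspecs request the same resources; the batch one only reserves the slots, and tasks are placed into them later by the flux run calls inside the script.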

@jan-janssen
Author

Ok, thanks again for the explanation.

@jan-janssen
Author

@grondo Maybe I am doing something wrong, but when I put more than one flux run command in the script that I pass to from_batch_command(), it somehow seems to hang until it reaches the last flux run command.

@jan-janssen
Author

My mistake - I forgot to close the with-statement.
