Add RADICAL-Pilot backend #290

Open
1 of 9 tasks
LourensVeen opened this issue Mar 22, 2024 · 0 comments

LourensVeen commented Mar 22, 2024

We're seeing some issues with QCG-PilotJob on very large machines, and in general, given the heterogeneity of the HPC landscape, it's probably good to have more than one way of starting and monitoring processes to maximise our chances of success.

We've been working with the authors of RADICAL-Pilot lately, and they are adding some features to it to make it more suitable for use as an instantiator for MUSCLE3. Let's add it as an optional second backend.

  • Add an integration test that uses a simulated SLURM cluster of Docker containers
  • Add an RPInstantiator
    • Scan the environment and determine what resources we have
    • Get an RP Pilot using those
    • Start instances using the pilot
    • Monitor execution and shut down correctly at simulation end or crash
  • Add a command line option to the manager to select the backend
  • Test (locally, DAS, ARCHER2, Frontier?)
  • Coordinate a release
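
As a rough illustration of the "scan the environment" step above, here is a minimal sketch of how the available resources could be read from a SLURM allocation. It is not the planned implementation: `Resources`, `expand_nodelist` and `scan_slurm` are hypothetical names, not MUSCLE3 or RADICAL-Pilot API. It assumes a homogeneous allocation and only reads the `SLURM_JOB_NODELIST` and `SLURM_CPUS_ON_NODE` variables that SLURM sets inside a job.

```python
import os
import re
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Resources:
    """Hypothetical container, not a MUSCLE3 or RADICAL-Pilot type."""
    nodes: List[str]
    cores_per_node: int


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a simple SLURM node list such as 'node[001-003,007],login1'.

    Only handles a single trailing bracket group per name. A real
    implementation would more likely call `scontrol show hostnames`.
    """
    nodes: List[str] = []
    # Find comma-separated entries, keeping bracket groups intact.
    for part in re.findall(r'[^,\[]+(?:\[[^\]]*\])?', nodelist):
        match = re.fullmatch(r'([^\[]+)\[([^\]]*)\]', part)
        if match is None:
            # Plain host name without a bracket group.
            nodes.append(part)
            continue
        prefix, ranges = match.groups()
        for item in ranges.split(','):
            first, _, last = item.partition('-')
            last = last or first      # single index, e.g. '007'
            width = len(first)        # preserve zero padding
            for i in range(int(first), int(last) + 1):
                nodes.append(f'{prefix}{i:0{width}d}')
    return nodes


def scan_slurm(env: Mapping[str, str] = os.environ) -> Resources:
    """Derive the available resources from SLURM environment variables."""
    nodes = expand_nodelist(env['SLURM_JOB_NODELIST'])
    # SLURM_CPUS_ON_NODE describes the current node only; using it for
    # every node assumes that all nodes in the allocation are identical.
    cores = int(env['SLURM_CPUS_ON_NODE'])
    return Resources(nodes, cores)
```

Passing the environment as an argument rather than reading `os.environ` directly keeps the function easy to exercise in the simulated-cluster integration test, or without any cluster at all.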
@LourensVeen LourensVeen self-assigned this Jun 21, 2024