Contributor

@vkarak vkarak commented Aug 28, 2023

The new backend is called ssh and will spawn a test job on a remote host. More specifically, it will copy the test's stage directory to the remote host, execute the job and copy back the results.

Multiple hosts can be assigned to a partition with an ssh scheduler, in which case the scheduler will pick the next free host to spawn a test on.
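
As a rough illustration of "pick the next free host", here is a minimal sketch of a host pool. The `HostPool` class, its method names and the first-in, first-out ordering are assumptions made for this example only; the PR text does not say how the scheduler actually tracks or orders its hosts.

```python
import collections


class HostPool:
    """Hypothetical pool of remote hosts (not ReFrame's implementation)."""

    def __init__(self, hosts):
        self._free = collections.deque(hosts)   # hosts ready to take a job
        self._busy = set()                      # hosts currently running a job

    def acquire(self):
        """Return the next free host, or None if every host is busy."""
        if not self._free:
            return None

        host = self._free.popleft()
        self._busy.add(host)
        return host

    def release(self, host):
        """Mark a host as free again once its job has finished."""
        self._busy.remove(host)
        self._free.append(host)


# Host names are placeholders
pool = HostPool(['host1', 'host2'])
first = pool.acquire()
second = pool.acquire()
third = pool.acquire()      # no free host left at this point
pool.release(first)
fourth = pool.acquire()     # the released host becomes available again
```

A scheduler built this way would leave a job pending whenever `acquire()` returns `None` and retry on a later poll.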

This PR also introduces a new run_command_async2 utility for executing commands asynchronously. Unlike run_command_async, it returns a "future" that encapsulates the spawned process. Process futures can be chained with the then() method; each chained future is started when its predecessor in the chain terminates.
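
To make the chaining idea concrete, here is a minimal sketch of a process future built on `subprocess.Popen`. The `ProcFuture` class and everything in it except the `then()` name are hypothetical and are not the API added by this PR. The sketch also ignores exit codes when advancing the chain, which a real implementation would likely need to handle.

```python
import subprocess
import sys
import time


class ProcFuture:
    """Hypothetical process future (not the class added by this PR)."""

    def __init__(self, *args):
        self.args = args
        self.proc = None          # set once the process is spawned
        self.successor = None     # future to start after this one terminates

    def start(self):
        self.proc = subprocess.Popen(self.args)
        return self

    def then(self, successor):
        """Chain `successor` after this future and return it for further chaining."""
        self.successor = successor
        return successor

    def poll(self):
        """Advance the chain without blocking.

        Returns True only when this future and all its successors have finished.
        """
        if self.proc is None or self.proc.poll() is None:
            return False          # not started yet or still running

        if self.successor is None:
            return True

        if self.successor.proc is None:
            self.successor.start()

        return self.successor.poll()


# Two trivial commands chained one after the other
head = ProcFuture(sys.executable, '-c', 'print("first")')
tail = head.then(ProcFuture(sys.executable, '-c', 'print("second")'))
head.start()
while not head.poll():
    time.sleep(0.05)
```

Because `poll()` never waits on a process, the caller stays free to do other work between calls.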

The SSH scheduler backend uses process futures: its submit() method spawns a chain of three commands (push artefacts, execute job, pull artefacts), so it does not block. Progress is ensured by the scheduler's poll() method, which polls the spawned futures for completion.
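
The three-step chain could translate into command lines along the following lines. This is only a sketch: the use of `rsync` for the transfers, the remote working directory, the job script name and the `build_command_chain` helper are all assumptions for illustration, not details taken from this PR.

```python
import shlex


def build_command_chain(host, stagedir, remote_dir, job_script):
    """Return the push, execute and pull commands as argument lists.

    Hypothetical helper: the tools and paths used by the actual backend may differ.
    """
    # 1. Push the contents of the local stage directory to the remote host
    push = ['rsync', '-az', f'{stagedir}/', f'{host}:{remote_dir}/']

    # 2. Run the job script inside the remote copy of the stage directory
    remote_cmd = f'cd {shlex.quote(remote_dir)} && bash {shlex.quote(job_script)}'
    execute = ['ssh', host, remote_cmd]

    # 3. Pull the results back into the local stage directory
    pull = ['rsync', '-az', f'{host}:{remote_dir}/', f'{stagedir}/']
    return [push, execute, pull]


# Host, paths and script name are placeholders
chain = build_command_chain('node01', '/tmp/stage', '/tmp/rfm', 'rfm_job.sh')
for cmd in chain:
    print(' '.join(cmd))
```

Each of the three argument lists would then be wrapped in its own process future and chained, so that submit() returns as soon as the first command has been spawned.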

This PR essentially solves the feature request in #64.

More details about the new scheduler can be found in its docs.

vkarak added 5 commits July 19, 2023 14:07
The following are added:

- Implementation of a future that wraps a spawned process
- A new scheduler that can spawn reframe jobs on a remote machine
  accessed with SSH.

codecov bot commented Aug 28, 2023

Codecov Report

Attention: 102 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (develop@fda00a5).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #2975   +/-   ##
==========================================
  Coverage           ?   86.73%           
==========================================
  Files              ?       61           
  Lines              ?    11962           
  Branches           ?        0           
==========================================
  Hits               ?    10375           
  Misses             ?     1587           
  Partials           ?        0           
Files                                  Coverage Δ
reframe/core/backends.py               93.93% <ø> (ø)
reframe/core/schedulers/local.py       92.85% <100.00%> (ø)
reframe/core/schedulers/__init__.py    97.44% <75.00%> (ø)
reframe/utility/osext.py               86.72% <91.58%> (ø)
reframe/core/schedulers/ssh.py         35.91% <35.91%> (ø)



pep8speaks commented Sep 25, 2023

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide.

Comment last updated at 2023-09-29 21:31:48 UTC

Contributor

@victorusu victorusu left a comment

lgtm. I wonder how we will do the unit tests for this scheduler. Do we have any plans?

Contributor Author

vkarak commented Sep 28, 2023

> lgtm. I wonder how we will do the unit tests for this scheduler. Do we have any plans?

I just added some basic unit tests.

@vkarak vkarak merged commit 1139fc7 into reframe-hpc:develop Sep 29, 2023
@vkarak vkarak deleted the feat/ssh-scheduler branch September 29, 2023 22:18