Support autoscaling and test in CI #151

Open
wants to merge 141 commits into base: main

@sjpb (Collaborator) commented Mar 8, 2022

Adds:

  • Support for autoscaling via cloud-state nodes.
  • CI for arcus and smslabs which tests a mixed cloud/non-cloud cluster:
    • Test direct configuration of control, login + 2x compute
    • Test login + compute image build (in parallel with direct configuration, for speed)
    • Test reimage of login via openstack
    • Test reimage of compute via Slurm
    • Run test suite on 4x nodes (2x non-cloud + 2x cloud), testing autoscaling (uses predefined direct-mode ports for arcus, default new ports for smslabs)
  • Support for using predefined ports for autoscaled nodes.
  • A ResumeFail script which handles no node getting created and "not enough hosts available" errors.

NB 1: This is based on and supersedes #128.
NB 2: Requires the following merges, which will then need changes to dependencies:

Comments for release notes:

  • Users should now set openhpc_config_extra instead of overriding openhpc_config (there is a check that configless mode hasn't been disabled by such an override)
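
As an illustration of the kind of check described in the release note above, here is a minimal Python sketch. It is not the code from this PR: the function names and the simple dict merge are hypothetical, and it assumes only that Slurm's configless mode is switched on by `enable_configless` appearing in `SlurmctldParameters`. Key matching is case-sensitive here for brevity.

```python
def configless_enabled(config: dict) -> bool:
    """Return True if SlurmctldParameters includes enable_configless."""
    params = str(config.get("SlurmctldParameters", ""))
    return "enable_configless" in [p.strip() for p in params.split(",")]


def merged_config(defaults: dict, extra: dict) -> dict:
    """Merge extra slurm.conf parameters over the defaults (hypothetical helper).

    Raises ValueError if the result no longer enables configless mode,
    for example because SlurmctldParameters was overridden wholesale.
    """
    merged = {**defaults, **extra}
    if not configless_enabled(merged):
        raise ValueError(
            "SlurmctldParameters must still contain enable_configless"
        )
    return merged
```

With defaults of `{"SlurmctldParameters": "enable_configless"}`, passing extra parameters such as `{"ResumeTimeout": 300}` merges cleanly, while passing `{"SlurmctldParameters": "idle_on_node_suspend"}` raises because the override drops `enable_configless`.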

@sjpb mentioned this pull request Mar 9, 2022
@sjpb changed the title from "Support autoscaling and CI test this on Arcus" to "Support autoscaling and test in CI" Mar 10, 2022