@jovial jovial commented Nov 8, 2024

There was a race condition between slurmctld starting up and slurmd. This adds a few retries to make it more robust.

@jovial jovial requested a review from a team as a code owner November 8, 2024 14:52

@sjpb sjpb left a comment

LGTM, although we don't understand why this is reproducible on a small client dev cluster and nowhere else. It may be related to poor volume performance due to Ceph traversing a router.

@sjpb sjpb merged commit b9f9d16 into master Nov 8, 2024
34 checks passed
@sjpb sjpb deleted the bugfix/slurm-retries branch November 8, 2024 16:21