
Kubernetes Runner Management #1167

Merged
mitchellh merged 4 commits into f-runner-mgmt from f-runner-k8s on Mar 8, 2021

Conversation

mitchellh
Contributor

This adds Kubernetes support for runner management.

I use a Deployment for runners since they have no state.

The only sneaky thing here is that during uninstall of a runner, I grab the cpu and memory requirements of the last install and persist them so we can use them on the next install (in the same process), which happens during an upgrade. I do this because we don't ask for these flags in the upgrade command, and I think we shouldn't; we should just copy the existing settings.

Most of the LOC in this PR is just extracting client initialization into a standalone function. 😄
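
As a rough illustration only (not the PR's actual code — the Deployment name "waypoint-runner", the container name "runner", and the package name are assumptions), persisting the previous install's resource settings with client-go could look something like this:

package runnerinstall

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// capturedResources holds the cpu/memory requests grabbed during uninstall so
// the next install (e.g. during an upgrade) can reuse them.
type capturedResources struct {
	CPU    string
	Memory string
}

// captureRunnerResources reads the resource requests off the existing runner
// Deployment before it is deleted. Deployment and container names are assumed.
func captureRunnerResources(ctx context.Context, cs *kubernetes.Clientset, ns string) (*capturedResources, error) {
	dep, err := cs.AppsV1().Deployments(ns).Get(ctx, "waypoint-runner", metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("fetching runner deployment: %w", err)
	}

	var out capturedResources
	for _, c := range dep.Spec.Template.Spec.Containers {
		if c.Name != "runner" {
			continue
		}
		if v, ok := c.Resources.Requests[corev1.ResourceCPU]; ok {
			out.CPU = v.String()
		}
		if v, ok := c.Resources.Requests[corev1.ResourceMemory]; ok {
			out.Memory = v.String()
		}
	}
	return &out, nil
}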

@mitchellh mitchellh requested review from briancain and a team March 8, 2021 16:01
@github-actions github-actions bot added the core label Mar 8, 2021
Member

@briancain briancain left a comment

Worked for me once I rebuilt my local Waypoint server image. One thing we should improve here (potentially in a separate PR) is a better way to check the state of the runner pod post-install. The Waypoint output reported the pod as ready, but kubectl showed it wasn't ready and was in a crash loop backoff. Maybe we can make that a bit better for any other runner pod install errors that could come up in the future.

@mitchellh
Contributor Author

Good idea! I think we can introduce a ListRunner API and then poll that for changes; I think that'd work well for now. That requires a totally new API, though, so it's probably out of scope for this PR.

@mitchellh
Contributor Author

Actually, I just noticed this is the K8s PR specifically 😄 I think adding a pod status check is probably a good idea. Let me look into that now.

@briancain
Member

@mitchellh maybe it could be similar to what we do on Deploy for k8s?

	// Excerpt from the k8s platform's deploy status check; this code runs inside
	// a polling callback (wait.PollImmediate or similar) that re-lists the
	// deployment's pods on each tick.
	for _, p := range pods.Items {
		for _, cs := range p.Status.ContainerStatuses {
			if cs.Ready {
				continue
			}
			if cs.State.Waiting != nil {
				// TODO: handle other pod failures here
				if cs.State.Waiting.Reason == "ImagePullBackOff" ||
					cs.State.Waiting.Reason == "ErrImagePull" {
					detectedError = "Pod unable to access Docker image"
					k8error = cs.State.Waiting.Message
				}
			}
		}
	}
	if detectedError != "" && !reportedError {
		// we use ui output here instead of a step group, otherwise the warning
		// gets swallowed up on the next poll iteration
		ui.Output("Detected pods having an issue starting - %s: %s",
			detectedError, k8error, terminal.WithWarningStyle())
		reportedError = true
		// force a faster rerender
		lastStatus = time.Time{}
	}
	return false, nil
}) // end of the polling callback
if err != nil {
	if err == wait.ErrWaitTimeout {
		err = fmt.Errorf("Deployment was not able to start pods after %s", timeout)
	}
	return nil, err
}

@mitchellh
Contributor Author

Just pushed an easy short-term fix :) I noticed we were checking the ready count for equality, but it starts out at 0, so we didn't wait at all. This now checks that ready is > 0 before moving on.
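
A minimal sketch of what that short-term fix amounts to (assumed names and wiring, not the exact commit): poll the runner Deployment until at least one replica reports ready, instead of comparing the ready count against a desired count that also starts at 0:

package runnerinstall

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitRunnerReady waits until the runner Deployment has at least one ready
// replica. Comparing ReadyReplicas to Replicas would succeed immediately,
// because both start at 0 before any pod has actually been scheduled.
// The "waypoint-runner" Deployment name is an assumption for illustration.
func waitRunnerReady(ctx context.Context, cs *kubernetes.Clientset, ns string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		dep, err := cs.AppsV1().Deployments(ns).Get(ctx, "waypoint-runner", metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return dep.Status.ReadyReplicas > 0, nil
	})
}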

Member

@briancain briancain left a comment

Nice! The short-term fix works for me. It ends up polling for a while before exiting, rather than continuing on and saying it's ready 👍

@mitchellh mitchellh merged commit 5a6daf7 into f-runner-mgmt Mar 8, 2021
@mitchellh mitchellh deleted the f-runner-k8s branch March 8, 2021 19:27