
Kubernetes Runner Management #1167

Merged
mitchellh merged 4 commits into f-runner-mgmt from f-runner-k8s on Mar 8, 2021

Conversation

mitchellh
Contributor

This adds Kubernetes support for runner management.

I use a Deployment for runners since they have no state.

The only sneaky thing here is that during uninstall of a runner, I grab the cpu and memory requirements of the last install and persist them so we can use them on the next install (in the same process), which happens during an upgrade. I do this because we don't ask for these flags in the upgrade command, and I think we shouldn't; we should just copy the existing settings.

Most of the LOC in this PR is just extracting client initialization into a standalone function. 😄
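
As a rough illustration only (not the PR's actual code — the Deployment name "waypoint-runner", the container name "runner", and the package name are assumptions), persisting the previous install's resource settings with client-go could look something like this:

package runnerinstall

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// capturedResources holds the cpu/memory requests grabbed during uninstall so
// the next install (e.g. during an upgrade) can reuse them.
type capturedResources struct {
	CPU    string
	Memory string
}

// captureRunnerResources reads the resource requests off the existing runner
// Deployment before it is deleted. Deployment and container names are assumed.
func captureRunnerResources(ctx context.Context, cs *kubernetes.Clientset, ns string) (*capturedResources, error) {
	dep, err := cs.AppsV1().Deployments(ns).Get(ctx, "waypoint-runner", metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("fetching runner deployment: %w", err)
	}

	var out capturedResources
	for _, c := range dep.Spec.Template.Spec.Containers {
		if c.Name != "runner" {
			continue
		}
		if v, ok := c.Resources.Requests[corev1.ResourceCPU]; ok {
			out.CPU = v.String()
		}
		if v, ok := c.Resources.Requests[corev1.ResourceMemory]; ok {
			out.Memory = v.String()
		}
	}
	return &out, nil
}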

@mitchellh mitchellh requested review from briancain and a team March 8, 2021 16:01
@github-actions github-actions bot added the core label Mar 8, 2021
Member

@briancain briancain left a comment

Worked for me once I rebuilt my local Waypoint server image. One thing we should improve here (potentially in a separate PR) is a better way to check the state of the runner pod post-install. The Waypoint output reported the pod as ready, but kubectl showed it wasn't ready and was in a crash loop backoff. Maybe we can make that a bit better for any other runner pod install errors that could come up in the future.

@mitchellh
Contributor Author

Good idea! I think we can introduce a ListRunner API and then poll that for changes; I think that'd work well for now. That requires a totally new API, though, so it's probably out of scope for this PR.

@mitchellh
Contributor Author

Actually, I just noticed this is the K8s PR specifically 😄 I think adding a pod status check is probably a good idea. Let me look into that now.

@briancain
Member

@mitchellh maybe it could be similar to what we do on Deploy for k8s?

	// Excerpt from the k8s platform's deploy status check; this code runs inside
	// a polling callback (wait.PollImmediate or similar) that re-lists the
	// deployment's pods on each tick.
	for _, p := range pods.Items {
		for _, cs := range p.Status.ContainerStatuses {
			if cs.Ready {
				continue
			}
			if cs.State.Waiting != nil {
				// TODO: handle other pod failures here
				if cs.State.Waiting.Reason == "ImagePullBackOff" ||
					cs.State.Waiting.Reason == "ErrImagePull" {
					detectedError = "Pod unable to access Docker image"
					k8error = cs.State.Waiting.Message
				}
			}
		}
	}
	if detectedError != "" && !reportedError {
		// we use ui output here instead of a step group, otherwise the warning
		// gets swallowed up on the next poll iteration
		ui.Output("Detected pods having an issue starting - %s: %s",
			detectedError, k8error, terminal.WithWarningStyle())
		reportedError = true
		// force a faster rerender
		lastStatus = time.Time{}
	}
	return false, nil
}) // end of the polling callback
if err != nil {
	if err == wait.ErrWaitTimeout {
		err = fmt.Errorf("Deployment was not able to start pods after %s", timeout)
	}
	return nil, err
}

@mitchellh
Contributor Author

Just pushed an easy short-term fix :) I noticed we were checking the ready count for equality, but it starts out at 0, so we didn't wait at all. This now checks that ready is > 0 before moving on.
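
A minimal sketch of what that short-term fix amounts to (assumed names and wiring, not the exact commit): poll the runner Deployment until at least one replica reports ready, instead of comparing the ready count against a desired count that also starts at 0:

package runnerinstall

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitRunnerReady waits until the runner Deployment has at least one ready
// replica. Comparing ReadyReplicas to Replicas would succeed immediately,
// because both start at 0 before any pod has actually been scheduled.
// The "waypoint-runner" Deployment name is an assumption for illustration.
func waitRunnerReady(ctx context.Context, cs *kubernetes.Clientset, ns string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		dep, err := cs.AppsV1().Deployments(ns).Get(ctx, "waypoint-runner", metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return dep.Status.ReadyReplicas > 0, nil
	})
}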

Member

@briancain briancain left a comment

Nice! The short-term fix works for me. It ends up polling for a while before exiting, rather than continuing on and saying it's ready 👍

@mitchellh mitchellh merged commit 5a6daf7 into f-runner-mgmt Mar 8, 2021
@mitchellh mitchellh deleted the f-runner-k8s branch March 8, 2021 19:27