Implement keepalived load balancer #4344
Conversation
Force-pushed from 62f99f7 to 0d7de41
Force-pushed from 7c5ef55 to 0871a4b
Force-pushed from 7406008 to d5797f1
cmd/controller/controller.go
Outdated
```go
@@ -232,11 +232,16 @@ func (c *command) start(ctx context.Context) error {
	if c.SingleNode {
		return errors.New("control plane load balancing cannot be used in a single-node cluster")
	}
	if cplb.Type != v1beta1.CPLBTypeKeepalived {
```
IMO we should do this in config validation already.
Agreed, this is done in a separate commit because rewriting the git history was quite complex...
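For context, a rough sketch of what hooking this into the general config validation phase could look like. The `Validate()` wiring, receiver, and `Keepalived` field are assumptions about the config API shape, not the PR's actual code; only `CPLBTypeKeepalived` and `ValidateVRRPInstances` appear in the PR itself:

```go
package v1beta1

import "fmt"

// Sketch only: run the keepalived checks as part of the cluster
// config's regular validation phase instead of at controller start.
func (c *ControlPlaneLoadBalancingSpec) Validate() (errs []error) {
	if c == nil {
		return nil
	}
	if c.Type != CPLBTypeKeepalived {
		errs = append(errs, fmt.Errorf("unsupported CPLB type: %q", c.Type))
	}
	// Reuse the keepalived-specific validation discussed below.
	if err := c.Keepalived.ValidateVRRPInstances(nil); err != nil {
		errs = append(errs, err)
	}
	return errs
}
```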
docs/cplb.md
Outdated
```
@@ -3,17 +3,23 @@
For clusters that don't have an [externally managed load balancer](high-availability.md#load-balancer) for the k0s
control plane, there is another option to get a highly available control plane called control plane load balancing (CPLB).

CPLB allows automatic assigned of predefined IP addresses using VRRP across masters.
CPLB has two features that often will be combined, but normally will be used together: VRRP Instances, which allows
```
> CPLB has two features that often will be combined, but normally will be used together
This doesn't sound right 😄
It doesn't 😅. Fixed
docs/cplb.md
Outdated
```
* If `VirtualServers` are used, the cluster configuration doesn't specify a non-empty
  [`spec.api.externalAddress`][specapi]. `VRRPInstances` are compatible.
```
hmm, I don't really get what this means. Could you rephrase this a bit?
I rephrased it, please review it again. I think it's better now, but I'm not quite convinced, to be honest.
```go
func (r *CPLBReconciler) watchAPIServers(ctx context.Context) {
	// Before starting check that the API is actually responding
	for {
```
Maybe we could utilize our internal `watch` helper here? IMO it would be much simpler, and it already handles retries etc.
Changed this as a separate commit. Can be squashed.
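For reference, the underlying pattern (whether via the internal helper or plain client-go) is a watch on the `kubernetes` Endpoints object in the `default` namespace that backs off and re-establishes itself on failure. A minimal client-go sketch; the internal helper's actual k0s API is not shown here and may differ:

```go
import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Minimal sketch of a self-retrying Endpoints watch.
func watchAPIServers(ctx context.Context, client kubernetes.Interface, updates chan<- *corev1.Endpoints) {
	for {
		w, err := client.CoreV1().Endpoints("default").Watch(ctx, metav1.ListOptions{
			FieldSelector: "metadata.name=kubernetes",
		})
		if err != nil {
			// API server not responding yet: back off and retry,
			// unless the context has been cancelled.
			select {
			case <-ctx.Done():
				return
			case <-time.After(5 * time.Second):
			}
			continue
		}
		for event := range w.ResultChan() {
			if ep, ok := event.Object.(*corev1.Endpoints); ok {
				updates <- ep
			}
		}
		// Watch channel closed by the server; loop and re-establish.
	}
}
```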
```go
if err := k.configureDummy(); err != nil {
	return fmt.Errorf("failed to configure dummy interface: %w", err)
}

if err := k.Config.ValidateVRRPInstances(nil); err != nil {
```
Again, IMO we need to hook validation to general config validation "phase"
Agreed, it's added as a separate commit because rewriting the git history wasn't easy...
Converting to draft while I address the concerns.
```go
// Wait for the supervisor to start keepalived before
// watching for endpoint changes
process := k.supervisor.GetProcess()
for process == nil {
	k.log.Info("Waiting for keepalived to start")
	time.Sleep(5 * time.Second)
	process = k.supervisor.GetProcess()
}
```
It might work without this loop, if the nil check would be moved into the update loop.
If you feel strongly about it, I will change it, but I prefer it this way.
Even if the process is dead, `s.GetProcess` won't return nil once keepalived is started the first time; it will just return a process with an invalid PID. Adding it into the loop means we're doing a check that will always return false. Also, we don't care if the PID is of an older dead process, because the PID is obtained AFTER writing the template, which means the new process will be using the new file.
But if you feel very strongly about it I'll move the nil check inside; there aren't big consequences anyway, maybe faster reloads when the cluster is bootstrapping.
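The ordering being described, sketched with illustrative names (only `k.supervisor.GetProcess()` appears in the PR; `reconfigure`, `configFilePath`, and the assumption that `GetProcess` returns an `*os.Process` are hypothetical):

```go
import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// Sketch: the config file is rewritten before the process is looked up,
// so even if the PID belongs to a process that has since been replaced,
// whatever keepalived ends up running reads the new file.
func (k *Keepalived) reconfigure(rendered []byte) error {
	// 1. Persist the new configuration first.
	if err := os.WriteFile(k.configFilePath, rendered, 0640); err != nil {
		return fmt.Errorf("failed to write keepalived config: %w", err)
	}
	// 2. Only now fetch the process; a stale PID is harmless here.
	process := k.supervisor.GetProcess()
	if process == nil {
		return errors.New("keepalived has not been started yet")
	}
	// 3. SIGHUP tells keepalived to reload its configuration.
	return process.Signal(syscall.SIGHUP)
}
```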
> If you feel strongly about it, I will change it, but I prefer it this way.
Currently, the loop cannot be cancelled externally. That's why I figured that we could just inline the nil check below. Would be less code, too. I'm fine with keeping it, as long as it can be cancelled.
> it will just return a process with an invalid PID
Right. I'd argue that this is a current shortcoming of Supervisor, though.
@twz123 I accidentally edited your comment with my reply 🤦
The reply is:
I added a limit of 6 times, that's 30 seconds, which should be way more than enough time to start keepalived. It gets cancelled eventually; it's not the prettiest, but it should fix the problem.
Unfortunately I deleted your comment saying that it had to be possible to cancel it from the outside.
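A sketch of that bounded, cancellable wait (6 attempts, 5 seconds apart, so roughly 30 seconds in total); this is a fragment following the names in the snippet quoted above, not the exact PR code:

```go
// Sketch: bounded wait for keepalived to start, cancellable via ctx.
process := k.supervisor.GetProcess()
for i := 0; process == nil && i < 6; i++ {
	k.log.Info("Waiting for keepalived to start")
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(5 * time.Second):
	}
	process = k.supervisor.GetProcess()
}
if process == nil {
	return errors.New("keepalived did not start in time")
}
```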
> Unfortunately I deleted your comment saying that it had to be possible to cancel it from the outside.
Restored it from the history 🙃
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Force-pushed from dd92d37 to 7eff376
pkg/apis/k0s/v1beta1/cplb.go
Outdated
```go
	TUNLBKind KeepalivedLBKind = "TUN"
)

type RealServer struct {
```
Are they? Still not able to spot them 🙈
```go
@@ -108,6 +140,10 @@ func (k *Keepalived) Start(_ context.Context) error {
	DataDir: k.K0sVars.DataDir,
	UID:     k.uid,
}

if len(k.Config.VirtualServers) > 0 {
	go k.watchReconcilerUpdates()
```
I cannot spot it. Can you verify?
Everything should be addressed now.
One last thing™. I think we can just omit the `delay_loop` if it's zero.
```
{{ if gt (len $RealServers) 0 }}
{{ range .VirtualServers }}
virtual_server {{ .IPAddress }} {{ $APIServerPort }} {
    delay_loop {{ .DelayLoop.Seconds }}
```
Suggested change:

```diff
-    delay_loop {{ .DelayLoop.Seconds }}
+    {{- if gt .DelayLoop.Seconds 0.0 }}
+    delay_loop {{ .DelayLoop.Seconds }}
+    {{- end }}
```
We can't. In keepalived it defaults to 60. I set it to 0 because I think it doesn't make sense to delay it at all in CPLB, because it's added after kubernetes.default.svc is reconciled and hence has passed all the relevant local health checks.
> We can't. In keepalived it defaults to 60.
I see. Apparently I found yet another delay_loop in the keepalived codebase 😅
> I set it to 0 because I think it doesn't make sense to delay it at all in CPLB because it's added after kubernetes.default.svc is reconciled and hence has passed all the relevant local health checks.
I don't fully understand the implications, but a delay_loop of 0 is not a thing in keepalived (it will use the default of 60, then):
```console
# /var/lib/k0s/bin/keepalived --dont-fork --use-file /run/k0s/keepalived.conf --no-syslog --log-console --log-detail --dump-conf -t
(/run/k0s/keepalived.conf: Line 33) number '0' outside range [0.000001, 18446744073709.551614]
(/run/k0s/keepalived.conf: Line 33) virtual server delay loop '0' invalid - ignoring
```
But let's address this in a subsequent PR 😄
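For illustration, the kind of `virtual_server` block the template renders, with placeholder addresses and a `delay_loop` inside the valid range reported above. The `lb_kind TUN` value is grounded in the PR's `TUNLBKind` constant; the remaining fields are standard keepalived syntax and not necessarily exactly what this template emits:

```
virtual_server 192.168.1.100 6443 {
    delay_loop 10
    lb_kind TUN
    real_server 192.168.1.10 6443 {
        weight 1
    }
}
```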
Force-pushed from cf8c331 to 1c9689a
And move validation to clusterconfig. Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
The field wasn't required and didn't serve any actual purpose, so remove it and auto-generate it always. Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com> Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Description
For easier review I split the PR into 5 commits:
1- Add API types and autogenerated code
2- Controller changes to make keepalived work
3- Updated inttest
4- Encapsulate keepalived config in a new struct
5- Documentation
I also found a typo, so I sneaked in a tiny one-line commit.
This is part of #4181
Type of change
How Has This Been Tested?
Checklist: