add controller-driven kube-vip load balancer for K3S_BASE clusters #5683
eriknordmark merged 2 commits into lf-edge:master
Conversation
force-pushed from ceda466 to 9c5b309
// collectLBPoolStatus reads the kubevip ConfigMap from kube-system to get the configured
// load balancer pool, and gathers IPs currently allocated to LoadBalancer-type services.
// Returns nil if the kubevip ConfigMap does not exist (kubevip not yet deployed).
func collectLBPoolStatus(clientset *kubernetes.Clientset, services []types.KubeServiceInfo) *types.KubeLBPoolStatus {
Do we publish this LBPoolStatus in any EVE pub/sub calls for collect-info?
This LBPoolStatus is published by zedkube in KubeUserServices, so it is part of the collect-info.
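The pool/allocation bookkeeping discussed here can be sketched as a small self-contained helper. This is an illustrative reduction, not the PR's code: the real collectLBPoolStatus reads the kubevip ConfigMap via client-go, which is omitted here, and the function and parameter names below are assumptions.

```go
package main

import (
	"fmt"
	"net"
)

// poolAllocations sketches the allocation-tracking part of the logic: given
// the pool CIDR from the kubevip ConfigMap and the external IPs of
// LoadBalancer-type services, it returns the service IPs that fall inside
// the pool plus the total pool size.
func poolAllocations(poolCIDR string, serviceIPs []string) (inPool []string, poolSize int, err error) {
	_, ipnet, err := net.ParseCIDR(poolCIDR)
	if err != nil {
		return nil, 0, err
	}
	ones, bits := ipnet.Mask.Size()
	poolSize = 1 << (bits - ones)
	for _, s := range serviceIPs {
		if ip := net.ParseIP(s); ip != nil && ipnet.Contains(ip) {
			inPool = append(inPool, s)
		}
	}
	return inPool, poolSize, nil
}

func main() {
	used, size, _ := poolAllocations("10.1.1.0/28", []string{"10.1.1.2", "192.168.0.5"})
	fmt.Println(used, size) // [10.1.1.2] 16
}
```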
force-pushed from 9c5b309 to 7952e18
andrewd-zededa
left a comment
@naiming-zededa A few questions and a small requested change to the PR description.
if ifName == "" || len(cidrs) == 0 {
    continue
}
if _, _, lbErr := net.ParseCIDR(cidrs[0]); lbErr != nil {
Only the first address CIDR is parsed here. Is len(cidrs) > 1 an error case, or just not supported currently?
Yes, we currently only support one interface with one prefix; I'll add a log warning message here if there is more than one.
@naiming-zededa let me know when you've updated the PR with the log message.
@eriknordmark the log message is already updated above, for the warning.
if len(cidrs) > 1 {
log.Warnf("parseEdgeNodeClusterConfig: interface %s has %d CIDRs, only the first is supported; ignoring the rest",
ifName, len(cidrs))
}
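The first-CIDR-only behavior above can be shown as a self-contained sketch (the function name is hypothetical; the real code lives in parseconfig.go and uses pillar's logger rather than the stdlib log):

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// firstLBCidr mirrors the behavior discussed above: only the first CIDR of
// the interface is used, and any extra CIDRs are logged and ignored.
func firstLBCidr(ifName string, cidrs []string) (*net.IPNet, error) {
	if ifName == "" || len(cidrs) == 0 {
		return nil, fmt.Errorf("no LB interface or CIDR configured")
	}
	if len(cidrs) > 1 {
		log.Printf("interface %s has %d CIDRs, only the first is supported; ignoring the rest",
			ifName, len(cidrs))
	}
	_, ipnet, err := net.ParseCIDR(cidrs[0])
	if err != nil {
		return nil, err
	}
	return ipnet, nil
}

func main() {
	ipnet, err := firstLBCidr("eth0", []string{"10.1.1.0/24", "10.2.2.0/24"})
	fmt.Println(ipnet, err) // second CIDR is ignored with a warning
}
```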
Also updated in cluster-init.sh for the log:
if [ "$lb_count" -gt 1 ] 2>/dev/null; then
logmsg "check_kubevip_lb: $lb_count LB interfaces configured, only the first is supported; ignoring the rest"
fi
local lb_iface="" lb_cidr="" lb=""
if [ -f "$enc_status_file" ]; then
    enc_data=$(cat "$enc_status_file")
    lb_iface=$(echo "$enc_data" | jq -r '.LBInterfaces[0].Interface // ""')
Is only 1 LB interface supported currently?
Yes, I'll add a log message here for the potential more-than-one case.
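The shell above pulls the first entry of LBInterfaces out of the EdgeNodeClusterStatus JSON with jq. The same parse, sketched in Go (struct and field names are inferred from the jq expression and the PR description's "interface + CIDR string", so treat them as assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lbInterface models one LBInterfaces entry as suggested by
// '.LBInterfaces[0].Interface' above; field names are illustrative.
type lbInterface struct {
	Interface string
	CIDR      string
}

type clusterStatus struct {
	LBInterfaces []lbInterface
}

// firstLB returns the first (and currently only supported) LB interface,
// or ok=false when none is configured in the status JSON.
func firstLB(encData []byte) (lbInterface, bool) {
	var st clusterStatus
	if err := json.Unmarshal(encData, &st); err != nil || len(st.LBInterfaces) == 0 {
		return lbInterface{}, false
	}
	return st.LBInterfaces[0], true
}

func main() {
	data := []byte(`{"LBInterfaces":[{"Interface":"eth0","CIDR":"10.1.1.0/24"}]}`)
	lb, ok := firstLB(data)
	fmt.Println(lb.Interface, lb.CIDR, ok) // eth0 10.1.1.0/24 true
}
```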
labels:
  app.kubernetes.io/name: kube-vip-ds
- app.kubernetes.io/version: v0.8.9
+ app.kubernetes.io/version: v1.1.0
What was the motivator for the upgrade here? Was this a required feature in the major version change? Please note this in the PR description.
The older version was put in before, but we didn't have a use for kube-vip until now, so I updated it to the latest. kube-vip v1.1.0 has been supported since k3s 1.26, so it has been around for a while. I'll add a description on this.
// Only the bootstrap node manages kube-vip load balancing; other nodes leave
// LBInterfaces empty so cluster-init.sh does not apply kubevip.
if z.clusterConfig.BootstrapNode {
    status.LBInterfaces = z.clusterConfig.LBInterfaces
}
How will the LB be handled if the bootstrap node has failed and is pending replacement? Can the stats lease holder perform this instead?
Not really. If the seed node is down before the ClusterStatus with this config is applied, then we will not have the LB service until a new seed node is identified from the controller side. But if it is already applied, then it does not matter; it is in the cluster regardless of whether the seed node is there. That would be a very rare corner case, and we will handle it separately.
force-pushed from 7952e18 to 97eb753
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##            master    #5683       +/-   ##
===========================================
+ Coverage    19.52%   29.87%   +10.34%
===========================================
  Files           19       18        -1
  Lines         3021     2417      -604
===========================================
+ Hits           590      722      +132
+ Misses        2310     1549      -761
- Partials       121      146       +25
# successfully applied config so that container/device restarts do not
# trigger a redundant re-apply (which causes "configmaps kubevip already
# exists" errors from kube-vip-cloud-provider).
KUBEVIP_STATE_FILE="/var/lib/kubevip-applied"
@naiming-zededa is this file and the kubevip config-map available in collect-info to view?
@andrewd-zededa no. I have updated 'collect-info.sh' and now list all of /var/lib, so we don't have to add directories one by one later. As for the config-map, there is nothing new there; it only has the interface and prefix we already know, so I'm not adding that.
force-pushed from 97eb753 to 27f70ac
Please increase the script version
# Uses a targeted merge patch on developerConfiguration.featureGates only, so that
# permittedHostDevices (PCIe/GPU/USB passthrough) is never disturbed.
Kubevirt_migrate_feature_gates() {
    if [ -f /var/lib/base-k3s-mode ]; then
Is this related to this PR? Or is it a separate fix that should maybe be backported to 16.0 LTS?
No, this is only a fix for a previously committed PR. That one enhanced the virtio driver for the eve-k image; it is not part of the 16.0 LTS.
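For context on the targeted merge patch mentioned above: a JSON merge patch that touches only the featureGates leaf of the KubeVirt CR could look like the fragment below. The gate list and the exact path are assumptions based on the KubeVirt CR layout, not the script's literal patch.

```json
{
  "spec": {
    "configuration": {
      "developerConfiguration": {
        "featureGates": ["LiveMigration"]
      }
    }
  }
}
```

JSON merge patch (RFC 7386) merges nested objects recursively and replaces only the named array, so sibling fields such as permittedHostDevices are left untouched.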
Thanks for the detailed documentation, helps a lot!
force-pushed from 27f70ac to cddcacf
@naiming-zededa looks like a docker hash issue:
force-pushed from cddcacf to 1b34d5c
thanks @andrewd-zededa. Just updated.
This PR implements controller-driven Kubernetes LoadBalancer services
for CLUSTER_TYPE_K3S_BASE in eve-k.
- pkg/pillar/types/clustertypes.go: Add LBInterfaceConfig (interface +
CIDR string) and LBInterfaces []LBInterfaceConfig to both
EdgeNodeClusterConfig and EdgeNodeClusterStatus.
- pkg/pillar/cmd/zedagent/parseconfig.go: Parse LoadBalancerService
from the controller proto and populate
EdgeNodeClusterConfig.LBInterfaces (K3S_BASE only; first
interface/CIDR entry applied).
- pkg/pillar/cmd/zedkube/clusterstatus.go: Relay LBInterfaces from
EdgeNodeClusterConfig into EdgeNodeClusterStatus on the bootstrap
node only; non-bootstrap nodes publish an empty list so they do not
trigger kube-vip setup.
- pkg/pillar/dpcmanager/dns.go: Filter kube-vip VIPs out of
DeviceNetworkStatus.AddrInfoList using the LBInterfaces CIDR range,
preventing VIPs from being used as source addresses for
controller-bound traffic.
- pkg/kube/cluster-init.sh: Add check_kubevip_lb loop that reads
EdgeNodeClusterStatus JSON each iteration and calls kubevip-apply.sh
or kubevip-delete.sh when the LB config changes. Persists
last-applied state to avoid redundant re-applies across restarts.
- pkg/kube/kubevip-apply.sh / kubevip-delete.sh: Scripts to
install/remove the kube-vip DaemonSet and kube-vip-cloud-provider
Deployment, configuring the IP pool via a kubevip ConfigMap.
- pkg/kube/kubevip-ds.yaml: kube-vip DaemonSet manifest (ARP mode,
control-plane nodes).
- pkg/kube/config.yaml: Disable k3s built-in ServiceLB (servicelb) and
Traefik for K3S_BASE — kube-vip replaces ServiceLB; users bring
their own ingress.
- pkg/pillar/docs/zedkube.md: Document the feature with an overview
diagram, data-flow, EVE-API proto, and DeviceNetworkStatus filtering
notes.
Signed-off-by: naiming-zededa <naiming@zededa.com>
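The dns.go filtering step in the list above can be sketched as follows (the helper name and string-based address representation are illustrative; the real code operates on DeviceNetworkStatus.AddrInfoList entries):

```go
package main

import (
	"fmt"
	"net"
)

// dropVIPs removes any address that falls inside the kube-vip LB pool so a
// VIP is never picked as a source address for controller-bound traffic.
// This is an illustrative reduction of the dns.go change, not its code.
func dropVIPs(addrs []string, lbCIDR string) []string {
	_, pool, err := net.ParseCIDR(lbCIDR)
	if err != nil {
		return addrs // no valid pool configured; nothing to filter
	}
	var kept []string
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip != nil && pool.Contains(ip) {
			continue // this is a kube-vip VIP; hide it from device network status
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	kept := dropVIPs([]string{"192.168.1.10", "10.1.1.5"}, "10.1.1.0/24")
	fmt.Println(kept) // [192.168.1.10]
}
```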
force-pushed from 1b34d5c to 3628d23
… K3S_BASE - add the pillar vendor files, updated eve-api. Signed-off-by: naiming-zededa <naiming@zededa.com>
force-pushed from 3628d23 to 510f519
andrewd-zededa
left a comment
LGTM, but should wait for master riscv build fix before merge I guess.
- PR lf-edge#5720 for using the logical label for the clustering interface, and PR lf-edge#5683 for clustering load-balancing; this one adds the handling of the logical label in the cluster load-balancing interface. Signed-off-by: naiming-zededa <naiming@zededa.com>
How to test and validate this PR
Create a native orchestration type cluster on EVE devices, and in the controller configuration enable 'loadbalancing' for the cluster, specifying the interface and IP prefix of the LB.
On the Kubernetes side, the user supplies the helm/YAML definition of an App and a Service of type 'LoadBalancer'. Verify that the Service gets an IP address allocated on the interface and that an endpoint is created for the App.
Use a client to access that IP address and port, and verify that when one device is down, the IP is reallocated to another device of the cluster and the app is still reachable.
There are many different ways to use the LB service; see the examples in pkg/pillar/docs/zedkube.md for details.
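A minimal Service of type LoadBalancer for the validation above might look like this (names and ports are illustrative, not from the PR):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-lb            # illustrative name
spec:
  type: LoadBalancer       # kube-vip allocates an IP from the configured pool
  selector:
    app: demo              # must match the test App's pod labels
  ports:
    - port: 80
      targetPort: 8080
```

After applying it, `kubectl get svc demo-lb` should show an EXTERNAL-IP from the configured LB prefix.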
Changelog notes
add controller-driven kube-vip load balancer for K3S_BASE clusters