Existing services with more than two external IPs prevent MetalLB from completing setup #1431

Closed
penguineer opened this issue Jun 14, 2022 · 19 comments · Fixed by #1778

@penguineer

The initial situation:

  • k3s (1.24.1+k3s1) cluster with three worker nodes
  • ingress-nginx running, which got an external IP for each of the worker nodes

I decided to add MetalLB as a load balancer, mostly to have a fixed IP for firewall and DNS handling. Setup via the Helm chart went well, but then I got this weird error message:

reason: "nolbIPsIPFamily"
message: "Failed to retrieve lbIPs family"

Searching did not help much, but it did turn up this part of the code:

level.Error(l).Log("event", "clearAssignment", "reason", "nolbIPsIPFamily", "msg", "Failed to retrieve lbIPs family")

After reading some more code I ended up here:

func ForAddresses(ips []string) (Family, error) {

My understanding: MetalLB tries to determine IPv4/IPv6/dual-stack behavior based on the Service's IP addresses. In this regard, the service is expected to have either one IPv4 address, one IPv6 address, or an IPv4/IPv6 pair.
My service, at this point, had three IPv4 addresses, which were discarded as invalid in a dual-stack context.
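
Reading further, the check seems to boil down to something like the following (my own condensed sketch of the internal/ipfamily logic, not the verbatim source):

package ipfamily

import (
	"fmt"
	"net"
)

type Family string

const (
	IPv4      Family = "ipv4"
	IPv6      Family = "ipv6"
	DualStack Family = "dual"
	Unknown   Family = "unknown"
)

// ForAddresses maps a Service's load balancer IPs to an address family.
// Only three shapes pass: one IPv4, one IPv6, or one of each.
func ForAddresses(ips []string) (Family, error) {
	switch len(ips) {
	case 1:
		ip := net.ParseIP(ips[0])
		if ip == nil {
			return Unknown, fmt.Errorf("invalid ip %q", ips[0])
		}
		if ip.To4() != nil {
			return IPv4, nil
		}
		return IPv6, nil
	case 2:
		first, second := net.ParseIP(ips[0]), net.ParseIP(ips[1])
		if first == nil || second == nil {
			return Unknown, fmt.Errorf("invalid ips %v", ips)
		}
		// Two addresses must be a dual-stack pair: one IPv4, one IPv6.
		if (first.To4() != nil) == (second.To4() != nil) {
			return Unknown, fmt.Errorf("same address family %v", ips)
		}
		return DualStack, nil
	default:
		// Anything else (zero, or three and more addresses, like my
		// service's three IPv4s) is rejected outright.
		return Unknown, fmt.Errorf("invalid ips length %d %v", len(ips), ips)
	}
}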

Unfortunately, because of this, MetalLB also did not go ahead and assign the new address to my Ingress, leaving it with the three addresses that made MetalLB run into the same issue over and over.

After two hours of reading code and thinking that I was too stupid to set up the LB based on its documentation, I decided to reinstall the nginx ingress controller → this solved my problem, and the Ingress now has the correct IP address.

So, what happened: MetalLB stumbled over existing services with external IPs that did not match its expectations.
What I expected: MetalLB would re-configure these services.

To solve this issue I needed to delete these services and deploy them again.
(Luckily this wasn't too hard with Helm, but it means that MetalLB cannot be introduced into a larger existing setup.)

I am not sure how well the problem can be solved, but I'm hoping that, even if it cannot or will not be fixed, with this issue there is another place to end a search other than the line of code that generates this error.

Thank you and keep up the good work!

@protoz

protoz commented Jul 5, 2022

I am having a similar issue. I am not trying to use multiple external IPs, but something keeps assigning the host IPs as well as the specified load balancer port. I'm at a loss as to what is going on: I can hit those host addresses and reach my resources, but that defeats the purpose of a load balancer.

@github-actions

github-actions bot commented Nov 5, 2022

This issue has been automatically marked as stale because it has been open 30 days
with no activity. This issue will be closed in 10 days unless you do one of the following:

  • respond to this issue
  • have one of these labels applied: bug,good first issue,help wanted,hold,enhancement,documentation,question

@penguineer
Author

Has this bug been fixed?

@protoz

protoz commented Nov 5, 2022

My issue ended up being that, during a k3s upgrade, the config was reset and the built-in LB was also running. After disabling that and cleaning up the environment by deleting both MetalLB and ServiceLB, then reinstalling MetalLB, I had a working environment again.

@penguineer
Author

penguineer commented Nov 5, 2022

I think that an update should not break anything; one of Kubernetes' core features is to automatically restore the configured state. MetalLB cannot handle all transitions, especially when another LB was there before.

My K3S upgrades are automated and I do not want to worry about fixing MetalLB every time this happens.

@noesberger

I have the same issue. In my k3s installation I'm using the built-in Traefik and LB.
I'm trying to switch to MetalLB, so I disabled the integrated ServiceLB and installed MetalLB.

But my system is still showing the old configuration for the traefik and grafana services.

NAMESPACE              NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP                                 PORT(S)                      AGE
default                kubernetes                        ClusterIP      10.43.0.1       <none>                                      443/TCP                      259d
kube-system            kube-dns                          ClusterIP      10.43.0.10      <none>                                      53/UDP,53/TCP,9153/TCP       259d
kube-system            metrics-server                    ClusterIP      10.43.52.43     <none>                                      443/TCP                      259d
kube-system            traefik                           LoadBalancer   10.43.210.150   192.168.1.170,192.168.1.171,192.168.1.172   80:32065/TCP,443:32113/TCP   8h
metallb-system         webhook-service                   ClusterIP      10.43.200.137   <none>                                      443/TCP                      124m
default                grafana                           LoadBalancer   10.43.151.220   192.168.1.170,192.168.1.171,192.168.1.172   3000:32394/TCP               67d

I also tried to delete the grafana svc, but the delete operation is just hanging.

Here is the MetalLB configuration:

---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.180-192.168.1.189

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default

and the error in the metallb controller log is:

{"caller":"service.go:74","event":"clearAssignment","level":"error","msg":"Failed to retrieve lbIPs family","reason":"nolbIPsIPFamily","ts":"2022-11-28T20:18:02Z"}

Any tips on how I can redo the svc configs for traefik and grafana with the correct configuration?

@net47

net47 commented Dec 22, 2022

I have a similar setup:

  • k3s version v1.24.9+k3s1
  • disabled traefik and servicelb

I installed MetalLB version v0.13.7 with the following L2 config:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-dmz
  namespace: metallb-system
spec:
  addresses:
  - 192.168.110.50-192.168.110.70

My service looks like this:

apiVersion: v1
kind: Service
metadata:
  name: tremmel-xyz
  labels:
    app: tremmel-xyz
  annotations:
    metallb.universe.tf/address-pool: lab-dmz
    metallb.universe.tf/loadBalancerIPs: 192.168.110.55
spec:
  ports:
    - port: 80
      name: tremmel-xyz
  selector:
    app: tremmel-xyz
  type: LoadBalancer

Unfortunately the IP doesn't get assigned; instead my node IPs 192.168.110.11, 192.168.110.12, and 192.168.110.13 get assigned (which basically works, but isn't the expected behaviour):

Name:                     tremmel-xyz
Namespace:                default
Labels:                   app=tremmel-xyz
Annotations:              metallb.universe.tf/address-pool: lab-dmz
                          metallb.universe.tf/loadBalancerIPs: 192.168.110.55
Selector:                 app=tremmel-xyz
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.43.6.56
IPs:                      10.43.6.56
LoadBalancer Ingress:     192.168.110.11, 192.168.110.12, 192.168.110.13
Port:                     tremmel-xyz  80/TCP
TargetPort:               80/TCP
NodePort:                 tremmel-xyz  32003/TCP
Endpoints:                10.42.1.5:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                Age                  From                Message
  ----     ------                ----                 ----                -------
  Normal   EnsuringLoadBalancer  115s                 service-controller  Ensuring load balancer
  Normal   AppliedDaemonSet      115s                 service-controller  Applied LoadBalancer DaemonSet kube-system/svclb-tremmel-xyz-97e89521
  Normal   IPAllocated           114s (x3 over 115s)  metallb-controller  Assigned IP ["192.168.110.55"]
  Normal   UpdatedLoadBalancer   114s                 service-controller  Updated LoadBalancer with new IPs: [192.168.110.55] -> [192.168.110.13]
  Warning  nolbIPsIPFamily       114s                 metallb-controller  Failed to retrieve LBIPs IPFamily for ["192.168.110.11" "192.168.110.12" "192.168.110.13"]: IPFamilyForAddresses: invalid ips length 3 ["192.168.110.11" "192.168.110.12" "192.168.110.13"]
  Normal   UpdatedLoadBalancer   114s                 service-controller  Updated LoadBalancer with new IPs: [192.168.110.55] -> [192.168.110.11 192.168.110.12 192.168.110.13]

Is this related to k3s or is this a bug?

@penguineer
Author

@net47 I had (I'm inclined to say "exactly") the same situation, also with the same error message.

The reason is that MetalLB inspects the existing configuration and finds a set of addresses it cannot interpret: anything other than a single address or a dual-stack IPv4/IPv6 pair is rejected.

For the initial cluster your configuration looks fine. MetalLB would need to handle this "cold start" case, where it is just being set up and an existing configuration does not fit the MetalLB scheme.
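
To make that concrete: feeding your three node IPs through the sketch I posted above trips the length check, while a genuine dual-stack pair is the only two-address shape that passes (hypothetical driver, just to illustrate):

func main() {
	// Three same-family addresses: rejected, matching the
	// "invalid ips length 3" warning in your events.
	_, err := ForAddresses([]string{"192.168.110.11", "192.168.110.12", "192.168.110.13"})
	fmt.Println(err) // invalid ips length 3 [192.168.110.11 192.168.110.12 192.168.110.13]

	// One IPv4 plus one IPv6 (fd00::55 is made up) would pass.
	fam, _ := ForAddresses([]string{"192.168.110.55", "fd00::55"})
	fmt.Println(fam) // dual
}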

@net47

net47 commented Dec 22, 2022

@penguineer thanks for the clarification! Does this apply to MetalLB in general, or are there differences between L2 and BGP?

@penguineer
Author

@penguineer thanks for the clarification! Does this apply to MetalLB in general, or are there differences between L2 and BGP?

This I don't know. I have an L2 setup only.

@auxworker

auxworker commented Dec 28, 2022

I'm running into this exact same issue with an L2 setup:

{"caller":"service.go:74","event":"clearAssignment","level":"error","msg":"Failed to retrieve lbIPs family","reason":"nolbIPsIPFamily","ts":"2022-12-28T20:58:14Z"}

k3s: v1.24.8+k3s1
metallb: v0.13.7

I tried re-imaging the machine a few times, installing k3s & metallb from scratch. I can reproduce this issue very consistently.

I have been able to install pihole (https://github.com/MoJo2600/pihole-kubernetes) inside k3s, but exposing some of the services is a no-go because of MetalLB: it eventually tries to assign the IPs of the k3s worker nodes (10.0.0.4 & 10.0.0.5) to the service even though that is not what I'm asking it to do; it's supposed to get 10.0.0.15.

kubectl describe service --namespace argocd helm-chart-pihole-dns-udp
Name:                     helm-chart-pihole-dns-udp
Namespace:                argocd
Labels:                   app=pihole
                          app.kubernetes.io/instance=helm-chart-pihole
                          chart=pihole-2.11.0
                          heritage=Helm
                          release=helm-chart-pihole
Annotations:              metallb.universe.tf/allow-shared-ip: pihole-svc
Selector:                 app=pihole,release=helm-chart-pihole
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.43.64.186
IPs:                      10.43.64.186
IP:                       10.0.0.15
LoadBalancer Ingress:     10.0.0.4, 10.0.0.5
Port:                     dns-udp  53/UDP
TargetPort:               dns-udp/UDP
NodePort:                 dns-udp  32537/UDP
Endpoints:                10.42.0.40:53
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     30993
Events:
  Type     Reason                Age                  From                Message
  ----     ------                ----                 ----                -------
  Normal   EnsuringLoadBalancer  117s                 service-controller  Ensuring load balancer
  Normal   AppliedDaemonSet      117s                 service-controller  Applied LoadBalancer DaemonSet kube-system/svclb-helm-chart-pihole-dns-udp-240f6482
  Normal   IPAllocated           116s (x2 over 117s)  metallb-controller  Assigned IP ["10.0.0.15"]
  Normal   UpdatedLoadBalancer   116s                 service-controller  Updated LoadBalancer with new IPs: [10.0.0.15] -> [10.0.0.4]
  Normal   nodeAssigned          116s (x3 over 117s)  metallb-speaker     announcing from node "master.infrastructure.router.local" with protocol "layer2"
  Normal   UpdatedLoadBalancer   115s                 service-controller  Updated LoadBalancer with new IPs: [10.0.0.15] -> [10.0.0.4 10.0.0.5]
  Warning  nolbIPsIPFamily       115s                 metallb-controller  Failed to retrieve LBIPs IPFamily for ["10.0.0.4" "10.0.0.5"]: IPFamilyForAddresses: same address family ["10.0.0.4" "10.0.0.5"]



@penguineer
Author

@auxworker Are you, by chance, missing the address pool annotation? I only see allow-shared-ip in your example.

I did not observe that MetalLB itself would assign the node IPs, only that it could not change the settings when they were already assigned.
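
Your events also show the other failure mode of the family check: two same-family addresses fail the dual-stack branch of the sketch I posted earlier, just as three addresses fail the length check:

_, err := ForAddresses([]string{"10.0.0.4", "10.0.0.5"})
fmt.Println(err) // same address family [10.0.0.4 10.0.0.5]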

@auxworker

The pihole chart assumes that MetalLB is used. I added the address-pool annotation and it didn't make any difference:

kubectl describe svc --namespace argocd helm-chart-pihole-dns-udp 
Name:                     helm-chart-pihole-dns-udp
Namespace:                argocd
Labels:                   app=pihole
                          app.kubernetes.io/instance=helm-chart-pihole
                          chart=pihole-2.11.0
                          heritage=Helm
                          release=helm-chart-pihole
Annotations:              metallb.universe.tf/address-pool: mypool1
                          metallb.universe.tf/allow-shared-ip: pihole-svc
Selector:                 app=pihole,release=helm-chart-pihole
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.43.248.246
IPs:                      10.43.248.246
IP:                       10.0.0.15
LoadBalancer Ingress:     10.0.0.4, 10.0.0.5
Port:                     dns-udp  53/UDP
TargetPort:               dns-udp/UDP
NodePort:                 dns-udp  32541/UDP
Endpoints:                10.42.0.40:53
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     30860
Events:
  Type     Reason                Age                From                Message
  ----     ------                ----               ----                -------
  Normal   EnsuringLoadBalancer  21s                service-controller  Ensuring load balancer
  Normal   AppliedDaemonSet      21s                service-controller  Applied LoadBalancer DaemonSet kube-system/svclb-helm-chart-pihole-dns-udp-46dd889c
  Normal   UpdatedLoadBalancer   19s                service-controller  Updated LoadBalancer with new IPs: [10.0.0.15] -> [10.0.0.5]
  Normal   IPAllocated           19s (x3 over 21s)  metallb-controller  Assigned IP ["10.0.0.15"]
  Normal   nodeAssigned          19s (x2 over 21s)  metallb-speaker     announcing from node "master.infrastructure.router.local" with protocol "layer2"
  Warning  nolbIPsIPFamily       19s                metallb-controller  Failed to retrieve LBIPs IPFamily for ["10.0.0.4" "10.0.0.5"]: IPFamilyForAddresses: same address family ["10.0.0.4" "10.0.0.5"]
  Normal   UpdatedLoadBalancer   19s                service-controller  Updated LoadBalancer with new IPs: [10.0.0.15] -> [10.0.0.4 10.0.0.5]

@penguineer
Author

@auxworker I assume you did define mypool1, the pool configured as the MetalLB address pool for pihole, in the MetalLB setup?

@auxworker

auxworker commented Dec 29, 2022

@auxworker I assume you did define mypool1, the pool configured as the MetalLB address pool for pihole, in the MetalLB setup?

here is my metallb config:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mypool1
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.10-10.0.0.20
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mypool1-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - mypool1

I'm able to assign IPs to another service without any issue:

kubectl describe svc --namespace argocd helm-chart-pihole-web 
Name:                     helm-chart-pihole-web
Namespace:                argocd
Labels:                   app=pihole
                          app.kubernetes.io/instance=helm-chart-pihole
                          chart=pihole-2.11.0
                          heritage=Helm
                          release=helm-chart-pihole
Annotations:              metallb.universe.tf/address-pool: mypool1
                          metallb.universe.tf/allow-shared-ip: pihole-svc
Selector:                 app=pihole,release=helm-chart-pihole
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.43.176.237
IPs:                      10.43.176.237
IP:                       10.0.0.19
LoadBalancer Ingress:     10.0.0.19
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31687/TCP
Endpoints:                10.42.0.40:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32300/TCP
Endpoints:                10.42.0.40:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     32626
Events:
  Type    Reason                Age                    From                Message
  ----    ------                ----                   ----                -------
  Normal  EnsuringLoadBalancer  8m15s                  service-controller  Ensuring load balancer
  Normal  IPAllocated           8m15s (x2 over 8m15s)  metallb-controller  Assigned IP ["10.0.0.19"]
  Normal  AppliedDaemonSet      8m15s                  service-controller  Applied LoadBalancer DaemonSet kube-system/svclb-helm-chart-pihole-web-2cba918f
  Normal  nodeAssigned          8m15s                  metallb-speaker     announcing from node "master.infrastructure.router.local" with protocol "layer2"

@auxworker

Here are some logs from the MetalLB controller:

{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-web","level":"info","ts":"2022-12-29T18:37:11Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-udp","ts":"2022-12-29T18:37:11Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-udp","level":"info","ts":"2022-12-29T18:37:11Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-udp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:99","error":"[\"10.0.0.5\"] is not allowed in config","event":"clearAssignment","level":"info","msg":"current IP not allowed by config, clearing","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:130","event":"clearAssignment","level":"info","msg":"user requested a different IP than the one currently assigned","reason":"differentIPRequested","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:147","event":"ipAllocated","ip":["10.0.0.15"],"level":"info","msg":"IP address assigned by controller","ts":"2022-12-29T18:37:12Z"}
{"caller":"main.go:90","event":"serviceUpdated","level":"info","msg":"updated service object","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-udp","level":"info","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-tcp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:99","error":"[\"10.0.0.5\"] is not allowed in config","event":"clearAssignment","level":"info","msg":"current IP not allowed by config, clearing","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:130","event":"clearAssignment","level":"info","msg":"user requested a different IP than the one currently assigned","reason":"differentIPRequested","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:147","event":"ipAllocated","ip":["10.0.0.15"],"level":"info","msg":"IP address assigned by controller","ts":"2022-12-29T18:37:12Z"}
{"caller":"main.go:90","event":"serviceUpdated","level":"info","msg":"updated service object","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-tcp","level":"info","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-udp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-udp","level":"info","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-tcp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-tcp","level":"info","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-tcp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:74","event":"clearAssignment","level":"error","msg":"Failed to retrieve lbIPs family","reason":"nolbIPsIPFamily","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-tcp","level":"info","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:60","controller":"ServiceReconciler","level":"info","start reconcile":"argocd/helm-chart-pihole-dns-udp","ts":"2022-12-29T18:37:12Z"}
{"caller":"service.go:74","event":"clearAssignment","level":"error","msg":"Failed to retrieve lbIPs family","reason":"nolbIPsIPFamily","ts":"2022-12-29T18:37:12Z"}
{"caller":"service_controller.go:103","controller":"ServiceReconciler","end reconcile":"argocd/helm-chart-pihole-dns-udp","level":"info","ts":"2022-12-29T18:37:12Z"}

@fedepaol
Member

Sorry for the delay, I filed #1778 based on the OP's issue.
The new behaviour will be: if MetalLB receives a LoadBalancer service it should handle, but with a malformed set of IPs (which it did not set), it will reset them and reallocate.
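
In terms of the ForAddresses sketch earlier in this thread, that is roughly the following (a sketch of the description above, not the actual diff in #1778; assignedByMetalLB is a hypothetical stand-in for however the controller tracks its own allocations):

import v1 "k8s.io/api/core/v1"

// If the IPs currently on a service MetalLB should handle do not form a
// valid family and MetalLB did not assign them itself, drop them so the
// normal allocation path can run again on the next reconcile.
func resetMalformedIPs(svc *v1.Service, assignedByMetalLB bool) bool {
	var ips []string
	for _, ing := range svc.Status.LoadBalancer.Ingress {
		if ing.IP != "" {
			ips = append(ips, ing.IP)
		}
	}
	if _, err := ForAddresses(ips); err != nil && !assignedByMetalLB {
		svc.Status.LoadBalancer.Ingress = nil // clear the stale node IPs
		return true                           // caller reallocates from the pool
	}
	return false
}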

@penguineer
Author

@fedepaol Thank you very much for taking care of this issue!

@vavan11

vavan11 commented Dec 18, 2023

I faced the same issue on k3s, and the solution for me was to update the k3s service config so that the built-in Traefik and ServiceLB stay disabled:
/etc/systemd/system/k3s.service

ExecStart=/usr/local/bin/k3s \
    server \
        '--no-deploy' \
        'traefik' \
        '--disable' \
        'servicelb' \
        '--flannel-backend=none' \
