k3s with version v0.6.0-rc5 leads to unstable metallb #542

Closed
evrardjp opened this issue Jun 17, 2019 · 4 comments
@evrardjp (Contributor) commented Jun 17, 2019

Describe the bug
Installing 0.6.0-rc5 leads to an unstable metallb load balancer, whose external(?!) IPs are flapping (between an IP from the pool and "waiting").
0.5.0 doesn't have the same issue with the same configuration.

To Reproduce

  1. Install cluster with: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.6.0-rc5 INSTALL_K3S_EXEC="--no-deploy=traefik --no-deploy=servicelb" sh -
  2. (Possibly optional) Join an additional node to the cluster (I am not sure whether this step is required, but the issue appears when multiple nodes are in the cluster).
  3. kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml
  4. kubectl apply -f <your pool file> (a layer2 address pool; see the sketch after this list).
  5. Do a basic app deployment + service (LoadBalancer type).
  6. kubectl get services -n <your app deploy>.
  7. Check the service's external IP field a few times and see it flapping (only for Services of type LoadBalancer, and only the external IP).
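
For reference, the pool file of step 4 and the service of step 5 can be as simple as the following sketch (the address range, namespace and app name are illustrative placeholders); this matches metallb v0.7.3's ConfigMap-based configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    # one layer2 pool, matching the "default"/"layer2" pool seen in the logs below
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.102.20-192.168.102.30
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
spec:
  # metallb should allocate a stable external IP from the pool above
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80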

Expected behavior
A Service of type LoadBalancer should not have its external IP flapping every few seconds; the external IP should be stable.

Screenshots
I can see entries like the following in the logs of my metallb pods, repeated thousands of times within a few minutes:

{"caller":"main.go:159","event":"startUpdate","msg":"start of service update","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.978806385Z"}
{"caller":"main.go:229","event":"serviceAnnounced","ip":"192.168.102.20","msg":"service has IP, announcing","pool":"default","protocol":"layer2","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.978971396Z"}
{"caller":"main.go:231","event":"endUpdate","msg":"end of service update","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.979019474Z"}
{"caller":"main.go:159","event":"startUpdate","msg":"start of service update","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.979059827Z"}
{"caller":"arp.go:102","interface":"enp1s0","ip":"192.168.102.20","msg":"got ARP request for service IP, sending response","responseMAC":"70:85:c2:5f:65:33","senderIP":"192.168.102.20","senderMAC":"90:e2:ba:93:3a:f4","ts":"2019-06-17T11:33:35.97933408Z"}
{"caller":"arp.go:102","interface":"enp3s0f1","ip":"192.168.102.20","msg":"got ARP request for service IP, sending response","responseMAC":"90:e2:ba:93:3a:f4","senderIP":"192.168.102.20","senderMAC":"70:85:c2:5f:65:33","ts":"2019-06-17T11:33:35.979359537Z"}
{"caller":"main.go:254","event":"serviceWithdrawn","ip":"","msg":"withdrawing service announcement","reason":"noIPAllocated","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.979592061Z"}
{"caller":"main.go:172","event":"endUpdate","msg":"end of service update","service":"argocd/argocd-server","ts":"2019-06-17T11:33:35.97964767Z"}

Using 0.5.0 doesn't result in this, which leads me to think there is a network issue. Was there a change in flannel? Do we need to add metallb testing to k3s? How could we do this kind of testing?

Additional context
Uninstalling and reinstalling with v0.5.0 works.
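
Concretely, the downgrade is just the following (assuming the standard install layout, where the installer drops an uninstall script into /usr/local/bin):

# remove the v0.6.0-rc5 install
/usr/local/bin/k3s-uninstall.sh
# reinstall v0.5.0 with the same flags as above
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.5.0 INSTALL_K3S_EXEC="--no-deploy=traefik --no-deploy=servicelb" sh -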

@evrardjp evrardjp changed the title Updating k3s to version v0.6.0-rc5 leads to unstable metallb Reinstalling k3s with version v0.6.0-rc5 leads to unstable metallb Jun 17, 2019
@evrardjp evrardjp changed the title Reinstalling k3s with version v0.6.0-rc5 leads to unstable metallb k3s with version v0.6.0-rc5 leads to unstable metallb Jun 17, 2019
@ibuildthecloud (Contributor)

@evrardjp Is it feasible to do a "docker-compose style" setup that has multiple nodes in containers to test metallb? Basically, if we can test metallb using just docker containers, we can add this to the standard release testing.
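
Something along these lines is what I have in mind; a rough, untested sketch (the image tag, token, and published port are placeholders):

version: '3'
services:
  server:
    image: rancher/k3s:v0.6.0-rc5        # placeholder tag
    command: server --no-deploy=traefik --no-deploy=servicelb
    privileged: true
    environment:
    - K3S_TOKEN=sometoken
    ports:
    - "6443:6443"
  agent:
    image: rancher/k3s:v0.6.0-rc5        # placeholder tag
    command: agent
    privileged: true
    environment:
    - K3S_TOKEN=sometoken
    - K3S_URL=https://server:6443

The metallb manifests and an address pool would then be applied against the server's kubeconfig, and the release test would assert that the external IP of a LoadBalancer Service stays stable.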

@galal-hussein (Contributor)

So after checking the debug logs, it turns out that the svclb controller is not entirely disabled: it's not deploying pods, but it is still updating the service load balancer:

Jun 17 21:48:28 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:28.883148054Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:29 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:29.089052485Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:29 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:29.296077051Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:29 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:29.501974598Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:29 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:29.710768098Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:29 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:29.917156182Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"
Jun 17 21:48:30 ip-172-31-43-25 k3s[16295]: time="2019-06-17T21:48:30.122418629Z" level=debug msg="Setting service loadbalancer default/nginx to IPs []"

The problem also happens in 0.5.0 and 0.6.0-rc1, but not as much as in the recent RCs; since the port from norman to wrangler, the controller instantly changes the IPs, as seen above in the debug logs.
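
For anyone who wants to watch the two controllers fighting over the Service status directly (service name taken from the debug log above; -w keeps printing a line on every update):

# the external IP should alternate between <pending>/empty and the metallb-assigned address
kubectl get svc nginx -n default -w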

@erikwilson (Contributor)

Thanks for reporting this @evrardjp! Reproduced and tested with #543 to verify the issue no longer exists. Please re-open if this is still a problem.

@evrardjp (Contributor, Author)

Wow, I am testing this on my weekends, so I didn't get the time to help with this, sorry. Cool that you did it, I guess... Thanks!
