Go Panic using BGP on CentOS 7.4 / K8S 1.10.1 #250
Comments
@FireDrunk Could you listen with tcpdump and show the traffic that's going back and forth before the crash? I worked on the significant changes to the Dialer functions required for TCP MD5, so it would be interesting to know if there is a flaw in the new logic.
Thanks for the bug report. I suspect I know what happened, based on the stack trace. There is a window right after BGP session startup where we might try sending a BGP update on a connection that was just aborted (which nil's out
It should be a trivial fix, I'll prepare it on a branch so you can test. The next question is: why is the consumer goroutine terminating the BGP session? That means it received something it doesn't like from the peer, which is either a BGP NOTIFICATION (peer terminating the session explicitly), or malformed BGP packets (or more likely, we got the parsing wrong somewhere). Let's fix the crash first, then we can look at what's happening with your session.
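The race described above (sending an update on a connection that was just aborted and set to nil) can be illustrated with a minimal Go sketch. This is not MetalLB's actual code: the `session` type, its `abort` and `sendUpdate` methods, and `errNotConnected` are hypothetical names chosen for illustration. The idea is simply that the sender checks for a nil connection under the same lock the abort path uses, and returns an error instead of dereferencing nil.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
)

// session is a hypothetical stand-in for a BGP session.
// It is NOT MetalLB's real type; it only models the nil-connection window.
type session struct {
	mu   sync.Mutex
	conn net.Conn // nil while the session is down
}

// errNotConnected is a hypothetical sentinel error for this sketch.
var errNotConnected = errors.New("session not established")

// abort closes the connection and sets it to nil, modelling what happens
// when the consumer goroutine terminates the session.
func (s *session) abort() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.conn != nil {
		s.conn.Close()
		s.conn = nil
	}
}

// sendUpdate writes msg only if a connection is present. Checking under
// the lock means a concurrent abort cannot nil the connection between
// the check and the write.
func (s *session) sendUpdate(msg []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.conn == nil {
		return errNotConnected
	}
	_, err := s.conn.Write(msg)
	return err
}

func main() {
	client, server := net.Pipe()
	defer server.Close()

	s := &session{conn: client}
	s.abort() // session torn down right after startup

	// Without the nil check this call would panic; with it we get an error.
	fmt.Println(s.sendUpdate([]byte("update")))
}
```

With the guard in place, the caller can treat the error as "retry once the session is re-established" rather than crashing the whole process.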
@FireDrunk can you test the change, with |
Just updated the image, the speaker stays online now, but the logs tell me this:
The LoadBalancer IP given to the service/pod is also not pingable from the node that is running that pod/service. tcpdump output (tcpdump -i any -s 1500 port 179):
Okay, the pod is stable, good. Now the problem seems to be somewhere in configuration. The BGP session is getting terminated by pfSense after MetalLB sends its OPEN message. Your original report stated that the last error was "AS unacceptable". So, why is the AS unacceptable? I have no idea. It looks to me like a perfectly fine iBGP session, but the router is rejecting it. It looks like pfSense uses OpenBGPD, which is not one that I've tested against in the past :(. To figure out more, I'm going to have to reproduce this on a local testbench. The tcpdump was good, but to really dissect the packets I need a pcap. Can you |
Just found a typo in my OpenBGPD neighbor config (had the AS as 65512 instead of 64512). I've just updated it, and the session now seems to be correctly established:
IP is still offline though.
So I think the current problem is just that the container hasn't configured the IP on the node yet. EDIT: |
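The AS mismatch discussed above (router configured with 65512, speaker announcing 64512) is the kind of thing a receiver detects from the OPEN message itself. Below is a rough Go sketch of that check, based on the OPEN layout in RFC 4271 (19-byte header, then a 1-byte version and a 2-byte "My Autonomous System" field). The function names `openASN`, `checkPeerASN`, and `buildOpen`, and the identifier 192.0.2.1, are assumptions made for illustration; this is neither MetalLB's nor OpenBGPD's implementation, and it ignores 4-byte ASN capabilities.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	bgpHeaderLen = 19 // 16-byte marker + 2-byte length + 1-byte type
	bgpTypeOpen  = 1  // message type code for OPEN
)

// openASN extracts the 2-byte "My Autonomous System" field from a raw
// BGP OPEN message (header included).
func openASN(msg []byte) (uint16, error) {
	// A minimal OPEN is 29 bytes: header plus 10 bytes of fixed fields.
	if len(msg) < bgpHeaderLen+10 {
		return 0, errors.New("message too short for a BGP OPEN")
	}
	if msg[18] != bgpTypeOpen {
		return 0, fmt.Errorf("not an OPEN message (type %d)", msg[18])
	}
	// After the header: version at offset 19, My AS at offsets 20-21.
	return binary.BigEndian.Uint16(msg[20:22]), nil
}

// checkPeerASN compares the AS announced in an OPEN with the AS the
// receiver has configured for that neighbor.
func checkPeerASN(msg []byte, configured uint16) error {
	got, err := openASN(msg)
	if err != nil {
		return err
	}
	if got != configured {
		return fmt.Errorf("bad peer AS: peer announced %d, configured %d", got, configured)
	}
	return nil
}

// buildOpen constructs a minimal OPEN with no optional parameters,
// used here only to produce demo input.
func buildOpen(asn uint16) []byte {
	msg := make([]byte, 29)
	for i := 0; i < 16; i++ {
		msg[i] = 0xff // marker is all ones
	}
	binary.BigEndian.PutUint16(msg[16:18], 29) // total length
	msg[18] = bgpTypeOpen
	msg[19] = 4 // BGP version
	binary.BigEndian.PutUint16(msg[20:22], asn)
	binary.BigEndian.PutUint16(msg[22:24], 90) // hold time in seconds
	copy(msg[24:28], []byte{192, 0, 2, 1})     // BGP identifier (example value)
	msg[28] = 0                                // optional parameters length
	return msg
}

func main() {
	open := buildOpen(64512)
	fmt.Println(checkPeerASN(open, 65512)) // mismatch, as with the typo
	fmt.Println(checkPeerASN(open, 64512)) // match after the fix
}
```

A receiver that finds a mismatch answers with a NOTIFICATION carrying the OPEN Message Error code and the Bad Peer AS subcode, which is consistent with the session being torn down right after the OPEN.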
Ah yes, the I'm glad you got it working! |
Any idea when this will make it into a release?
Doh, when you closed the bug, I forgot to go and merge it. I'll release 0.6.2 now, should be available in ~30min.
0.6.2 released with the fix.
Awesome! Works like a charm!
Is this a bug report or a feature request?:
Bug Report
What happened:
After successfully connecting BGP to my pfSense router and activating the first LoadBalancer service, the Speaker pod crashed.
What you expected to happen:
An activated LoadBalancer IP, and a working pod with the attached IP.
How to reproduce it (as minimally and precisely as possible):
ConfigMap
Anything else we need to know?:
The BGP status was active before the crash (without setting a service type to LoadBalancer).
My router was actively communicating via BGP with the Speaker and all was running well.
After adding the LoadBalancer type to my service, the Speaker crashed, and the BGP connection went down.
After inspecting the container in docker, I saw the following crash:
Environment:
uname -a: 3.10.0-693.21.1