MetalLB incorrectly uses AS_SET to specify the originating ASN in path attributes #225
Is this a bug report or a feature request?:
A BGP AS number can be encoded in either 2 bytes (for values from 1 to 65535) or 4 bytes (for values from 65536 to 4294967295), but MetalLB always uses 4 bytes, even for values lower than 65536.
This causes compatibility issues with routers from vendors such as Cisco and Ericsson, which apply a strict check and reject AS numbers that do not follow the rule above.
See this article from Cisco for details about their implementation of 4-byte AS numbers.
For example, Ericsson routers reject the BGP Update sent by Metal LB when the AS number is lower than 65536 with this error:
The result is that the Service IPs are not propagated to the routers.
What you expected to happen:
AS numbers lower than 65536 should be encoded as 2 bytes
How to reproduce it (as minimally and precisely as possible):
Use the configuration in the tutorial with a BGP router from Ericsson.
Anything else we need to know?:
Hold on .. :)
After doing a bit more research, the bug might be on the Router side.
From RFC 6793:
My understanding: the MetalLB behaviour is valid if it negotiated the use of 4-byte ASNs with the router when the BGP session was established.
So, it could also be a bug on the Cisco side (or the Cisco router is running a version that is not compatible with RFC 6793).
I will do a trace to see what is negotiated exactly.
Thanks, and please don't spend any time on this until it's clearer where the bug is. Feel free to close the issue if you want to, and I can reopen it if I am sure the bug is on the MetalLB side.
Based on that discussion you linked, I think the bug could be in either. The RFC says that if both routers agree to use 4B ASN, then the AS_PATH attribute can encode 4B ASNs. So, if that's not happening, there are 2 possibilities: either the peers are not configuring 4B ASN (but in that case metallb should be detecting that and rejecting the connection), or the router has a bug and rejects valid protocol messages.
I'm digging through metallb code now to check the former.
One useful thing, if you can, would be to get a pcap of the BGP session (from establishment all the way to the disconnection). That way I can dissect exactly what the packets are saying, and compare to the spec.
OK, I see that my BGP OPEN parsing code does not check that the peer advertised 4B ASN support. So, if your router is configured to not use 4B ASN, we could have this issue, where MetalLB behaves as if the session is fully 4B ASN compliant, but the peer is not expecting this.
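For illustration, the missing check could look roughly like the sketch below. It is not MetalLB's actual code: the function name fourByteASN and the constant names are made up for this example. It assumes the caller has already extracted the Optional Parameters field from the OPEN message, and relies on the Capabilities optional parameter being type 2 (RFC 5492) and the 4-octet AS capability being code 65 with a 4-byte value (RFC 6793).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	optParamCapabilities = 2  // RFC 5492: Capabilities optional parameter
	capFourOctetAS       = 65 // RFC 6793: support for 4-octet AS numbers
)

// fourByteASN (hypothetical helper) scans the Optional Parameters field of a
// BGP OPEN message. It reports whether the peer advertised the 4-octet AS
// capability and, if so, the ASN carried inside that capability.
func fourByteASN(optParams []byte) (asn uint32, ok bool, err error) {
	for len(optParams) > 0 {
		if len(optParams) < 2 {
			return 0, false, errors.New("truncated optional parameter header")
		}
		pType, pLen := optParams[0], int(optParams[1])
		if len(optParams) < 2+pLen {
			return 0, false, errors.New("truncated optional parameter value")
		}
		value := optParams[2 : 2+pLen]
		optParams = optParams[2+pLen:]
		if pType != optParamCapabilities {
			continue // not a Capabilities parameter, skip it
		}
		// A Capabilities parameter holds (code, length, value) triples.
		for len(value) > 0 {
			if len(value) < 2 {
				return 0, false, errors.New("truncated capability header")
			}
			code, cLen := value[0], int(value[1])
			if len(value) < 2+cLen {
				return 0, false, errors.New("truncated capability value")
			}
			if code == capFourOctetAS {
				if cLen != 4 {
					return 0, false, errors.New("4-octet AS capability must be 4 bytes long")
				}
				return binary.BigEndian.Uint32(value[2:6]), true, nil
			}
			value = value[2+cLen:]
		}
	}
	return 0, false, nil
}

func main() {
	// Example field: one Capabilities parameter holding a multiprotocol
	// capability (code 1) and a 4-octet AS capability for ASN 64512.
	optParams := []byte{2, 12, 1, 4, 0, 1, 0, 1, 65, 4, 0x00, 0x00, 0xFC, 0x00}
	asn, ok, err := fourByteASN(optParams)
	fmt.Println(asn, ok, err)
}
```

If ok comes back false, the speaker would need to either refuse the session or fall back to 2-byte encoding in AS_PATH, which is the extra complexity mentioned in this thread.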
Can you check if there's an option on your router(s) to force 4B ASN on? And, if you can capture a pcap of the BGP exchange, that would allow us to confirm that this is happening.
If this is the problem, I think the fix is relatively simple, it just makes path attribute encoding a little bit more complicated.
I have done some more troubleshooting, and the router supports 4-byte ASNs, so that should not be the issue (see the attached traces).
I think the issue is actually related to the AS_PATH segment type.
I see that MetalLB is using AS_SET as the AS path segment type, and I think that is what the router does not like.
According to BGP RFC 4271 the originating external router (MetalLB in this case) must use AS_SEQUENCE segment type in the
So, I think the issue would be solved if MetalLB used AS_SEQUENCE instead of AS_SET.
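To make the difference concrete, here is a minimal sketch of encoding an AS_PATH attribute with a single segment. It is not the MetalLB implementation: encodeASPath and the constant names are hypothetical, and the sketch assumes the session has negotiated 4-byte ASNs, so every ASN is written as 4 bytes. The segment type codes come from RFC 4271: AS_SET is 1 and AS_SEQUENCE is 2.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	attrFlagTransitive = 0x40 // well-known attributes set the Transitive bit
	attrTypeASPath     = 2    // AS_PATH attribute type code
	segTypeASSet       = 1    // AS_SET: unordered set of ASNs
	segTypeASSequence  = 2    // AS_SEQUENCE: ordered list of ASNs
)

// encodeASPath (hypothetical helper) builds an AS_PATH path attribute with
// one segment of the given type. ASNs are always written as 4 bytes, which
// assumes both peers negotiated 4-octet AS support.
func encodeASPath(segType byte, asns []uint32) ([]byte, error) {
	if segType != segTypeASSet && segType != segTypeASSequence {
		return nil, errors.New("unknown AS_PATH segment type")
	}
	// 63 ASNs * 4 bytes + 2 header bytes = 254, which still fits in the
	// 1-byte attribute length used here (no Extended Length flag).
	if len(asns) == 0 || len(asns) > 63 {
		return nil, errors.New("this sketch supports 1 to 63 ASNs per segment")
	}
	value := []byte{segType, byte(len(asns))}
	for _, asn := range asns {
		var b [4]byte
		binary.BigEndian.PutUint32(b[:], asn)
		value = append(value, b[:]...)
	}
	attr := []byte{attrFlagTransitive, attrTypeASPath, byte(len(value))}
	return append(attr, value...), nil
}

func main() {
	// An eBGP speaker originating a route puts its own ASN in an AS_SEQUENCE.
	attr, err := encodeASPath(segTypeASSequence, []uint32{64512})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", attr)
}
```

The only byte that differs between the rejected and the accepted encoding is the segment type: 1 for AS_SET versus 2 for AS_SEQUENCE.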
I tried switching to iBGP between MetalLB and the router (I configured the same ASN in the LB and the router), and then the router accepts the UPDATE messages and the routes get propagated.
I have attached traces with both iBGP and eBGP with connection setup and route updates.
Well, that's embarrassing. I'm surprised other BGP stacks let me get away with that mistake for all this time! Thank you for the very detailed research! Pushing a patch to fix this very soon.
As for "why not gobgp", good question! It was a deliberate choice between tradeoffs. I'll quote myself from a few months ago, when someone else asked the question:
Good question! The initial prototype used gobgp, but I kept having issues and headaches with it. The
Aside from that, GoBGP also implements way more stuff than MetalLB needs. It wants to be a real
With that said, there's obviously downsides as well, and the big one is that OSRG has money to buy