-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design: BGP FRR integration #832
Conversation
Hi @markdgray I went through the document quickly, for me this integration is a great occasion to create a separate BGP controller instead of a keeping the idea of BGP speaker around:
Having some BGP controllers running only on the control-plane would be great for security My 2 cents |
I have added a section to discuss evaluation of alternative routing stacks and the motivation for selecting FRR. I also updated the "non-goals" appropriately. |
c63b385
to
4b556da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there @markdgray.
First of all, let me say this is an excellent design proposal 👏 It's readable, well-rationalized and makes a lot of sense. I'd say we can use it as a template for future proposals.
Second, your overview of the MetalLB codebase looks great. I'd consider converting this part of the proposal to a developer-oriented document. Would love to get a PR for that if you're up for it.
Overall I like the proposal a lot. I do have some concerns and reservations which I've expressed as inline comments. My biggest concern is the lack of strategy around long-term maintenance of the BGP stacks. My experience tells me we will very quickly abandon the old stack because FRR has more features compared to the native stack. This could be a good thing, but I think we should be more explicit about our intentions here.
Happy to hear your thoughts, and would love to see more from you in this project 🙂
design/0001-frr.md
Outdated
BGP. There is an experimental gRPC interface. This interface may need to be | ||
productionized through the FRR community. When this has been satisfactorily | ||
achieved, we can start [Story 3](#story-3). Until that time, in order to | ||
mitigate this risk, we can wrap the FRR ‘vtysh’ command line interface in Go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does wrapping vtysh
entail automating an interactive CLI using Go code? This sounds fragile to me. I'd opt for updating a config file if FRR supports live config reloading until we have e.g. gRPC support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't considering wrapping the interactive vtysh
shell but rather send commands using vtysh which should be less fragile. i.e. vtysh -c "command"
.
Using a config file is an option. There are some limitations with that that would be preferable to address if we were to go this route. (FRRouting/frr#2128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdgray oh, good point. If the config file option is limited (people asking for the reload script to be smarter) and this is already a risk mitigation option, IMHO it sounds ok to go with the vtysh -c cmd
option.
I don't have experience with either of these options, so while I think maybe cfg file seems simpler (no "state" to keep, probably simpler to reproduce getting to some broken state, etc.), I'm ok if you think vtysh
is the way to go here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, @markdgray why did you mark this as resolved? Not sure I follow what was the reasoning here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I update the document to state the preferred method (e.g. vtysh -c cmd
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @johananl ?
Great!
I was originally planning something like that so I can refactor the document as you have suggested.
I will look through these comments and address them in the next couple of days or maybe next week.
Thanks @johananl ! |
I haven't taken the time to read the proposal in full, so sorry if the response in in the proposal (just did some quick search). |
It can be configured (https://docs.frrouting.org/en/latest/bgp.html#starting-bgp). It looks like that is a configuration parameter in MetalLB so we would just need to implement that? |
I added this part as a separate file. Will I split it out into a separate file? |
Overall I'm quite happy with this and am ready to give my +1. I'd like to resolve the remaining open question on whether FRR would run within the metallb speaker pod or not, and if it's separate, how we would secure that communication. |
e0ca7b7
to
eecfe6a
Compare
At Ericsson we use the metallb
At a meeting with |
@rata Thanks for the comprehensive review. I had a few follow-up questions before I do another revision of this PR. |
@markdgray I think I answer them all, thanks again! @uablrek Cool. Not sure how to read your message, though: do you mean you would prefer to contribute your speaker implementation instead of using FRR? Or are you saying that MetalLB going to FRR is beneficial and you'll be able to drop the custom speaker and use metallb upstream speaker implementation? |
Based on some offline conversation, I think this is an expression of support of the general direction we're taking. They have an alternative implementation of it, but are not able to open source it at this time (and maybe that could change in the future). In the meantime, their configuration examples offer some inspiration for further extensions we may want to make for additional features they find important in their own implementation. In other words, take it as input for some future target features in the implementation. |
d2039a1
to
b0e3818
Compare
Is there any consensus? I think everything has been addressed. |
I don't think I'll have time to have another look soon, but don't blovk the PR on my final ACK. It was quite close to ready IMHO when I reviewed the last time, also :) Thanks again for the PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the effort @markdgray.
Please don't be overwhelmed by the number of comments 🙂 These are mostly cosmetic nitpicks. I took the time to add them since I believe being lenient about details in one part of a codebase has a tendency to lower the quality bar for other areas (such as actual code), too. Please treat all the language/phrasing/capitalization-related comments as non blockers.
Following is a summary of the only "big" concerns I have left (more info inline):
- Configuring FRR imperatively using
vtysh
poses IMO a risk of running into unexpected state problems between MetalLB and FRR. - I'm worried about the emphasis on managing "external" agents in this proposal.
- Are we good to "embed" FRR inside MetalLB in terms of OSS licenses?
14e8ea5
to
1e4fe71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @markdgray. This is an easy read now and makes a lot of sense to me.
The remaining comments are all language-related nitpicks so I'm already LGTM-ing the PR and will leave it up to you to decide whether to fix them.
514a91d
to
909264a
Compare
This design document discusses integration of FRR as an alternative BGP implementation for MetalLB. Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Co-authored-by: Johannes Liebermann <johanan.liebermann@gmail.com>
909264a
to
6cd75dc
Compare
Thanks @johananl. All the changes seem reasonable. I have accepted your suggestions, squashed the commits and rebased against |
LGTM @markdgray. |
@uablrek sorry for getting back on this so late. What would be the use case for static BFD without BGP? Your Thanks! |
I have only passed the config so I might be wrong, but I think the metallb contoller requires a "protocol" entry so it is there but not used. Our setup uses the metallb controller with one modification; the yaml parsing is not "strict", so we can add things but must comply with existing syntax. |
The remove-strict commit Nordix@13b6cca |
So, if I am getting your comment right, in the "static-bfd" context you are not using bgp at all (which kind of resonated with the "static-bfd" naming). In that case, how is the advertisement of the virtual ip performed? |
Right. The route is statically configured in the GW-router by the site owner. Sometimes the site owner does not allow routing protocols. |
So how does the GW-router node knows which node(s) own the ip address associated with the lb service in this case? Or to put in other words, where that static route is set towards? |
No node "owns" the lb-address. But if |
On top of that, there's the bfd + node selector section where only a subset of nodes are enabled. I'm still not sure I get how the routes are configured. Are those something like ecmp static rules using the nodes (all of them) as gateways for the vips or something like that? |
BFD is per node, BGP let you advertise and retract multiple IPs, so this static-bfd is only used for 1 IP I guess ? |
static-bfd doesn't care about the externalTrafficPolicy. We do not have additional mechanism regarding |
This design document discusses integration of FRR as an
alternative BGP agent implementation for MetalLB.
Signed-off-by: Mark Gray mark.d.gray@redhat.com