
Add a pool configurability design proposal. #942

Merged
merged 2 commits into metallb:main from speakerallocationdesign on Mar 24, 2022

Conversation

fedepaol
Member

@fedepaol fedepaol commented Sep 8, 2021

As discussed in the last community meeting, this proposal tries to collect all the requests made in separate issues in an organic way, and comes with a proposal that includes all of them to start a discussion.

@fedepaol
Member Author

fedepaol commented Sep 8, 2021

cc @johananl @rata @champtar @russellb

@@ -0,0 +1,385 @@
# Address Pool configuration options
Contributor

Just a general question for maintainers. Do we have a naming convention for design proposals?

Contributor

AFAIK, no. But maybe we can move the ones already implemented to a subdir; assuming most will be implemented, this should still keep it manageable?

* Where / to whom the ip address must be advertised
* How to allocate that ip to a given service

The general idea of this proposal is that these degree of configuration are all related to the AddressPool object, and
Contributor

nit: degree -> degrees

Currently, an IP assigned to a Service is advertised to all the possible destinations from all the nodes.
The only exception is the BGP peer [node selector](https://metallb.universe.tf/configuration/#limiting-peers-to-certain-nodes).

Here, we suggest extending the configuration logic with a protocol dependent way to select the target(s) of the announcement.
Contributor

independent?

Member Author

Not sure I understand the question here.
This part is about choosing the targets of the advertisement, which (I believe) is protocol dependent because of the different nature of the "targets" in the BGP and the L2 context.

The part above (node / addresspool mapping) is meant to be protocol independent, and so is the one below (pool / service association).

Contributor

Yeah and reading over this, I agree. I thought you were proposing doing something with L2 as well (selectively choosing interfaces) but you say below it wasn't included in this proposal. Do we think we will ever need or want to do this?

@fedepaol fedepaol force-pushed the speakerallocationdesign branch 2 times, most recently from b65fc5a to 7a17df7 on September 8, 2021 10:25

No issues were raised but this combination is a more flexible variant of the [peer node selector](https://metallb.universe.tf/configuration/#limiting-peers-to-certain-nodes).

With this combination, the user may choose a different set of targets depending on the node.
Contributor

So node allocation and peer selection are kind of orthogonal. The exceptions are due to a node being unable to reach a peer. So where would we check for those exceptions? The webhook that you mention below?

Member Author

I think that is clearer in the examples below

There are two types of node selection here:

  • The pool node selector drives which nodes are allowed to advertise the IP
  • The node selector in the BGP peer of a given pool tells that the IP must be advertised to that particular peer only from a given subset of nodes

A webhook would be able to spot configurations with no intersection between the two selectors, resulting in a pool that won't be advertised anywhere.
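
To make the two selectors concrete, here is a hypothetical sketch of a configuration the webhook should flag (the object shapes and field names are illustrative only, not the API proposed in the doc):

```yaml
# Hypothetical sketch, field names illustrative, not the proposed API.
# Assume IPs from pool1 are meant to be advertised to peer1.
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: pool1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.0/24
  # pool node selector: only zone=a nodes are allowed to advertise these IPs
  nodeSelectors:
    - matchLabels:
        zone: a
---
apiVersion: metallb.io/v1beta1
kind: BGPPeer
metadata:
  name: peer1
  namespace: metallb-system
spec:
  myASN: 64500
  peerASN: 64501
  peerAddress: 172.30.0.3
  # peer node selector: the session to peer1 exists only on zone=b nodes
  nodeSelectors:
    - matchLabels:
        zone: b
# The two selectors pick disjoint node sets, so pool1 would never be
# advertised to peer1; this is the kind of misconfiguration a validating
# webhook could flag.
```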


### Pool / Service - Namespace association

The association of the pool with a particular service is orthogonal to the previously described feature.
Contributor

@markdgray markdgray Sep 8, 2021

I am trying to convince myself of this statement :D. So, if I understand this correctly, if I assign an address pool to a particular service, we can reach that service from any node that has a backend for the service, regardless of what MetalLB does, as it is only advertising its location? That would seem to make them independent. Even if we don't advertise from a particular node, we could still actually reach it on that node.

Member Author

This section is only about the way we associate the pool to a given service. Today it is just random: the first pool that has free IPs is associated with the service.

Once we have an ip associated to the service, MetalLB will advertise it from a node that:

  • has a speaker running (L2 only)
  • in case of local traffic policy, has a running backend
  • in case the node selector is implemented, matches the selector

This phase comes after the pool allocation to the service.
The only relation I can see is that if a user cares about pinning a particular pool to a service (or to a namespace), it is likely because they want to leverage some of the extra tweaks we are proposing here.
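
As a sketch of what pinning could look like (the serviceAllocation field and everything under it is made up for illustration, not part of the proposal's API):

```yaml
# Hypothetical sketch: pin a pool to a namespace instead of relying on
# the "first pool with free IPs" behavior. Field names are made up.
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: prod-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.10.0/24
  # only services living in namespaces labeled env=prod may receive
  # IPs from this pool
  serviceAllocation:
    namespaceSelectors:
      - matchLabels:
          env: prod
```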

Member

@johananl johananl left a comment

Thanks for driving this @fedepaol!

I'm not done going through the proposal; however, I wanted to provide feedback ASAP for the things I've already read. So, I'm submitting a couple of comments.
Will submit another review in the upcoming days after reviewing the rest 🤞

The [current implementation](https://github.com/metallb/metallb/blob/312b03cd3065687f25274486cd3ff5c79d6f6068/speaker/layer2_controller.go#L45) announces L2 pools only from those nodes
with a running speaker and an active endpoint for the service.

In order to be able to announce L2 from a subset of nodes, this logic must be changed, checking for active endpoints only if the TrafficPolicy is local (which is handled by the [BGP implementation](https://github.com/metallb/metallb/blob/5d236989e61683daf1eaad00777b81ce7dd98c5d/speaker/bgp_controller.go#L153)).
Contributor

“checking for active endpoints only if the TrafficPolicy is local” was already done:

if svc.Spec.ExternalTrafficPolicy == v1.ServiceExternalTrafficPolicyTypeLocal {

Member Author

Yep, I did it after drafting this proposal. Removing it.

Member

@johananl johananl left a comment

Thanks a lot for working on this @fedepaol. I really like your work around classifying the different issues we have around making address pools more flexible. This brings a lot of order and makes it easy to see the big picture 👍

IMO there is too much ambiguity in the "Proposal" section - see my inline comments. I haven't reviewed the sections which follow since I feel like I have to understand the "Proposal" section properly before I can evaluate the rest of the document. Specifically, I'm having a hard time understanding the relation between the various sections (i.e. "I understand the contents of this paragraph but I'm not sure why this paragraph is there").

If you could help me understand your intention by addressing my inline comments and ensure the various section titles make sense - that would be great. Please tag me again when done and I'll do another pass 🙏

The general idea of this proposal is that these degrees of configuration are all related to the AddressPool object, and
that these configurations are independent of each other.

### Node / Address Pool mapping
Member

@johananl johananl Dec 13, 2021

I'm a bit confused. Does this title refer to the 1st "point" in the previous list, namely "Which nodes should advertise the ip"? If this title and the following ones are an elaboration on the previous list, could we use the same text for the "points" and the titles of the respective sections? If this title and the following ones are not an elaboration of the list, could we explain what these are (e.g. are these use cases? Feature requests? Something else?)? In the latter case, how do these sections relate to the 3 "points" above?

Also, what does the / denote here? Could the slash be substituted with the word "to"? If there is directionality to the mapping, could we use -> or something similar? If not, maybe <> would be better.

Member Author

I'll try to re-organize, always using the same titles.


The following issues also fall in the same "speaker announcement logic" category:

Service with external backends
Member

I don't understand the meaning of this line. Is it necessary? Does this line classify the two issues which follow? Could we reduce ambiguity here?

Member Author

It does, but those issues are now closed, fixed by #976.
Will remove it.

Comment on lines 32 to 33
* To which BGP neighbors the IP address should be advertised
* How to allocate that ip to a given service
Member

Nit: Could we capitalize "IP" consistently throughout the document?

@fedepaol fedepaol force-pushed the speakerallocationdesign branch 2 times, most recently from 7ae6a16 to eb703bc on December 17, 2021 16:12
Member

@johananl johananl left a comment

Very nice work @fedepaol.

I've done a full pass over the design and added some comments. My biggest concern is not being able to influence the formation of BGP sessions separately from prefix filtering. In addition, I am not 100% "sold" on the idea of adding a node selector to BGPAdvertisement - see my inline comments.
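
For context, the separation in question could look roughly like this: one object decides whether a session exists, another decides which prefixes flow over it (a hypothetical sketch, object and field names illustrative):

```yaml
# Hypothetical sketch: session formation and prefix filtering as two
# independent objects. Names are illustrative only.
apiVersion: metallb.io/v1beta1
kind: BGPPeer
metadata:
  name: peer-a
  namespace: metallb-system
spec:
  # session formation only: selected speakers open a session to peer-a
  myASN: 64500
  peerASN: 64501
  peerAddress: 172.30.0.3
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: adv-pool1
  namespace: metallb-system
spec:
  pools:
    - pool1
  # prefix filtering only: pool1's prefixes are announced just to peer-a,
  # without affecting whether sessions to other peers exist
  peers:
    - peer-a
```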

@fedepaol fedepaol force-pushed the speakerallocationdesign branch 4 times, most recently from eb5e4f1 to db48a73 on February 4, 2022 16:01
@ChipWolf

ChipWolf commented Feb 7, 2022

May I request priorities on the L2Advertisement node selector?

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lsadvertisement
  namespace: metallb-system
spec:
  pools:
    - pool1
    - pool2
  nodeSelectors:
    - priority: 10
      matchExpression:
        - key: kubernetes.io/hostname
          operator: In
          values: [hostB, hostC]
    - priority: 50
      matchExpression:
        - key: kubernetes.io/hostname
          operator: In
          values: [hostA]
```

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lsadvertisement
Contributor

nit: l2advertisement

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lsadvertisement
Contributor

nit: l2advertisement

@adampl

adampl commented Feb 24, 2022

What about the issue of interface selection in L2 mode? I don't see any mention of it in this proposal.
The issue was discussed here: #277

@fedepaol
Member Author

@gclawes @johananl I aligned the design doc with the latest changes (IPPool, and a secret as an extra field alternative to the password).

@gclawes
Collaborator

gclawes commented Mar 21, 2022

@gclawes @johananl I aligned the design doc with the latest changes (IPPool, and a secret as an extra field alternative to the password).

Just a note (as discussed in the community meeting): the kubectl short-name ipool will conflict with a similarly-named resource in Calico.

@ChipWolf

ChipWolf commented Mar 22, 2022

Out of interest, with the nodes effectively selecting pools based on the highest priority, what if the priority of each pool differs for each node?
For example: you have a pool of French IPs and a pool of German IPs, your nodes are distributed between France and Germany, your nodes have the ability to use the IPs interchangeably thanks to your vendor's network mesh, but you'd prefer them to select IPs from the same geo as the node (where possible).

@fedepaol
Member Author

Out of interest, with the nodes effectively selecting pools based on the highest priority, what if the priority of each pool differs for each node? For example: you have a pool of French IPs and a pool of German IPs, your nodes are distributed between France and Germany, your nodes have the ability to use the IPs interchangeably thanks to your vendor's network mesh, but you'd prefer them to select IPs from the same geo as the node (where possible).

Not sure what you mean. The node selector will be in the announcement.
So, the announcement (let's say L2) will be pinned both to a pool and to a set of nodes. If there's an announcement matching the ippool and the node, the speaker will announce. There is no priority.
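
In other words, something along these lines (reusing the fields sketched earlier in the thread; note there is no priority):

```yaml
# Sketch of the matching semantics: an L2 announcement happens on a node
# when some L2Advertisement matches both the service's pool and that node.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: adv-fr
  namespace: metallb-system
spec:
  pools:
    - french-pool
  nodeSelectors:
    - matchExpression:
        - key: topology.kubernetes.io/zone
          operator: In
          values: [fr]
# A speaker on a zone=fr node announces IPs from french-pool; a node that
# matches no advertisement for the pool stays silent. With multiple matching
# advertisements there is no ordering between them; any match announces.
```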

@fedepaol
Member Author

@gclawes @johananl I aligned the design doc with the latest changes (IPPool, and a secret as an extra field alternative to the password).

Just a note (as discussed in the community meeting): the kubectl short-name ipool will conflict with a similarly-named resource in Calico.

@gclawes after thinking about it, the UX for Calico users won't be ideal. I'll rename ippool to ipaddresspool. We will still have some time until the next release to let it converge if a better name comes to mind.
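
For reference, the kubectl short name is declared in the CRD's spec.names; a minimal illustrative CRD (not MetalLB's actual manifest) shows where a clash with Calico would come from:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: ipaddresspools.metallb.io
spec:
  group: metallb.io
  scope: Namespaced
  names:
    kind: IPAddressPool
    plural: ipaddresspools
    singular: ipaddresspool
    # a short name is just an extra alias kubectl resolves; registering
    # something like "ipool" here would collide with Calico's
    # similarly-named resource, so it is left empty
    shortNames: []
  versions:
    - name: v1beta1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```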

This proposal tries to collect all the requests made in separate issues
in an organic way, and comes with a proposal that includes all of them to
start a discussion.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
@gclawes
Collaborator

gclawes commented Mar 24, 2022

@gclawes @johananl I aligned the design doc with the latest changes (IPPool, and a secret as an extra field alternative to the password).

Just a note (as discussed in the community meeting): the kubectl short-name ipool will conflict with a similarly-named resource in Calico.

@gclawes after thinking about it, the UX for Calico users won't be ideal. I'll rename ippool to ipaddresspool. We will still have some time until the next release to let it converge if a better name comes to mind.

Sounds good, that sidesteps the issue. For what it's worth, when using the built-in kubectl tab-completion, it will complete the fully qualified name of all CRDs anyway.

@fedepaol
Member Author

Merging this.
Thanks to all those who took part in the discussion; the proposal was shaped in the right direction as a collective effort, and the quality is far better than the initial version.

@fedepaol fedepaol merged commit 8d906f0 into metallb:main Mar 24, 2022