Skip to content

Add config flag to disable usage of endpointslices#937

Merged
rata merged 1 commit intometallb:mainfrom
Christoph-Raab:config-flag-to-disable-endpointslices
Sep 16, 2021
Merged

Add config flag to disable usage of endpointslices#937
rata merged 1 commit intometallb:mainfrom
Christoph-Raab:config-flag-to-disable-endpointslices

Conversation

@Christoph-Raab
Copy link
Contributor

As discussed in #936 there are certain scenarios where endpointslices are technical available but should not be used by metallb.

This PR adds a config flag to disable the usage of endpointslices completely. If the flag is not provided the behaviour is the same as before.

@russellb
Copy link
Contributor

@fedepaol Can you take a look?

@fedepaol
Copy link
Member

@fedepaol Can you take a look?

sure!

@fedepaol
Copy link
Member

@Christoph-Raab if I get this right, in the cluster you are mentioning the EndpointSliceProxying feature gate is disabled but EndpointSlice is enabled, so the CR is getting filled but it's not used.

Just to understand, any reason why you did not go with disabling EndpointSlice too instead of modifying metallb?

@Christoph-Raab
Copy link
Contributor Author

Christoph-Raab commented Aug 31, 2021

if I get this right, in the cluster you are mentioning the EndpointSliceProxying feature gate is disabled but EndpointSlice is enabled, so the CR is getting filled but it's not used.

Exactly.

Just to understand, any reason why you did not go with disabling EndpointSlice too instead of modifying metallb?

The main reason is that you can not disable the Endpointslices resource in the apiserver. So we would not only need to disable the endpointslices controller in the controller-manager, but also deny creating the resource and remove all existing ones.

That's why we want a feature flag like core-dns has to disable the usage in the addon.

@fedepaol
Copy link
Member

if I get this right, in the cluster you are mentioning the EndpointSliceProxying feature gate is disabled but EndpointSlice is enabled, so the CR is getting filled but it's not used.

Exactly.

Just to understand, any reason why you did not go with disabling EndpointSlice too instead of modifying metallb?

The main reason is that you can not disable the Endpointslices resource in the apiserver. So we would not only need to disable the endpointslices controller in the controller-manager, but also deny creating the resource and remove all existing ones.

That's why we want a feature flag like core-dns has to disable the usage in the addon.

But (and I may be wrong here), if you disable the controller in controller-manager via the EndpointSlice, ep slices won't be created (even though the resource is still available), and the automatic check should cover that:

if _, err := kubeClient.DiscoveryV1beta1().EndpointSlices("default").Get(context.Background(), "kubernetes", metav1.GetOptions{}); err != nil {

Here we check if the kubernetes service doesn't have endpoint slices, and if so we assume that ep slices are disabled. Would that work or is that logic bugged somehow?

@Christoph-Raab
Copy link
Contributor Author

You are correct that if you disable the controller, EndpointSlices will not be automatically created by kubernetes.

But the resource still exists in the apiserver. So any user or addon/operator could still create EndpointSlices. So you need to forbid that operation as well, remove all existing ones (because they are no longer manged by the controller/user/addon/operator) and hope, that no user/addon/operator depends on them or something breaks, if they are not there/can not be created.

It's much saver to disable them in metallb, where we are certain, that we don't want to use them and nothing breaks if we do so.

@fedepaol
Copy link
Member

You are correct that if you disable the controller, EndpointSlices will not be automatically created by kubernetes.

But the resource still exists in the apiserver. So any user or addon/operator could still create EndpointSlices. So you need to forbid that operation as well, remove all existing ones (because they are no longer manged by the controller/user/addon/operator) and hope, that no user/addon/operator depends on them or something breaks, if they are not there/can not be created.

Well, endpoints slices are not meant to be created by anything else than k8s. You could say the same thing about endpoints, if somebody creates a new endpoint manually you break the service.
In particular, the check we are performing is against the "kubernetes" service which is always there.

I understand on the other hand the fact that having to delete them is clunky, and that we did not think about this corner case where a cluster could switch from having them enabled to disabled.

It's much saver to disable them in metallb, where we are certain, that we don't want to use them and nothing breaks if we do so.

mlSecret = flag.String("ml-secret-name", os.Getenv("METALLB_ML_SECRET_NAME"), "name of the memberlist secret to create")
deployName = flag.String("deployment", os.Getenv("METALLB_DEPLOYMENT"), "name of the MetalLB controller Deployment")
logLevel = flag.String("log-level", "info", fmt.Sprintf("log level. must be one of: [%s]", strings.Join(logging.Levels, ", ")))
disableSlices = flag.Bool("disable-slices", false, "Disable the usage of endpointslices")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this disable-epslices and mention the fact that it defaults to endpoints in the description without relying on the autodiscovery mechanism

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, EndpointSlices or endpoint slices in the description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// UseEndpointSlices detect if Endpoints Slices are enabled in the cluster.
func UseEndpointSlices(kubeClient kubernetes.Interface) bool {
func UseEndpointSlices(kubeClient kubernetes.Interface, cfg *Config) bool {
if cfg.DisableSlices {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about the particular scenario (epslices enabled but disabled in the proxy) that we are covering with this parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


if cfg.ReadEndpoints {
if !UseEndpointSlices(c.client) {
if !UseEndpointSlices(c.client, cfg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change this to check the flag outside, seems more readable.

Something like if cfg.DisableEPSlices || !UseEndpointSlices(c.client) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having all logic to determine if EndpointSlices should be used in one method seams like the reasonalbe thing to do, so I put it there. But I can change that.

Should I add a comment like I did in UseEndpointSlices? Or is it fine without that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but passing a flag just to return early won't add much value to the function itself I think. Might be different if we start using that in many different places, but that's not the current situation.

I'd move the comment too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@fedepaol
Copy link
Member

@Christoph-Raab sorry for not reviewing this before

@fedepaol
Copy link
Member

@Christoph-Raab sorry for not reviewing this before

Also, mind adding the rationale to the commit message? Same thing you added as a comment.

If set controller/speaker default to Endpoints
without relying on the autodiscovery mechanism.
Useful if EndpointSlices are enabled in the
cluster but disabled in kube-proxy.
@Christoph-Raab
Copy link
Contributor Author

@Christoph-Raab sorry for not reviewing this before

Also, mind adding the rationale to the commit message? Same thing you added as a comment.

Sure, no problem

@russellb
Copy link
Contributor

just to be explicit, i'm looking for a "lgtm" from @fedepaol since @fedepaol did the endpointslices support

@fedepaol
Copy link
Member

lgtm , and thanks!

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Christoph-Raab and @fedepaol ! Will merge now.

However, some questions for both regarding a possible follow-up PR:

  1. Would it make sense in a future PR to try to detect this situation in "UseEndpointSlices()" without any need for a user-specified flag?

For example, we can look at the kubernetes service (to say an example) or some other service that should be present on any conformant clusters and see if endpoint slices are created for them? Or was that considered too hacky or there isn't such a service we can know to be present?

Or the issue is that endpoint slices are being created and updated, but we don't want to use them in metallb for some reason? What is the reason to not use them in metalLB if they are created and updated by k8s controllers?

I think I'm missing those details. Will merge this now, but please let me know what you think :)

@rata
Copy link
Contributor

rata commented Sep 16, 2021

Merging, LGTM to me, also LGTM from @fedepaol and IIUC also @russellb

@rata rata merged commit 31f4de0 into metallb:main Sep 16, 2021
@fedepaol
Copy link
Member

For example, we can look at the kubernetes service (to say an example) or some other service that should be present on any conformant clusters and see if endpoint slices are created for them? Or was that considered too hacky or there isn't such a service we can know to be present?

Or the issue is that endpoint slices are being created and updated, but we don't want to use them in metallb for some reason? What is the reason to not use them in metalLB if they are created and updated by k8s controllers?

The current code tries to do what you just described, against the k8s service, checking for the presence of epslices.

There is one corner case though, where you can have the epslices enabled but kubeproxy instructed to consume endpoints, that is enabled by a feature flag and there's no way (I know of) to find it programmatically.
I asked @Christoph-Raab to specify that in the commit message and as a comment in the code.

I think I'm missing those details. Will merge this now, but please let me know what you think :)

@Christoph-Raab Christoph-Raab deleted the config-flag-to-disable-endpointslices branch September 16, 2021 15:03
@rata
Copy link
Contributor

rata commented Sep 16, 2021

@fedepaol oh, you are right, on my quick look I missed that the current code is doing just that. Thanks!

@Christoph-Raab just curious, in which scenario endpoint slices are created by the k8s controllers (I assume they have up to date information) but is problematic if MetalLB uses them? Do you hit a bug in MetalLB using endpoint slices? Or why MetalLB should not use them?

In any case, if kube-proxy is doing this, I guess we can too...

@rata rata self-assigned this Sep 16, 2021
@rata rata added the waiting-on-contributor For when a PR cannot move forward without a contributor modification label Sep 16, 2021
@champtar
Copy link
Contributor

(on my phone) Is the flag name the same as kube-proxy ? If not does it make sense to use the same flag name ?

@Christoph-Raab
Copy link
Contributor Author

@rata

@Christoph-Raab just curious, in which scenario endpoint slices are created by the k8s controllers (I assume they have up to date information) but is problematic if MetalLB uses them? Do you hit a bug in MetalLB using endpoint slices? Or why MetalLB should not use them?

We disabled endpointslices with kube-proxy but not in the whole cluster. So metallb would be using them but kube-proxy would still work with endpoints. We don't really trust that endpoints and endpointslices are in perfect sync so we rather want addons to use endpoints.

@champtar

Is the flag name the same as kube-proxy ? If not does it make sense to use the same flag name ?

Kube-proxy uses a flag in the config map:

    featureGates:
      EndpointSliceProxying: false

We chose --disable-epslices for metallb which describes quite good what it does.

@russellb russellb removed the waiting-on-contributor For when a PR cannot move forward without a contributor modification label Oct 28, 2021
@russellb russellb added this to the v0.11.0 milestone Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants