Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metallb should use Endpointslices serving property, not ready #2074

Closed
8 tasks done
zerkms opened this issue Sep 12, 2023 · 4 comments · Fixed by #2088
Closed
8 tasks done

Metallb should use Endpointslices serving property, not ready #2074

zerkms opened this issue Sep 12, 2023 · 4 comments · Fixed by #2088

Comments

@zerkms
Copy link
Contributor

zerkms commented Sep 12, 2023

MetalLB Version

0.13.9

Deployment method

Manifests

Main CNI

kube-router

Kubernetes Version

1.26.4

Cluster Distribution

No response

Describe the bug

UPD: all this mostly relevant to ExternalTrafficPolicy: Cluster services, but in my opinion all eps.ready uses should be suppressed.

At the moment throughout the code

for _, slice := range eps.SlicesVal {
for _, ep := range slice.Endpoints {
if !epslices.IsConditionReady(ep.Conditions) {
continue
}

the "readiness" of an endpoint slice is determined by the ready property.

Now let's have a look at the EPS conditions documentation:

FIELDS:
   ready	<boolean>
     ready indicates that this endpoint is prepared to receive traffic,
     according to whatever system is managing the endpoint. A nil value
     indicates an unknown state. In most cases consumers should interpret this
     unknown state as ready. For compatibility reasons, ready should never be
     "true" for terminating endpoints.

   serving	<boolean>
     serving is identical to ready except that it is set regardless of the
     terminating state of endpoints. This condition should be set to true for a
     ready endpoint that is terminating. If nil, consumers should defer to the
     ready condition.

   terminating	<boolean>
     terminating indicates that this endpoint is terminating. A nil value
     indicates an unknown state. Consumers should interpret this unknown state
     to mean that the endpoint is not terminating.

So, as you can see the better suited field is in fact serving.

Why it's important: with the current implementation it's impossible to implement a service that implement any graceful shutdown strategy whatsoever. As long as the pod becomes in terminating state - its eps.ready condition becomes False.

If eps.serving was used instead - then on pod shutdown it had chance to stay running and complete serving clients gracefully

To Reproduce

  1. Create a pod
  2. Expose it via a service (with externalTrafficPolicy: Local)
  3. Describe the endpointslice for the service and see the status of all 3 condition fields.
  4. Now delete the pod
  5. Quickly describe the endpointslice again

Expected Behavior

Metallb should not immediately stop announcing the pod that is currently being terminated, but instead refer to the serving field.

So the flow should look like this:

  1. The pod receives delete request. CRI sends signal to the process to shutdown. At this point Pod is still in Ready state
  2. Endpointslice.ready=False, Endpointslice.serving=True
  3. metallb continues announcing the node ip with the pod
  4. Process completes the graceful shutdown ceremony and quits, endpoint is entirely removed
  5. metallb stops announcing the node ip as the endpoint does not exist anymore

With the current implementation it all breaks on the step 3.

Additional Context

N/a

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.

I've read and agree with the following

  • I've checked all open and closed issues and my issue is not there.
  • This bug is reproducible when deploying MetalLB from the main branch
  • I have read the troubleshooting guide and I am still not able to make it work
  • I checked the logs and MetalLB is not discarding the configuration as not valid
  • I enabled the debug logs, collected the information required from the cluster using the collect script and will attach them to the issue
  • I will provide the definition of my service and the related endpoint slices and attach them to this issue
@zerkms zerkms added the bug label Sep 12, 2023
@leppeK
Copy link

leppeK commented Sep 12, 2023

Please note that this behaviour only affects services with ExternalTrafficPolicy: Local

@zerkms
Copy link
Contributor Author

zerkms commented Sep 12, 2023

@leppeK that's true, I implied it, sorry. Now added it explicitly to the repro steps.

Also thanks to the slack user Merijn Keppel who mentioned the same.

@fedepaol
Copy link
Member

Yup, makes sense! Thanks for rasinig this

zerkms pushed a commit to zerkms/metallb that referenced this issue Sep 20, 2023
@zerkms
Copy link
Contributor Author

zerkms commented Sep 20, 2023

It was an easy fix, hopefully I didn't miss anything :-)

zerkms added a commit to zerkms/metallb that referenced this issue Sep 20, 2023
Fixed metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
zerkms added a commit to zerkms/metallb that referenced this issue Sep 21, 2023
Fixed metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
zerkms added a commit to zerkms/metallb that referenced this issue Sep 21, 2023
It is necessary because `.ready` flag is set to `False` immediately during shutdown of the pod, while pod is still alive. And `.serving` flag is `True` while pod is healthy (even while shutting down).

So this change unlocks the ability to implement graceful shutdown for pods.

Sample scenario:

1. Pod is healthy and running, then it receives a shutdown signal (eg: pod is just deleted, or the node is drained)
2. Pod handles the kill signal and starts gracefully shutting down. At this state `.ready = False, .serving = True`
3. With new implementation - because `.serving == True` the pod's IP is still announced, which allows traffic for already established connection to freely flow to it
4. Then the pod completes its graceful shutdown ceremony and quits. At this point endpoint is removed from the endpointslice, and the IP is removed from announces.

Fixed metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
zerkms added a commit to zerkms/metallb that referenced this issue Sep 21, 2023
It is necessary because `.ready` flag is set to `False` immediately during shutdown of the pod, while pod is still alive. And `.serving` flag is `True` while pod is healthy (even while shutting down).

So this change unlocks the ability to implement graceful shutdown for pods.

Sample scenario:

1. Pod is healthy and running, then it receives a shutdown signal (eg: pod is just deleted, or the node is drained)
2. Pod handles the kill signal and starts gracefully shutting down. At this state `.ready = False, .serving = True`
3. With new implementation - because `.serving == True` the pod's IP is still announced, which allows traffic for already established connections to freely flow to it
4. Then the pod completes its graceful shutdown ceremony and quits. At this point endpoint is removed from the endpointslice, and the IP is removed from announces.

Fixed metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
github-merge-queue bot pushed a commit that referenced this issue Sep 26, 2023
It is necessary because `.ready` flag is set to `False` immediately during shutdown of the pod, while pod is still alive. And `.serving` flag is `True` while pod is healthy (even while shutting down).

So this change unlocks the ability to implement graceful shutdown for pods.

Sample scenario:

1. Pod is healthy and running, then it receives a shutdown signal (eg: pod is just deleted, or the node is drained)
2. Pod handles the kill signal and starts gracefully shutting down. At this state `.ready = False, .serving = True`
3. With new implementation - because `.serving == True` the pod's IP is still announced, which allows traffic for already established connections to freely flow to it
4. Then the pod completes its graceful shutdown ceremony and quits. At this point endpoint is removed from the endpointslice, and the IP is removed from announces.

Fixed #2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
fedepaol pushed a commit to fedepaol/metallb that referenced this issue Oct 19, 2023
It is necessary because `.ready` flag is set to `False` immediately during shutdown of the pod, while pod is still alive. And `.serving` flag is `True` while pod is healthy (even while shutting down).

So this change unlocks the ability to implement graceful shutdown for pods.

Sample scenario:

1. Pod is healthy and running, then it receives a shutdown signal (eg: pod is just deleted, or the node is drained)
2. Pod handles the kill signal and starts gracefully shutting down. At this state `.ready = False, .serving = True`
3. With new implementation - because `.serving == True` the pod's IP is still announced, which allows traffic for already established connections to freely flow to it
4. Then the pod completes its graceful shutdown ceremony and quits. At this point endpoint is removed from the endpointslice, and the IP is removed from announces.

Fixed metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
novad03 pushed a commit to novad03/k8s-meta that referenced this issue Nov 25, 2023
It is necessary because `.ready` flag is set to `False` immediately during shutdown of the pod, while pod is still alive. And `.serving` flag is `True` while pod is healthy (even while shutting down).

So this change unlocks the ability to implement graceful shutdown for pods.

Sample scenario:

1. Pod is healthy and running, then it receives a shutdown signal (eg: pod is just deleted, or the node is drained)
2. Pod handles the kill signal and starts gracefully shutting down. At this state `.ready = False, .serving = True`
3. With new implementation - because `.serving == True` the pod's IP is still announced, which allows traffic for already established connections to freely flow to it
4. Then the pod completes its graceful shutdown ceremony and quits. At this point endpoint is removed from the endpointslice, and the IP is removed from announces.

Fixed metallb/metallb#2074

Signed-off-by: Ivan Kurnosov <zerkms@zerkms.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants