Graceful Termination for External Traffic Policy Local #1607

Merged

Conversation

andrewsykim
Member

@andrewsykim andrewsykim commented Mar 10, 2020

This PR adds two related, but separate KEPs:

Tracking Terminating Endpoints in EndpointSlice API (KEP-1672)

Adds a `terminating` field to the endpoint conditions in the EndpointSlice API. This allows consumers of the API to know when a pod is terminating, in case extra steps are required during the pod's graceful termination period. One use-case (in addition to the one below) is simplifying the graceful termination logic in the IPVS proxier; more details are in the PR.

Graceful Termination for External Traffic Policy Local (KEP-1669)

Uses the proposed changes to the EndpointSlice API above to enable zero-downtime deployments for Services with Type=LoadBalancer/NodePort and ExternalTrafficPolicy=Local. This is mainly accomplished by redirecting traffic to ready terminating endpoints if all endpoints on a given node are terminating.

Continuing discussion from kubernetes/kubernetes#85643 (comment)

Note: the first commit just fixes some trailing whitespace.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes, size/M, kind/kep, and sig/network labels Mar 10, 2020
@andrewsykim
Member Author

// how long existing connections to an endpoint should be kept open after the endpoint
// is deleted.
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty" protobuf:"bytes,6,opt,name=terminationGracePeriodSeconds"`
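
For illustration, here is a minimal sketch of how the endpointslice controller might populate the proposed field from a pod's spec. The helper name and package are hypothetical; only the field above is from the KEP.

```go
package endpointutil

import (
	v1 "k8s.io/api/core/v1"
)

// terminationGracePeriod returns the value the controller could copy into the
// proposed Endpoint.TerminationGracePeriodSeconds field: nil while the pod is
// running, and a copy of the pod's grace period once graceful termination has
// started (i.e. DeletionTimestamp is set).
func terminationGracePeriod(pod *v1.Pod) *int64 {
	if pod.DeletionTimestamp == nil || pod.Spec.TerminationGracePeriodSeconds == nil {
		return nil
	}
	seconds := *pod.Spec.TerminationGracePeriodSeconds
	return &seconds
}
```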
Member Author

@andrewsykim andrewsykim Mar 10, 2020

@robscott mentioned that an alternative would be to add a Terminating field in EndpointCondition (kubernetes/kubernetes#88590 (comment)). That would require some larger changes in the endpointslice controller but it might be worthwhile, thoughts?

Member

@thockin thockin left a comment

Procedurally this should be a new KEP.

I am not convinced this can really be used for what you want to use it for. I am also worried about the expansion of per-endpoint metadata. As we try to map endpoint slice onto other use-cases, the size of them starts to matter. If that propagates into a per-group value, this could become another thing that chops-up otherwise contiguous blocks. I expect in practice it will mostly be the same across all pods, so IF we decide to do something, it should probably be scoped to the whole slice. This does make the slice controller more complicated again.


An optional field `TerminationGracePeriodSeconds` can be added to each Endpoint indicating the longest duration an endpoint is given to gracefully shut down. This is analogous to `TerminationGracePeriodSeconds` in `PodSpec`. In most cases, the endpointslice controller will set `TerminationGracePeriodSeconds` on an `Endpoint` based on the termination grace period on `PodSpec`. External implementations of the `EndpointSlice` API can set this field arbitrarily based on specific implementation requirements.

This field allows consumers of the API to know the termination grace period of endpoints (pods in this case) without having to watch Pods. A specific use-case for this field is the [graceful termination handler in the IPVS proxy](https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/graceful_termination.go). Because deleting an IPVS backend (a.k.a IPVS real server) immediately deletes all its corresponding conntrack entries, deleting a real server is disruptive during a pod's graceful termination period. Today, the IPVS proxy will only delete a real server if it sees that it has no connections or it has exceeded a hardcoded duration limit (15m). The `TerminationGracePeriodSeconds` field would remove the need for this hardcoded limit, improving overall performance of the proxy as it no longer needs to track stale backends.
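
To illustrate that last point (this is a sketch, not the actual kube-proxy code; all names below are made up), a proxy that learns the grace period from the slice can bound how long it keeps a drained real server around instead of relying on the hardcoded 15-minute limit:

```go
package ipvsgrace

import "time"

// staleRealServer is a hypothetical record kube-proxy could keep for a real
// server whose endpoint has been removed but may still have open connections.
type staleRealServer struct {
	deletionObserved time.Time     // when the endpoint disappeared
	gracePeriod      time.Duration // from the proposed TerminationGracePeriodSeconds
}

// shouldDelete reports whether the drained real server can be removed: either
// all of its connections have closed, or the endpoint's own grace period has
// elapsed (replacing the fixed 15m cutoff).
func shouldDelete(rs staleRealServer, activeConnections int, now time.Time) bool {
	if activeConnections == 0 {
		return true
	}
	return now.Sub(rs.deletionObserved) >= rs.gracePeriod
}
```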
Member

I'm not sure I buy that this is useful for this. First, it's the longest timeout. The pod may choose to exit immediately or the node may go down, causing reduced grace. In that case, you will still be routing traffic to it as a realserver, when you could have stopped.

This also doesn't help if kube-proxy gets restarted, right? At that point the endpoint has been removed from the slice and you've lost any "memory" of it, so you will (should) nuke it. I guess that's strictly better than today, but see previous paragraph.

How does IPVS expect a backend to be removed? I don't see a "drain" command...

You have two options:

  • set the backend weight to 0, which prevents new connections but allows existing ones to gracefully terminate (this is what we do today on endpoint deletion)
  • delete the backend, which removes it and terminates the connections by clearing the conntrack entries and sending a RST on the next packet (which we only do today when there are no active connections to the backend); both options are sketched below
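
In rough Go terms, the two options look like this. This is a sketch against a hypothetical IPVS client interface, not the real kube-proxy or netlink API:

```go
package ipvsdrain

// Destination is a simplified stand-in for an IPVS real server entry.
type Destination struct {
	Address string
	Port    uint16
	Weight  int
}

// Client is a hypothetical interface over the kernel IPVS tables.
type Client interface {
	UpdateRealServer(serviceKey string, dst Destination) error
	DeleteRealServer(serviceKey string, dst Destination) error
}

// drain is option 1: keep the backend but stop new connections by setting
// its weight to 0; established connections keep flowing.
func drain(c Client, svc string, dst Destination) error {
	dst.Weight = 0
	return c.UpdateRealServer(svc, dst)
}

// remove is option 2: delete the backend outright, which also clears its
// conntrack entries, so any remaining connection sees a RST on its next packet.
func remove(c Client, svc string, dst Destination) error {
	return c.DeleteRealServer(svc, dst)
}
```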

Member Author

The pod may choose to exit immediately or the node may go down, causing reduced grace. In that case, you will still be routing traffic to it as a realserver, when you could have stopped.

As @lbernail mentioned, for the IPVS proxier we reduce the weight of the backend to 0 so no new connections are created, so the grace period only applies to existing connections. The IPVS proxier also tracks connections by state, so we would remove the terminating endpoint early if we saw that all existing connections closed cleanly. In the few edge cases where a connection isn't closed properly, the real server would stay around. So at worst we would be leaving an existing connection open longer than it should be, up to terminationGracePeriodSeconds, but this is still better than the 15m duration today.

Member

I see, yeah that makes more sense. :) Thanks for explaining.

You still have the kube-proxy restart problem, right? Specifically

  • Pod X is deleted
  • 10 minute grace period starts
  • X is removed from endpoints
  • kube-proxy sets IPVS weight to 0
  • kube-proxy restarts
  • WATCH returns endpoints without X
  • kube-proxy observes realserver X with weight 0 but has no idea if that is seconds old or days old

So it seems to lose information, and at some point needs to make a dubious assumption about how long to wait.

Member Author

yeah that's true -- for what it's worth the restart problem exists today since graceful termination handling is all done in local memory.

The solution I proposed in #1607 (comment) would fix this, but requires changing the behavior so that terminating pods are kept in the slice.


This field allows consumers of the API to know the termination grace period of endpoints (pods in this case) without having to watch Pods. A specific use-case for this field is the [graceful termination handler in the IPVS proxy](https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/graceful_termination.go). Because deleting an IPVS backend (a.k.a IPVS real server) immediately deletes all its corresponding conntrack entries, deleting a real server is disruptive during a pod's graceful termination period. Today, the IPVS proxy will only delete a real server if it sees that it has no connections or it has exceeded a hardcoded duration limit (15m). The `TerminationGracePeriodSeconds` field would remove the need for this hardcoded limit, improving overall performance of the proxy as it no longer needs to track stale backends.

Why were "not ready" addresses not used?
Member

This is a good question. I don't think I know the answer. @robscott you have more recent context... ?

If NotReady endpoints were always in the slice, we could easily tell the difference between a NotReady endpoint (weight=0) and a deleted endpoint (absent). However, I think NotReady endpoints are only present when PublishNotReadyAddresses is enabled.
If we always had NotReady endpoints in the slice, it would definitely be enough, but it would require some changes in the logic of kube-proxy (and maybe other tools relying on endpoints) to look at PublishNotReadyAddresses to decide on the behavior for a specific service.

Member Author

lol oops, this was not supposed to be in here, but yeah this was also raised in this thread: kubernetes/kubernetes#88590 (comment)

Member

Let's try to drag the conversation from the PR back to here.

Ignore "tolerate unready", which just merges the two states.

Endpoints (not slice) puts pods in NotReadyAddresses on startup, but on teardown (when DeletionTimestamp is set) it removes them entirely.

If we moved them BACK to NotReady on teardown it would be a little bit of a behavioral change, but it's semantically correct and seems unlikely to break users. In case of tolerate-unready, we'd need to decide what the "right" behavior is. Today pods being deleted are removed, regardless of tolerate-unready. We could a) put them in NotReadyAddresses (which retains existing behavior -- they are not in Addresses) or b) put them in Addresses (which is a behavioral change, but consistent).

The bigger issue is whether this would cause a scalability problem. It would cause another round of writes to the Endpoints object (ready->unready->gone vs today's ready->gone), which is already problematic at scale. It seems that we don't actively batch updates to a single Endpoints object.

For EndpointSlice we have the same question, though less pathologically. For a single pod it's much less of a big deal. In practice, we're likely to see this when a deployment is being updated, so it will hit all pods anyway. This is why we held off adding more Conditions - each one is a cost scalar. This is roughly equivalent to adding a new Condition (and indeed we should discuss that choice).

That said, I prefer making this sort of change to adding a new field. I really want to keep EPSlice general-purpose and grace period feels weird. Also, the above "memory loss" problem exists, whereas "unready" is more complete.

@robscott @wojtek-t re scalability.

@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch 2 times, most recently from eea2797 to 70b8672 on March 11, 2020 13:16
@andrewsykim
Member Author

andrewsykim commented Mar 11, 2020

I am not convinced this can really be used for what you want to use it for. I am also worried about the expansion of per-endpoint metadata. As we try to map endpoint slice onto other use-cases, the size of them starts to matter. If that propagates into a per-group value, this could become another thing that chops-up otherwise contiguous blocks. I expect in practice it will mostly be the same across all pods, so IF we decide to do something, it should probably be scoped to the whole slice. This does make the slice controller more complicated again.

Do you feel this way about adding to EndpointCondition as well? The alternative would be to add a Terminating field to EndpointCondition, which only expands metadata for terminating endpoints (using nil default?) -- which presumably is a smaller set of the endpoints for short durations during rolling updates. This would also address concerns for endpoints that terminate earlier than the grace period and handling terminating endpoints during kube-proxy restarts.

@thockin
Member

thockin commented Mar 11, 2020

I am also worried about the expansion of per-endpoint metadata. As we try to map endpoint slice onto other use-cases, the size of them starts to matter. If that propagates into a per-group value, this could become another thing that chops-up otherwise contiguous blocks. I expect in practice it will mostly be the same across all pods, so IF we decide to do something, it should probably be scoped to the whole slice. This does make the slice controller more complicated again.

Do you feel this way about adding to EndpointCondition as well? The alternative would be to add a Terminating field to EndpointCondition, which only expands metadata for terminating endpoints (using nil default?) -- which presumably is a smaller set of the endpoints for short durations during rolling updates. This would also address concerns for endpoints that terminate earlier than the grace period and handling terminating endpoints during kube-proxy restarts.

I think the Conditions approach is a better design, but we should think about the meaning of them and their orthogonality.

What we don't really want is to encode a full state-machine of the pod lifecycle into a set of bit fields. For example (ready: false, terminating: false) -> (ready: true, terminating: false) -> (???) Is the terminal state (ready: false, terminating: true) or (ready: true, terminating: true) ? In either case, what does the opposite mean? They aren't really orthogonal.

For better or worse, we did not call "ready" "started", in part because that implies keeping history rather than simply observing state. This is important because history can get lost, but state is always re-observable. Upon experimenting, Pods (curiously) DO NOT get set to unready (status.Conditions "Ready"="False") when they are being deleted (deletionTimestamp != nil). I don't know if this is an oversight or intentional, but if it is intentional (or can't be changed) the compatible thing to do would be to add a new condition (and a new TerminatingAddresses field in classic Endpoints? Euchh).

@derekwaynecarr since we were JUST talking about pod lifecycle. Do you consider this paragraph above a "bug"? Specifically "pods do not get set to unready when they are being deleted".

This suggests (ready:true, terminating: true) is "normal end of life". So what does (ready: false, terminating: true) mean?

@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch from 70b8672 to 9ea1c61 on March 17, 2020 19:47
@andrewsykim andrewsykim changed the title EndpointSlice: add terminationGracePeriodSeconds to Endpoint EndpointSlice: add Terminating field to EndpointCondition Mar 17, 2020
@thockin
Member

thockin commented Mar 26, 2020

@andrewsykim PTAL at kubernetes/kubernetes#85643 (comment)

I think this KEP and that issue are pretty closely related. I would sign up to shepherd the work, but I don't have the time to do all the implementation legwork right now. Is this something you'd be willing/able to carry? It's a slight expansion of scope over what you proposed here (which is just the API, no consumers thereof).

@wojtek-t
Member

I've just read that comment and I think it may have significant performance/scalability implications. So if we decide to put it into this KEP, I would like to be a reviewer (BTW - it will be a great chance to validate the production readiness questionnaire too, which you will have to fill out anyway).

@andrewsykim
Member Author

I think this KEP and that issue are pretty closely related. I would sign up to shepherd the work, but I don't have the time to do all the implementation legwork right now. Is this something you'd be willing/able to carry? It's a slight expansion of scope over what you proposed here (which is just the API, no consumers thereof).

Yes, I can try to carry this one forward. Will dig into the issue for more context.

I've just read that comment and I think it may have significant performance/scalability implications. So if we decide to put it into this KEP, I would like to be a reviewer

Noted, I'll ping you once I've updated the KEP.

@andrewsykim
Member Author

@thockin @wojtek-t I took a stab at implementing kubernetes/kubernetes#85643 (comment) to get a better idea of the work involved and to help facilitate discussion in the KEP (see kubernetes/kubernetes#89780).

I think the semantics get a little weird if we fall back to local not ready endpoints, because a not ready endpoint can be terminating or actually failing readiness. In that case should we still fall back to not ready endpoints? If not, I think a terminating bool field would still be required? Thoughts?

@smarterclayton
Contributor

So I missed this when it was open but I consider the current behavior of services while terminating under external traffic policy local to be a bug.

In general a terminating pod (deletionTimestamp set) wants:

  1. load balancers to redirect new traffic
  2. existing connections to continue
  3. to stop accepting connections after it is sure the frontends are pointed away and no new connections are sent
  4. to finish its existing connections
  5. to exit
  6. and for 1-5 to control the pace of drain on a node

I think we are, as a platform, not in a great spot to offer correct zero-downtime drains and rebalances for workers (I have DaemonSet surge #1590 open to fix it so that network pods can correctly hand off responsibilities, and we have fixed maybe 5-10 issues in drain over the last few months).

I would urge us to ensure services can be used safely for zero-disruption handoff of new connections before we add more features to services.

Pod that has a valid `DeletionTimestamp`. Controllers that manage EndpointSlice objects may want to opt out of this feature for optimization reasons (e.g. avoiding per-endpoint metadata expansion)
which can be accomplished by setting the field to nil.

This field allows consumers of the API to know if endpoints are currently terminating without having to initiate a watch on Pods. A specific use-case for this field is the [graceful termination handler in the IPVS proxy](https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/graceful_termination.go). Because deleting an IPVS backend (a.k.a IPVS real server) immediately deletes all its corresponding conntrack entries, it can be disruptive to delete a real server during its graceful termination period. The solution today is to only delete a real server if it sees that it has no connections or it has exceeded a hardcoded duration limit (15m). This approach is not foolproof as the state of "terminating endpoints" is not preserved when kube-proxy is restarted. The `Terminating` field would remove the need for storing terminating state in memory, improving the overall design and performance of the proxy as it no longer needs to track stale backends in memory.
Contributor

I think we should explicitly call out, in either this KEP or another, the use case around graceful shutdown without disruption for pods using services (NodePort, local traffic policy, and the service proxy at minimum) as the guiding goal. This actually loses some of the context - which is that we generally want to ensure that pods continue to receive existing-connection traffic for all services while new connections are rapidly (or less rapidly, with readiness gates) diverted away during termination.

Contributor

I'm trying to add e2e tests to the kube suite that will ensure this is possible for a range of services (kubernetes/kubernetes#89547 is one of the first changes we need so that we have a test shim).

Member Author

@andrewsykim andrewsykim Apr 6, 2020

Noted, I'll put consumption of the API in a separate KEP.

@k8s-ci-robot k8s-ci-robot removed the size/M label Apr 6, 2020
@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch 2 times, most recently from ccc0cda to e1bf06f on April 7, 2020 21:47
@andrewsykim
Member Author

FYI: updated the KEP so it's split into two separate KEPs and proposes to only fall back to ready terminating endpoints (instead of all terminating endpoints).

@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch from e1bf06f to b2b3174 on April 23, 2020 15:41
@k8s-ci-robot k8s-ci-robot added the sig/architecture label Apr 23, 2020
@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch 4 times, most recently from 71e2356 to b5d8f28 on April 23, 2020 16:08
@andrewsykim
Member Author

Based on discussions in #1607 (comment), I've updated the KEP to walk through all terminating states for external node-local traffic: {R, !T} -> {R, T} -> {!R, T} -> {} (where R = ready and T = terminating).

I've also updated kubernetes/kubernetes#89780 to reflect the changes in this KEP in case some of the details were not clear in writing.

Member

@thockin thockin left a comment

I'm approving. I'd like Rob to confirm this will not be a shock for EPSlice. Only one comment of note in my feedback.

/approve

* local ready & terminating endpoints
* local not ready & terminating endpoints
* blackhole traffic
* for all other traffic, preserve existing behavior where traffic is only sent to ready && !terminating endpoints.
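
Read as selection logic, the priority in the excerpt above could look roughly like this. This is a sketch for the node-local path only, not the actual kube-proxy implementation, and the types are simplified:

```go
package proxyfallback

// localEndpoint is a simplified view of an endpoint backing a Service on this node.
type localEndpoint struct {
	IP          string
	Ready       bool
	Terminating bool
}

// pickLocalEndpoints applies the proposed fallback for
// externalTrafficPolicy=Local: prefer ready & !terminating endpoints, then
// ready & terminating, then not-ready & terminating; an empty result means
// the traffic is blackholed, as it is today.
func pickLocalEndpoints(eps []localEndpoint) []localEndpoint {
	var readyOnly, readyTerminating, notReadyTerminating []localEndpoint
	for _, ep := range eps {
		switch {
		case ep.Ready && !ep.Terminating:
			readyOnly = append(readyOnly, ep)
		case ep.Ready && ep.Terminating:
			readyTerminating = append(readyTerminating, ep)
		case ep.Terminating:
			notReadyTerminating = append(notReadyTerminating, ep)
		}
	}
	if len(readyOnly) > 0 {
		return readyOnly
	}
	if len(readyTerminating) > 0 {
		return readyTerminating
	}
	return notReadyTerminating // may be empty: blackhole
}
```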
Member

Why this distinction? I forget if we decided "start here and expand" or if there's a reason to stop at this level. Why NOT do this for all?

Member Author

This distinction is actually referring to the externalTrafficPolicy=Cluster case.

Member Author

Updated to make this clearer

- "@smarterclayton"
approvers:
- "@thockin"
editor: TBD
Member

editor has been removed from the template :)

@@ -0,0 +1,20 @@
title: Tracking Terminating Endpoints in EndpointSlice
Member

@robscott Are you good with this?

Member

Added a couple small nits, but LGTM overall. This specific KEP feels like it might make sense as part of the existing EndpointSlice KEP. That has traditionally been where the EndpointSlice API and any changes to it have been defined. With that said, I know that could result in it becoming a massive KEP. Not sure what the best practice is here.

Member Author

Tim had previously suggested that this should be a new KEP (see here) and I agree.

I think it would make sense for this KEP to summarize the motivation for the change but we should still backport the API changes into the main EPSlice KEP with maybe a reference to this one?

Member Author

(I'll open a follow-up PR for this)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jun 16, 2020
Member

@robscott robscott left a comment

Thanks for the work on this!

// terminating indicates if this endpoint is terminating. Consumers should assume a
// nil value indicates the endpoint is not terminating.
// +optional
Terminating *bool `json:"terminating,omitempty" protobuf:"bytes,2,name=terminating"`
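
For consumers of the API, a minimal sketch of the nil-defaulting the comment above requires. The types here only mirror the proposed conditions for illustration; they are not the real discovery API structs:

```go
package epsliceutil

// endpointConditions mirrors the shape of the proposed EndpointConditions
// (simplified for illustration).
type endpointConditions struct {
	Ready       *bool
	Terminating *bool
}

// isTerminating applies the documented default: a nil Terminating pointer is
// treated the same as an explicit false.
func isTerminating(c endpointConditions) bool {
	return c.Terminating != nil && *c.Terminating
}
```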
Member

Why a pointer here? Would it be safe to get rid of the nil case since it's indistinguishable from false here?

Member Author

I'm assuming that the nil state could be useful in the future. Some consumers of EPSlice might have a special case for when the terminating state is unknown vs when it's explicitly false. Maybe I'm overthinking this?

### Upgrade / Downgrade Strategy

Since this is an addition to the EndpointSlice API, the upgrade/downgrade strategy will follow that
of the [EndpointSlice API work](/keps/sig-network/20190603-EndpointSlice-API.md).
Member Author

fixed

@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch from b5d8f28 to 559d4aa on June 17, 2020 00:20
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
@andrewsykim andrewsykim force-pushed the endpointslice-graceful-termination branch from 559d4aa to d1e0d0e on June 17, 2020 00:26
@robscott
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jun 17, 2020
@k8s-ci-robot k8s-ci-robot merged commit c005089 into kubernetes:master Jun 17, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jun 17, 2020
Labels
approved, cncf-cla: yes, kind/kep, lgtm, sig/architecture, sig/network, size/L