Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-1672: promote EndpointSliceTerminatingCondition feature to GA #3504

Merged
merged 2 commits into from Oct 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-network/1672.yaml
Expand Up @@ -3,3 +3,5 @@ alpha:
approver: "@wojtek-t"
beta:
approver: "@wojtek-t"
stable:
approver: "@wojtek-t"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t @marseel looking for a sign off here.

A few months back, myself and @marseel ran the Kubernetes performance tests to validate that this change did not regress performance significantly. IIRC, the tests included these two changes:

I don't think we saw any noticeable regression in performance, but please correct me if I'm mis-remebering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are correct, we haven't seen noticeable regression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - I remember that and I'm fine with this from scalability POV.
Some other comment below though :)

145 changes: 144 additions & 1 deletion keps/sig-network/1672-tracking-terminating-endpoints/README.md
Expand Up @@ -233,7 +233,150 @@ large number of traffic to the apiserver.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.
Rollback was manually validated using the following steps:

Create a kind cluster:
```
$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.25.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
```

Check an EndpointSlice object to verify that the EndpointSliceTerminatingCondition feature gate is enabled.
Note that endpoints should have 3 conditions, `ready`, `serving` and `terminating`:
```
$ kubectl -n kube-system get endpointslice
NAME ADDRESSTYPE PORTS ENDPOINTS AGE
kube-dns-zp8h5 IPv4 9153,53,53 10.244.0.2,10.244.0.4 2m20s
$ kubectl -n kube-system get endpointslice kube-dns-zp8h5 -oyaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
- 10.244.0.2
conditions:
ready: true
serving: true
terminating: false
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-lgsrp
namespace: kube-system
uid: cefa189a-66e0-4da3-8341-5c4e9f11407b
- addresses:
- 10.244.0.4
conditions:
ready: true
serving: true
terminating: false
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-ptfln
namespace: kube-system
uid: d9003b65-2316-4d76-96f5-34d7570e6fcb
kind: EndpointSlice
metadata:
...
...
```

Turn off the `EndpointSliceTerminatingCondition` feature gate in `kube-apiserver` and `kube-controller-manager` (this is effectively
the state of the feature gate when it was Alpha).
```
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
501fafe18dbd kindest/node:v1.25.0 "/usr/local/bin/entr…" 4 minutes ago Up 4 minutes 127.0.0.1:36795->6443/tcp kind-control-plane
$ docker exec -ti kind-control-plane bash
$ vim /etc/kubernetes/manifests/kube-apiserver.yaml
# append --feature-gates=EndpointSliceTerminatingCondition=false to kube-apiserver flags
$ vim /etc/kubernetes/manifests/kube-controller-manager.yaml
# append --feature-gates=EndpointSliceTerminatingCondition=false to kube-controller-manager flags
```

Once `kube-apiserver` and `kube-controller-manager` restarts with the flag disabled, check that endpoints have the
`serving` and `terminating` conditions preserved and only dropped on the next update.
```
# preserved initially
$ kubectl -n kube-system get endpointslice kube-dns-zp8h5 -oyaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
- 10.244.0.2
conditions:
ready: true
serving: true
terminating: false
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-lgsrp
namespace: kube-system
uid: cefa189a-66e0-4da3-8341-5c4e9f11407b
- addresses:
- 10.244.0.4
conditions:
ready: true
serving: true
terminating: false
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-ptfln
namespace: kube-system
uid: d9003b65-2316-4d76-96f5-34d7570e6fcb
kind: EndpointSlice
metadata:
...

# trigger an update to endpointslice
$ kubectl -n kube-system delete po -l k8s-app=kube-dns
pod "coredns-565d847f94-lgsrp" deleted
pod "coredns-565d847f94-ptfln" deleted

# verify that serving/terminating conditions are now dropped
$ kubectl -n kube-system get endpointslice kube-dns-zp8h5 -oyaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
- 10.244.0.6
conditions:
ready: true
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-jhtxk
namespace: kube-system
uid: b9a45145-fd3a-4f03-8243-59b0a0789bbf
- addresses:
- 10.244.0.5
conditions:
ready: true
nodeName: kind-control-plane
targetRef:
kind: Pod
name: coredns-565d847f94-r6n9s
namespace: kube-system
uid: e20ce163-cf2b-4251-bcc8-352dcaf135c9
kind: EndpointSlice
metadata:
...
...
```

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Expand Down
5 changes: 3 additions & 2 deletions keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml
Expand Up @@ -20,17 +20,18 @@ see-also:
replaces: []

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no diff in the actual KEP since it already includes the graudation criteria for GA:

#### GA

* E2E tests validating that terminating pods are properly reflected in EndpointSlice API.
* Ensure there are no performance/scalability regressions when enabling additional endpointslice writes for terminating endpoints.
  * This will be validated by running the existing scalability test suites where pods handle SIGTERM from kubelet before terminating.
* All necessary metrics are in place to provide adequate observability and monitoring for this feature.


# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.24"
latest-milestone: "v1.26"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.20"
beta: "v1.22"
stable: "v1.26"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment regarding PRR in the KEP;

Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.

Have those been done? Can you please summarize the setup in which it was done and findings?
If not - can you please do that now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a new section for manual steps taken for testing rollback. Let me know if they are sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - this looks great - thanks!


# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
Expand Down