Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Topology Mode Annotation, Deprecating Topology Hints Annotation #116522

Merged
merged 1 commit into from Mar 14, 2023

Conversation

robscott
Copy link
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

This introduces the service.kubernetes.io/topology-mode annotation as an eventual replacement for the service.kubernetes.io/topology-aware-hints annotation. The service.kubernetes.io/topology-aware-hints annotation has been deprecated, but will continue to be supported for at least some period of time.

This introduces the idea of implementation-specific values for the annotation - the idea that someone can bring their own implementation of topology. Although I expect most of these implementations to populate EndpointSlice hints, that is not a requirement.

To enable this concept, kube-proxy now accepts any value for the annotation that is not "disabled" or empty.

Does this PR introduce a user-facing change?

* New `service.kubernetes.io/topology-mode` annotation has been introduced as a replacement for the `service.kubernetes.io/topology-aware-hints` annotation.
* `service.kubernetes.io/topology-aware-hints` annotation has been deprecated.
* kube-proxy now accepts any value that is not "disabled" for these annotations, enabling custom implementation-specific and/or future built-in heuristics to be used. 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig network
/cc @aojea @bowei @danwinship @shaneutt
/assign @thockin

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Mar 13, 2023
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 13, 2023
@k8s-ci-robot k8s-ci-robot requested a review from aojea March 13, 2023 06:32
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Mar 13, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 13, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 13, 2023
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

Comment on lines +134 to +143
// AnnotationTopologyMode can be used to enable or disable Topology Aware
// Routing for a Service. Well known values are "Auto" and "Disabled".
// Implementations may choose to develop new topology approaches, exposing
// them with domain-prefixed values. For example, "example.com/lowest-rtt"
// could be a valid implementation-specific value for this annotation. These
// heuristics will often populate topology hints on EndpointSlices, but that
// is not a requirement.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EndpointSlice hints are a typed field so implementation-specific topology modes would not be able to use them, right? Does that put implementation-specific topology modes at a disadvantage relative to official ones?

For that matter, why is this an annotation but the EndpointSlice Hints are a field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EndpointSlice hints are a typed field so implementation-specific topology modes would not be able to use them, right?

I think this is all working towards a place where people that want to experiment with new hints heuristics can bring their own fork of EPS controller. That would be made easier by kubernetes/enhancements#3686 and could be another reason to invest in a KEP for something like #87488. The conclusion I've personally come to is that it's going to be very hard to find a heuristic that works for everyone. Providing a way for the community to develop others seems like it may be our best path forward. In the future, if there are clearly effective and widely used community heuristics, we may want to merge them into k/k.

For that matter, why is this an annotation but the EndpointSlice Hints are a field?

Probably the most concrete reason is that it's not possible to attach annotations to a specific endpoint in the list. We're also inching closer to a place where the dataplane doesn't need to care why a hint is present, just that it is. (Prior to this PR the annotation needed to be set to "Auto", now it just can't be set to "Disabled").

Open to other ideas here, but at a high level I think it's important that we provide a way for community heuristics to be developed and experimented with because I don't think we can make the correct routing decisions for everyone with a single algorithm and/or the limited time we have to invest in this feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @s1061123 and @akhilles since this relates to ideas you've worked on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that put implementation-specific topology modes at a disadvantage relative to official ones?

I think that, if there was a need and it was reasonable, we could consider additional hints. Unfortunately, our API and our implementation are entangled and we don't have an awesome way to extend existing resources.

In a perfect world, hints would be an extension of the base type, but there's just no way to express that. We could put all hints in a parallel type, but now we have wonky races where the slice is updated but the hints are not, and we 2x the API ops. We could render hints as a per-slice annotation. That's probably the "fairest" approach, but ick.

I expect that other implementations won't use EPSlice anyway, so it's moot

why is this an annotation but the EndpointSlice Hints are a field

This is fair - topologyMode could be a field. As much churn as this has undergone, that would require resetting to alpha. I think I'd just as soon run with annotation and if this pattern sticks, we can eat the tech-debt of converting to a field.

Agree or disagree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd just as soon run with annotation and if this pattern sticks, we can eat the tech-debt of converting to a field.

Agree or disagree?

agree

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott thank you for the mention to me. My interest is delegate endpointSlice management for specific service, by other controller, that is not to create any endpointslices by EPS controller in Kubernetes controller manager. In this case, this topology-mode annotation only controls the topology, so I believe that endpointSlice is created. So it is slightly related but not so related to me.

@robscott
Copy link
Member Author

/retest

@shaneutt
Copy link
Member

CI problems from earlier should be set now: kubernetes/test-infra#29003

/test all

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're trying to put something in place sooner (rather than later) to provide some clarifications about the fact that hints have been found to be insufficient and that (for now) we want to provide an open-ended annotation which over the coming months new implementations can coalesce upon. I agree with signaling that we found hints were not by themselves sufficient and I also think that adding just a simple common annotation for this for now (and we can revisit later) is a decent start (annotation technical debt aside).

pkg/apis/core/annotation_key_constants.go Outdated Show resolved Hide resolved
pkg/apis/core/annotation_key_constants.go Outdated Show resolved Hide resolved
pkg/apis/core/annotation_key_constants.go Outdated Show resolved Hide resolved
Comment on lines +134 to +143
// AnnotationTopologyMode can be used to enable or disable Topology Aware
// Routing for a Service. Well known values are "Auto" and "Disabled".
// Implementations may choose to develop new topology approaches, exposing
// them with domain-prefixed values. For example, "example.com/lowest-rtt"
// could be a valid implementation-specific value for this annotation. These
// heuristics will often populate topology hints on EndpointSlices, but that
// is not a requirement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that put implementation-specific topology modes at a disadvantage relative to official ones?

I think that, if there was a need and it was reasonable, we could consider additional hints. Unfortunately, our API and our implementation are entangled and we don't have an awesome way to extend existing resources.

In a perfect world, hints would be an extension of the base type, but there's just no way to express that. We could put all hints in a parallel type, but now we have wonky races where the slice is updated but the hints are not, and we 2x the API ops. We could render hints as a per-slice annotation. That's probably the "fairest" approach, but ick.

I expect that other implementations won't use EPSlice anyway, so it's moot

why is this an annotation but the EndpointSlice Hints are a field

This is fair - topologyMode could be a field. As much churn as this has undergone, that would require resetting to alpha. I think I'd just as soon run with annotation and if this pattern sticks, we can eat the tech-debt of converting to a field.

Agree or disagree?

pkg/controller/endpointslice/utils.go Outdated Show resolved Hide resolved
if hintsAnnotation != "" && hintsAnnotation != "Disabled" && hintsAnnotation != "disabled" {
klog.InfoS("Skipping topology aware endpoint filtering since Service has unexpected value", "annotationTopologyAwareHints", v1.AnnotationTopologyAwareHints, "hints", hintsAnnotation)
}
if hintsAnnotation == "" || hintsAnnotation == "disabled" || hintsAnnotation == "Disabled" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually allow variant case - should we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we only document "Auto" and "Disabled" at this point, the first implementation unfortunately used "auto" and "disabled" so we were stuck with either inconsistency or forever supporting both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should define the syntax clearly. If it were to become a field what would it be?

I guess it would either be like a label ("foo" vs "example.com/foo") or like a condition ("Foo vs example.com/foo")?

Let's just be clear in the definition. We can accept "auto" and "disabled", but "Auto" and "Disabled" are the canonical names, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just be clear in the definition. We can accept "auto" and "disabled", but "Auto" and "Disabled" are the canonical names, right?

Yes the documentation on both annotations only specifies "Auto" and "Disabled.

We should define the syntax clearly. If it were to become a field what would it be?
I guess it would either be like a label ("foo" vs "example.com/foo") or like a condition ("Foo vs example.com/foo")?

If we're talking about the allowed values of the field/annotation, I think "like a condition" is the answer here so "Auto", "Disabled", and "example.com/foo" are all valid.

@thockin
Copy link
Member

thockin commented Mar 14, 2023 via email

Annotation

As part of this change, kube-proxy accepts any value for either
annotation that is not "disabled".

Change-Id: Idfc26eb4cc97ff062649dc52ed29823a64fc59a4
@robscott
Copy link
Member Author

We should define the syntax clearly. If it were a field what would it be? I guess it would either be like a label (foo vs example.com/foo) or like a condition (Foo vs example.com/foo)?

Not sure what this is in response to. Unfortunately I think we're stuck with both, although we could choose to not carry over lowercase support to the new annotation, that may result in unexpected migration problems. So I think my perspective is that we should document it as a condition (Foo and example.com/foo are valid values), but I'm not sure we can/should prevent use of "auto" and "disabled" for backwards compat.

@thockin
Copy link
Member

thockin commented Mar 14, 2023

Sorry, email responds to threaded comments in conversation, will attach to the variant-case thread.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave a TODO or something to consider converting this to a field?


// v1.DeprecatedAnnotationTopologyAwareHints has precedence over v1.AnnotationTopologyMode.
var ok bool
info.hintsAnnotation, ok = service.Annotations[v1.DeprecatedAnnotationTopologyAwareHints]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to issue an API warning if both are set and different values? Not super critical, but nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea, is that something we should just prevent entirely in validation? I'm not familiar with how/where to add API warnings, any tips for the component that should live in if not API validation?

@thockin
Copy link
Member

thockin commented Mar 14, 2023

Thanks!

/lgtm
/approve
/hold

unhold if you're happy with definition of constants or want to do anything else as followup.

nathanjsweet pushed a commit to cilium/cilium that referenced this pull request May 26, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request May 26, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
sayboras pushed a commit to cilium/cilium that referenced this pull request May 29, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
sayboras pushed a commit to cilium/cilium that referenced this pull request May 29, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 6, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 6, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 9, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 9, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 14, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 14, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 14, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
nathanjsweet pushed a commit to cilium/cilium that referenced this pull request Jun 14, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
romanspb80 pushed a commit to romanspb80/cilium that referenced this pull request Jun 22, 2023
Updating k8s libs to v1.27.2 requires updating
sigs.k8s.io/controller-runtime to the 0.15 version with
kubernetes-sigs/controller-runtime#2223 in
place. This in turn requires some additional adjustments to changed APIs
in the controller-runtime, mainly in operator/pkg/gateway-api. Specifically,
around request methods and testing.

The service.kubernetes.io/topology-aware-hints annotation got deprecated
with k8s 1.27 in favor of service.kubernetes.io/topology-mode (see
kubernetes/kubernetes#116522). This change keeps
support for the deprecated annotation. Support for the new annotation
will be added in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
romanspb80 pushed a commit to romanspb80/cilium that referenced this pull request Jun 22, 2023
k8s 1.27 deprecated the service.kubernetes.io/topology-aware-hints
annotation in favor of the service.kubernetes.io/topology-mode
annotation to allow implementation-specific values for the annotation.
See kubernetes/kubernetes#116522 for more
details.

Add support for service.kubernetes.io/topology-mode, letting the
deprecated service.kubernetes.io/topology-aware-hints take precedence.
This is in correspondence with the upstream implementation.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants