Adding support for Topology Aware Hints #99522

Merged
k8s-ci-robot merged 4 commits into kubernetes:master from topology-hints on Mar 9, 2021

Conversation

Member

@robscott robscott commented Feb 27, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This adds initial alpha support for Topology Aware Hints.

Does this PR introduce a user-facing change?

Topology Aware Hints are now available in alpha and can be enabled with the `TopologyAwareHints` feature gate.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig network
/priority important-soon

@k8s-ci-robot k8s-ci-robot added release-note kind/feature size/XXL kind/api-change sig/network priority/important-soon cncf-cla: yes needs-triage sig/apps labels Feb 27, 2021
@k8s-ci-robot k8s-ci-robot requested review from mikedanese and MrHohn Feb 27, 2021
@robscott robscott changed the title Adding support for Topology Aware Hints WIP: Adding support for Topology Aware Hints Feb 27, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Feb 27, 2021
Member

@aojea aojea commented Feb 27, 2021

/cc

@k8s-ci-robot k8s-ci-robot requested a review from aojea Feb 27, 2021
@robscott robscott force-pushed the topology-hints branch 3 times, most recently from df43282 to c6e2ebe Compare Mar 1, 2021
@robscott robscott changed the title WIP: Adding support for Topology Aware Hints Adding support for Topology Aware Hints Mar 1, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Mar 1, 2021
Member Author

@robscott robscott commented Mar 1, 2021

/retest

@fejta-bot fejta-bot commented Mar 1, 2021

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@robscott robscott force-pushed the topology-hints branch 3 times, most recently from 8445c39 to 584d277 Compare Mar 2, 2021
Member Author

@robscott robscott commented Mar 2, 2021

This may be a noisy PR, so I'm removing the reviewers that were auto-assigned. Feel free to add yourself back if you're interested.
/uncc @mikedanese @MrHohn

Member Author

@robscott robscott commented Mar 2, 2021

I'm continuing to work on this PR, but I think we've reached a point where review would be valuable. The most significant and complex part of this PR is the controller logic. That is largely done now, although I need to significantly improve test coverage. Feedback on the structure and logic here would be much appreciated.

I still need to work on:

  • API strategy and validation for new fields
  • Filtering endpoints in kube-proxy based on these hints when they are present
  • Adding metrics
  • Updating kube-proxy to support multiple hints per endpoint
  • Potentially adding a way to opt in, rather than having the feature gate enable the feature for all Services. Discussions are ongoing about whether this should integrate with the traffic policy fields or be standalone.
  • Improving test coverage
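
For readers following along, the kube-proxy filtering item in the list above could look roughly like the Go sketch below. It is illustrative only and is not the code in this PR: `Endpoint`, `ZoneHints`, and `filterByZone` are hypothetical names, and the fallback rules (use every endpoint when the node's zone is unknown, when any endpoint lacks hints, or when nothing is hinted for the zone) are assumptions made for the sketch.

```go
package main

// Endpoint is a hypothetical, simplified stand-in for an EndpointSlice
// endpoint. It is not the real Kubernetes API type.
type Endpoint struct {
	Address string
	// ZoneHints lists the zones that should consume this endpoint.
	// An endpoint may carry more than one hint; empty means "no hints".
	ZoneHints []string
}

// filterByZone returns the endpoints hinted for nodeZone.
//
// Assumed fallback behavior (not confirmed by the PR discussion): the full
// list is returned when nodeZone is empty, when any endpoint has no hints,
// or when no endpoint is hinted for nodeZone.
func filterByZone(endpoints []Endpoint, nodeZone string) []Endpoint {
	if nodeZone == "" {
		return endpoints
	}
	filtered := make([]Endpoint, 0, len(endpoints))
	for _, ep := range endpoints {
		if len(ep.ZoneHints) == 0 {
			// Hints are incomplete, so ignore them for the whole set.
			return endpoints
		}
		for _, zone := range ep.ZoneHints {
			if zone == nodeZone {
				filtered = append(filtered, ep)
				break
			}
		}
	}
	if len(filtered) == 0 {
		// Nothing is hinted for this zone; avoid dropping all traffic.
		return endpoints
	}
	return filtered
}
```

With three endpoints hinted for `zone-a`, `zone-b`, and both zones respectively, a node in `zone-a` would keep the first and third endpoints under these assumptions, while a node in an unlisted zone would keep all three.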

/cc @andrewsykim @bowei @dcbw @wojtek-t
/assign @thockin

Contributor

@k8s-ci-robot k8s-ci-robot commented Mar 7, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robscott, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member Author

@robscott robscott commented Mar 7, 2021

Now that EndpointSlice GA API and Controller PRs are in, I've rebased this PR one more time. Leaving the hold in place because I want to make sure Tim is OK with the annotation approach I've taken here. Happy to change it if not.

Member Author

@robscott robscott commented Mar 7, 2021

/retest

@thockin thockin added this to the v1.21 milestone Mar 8, 2021
Member Author

@robscott robscott commented Mar 8, 2021

Today's updates:

  • A couple of rebases
  • Refactoring + better testing for how endpoints are allocated to different zones
  • A new run of `make update` that resulted in some updates to staging/src/k8s.io/api/testdata/HEAD for EndpointSlice resources

givingZone, numToGive := getMost(givingZonesDesired)
receivingZone, numToReceive := getMost(receivingZonesDesired)

if (numToGive < 1.0 && numToReceive < 1.0) || numToGive < 0.0 || numToReceive < 0.0 {
Member

@aojea aojea Mar 8, 2021

Is this guaranteed to break out of the loop? 😄

Member Author

@robscott robscott Mar 8, 2021

Yeah that's a good question. I can't find any edge cases where it wouldn't, but I may have missed one. I've added some better test coverage that includes some unexpected/invalid inputs. I've also slightly expanded the conditions that would cause this to break out of the loop. Let me know if you can think of any edge cases I'm missing.
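
To make the termination question concrete, here is a minimal Go sketch of a loop built around the condition quoted above. Only the three quoted lines come from the PR. The loop body, the `Move` type, `copyMap`, `redistribute`, the tie-breaking in `getMost`, and the early return for empty maps are all assumptions made for illustration, and the real controller logic may differ.

```go
package main

// Move records one endpoint reassigned between zones (hypothetical type).
type Move struct {
	From, To string
}

// getMost returns the zone with the largest value and that value.
// Ties go to the lexicographically smaller name so results are
// deterministic; an empty map yields ("", 0). This body is an assumption.
func getMost(desired map[string]float64) (string, float64) {
	zone, most, found := "", 0.0, false
	for z, v := range desired {
		if !found || v > most || (v == most && z < zone) {
			zone, most, found = z, v, true
		}
	}
	return zone, most
}

// copyMap returns a shallow copy so callers' maps are not mutated.
func copyMap(in map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// redistribute moves one endpoint per pass from the zone with the most to
// give to the zone with the most to receive, until the quoted condition
// breaks the loop.
func redistribute(givingZonesDesired, receivingZonesDesired map[string]float64) []Move {
	giving, receiving := copyMap(givingZonesDesired), copyMap(receivingZonesDesired)
	var moves []Move
	if len(giving) == 0 || len(receiving) == 0 {
		return moves // assumed guard: nothing to pair up
	}
	for {
		givingZone, numToGive := getMost(giving)
		receivingZone, numToReceive := getMost(receiving)

		if (numToGive < 1.0 && numToReceive < 1.0) || numToGive < 0.0 || numToReceive < 0.0 {
			break
		}

		// Assumed body: hand over exactly one endpoint per pass.
		moves = append(moves, Move{From: givingZone, To: receivingZone})
		giving[givingZone] -= 1.0
		receiving[receivingZone] -= 1.0
	}
	return moves
}
```

In this sketch, giving `{a: 2.0, b: 0.5}` and receiving `{c: 1.5, d: 1.0}` produces two moves, `a→c` then `a→d`, after which both maxima are 0.5 and the loop breaks. Each pass lowers both maxima by one, so either the remaining surplus shrinks by a whole endpoint or an entry below 1.0 turns negative and drops out of contention. That bounds the number of passes for finite, non-NaN inputs. A NaN or infinite value would never satisfy the break condition here, which is one edge case worth covering in tests.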

Contributor

@k8s-ci-robot k8s-ci-robot commented Mar 9, 2021

@robscott: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-bazel-build | 5cf0840b8af7d1703d1b6133de7c5111d86a8822 | link | `/test pull-kubernetes-bazel-build` |
| pull-kubernetes-bazel-test | 5cf0840b8af7d1703d1b6133de7c5111d86a8822 | link | `/test pull-kubernetes-bazel-test` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Member Author

@robscott robscott commented Mar 9, 2021

/test pull-kubernetes-integration
(was the TestWebhookTimeoutWithWatchCache flake I've seen several times)

/test pull-kubernetes-e2e-kind-ipv6
(Probing container should be ready immediately after startupProbe succeeds)

Member Author

@robscott robscott commented Mar 9, 2021

Removing the hold now that @thockin has looked at the annotation config. I think this is good to go now; the PR still needs an LGTM if anyone is able to add one.

Member

@aojea aojea commented Mar 9, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 9, 2021
Member Author

@robscott robscott commented Mar 9, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Mar 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit 207c75c into kubernetes:master Mar 9, 2021
16 checks passed
Labels
approved area/apiserver area/ipvs area/test cncf-cla: yes kind/api-change kind/feature lgtm priority/important-soon release-note sig/api-machinery sig/apps sig/auth sig/instrumentation sig/network sig/testing size/XXL triage/accepted
6 participants