Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new EndpointSlice Mirroring Controller #91637

Merged
merged 6 commits into from
Jul 10, 2020

Conversation

robscott
Copy link
Member

@robscott robscott commented Jun 1, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
This adds an EndpointSlice mirroring controller to mirror custom Endpoints to EndpointSlices as described by the KEP.

Does this PR introduce a user-facing change?:

Custom Endpoints are now mirrored to EndpointSlices by a new EndpointSliceMirroring controller.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Test Coverage
Progress on required test coverage from the KEP.

Unit Tests

  • Stale EndpointSlices managed by EndpointSliceMirroring controller are cleaned up by the controller when it starts up.
    Covered by controller tests, primarily an issue for garbage collector since these are owned by Endpoints resources.

  • Only Endpoints with the skip label set to "true" will be mirrored by the EndpointSliceMirroring controller.
    This has changed a bit, but is covered by controller tests.

  • Endpoints transitioning between values of the skip label should result in corresponding EndpointSlices being created or deleted.
    Covered by controller tests, could potentially use additional coverage in e2e or integration tests

  • Mirrored EndpointSlices will:

    • Include the appropriate labels, endpoints, and ports of the corresponding Endpoints resource.
      Covered by unit tests.

    • Refer to the original Endpoints resource with an OwnerReference and a kubernetes.io/service-name label.
      Covered by unit tests.

    • Include a endpointslice.kubernetes.io/managed-by label set to endpointslicemirroring-controller.k8s.io.
      Covered by unit tests.

  • Endpoints with dual stack Endpoints should be represented with multiple EndpointSlices, at least 1 per IP family.
    Covered by unit tests, could potentially add some scale with unit or integration tests.

  • Endpoints with multiple subsets should be represented with multiple EndpointSlices, at least 1 per subset.
    Covered by unit tests, could potentially add some scale with unit or integration tests.

  • Endpoints with multiple subsets and IP families should be represented with multiple EndpointSlices, at least 1 per subset and IP family.
    Covered by unit tests, could potentially add some scale with unit or integration tests.

  • Endpoints with more than 1000 endpoints should result in exactly 1000 endpoints being mirrored to corresponding EndpointSlices.
    Ended up with a variation that will limit replication to 1000 endpoints per subset, covered by controller tests.

  • EndpointSlices with a different endpointslice.kubernetes.io/managed-by will not be modified by the EndpointSlice mirroring controller.
    Covered in controller tests.

  • When a selector is added to a Service with a preexisting Endpoints resource:

    • The Endpoints controller will modify the existing Endpoints resource to add the endpointslice.kubernetes.io/skip-mirror label.
      Covered by unit and integration tests.

    • The EndpointSliceMirroring controller will delete any EndpointSlices it has mirrored for that Endpoints resource.
      EndpointSliceMirroring controller doesn't watch Services, but other tests
      verify that skip label is respected.

E2E Tests

  • Custom Endpoints are mirrored through create, update, and delete process.
    Covered by e2e tests.

  • Endpoints created by the Endpoints controller include the skip label.
    Covered by e2e and integration tests.

  • EndpointSlices are created with the appropriate labels, endpoints, ports, and Service references (label and owner ref).
    Covered fully by unit tests, partially by e2e tests.

/sig network
/hold

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Jun 1, 2020
@fejta-bot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/check-cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jun 1, 2020
@fedebongio
Copy link
Contributor

/cc @bowei

@k8s-ci-robot k8s-ci-robot requested a review from bowei June 2, 2020 20:15
@aojea
Copy link
Member

aojea commented Jun 4, 2020

/cc

@k8s-ci-robot k8s-ci-robot requested a review from aojea June 4, 2020 11:08
@robscott robscott marked this pull request as ready for review June 10, 2020 05:13
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2020
@robscott
Copy link
Member Author

This is not quite ready for review, but it's close. I'm transitioning this from draft status to get tests running.

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 11, 2020
Copy link
Member

@bowei bowei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just started reviewing./..

cmd/kube-controller-manager/app/discovery.go Outdated Show resolved Hide resolved
return endpointHash(endpointutil.DeepHashObjectToString(hashObj))
}

// endpointSet provides simple methods for comparing sets of Endpoints.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are implementing this somewhat as in a "functional programming" map style wrt to the func sigs, but it seems like it actually has side effects on the mutation?

Might want to consider matching OO-style vs doing this, as it can become confusing when the original object is modified vs not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented below about wanting to maintain consistency with the EndpointSlice controller for now so we can potentially share this code in the future.

}

// Returns an endpoint matching the hash if contained in the set.
func (s endpointSet) Get(item *discovery.Endpoint) *discovery.Endpoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go map usually returns (elt, ok)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a direct copy of the set defined by the EndpointSlice controller. I think that may have been based on another common k8s Set datastructure, but can't quite remember. I'm open to changing the signature of this function at some point, but would prefer to do it once we're able to combine/share some of the code for these controllers. For the sake of simpler backporting I'm trying to avoid any changes like that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@aojea
Copy link
Member

aojea commented Jul 7, 2020

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@robscott
Copy link
Member Author

robscott commented Jul 8, 2020

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@robscott
Copy link
Member Author

robscott commented Jul 8, 2020

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@spiffxp spiffxp added this to the v1.19 milestone Jul 9, 2020
@bowei
Copy link
Member

bowei commented Jul 10, 2020

/lgtm

@robscott
Copy link
Member Author

/retest

1 similar comment
@robscott
Copy link
Member Author

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@robscott
Copy link
Member Author

/retest

1 similar comment
@robscott
Copy link
Member Author

/retest

@Antiarchitect
Copy link

Will this be backported to 1.18? Currently facing this issue with Cilium 1.8.2 and K8s 1.18.8 with services without selector.

@robscott
Copy link
Member Author

@Antiarchitect Is that with the EndpointSliceProxying feature gate enabled on kube-proxy? Although I'd love to cherry pick this I think it's likely too big to pull back to a previous Kubernetes version.

@Antiarchitect
Copy link

@robscott There is no kube-proxy with Cilium - it provides a full replacement for it.

@robscott
Copy link
Member Author

@Antiarchitect woops, I missed that detail. The assumption in 1.18 and earlier was that anything publishing custom Endpoints would starting publishing custom EndpointSlices. That ended up being naive so we added the EndpointSliceMirroring controller in 1.19. Unfortunately any use of EndpointSlices in 1.18 by a proxy is going to require any custom Endpoints to be replicated to EndpointSlices through some other means.

@Antiarchitect
Copy link

Actually there is a way in Cilium to cut the consequences: cilium/cilium#12513 (comment) I'll stick with that until I upgrade to K8s 1.19.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet