Ref counting is only applicable to Remote endpoints #101358

Merged
merged 1 commit into kubernetes:master on May 5, 2021


sbangari
Contributor

@sbangari sbangari commented Apr 22, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

The ref counting logic in Windows kube-proxy applies only to remote endpoints. In some cases, the unnecessary ref counting on local endpoints causes kube-proxy to panic because the ref count object is nil. This PR scopes the ref counting logic to remote endpoints only.
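A minimal Go sketch of the condition this PR introduces, assuming a heavily simplified endpointsInfo that keeps only an ip, an isLocal flag, and a *uint16 refCount (these field names and types are assumptions, not copied from the kube-proxy source). deleteRemote is a hypothetical callback standing in for the HNS call that removes a remote endpoint. The sketch only illustrates that local endpoints never touch the counter.

```go
package main

import "fmt"

// endpointsInfo is a simplified, hypothetical model of the Windows
// kube-proxy type, keeping only the fields the ref counting touches.
type endpointsInfo struct {
	ip       string
	isLocal  bool
	refCount *uint16 // shared counter; only meaningful for remote endpoints
}

func (ep *endpointsInfo) GetIsLocal() bool { return ep.isLocal }

// Cleanup applies the condition from this PR: local endpoints skip ref
// counting entirely. deleteRemote is a hypothetical callback standing in
// for the call that removes the remote HNS endpoint.
func (ep *endpointsInfo) Cleanup(deleteRemote func(ip string)) {
	if !ep.GetIsLocal() && ep.refCount != nil {
		*ep.refCount--
		if *ep.refCount == 0 {
			deleteRemote(ep.ip)
		}
		// Detach from the shared counter after decrementing it.
		ep.refCount = nil
	}
}

func main() {
	var deleted []string
	record := func(ip string) { deleted = append(deleted, ip) }

	// A local endpoint's counter is left untouched.
	localCount := uint16(1)
	local := &endpointsInfo{ip: "10.0.0.5", isLocal: true, refCount: &localCount}
	local.Cleanup(record)

	// A remote endpoint's counter is decremented and, at zero, deleted.
	remoteCount := uint16(1)
	remote := &endpointsInfo{ip: "10.0.1.7", refCount: &remoteCount}
	remote.Cleanup(record)

	fmt.Println(localCount, remoteCount, deleted) // 1 0 [10.0.1.7]
}
```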

Which issue(s) this PR fixes:

Fixes #100384

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 22, 2021
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 22, 2021
@aravindhp
Contributor

/sig windows
/area kube-proxy

@k8s-ci-robot k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. area/kube-proxy release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 26, 2021
@sbangari sbangari marked this pull request as ready for review April 27, 2021 21:03
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2021
@sbangari
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 27, 2021
@sbangari
Contributor Author

/retest

@sbangari
Contributor Author

/retest

@sbangari
Contributor Author

/assign @Keith-Mange

@k8s-ci-robot
Contributor

@sbangari: GitHub didn't allow me to assign the following users: Keith-Mange.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @Keith-Mange

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbangari
Contributor Author

/assign Keith-Mange

@k8s-ci-robot
Contributor

@sbangari: GitHub didn't allow me to assign the following users: Keith-Mange.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign Keith-Mange

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbangari
Contributor Author

/lgtm

@k8s-ci-robot
Contributor

@sbangari: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Keith-Mange

/lgtm

@k8s-ci-robot
Contributor

@Keith-Mange: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbangari
Contributor Author

/assign elweb9858

@@ -354,7 +354,7 @@ func newSourceVIP(hns HostNetworkService, network string, ip string, mac string,
 
 func (ep *endpointsInfo) Cleanup() {
 	Log(ep, "Endpoint Cleanup", 3)
-	if ep.refCount != nil {
+	if !ep.GetIsLocal() && ep.refCount != nil {
Contributor


just double-checking: we do not want to decrement the endpoint ref count if the endpoint is local?

@@ -354,7 +354,7 @@ func newSourceVIP(hns HostNetworkService, network string, ip string, mac string,
 
 func (ep *endpointsInfo) Cleanup() {
 	Log(ep, "Endpoint Cleanup", 3)
-	if ep.refCount != nil {
+	if !ep.GetIsLocal() && ep.refCount != nil {
 		*ep.refCount--
 
 		// Remove the remote hns endpoint, if no service is referring it
Contributor


I'm having trouble with the git UI, so this comment is really for line 363: I think the !ep.GetIsLocal() check there is now redundant.

Contributor


I'm also having more trouble with the UI, but I am wondering why 'ep.refCount = nil' is not within that if block on lines 363-371. If we don't attempt to delete a remote endpoint because the refCount is above zero, why set the refCount to nil? This may be outside the scope of this change, but it just doesn't make sense to me.

Contributor Author


Consider two services with endpoint objects:

Service1 -> Endpoint1, Endpoint2, Endpoint3

Service2 -> Endpoint1, Endpoint2

Service1_Endpoint1 and Service2_Endpoint2 refer to the same HNS remote endpoint.

Original behavior:

Service1_Endpoint1 -> HNSRemoteEndpoint1RefCount

Service2_Endpoint2 -> HNSRemoteEndpoint1RefCount

Deleting Service1 would drop Service1_Endpoint1's HNSRemoteEndpoint1RefCount to zero, and the corresponding HNS remote endpoint was deleted even though Service2 was still using it.
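A hypothetical Go model of the original behavior described above, reading the missing asterisk in that diagram as "each endpoint object holds its own count value rather than a pointer". The endpoint type and the hns map are illustrative assumptions, not kube-proxy code; the point is only that a private copy can reach zero while another service still depends on the same HNS endpoint.

```go
package main

import "fmt"

// endpoint is a hypothetical model of the original behavior: each service's
// endpoint object keeps a private count (a value, not a pointer) for the HNS
// remote endpoint it maps to.
type endpoint struct {
	hnsID    string
	refCount uint16
}

// simulateValueCopies returns Service2's count and whether the shared HNS
// endpoint still exists after Service1 is deleted.
func simulateValueCopies() (uint16, bool) {
	hns := map[string]bool{"hns-remote-1": true}

	// Both endpoints map to the same HNS remote endpoint, but each one
	// increments only its own copy of the count.
	s1e1 := endpoint{hnsID: "hns-remote-1"} // Service1_Endpoint1
	s2e2 := endpoint{hnsID: "hns-remote-1"} // Service2_Endpoint2
	s1e1.refCount++
	s2e2.refCount++

	// Deleting Service1 drops its private copy to zero, so the shared
	// HNS endpoint is removed although Service2 still references it.
	s1e1.refCount--
	if s1e1.refCount == 0 {
		delete(hns, s1e1.hnsID)
	}
	return s2e2.refCount, hns[s2e2.hnsID]
}

func main() {
	fmt.Println(simulateValueCopies()) // 1 false
}
```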

Modified behavior:

Service1_Endpoint1 -> *HNSRemoteEndpoint1RefCount ------> Shared_HNSRemoteEndpoint1RefCount
Service2_Endpoint2 -> *HNSRemoteEndpoint1RefCount ------> Shared_HNSRemoteEndpoint1RefCount

Shared_HNSRemoteEndpoint1RefCount is a shared counter, and the refcounts in the endpoint objects are pointers to it. Every time a service's endpoint is cleaned up, the shared refcount is decremented and the endpoint's refcount pointer is set to nil, so that it can no longer accidentally modify the shared refcount. Once the shared refcount drops to 0 (i.e. no service is using it any longer), we delete the HNS remote endpoint.
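The shared-counter scheme described above can be sketched in Go as follows. This is an illustrative model under stated assumptions: endpointsInfo is reduced to an hnsID and a *uint16 refCount, and attach and hnsState are hypothetical helpers, not kube-proxy APIs. It shows both halves of the answer to the review question: the HNS endpoint survives until the last service lets go, and nil-ing the pointer makes a repeated Cleanup on the same object harmless.

```go
package main

import "fmt"

// endpointsInfo is a simplified, hypothetical model: refCount points at a
// counter shared by every service endpoint that maps to the same HNS
// remote endpoint.
type endpointsInfo struct {
	hnsID    string
	refCount *uint16
}

// hnsState is a hypothetical stand-in for the HNS remote endpoints on a node.
type hnsState map[string]bool

// attach creates an endpoint object that shares counter and increments it.
func attach(hnsID string, counter *uint16) *endpointsInfo {
	*counter++
	return &endpointsInfo{hnsID: hnsID, refCount: counter}
}

// Cleanup decrements the shared counter, deletes the HNS endpoint only when
// the counter reaches zero, and always detaches the pointer so that a repeat
// Cleanup on the same object cannot decrement the shared count again.
func (ep *endpointsInfo) Cleanup(hns hnsState) {
	if ep.refCount == nil {
		return
	}
	*ep.refCount--
	if *ep.refCount == 0 {
		delete(hns, ep.hnsID)
	}
	ep.refCount = nil
}

func main() {
	hns := hnsState{"hns-remote-1": true}
	var shared uint16 // Shared_HNSRemoteEndpoint1RefCount

	s1e1 := attach("hns-remote-1", &shared) // Service1_Endpoint1
	s2e2 := attach("hns-remote-1", &shared) // Service2_Endpoint2

	// Deleting Service1 (even twice) leaves Service2's reference intact.
	s1e1.Cleanup(hns)
	s1e1.Cleanup(hns)
	fmt.Println(shared, hns["hns-remote-1"]) // 1 true

	// Once Service2 is cleaned up too, the HNS endpoint is deleted.
	s2e2.Cleanup(hns)
	fmt.Println(shared, hns["hns-remote-1"]) // 0 false
}
```

Setting the pointer to nil outside the zero check is what keeps the shared count accurate when the same endpoint object is cleaned up more than once.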

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elweb9858, sbangari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elweb9858
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 4, 2021
@sbangari
Contributor Author

sbangari commented May 4, 2021

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 4, 2021
@sbangari
Contributor Author

sbangari commented May 4, 2021

/retest

@k8s-ci-robot k8s-ci-robot merged commit 73c1b2e into kubernetes:master May 5, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 5, 2021
@aravindhp
Contributor

@sbangari thanks for fixing this. Is it possible to backport this to 1.21 and 1.20?

@cpanato
Member

cpanato commented May 21, 2021

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 21, 2021
k8s-ci-robot added a commit that referenced this pull request May 21, 2021
…1358-upstream-release-1.20

Automated cherry pick of #101358: Ref counting is only applicable to Remote endpoints
k8s-ci-robot added a commit that referenced this pull request May 21, 2021
…1358-upstream-release-1.21

Automated cherry pick of #101358: Ref counting is only applicable to Remote endpoints
k8s-ci-robot added a commit that referenced this pull request May 21, 2021
…1358-upstream-release-1.19

Automated cherry pick of #101358: Ref counting is only applicable to Remote endpoints
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG-Windows
  
Done (v1.22)
Development

Successfully merging this pull request may close these issues.

Windows kube-proxy panics after a LoadBalancer service is created
6 participants