Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manager#syncPod should cover reconciliation #89155

Closed
wants to merge 1 commit into from

Conversation

tedyu
Copy link
Contributor

@tedyu tedyu commented Mar 16, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Please see related discussion (around issue #80968): #88255 (comment)

@yujuhong 's question led me to thinking about whether reconciliation may miss some case.

For the following code in manager#syncBatch :

				delete(m.apiStatusVersions, syncedUID)
				updatedStatuses = append(updatedStatuses, podStatusSyncRequest{uid, status})

Please note that syncedUID and uid may carry different values.
With this additional log,

@@ -517,33 +521,37 @@ func (m *manager) syncBatch() {
                                        continue
                                }
                                syncedUID = mirrorUID
+                               klog.Infof("assigned syncedUID %q uid %q", syncedUID, uid)
                        }

Here is one example:

I0710 02:47:35.039903    9058 config.go:383] Receiving a new pod "kube-proxy-e2e-9e17d10a0f-a7d53-minion-group-32vz_kube-system(de2df129-bfbd-4c48-90b1-06d49400dad0)"
I0710 02:47:35.040140    9058 kubelet.go:1817] SyncLoop (ADD, "api"): "kube-proxy-e2e-9e17d10a0f-a7d53-minion-group-32vz_kube-system(de2df129-bfbd-4c48-90b1-06d49400dad0)"
...
I0710 02:47:42.007051    9058 status_manager.go:524] assigned syncedUID "de2df129-bfbd-4c48-90b1-06d49400dad0" uid "b31ab15bb4ea18d79cf1cfa40fe53c82"

Please note that, in current code, if m.needsUpdate returns false, syncPod() would return early.
However, this may not work with the reconciliation code path (the UID used to check presence in apiStatusVersions in syncPod may be different from the one whose entry is cleared in syncBatch).

This PR allows manager#syncPod to perform reconciliation. Without this change, reconciliation may be skipped (when syncedUID and uid differ).

See #88255 (comment) from @byxorna

Note: another potential fix is to remove uid from apiStatusVersions in manager#syncBatch but this is not as robust as the current fix.

Which issue(s) this PR fixes:
Fixes #

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 16, 2020
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2020
@tedyu
Copy link
Contributor Author

tedyu commented Mar 16, 2020

/cc @sjenning @rphillips

@tedyu
Copy link
Contributor Author

tedyu commented Mar 16, 2020

/retest

@tedyu
Copy link
Contributor Author

tedyu commented Mar 16, 2020

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 16, 2020
@tedyu
Copy link
Contributor Author

tedyu commented Mar 16, 2020

The current formation is a little smarter than the first patch @byxorna put in his cluster.

The checkNeedsUpdate flag only needs to be false when m.needsReconcile() says so.

@tedyu
Copy link
Contributor Author

tedyu commented Mar 16, 2020

/test pull-kubernetes-e2e-gce

@derekwaynecarr
Copy link
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2020
@derekwaynecarr
Copy link
Member

@smarterclayton i know you are also looking in similar areas.

lets have a broader discussion in sig-node meeting on areas we are investigating on kubelet reliability. right now a number of the changes are hard to follow in a coherent fashion without a broader story.

@derekwaynecarr
Copy link
Member

/assign

@tedyu
Copy link
Contributor Author

tedyu commented Mar 21, 2020

/cc @Random-Liu

@tedyu
Copy link
Contributor Author

tedyu commented Jun 7, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 7, 2021
Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't fight the bot on stale PRs.

There are no accompanying test changes for this PR so we can't be sure it actually addresses the issue. We also recently refactored the PLEG so this change may no longer be necessary. Without demonstrated impact I am inclined to close this PR.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tedyu
To complete the pull request process, please ask for approval from derekwaynecarr after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tedyu
Copy link
Contributor Author

tedyu commented Aug 4, 2021

I would suggest giving this PR some time to see if the problem still occurs in latest release.

@tedyu
Copy link
Contributor Author

tedyu commented Aug 4, 2021

Which old releases did the PLEG refactoring go to ?

What about the releases without the refactoring ? Shouldn't those releases be covered (by this fix)?

@tedyu
Copy link
Contributor Author

tedyu commented Aug 5, 2021

I took a look at recent checkins to pkg/kubelet/status/status_manager.go

As far as I can tell, the logic touched by the PR remains unchanged.

@ehashman
Can you point me to the commits which you referred to earlier ?

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@tedyu
Copy link
Contributor Author

tedyu commented Nov 3, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@dims
Copy link
Member

dims commented Jan 14, 2022

this PR has been on hold for about 2 weeks. Can you please see how to unblock this by reviewing comments etc? or please close this out (Trying to declutter our open list of PRs)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2022
@CormickKneey
Copy link

Please don't fight the bot on stale PRs.

There are no accompanying test changes for this PR so we can't be sure it actually addresses the issue. We also recently refactored the PLEG so this change may no longer be necessary. Without demonstrated impact I am inclined to close this PR.

Any proposal doc or reference link can be followed?

@tedyu
Copy link
Contributor Author

tedyu commented Apr 15, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@tedyu
Copy link
Contributor Author

tedyu commented Jul 14, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 12, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 11, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node PR Triage automation moved this from Waiting on Author to Done Dec 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

None yet