Add GC workqueue Forget to stop the rate limiter #106029

Merged
merged 1 commit into from
Jan 10, 2022

astraw99
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

In the controller sync function, once a workqueue item has been added with AddRateLimited, we need to call Forget so that the rate limiter stops tracking the item when the sync finishes without error.
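To make the effect concrete, here is a minimal sketch of a per-item exponential backoff limiter. The type `toyLimiter` and its constructor are hypothetical stand-ins written for this illustration; only the method names `When` and `Forget` are borrowed from the workqueue rate limiter interface, and the real implementation may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// toyLimiter is a hypothetical stand-in for a per-item exponential backoff
// rate limiter. It is not the client-go implementation.
type toyLimiter struct {
	base     time.Duration
	failures map[string]int
}

func newToyLimiter(base time.Duration) *toyLimiter {
	return &toyLimiter{base: base, failures: map[string]int{}}
}

// When returns the delay for the next retry of item and records the attempt.
func (l *toyLimiter) When(item string) time.Duration {
	n := l.failures[item]
	l.failures[item] = n + 1
	return l.base << uint(n) // base * 2^n
}

// Forget drops the retry history so the next delay starts from base again.
func (l *toyLimiter) Forget(item string) {
	delete(l.failures, item)
}

func main() {
	l := newToyLimiter(10 * time.Millisecond)

	// Two rate-limited requeues of the same key.
	l.When("pod-a")
	l.When("pod-a")

	// Suppose the sync then succeeds but nobody calls Forget: the history
	// is kept, so a later retry of the same key starts already backed off.
	fmt.Println("without Forget:", l.When("pod-a")) // 40ms

	// With Forget after a successful sync, the backoff restarts from base.
	l.Forget("pod-a")
	fmt.Println("with Forget:", l.When("pod-a")) // 10ms
}
```

In this sketch the stale history only costs extra delay, but it also means the limiter keeps one map entry per key that was ever requeued and never forgotten.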

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 31, 2021
@k8s-ci-robot
Contributor

Hi @astraw99. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 31, 2021
@fedebongio
Contributor

/assign @deads2k @caesarxuchao
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 2, 2021
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 9, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Nov 9, 2021
Contributor

@deads2k deads2k left a comment

I cannot comment outside the diff, but in the cases where we are not adding back to the queue, why shouldn't we .Forget? A standard controller does the forget in a layer outside the worker itself (example: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md#rough-structure).

I think it may be cleaner to model it like this. @caesarxuchao what do you think?

@astraw99 astraw99 force-pushed the fix-workqueue-forget branch 2 times, most recently from a7c67a4 to a430179 Compare November 10, 2021 04:55
@astraw99
Member Author

@deads2k @caesarxuchao PTAL thanks.

@deads2k
Contributor

deads2k commented Nov 30, 2021

This is a good catch. I'm wondering if it's caused because the structure doesn't follow https://github.com/kubernetes/community/blame/master/contributors/devel/sig-api-machinery/controllers.md#L146-L190 . If we followed that pattern of "return an error to requeue, return nil to be forgotten" we could side-step all the edge cases we're catching with one-offs here, right?
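For readers unfamiliar with the pattern, a minimal sketch of "return an error to requeue, return nil to be forgotten" might look like the following. `fakeQueue`, `handle`, and `errTransient` are hypothetical names invented for this illustration; the recorder only mimics the `AddRateLimited` and `Forget` calls of a rate-limited workqueue and is not the garbage collector's code.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical error used only by this example.
var errTransient = errors.New("transient failure")

// fakeQueue is a hypothetical recorder standing in for a rate-limited workqueue.
type fakeQueue struct {
	requeued  []string
	forgotten []string
}

func (q *fakeQueue) AddRateLimited(key string) { q.requeued = append(q.requeued, key) }
func (q *fakeQueue) Forget(key string)         { q.forgotten = append(q.forgotten, key) }

// handle is the outer layer: it alone decides between requeue and Forget,
// based only on whether sync returned an error.
func handle(q *fakeQueue, key string, sync func(string) error) {
	if err := sync(key); err != nil {
		fmt.Printf("sync %q failed, requeuing: %v\n", key, err)
		q.AddRateLimited(key)
		return
	}
	// Success: clear the rate limiter's history for this key.
	q.Forget(key)
}

func main() {
	q := &fakeQueue{}
	handle(q, "ok-key", func(string) error { return nil })
	handle(q, "bad-key", func(string) error { return errTransient })
	fmt.Println("forgotten:", q.forgotten) // [ok-key]
	fmt.Println("requeued:", q.requeued)   // [bad-key]
}
```

Because the sync function never touches the queue, a forgotten `Forget` on an early return cannot happen in this shape; the trade-off is that every outcome has to be expressible as "error" or "no error".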

@caesarxuchao
Member

Good catch. There is no call to Forget() in the entire GC code base. Perhaps the GC code was written before the Forget() method was introduced?

If we followed that pattern of "return an error to requeue, return nil to be forgotten" we could side-step all the edge cases we're catching with one-offs here, right?

That would be ideal but I don't see a clear way to refactor all three non-conforming cases:

  • no requeuing if err==enqueuedVirtualDeleteEventErr
  • no requeuing if err==namespacedOwnerOfClusterScopedObjectErr
  • requeuing if err==nil but !n.isObserved()
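The three cases above can be written down as a small decision function, which shows why a plain "error means requeue" contract does not fit. This is only a sketch: the two sentinel errors are hypothetical stand-ins that reuse the names from the comment, and `shouldRequeue` and its `observed` parameter are invented for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors named after the ones mentioned in the comment;
// the real definitions live in the garbage collector package.
var (
	enqueuedVirtualDeleteEventErr           = errors.New("enqueued virtual delete event")
	namespacedOwnerOfClusterScopedObjectErr = errors.New("namespaced owner of cluster-scoped object")
)

// shouldRequeue encodes the three exceptions to "error means requeue".
// observed stands in for the result of n.isObserved().
func shouldRequeue(err error, observed bool) bool {
	switch {
	case errors.Is(err, enqueuedVirtualDeleteEventErr):
		return false // an error, yet no requeue
	case errors.Is(err, namespacedOwnerOfClusterScopedObjectErr):
		return false // an error, yet no requeue
	case err != nil:
		return true // ordinary failure: requeue
	default:
		return !observed // no error, yet requeue while the node is unobserved
	}
}

func main() {
	fmt.Println(shouldRequeue(enqueuedVirtualDeleteEventErr, true))           // false
	fmt.Println(shouldRequeue(namespacedOwnerOfClusterScopedObjectErr, true)) // false
	fmt.Println(shouldRequeue(nil, false))                                    // true
	fmt.Println(shouldRequeue(nil, true))                                     // false
	fmt.Println(shouldRequeue(errors.New("other"), true))                     // true
}
```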

I feel it's not worth refactoring.

@deads2k
Contributor

deads2k commented Dec 14, 2021

I gave the refactor a try and it got sticky for cases where we want to requeue but do not want a message in the log. This happens on RESTMapper errors, for instance.

Lacking a refactor, I think we need to go through every return and add one of:

  1. AddRateLimited
  2. Forget
  3. Comment

It's painful, but I think it's the only way to help us avoid misses. For instance, in attemptToDeleteWorker, we don't currently forget in three of the early returns and in attemptToOrphanWorker we're missing one.
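One way to make every return path state its choice explicitly is to have the inner worker return an action value and let a thin wrapper apply it to the queue. The sketch below is an illustration under that assumption, not the change made in this PR: `queueAction`, `processItem`, `runWorker`, `fakeQueue`, and the `ready` map are all hypothetical.

```go
package main

import "fmt"

// queueAction is a hypothetical enum: what the caller should do with the item.
type queueAction int

const (
	requeueItem queueAction = iota
	forgetItem
)

// fakeQueue records calls; it stands in for a rate-limited workqueue.
type fakeQueue struct {
	requeued, forgotten []string
}

func (q *fakeQueue) AddRateLimited(key string) { q.requeued = append(q.requeued, key) }
func (q *fakeQueue) Forget(key string)         { q.forgotten = append(q.forgotten, key) }

// processItem is the inner worker. Each early return names its action,
// so a missing Forget cannot hide behind a bare "return".
func processItem(key string, ready map[string]bool) queueAction {
	isReady, known := ready[key]
	if !known {
		return forgetItem // unknown key: nothing to retry in this toy model
	}
	if !isReady {
		return requeueItem // not ready yet: try again later with backoff
	}
	return forgetItem // done: clear the rate limiter's history
}

// runWorker is the only place that touches the queue.
func runWorker(q *fakeQueue, key string, ready map[string]bool) {
	switch processItem(key, ready) {
	case forgetItem:
		q.Forget(key)
	case requeueItem:
		q.AddRateLimited(key)
	}
}

func main() {
	q := &fakeQueue{}
	ready := map[string]bool{"a": true, "b": false}
	for _, key := range []string{"a", "b", "c"} {
		runWorker(q, key, ready)
	}
	fmt.Println("forgotten:", q.forgotten) // [a c]
	fmt.Println("requeued:", q.requeued)   // [b]
}
```

Unlike the error-only contract, this shape can requeue without producing an error (and therefore without a log message), which is the case described above as sticky.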

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 17, 2021
@astraw99
Member Author

@deads2k @caesarxuchao I tried a refactor. PTAL, thanks.

@dims
Member

dims commented Jan 6, 2022

This is back on the reviewers' plate! The author has already incorporated the suggestions.

@deads2k
Contributor

deads2k commented Jan 7, 2022

This is back on the reviewers' plate! The author has already incorporated the suggestions.

Working through some factorization tweaks on slack with the author.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2022
@astraw99 astraw99 force-pushed the fix-workqueue-forget branch 2 times, most recently from 6a29c18 to 7e4fe39 Compare January 9, 2022 07:16
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2022
@deads2k
Contributor

deads2k commented Jan 10, 2022

Great find and good job on the fix.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astraw99, deads2k

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2022
@k8s-ci-robot k8s-ci-robot merged commit cc16e77 into kubernetes:master Jan 10, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Jan 10, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
6 participants