Event spam in some storage Operation Generator functions #74988

Open
davidz627 opened this issue Mar 5, 2019 · 21 comments · Fixed by #75986
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@davidz627
Contributor

It seems that in some Operation Generator functions (I saw this in GenerateAttachVolumeFunc), if an error is encountered in the "set up" part outside of the actual AttachVolumeFunc, there is no rate limiting, and GenerateAttachVolumeFunc is retried over and over, hundreds of times a second. This causes significant event spam and can also slow down the controller.

/sig storage
/kind bug
/cc @msau42 @saad-ali @jingxu97 @verult

How to reproduce it (as minimally and precisely as possible):
Cause a long-term error in these lines:
https://github.com/davidz627/kubernetes/blob/f7a6b0a8602e02d67fabaa85458a94c0f14599a5/pkg/volume/util/operationexecutor/operation_generator.go#L305-L334
Observe the function being retried without rate limiting.
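
To make the failure mode concrete, here is a minimal, self-contained model of it. It is only a sketch under stated assumptions, not the Kubernetes code: `backoff`, `executor`, `generateEager`, `generateDeferred`, and `simulate` are hypothetical names, and the delay constants are illustrative. It contrasts a generator that fails during setup (so the executor never sees the error) with one that defers setup into the returned operation function (so the executor's backoff applies).

```go
package main

import (
	"errors"
	"time"
)

// Illustrative constants; assumptions, not the values Kubernetes uses.
const (
	initialDelay = 500 * time.Millisecond
	maxDelay     = 2 * time.Minute
)

// backoff is a simplified stand-in for per-operation exponential backoff.
type backoff struct {
	lastErrorTime time.Time
	delay         time.Duration
}

// allowed reports whether the delay since the last failure has elapsed.
func (b *backoff) allowed(now time.Time) bool {
	return b.delay == 0 || !now.Before(b.lastErrorTime.Add(b.delay))
}

// recordFailure doubles the delay, capped at maxDelay.
func (b *backoff) recordFailure(now time.Time) {
	switch {
	case b.delay == 0:
		b.delay = initialDelay
	case b.delay*2 > maxDelay:
		b.delay = maxDelay
	default:
		b.delay *= 2
	}
	b.lastErrorTime = now
}

// executor runs operation functions and backs off the ones that fail.
type executor struct {
	pending map[string]*backoff
}

func newExecutor() *executor {
	return &executor{pending: map[string]*backoff{}}
}

// run executes op unless the named operation is still backing off.
func (e *executor) run(name string, now time.Time, op func() error) {
	b, ok := e.pending[name]
	if !ok {
		b = &backoff{}
		e.pending[name] = b
	}
	if !b.allowed(now) {
		return
	}
	if err := op(); err != nil {
		b.recordFailure(now)
		return
	}
	delete(e.pending, name)
}

var errSetup = errors.New("setup failed")

// generateEager runs setup in the generator. A setup error is returned
// before any operation function exists, so the executor cannot back off.
func generateEager(setup func() error) (func() error, error) {
	if err := setup(); err != nil {
		return nil, err
	}
	return func() error { return nil }, nil
}

// generateDeferred moves setup into the returned function, so a setup
// error is reported through the executor and is subject to backoff.
func generateDeferred(setup func() error) func() error {
	return func() error { return setup() }
}

// simulate returns how many times a permanently failing setup runs over
// the given number of reconciler ticks, spaced tickMillis apart.
func simulate(deferred bool, ticks, tickMillis int) int {
	attempts := 0
	setup := func() error { attempts++; return errSetup }
	e := newExecutor()
	now := time.Unix(0, 0)
	for i := 0; i < ticks; i++ {
		if deferred {
			e.run("attach", now, generateDeferred(setup))
		} else if op, err := generateEager(setup); err == nil {
			e.run("attach", now, op)
		}
		now = now.Add(time.Duration(tickMillis) * time.Millisecond)
	}
	return attempts
}
```

In this model, `simulate(false, 1000, 10)` runs the failing setup on every one of the 1000 ticks, while `simulate(true, 1000, 10)` runs it only 5 times over the same 10 seconds, because each failure doubles the wait before the next attempt.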

@davidz627 davidz627 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 5, 2019
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Mar 5, 2019
@davidz627
Contributor Author

Fixed for mount here: #71581

@mucahitkurt
Contributor

Hi @davidz627, I would like to work on this issue. I think I'll apply an approach similar to #71581.

@saad-ali
Member

saad-ali commented Apr 1, 2019

SGTM

@msau42
Member

msau42 commented Apr 25, 2019

/reopen
We should double-check that all volume operations are fixed.

@k8s-ci-robot k8s-ci-robot reopened this Apr 25, 2019
@k8s-ci-robot
Contributor

@msau42: Reopened this issue.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 23, 2019
@davidz627
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 26, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2019
@davidz627
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 26, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 24, 2020
@davidz627
Contributor Author

/assign
I'm working on a quick refactor right now

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 26, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

@msau42
Member

msau42 commented Jul 24, 2020

/reopen
/assign @msau42

@msau42
Member

msau42 commented Jul 24, 2020

/lifecycle frozen

@k8s-ci-robot
Contributor

@msau42: Reopened this issue.

@k8s-ci-robot k8s-ci-robot reopened this Jul 24, 2020
@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 24, 2020
@verult
Contributor

verult commented Jul 29, 2021

Per @msau42's suggestion, for mount success events we could disable events only for remountable volumes (secrets, configmaps, etc., and CSI volumes with this field set).

Also consider adding failure events for other volume operations, most importantly teardown operations such as unmount and detach. Currently, failure events exist only for attach, mount, map, expand, and expandInUse.
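
As a rough illustration of the two suggestions above, here is a small sketch. Everything in it is hypothetical: the `volume` type, its `remountable` flag, and both helper functions are invented for this example and do not correspond to existing Kubernetes APIs. The operation names in the map are taken from the comment above.

```go
package main

// volume is a hypothetical, minimal description of a volume.
type volume struct {
	name        string
	remountable bool // assumed true for secrets, configmaps, and similar volumes
}

// shouldEmitMountSuccessEvent suppresses mount success events only for
// remountable volumes; every other volume still gets its event.
func shouldEmitMountSuccessEvent(v volume) bool {
	return !v.remountable
}

// opsWithFailureEvents lists the operations that, per the comment above,
// currently emit failure events.
var opsWithFailureEvents = map[string]bool{
	"attach":      true,
	"mount":       true,
	"map":         true,
	"expand":      true,
	"expandInUse": true,
}

// missingFailureEvents returns, in input order, the operations in ops
// that do not yet have a failure event.
func missingFailureEvents(ops []string) []string {
	var missing []string
	for _, op := range ops {
		if !opsWithFailureEvents[op] {
			missing = append(missing, op)
		}
	}
	return missing
}
```

For example, `missingFailureEvents([]string{"mount", "unmount", "detach"})` returns the two teardown operations, `unmount` and `detach`, which are the ones called out as most important to cover.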

@msau42
Member

msau42 commented Jul 29, 2021

The original issue tracks a problem where we generate events outside of the operation executor, which means that any errors that occur there produce event spam without backoff.

@xing-yang
Contributor

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Mar 1, 2023