Event spam in some storage Operation Generator functions #74988

Open

davidz627 opened this issue Mar 5, 2019 · 21 comments · Fixed by #75986

Assignees

Labels

kind/bug lifecycle/frozen sig/storage triage/needs-information

Contributor

davidz627 commented Mar 5, 2019

It seems that in some Operation Generator functions (I saw this in GenerateAttachVolumeFunc), if an error is encountered in the "set up" part outside of the actual AttachVolumeFunc, there is no rate limiting, and GenerateAttachVolumeFunc is retried over and over, hundreds of times a second. This causes significant event spam and can also slow down the controller.

/sig storage
/kind bug
/cc @msau42 @saad-ali @jingxu97 @verult

How to reproduce it (as minimally and precisely as possible):
Cause some long term error in lines:
https://github.com/davidz627/kubernetes/blob/f7a6b0a8602e02d67fabaa85458a94c0f14599a5/pkg/volume/util/operationexecutor/operation_generator.go#L305-L334
Observe function being retried without rate limiting.

davidz627 added the kind/bug label

k8s-ci-robot added the sig/storage label

Contributor Author

davidz627 commented Mar 5, 2019

Fixed for mount here: #71581

Contributor

mucahitkurt commented Mar 31, 2019

Hi @davidz627, I would like to work on this issue. I think I'll apply an approach similar to #71581.

mucahitkurt mentioned this issue

Reduce event spam for function GenerateAttachVolumeFunc #75986

Merged

Member

saad-ali commented Apr 1, 2019

SGTM

k8s-ci-robot closed this as completed in #75986

Member

msau42 commented Apr 25, 2019

/reopen
We should double check all volume operations are fixed

k8s-ci-robot reopened this

Contributor

k8s-ci-robot commented Apr 25, 2019

@msau42: Reopened this issue.

fejta-bot commented Jul 24, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label

fejta-bot commented Aug 23, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels

Contributor Author

davidz627 commented Aug 26, 2019

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label

davidz627 mentioned this issue

CSI Migration phase 2: disable probing of in-tree plugins #83098

Merged

fejta-bot commented Nov 24, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label

Contributor Author

davidz627 commented Nov 26, 2019

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label

fejta-bot commented Feb 24, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label

Contributor Author

davidz627 commented Feb 25, 2020

/assign
I'm working on a quick refactor right now

k8s-ci-robot assigned davidz627

davidz627 mentioned this issue

Refactor all storage Operation Generator functions to remove top level errors reducing event spam. Functionality moved to inner retryable/backoffable funcs #88537

Closed

fejta-bot commented Mar 26, 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels

fejta-bot commented Apr 25, 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Contributor

k8s-ci-robot commented Apr 25, 2020

@fejta-bot: Closing this issue.

k8s-ci-robot closed this as completed

Member

msau42 commented Jul 24, 2020

/reopen
/assign @msau42

k8s-ci-robot assigned msau42

Member

msau42 commented Jul 24, 2020

/lifecycle frozen

Contributor

k8s-ci-robot commented Jul 24, 2020

@msau42: Reopened this issue.

k8s-ci-robot reopened this

k8s-ci-robot added lifecycle/frozen and removed lifecycle/rotten labels

Contributor

verult commented Jul 29, 2021

Per @msau42's suggestion, for mount success events we could disable events only for remountable volumes (secrets, configmaps, etc., and CSI volumes with this field set).

Also consider adding failure events for other volume operations, most importantly teardown operations such as unmount and detach. Currently, failure events exist only for attach, mount, map, expand, and expandInUse.

Member

msau42 commented Jul 29, 2021

The original issue tracks a problem where we generate events outside of the operation executor, which means that any errors that occur there are emitted as events repeatedly, without backoff.

Contributor

xing-yang commented Mar 1, 2023

/triage needs-information

k8s-ci-robot added the triage/needs-information label