Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid dummy-patching for out-of-scope handlers (via active/passive states) #731

Merged
merged 1 commit into from
Apr 3, 2021

Conversation

nolar
Copy link
Owner

@nolar nolar commented Apr 2, 2021

In #686, a complicated bug is well-described and well-debugged by @paxbit: Kopf floods the K8s API with dummy patches with no pause between them in certain conditions — specifically, when there is a persisted state of a handler that has started, didn't finish (with either success or failure), BUT came of out scope during the processing cycle (e.g. due to filters).

The bug was there since the "purposes" were introduced in #606, but was activated and made visible by fn+id handler deduplication in #674.

Reproducible with:

import kopf


@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Status storage must not intervene with its data:
    settings.persistence.progress_storage = kopf.AnnotationsProgressStorage()


@kopf.on.resume("configmaps", id="aaa")
@kopf.on.update("configmaps", id="bbb",
    when=lambda **_: False,  # <<< comment it, and the bug is gone.
)
def the_handler(**_):
    print("the_handler called")
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    kopf.zalando.org/last-handled-configuration: |
      {"simulated-change":123}
    kopf.zalando.org/bbb: '{"started":"2021-03-01T12:34:03.155586","purpose":"update","retries":0,"success":false,"failure":false}'
  name: bug
kubectl delete -f _issue686.yaml
kubectl create -f _issue686.yaml
kopf run _issue686.py --verbose -n default

The fix (this PR) replaces the state's "done"/"delays" computation logic:

  • AS-IS: all recorded handlers with the matching purpose.
  • TO-BE: all handlers currently in scope ("active").

If a handler falls out of scope during the processing cycle, it will not be taken into account in the next "done"/"delays" computation, even if it was not finished before — this was the behaviour before the "purposes" were introduced in #606.


TODOs:

  • Reverify all other side-effects.
  • Reverify the differences of 0 vs. None delays in handler states (see Endless touch-dummy Annotation Patching On Pod Causes High API Load #686's discussion).
  • Check if the purpose checking in "done"/"delays" is still needed and makes sense with explicitly requested handlers.
  • If the logic is correct, adjust the tests.
  • Consider getting rid of "purposes" at all — but they might be needed for "extras"/"counts", where the "currently in-scope handler" might be not enough, but "ever-were in-scope handlers" should be counted.

Replaces #728. Fixes #686.

Difference from #728: There, .done/.delays were changed to check_done(handlers)/get_delays(handlers), requiring the handlers list on every check. While this is not a problem for the logic itself, it feels wrong, as the state is supposed to be stateful by definition, and we feed it the same handlers a few lines before that. In this PR, the global state remembers which handlers are "active" or "passive"; only the "active" handlers are used in .done/.delays, while all of them, "active" and "passive", are used in .counts/.extras/.purge() with the previous logic of "[re]purposing".

Signed-off-by: Sergey Vasilyev <sergey.vasilyev@zalando.de>
@nolar nolar added the bug Something isn't working label Apr 2, 2021
@nolar nolar marked this pull request as ready for review April 2, 2021 22:10
@nolar nolar merged commit 9fa2d68 into main Apr 3, 2021
@nolar nolar deleted the fix-touch-hammer-2 branch April 3, 2021 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Endless touch-dummy Annotation Patching On Pod Causes High API Load
1 participant