Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore annotations made by other Kopf-based operators #539

Merged
merged 3 commits into from
Sep 9, 2020

Conversation

nolar
Copy link
Owner

@nolar nolar commented Sep 9, 2020

What do these changes do?

Inject Kopf's identity into annotations of all Kopf-based operators. Use that identity to detect and ignore annotations belonging to other Kopf-based operators regardless of their identities.

By this, reduce ping-pong effects with multiple operators handling the same resources, which can overload the K8s API and explode the resources' sizes.

Description

Problem:

Kopf implements additional constructs on top of the regular watch-and-react flow of K8s. To make this work in multiple reaction cycles, Kopf persist its own state into the resource itself. Since #331, the state is persisted into the resource's annotations in addition to (or instead of) the status stanza.

If multiple Kopf-based operators handle the same resources, they can cause ping-pong behaviour:

  • (1) operator A persists its state into an object;
  • (2) operator B believes it is a valid essential change and reacts;
  • (3) operator B persists its updated diff-base, which contains
    the state of the operator A as the essential payload;
  • (4) operator A believes the new change is a valid essential change;
  • (∞) and so it continues forever.

This can quickly overload K8s API with requests per second (ping-pong happens fast), and blow the size of the resources.

The problem becomes more complicated when the operators have different identities: then, operators in the cluster could be unaware that other operators are Kopf-based too, and their annotations are for their state persistence, i.e. it is not something useful.

It is impossible for operator developers to configure their operators so that they would be aware of any other Kopf-based operators, unless the developers control each and every of those operators.


Solution:

To solve the problem with other Kopf-based operators' persistence being detected as meaningful content (i.e. the "essence"), these annotations should be filtered out from the essence and become invisible to any other Kopf-based operator — same as this already happens with the whole .status stanza.

However, Kopf's annotations are stored in a space shared with other non-Kopf-based operators, controllers, applications, and human-made annotations — all of these should be considered as the essential content, and its changes should be detected. Currently, there is no way to distinguish the annotations by their type or origin.

To filter out the annotations of Kopf-based operators, they all should be marked or tainted, and then these marks/taints should be used to ignore them.

Such detection should be independent of the operator's identity, i.e. shared by all operators made on this framework that produces these annotations.


Why so?

Looking back into history, the annotations store the information that previously (before #331) was stored in the status stanza, which is fully ignored when extracting the body's essence and diff — unless some specific fields are explicitly marked as of interest by having a field-handler for that field.

The state-persisting annotations should behave the same: they should all be ignored, no matter from which operator they come, and only be noticed when explicitly monitored. This is not the case now.


Exclusion logic:

In this PR, the following logic of annotation detection is used:

  • Ignore all annotations that have a prefix, which contains "kopf" as a separate item (dot- or dash-separated): e.g. kopf.zalando.org/*, my-kopf-operator.com/*, etc. UPD: Narrowed to the following, to allow mentioning Kopf without losing the annotations from diffs, and to keep to specific namespaces only. After all, it can be just a coincidence that the annotations contain "kopf".
  1. Reserved prefixes: Ignore all annotations that have a prefix which is a reserved prefix, or a subdomain of that reserved prefix. Currently, only kopf.zalando.org is reserved, but can be extended. Examples: kopf.zalando.org/*, demo.kopf.zalando.org/*, etc. — but not kopf.my-operator.com/*, my-kopf-operator.com/* or com.example.kopf/*.

  2. Arbitrary prefixes: Ignore all annotations that have a prefix, for which a standalone annotation named */kopf-managed exists: e.g. example.com/kopf-managed. The value of the marking annotation is irrelevant for detection. This annotation is automatically created every time any other annotation with that prefix is stored by Kopf, but only if the annotation is not identifiable by the prefix itself (i.e. the annotation is not detectable by the 1st rule).

This logic only works with prefixed annotations.

For non-prefixed annotations, the implementation would be too complex. Instead, the non-prefixed storages (both diffbase & progress) are deprecated, and a warning is issued when the prefix is set to None. By default, the prefix is kopf.zalando.org, so it would work "out of the box" without any issues as before.

An example (based on a code snippet from #517):

metadata:
  annotations:
    another.com/kopf-managed: "yes"
    another.com/last-handled-configuration: |
      {"spec":{"duration":"1m","field":"value","items":["item1","item2"],"x":400},"metadata":{"labels":{"somelabel":"somevalue"},"annotations":{"someannotation":"somevalue"}}}
    example.com/kopf-managed: "yes"
    example.com/last-handled-configuration: |
      {"spec":{"duration":"1m","field":"value","items":["item1","item2"]},"metadata":{"labels":{"somelabel":"somevalue"},"annotations":{"someannotation":"somevalue"}}}
    example.com/update_fn: '{"started":"2020-09-09T20:49:34.304534","delayed":"2020-09-09T20:49:34.753660","retries":4,"success":false,"failure":false,"message":"None","subrefs":["update_fn/s1","update_fn/s2","update_fn/s2/z1","update_fn/s2/z2","update_fn/s3","update_fn/s4","update_fn/s4/z1","update_fn/s4/z2"]}'
    example.com/update_fn.s1: '{"started":"2020-09-09T20:49:34.306154","stopped":"2020-09-09T20:49:34.308542","retries":1,"success":true,"failure":false}'
    example.com/update_fn.s2: '{"started":"2020-09-09T20:49:34.306256","stopped":"2020-09-09T20:49:34.604117","retries":2,"success":true,"failure":false,"subrefs":["update_fn/s2/z1","update_fn/s2/z2"]}'
    example.com/update_fn.s2.z1: '{"started":"2020-09-09T20:49:34.461745","stopped":"2020-09-09T20:49:34.464269","retries":1,"success":true,"failure":false}'
    example.com/update_fn.s2.z2: '{"started":"2020-09-09T20:49:34.461850","stopped":"2020-09-09T20:49:34.602632","retries":1,"success":true,"failure":false}'
    example.com/update_fn.s3: '{"started":"2020-09-09T20:49:34.604924","stopped":"2020-09-09T20:49:34.606769","retries":1,"success":true,"failure":false}'
    example.com/update_fn.s4: '{"started":"2020-09-09T20:49:34.605008","delayed":"2020-09-09T20:49:34.752832","retries":1,"success":false,"failure":false,"message":"None","subrefs":["update_fn/s4/z1","update_fn/s4/z2"]}'
    example.com/update_fn.s4.z1: '{"started":"2020-09-09T20:49:34.749341","stopped":"2020-09-09T20:49:34.751653","retries":1,"success":true,"failure":false}'
    example.com/update_fn.s4.z2: '{"started":"2020-09-09T20:49:34.749447","retries":0,"success":false,"failure":false}'
    someannotation: somevalue

Reproduction:

Start 2 operators in parallel:

import kopf

@kopf.on.startup()
def cfg(settings: kopf.OperatorSettings, **_):
    settings.persistence.progress_storage = kopf.AnnotationsProgressStorage(prefix='example.com')
    settings.persistence.diffbase_storage = kopf.AnnotationsDiffBaseStorage(prefix='example.com')

@kopf.on.update("zalando.org", "v1", "kopfexamples")
async def update_fn(**_):
    pass
import kopf

@kopf.on.startup()
def cfg(settings: kopf.OperatorSettings, **_):
    settings.persistence.progress_storage = kopf.AnnotationsProgressStorage(prefix='another.com')
    settings.persistence.diffbase_storage = kopf.AnnotationsDiffBaseStorage(prefix='another.com')

@kopf.on.update("zalando.org", "v1", "kopfexamples")
async def update_fn(**_):
    pass

With the unfixed code: The "ping-pong" issue is observed immediately: the object size increases to approximately 188KB in a matter of a second, and the operator becomes unable to PATCH the resource since the API fails with "Unprocessable Entity" — and the throttling begins.

With the fixed code: Go to kopf.storage.conventions.StorageKeyMarkingConvention and modify the "known marker" (from kopf-managed to kopf-managed-2). Observe the issue as if the code is unfixed. Once the marker is restored, all those garbage annotations disappear on the first successful handling cycle.


Risks:

If the user modifies the resource and deletes the */kopf-managed annotation in case of a non-Kopf-branded prefix, this would immediately notify all Kopf-based operators about a change, and they could "notice" the addition of all annotations in that group. It is the same kind of risk, as deleting the */last-handled-configuration, which would trigger the on-creation handlers. It is expected that the users do not modify the annotations they do no understand.

The existing state-persisting annotations remain the same, so both the upgrades from any previous versions are possible, and the downgrades (rollbacks) to any other previous version are possible.


Discarded ideas:

There was a draft implementation which added kopf-- infixes to the annotation names (e.g. example.com/kopf--handler1.subhanderA-HaSh) and used that for detection both of the Kopf-based prefixes and even non-prefixed annotations. The solution was quite complex, but what is worse: it would require storing duplicate annotations even for the */last-handled-configuration (i.e. */kopf--last-handled-configuration). This would pollute the annotations for the time of upgrade/downgrade for no good reason, would break the upgrade path through versions, and would steal extra 6 characters from the 63 available for the annotation names. This schema didn't have any significant advantages over having just one Kopf-branding annotation per prefix.

Another idea was to inject the markers into the content of the annotations, such as //kopf in the end or #kopf in the beginning of the annotation's value. There are no promises on what exactly and how exactly is stored in the annotations, so there is no need for it to be a JSON. However, JSON is a common convention and we should avoid breaking it. This approach would also break backward compatibility.

Another idea was to explicitly configure the operators to be aware of other operators by listing their prefixes. This wouldn't work: the developers (1) do not want to control the operators' environment that closely, and (2) are not able to control the environment with some third-party Kopf-based operators involved. It all should work "out of the box".

Issues/PRs

Issues: #372

Related: #538

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactoring (non-breaking change which does not alter the behaviour)

Checklist

  • The code addresses only the mentioned problem, and this problem only
  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt

It is now configured with prefix + key of the `AnnotationsDiffBaseStorage`,
so that it can vary between operators with different identities.

It is also filtered out by generic rules of Kopf-based operators detection:
via the `kopf.` part in the prefix.
@nolar nolar merged commit d03f24c into master Sep 9, 2020
@nolar nolar deleted the other-kopfs-awareness branch September 9, 2020 21:11
@lgtm-com
Copy link

lgtm-com bot commented Sep 9, 2020

This pull request fixes 1 alert when merging 62de138 into 576ec87 - view on LGTM.com

fixed alerts:

  • 1 for Incomplete URL substring sanitization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant