Event Aggregation #117

Open
kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels: archive, enhancement (New feature or request)

Comments

kopf-archiver bot commented Aug 18, 2020

An issue by mzizzi at 2019-06-16 15:07:05+00:00
Original URL: zalando-incubator/kopf#117
 

First off, thanks for getting this framework together. I've enjoyed hacking around :-)

This might be more of a request for the Python kube client, as it appears to lack event aggregation functionality similar to that found in the Go client.

Occasionally an operator may get stuck in a retry loop. If many handlers are failing with retryable errors, a large number of events will be generated, putting stress on etcd and making the output of kubectl get events very hard to work with.

Expected Behavior

Duplicate or "near duplicate" events are aggregated.

$ oc get events
LASTSEEN   FIRSTSEEN   COUNT     NAME         KIND      SUBOBJECT   TYPE      REASON              SOURCE    MESSAGE
20m        20m         1234         my-custom-resource   MyCustomResource               Error     HandlerRetryError   kopf      Handler 'on_delete' failed. Will retry. ['errors']

Actual Behavior

Every event generated by kopf is a new event in Kubernetes, which, if I understand correctly, puts undue load on etcd.

$ oc get events
LASTSEEN   FIRSTSEEN   COUNT     NAME         KIND      SUBOBJECT   TYPE      REASON              SOURCE    MESSAGE
20m        20m         0         my-custom-resource   MyCustomResource               Error     HandlerRetryError   kopf      Handler 'on_delete' failed. Will retry. ['errors']
21m        21m         0         my-custom-resource   MyCustomResource               Error     HandlerRetryError   kopf      Handler 'on_delete' failed. Will retry. ['errors']
20m        20m         0         my-custom-resource   MyCustomResource               Error     HandlerRetryError   kopf      Handler 'on_delete' failed. Will retry. ['errors']
20m        20m         0         my-custom-resource   MyCustomResource               Error     HandlerRetryError   kopf      Handler 'on_delete' failed. Will retry. ['errors']
...

Steps to Reproduce the Problem

Write any handler that gets stuck in a retry loop and observe kubectl get events.

  1. Install any CRD into the cluster and handle it with a handler like the one below. (In this case you'd have to invoke the handler by creating a new "MyCustomResource".)
import kopf

@kopf.on.create('foo.bar', 'v1', 'my-custom-resources')
def on_create(**kwargs):
    # Fails with a retryable error on every attempt, producing a new event each retry.
    raise kopf.HandlerRetryError(['errors'], delay=1)

Specifications

  • Platform: minishift
  • Kubernetes version:
    oc version
    oc v3.7.23
    kubernetes v1.7.6+a08f5eeb62
    features: Basic-Auth GSSAPI Kerberos SPNEGO
    
    Server https://192.168.42.153:8443
    kubernetes v1.11.0+d4cacc0
    
  • Python version:
    Python 3.7.3
    
  • Python packages installed:
    pip freeze --all
    aiohttp==3.5.4
    aiojobs==0.2.2
    async-timeout==3.0.1
    attrs==19.1.0
    cachetools==3.1.1
    certifi==2019.3.9
    chardet==3.0.4
    Click==7.0
    datadog==0.29.3
    decorator==4.4.0
    entrypoints==0.3
    flake8==3.7.7
    google-auth==1.6.3
    idna==2.8
    iso8601==0.1.12
    kopf==0.14
    kubernetes==9.0.0
    mccabe==0.6.1
    multidict==4.5.2
    oauthlib==3.0.1
    pip==19.0.3
    pyasn1==0.4.5
    pyasn1-modules==0.2.5
    pycodestyle==2.5.0
    pyflakes==2.1.1
    python-dateutil==2.8.0
    PyYAML==5.1.1
    requests==2.22.0
    requests-oauthlib==1.2.0
    rsa==4.0
    setuptools==40.8.0
    six==1.12.0
    urllib3==1.25.3
    websocket-client==0.56.0
    yarl==1.3.0
    

Commented by nolar at 2019-06-19 14:54:47+00:00
 

mzizzi Do you mean the in-memory event accumulation, aggregation, and then posting only the aggregated events every few seconds/minutes/events?

Or is this also about the event patching with "lastTimestamp", "count", and some other field updates? That still implies one API request per event, just PATCH rather than POST, but it would make the kubectl get events output shorter.
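
For illustration, a rough sketch of what the PATCH-based variant could look like with the Python kubernetes client (not kopf's actual code; the dedup key on involvedObject.uid + reason is just an assumption):

from datetime import datetime, timezone

import kubernetes

def post_or_patch_event(v1: kubernetes.client.CoreV1Api, namespace, new_event):
    # Look for an existing event for the same object and reason.
    selector = (
        f"involvedObject.uid={new_event.involved_object.uid},"
        f"reason={new_event.reason}"
    )
    existing = v1.list_namespaced_event(namespace, field_selector=selector).items
    if existing:
        # Bump the counter and the timestamp instead of creating a new object.
        ev = existing[0]
        patch = {
            "count": (ev.count or 1) + 1,
            "lastTimestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": new_event.message,
        }
        v1.patch_namespaced_event(ev.metadata.name, namespace, patch)
    else:
        v1.create_namespaced_event(namespace, new_event)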


Commented by mzizzi at 2019-06-19 18:22:15+00:00
 

nolar Good question. I hadn't made the distinction when I originally posted the question.

After reading more into how the go client works: it uses a combination of rate-limiting, in-memory caching, and event patching. That addresses both of the potential issues that you highlighted:

  • Load introduced by many POST/PATCH requests for events
  • Load due to excessive amounts of events being stored in kube

Incorporating some (or all!) of these features will help us create well-behaved Operators.
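
Something along these lines (purely a sketch; the class below is hypothetical, not from kopf or client-go) could cover the in-memory caching and rate-limiting part:

import threading
import time

class EventAggregator:
    """Collapse duplicate events in memory and flush them at a bounded rate."""

    def __init__(self, flush_interval=10.0):
        self.flush_interval = flush_interval
        self._lock = threading.Lock()
        self._pending = {}  # (uid, reason) -> {"count", "first", "last", "message"}

    def record(self, obj_uid, reason, message):
        # Same object + same reason collapses into a single pending entry.
        now = time.time()
        with self._lock:
            entry = self._pending.setdefault(
                (obj_uid, reason), {"count": 0, "first": now, "message": message})
            entry["count"] += 1
            entry["last"] = now
            entry["message"] = message

    def flush(self, sink):
        # Run by a background task no more often than flush_interval; `sink`
        # performs the single POST/PATCH against the Kubernetes API.
        with self._lock:
            pending, self._pending = self._pending, {}
        for (obj_uid, reason), entry in pending.items():
            # entry carries count and first/last timestamps for the PATCH body.
            sink(obj_uid, reason, entry)

With something like that, a handler that fails 1234 times between flushes would result in a single API call with count=1234 instead of 1234 separate POSTs.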
