Is resume called before create/update? #112

kopf-archiver · 2020-08-18T19:44:34Z

An issue by Jc2k at 2019-06-14 11:18:53+00:00
Original URL: zalando-incubator/kopf#112

Hi!

I have a few questions about resume handlers:

Are there any guarantees about the call order of .on.resume vs. update and create?
Will resume always get called, or could update sometimes be called instead?
Is it possible to know that kopf has finished processing the current set of resumes? As in, has it finished startup and is it now watching and (crucially) waiting for events?
Is there any way I could see a resume event after I've processed a delete event?

I'd like to populate a cache with all instances of my CRD at startup, then keep it up to date by following the create/update/delete events. I'd like to know that my cache is in a good state before I start processing events, so that I can reference it. Is this doable?

Commented by nolar at 2019-06-14 14:20:12+00:00

Neither before nor after, but as part of create/update. Usually, the handlers are called in the order they are declared. On an operator restart, the resume handlers are simply mixed into the list of handlers to call.

The order of handlers can additionally be controlled with the handler lifecycles. They seem to be undocumented (accidentally), but they ARE part of the public interface: https://kopf.readthedocs.io/en/latest/packages/kopf.reactor.lifecycles/. Usage example:

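import kopf

# Run the handlers one at a time, in the order they are declared.
kopf.set_default_lifecycle(kopf.lifecycles.one_by_one)

# ...then declare the handlers as usual, e.g.:
@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn(spec, **kwargs):
    print(spec)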

Or you can make your own callback and control the order as you wish (and store the state in the status).

I will document the handler ordering control a bit later.
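Regarding the custom callback mentioned above, a lifecycle could be sketched roughly as follows. This assumes, based on how kopf.lifecycles.one_by_one behaves, that a lifecycle is a callable that receives the still-pending handlers in declaration order (plus extra keyword arguments) and returns the subset to run in the current processing cycle. The name last_declared_first is hypothetical; check the lifecycle signature of your Kopf version before relying on it.

```python
from typing import Any, List, Sequence, TypeVar

H = TypeVar("H")


def last_declared_first(handlers: Sequence[H], **kwargs: Any) -> List[H]:
    """Hypothetical lifecycle: run one pending handler per cycle, newest first.

    `handlers` is assumed to hold the not-yet-finished handlers in declaration
    order; any extra keyword arguments passed by Kopf are accepted and ignored.
    """
    # Taking the last pending handler on every cycle executes them in reverse order.
    return list(handlers)[-1:]
```

It would be registered the same way as the built-in one, i.e. kopf.set_default_lifecycle(last_declared_first). Returning at most one handler per call keeps the one-at-a-time behaviour of one_by_one while reversing the order.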

Commented by nolar at 2019-06-14 14:20:25+00:00

Both the resume+update or the resume+create handlers will be called. However, if they point to the same function, they will be de-duplicated, and that function will be called only once. E.g.:

import kopf

@kopf.on.resume('zalando.org', 'v1', 'kopfexamples')
@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
@kopf.on.update('zalando.org', 'v1', 'kopfexamples')
def fn(spec, **kwargs):
    print(spec)

The fn function will be called only once per event, be that creation or update or operator restart (i.e. object resuming).

import kopf

@kopf.on.resume('zalando.org', 'v1', 'kopfexamples')
def fn1(spec, **kwargs):
    print(spec)

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def fn2(spec, **kwargs):
    print(spec)

@kopf.on.update('zalando.org', 'v1', 'kopfexamples')
def fn3(spec, **kwargs):
    print(spec)

In this case, either fn1+fn2 or fn1+fn3 will be called on operator restart if the object pre-existed; or just fn2 or fn3 if the event happened while the operator was running (fn1 will not be called, as nothing was "resumed").
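The selection and de-duplication rules from the two examples above can be summarised with a toy model. This is an illustration only, not Kopf's implementation: handlers_for_event and the (reason, function) registration tuples are made up for the sketch.

```python
from typing import Callable, List, Sequence, Tuple

# A registration pairs a reason ("create", "update", "resume") with a handler,
# kept in declaration order.
Registration = Tuple[str, Callable[..., object]]


def handlers_for_event(
    registrations: Sequence[Registration], cause: str, resuming: bool
) -> List[Callable[..., object]]:
    """Toy model: which functions run for one event, in declaration order."""
    # On an operator restart, the resume handlers are mixed into the regular ones.
    wanted = {cause, "resume"} if resuming else {cause}
    selected: List[Callable[..., object]] = []
    for reason, fn in registrations:
        # The same function registered for several reasons is called only once.
        if reason in wanted and fn not in selected:
            selected.append(fn)
    return selected


def fn(**kwargs): ...
def fn1(**kwargs): ...
def fn2(**kwargs): ...
def fn3(**kwargs): ...

# Stacked decorators apply bottom-up, so `fn` is registered for update, create, resume.
single = [("update", fn), ("create", fn), ("resume", fn)]
separate = [("resume", fn1), ("create", fn2), ("update", fn3)]
```

For example, handlers_for_event(separate, "update", resuming=True) yields fn1 and fn3, while the same update without a restart yields only fn3, matching the description above.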

Commented by nolar at 2019-06-14 14:20:35+00:00

No, it is not possible at the moment to know when Kopf has finished the initial listing and started watching.

And I'm not sure this is conceptually possible at all: the handlers are per-object, not per-CRD. The operator could finish the initial listing and start watching while the resume handlers are still being executed (e.g. retried, or queued and waiting for their turn if there are many objects).

As I write this, I've realised there is a bug in the implementation: if a resume handler fails, it will never be retried the way other handlers are, because any retry arrives as a regular status-update event, not as part of the initial listing. The same applies to resume handlers that are not the first ones in the list. Created #113 for this (it seems easily fixable).

Commented by nolar at 2019-06-14 14:20:43+00:00

Resuming will not happen after deletion or as part of deletion. If the object is gone (or is going to be gone soon), there is nothing to resume/monitor/handle. This can be seen in the kopf.reactor.causation module: the deletion events and states are caught first, and the routine does not continue further if the object is in the deletion state or actually deleted.

Why should it do so? What is the use case?

Commented by nolar at 2019-06-14 14:21:12+00:00

In-memory caching is probably doable (in theory), though I haven't tried it yet. Usually, I keep the state on the objects themselves, not in memory (e.g. children's labels referring to the parents, or the parent's status fields referring to the children).

If there are any problems, please let me know.
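The cache from the original question could be sketched with module-level state keyed by (namespace, name). This is a minimal sketch under assumptions: all names are hypothetical, it relies on the on-resume handlers seeing every pre-existing object at startup, and, as explained above, it cannot tell when the initial resume pass is complete. The functions are plain Python so they run without a cluster; they are meant to be registered with the matching kopf.on.* decorators, which pass meta and spec as keyword arguments.

```python
import threading
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical in-memory cache: (namespace, name) -> last seen spec.
Key = Tuple[Optional[str], str]
_cache: Dict[Key, Dict[str, Any]] = {}
_lock = threading.Lock()  # sync handlers may run in worker threads


def _key(meta: Mapping[str, Any]) -> Key:
    return meta.get("namespace"), meta["name"]


# Intended for stacked @kopf.on.resume/create/update decorators (called once per event).
def remember(meta: Mapping[str, Any], spec: Mapping[str, Any], **kwargs: Any) -> None:
    """Insert or refresh the cached copy of one object."""
    with _lock:
        _cache[_key(meta)] = dict(spec)


# Intended for @kopf.on.delete.
def forget(meta: Mapping[str, Any], **kwargs: Any) -> None:
    """Drop the object from the cache once it is being deleted."""
    with _lock:
        _cache.pop(_key(meta), None)


def lookup(namespace: Optional[str], name: str) -> Optional[Dict[str, Any]]:
    """Read access for other handlers or background code."""
    with _lock:
        return _cache.get((namespace, name))
```

Because the cache lives only in memory, it is rebuilt from scratch on every restart, which is exactly what the on-resume registration is for.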

Commented by Jc2k at 2019-06-14 14:31:44+00:00

Thanks for the detailed answers!

I don't have a use case for delete+resume; I just wanted to make sure it wasn't a case I had to handle. Glad I don't!

Commented by nolar at 2019-11-13 18:17:56+00:00

kopf==0.23rc1 was pre-released (see the release notes). It fixes a lot of things with the on-resume handlers in #230:

There can be more than one on-resume handler (previously, only the first one was executed).
They can go after the on-create/on-update handlers (previously, they worked only if they were the first in a row).
Arbitrary or temporary errors in the on-resume handlers are now retried, as for all other handlers (previously, they were ignored).
Sub-handlers should now be possible too (I didn't check, but the blocking issue was the same as for all of the above).
And they are no longer repeated once the object has been resumed.

The order of execution is the same as before — mixed with regular handlers in the order of appearance.

As it turned out, contrary to what I said above, the on-resume handlers CAN be called when the object is marked for deletion. In fact, they were supposed to be called before as well, but never got far enough to be executed due to the issues mentioned above.

I am now unsure about the desired behaviour.

On the one hand, this behaviour is the more expected one: the on-resume handlers should run when the operator restarts and the object exists (whether or not it is marked for deletion) and its deletion handlers are yet to be executed. That execution can take some time due to retries.

Skipping the on-resume handlers when the object is marked for deletion can have undesired side effects: the object DOES exist, but the operator does not know about it after the restart (unlike any objects that are not marked for deletion). And the deletion handlers can in fact run for a long time (due to retries), while the object still exists.

On the other hand, the deletion handlers are the natural place to clean up the system resources allocated for the object, e.g. threads and tasks.

If the on-delete and on-resume handlers are mixed in this case, the resources are either allocated and released quickly for no reason (best case), or allocated in on.resume() AFTER they were released in on.delete() (worst case), thus leading to memory leaks.

One solution would be to check whether the object is being deleted before allocating the resources in the on-resume handlers. But this adds boilerplate code when, in most cases, the desired behaviour is simply to skip the handler (which goes against Kopf's mission of being simple and intuitive).

Commented by nolar at 2019-11-14 11:34:02+00:00

Decided to go both ways (#233): skip the on-resume handlers on deletion by default, but make it possible to mark them as deletion-safe (at the developer's own responsibility).

Pre-released as kopf==0.23rc2
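As far as I can tell from the Kopf documentation, this opt-in is exposed as a deleted=True argument, e.g. @kopf.on.resume('zalando.org', 'v1', 'kopfexamples', deleted=True); please verify it against the docs for your Kopf version. The rule itself can be modelled in a few lines. This is a toy model of the behaviour described in the comment, not Kopf's code, and ResumeHandler is a made-up stand-in.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Sequence


@dataclass(frozen=True)
class ResumeHandler:
    """Made-up stand-in for a registered on-resume handler."""
    id: str
    deleted: bool = False  # mirrors the deletion-safe opt-in


def resume_handlers_to_run(
    handlers: Sequence[ResumeHandler], body: Mapping[str, Any]
) -> List[ResumeHandler]:
    """Toy model: pick the on-resume handlers for one object on operator restart."""
    being_deleted = bool(body.get("metadata", {}).get("deletionTimestamp"))
    if not being_deleted:
        return list(handlers)
    # Objects marked for deletion only get the explicitly deletion-safe handlers.
    return [h for h in handlers if h.deleted]
```

A handler marked as deletion-safe takes over the responsibility for checks like the one sketched for the previous comment.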

Commented by Jc2k at 2019-11-14 11:42:49+00:00

Nice! Thanks for the update.

One of the reasons I asked about this behaviour is that I wanted to build an in-memory cache of the current state in my operator. I noticed you are now doing this internally in kopf. Is that accessible from operator code?

Commented by nolar at 2019-11-14 12:49:38+00:00

@Jc2k It was made internal-only to begin with (as part of this massive refactoring release).

It is now relatively easy to add an extra field for arbitrary user data to the ResourceMemory class, with the same semantics as threading.local, and pass it to the handlers as the memo kwarg.

However, I am somewhat afraid that operator developers will abuse it to store data that should be persistent and kept in the resource's status instead. This is why I didn't expose it initially.

On second thought, that will be their problem. There are plenty of other ways to mis-design something anyway.

Implemented in #234. Released as kopf==0.23rc3.
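With memo available, a handler can keep per-object, in-memory-only state between calls. A minimal sketch, assuming memo behaves like a dict that also allows attribute access (as kopf.Memo does in recent Kopf versions); the small Memo class below is a stand-in so the snippet runs without Kopf, and the handler names are hypothetical. Anything that must survive an operator restart should still go to the resource's status.

```python
import threading
from typing import Any


class Memo(dict):
    """Stand-in for kopf.Memo: a dict that also allows attribute access."""

    def __getattr__(self, key: str) -> Any:
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


# Intended for stacked @kopf.on.resume and @kopf.on.create decorators.
def start_watcher(memo: Memo, **kwargs: Any) -> None:
    """Allocate a per-object stop flag once; it lives only in operator memory."""
    if "stop_flag" not in memo:
        memo.stop_flag = threading.Event()


# Intended for @kopf.on.delete.
def stop_watcher(memo: Memo, **kwargs: Any) -> None:
    """Release what start_watcher allocated, if anything."""
    flag = memo.pop("stop_flag", None)
    if flag is not None:
        flag.set()
```

The "allocate once" check keeps a repeated resume or create from replacing a live handle.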


nolar · 2021-02-06T00:37:23Z

I assume everything in this question has been answered.

kopf-archiver bot added the archive label Aug 18, 2020

kopf-archiver bot closed this as completed Aug 18, 2020

kopf-archiver bot changed the title from "[archival placeholder]" to "Is resume called before create/update?" Aug 19, 2020

kopf-archiver bot added the question label (Further information is requested) Aug 19, 2020

kopf-archiver bot reopened this Aug 19, 2020

This was referenced Aug 19, 2020:

[PR] Skip resumes for deleted objects, unless explicitly marked for selection #233 (Closed)

[PR] Use memo for arbitrary per-resource payload during operator lifetime #234 (Closed)

nolar closed this as completed Feb 6, 2021

sajuptpm mentioned this issue Mar 16, 2021:

on.resume update/create only failed objects #716 (Open)
