[omdb] add basic support for activating background tasks #5615

@sunshowers sunshowers commented Apr 25, 2024

This does its basic job of activating a background task, e.g.
inventory_collection.

It is a little unsatisfying because it's a bit hard with the current structure
to do things like:

  • if a task is currently running, wait until that is done
  • wait until the task we triggered is finished
  • return some kind of identifier (e.g. collection ID for inventory_collection)

Providing this kind of progress reporting is one of the kinds of problems we
solved with the update engine, and I'm wondering if it makes sense to try and
integrate that at some point.
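
To make the three wishes above concrete, here is a minimal sketch using only the standard library. It is not how Nexus or omdb is structured: `TaskHandle`, `begin`, `finish`, `next_activation`, and `wait_for` are hypothetical names, and it assumes that laps of a given task run strictly one at a time.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical bookkeeping for one background task (not Nexus's real types).
#[derive(Default)]
struct State {
    /// Number of laps that have started so far.
    started: u64,
    /// Number of laps that have finished so far.
    finished: u64,
    /// Identifier reported by the most recently finished lap, if any
    /// (for example, a collection ID).
    last_result: Option<String>,
}

/// Cloneable handle shared between the task driver and any waiters.
#[derive(Clone, Default)]
pub struct TaskHandle {
    inner: Arc<(Mutex<State>, Condvar)>,
}

impl TaskHandle {
    /// Called by the driver when a lap begins; returns its sequence number.
    pub fn begin(&self) -> u64 {
        let mut s = self.inner.0.lock().unwrap();
        s.started += 1;
        s.started
    }

    /// Called by the driver when the current lap ends; wakes all waiters.
    pub fn finish(&self, result: Option<String>) {
        let mut s = self.inner.0.lock().unwrap();
        s.finished += 1;
        s.last_result = result;
        self.inner.1.notify_all();
    }

    /// Sequence number of the first lap that will start after this call.
    /// If a lap is running right now, it is skipped on purpose.
    pub fn next_activation(&self) -> u64 {
        let s = self.inner.0.lock().unwrap();
        s.started + 1
    }

    /// Block until lap `n` has finished, then return the identifier from the
    /// most recently finished lap (which may be later than `n`).
    pub fn wait_for(&self, n: u64) -> Option<String> {
        let mut s = self.inner.0.lock().unwrap();
        while s.finished < n {
            s = self.inner.1.wait(s).unwrap();
        }
        s.last_result.clone()
    }
}
```

A caller would take a ticket with `next_activation()` before triggering the task, then call `wait_for(ticket)`. Because the ticket is `started + 1`, waiting on it covers both "wait for the running lap" and "wait for the lap we triggered".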

Fixes #5058.

Created using spr 1.3.6-beta.1
@davepacheco
Collaborator

Providing this kind of progress reporting is one of the kinds of problems we
solved with the update engine, and I'm wondering if it makes sense to try and
integrate that at some point.

Yeah, I've also kind of wanted what you described (a way to wait for the completion of the next activation triggered after some point). I keep talking myself out of it. I think we want to look at the specific use cases really carefully. Background tasks exist in part to carry out operations that can take a really long time or are fairly likely to experience transient errors (e.g., because they operate on all sleds). They may take many laps to finish applying a change. In the meantime, more changes may accumulate. This makes it hard to have a notion of linear progress, though it's still important to provide clear status (e.g., "of the 3 DNS servers, 2 are at version 8, which is the latest, and 1 is at version 7, which is an hour old").
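
As a rough illustration of that kind of status line, here is a small sketch. The function name `summarize` and its inputs are assumptions made for the example, not an existing omdb or Nexus API, and it leaves out the "how old is it" part.

```rust
use std::collections::BTreeMap;

/// Build a one-line status from the version each server reports.
/// `latest` is the newest version known to the caller (hypothetical input).
pub fn summarize(latest: u64, versions: &[u64]) -> String {
    if versions.is_empty() {
        return "no servers reported".to_string();
    }

    // Count how many servers are at each version.
    let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
    for &v in versions {
        *counts.entry(v).or_insert(0) += 1;
    }

    // Walk versions from newest to oldest, tagging the latest one.
    let parts: Vec<String> = counts
        .iter()
        .rev()
        .map(|(v, n)| {
            let tag = if *v == latest { " (latest)" } else { "" };
            format!("{n} at version {v}{tag}")
        })
        .collect();

    format!("of the {} servers: {}", versions.len(), parts.join(", "))
}
```

The point is that the report describes where every target currently is, rather than a single progress percentage.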

I suspect that what most programmatic consumers want is not to wait for a specific activation to complete but rather for some specific set of changes to be applied, which could take many laps. (Complicating it further, they might also want to give up waiting if there was a successful lap and the change was not applied.)
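
One way to picture that waiting rule, as a sketch only: `Lap`, `WaitOutcome`, and `wait_for_generation` are hypothetical names, and the idea that a lap reports an "applied generation" is an assumption made for the example.

```rust
/// What one completed lap reported (hypothetical shape).
#[derive(Clone, Copy, Debug)]
pub struct Lap {
    /// Whether the lap finished without errors.
    pub succeeded: bool,
    /// Highest generation of changes that is fully applied after this lap.
    pub applied_generation: u64,
}

/// Result of waiting for a particular set of changes.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// The target generation was applied after `laps` laps.
    Applied { laps: usize },
    /// A lap succeeded without applying the target, so waiting longer is
    /// unlikely to help.
    GaveUp { laps: usize },
    /// The stream of laps ended before either condition was met.
    Exhausted { laps: usize },
}

/// Consume laps until the target generation is applied, or give up after the
/// first successful lap that did not apply it. Failed laps are treated as
/// transient errors and do not end the wait.
pub fn wait_for_generation(
    target: u64,
    laps: impl IntoIterator<Item = Lap>,
) -> WaitOutcome {
    let mut seen = 0;
    for lap in laps {
        seen += 1;
        if lap.applied_generation >= target {
            return WaitOutcome::Applied { laps: seen };
        }
        if lap.succeeded {
            return WaitOutcome::GaveUp { laps: seen };
        }
    }
    WaitOutcome::Exhausted { laps: seen }
}
```

Note that the caller waits on a condition ("generation 8 is applied"), not on a specific activation number.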

@sunshowers
Contributor Author

sunshowers commented Apr 25, 2024

I suspect that what most programmatic consumers want is not to wait for a specific activation to complete but rather for some specific set of changes to be applied, which could take many laps. (Complicating it further, they might also want to give up waiting if there was a successful lap and the change was not applied.)

I think I generally agree with this model for programmatic consumers of background tasks (especially as compared to wicketd, which is much more of a foreground/oneshot approach).

I think humans would benefit greatly from being able to wait until the next activation after the trigger point completes. As a human looking at a live system, it can be hard to express exactly the constraint one is waiting for. (You could write Rust code for specific cases, like waiting for a new sled to show up in inventory for the first time, but in general I think you'd need some kind of dynamic query language that covers everything operators could reasonably ask for.)

In general, I think transient errors like not being able to talk to a sled will be quite rare -- and if an operator or support tech is actively monitoring the situation, they can observe those errors and kick off another run of the background task.

The update engine can also be used for post-mortem analysis as long as the generated events are written to a log file -- we already have code that can read and replay logs nicely, which we've integrated into wicketd.

@davepacheco
Collaborator

I think humans would benefit greatly from being able to wait until the next activation after the trigger point completes. As a human looking at a live system, it can be hard to express exactly the constraint one is waiting for. (You could write Rust code for specific cases, like waiting for a new sled to show up in inventory for the first time, but in general I think you'd need some kind of dynamic query language that covers everything operators could reasonably ask for.)

Is this something where we could leave it to the human to poll by hand on whatever condition they care about (e.g., watch --chgexit omdb db inventory list)?

I also think it would be fine to have richer debugging for tasks like you're suggesting. We could have APIs to list recent activations and their statuses and/or to wait for an activation to complete. Or to wait for activation N to complete. I just think there's a ton of stuff we could do here (so there's some risk of scope creep) and some of it could be easily misused to build brittle stuff. So I'm wondering how big a pain point it currently is. If it is painful today in dev, then yeah, maybe we should do it and see if we can make sure people don't accidentally use it to build programmatic consumers that ought to be checking other conditions instead.
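
For the "list recent activations and their statuses" idea, a bounded history is probably enough for debugging. The sketch below is purely illustrative: `History`, `Activation`, and `ActivationStatus` are hypothetical names, not existing Nexus types.

```rust
use std::collections::VecDeque;

/// Status of one activation (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum ActivationStatus {
    Running,
    Succeeded,
    Failed(String),
}

/// One entry in the history.
#[derive(Clone, Debug, PartialEq)]
pub struct Activation {
    pub iteration: u64,
    pub status: ActivationStatus,
}

/// Keeps only the most recent `capacity` activations.
pub struct History {
    capacity: usize,
    next_iteration: u64,
    entries: VecDeque<Activation>,
}

impl History {
    pub fn new(capacity: usize) -> Self {
        History { capacity, next_iteration: 1, entries: VecDeque::new() }
    }

    /// Record that a new activation started; returns its iteration number.
    pub fn start(&mut self) -> u64 {
        let iteration = self.next_iteration;
        self.next_iteration += 1;
        self.entries.push_back(Activation {
            iteration,
            status: ActivationStatus::Running,
        });
        // Evict the oldest entries once we exceed the bound.
        while self.entries.len() > self.capacity {
            self.entries.pop_front();
        }
        iteration
    }

    /// Update the status of an activation. Returns false if that activation
    /// has already been evicted (or never existed).
    pub fn complete(&mut self, iteration: u64, status: ActivationStatus) -> bool {
        match self.entries.iter_mut().find(|a| a.iteration == iteration) {
            Some(a) => {
                a.status = status;
                true
            }
            None => false,
        }
    }

    /// Recent activations, oldest first.
    pub fn recent(&self) -> Vec<Activation> {
        self.entries.iter().cloned().collect()
    }

    /// Status of activation N, if it is still in the history.
    pub fn status_of(&self, iteration: u64) -> Option<&ActivationStatus> {
        self.entries
            .iter()
            .find(|a| a.iteration == iteration)
            .map(|a| &a.status)
    }
}
```

Keeping the history small and explicitly lossy may also discourage programmatic consumers from depending on it.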

Created using spr 1.3.6-beta.1
@sunshowers
Contributor Author

Yeah, let's see how it plays out over the next while as we add more functionality to our background task system, and decide from there.

@sunshowers
Contributor Author

Need to wait until #5620 lands.

sunshowers added a commit that referenced this pull request Apr 26, 2024
I realized that we have this wonderful `SledFilter` enum lying around,
and we can just use it in omdb.

I decided to shorten "eligible-for-discretionary-services" to "discretionary",
which I hope communicates the same meaning but in a shorter manner.

Depends on #5615.
Created using spr 1.3.6-beta.1
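
To illustrate the naming choice in that commit message, here is a sketch of parsing a filter name from the command line. `FilterArg` is a hypothetical stand-in with only two variants, not the real `SledFilter` enum, and accepting the long name as an alias is an assumption of this example rather than something the commit does.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a sled filter argument; the real `SledFilter`
/// enum may have different variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterArg {
    All,
    Discretionary,
}

impl FromStr for FilterArg {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "all" => Ok(FilterArg::All),
            // Short name, plus the original long name as an alias
            // (the alias is this sketch's own assumption).
            "discretionary" | "eligible-for-discretionary-services" => {
                Ok(FilterArg::Discretionary)
            }
            other => Err(format!("unknown sled filter: {other}")),
        }
    }
}
```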
@sunshowers sunshowers enabled auto-merge (squash) April 26, 2024 17:52
@sunshowers sunshowers merged commit bd40fc8 into main Apr 26, 2024
21 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/omdb-add-basic-support-for-activating-background-tasks branch April 26, 2024 19:08
Successfully merging this pull request may close these issues.

Allow direct triggering of inventory collection?
4 participants