[omdb] add basic support for activating background tasks #5615

sunshowers · 2024-04-25T08:46:28Z

This does its basic job of activating a background task, e.g.
inventory_collection.

It is a little unsatisfying because it's a bit hard with the current structure
to do things like:

- if a task is currently running, wait until that is done
- wait until the task we triggered is finished
- return some kind of identifier (e.g. collection ID for inventory_collection)

Providing this kind of progress reporting is one of the kinds of problems we
solved with the update engine, and I'm wondering if it makes sense to try and
integrate that at some point.

Fixes #5058.

Created using spr 1.3.6-beta.1

dev-tools/omdb/src/bin/omdb/nexus.rs

nexus/src/internal_api/http_entrypoints.rs

davepacheco · 2024-04-25T18:16:29Z

> Providing this kind of progress reporting is one of the kinds of problems we
> solved with the update engine, and I'm wondering if it makes sense to try and
> integrate that at some point.

Yeah, I've also kind of wanted what you described (a way to wait for the next activation to complete that was triggered after some point). I keep talking myself out of it. I think we want to look at the specific use cases really carefully. Background tasks exist in part to carry out operations that can take a really long time or are fairly likely to experience transient errors (e.g., because they operate on all sleds). They may take many laps to finish applying a change. In the meantime, more changes may accumulate. This makes it hard to have a notion of linear progress, though it's still important to provide clear status (e.g., "of the 3 DNS servers, 2 are at version 8, which is the latest, and 1 is at version 7, which is an hour old").
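To make the "clear status rather than linear progress" idea concrete, here is a minimal sketch of how such a summary could be computed. The function name `summarize_versions` and its inputs are hypothetical and are not part of Nexus or omdb; it only assumes each target reports the version it last applied.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: given the latest known version and the version each
/// target has applied, describe how many targets sit at each version.
pub fn summarize_versions(latest: u64, applied: &[u64]) -> String {
    // Count targets per applied version.
    let mut by_version: BTreeMap<u64, usize> = BTreeMap::new();
    for v in applied {
        *by_version.entry(*v).or_insert(0) += 1;
    }

    // Report the newest version first.
    let parts: Vec<String> = by_version
        .iter()
        .rev()
        .map(|(v, n)| {
            let tag = if *v == latest { " (latest)" } else { "" };
            format!("{n} at version {v}{tag}")
        })
        .collect();

    format!("of {} targets: {}", applied.len(), parts.join(", "))
}
```

For example, `summarize_versions(8, &[8, 7, 8])` produces `of 3 targets: 2 at version 8 (latest), 1 at version 7`. A real status report would presumably also carry the age of each version, which this sketch leaves out.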

I suspect that what most programmatic consumers want is not to wait for a specific activation to complete but rather for some specific set of changes to be applied, which could take many laps. (Complicating it further, they might also want to give up waiting if there was a successful lap and the change was not applied.)
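A rough sketch of that waiting policy, under stated assumptions: the types `LapReport` and `WaitOutcome` and the function `wait_for_change` are invented for illustration and do not exist in Nexus. The caller is assumed to be able to observe, after each lap, whether the lap succeeded and whether the change it cares about is visible.

```rust
/// Hypothetical snapshot of a background task after one lap (activation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LapReport {
    /// Did this lap complete without errors?
    pub lap_succeeded: bool,
    /// Is the change the caller cares about visible yet?
    pub change_applied: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// The change was observed after this many laps.
    Applied { laps: usize },
    /// A lap succeeded but the change was still missing, so more waiting
    /// is unlikely to help.
    GaveUp { laps: usize },
    /// The stream of laps ended (e.g., the caller's budget ran out).
    Exhausted { laps: usize },
}

/// Consume lap reports until the change is applied, a clean lap shows it
/// was not applied, or there are no more laps to observe.
pub fn wait_for_change<I>(laps: I) -> WaitOutcome
where
    I: IntoIterator<Item = LapReport>,
{
    let mut seen = 0;
    for report in laps {
        seen += 1;
        if report.change_applied {
            return WaitOutcome::Applied { laps: seen };
        }
        if report.lap_succeeded {
            return WaitOutcome::GaveUp { laps: seen };
        }
        // The lap failed (possibly a transient error); keep waiting.
    }
    WaitOutcome::Exhausted { laps: seen }
}
```

The point of the sketch is that the loop is keyed on the condition, not on any particular activation: failed laps are simply skipped, and only a successful lap without the change ends the wait early.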

sunshowers · 2024-04-25T19:38:42Z

> I suspect that what most programmatic consumers want is not to wait for a specific activation to complete but rather for some specific set of changes to be applied, which could take many laps. (Complicating it further, they might also want to give up waiting if there was a successful lap and the change was not applied.)

I think I generally agree with this model for programmatic consumers of background tasks (especially as compared to wicketd, which is much more of a foreground/oneshot approach).

I think humans would benefit greatly from being able to wait until the next activation after the trigger point completes. As a human looking at a live system, it can be hard to express exactly the constraint one is waiting for. (You could write Rust code for specific cases, like waiting for a new sled to show up in inventory for the first time, but in general I think you'd need some kind of dynamic query language that covers everything operators could reasonably ask for.)

In general, I think transient errors like not being able to talk to a sled will be quite rare -- and if an operator or support tech is actively monitoring the situation, they can observe those errors and kick off another run of the background task.

The update engine can also be used for post-mortem analysis as long as the generated events are written to a log file -- we already have code that can read and replay logs in a nice fashion, which we've integrated into wicketd.

davepacheco · 2024-04-25T19:55:56Z

> I think humans would benefit greatly from being able to wait until the next activation after the trigger point completes. As a human looking at a live system, it can be hard to express exactly the constraint one is waiting for. (You could write Rust code for specific cases, like waiting for a new sled to show up in inventory for the first time, but in general I think you'd need some kind of dynamic query language that covers everything operators could reasonably ask for.)

Is this something where we could leave it to the human to poll by hand on whatever condition they care about (e.g., `watch --chgexit omdb db inventory list`)?

I also think it would be fine to have richer debugging for tasks like you're suggesting. We could have APIs to list recent activations and their statuses and/or to wait for an activation to complete. Or to wait for activation N to complete. I just think there's a ton of stuff we could do here (so there's some risk of scope creep) and some of it could be easily misused to build brittle stuff. So I'm wondering how big a pain point it currently is. If it is painful today in dev then yeah maybe we should do it and see if we can make sure people don't accidentally use it to build programmatic consumers that ought to be checking other conditions instead.
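As a sketch of what "wait for activation N to complete" could look like, here is a minimal counter built on `std::sync::Condvar`. `ActivationCounter` and its methods are hypothetical names, not an existing Nexus API, and real code in an async service would more likely use an async notification primitive; the standard library is used here only to keep the example self-contained.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical counter of completed activations for one background task.
#[derive(Clone, Default)]
pub struct ActivationCounter {
    inner: Arc<(Mutex<u64>, Condvar)>,
}

impl ActivationCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called by the task driver when an activation finishes.
    /// Returns the number of the activation that just completed.
    pub fn record_completion(&self) -> u64 {
        let (lock, cvar) = &*self.inner;
        let mut completed = lock.lock().unwrap();
        *completed += 1;
        // Wake every waiter so each can re-check its own target.
        cvar.notify_all();
        *completed
    }

    /// Number of activations completed so far.
    pub fn completed(&self) -> u64 {
        *self.inner.0.lock().unwrap()
    }

    /// Block until at least `n` activations have completed, then return
    /// the completed count observed at wake-up.
    pub fn wait_for_activation(&self, n: u64) -> u64 {
        let (lock, cvar) = &*self.inner;
        let guard = lock.lock().unwrap();
        let guard = cvar.wait_while(guard, |completed| *completed < n).unwrap();
        *guard
    }
}
```

A caller that triggers the task could read `completed()` first and then wait for `completed() + 1`. Note that this would also be satisfied by an activation that was already in flight at trigger time, since the counter does not track when activations start; that is exactly the kind of subtlety that makes it easy to build brittle consumers on top of such an API.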

sunshowers · 2024-04-25T22:36:22Z

Yeah, let's see how it plays out over the next while as we add more functionality to our background task system, and decide from there.

sunshowers · 2024-04-25T22:47:55Z

Need to wait until #5620 lands.

I realized that we have this wonderful `SledFilter` enum lying around, and we can just use it in omdb. I decided to shorten "eligible-for-discretionary-services" to "discretionary", which I hope communicates the same meaning but in a shorter manner. Depends on #5615.

[𝘀𝗽𝗿] initial version

347c91e

sunshowers mentioned this pull request Apr 25, 2024

Testing: Expunge Sled #5480

Closed

sunshowers added 2 commits April 25, 2024 01:53

add nexus blueprints and nexus sleds

c389177

clippy

237061a

sunshowers requested review from davepacheco, jgallagher and andrewjstone April 25, 2024 09:45

sunshowers mentioned this pull request Apr 25, 2024

[omdb] show sled policy and state, allow application of filter #5620

Merged

jgallagher reviewed Apr 25, 2024

dev-tools/omdb/src/bin/omdb/nexus.rs (resolved)

nexus/src/internal_api/http_entrypoints.rs (outdated, resolved)

address review comments

74cb252

andrewjstone approved these changes Apr 25, 2024

sunshowers mentioned this pull request Apr 25, 2024

[omdb] improve CLI UX #5627

Merged

rebase

6ec688a

sunshowers enabled auto-merge (squash) April 26, 2024 17:52

sunshowers merged commit bd40fc8 into main Apr 26, 2024
21 checks passed

sunshowers deleted the sunshowers/spr/omdb-add-basic-support-for-activating-background-tasks branch April 26, 2024 19:08
