Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the concept of a WASM manifest; use cacache for local storage #39

Merged
merged 6 commits into from Feb 3, 2023

Conversation

ceejbot
Copy link
Contributor

@ceejbot ceejbot commented Feb 2, 2023

This PR reworks the agent API and the cli to support a new concept of a "manifest". This is a toml specification of a WASM job type. It includes a name, a namespace, a path to a binary, and some human-readable descriptions.

Here's an example manifest:

name = "loudify"
namespace = "sh.serval"
binary = "../../wasm-samples/build/loudify.wasm"
version = "1"
description = "SHOUT SHOUT LET IT ALL OUT"
required_extensions = ["birdfeeder"]

Once a WASM manifest + executable has been stored in the system, you can execute that wasm using the
name in the manifest: pounce run sh.serval.loudify inputfile.txt

The local blob storage has been reimplemented to use the cacache
crate. This is a content-adddressable store that also allows lookup of items by key. In our case
we store manifests with keys like name.manifest.toml and wasm executables with names like
name.wasm. Execution of jobs is now more indirect than it was, in that we look up the manifest by
name, and from that look up where the binary is by version. This indirection allows us to support
versioning more fully in the long term.

This PR also significantly reworks the agent API and changes the CLI to support this new API. The agent endpoints now look like this:

  • GET /monitor/ping: cheap health check
  • POST /v1/jobs/:name/run: body is interpreted as job input
  • GET /v1/storage/manifests: fetch a list of manifests
  • POST /v1/storage/manifests: store a new manifest
  • GET /v1/storage/manifests/:name: get a manifest by name
  • HEAD /v1/storage/manifests/:name: 404s if manifest not found
  • PUT /v1/storage/manifests/:name/executable/:version: store a new wasm executable associated with the named task manifest
  • GET /v1/storage/manifests/:name/executable/:version: get a wasm executable by manifest name & version
  • GET /v1/monitor/history: very cheap run history; to be deleted
  • GET /v1/jobs: list all running jobs; not implemented

Also, take a few minor updates to crates; revert change to the workspace syntax because it is making VSCode angry.

- Use the newly-published tokio version of cacache.
- Rework the blob storage module to use cacache.
- Extend its api a bunch because cacache lets us do more.
- Rework endpoints in the mesh agent to support the new
  slightly-larger API.

Also, take a few minor updates to crates; revert change to
the workspace syntax because it is making VSCode angry.

Dry-coded a new agent API on top of a new blob API

There is a new concept called a "Manifest" that has some of the fields
specified in @shinypb's job manifest spec. (I skipped a bunch just so I
could get started with the implementation.) The blob storage now
knows about wasm executables and manifests and can build type-aware
keys for each kind of data.

The agent endpoints now look like this:

`GET /monitor/ping`
    cheap health check
`POST /v1/jobs/:name/run`
    body is interpreted as job input
`GET /v1/storage/manifests`
    fetch a list of manifests
`POST /v1/storage/manifests`
    store a new manifest
`GET /v1/storage/manifests/:name`
    get a manifest by name
`HEAD /v1/storage/manifests/:name`
    404s if manifest not found
`PUT /v1/storage/manifests/:name/executable/:version`
    store a new wasm executable associated with the named task manifest
`GET /v1/storage/manifests/:name/executable/:version`
    get a wasm executable by manifest name & version
`GET /v1/monitor/history`
    very cheap run history; to be deleted
`GET /v1/jobs`
    list all running jobs; not implemented

The underlying blob store API supports this external API.

Up next: making the CLI use the new API.

IT'S ALIVE

The CLI can now store wasm jobs in the mesh (okay, in one node)
and then run those jobs only passing up input data.
This needs more than the seat-of-the-pants design I've given
it just now, but this pass removes redundant information.
Also, we now standardize on using the fully-qualified name
everywhere.

While I was there, introduced some `prettytable` to the CLI
to start formatting some displays nicely. This is throwaway
work in the long term, but it makes things a little nicer now.
@ceejbot ceejbot marked this pull request as ready for review February 2, 2023 22:37
) -> impl IntoResponse {
let mut lock = state.lock().await;
let storage = lock.storage.as_ref().unwrap();
let Ok(manifest) = storage.manifest(&name).await else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that only nodes that have storage locally available can run jobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, yes. I think the immediate next followup work is to make a node ask if if has the appropriate binary, and if not, find it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thoughts on swapping BlobStore out for an abstract trait that has CacacheBlobStore as one implementation and RemoteBlobStore (something that proxies to a serval_storage instance) as the other? If that sounds reasonable to me I am happy to start chipping away at it.

@ceejbot ceejbot changed the title Squash of eight commits; new writeup needed Introduce the concept of a WASM manifest; use cacache for local storage Feb 2, 2023
executable: Vec<u8>,
/// Input data
input: Vec<u8>,
// TODO: might have version chosen to run here, plus run options; might also store the input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, something like an Option<u16> to reference a specific version number (rather "latest") would make sense.

@@ -86,55 +87,56 @@ fn build_url(path: String, version: Option<&str>) -> String {
}

fn upload_manifest(manifest_path: PathBuf) -> Result<()> {
println!("Reading manifest at {}", manifest_path.display());
println!("Reading manifest: {}", manifest_path.display());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is the equivalent of

println!("Reading manifest: {manifest_path:?}");

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a debugging print, but converting a pointer to a pathbuf to its display version. I think it's cleaner; haven't looked at what the output is with the :? formatter.

pub fn name(&self) -> String {
self.fq_name.clone()
pub fn fq_name(&self) -> String {
let name = self.name.to_ascii_lowercase().replace('-', "_");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the - support just because of our wasm-samples naming convention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should get a TODO next to it, because IMO it needs to be cleaned more definitively to restrict it to allowed characters.

cli/src/main.rs Outdated Show resolved Hide resolved
table.add_row(row!["WASM task name:", manifest.fq_name()]);
table.add_row(row!["Version:", manifest.version()]);

let url = build_url("storage/manifests".to_string(), Some("1"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass Some("1") around often enough that a build_url_v1 helper function probably wouldn't be a bad idea at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! It annoyed me.

cli/src/main.rs Outdated Show resolved Hide resolved
@shinypb
Copy link
Contributor

shinypb commented Feb 2, 2023

This looks good to me, modulo the rustfmt error giving you grief. 😅 Calling out that landing this means that STORAGE_ROLE=never nodes can no longer run jobs, which (temporarily) means that we can't have nodes on more than one machine working together due to our lack of distributed storage.

@ceejbot ceejbot merged commit 8e4624d into main Feb 3, 2023
@ceejbot ceejbot deleted the ceej/cacache-in branch February 3, 2023 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants