Skip to content

Conversation

chrisdickinson
Copy link
Contributor

@chrisdickinson chrisdickinson commented Oct 26, 2023

web workers pr

Apologies for the major surgery here! If you've ever worked with the Worker API in JS before, this will come as no surprise to you to hear: it's not the most ergonomic API to use, even in the best of cases. That only gets trickier when you're working within a package that targets multiple platforms.


Why even

Why do we want workers? Unlike other SDK platforms, JavaScript is single-threaded by default with few mechanisms for dealing with other threads. Those mechanisms frequently don't compose well with typical applications -- it's fairly complicated to take a React app or Node server and add threads "after the fact".

At the same time, contention over that main thread is a primary concern when writing JavaScript applications. Apps that spend a lot of time on the main thread without yielding to the reactor appear broken: overtaxed Node servers fall over under traffic because they can't service new incoming requests, overtaxed browser apps appear to "lock up".

Given that Wasm plugins are likely to be compute-heavy, running them off of the main thread seems imperative. The host function interface puts even more pressure on this: without running the guest plugin in a thread, host functions cannot be asynchronous. Most JavaScript operations are async, however, which severely limits what can be accomplished in synchronous host functions (or requires plugin hosts to define a polling API.)

This is what leads me to believe we want to run Extism plugins to run off of the main thread.

Design decisions to highlight

I went in with the following bullet points:

1. differences between browser, deno, and node should be pushed to the compilation layer as
   much as possible
    - if the browser and node need to import different things, then we're using different mappings for the
      same source text during compilation
2. unit tests represent a different compile target & resulting artifact (a la cargo test)
3. integration tests should test using the compiled artifacts from the outside
4. we are using typescript and esbuild
5. I would like to use the built-in test tooling where available without repeating test content
6. when in doubt, use the web platform API (or the minimal subset thereof)
7. the API is (still) fluid, but we should produce the same types (insofar as is possible) for each target
8. when you get offended by the state of JS tooling, use the provided screaming bucket (screaming bucket omitted)

I'll highlight some outcomes that might bear more conversation:

The build process

I've included an DEVELOPMENT.md file with more details, but the TL;DR is: this PR rewrites the library in TypeScript that's "mutually intelligble" across all of our platform targets. That means we:

  • Treat Deno as an interpreted target that refers to the web worker by relative path
  • Treat the web worker as two separate compiled targets (Node and Browser)
  • Treat Node as two separate compiled targets (CJS and ESM) that embed the web worker
  • Treat the browser as a compiled target that embeds the web worker
  • ...And then multiply that by all the tests.

The upside is that the library and tests are written once, with polyfills inserted at module boundaries. The downside is that you'll probably look at the justfile that orchestrates this build process and, um, maybe gag a little? I'd be happy to talk more about what's going on there if y'all are interested.

The runtime

The aforementioned DEVELOPMENT.md touches on the runtime implementation as well. In this case, the rules around sharing array buffers mean we're back to using an extism runtime implemented in JS. This runtime allocates blocks in a list held by JS; it encodes the block index into the return address by using the high 16 bits of the address. (I figured, maybe incorrectly, that most concrete ISAs only really use the lower 48-bits of a 64-bit pointer, so we should be able to get away with that too. Please let me know if you can think of any pitfalls!)

The other maybe-contentious thing: I kept mixing up Plugin and CurrentPlugin, so this PR calls CurrentPlugin CallContext instead. The exposed APIs are readString and store.

The plugin

There are two plugin implementations: src/foreground-plugin.ts, which runs on the current thread (whether that's the main or worker thread) and src/background-plugin.ts, which manages a worker thread, represented by src/worker.ts. The worker creates its own "foreground plugin". It wires up all incoming host functions to block the worker thread until the host notifies the worker to wake using Atomics.notify. Because of the rules around SharedArrayBuffer, moving from the host to the guest requires copying any blocks allocated during host execution from the host thread down to the guest thread using a 64KiB shared scratch space.

The other implication here is that, when a background plugin is servicing a call() request, no other threads can call into it to avoid a potential deadlock. In practice, this means that users will want to run this behind a Pool of some sort (and we may even want to provide an API for this in the near-ish future.) This is, however, a good argument for storing long-lived data in CallContext vars instead of in Wasm module static memory.

Tests

Tests are collocated alongside the source. Right now, that's src/mod.test.js, but this could be expanded. Tests are "write once, run in multiple contexts": Deno interprets tests, Node runs against the compiled versions of the tests, and the browser runs the tests as polyfilled by tape using playwright/test. These tests run against Deno, Node, Firefox, Chromium, and WebKit.

If these tests pass, integration "artifact" tests kick in: we run npm install against the tarball generated by npm pack in a separate directory. We then ensure that the exports for both ESM and CJS are as we expect them and that we can run wasm/code.wasm successfully.

The examples are updated to match. The examples can now be run using ./examples/node.js and ./examples/deno.ts, respectively.

- 2023-10-20  6:00pm: hey, the tests pass now
- 2023-10-20 11:00pm: hey! we're passing data around
  - tomorrow:
    - implement host->guest and host->worker->guest calls
    - create a wasi
- 2023-10-21 10:40am: stuff all of our internal config into an interface
  - I kind of think we want a `wasi: WASI | boolean` flag instead of `useWasi: boolean`
- 2023-10-21  1:09am: update deps
- 2023-10-22 10:51pm: whew! lots of stuff working now! next up: host functions!
- 2023-10-23  2:51pm: okay, I have to do some more major surgery: callcontext needs to
   		      be instantiated per invocation and live canonically on the host
		      thread
- 2023-10-25 12:40pm: blocks are transferred between the main and background threads
		      as-needed now. what a ride.
- 2023-10-25 11:08pm: all of the tests from the original deno test suite pass now.
- 2023-10-26  9:57am: we lint!
- 2023-10-26 10:26am: all of the examples work!
- 2023-10-26 12:51pm: add DEVELOPING.md, artifacts tests
@chrisdickinson chrisdickinson force-pushed the chris/20231020-to-workers branch from 73fefcd to a0e515d Compare October 26, 2023 21:01
@chrisdickinson chrisdickinson force-pushed the chris/20231020-to-workers branch from 15abf6a to e4b9e8d Compare October 27, 2023 00:39
Copy link
Contributor

@wikiwong wikiwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love what you did with the place 😍

Will hold off on approving until it's out of draft

src/mod.ts Outdated
*/
export interface ExtismPluginOptions {
useWasi?: boolean | undefined;
offMainThread?: boolean | undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable name makes sense, but when said verbally, I feel like saying something runs "off of the main thread" could be interpreted as running "on the main thread" 😆

Maybe runInWorker ? Also fine with keeping as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, yes, great suggestion! I tripped over offMainThread a lot too.

src/manifest.ts Outdated
* @throws {@link TypeError} when `URL` parameters don't resolve to a known `content-type`
* @throws {@link TypeError} when the resulting {@link Manifest} does not contain a `wasm` member with valid {@link ManifestWasm} items.
*/
export type ManifestLike = Manifest | ArrayBuffer | string | URL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary for 1.0, just more of a reminder that we should port over the ability to initialize from a pre-compiled module (WebAssembly.Module) in order to work in Cloudflare Workers. Here's how it was added to pre-1.0 :extism/extism@e1a5b01

src/mod.ts Outdated
initialize(instance: WebAssembly.Instance): Promise<void>
}

export default async function createPlugin(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we tend to lean towards newPlugin in the other SDKs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll export newPlugin as an alias of createPlugin if that's okay! (I went with createPlugin to follow the create* pattern from the web platform/node; like URL.createObjectURL, fs.createReadStream, and http.createServer.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with just leaving create then. Less to document!

"typedoc": "^0.23.20",
"typescript": "^4.8.4"
},
"dependencies": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really true that we have no more external runtime dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true! Even after adding minimatch, Node has no external runtime deps.

@mhmd-azeez
Copy link
Collaborator

wow, there is a lot of good stuff in this PR. especially making sure the http calls are not blocking. will review as soon as it gets out of draft

@chrisdickinson chrisdickinson changed the title wip: a threaded SDK A threaded SDK Nov 7, 2023
@chrisdickinson chrisdickinson marked this pull request as ready for review November 7, 2023 22:14
}
});

test('plugin can get/set variables', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bringing our chat from earlier into a comment here. Not required to merge, but it will be good to include tests that ensure that plugin variables get persisted across multiple plugin.calls

// wait for the thread to read the data out...
const result = AtomicsWaitAsync(this.flag, RingBufferWriter.SAB_IDX, targetOffset, MAX_WAIT);

// XXX(chrisdickinson): this is cürsëd code. Add a setTimeout because some platforms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀

// XXX(chrisdickinson): this is cürsëd code. Add a setTimeout because some platforms
// don't spin their event loops if the only pending item is a Promise generated by Atomics.waitAsync.
//
// - https://github.com/nodejs/node/pull/44409
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for linking

Copy link
Contributor

@wikiwong wikiwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really awesome work here! 🚀 🚀 🚀


export const BEGIN = Symbol('begin');
export const END = Symbol('end');
export const ENV = Symbol('env');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the updated namespace?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on that in a new PR, it requires updating the example plugins and a few other things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case env doesn't need to be changed – this is just a JS Symbol to provide protected, package-only access to the extism host functions.

Copy link
Collaborator

@mhmd-azeez mhmd-azeez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried out the examples, they work well, however, I couldn't run the tests:

extism-extism-1.0.0-rc1.tgz

error: Unsupported scheme "js-sdk" for module "js-sdk:wasi". Supported schemes: [
    "data",
    "blob",
    "file",
    "http",
    "https",
]
    at file:///mnt/d/dylibso/js-sdk/src/foreground-plugin.ts:3:26
mo@mo-dylibso:/mnt/d/dylibso/js-sdk$ node --version
node --version
v21.1.0

Overall, great work! this PR is a masterclass in JS tooling hahaha

src/worker.ts Outdated
import { parentPort } from 'node:worker_threads';

import {
EXTISM_ENV,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXTISM_ENV is imported, but not used

src/worker.ts Outdated
}
}

const _CONTEXT = new Reactor(parentPort);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_CONTEXT is also not used anywhere

@chrisdickinson
Copy link
Contributor Author

@mhmd-azeez, @wikiwong – thanks for the reviews!

@mhmd-azeez – I think that error is coming from Deno – I added minimum supported version info for Deno and Node to the README in a new commit.

@chrisdickinson chrisdickinson merged commit 7012e8d into main Nov 9, 2023
@chrisdickinson chrisdickinson deleted the chris/20231020-to-workers branch November 9, 2023 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants