Custom JS resolver #196

Closed

Conversation

dgp1130 (Contributor) commented Jun 6, 2022

This adds a new bundleAsync() function with support for a custom JavaScript resolver. A resolver which re-implements the default Rust behavior looks like:

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';
import css from '@parcel/css';

await css.bundleAsync({
  filename: 'foo.css',
  // Custom resolver.
  resolver: {
    // Custom read function: takes a file path and returns its contents.
    async read(file: string): Promise<string> {
      return await fs.readFile(file, 'utf-8');
    },

    // Custom resolve function: takes an import specifier and the file it
    // originated from, and returns the full path that will be passed to `read()`.
    async resolve(specifier: string, originatingFile: string): Promise<string> {
      return path.join(originatingFile, '..', specifier);
    },
  },
});
```

This required some changes to the way work is parallelized to be compatible with async/await. I had to remove Rayon as it doesn't seem compatible with this model. This resulted in two still-unsolved issues (edit: these should both be resolved now, see #196 (comment)):

  1. Tests build, but at least one fails with a layering issue:
     ```text
     ---- bundler::tests::test_bundle stdout ----
     thread 'bundler::tests::test_bundle' panicked at 'assertion failed: `(left == right)`
       left: `"@layer bar, foo;\n@layer qux, baz;\n\n@layer foo.baz {\n  div {\n    background: #ff0;\n  }\n}\n\n@layer foo {\n@layer qux {\n    div {\n      background: green;\n    }\n  }\n}\n\n@layer bar {\n  div {\n    background: red;\n  }\n}\n"`,
      right: `"@layer bar, foo;\n@layer foo.qux, foo.baz;\n\n@layer foo.baz {\n  div {\n    background: #ff0;\n  }\n}\n\n@layer foo {\n  @layer qux {\n    div {\n      background: green;\n    }\n  }\n}\n\n@layer bar {\n  div {\n    background: red;\n  }\n}\n"`', src/bundler.rs:1176:5
     ```
    
    I think this is caused by subtle changes to loadFile(), where I had to clone each rule during processing to resolve ownership issues. I'll leave a comment at the relevant place in the PR.
  2. I'm pretty sure removing Rayon means that everything is left on the main thread. It runs concurrently given that everything is async and joined with futures::future::join_all(), but as soon as one operation gets blocked all of them are blocked. Since we can't use Rayon for this, I suspect tokio::spawn() could serve the purpose of abstracting out all the details of how many threads to create and how to distribute jobs for them, but it ran into similar ownership issues from 1. and I couldn't find a good way to get it working.
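The concurrency-without-parallelism concern in item 2 has a direct JavaScript analogy (this is an illustration only, not code from the PR): `futures::future::join_all` on a single thread behaves like `Promise.all`, so one synchronously blocking task stalls all the others.

```javascript
// Analogy for futures::future::join_all on one thread: Promise.all gives
// concurrency, not parallelism, so a synchronous block in one task keeps
// every other task from even starting.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // synchronous busy-wait: the "blocked" operation
}

async function task(log, name, blocking) {
  if (blocking) blockFor(50); // runs before the first `await`, on the main thread
  log.push(name);
}

async function runAll() {
  const log = [];
  // Task 'a' blocks synchronously; 'b' and 'c' cannot start until it finishes.
  await Promise.all([
    task(log, 'a', true),
    task(log, 'b', false),
    task(log, 'c', false),
  ]);
  return log;
}
```

In a truly parallel model (Rayon, or Tokio's multi-threaded runtime), 'b' and 'c' could proceed on other worker threads while 'a' blocks.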

Any ideas or suggestions for how to resolve these issues would be greatly appreciated.

This implementation uses napi-rs' ThreadsafeFunction: JS data is restricted to the main thread, while the Rust processing happens mostly on background threads. ThreadsafeFunction queues any requested invocations and waits for the main thread to become available (effectively adding the call to the JS event queue). Once ready, the JS function is invoked. The return value is dropped, as napi-rs doesn't seem to do anything with it, so Parcel instead passes in a callback function for the JS to invoke with the result, using Node callback conventions (callback(err, result)). This means the actual JS contract is:

```typescript
function read(file: string, cb: (err: any, result: string) => void): void {
  fs.readFile(file, 'utf-8').then(
    (res) => cb(null, res),
    (err) => cb(err, null),
  );
}

function resolve(specifier: string, originatingFile: string, cb: (err: any, result: string) => void): void {
  cb(null, path.join(originatingFile, '..', specifier));
}
```

This is unergonomic, so a small JS shim wraps the Rust implementation of bundleAsync() and converts this contract to the Promise-based one mentioned earlier. This makes custom resolvers much easier to use while still adhering to napi-rs' required calling conventions.
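The shim's conversion can be sketched roughly as follows. Note that `nativeBundleAsync` below is a hypothetical stand-in for the compiled napi-rs binding, not Parcel's actual code; only the wrapping logic mirrors what this PR describes.

```javascript
// Hypothetical native binding: expects callback-style read/resolve functions
// and a Node-style completion callback (stands in for the real napi export).
function nativeBundleAsync(options, done) {
  // Exercise the callback-style resolver the way the Rust side would.
  options.resolver.resolve('./dep.css', '/src/foo.css', (err, resolved) => {
    if (err) return done(err);
    options.resolver.read(resolved, (err2, contents) => {
      if (err2) return done(err2);
      done(null, { code: contents });
    });
  });
}

// Shim: wraps a Promise-based resolver into the callback contract
// required by the native binding, and returns a Promise to the caller.
function bundleAsync(options) {
  const { resolver, ...rest } = options;
  const wrapped = resolver && {
    read(file, cb) {
      Promise.resolve(resolver.read(file)).then(
        (res) => cb(null, res),
        (err) => cb(err, null),
      );
    },
    resolve(specifier, originatingFile, cb) {
      Promise.resolve(resolver.resolve(specifier, originatingFile)).then(
        (res) => cb(null, res),
        (err) => cb(err, null),
      );
    },
  };
  return new Promise((resolvePromise, rejectPromise) => {
    nativeBundleAsync({ ...rest, resolver: wrapped }, (err, result) => {
      if (err) rejectPromise(err);
      else resolvePromise(result);
    });
  });
}
```

Callers then supply the Promise-based resolver shown at the top of this PR description, and the shim handles the callback plumbing.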

I included tests for the new behavior, though there doesn't seem to be much existing JS test infrastructure, so it's a bit primitive for now and mostly aligns with the existing JS test of transform(). They can be executed with npm run build && node test-bundle.js. Not sure if there's a better setup for this.

bundle() is unaffected because communicating between threads in a synchronous-compatible manner is quite tricky and not in scope here. However, it does throw if given a resolver, since passing one there is indicative of user error.
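That guard can be sketched like so (`nativeBundle` is a hypothetical stand-in for the synchronous native binding, and the real check may live elsewhere):

```javascript
// Hypothetical synchronous native binding (stand-in for the real napi export).
function nativeBundle(options) {
  return { code: '/* bundled */' };
}

function bundle(options) {
  if (options && options.resolver) {
    // Synchronous bundling cannot call back into JS from background threads,
    // so a resolver here is almost certainly user error: fail loudly.
    throw new Error('Custom resolvers are only supported by bundleAsync().');
  }
  return nativeBundle(options);
}
```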


```rust
drop(stylesheets); // ensure we aren't holding the lock anymore
```
dgp1130 (Contributor, author)

Everything above this point in the function is effectively unchanged; the diff looks worse than it is. All I did was remove drop(stylesheets), because it wasn't enough to prove to Rust that stylesheets isn't accessible past the await boundaries. Instead I made a scope that returns source_index and keeps stylesheets inside it, so the lock is fully out of scope before the await.

dgp1130 force-pushed the async-resolver branch 2 times, most recently from abcbd4f to 85db0c6 on June 6, 2022 05:10
dgp1130 mentioned this pull request Jun 6, 2022
devongovett (Member)

Thanks! There's a lot here. I'll take a look soon.

dgp1130 added 5 commits June 24, 2022 19:11

The two `SourceProvider` methods are now `async` which will give a hook for JavaScript to implement them and be called from the main thread. Everything is `async` now where possible and `bundle()` executes the `Future` synchronously to maintain its contract. Since it won't support custom JavaScript resolvers, there should never be a case where `bundle()` can't execute synchronously.

Rayon doesn't seem to support `async` iterators, but it also shouldn't be as necessary here. Removed it in order to call `async` APIs.
This is roughly equivalent to `bundle()`, except that it executes the `Future` *asynchronously*, returning a JS `Promise` holding the result.

Errors are formatted a bit differently since they just use the `Display` trait to convert to strings, rather than the `throw()` function which converts them to a JS `SyntaxError` in `bundle()`. This can't be used in the async version because it must return a `napi::Result` which is immediately emitted to the user and no JS values are accessible. While it is possible to return the `CompileError` directly from the async block in a `napi::Result<Result<TransformResult, CompileError>>` and then call `throw()` in the callback with access to JS data, doing so causes lifetime issues with `fs` and isn't easily viable.

This adds a new `JsSourceProvider` which acts as a `SourceProvider`, invoking the associated JS implementations of the resolver when they exist and falling back to `FileProvider` when they are not given. This allows JS consumers to override one or both of these methods. If JS does *not* override either, consumers should not pay any significant performance penalty, since all the Rust work will stay on the same thread.

The JS implementations are invoked as thread-safe functions, pushing the arguments onto a queue and adding it to the event queue. Some time later, the main thread pulls from this queue and invokes the function. `napi-rs` doesn't seem to provide any means of receiving a JS return value in Rust, so instead arguments for both `read()` and `resolve()` include a callback function using Node calling conventions (`callback(err, result)`). This looks like:

```typescript
await bundleAsync({
  // Options...

  resolver: {
    read(file: string, cb: (err: any, result: string) => void): void {
      // Read file and invoke callback with result.
      fs.promises.readFile(file, 'utf-8').then((res) => cb(null, res), (err) => cb(err, null));
    },

    resolve(specifier: string, originatingFile: string, cb: (err: any, result: string) => void): void {
      // Resolve and invoke callback with result.
      const resolved = path.resolve(originatingFile, '..', specifier);
      cb(null, resolved);
    },
  },
});
```
…(...args) => Promise<Result>` (parcel-bundler#174)

This hides the fact that `napi-rs` doesn't actually do anything with the return value of a threadsafe JS function, so any returned data would be dropped. Instead, Parcel adds a callback function which gets invoked by JS with the resulting value, following the Node convention of `callback(err, result)`. This is unergonomic as an API, so the JS shim exposes a `Promise`-based interface which gets converted to the callback behavior required by `napi-rs`.

This also includes tests for all the common use cases and error conditions of custom JS resolvers.
This rejects immediately if the user attempts to pass a `resolver` property to the synchronous `bundle()` function. Since communicating between background threads and the main thread is quite tricky without an asynchronous hook, `bundle()` does not support custom resolvers.
dgp1130 (Contributor, author) commented Jun 25, 2022

I took another pass at this PR and I think I was able to resolve those two open issues.

  1. I was able to fix tests by removing the .clone() calls from @import processing and fixing borrow-checker issues with a futures::lock::Mutex, which allows its data to be held across await boundaries. TBH, I don't fully understand this solution and I'm not 100% sure it is the correct one, but everything compiles and tests pass, so I think it's at least reasonable until a stronger Rust dev can point me at a more idiomatic solution here.
  2. While I did remove Rayon, I don't believe that necessarily moves everything to a single thread. Apparently #[tokio::main] will use multi-threading by default, and you have to explicitly opt-in to single-threaded async execution. Looking at napi-rs and execute_tokio_future(), it seems to spawn a task on a static runtime which is created with tokio::runtime::Runtime::new() which uses a multi-threaded executor. This seems to use one worker thread per-CPU with a work-stealing strategy, which sounds pretty similar to Rayon's approach from my recollection, so I think this shouldn't have any significant performance impact, though there may be subtle differences between Rayon and Tokio's implementations.

Hopefully that should resolve all the known blockers here. Let me know if there's anything I can do to help push this forward.

dgp1130 (Contributor, author) commented Jul 4, 2022

@devongovett, anything I can do to help push this forward? Would love to see custom JS resolvers in Parcel.

devongovett (Member)

hey, sorry for my slow responses. This is a big change, and I haven't had time to fully test it out and think through it yet. Two things I am thinking about:

  1. Replacing rayon with tokio and making everything async on the Rust side seems potentially unnecessary. I think it should be possible to keep the Rust API the same (synchronous) and use locks or channels in the Node binding to make the JS calls synchronous. I did a small experiment over here that does something similar.
  2. I think we should contribute to napi-rs to enable getting the return value from a ThreadsafeFunction. In my experiment, I made a fork of it which allows passing a custom Rust function that calls the JS callback rather than only returning the arguments. This allows it to receive the return value, and pass it back to the Rust thread (thus unlocking it) via a channel. Perhaps we can contribute this upstream.

Hopefully in the meantime you're able to use your fork so I'm not blocking you.

dgp1130 (Contributor, author) commented Jul 12, 2022

hey, sorry for my slow responses.

Sorry, I don't mean to be too annoying about this, mainly just making sure it didn't get forgotten. 😅

  1. Replacing rayon with tokio and making everything async on the Rust side seems potentially unnecessary. I think it should be possible to keep the Rust API the same (synchronous) and use locks or channels in the Node binding to make the JS calls synchronous. I did a small experiment over here that does something similar.

Can you elaborate on your concern with the async approach? Are you worried that there is a performance regression since async isn't quite zero-cost? Are you hoping to avoid a new bundleAsync() function and support custom resolvers in the synchronous bundle() function? Are you trying to keep work off the main thread? Are you worried about maintainability? I'm just trying to understand what problem you're hoping to solve here.

Looking at your prototype, IIUC, it seems like you're basically using JavaScript workers to manage the different threads with Rayon to parallelize scheduling and using mpsc channels to block on the work. Am I reading this correctly?

I'm a bit confused by the value of this given that the parallel operation with Rayon just sends a message to a JS worker and waits for it. I'm not sure how much value that's really giving and I feel like the overhead of additional spawned threads is probably more costly than its benefit? Since Tokio uses a multi-threaded work stealing approach, that sounds a lot closer to the current Rayon model than this mpsc method and would probably have similar performance characteristics (I'm speculating here, not sure if you have any benchmarks where I could easily measure and validate this).

We should also keep in mind the expected use case of custom resolvers and readers. A resolver is likely a trivial amount of work (if it reads files it might take more time, but will be IO-bound), while a reader will always be heavily IO-bound. Given that neither is CPU-bound, if the goal is to avoid overworking the main thread, I'm not sure there's much value in it given that async JS will push those IO tasks off the main thread automatically and any CPU work is likely to be trivial. I guess it might result in an unnecessary context switch in and out of the main thread, but it looks like your mpsc approach doesn't use the main thread for much useful work anyways, so I don't think it's really much worse in this regard? Also, if we do this in the synchronous bundle() method, that means the main thread is forced to be blocked on any IO operations, whereas await bundleAsync() can be parallelized with other work on the main thread.

In the context of the custom resolver, the JS worker approach also means that the resolver needs to be written in a worker, which definitely makes the API much less ergonomic. Module blocks could help with this, but that proposal is still a ways out.

I'm not saying the async approach is superior, just trying to understand the trade-offs you're making here and where you're trying to go.

  1. I think we should contribute to napi-rs to enable getting the return value from a ThreadsafeFunction. In my experiment, I made a fork of it which allows passing a custom Rust function that calls the JS callback rather than only returning the arguments. This allows it to receive the return value, and pass it back to the Rust thread (thus unlocking it) via a channel. Perhaps we can contribute this upstream.

Agreed that napi-rs should support this. I was avoiding that just to reduce the blockers on this PR. If you think it's a hard blocker here, I can look into making that contribution. Not sure if there are other challenges with upgrading the napi dependency if/when that feature lands.

devongovett (Member)

Merged an alternative approach in #263, following the same API you proposed here, but without needing to make the bundler internals async. Will be contributing a few napi changes upstream later.

Thanks for getting this started, and apologies for my slow responses. Hopefully you were able to use your fork in the meantime.

devongovett closed this Sep 4, 2022