Breaking Change Proposal: Move compilation_id out of the protobuf to improve performance in multi-thread/multi-isolate context #3579
More specifically, these can be removed and replaced with a single
These should remain, to preserve the possibility of a compiler implementation sending multiple of the same kind concurrently in a single compilation:
I was hoping we could extract this information from the serialized protobuf directly, but since field order isn't guaranteed I guess that's not safe. This is probably the next best option. We should bear in mind the possibility of #2745 as we design this. That is, we may want to have the ability to ensure that multiple compilations take place within the same compiler thread. We could potentially track this separately, but that would complicate the compiler's ability to multiplex new compilation requests. One possibility would be to preserve all the existing
That is a good migration path. We can feature flag dart-sass-embedded to accept and send
This is more of an issue due to the lack of shared memory across isolates in Dart. If we can find a way to share data across isolates, this can potentially be solved differently. Read-write access across isolates would be better, but even read-only shared data can work to some degree. E.g. the main isolate owns a shared cache. All isolates can read from the shared cache. When worker isolates mutate the shared cache, they commit to a thread-local cache first, and the thread-local cache can then be asynchronously sent back to the main isolate for a persisted commit to the shared cache.
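The write-behind scheme described above can be sketched in a few lines. This is only an illustration of the pattern (single-writer owner, read-only snapshots for workers, locally buffered writes committed back asynchronously); the class and method names are invented for this sketch and are not part of any Sass or Dart API.

```python
import threading


class OwnerCache:
    """Owner-side canonical cache. Only the owner mutates the canonical
    store; workers read a published snapshot without taking the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}      # canonical store, owned by the main thread
        self._snapshot = {}  # immutable view published to workers

    def snapshot(self):
        """Return the current read-only snapshot (lock-free for readers)."""
        return self._snapshot

    def commit(self, local_writes):
        """Apply a worker's buffered writes and republish the snapshot."""
        with self._lock:
            self._data.update(local_writes)
            self._snapshot = dict(self._data)


class WorkerView:
    """Worker-side wrapper: read-through to the shared snapshot,
    write-behind into a local buffer that is committed later."""

    def __init__(self, cache):
        self._cache = cache
        self._local = {}

    def get(self, key):
        if key in self._local:
            return self._local[key]
        return self._cache.snapshot().get(key)

    def put(self, key, value):
        # Buffered locally; not yet visible to other workers.
        self._local[key] = value

    def flush(self):
        """Ship buffered writes to the owner (asynchronous in practice)."""
        self._cache.commit(self._local)
        self._local.clear()
```

A worker sees its own uncommitted writes immediately, while other workers only observe them after the owner has persisted the commit.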
That's true, I could imagine a situation where we want parallel compilations that still share state with one another. It seems quite tricky to handle given that for a lot of entrypoints almost everything will be cacheable, but it's certainly not impossible. In that case, maybe we should explicitly make the wire-level ID a compilation ID, and plan to have a separate session ID in the
I think we have different interpretations of
With your interpretation of worker affinity, I think we can call the In other words, no changes to the current proto definition at all, just a change to the wire encoding as:
Specifically, the If we go with this explicit approach, we need to think about the lifecycle of workers:
My take is that we go with the simplest option: we keep created workers until the process is terminated. - The host implementation can make the judgement call on when to clean up. However, the real big issue with the worker affinity idea is execution order. In this implementation we would allow enqueuing multiple CompilationRequests to a single worker. Imagine we have a synchronous compilation A which contains a custom JS function. The JS function starts a synchronous compilation B asking to run on the same worker. We can deadlock a worker with FIFO order. LIFO, on the other hand, risks running back into stack overflow, which was part of the reason we landed here. This complexity of worker affinity is why I was thinking to use
I didn't mean to imply that the wire ID would be tightly semantically tied to workers. Rather, I meant that it would be tied to a single compilation—essentially the same as your idea of You're right that it's unlikely to be a performance issue in practice to have To summarize, here's my proposal:
Semantically, the wire ID isn't any different than the compilation ID is today—it doesn't intrinsically map to workers or threads. It's just defined in such a way that a host can efficiently parallelize it as an implementation detail.
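To make the wire-level change concrete, here is a minimal sketch of the framing being discussed, assuming a packet layout of [length varint][wire ID varint][serialized protobuf], where the length prefix covers the wire ID plus the message body. This layout is an assumption for illustration; the exact framing is for the formal proposal to pin down.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_packet(wire_id, message):
    """Frame a serialized protobuf with a leading wire ID varint."""
    body = encode_varint(wire_id) + message
    return encode_varint(len(body)) + body
```

A dispatcher only needs to decode the two leading varints to route the packet to the right worker; the protobuf body itself can be deserialized on the worker's own thread.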
@nex3 Your proposal sounds good to me.
Pinging maintainers of other known host implementations for comments: @ahabhgk (https://github.com/ahabhgk/sass-embedded-host-rust)
Sounds fine to me. Keeping in mind 'Version' as a placeholder for 'Version and other slow-path commands that don't exist yet': we would keep
I don't see any real problems on my end (godartsass) -- other than some work, but if it helps with performance, I'm always thumbs up.
We don't necessarily need an id on VersionRequest itself if we reserve wire id 0, as we don't really expect to receive concurrent/repeated requests for checking the version, and even if that happens the response will always be the same. Other potential future requests would be defined as different protobuf types, if any. Since we only have the single wire id 0 reserved for all of these, processing these requests would always mean first checking the received message type for wire id 0 and then proceeding accordingly depending on the message type.
I think just for consistency it makes sense to retain
The proto decoding and encoding never seemed to be a performance bottleneck for me, but if you have problems there or see the potential for performance improvements, we can definitely look into this. Have you already done some tests to see how much improvement is possible here? I'm also not sure it's a good idea that the wire transfer now becomes more than just protobuf. The Java protobuf library has dedicated support for writing and reading the currently used varint-delimited messages, which allows me to implement the actual stream communication without any custom code. So the current protocol is just "plain old" protobuf on a stream with no additional IO logic needed.
The performance issue is mainly on dart-sdk/dart-sass-embedded. Here are some related issues and discussions:
It's difficult to get meaningful test results without a real app that uses custom importers/functions and has a high degree of concurrency. It's certainly the case that double-parsing the protobufs to dispatch between workers is wasted effort. That said, we could also potentially mitigate this by having the dispatcher iterate through the raw bytes to find the message's ID without fully parsing it. This would be more complex to implement on the compiler's side—it effectively requires us to write a protobuf "lexer"—but a single-pass traversal should still be fairly efficient, especially if (as I suspect) serializers will tend to put the ID at the beginning of the message. And, crucially, it wouldn't require a change from the current wire format. @ntkme what do you think of that possibility?
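For a sense of what such a single-pass "lexer" would involve, here is a rough sketch in Python that scans a serialized message's top-level fields and pulls out the value of a given varint field. This is only an illustration of the protobuf wire format; a production implementation would also need to handle deprecated group wire types and validate truncated input.

```python
def decode_varint(data, pos=0):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7


def find_varint_field(buf, field_number):
    """Single-pass scan over a serialized protobuf message, returning the
    value of the first varint field with the given number, or None.

    Each field starts with a tag varint: (field_number << 3) | wire_type.
    Non-matching fields are skipped according to their wire type.
    Group wire types (3 and 4) are not handled in this sketch."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
            if number == field_number:
                return value
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited (strings, bytes, messages)
            length, pos = decode_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
    return None
```

Because serializers may emit fields in any order, the ID field can legally appear after an arbitrarily large length-delimited field, which is why the scan has to be able to skip over every wire type rather than just reading a prefix.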
The complexity is not only on the compiler side. For example, the Ruby host already has a proper multithreaded implementation, but it is not truly multithreaded unless running on JRuby or TruffleRuby, which do not have a global interpreter lock. In the future I would like to move to a Ractor-based model, which provides true concurrency without GIL limitations. In the Ractor model, it faces the problem that protobuf Ruby objects are not shareable - therefore there is the exact same double-parsing problem. In other words, it's not only the compiler that needs to implement a "lexer" to peek into the message to find the id; the host can face the same issue depending on language features and limitations.
Would it be possible or helpful to implement a container object like this?

    message Container {
      uint32 id = 1;
      // Serialized InboundMessage or OutboundMessage
      bytes message = 2;
    }

I'd hope that such a message should be fast and easy to encode and decode on the main dispatching thread.
I think a container object would be at more or less the same level of implementation complexity as a wire ID. However, in the actual protobuf implementations of most languages, this would require doing an extra memory copy of the For example:
More specifically it would have more impact on embedded-host-node’s legacy API and sassc-embedded’s SassC shim where the base importer is fully emulated and all imports need to go through stdio.
The critical issue with parsing raw protobufs is that the order fields are serialized in is not guaranteed. So the ID could come at the beginning or it could come at the end, and the only way to find it reliably is to understand the wire format well enough to traverse it and locate the appropriate tag. That's certainly not impossible—the protobuf wire format is relatively simple—but it's something any endpoint that wants parallelism would have to implement itself, in addition to some logic for de-duplicating IDs of in-flight requests across workers. I think ultimately it's simpler to make the transport protocol a little more complex in exchange for making the multiplexing implementations much simpler. I'd rather make it easier to build more performant hosts than less performant ones, given that we're stuck with a trade-off between the two. We could potentially have two separate protocols to allow some hosts to go on using existing multi-message infrastructure, but honestly I'm not sure how useful that would be. Even if hosts don't have host-side parallelism, they're likely to want to take advantage of compiler-side parallelism.
Having to implement an additional "raw" protobuf parser just to find the id in order to avoid the "real" protobuf parsing does not make sense to me either. If the additional varint on the wire helps the embedded compiler and/or other host implementations, I'm fine with it. I was just a bit hesitant because I don't have any performance or concurrency problems with protobuf encoding/decoding in my host implementation, so from my point of view the proposed change seems unnecessary.
The official proposal is now up at #3582. |
The proposal has landed. I'm just going to give it a week for public comment because it's already been discussed in some depth here, and getting this all squared away is blocking Dart Sass releases. |
As I'm working on implementing this, I became a little concerned again about #2745. As the protocol stands currently, if we want to share state across compilations the compiler will have to find a way to communicate that state across multiple isolates. For example, if we want to handle a widely-used module that's shared across many entrypoints, we'll either need to serialize each function/mixin/variable call/response or serialize the entire module each time it's loaded. This could have real benefits in terms of parallelism across many compilations with shared state, but it's not obvious that those benefits would outweigh the cross-isolate overhead relative to sequential compilations that are able to directly access the shared modules. All that said, it's not intrinsically a breaking change to add back
I'd like to keep this open until the implementation has landed. |
Today we have the wire data encoded as:
This is a proposal for a breaking change to remove the compilation_id field inside the proto and replace it with a top-level session_id like below to serve the same purpose:

The motivation is to improve I/O performance in a multi-thread context by offloading message encoding and decoding to different CPU cores running different threads/isolates, to reduce the pressure on the main isolate/thread caused by potentially heavy I/O multiplexing on stdio. With this change, both the host and the compiler would be able to pass the raw proto to the thread/isolate that is currently working on a specific compilation, and the decoding of the proto can happen on the child thread's CPU cycles, giving the main thread more free cycles to purely process multiplexed I/O.
session_id should be in the range of uint32; the reason to choose varint as its encoding instead is: