Multi-threaded interpreter #23080
Original comment by @lukasolson: I guess there's already a ticket for the browser-side web workers: LINK REDACTED
Original comment by @Bargs: Started researching and thinking about this today; I'm going to write down some thoughts here so others can read them if they're interested. I started out trying to clearly define the problem we're trying to solve. I think it comes down to two things: a single expensive expression can block the event loop and take down the Kibana server, and all of our slow functions run on the server in a single thread, so we can't take advantage of multiple CPU cores.

So, before immediately jumping to multi-threading, I considered whether there are any other options for dealing with these problems. Most expensive operations in Canvas functions will involve processing a lot of data. Usually in Node you'd process large amounts of data in a streaming manner, a chunk at a time, so the thread doesn't block while the entirety of the data is gathered and processed. Couldn't we do that here? In theory it seems feasible, but I'm not really sure without doing a lot of extra research. Downsides? It would require rearchitecting all of the Canvas functions to expect a stream as the context. It would require all plugin developers to deal with streams instead of plain objects, complicating the development process for them. One badly behaving function could still take down the Kibana server. Probably a bunch of other stuff I'm not thinking of. It seems like an idea worth trying out; beyond avoiding the thread blocking and memory issues, it might also improve performance. But it would be a ton of work with a lot of unknowns, and it could easily end up being a trash idea and a waste of time. The path to multi-threading seems a lot clearer at the moment. And the two aren't necessarily mutually exclusive: even if we implemented a streaming architecture, doing the work in a separate process would still be handy to protect ourselves against a badly behaving function taking down the server, and it would give Kibana the ability to utilize more CPU cores on a machine.

I couldn't think of any other alternative solutions to our issues, so with that I jumped into researching Node forking and multithreading. I looked at a few open source projects that aim to make multithreading easier, LINK REDACTED in particular. Using the same API on the server and in the browser is appealing. I was thinking of creating a POC with this library, but then I learned you can't. So I moved on to looking more at Node's core `child_process` module.

I tried to figure out if there's any way to stream data between the parent and child process, so the parent could pipe it to the web socket without parsing it and adding it to the JS heap, but I didn't find anything in my searches. Another, more promising idea is to give each child process its own socket. You can pass a Node server object between processes. I'm a little hazy on this, but it looks as though Node will natively load balance between processes when multiple processes are listening for connections on the same server instance. So if each process creates its own socket.io instance with the same server object, I think they'll both receive connections, but I haven't tested this yet. The only problem is that currently some of the setup of the socket.io listeners requires access to the Hapi server object, namely getting the auth header and creating the handlers passed to the interpreter. @rashidkpc did you deal with this at all in your POC? I'm beginning to recollect that we talked about marking functions that need handlers as "non-threadable". If some functions aren't threadable then we'll have to copy data between parent and child even if the child has a copy of the socket, so maybe this whole idea is moot.

So that ended up being a bit of a rabbit hole. I'm not sure I have a clear solution yet, but the next time I pick this work up I think I'm going to focus on getting the child processes spawned and getting the "threadable" functions loaded up in them. Even if further optimizations are needed, some of the basic work there will still be useful.
Original comment by @rashidkpc: I want to support streams eventually, and I think we can do it as part of Lukas's work on serializers, but like you said, it will complicate a ton of stuff and we'd still want threading.
Original comment by @rashidkpc: Regarding handlers: my logic was that threadable functions run in another environment, and we don't keep handlers consistent across environments. Thus we only provide the handlers that can be implemented in the threaded environment, so possibly nothing that needs Hapi. I think for the first run at it, copying the data between the main thread and the threaded environment is fine. I too am a bit foggy on whether we can actually get hold of the server object. I also don't know that we want Node doing native load balancing; I'd rather we control our own thread pool.
Original comment by @Bargs: I'm punting on this for now. My gut tells me spawning multiple Kibana servers will be a better approach than creating our own specialized worker pool. It would allow all server functions to be "threadable", we wouldn't have to pass around file paths, and it would help out other CPU-bound plugins like Timelion without requiring any changes from them; it feels like a nice, general approach. But the platform team is currently in the middle of completely changing how Kibana gets spawned, and I really don't want to get tangled up in that at the moment. We can talk about whether it's worth going forward with the specialized worker pool instead, but right now I really want to make some clear forward progress on other tasks.
Original comment by @rashidkpc: This is coming along well. I had to make a bunch of changes to the plugin loader to be able to get file references, but the outcome is a much easier-to-use, more consistent plugin interface that allows us to thread on both the server side and the client side. This phase will only deal with the server side, but it sets us up nicely for later.
Original comment by @rashidkpc: This is about ready to be merged, which is a large amount of the work: LINK REDACTED
Replaces #23301 Closes #23080

---

This is a minimal threading implementation for Canvas. There's still a lot to be done to make this concept great, but this is a start.

What it does:
- Creates a server-side abstraction on top of the interpreter
- Determines where to send the expression by checking the first function to be run
- Loads common functions in a separate worker thread on the server
- Routes to a single forked worker (thread), the main thread (server), or the browser (browser), in that order
- Defers back to the router when a function isn't found; fails if the function isn't found in any of the above three environments
- Times out the worker if it takes too long, and respawns it as needed
- Simplifies the error dialog to remove the stack

What it does not do:
- Round-robin a pool of workers
- Queue. If one expression in the threaded env fails, then anything sent to it in the meantime will fail. The upstream environment handles managing timeouts. I think this would only make sense to do with a pool.
- Client side. This doesn't implement web workers, but we could use roughly the same architecture.
- Implement a specific, pluggable `worker` environment on the server. Right now it's just common functions, so plugin authors will always end up in a thread if they put their function in the common directory.

What I don't like:
- The socketProvider code. This was reused across the server and browser, but now that it's only used in the browser there's no good reason for the abstraction.
- The serialize/deserialize stuff feels messy. Do we really need serialization?
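To make the routing rule above concrete, here is a hedged sketch (not the actual PR code) of picking an environment based on the expression's first function and trying the environments in the order worker, then server, then browser. The helper names (`chooseEnvironment`, `hasFunction`, `interpret`) and the environment-object shape are assumptions for illustration.

```js
// Sketch of the routing described above; environment objects are assumed to expose
// hasFunction(name) and interpret(ast, context).
const ENV_ORDER = ['worker', 'server', 'browser'];

function getFirstFunction(ast) {
  // A Canvas expression AST is a chain of function calls; look at the first one.
  return ast.chain[0].function;
}

function chooseEnvironment(ast, environments) {
  const firstFn = getFirstFunction(ast);
  const envName = ENV_ORDER.find((name) => environments[name].hasFunction(firstFn));
  if (!envName) {
    throw new Error(`Function not found in any environment: ${firstFn}`);
  }
  return envName;
}

async function route(ast, context, environments) {
  const envName = chooseEnvironment(ast, environments);
  // If a later function in the chain is missing from the chosen environment,
  // that environment defers back to this router (per the PR description above).
  return environments[envName].interpret(ast, context);
}
```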
Original comment by @lukasolson:
See LINK REDACTED
Server forking. Currently Canvas can bring down the Kibana server if the user manages to craft a massive expression. Also, our slow functions all run on the server, but all in a single thread. We need to fork workers. The problem exists in the browser too, but we don't usually run slow stuff there.
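Purely as an illustration of the forking idea described above (not the eventual Canvas implementation), a minimal Node sketch of offloading an expression to a forked worker with a timeout and respawn might look like the following; the `worker.js` module and the message shape are assumptions.

```js
const { fork } = require('child_process');

let worker = fork(require.resolve('./worker.js')); // hypothetical worker module

// Assumes one in-flight expression at a time (no queue).
function runInWorker(expression, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      // A runaway expression only takes down the worker, not the Kibana server.
      worker.kill();
      worker = fork(require.resolve('./worker.js')); // respawn for future requests
      reject(new Error('worker timed out'));
    }, timeoutMs);

    worker.once('message', (msg) => {
      clearTimeout(timer);
      msg.error ? reject(new Error(msg.error)) : resolve(msg.value);
    });

    worker.send({ type: 'run', expression });
  });
}
```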
It'd be awesome if we could also get browser forking to work.
Maybe check out node-serviceworker, webworker-threads, webassembly-threads
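For the browser side, the plain Web Worker API would be the rough equivalent. This is only a sketch of the idea, not something from the issue; `canvas-worker.js` is a placeholder bundle name and the dummy result stands in for whatever interpreter would be bundled into the worker.

```js
// main thread: post an expression string to a worker and wait for the result
const worker = new Worker('canvas-worker.js'); // placeholder bundle name
worker.onmessage = (event) => {
  console.log('result from worker:', event.data);
};
worker.postMessage({ expression: 'demodata | pointseries | plot' });

// canvas-worker.js: run the expression off the main thread and post the result back
self.onmessage = (event) => {
  // A bundled interpreter would run event.data.expression here; this is a stub.
  const result = { ok: true, input: event.data.expression };
  self.postMessage(result);
};
```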
Rough set of phases for this work:
phase 1: Just get function execution in forked process working
phase 2: Figure out best way to load functions from plugins in forked process
phase 3: Get it working in the browser
phase 4: Create thread pool