
Multi-threaded interpreter #23080

Closed
kibanamachine opened this issue May 8, 2018 · 7 comments
Assignees
Labels
blocker loe:x-large Extra Large Level of Effort PR sent Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas v6.5.0 v7.0.0

Comments

@kibanamachine
Contributor

Original comment by @lukasolson:

See LINK REDACTED

Server forking. Currently Canvas can bring down the Kibana server if the user manages to craft a massive expression. Also, our slow functions all run on the server, but all in a single thread. We need to fork workers. The problem exists in the browser too, but we don't usually run slow stuff there.

It'd be awesome if we could also get browser forking to work.

Maybe check out node-serviceworker, webworker-threads, webassembly-threads

Rough set of phases for this work:

phase 1: Just get function execution in forked process working (see the sketch after this list)
phase 2: Figure out best way to load functions from plugins in forked process
phase 3: Get it working in the browser
phase 4: Create thread pool
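
A minimal sketch of what phase 1 could look like on top of Node's `child_process.fork` (the `runInWorker` helper and the `./worker` / `./functions` modules are illustrative names, not actual Canvas code):

```js
// parent process — hypothetical sketch only
const { fork } = require('child_process');

const worker = fork(require.resolve('./worker'));

function runInWorker(functionName, args, context) {
  return new Promise((resolve, reject) => {
    // a real implementation would correlate requests and responses by id instead of `once`
    worker.once('message', (msg) => {
      msg.error ? reject(new Error(msg.error)) : resolve(msg.result);
    });
    worker.send({ functionName, args, context });
  });
}

// worker.js — runs in the forked process: look up the function, run it, message the result back
process.on('message', async ({ functionName, args, context }) => {
  try {
    const functions = require('./functions'); // hypothetical registry of common functions
    const result = await functions[functionName](context, args);
    process.send({ result });
  } catch (err) {
    process.send({ error: err.message });
  }
});
```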

@kibanamachine
Contributor Author

Original comment by @lukasolson:

I guess there's already a ticket for the browser side web workers: LINK REDACTED

@kibanamachine
Contributor Author

Original comment by @Bargs:

Started researching and thinking about this today, gonna write down some thoughts here so others can read it if they're interested.

Started out trying to clearly define the problem we're trying to solve. I think it comes down to two things:

  1. Canvas functions run on the server and may be CPU-bound, which would block the Kibana server's only thread.
  2. Canvas functions could be memory intensive; if they run in the main Kibana process, an OOM would take down the entire Kibana server.

So, before immediately jumping to multi-threading, I considered whether there are any other options for dealing with these problems. Most expensive operations in Canvas functions will involve processing a lot of data. Usually in Node you'd process large amounts of data in a streaming manner, a chunk at a time, so the thread doesn't block while the entirety of the data is gathered and processed. Couldn't we do that here? In theory it seems like it would be feasible, but I'm not really sure without doing a lot of extra research.

Downsides? It would require rearchitecting all of the Canvas functions to expect a stream as the context. It would require all plugin developers to deal with streams instead of plain objects, complicating the development process for them. One badly behaving function could still take down the Kibana server. Probably a bunch of other stuff I'm not thinking of.
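
As a rough, non-Canvas illustration of the streaming idea above, rows could be piped through a transform one chunk at a time with Node's stream primitives (the row shape and the doubling transform are invented for this example):

```js
const { Readable, Transform, Writable, pipeline } = require('stream');

// stand-in data source; in a real function this would be the (potentially huge) search response
const source = Readable.from([{ value: 1 }, { value: 2 }, { value: 3 }]);

// per-row work happens a chunk at a time, so the event loop never blocks on the whole dataset
const double = new Transform({
  objectMode: true,
  transform(row, _enc, callback) {
    callback(null, { ...row, value: row.value * 2 });
  },
});

const sink = new Writable({
  objectMode: true,
  write(row, _enc, callback) {
    console.log(row);
    callback();
  },
});

pipeline(source, double, sink, (err) => {
  if (err) console.error('stream failed', err);
});
```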

It seems like an idea worthy of trying out. Beyond avoiding the thread blocking and memory issues it might also increase performance. But it would be a ton of work and there are a lot of unknowns, it could easily end up being a trash idea and a waste of time. The path to multi-threading seems a lot more clear at the moment. And the two aren't necessarily mutually exclusive. Even if we implemented a streaming architecture, doing the work in a separate process would still be handy to protect ourselves against a badly behaving function taking down the server, and it would give Kibana the ability to utilize more CPU cores on a machine.

I couldn't think of any other alternative solutions to our issues, so with that I jumped into researching Node forking and multithreading.

I looked at a few open source projects that aim to make multithreading easier, LINK REDACTED in particular. Using the same API on the server and in the browser is appealing. I was thinking of creating a POC with this library, but then I learned you can't require modules in a worker. So that's a non-starter.

So I moved on to looking more at Node's core child_process module. One issue that came to mind immediately is the fact that communication between the parent and child process must create copies of any data being transferred. Initially I had imagined the parent handling the socket, passing the data to the child for processing, and the child passing the results back to the parent to be sent over the socket to the browser. That might not be a great idea if copies of the data are made on each transfer between parent and child.
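
A small self-contained demo of that copy-on-transfer behaviour (the payload shape is made up; the point is that IPC messages are serialized, so the child gets a copy rather than a reference):

```js
const { fork } = require('child_process');

if (process.send) {
  // child: mutate the payload and echo it back over IPC
  process.on('message', (payload) => {
    payload.rows.push({ extra: true });
    process.send(payload);
    process.exit(0);
  });
} else {
  // parent: the child's mutation never touches the parent's object
  const payload = { rows: [{ value: 1 }] };
  const child = fork(__filename);
  child.on('message', (copy) => {
    console.log(payload.rows.length); // 1 — the original was never shared
    console.log(copy.rows.length);    // 2 — only the serialized copy was modified
  });
  child.send(payload);
}
```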

I tried to figure out if there's any way to stream data between parent and child process, so the parent could pipe it to the web socket without parsing it and adding it to the JS heap, but I didn't find anything in my searches.

Another more promising idea is to give each child process its own socket. You can pass a Node server object between processes. I'm a little hazy on this, but it looks as though Node will natively load balance between processes when multiple processes are listening for connections on the same server instance. So if each process creates its own socket.io instance with the same server object, I think they'll both receive connections, but I haven't tested this yet.

The only problem is that currently some of the setup of the socket.io listeners requires access to the Hapi server object, namely getting the auth header and creating the handlers passed to the interpreter. @rashidkpc did you deal with this at all in your POC? I'm beginning to recollect that we talked about marking functions that need handlers as "non-threadable". If some functions aren't threadable then we'll have to copy data between parent and child even if the child has a copy of the socket, so maybe this whole idea is moot.
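
The handle passing described above is roughly the pattern from the Node `child_process` docs: the parent sends a `net.Server` to the child, after which both processes can accept connections on it. A sketch with a bare net server rather than socket.io:

```js
// parent.js
const { fork } = require('child_process');
const net = require('net');

const child = fork(require.resolve('./child'));
const server = net.createServer((socket) => socket.end('handled by parent\n'));

server.listen(1337, () => {
  // the second argument passes the listening server handle to the child process
  child.send('server', server);
});

// child.js
process.on('message', (msg, server) => {
  if (msg === 'server') {
    // the child now also accepts connections on the same server
    server.on('connection', (socket) => socket.end('handled by child\n'));
  }
});
```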

So that ended up being a bit of a rabbit hole. Not sure if I have a clear solution yet, but the next time I pick this work up I think I'm going to focus on getting the child processes spawned and getting the "threadable" functions loaded up in them. Even if further optimizations are needed, some of the basic work there will still be useful.

@kibanamachine
Contributor Author

Original comment by @rashidkpc:

I want to support streams eventually, and I think we can do it as part of Lukas's work on serializers, but like you said, it will complicate a ton of stuff and we'd still want threading.

@kibanamachine
Contributor Author

Original comment by @rashidkpc:

RE: handlers

My logic was that threadable functions run in another environment and we don't keep handlers consistent across environments. Thus we only provide the handlers that can be implemented in the threaded environment, so possibly nothing that needs hapi.

I think for the first run at it, copying the data between the main thread and the threaded environment is fine. I too am a bit foggy on if we can actually get ahold of the server object. I also don't know that we want node doing native load balancing, I'd rather we control our own thread pool.
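
For illustration, a bare-bones version of a self-managed pool might look like this — `WorkerPool` and `./worker` are hypothetical names, and a real pool would need per-worker queues and request ids rather than a single `once` listener:

```js
const { fork } = require('child_process');

class WorkerPool {
  constructor(script, size) {
    this.workers = Array.from({ length: size }, () => fork(script));
    this.next = 0;
  }

  // hand work to the workers round-robin and resolve with whatever they message back
  run(message) {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length;
    return new Promise((resolve, reject) => {
      worker.once('message', (msg) =>
        msg.error ? reject(new Error(msg.error)) : resolve(msg.result)
      );
      worker.send(message);
    });
  }
}

// usage: const pool = new WorkerPool(require.resolve('./worker'), 2);
//        pool.run({ functionName: 'pointseries', args: {}, context: {} }).then(console.log);
```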

@kibanamachine
Contributor Author

Original comment by @Bargs:

I'm punting on this for now. My gut tells me spawning multiple Kibana servers will be a better approach than creating our own specialized worker pool. It would allow all server functions to be "threadable", we wouldn't have to pass around file paths, it would help out other CPU-bound plugins like Timelion without requiring any changes from them, and it feels like a nice general approach. But, the platform team is currently in the middle of completely changing how Kibana gets spawned. I really don't want to get tangled up in that atm.
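
(Not Kibana's actual startup code, but the multiple-server-processes idea is roughly what Node's `cluster` module provides: each worker runs a full server, the listening port is shared between them, and a crashed or OOM-killed worker can be replaced. A stub HTTP server stands in for booting Kibana here.)

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // one worker per core; each runs a complete copy of the server
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, spawning a replacement`);
    cluster.fork();
  });
} else {
  http
    .createServer((req, res) => res.end(`handled by ${process.pid}\n`))
    .listen(5601);
}
```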

We can talk about whether it's worth going forward with the specialized worker pool instead, but right now I really want to make some clear forward progress on other tasks.

@kibanamachine
Contributor Author

Original comment by @rashidkpc:

This is coming along well. I had to make a bunch of changes to the plugin loader to be able to get file references, but the outcome was that we have a much easier-to-use, more consistent plugin interface that allows us to thread on both the server side and the client side.

This phase will only deal with the server side, but it sets us up nicely for later.

@kibanamachine
Contributor Author

Original comment by @rashidkpc:

This is about ready to be merged, which is a large amount of the work: LINK REDACTED

@kibanamachine kibanamachine added Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas loe:x-large Extra Large Level of Effort labels Sep 14, 2018
Canvas (6.6 @ 12/18, 6.7 @ 01/29) automation moved this from In Progress to Done Oct 26, 2018
w33ble added a commit that referenced this issue Oct 26, 2018
Replaces #23301
Closes #23080

---

This is a minimal threading implementation for Canvas. There's still a lot to be done to make this concept great, but this is a start. 

What it does:
- Creates a server side abstraction on top of the interpreter
- Determines where to send the expression by checking the first function to be run
- Loads common functions in a separate worker thread on the server. 
- Routes to a single forked worker (thread), the main thread (server), or the browser (browser), in that order
- Defers back to the router when a function isn't found. Fails if the function isn't found in any of the above 3 environments
- Times out the worker if it takes too long, and respawns it as needed (see the sketch below)
- Simplifies the error dialog to remove the stack. 

What it does not:
- Round robin a pool of workers
- Queue. If one expression in the threaded env fails then anything sent to it in the meantime will fail. The upstream environment handles managing timeouts. I think this would only make sense to do with a pool.
- Client side. This doesn't implement web workers, but we could use roughly the same architecture. 
- Implement a specific, pluggable `worker` environment on the server. Right now it's just common functions, so plugin authors will always end up in a thread if they put their function in the common directory.

What I don't like:
- The socketProvider code. This was reused across the server & browser, but now that it's only used in the browser there's no good reason for the abstraction
- The serialize/deserialize stuff feels messy. Do we really need serialization?
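
A hedged sketch of the timeout-and-respawn behaviour listed under "What it does" above (not the merged code; the timeout value and the `./worker` path are placeholders):

```js
const { fork } = require('child_process');

const WORKER_TIMEOUT_MS = 10000; // assumed value, not the real default
let worker = fork(require.resolve('./worker')); // placeholder worker module

function runInWorker(message) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      // the worker took too long: kill it, respawn a fresh one, and fail this expression
      worker.kill();
      worker = fork(require.resolve('./worker'));
      reject(new Error('worker timed out'));
    }, WORKER_TIMEOUT_MS);

    worker.once('message', (msg) => {
      clearTimeout(timer);
      msg.error ? reject(new Error(msg.error)) : resolve(msg.result);
    });

    worker.send(message);
  });
}
```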
XavierM added a commit that referenced this issue Oct 29, 2018
* Translate global navigation bar component (#23993)

Translate global navigation bar component

* [backport] add back earlier 6.x minor versions

We still backport to these branches, primarily for doc changes.

* [dev/build] fix invalid assertion

* Skip this test until snapshots are updated (#24650)

* Feat/expression threading (#24598)

Replaces #23301
Closes #23080


* Polish 6.5 (#24556)

* Updates waterfall item design for timeline rows

* Adjusts span and tx flyouts and updates tooltips to EUI

* Heading size fixes and clean up

* Updates tooltip snapshots

* Review tweaks and snapshot updates

* Revert experiment :)

Co-Authored-By: jasonrhodes <jason.matthew.rhodes@gmail.com>

* Fixes bug with v1 waterfall state

* Fixes bug with timeline bar height

* Updates snapshot tests

* Updated test so it doesn't mount and rely on EUI makeId() which is non-deterministic per test run

* Don't throw errors in optimizer (#24660)

* Fixed label position on progress elements (#24623)

* [kbn/es] add context to error message (#24664)

This just tweaks the kbn-es error message to provide more context than just `Not Found`

* [BeatsCM] Beats without tags should return an empty array via the config API (#24665)

* [ML] Change file data visualizer JSON format label to NDJSON (#24643)

* [ML] Change file datavisualizer JSON format label to NDJSON
* [ML] Update edit flyout overrides snapshot

* Translations for Coordinate Map (#23952)

translate Coordinate Map

* Translations for Region Map (#23875)

add translations for region_map plugin

* [Tools] Add TemplateLiteral parsing to i18n_check tool (#24580)

* [Tools] Add TemplateLiteral parsing to i18n_check tool

* Add comments

* [ML] Remove obsolete sentence from info tooltip. (#24716)

* Translate security/users component (#23940)

Translate security/users

* [Docs] Remove beta notes for ML and Query bar (#24718)

* Translations for Table Vis plugin (#23679)

add translations for table vis plugin

* Feature/translate new nav bar (#24326)

translate new_nav_bar

* center content in fullscreen mode, hide K7 top nav (#24589)

* [APM] Fixes rare cases where KibanaLink is loaded outside of React context (#24705)

* Fixes rare cases where KibanaLink will be loaded outside of React context and requires no redux connect dependency

* Fixes tests for updated Kibana link component

* Removes obsolete snapshot

* Secops structure code (#24652)

* add basic structure for secops application
* finalize skeleton for secops
* fix type issue and hapi new version
* remove route home, not needed for now
* Add configuration + delete noise
* prepend elastic license to generated file

* Cut down on all tests except for secops tests and one example of infr… (#24693)

* Cut down on all tests except for secops tests and one example of infra integration tests
* Commented out code for only this branch
* Added comments and "please see issue number"
* https://github.com/elastic/ingest-dev/issues/60