Thoughts on the use of threads in SpinoDB #4
Loading and saving definitely needs to become async / threaded. I haven't had issues with queries slowing down the event loop myself, though. Even with several million documents, queries on properly indexed data should execute in under 100us. Have you done any benchmarking to find out whether a particular query is problematic? It seems likely to me that Spino isn't able to find an index for the query. You might be able to restructure the data to make it easier to index; then you shouldn't have any performance issues at all.
The whole query parser / executor mechanism is not ideal, and I've got some ideas for improving it that I expect will speed up queries a little. Right now, it parses the query into a syntax tree, then traverses the tree once to find an index, and then again for each document it checks. It should instead pick an index during the parsing phase and compile the query into a flat list of instructions, which should be faster. This is probably the next change I want to make that has performance implications.
You are exactly right; the latency is quite good and not a problem. My idea was to increase throughput by multithreading queries to get more requests per second. Currently, I get about 2000 requests per second, and because RAM is still expensive, the more requests per second I can handle, the less RAM I need to serve the same number of customers. This is particularly true if the database is large, which makes keeping a copy of the data more expensive.
Do you have any thoughts about this? @supercamel
There is some overhead involved in spawning and synchronising threads, and the queries themselves are essentially non-blocking, so it's not clear to me that multi-threading will be beneficial for queries. When this change happens it will be a major version increment, because it will be a significant change to the current API.
This is a very interesting idea. So, I suppose we would have a directory for the data. In that directory would be the data file which contains the saved state of the database, and a log for each collection. Each collection would need a worker thread and some kind of inter-thread communication mechanism. What about find queries and cursors? Getting results from queries seems like an interesting challenge. I suppose a cursor could run in the collection thread. A find query might return a handle to a cursor. An action might be 'create a cursor with this query', which would return the cursor handle. Another action might be 'get the next result for this cursor handle'. The handle would be invalidated by drop or insert queries.
I agree this could work as you described. My only worry is the latency of the inter-thread communication, but if that isn't a problem, then this should increase throughput and log-replay speed by quite a bit. It would be really nice to have this implemented. If there is any way I can support this effort, let me know (my C++ knowledge is unfortunately somewhat limited).
When using Node.js, SpinoDB runs on the main thread, which slows down the event loop and degrades the application's performance.
I was wondering if it would be possible to make the calls to the database asynchronous in order to free up the event loop. To go further with speeding up the database, each collection could have its own thread to parallelize the operations even more.
What are your thoughts about this? Let's discuss it in the comments. @supercamel