Thoughts on the use of threads in SpinoDB #4

Open
falcon027 opened this issue Aug 16, 2022 · 8 comments

@falcon027
Contributor

When using Node.js, SpinoDB runs on the main thread, which blocks the event loop and degrades application performance.
I was wondering if it would be possible to make the calls to the database asynchronous in order to free up the event loop. To go further with speeding up the database, each collection could have its own thread to parallelise operations even more.
What are your thoughts on this? Let's discuss it in the comments. @supercamel
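For context, here is a minimal sketch of how a blocking call could be moved off the event loop in the native bindings, using node-addon-api's `Napi::AsyncWorker`. The `runQuery` and `findAsync` names are made up for illustration and are not SpinoDB's actual API:

```cpp
#include <napi.h>
#include <string>

// Hypothetical stand-in for a blocking database call.
std::string runQuery(const std::string& query);

class QueryWorker : public Napi::AsyncWorker {
public:
    QueryWorker(Napi::Function& callback, std::string query)
        : Napi::AsyncWorker(callback), query_(std::move(query)) {}

    // Runs on a libuv worker thread; the event loop stays free.
    void Execute() override { result_ = runQuery(query_); }

    // Runs back on the main thread; delivers the result to JS.
    void OnOK() override {
        Callback().Call({Env().Null(), Napi::String::New(Env(), result_)});
    }

private:
    std::string query_;
    std::string result_;
};

// JS-facing entry point: findAsync(query, callback)
Napi::Value FindAsync(const Napi::CallbackInfo& info) {
    std::string query = info[0].As<Napi::String>();
    Napi::Function cb = info[1].As<Napi::Function>();
    (new QueryWorker(cb, query))->Queue();  // AsyncWorker deletes itself
    return info.Env().Undefined();
}
```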

@supercamel
Owner

Loading and saving definitely need to become async / threaded. I've not had issues with queries slowing down the event loop myself, though. Even with several million documents, queries on properly indexed data should execute in under 100µs.

Have you done any benchmarking to find out if there is a particular query that is problematic? To me it seems likely that Spino isn't able to look up an index for the query. You might be able to restructure the data somehow to make it easier to index; then you shouldn't have any performance issues at all.

@supercamel
Owner

The whole query parser / executor mechanism is not ideal, and I've got some ideas to improve it which I expect will improve performance a little. Right now, it parses the query into a syntax tree, then traverses the tree once to find an index, and then again for each document it checks.

It should instead pick an index during the parsing phase and compile the query into a flat list of instructions. This should be faster and improve query performance.
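To make the idea concrete, here is a rough sketch of what "compile into a flat list of instructions" could look like. The `Document` type and the example query are invented for illustration, not SpinoDB's actual types:

```cpp
#include <functional>
#include <string>
#include <vector>

struct Document { int age; std::string name; };

// One compiled step: a predicate the executor runs in sequence.
using Instruction = std::function<bool(const Document&)>;

struct CompiledQuery {
    std::string indexField;           // chosen once, at parse time
    std::vector<Instruction> steps;   // flat list, no tree traversal
    bool matches(const Document& d) const {
        for (const auto& step : steps)
            if (!step(d)) return false;  // short-circuit on first failure
        return true;
    }
};

// Compiling { age: { $gt: 18 }, name: "alice" } would yield something like:
CompiledQuery compileExample() {
    CompiledQuery q;
    q.indexField = "age";  // index picked during parsing, not per document
    q.steps.push_back([](const Document& d) { return d.age > 18; });
    q.steps.push_back([](const Document& d) { return d.name == "alice"; });
    return q;
}
```

The tree is walked once at parse time; after that, matching a document is just a linear scan over the steps.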

This is probably the next performance-related change I want to make.

@falcon027
Contributor Author

You are exactly right; the latency is quite good and not a problem. My idea was to increase throughput by multithreading queries to get more requests per second. Currently, I get about 2000 requests per second, and because RAM is still expensive, the more requests per second one instance can handle, the less RAM I need to serve the same number of customers. This is particularly true when the database is large, which makes each copy of the data more expensive.

@falcon027
Contributor Author

Do you have any thoughts about this? @supercamel

@supercamel
Owner

> Do you have any thoughts about this? @supercamel

There is some overhead involved in spawning / synchronising threads. The queries themselves are essentially non-blocking. It's not clear to me that multithreading will be beneficial for queries.

However, the load and save functions do block, for quite a while too, depending on the data size. This is, I think, problematic. If I go to a threaded model, it will be to solve this problem. Loading must block queries, because it doesn't make sense to run queries while the db is still being loaded, but it shouldn't block the main loop. And saving must block the create/update/delete functions. So I think this justifies going to some kind of threaded model, but I don't think it will help you with RAM usage or query throughput.
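A minimal sketch of that locking scheme, using `std::shared_mutex`; this illustrates the idea above, not SpinoDB's actual implementation:

```cpp
#include <mutex>
#include <shared_mutex>

class Database {
public:
    void load() {
        // Run this on a background thread: it blocks queries and writes
        // via the lock, but never blocks the main event loop itself.
        std::unique_lock<std::shared_mutex> lock(dataMutex_);
        // ... read snapshot from disk and replay the journal ...
    }

    void save() {
        // Blocks create/update/delete; reads may continue, and since
        // writers are held off, the data stays consistent while writing.
        std::lock_guard<std::mutex> lock(writeMutex_);
        // ... write snapshot to disk ...
    }

    void insert(/* document */) {
        std::lock_guard<std::mutex> wl(writeMutex_);
        std::unique_lock<std::shared_mutex> dl(dataMutex_);
        // ... mutate the collection ...
    }

    void find(/* query */) const {
        std::shared_lock<std::shared_mutex> lock(dataMutex_);  // many readers at once
        // ... execute the query ...
    }

private:
    mutable std::shared_mutex dataMutex_;  // guards the data structures
    std::mutex writeMutex_;                // serialises mutations against save()
};
```

Here `load()` could be handed to a `std::thread`, so the main loop stays responsive while queries simply wait on the lock.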

When this happens it will be a major version increment because it will be a significant change to the current API.

@falcon027
Contributor Author

I think the simplest approach would be to split the collections across threads and add a separate log for each. This could speed up log replay by parallelising the I/O and the execution. And perhaps one could do it without having to handle locking between collections. What are your thoughts on this? A rough sketch follows the diagram below.
[Diagram]
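To illustrate, here is a rough sketch of per-collection log replay, assuming one journal file per collection; the file layout and `Collection` API are invented for illustration:

```cpp
#include <string>
#include <thread>
#include <vector>

struct Collection {
    std::string name;
    void replayLog(const std::string& path);  // parse and apply each entry
};

void loadDatabase(std::vector<Collection>& collections,
                  const std::string& dataDir) {
    std::vector<std::thread> workers;
    workers.reserve(collections.size());
    // Each collection replays its own log: I/O and execution overlap.
    for (auto& c : collections)
        workers.emplace_back([&c, &dataDir] {
            c.replayLog(dataDir + "/" + c.name + ".log");
        });
    for (auto& w : workers) w.join();  // database is ready once all finish
}
```

Since each thread owns exactly one collection during replay, no locking between collections is needed.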

@supercamel
Owner

This is a very interesting idea.

So, I suppose we would have a directory for the data. In that directory would be the data file which contains the saved state of the database, and a log for each collection.

Each collection would need a worker thread and some kind of inter-thread communication mechanism.

What about find queries and cursors? Getting results from queries seems like an interesting challenge. I suppose a cursor could run in the collection thread. A find query might return a handle to a cursor. An action might be 'create a cursor with this query', which would return the cursor handle. Another action might be 'get the next result for this cursor handle'. The handle would be invalidated by drop or insert queries.
Something like this. What do you think?
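A minimal sketch of that cursor-handle protocol, using one `std::promise` per request for the reply; all names are illustrative, and the queue that carries requests to the collection thread is omitted:

```cpp
#include <cstdint>
#include <future>
#include <map>
#include <string>

using CursorHandle = uint64_t;

// Requests posted to a collection's worker thread.
struct CreateCursor { std::string query; std::promise<CursorHandle> reply; };
struct NextResult  { CursorHandle handle; std::promise<std::string> reply; };

// Runs entirely on the collection thread, so cursors need no locking.
class CollectionWorker {
public:
    void handle(CreateCursor& req) {
        CursorHandle h = nextHandle_++;
        cursors_[h] = 0;  // scan position of the first match for req.query
        req.reply.set_value(h);
    }
    void handle(NextResult& req) {
        auto it = cursors_.find(req.handle);
        if (it == cursors_.end()) {   // handle was invalidated
            req.reply.set_value("");  // empty string signals "no more results"
            return;
        }
        req.reply.set_value("{}");    // next matching document (placeholder)
        ++it->second;
    }
    void invalidateAll() { cursors_.clear(); }  // called on insert/drop
private:
    CursorHandle nextHandle_ = 1;
    std::map<CursorHandle, size_t> cursors_;    // handle -> scan position
};
```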

@falcon027
Contributor Author

I agree, this could work as you described. My only worry is the latency of the inter-thread communication. But if that isn't a problem, then this should increase throughput and log replay speed by quite a bit. This would be really nice if implemented. If there is any way that I can support this effort, let me know (my C++ knowledge is unfortunately somewhat limited).
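For what it's worth, the round-trip cost of a simple condition-variable hand-off can be measured with a tiny ping-pong benchmark like the generic sketch below (not tied to SpinoDB); on typical hardware it lands at a few microseconds or less, which is in the same ballpark as the sub-100µs query times mentioned above:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;  // 0 = main's turn to send, 1 = worker's turn to reply
    const int iterations = 100000;

    std::thread worker([&] {
        for (int i = 0; i < iterations; ++i) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return turn == 1; });
            turn = 0;  // "reply" to the main thread
            cv.notify_one();
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::unique_lock<std::mutex> lock(m);
        turn = 1;  // "request" to the worker
        cv.notify_one();
        cv.wait(lock, [&] { return turn == 0; });
    }
    auto elapsed = std::chrono::steady_clock::now() - start;
    std::printf("round trip: ~%lld ns\n",
        static_cast<long long>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count()
            / iterations));
    worker.join();
}
```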
