use better-sqlite3 as sqlite database driver (or provide the option) #2825

Closed
andykais opened this issue Jun 23, 2020 · 12 comments

@andykais

Problem

SQLite is a synchronous-access database: only one connection can write to the database at a time. So the async API that sqlite3 provides actually does more harm than good. The creator of better-sqlite3 explains this better than I can:

  • node-sqlite3 uses asynchronous APIs for tasks that are either CPU-bound or serialized. That's not only bad design, but it wastes tons of resources. It also causes mutex thrashing which has devastating effects on performance.
  • node-sqlite3 exposes low-level (C language) memory management functions. better-sqlite3 does it the JavaScript way, allowing the garbage collector to worry about memory management.
  • better-sqlite3 is simpler to use, and it provides nice utilities for some operations that are very difficult or impossible in node-sqlite3.
  • better-sqlite3 is much faster than node-sqlite3 in most cases, and just as fast in all other cases.
    (also here: Convince me to use better-sqlite3 WiseLibs/better-sqlite3#262)

I am building a performance-heavy local web app. My gut feeling when building apps like this is that handwritten SQL statements will always be more performant than ORM-generated statements. Given that, I feel that picking Prisma for SQLite has an extra hurdle to get over: the driver under the hood isn't as performant as it could be.
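
To make the API difference concrete, here is roughly the contrast I mean. This is an illustrative sketch only: the table name, column, and file path are placeholders, and it assumes both packages are installed.

// node-sqlite3: an async API for work that is CPU-bound and serialized anyway
const sqlite3 = require('sqlite3')
const asyncDb = new sqlite3.Database('./dev.db')
asyncDb.get('SELECT * FROM User WHERE id = ?', 1, (err, row) => {
  // the result comes back via a worker-thread round trip and the event loop
})

// better-sqlite3: the same query, answered synchronously on the calling thread
const Database = require('better-sqlite3')
const syncDb = new Database('./dev.db')
const row = syncDb.prepare('SELECT * FROM User WHERE id = ?').get(1)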

Suggested solution

If support for this driver feels worthwhile, there are two options. Either provide a driver option in the schema datasource:

datasource sqlite {
  provider = "sqlite"
  driver   = "better-sqlite3"
  url      = "file:./dev.db"
}

or just replace the internal sqlite3 driver with better-sqlite3.

Alternatives

Additional context

If sqlite3 was chosen because it is backed by Mapbox and therefore has a better chance of being long-lived, note that sqlite3 also appears to be maintained by a single person: https://kewde.github.io/sqlite

If this looks interesting to the contributors, I am willing to open a PR with some guidance!

@pantharshit00
Contributor

pantharshit00 commented Jun 23, 2020

We use the Rust driver for SQLite under the hood: https://github.com/rusqlite/rusqlite . We interface with it via our quaint library: https://github.com/prisma/quaint

We can't consume any Node.js library as a driver, so for this feature request to be feasible there would need to be a Rust counterpart. I also assume the Rust SQLite driver might already be faster than better-sqlite3, so that would need some investigation.

Prisma Client is powered by a Rust engine under the hood. You can find its source code here: https://github.com/prisma/prisma-engines. We are not using any Node.js driver.

@pimeys
Contributor

pimeys commented Jun 23, 2020

Also, I'm rewriting the whole sqlite/pg/my stack in the next few weeks, for many reasons. The library we'll be using is SQLx, and as you said, connections are stateful and are used by one request at a time.

What you can do is set connection_limit=1 in your connection string. The SQLite connection is stateful anyway, as all SQL connections are, and a mutex guarantees that only one user can access the database at once.
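
With the datasource block from the issue description, that would look roughly like this (the query-string form of connection_limit shown here is an illustration, not a copy of any real setup):

datasource sqlite {
  provider = "sqlite"
  url      = "file:./dev.db?connection_limit=1"
}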

The asynchronous nature of SQLite in SQLx is implemented by using the blocking API together with Tokio's block_in_place. That might or might not spawn a new thread for a request under heavy throughput, but it will just perform a blocking query in that thread and continue its life.

You are right that SQLite allows only one writer at a time, and using multiple connections to the same file is a waste of resources. I can also guarantee that with Prisma, the speed of accessing SQLite will not be your first bottleneck.

@Sytten
Contributor

Sytten commented Jun 23, 2020

@pimeys interesting, will this replace Quaint?

@pantharshit00
Contributor

Quaint will be powered by SQLx under the hood.

Please follow: prisma/quaint#138

@andykais
Author

Ah, I see now. Apologies for spreading misinformation! I had assumed the driver was node-sqlite3 after poking around the prisma/prisma repo and seeing it in the CLI's package.json.

I will limit my question to the SQLx rewrite then:

The asynchronous nature of SQLite in SQLx is implemented by using the blocking API together with Tokio's block_in_place. That might or might not spawn a new thread for a request under heavy throughput, but it will just perform a blocking query in that thread and continue its life.

Cool, the first part makes sense to me: the interface is a blocking API. Do you know how this will affect the Node interface? Will it remain asynchronous to stay aligned with the pg/my drivers? At what point does the access become blocking? The second part,

it will just perform a blocking query in that thread and continue its life.

is slightly confusing, because I don't know if we are talking about a thread inside Rust (a Tokio async block?) or a libuv thread (the Rust code being executed asynchronously from Node.js).

E.g. accessing a sqlite database still looks like this:

const allUsers = await prisma.user.findMany()

Rust is not my strong suit, so apologies if my questions are missing the point. The bottom-line question is:

Will there be asynchronicity when using Prisma with SQLite, where will it occur, and what are its implications?

@pimeys
Contributor

pimeys commented Jun 23, 2020

The Node interface is asynchronous because the Rust implementation uses a threadpool underneath to spread the CPU-heavy work across all the available cores in your computer. The database interfaces will be asynchronous because that makes them easy to plug into the rest of the system.

What happens with pg/my (and soon mssql) is that the async interface waits for IO and lets the threads do other things until the response is ready. With SQLite, the right thing to do is to use one connection, share it between the threads, and queue up all waiting tasks until the mutex is free. Using more than one connection will just consume more RAM, and the connections will still wait for each other to finish. We're talking about a few megabytes at most here, so nothing noticeable.

In the end, I wouldn't do really high-performance things with SQLite. It's not meant for super-fast processing; it's more of a nice tool for working with data in memory and for testing (and Android!). It has one mutex blocking all other writes, and although it has a read-only mode that allows parallel use, none of the popular clients currently support that.

Anyhow, play with the connection limits and see how fast you can go with Prisma and SQLite.

pantharshit00 added the kind/feedback label and removed the kind/feature label on Jun 24, 2020
@andykais
Author

Got it. I do think SQLite is a very fast database (just based on the fact that data makes fewer hops to reach disk than with, say, Postgres or any other network-based database). It just happens to scale poorly with a large volume of requests. It's perfect for local web apps like the one I'm describing.

I will just assume that doing things in Rust will be performant and thread-safe. However, I have a few concerns around backpressure. If we were to use a fully synchronous database driver and SQLite became blocked, requests would start to queue up at the OS level in the TCP backlog. If that got overwhelmed, clients would start getting back errors like ECONNRESET (or whatever the browser equivalent is). In the case of an async database, if SQLite got blocked, requests would queue up inside the Prisma database threadpool. If that got overwhelmed, what would happen? Would the Node.js process crash with a heap out-of-memory error?

I understand we are dealing with extremes here, but I do think it's worth knowing what it will do in those situations. E.g. the sync database server stops accepting connections when it is blocked and therefore cannot overflow its memory, whereas the async database server could potentially keep swelling while its database is blocked until its memory overflows.

By the way, thanks for indulging me here. I had to learn a bit more about how Node.js works to understand what a node server does under stressful conditions, and learning about the TCP kernel queue was pretty neat!

@pimeys
Contributor

pimeys commented Jun 25, 2020

Ok, let's think about this a bit :)

First, underneath we have SQLite, protected with a mutex (locking takes at most about 20 nanoseconds, and the queries you push are hopefully done in a few microseconds). Only one request at a time is allowed access, but it's quite fast because the database is in the same memory space. We have an RPC server here that does a hop from Node to Rust and back, which makes sense to do asynchronously. Fine. We can saturate all CPUs, and we can hit a limit with the database under a crazy amount of requests.

On top of this there's Node, a single-threaded system running dynamic JavaScript or dynamic-ish TypeScript. What will happen here is that the Node service will never be able to push enough traffic to the underlying Rust system. The bottleneck will for sure be Node's event loop.

But if you are able to eat all the RAM with backpressure, or if you start getting system-level errors such as ECONNRESET, I'm more than happy to come and debug that scenario! It sounds interesting...

@andykais
Author

Great, thanks for the info. If you are interested, all I did to generate ECONNRESET was create a basic Node server, block the event loop, then hit it with anywhere between 500 and 3000 requests in parallel. I have not started using Prisma yet, this is recon 😄. However, as you predicted, Node itself is also a bottleneck: without blocking the event loop, I was still able to see ECONNRESET errors with a sufficiently high number of parallel requests.
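
Roughly, the experiment looked like this. The port, request count, and busy-loop duration below are illustrative stand-ins, not my exact numbers:

// Terminal 1: node repro.js          (start the server)
// Terminal 2: node repro.js flood    (fire the parallel requests)
const http = require('http')

if (process.argv[2] === 'flood') {
  // Client: fire a couple thousand requests at once and count the failures.
  const requests = Array.from({ length: 2000 }, () =>
    new Promise((resolve) => {
      http
        .get('http://localhost:3000', (res) => {
          res.resume()
          res.on('end', () => resolve('ok'))
        })
        .on('error', (err) => resolve(err.code)) // ECONNRESET and friends land here; the exact code depends on OS limits
    })
  )
  Promise.all(requests).then((results) => {
    const failed = results.filter((code) => code !== 'ok')
    console.log(`${failed.length}/${results.length} requests failed`, new Set(failed))
  })
} else {
  // Server: each request busy-waits ~20 ms, blocking the event loop so that
  // incoming connections pile up in the kernel's TCP backlog.
  http
    .createServer((req, res) => {
      const end = Date.now() + 20
      while (Date.now() < end) {} // deliberately block the event loop
      res.end('ok')
    })
    .listen(3000)
}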

It is probably still possible to create a scenario where the database becomes a bottleneck before the server (e.g. sufficiently large datasets being written to the database, or an old, old HDD being written to), but these are definitely edge cases.

I think what you said makes the most sense here:

What will happen here is that the Node service will never be able to push enough traffic to the underlying Rust system. The bottleneck will for sure be Node's event loop.

I'm going to close this now, since I have a pretty good idea that I can't get into a situation where an async driver is going to harm me. I'll be curious to see what kind of benchmarks for Prisma I can produce after this rewrite (though I think until prepared statements can be reused, it's likely going to lag behind better-sqlite3).
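
For reference, statement reuse is the part better-sqlite3 makes trivial. A minimal sketch, with a made-up table and column:

const Database = require('better-sqlite3')
const db = new Database('./dev.db')

// The statement is compiled once...
const getUser = db.prepare('SELECT * FROM User WHERE id = ?')

// ...and reused on every call, skipping the parse/plan step each time.
const one = getUser.get(1)
const several = [1, 2, 3].map((id) => getUser.get(id))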

@pimeys
Contributor

pimeys commented Jun 25, 2020

Funny that you posted that prepared-statement reuse link. I'm the engineer working on that, focusing first on MySQL. The PR for SQLx was just accepted today.

The plan is to have caching for MySQL in 2.2.0, but if everything goes well, I'd say all our databases should be using SQLx in 2.2.0. Hopefully. And with caching!

@andykais
Author

🎉 how about that. Best of luck!

@Mwni

Mwni commented Apr 12, 2022

#12785
