Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More advanced connection pooling #569

Open
simonw opened this issue Aug 17, 2019 · 4 comments

Comments

@simonw
Copy link
Owner

@simonw simonw commented Aug 17, 2019

We need a much smarter way of handling database connections.

Today, connections are simple: Datasette runs a number of threads (defaults to 3) and each thread gets a threadlocal read-only (or immutable) connection to each attached database - opened on demand.

For Datasette Library (#417) I want to support potentially hundreds of attached databases. Datasette Edit (#567) is going to introduce a need for writable connections too.

I'd also like to be able to run joins across multiple databases (#283) which further complicates things.

Supporting thousands of open SQLite connections at once feels like it won't provide good enough performance (though I should benchmark that to be sure). Some kind of connection pooling is likely to be necessary.

@simonw

This comment has been minimized.

Copy link
Owner Author

@simonw simonw commented Aug 17, 2019

I think what I want it a mechanism where a thread can say "give me a connection for database X" and it either gets back connection to X instantly OR a new connection to X is created and returned OR it blocks (because a certain number of connections to that database exist already) until another thread returns their connection OR it times out and returns an error.

@simonw

This comment has been minimized.

Copy link
Owner Author

@simonw simonw commented Aug 17, 2019

Potential API design:

with pool.connection("fixtures") as conn:
    conn.set_time_limit(1000)
    conn.allow_all()
    conn.execute(...)

Within this block the current thread has exclusive action to the connection - which has essentially been "checked out" from the pool. When the block ends that connection is made available to be checked out by other threads.

This could accept multiple database names for the case where I want to join across databases:

with pool.connection("fixtures", "dogs") as conn:
    conn.execute(...)
simonw added a commit that referenced this issue Oct 2, 2019
@simonw

This comment has been minimized.

Copy link
Owner Author

@simonw simonw commented Oct 2, 2019

I'm going to refactor the execute() and execute_against_connection_in_thread() methods.

They currently live on the Datasette class, but in this new world it would make more sense for them to live on the Database, ConnectionGroup or Connection classes.

I think I'll put them on the Database class.

@simonw

This comment has been minimized.

Copy link
Owner Author

@simonw simonw commented Oct 2, 2019

I'm going to simplify things a bunch by continuing to ignore the cross-database joining issue #283 - I'll post some notes there on my latest thinking.

simonw added a commit that referenced this issue Oct 2, 2019
Also moved Pool stuff into the datasette/database.py module

Refs #569
simonw added a commit that referenced this issue Nov 15, 2019
Refs #569 - I split this change out from #579
simonw added a commit that referenced this issue Nov 15, 2019
Refs #569
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.