
Experiment with patterns for concurrent long running queries #38

Closed

simonw opened this issue Oct 25, 2017 · 5 comments


simonw commented Oct 25, 2017

I want to understand how the system could perform under load with many concurrent long-running queries. Can we serve these without blocking the event loop?


simonw commented Oct 25, 2017

First experiment: hook up an iterative CSV dump (just because that’s a tiny bit easier to get started with than an iterative JSON dump). Have it execute a big select statement and then iterate through the result set 100 rows at a time using the sqlite3 fetchmany() method - also have it async sleep for a second in between each batch of 100.

Can this work without needing python threads?
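Here’s a minimal sketch of that experiment (my own illustration, not code from the repo) - it assumes a Sanic-style streaming response object with an awaitable write() method, and it deliberately runs the blocking sqlite3 calls inline on the event loop, since whether that can work without threads is exactly the question:

    import asyncio
    import csv
    import io
    import sqlite3

    async def stream_csv(response, db_path, sql):
        # Blocking sqlite3 calls run directly on the event loop here -
        # the experiment is whether this pattern works without threads.
        conn = sqlite3.connect(db_path)
        cursor = conn.execute(sql)
        while True:
            rows = cursor.fetchmany(100)  # 100 rows per batch
            if not rows:
                break
            buffer = io.StringIO()
            csv.writer(buffer).writerows(rows)
            await response.write(buffer.getvalue())
            await asyncio.sleep(1)  # pause between batches to simulate a slow response
        conn.close()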


simonw commented Oct 25, 2017

If this does work, I need to figure out what to do about the HTML view. Assuming I can iteratively produce JSON and CSV, what to do about HTML? One option: render the first 500 rows as HTML, then hand off to an infinite scroll experience that iteratively loads more rows as JSON.
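A hypothetical sketch of the JSON endpoint that infinite scroll could call (the route, parameter names and page size here are all my assumptions, not an actual design):

    import sqlite3

    from sanic import Sanic
    from sanic.response import json

    app = Sanic(__name__)
    PAGE_SIZE = 500

    @app.route('/table.json')
    async def table_json(request):
        offset = int(request.args.get('offset', 0))
        conn = sqlite3.connect('file:filename.db?immutable=1', uri=True)
        rows = conn.execute(
            'select * from mytable limit ? offset ?', (PAGE_SIZE, offset)
        ).fetchall()
        conn.close()
        # The client keeps fetching pages until next_offset comes back null
        return json({
            'rows': rows,
            'next_offset': offset + PAGE_SIZE if len(rows) == PAGE_SIZE else None,
        })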


simonw commented Oct 25, 2017

The gold standard here is to be able to serve up increasingly large datasets without blocking the event loop and while using a sustainable amount of RAM.


simonw commented Oct 25, 2017

Ideally we can get some serious gains from the fact that our database file is opened with the immutable option.

simonw added this to the v1 stretch goals milestone Oct 25, 2017
simonw pushed a commit that referenced this issue Nov 5, 2017
SQLite operations are blocking, but we're running everything in Sanic, an
asyncio web framework, so blocking operations are bad - a long-running DB
operation could hold up the entire server.

Instead, I've moved all SQLite operations into threads. These are managed by a
concurrent.futures ThreadPoolExecutor. This means I can run up to X queries in
parallel, and I can continue to queue up additional incoming HTTP traffic
while the threadpool is busy.

Each thread is responsible for managing its own SQLite connections - one per
database. These are cached in a threadlocal.

Since we are working with immutable, read-only SQLite databases it should be
safe to share SQLite objects across threads. On this assumption I'm using the
check_same_thread=False option. Opening a database connection looks like this:

    conn = sqlite3.connect(
        'file:filename.db?immutable=1',
        uri=True,
        check_same_thread=False,
    )

The following articles were helpful in figuring this out:

* https://pymotw.com/3/asyncio/executors.html
* https://marlinux.wordpress.com/2017/05/19/python-3-6-asyncio-sqlalchemy/

Closes #45. Refs #38.
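A minimal sketch of the pattern that commit describes (my illustration, written under the commit’s stated assumptions - it is not the committed code itself):

    import asyncio
    import sqlite3
    import threading
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=3)  # up to X queries in parallel
    local = threading.local()  # each worker thread caches its own connections

    def get_conn(db_path):
        # One connection per database, cached in a threadlocal
        if not hasattr(local, 'conns'):
            local.conns = {}
        if db_path not in local.conns:
            local.conns[db_path] = sqlite3.connect(
                'file:{}?immutable=1'.format(db_path),
                uri=True,
                check_same_thread=False,
            )
        return local.conns[db_path]

    def run_query(db_path, sql):
        return get_conn(db_path).execute(sql).fetchall()

    async def execute(db_path, sql):
        # The blocking query runs in the pool; the event loop stays free
        # to keep accepting incoming HTTP traffic.
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(executor, run_query, db_path, sql)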
simonw removed this from the v1 stretch goals milestone Dec 10, 2017

simonw commented May 28, 2018

I think the way Datasette executes SQL queries in a thread pool, introduced in #45, is a good solution for this ticket.

simonw closed this as completed May 28, 2018