Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datasette inspect takes a very long time on large dbs #316

Closed
gavinband opened this issue Jun 18, 2018 · 5 comments
Closed

datasette inspect takes a very long time on large dbs #316

gavinband opened this issue Jun 18, 2018 · 5 comments

Comments

@gavinband
Copy link

Hi,

I want to expose data in a very large sqlite database (~600Gb) to the web. I have used datasette with success on smaller test databases with the same schema - it works very well (thanks!). However, using the full db, both datasette inspect and datasette serve seem to hang or pause for a very long time (tens of minutes) on startup. Is this expected behaviour?

(I noticed that the output of datasette inspect includes row counts for each table. Simply counting the rows in this db will take a long time (tens of millions of rows across each of ~10 tables), so I wondered if this is the source of the problem.)

Any help on a workaround would be appreciated.

@gavinband
Copy link
Author

I should add that I'm using datasette version 0.22, Python 2.7.10 on Mac OS X. Happy to send more info if helpful.

@simonw
Copy link
Owner

simonw commented Jun 18, 2018

Wow, I've gone as high as 7GB but I've never tried it against 600GB.

datasette inspect is indeed expected to take a long time for large databases. That's why it's available as a separate command: by running datasette inspect to generate inspect-data.json you can execute it just once against a large database and then have datasette serve take advantage of that cached metadata (hence avoiding datasette serve hanging on startup).

As you spotted, most of the time is spent in those counts. I imagine you don't need those row counts in order for the rest of Datasette to function correctly (they are mainly used for display purposes - on the https://latest.datasette.io/fixtures index page for example).

If your database changes infrequently, for the moment I recommend running datasette inspect once to generate the inspect-data.json file (let me know how long it takes) and then passing that file to datasette serve mydb.db --inspect-file=inspect-data.json

If your database DOES change frequently then this workaround won't help you much. Let me know and I'll see how much work it would take to have those row counts be optional rather than required.

@gavinband
Copy link
Author

Hi Simon,
Thanks for the response. Ok I'll try running datasette inspect up front.
In principle the db won't change. However, the site's in development and it's likely I'll need to add views and some auxiliary (smaller) tables as I go along. I will need to be careful with this if it involves an inspect step in each iteration, though.
g.

@simonw
Copy link
Owner

simonw commented Jun 18, 2018

For #271 I've been contemplating having Datasette work against an on-disk database that gets modified without needing to restart the server.

For that to work, I'll have to dramatically change the inspect() mechanism. It may be that inspect becomes an optional optimization in the future.

@simonw
Copy link
Owner

simonw commented May 11, 2019

This will be fixed by #419

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants