datasette inspect takes a very long time on large dbs #316

gavinband · 2018-06-18T11:56:27Z

Hi,

I want to expose data in a very large sqlite database (~600Gb) to the web. I have used datasette with success on smaller test databases with the same schema - it works very well (thanks!). However, using the full db, both datasette inspect and datasette serve seem to hang or pause for a very long time (tens of minutes) on startup. Is this expected behaviour?

(I noticed that the output of datasette inspect includes row counts for each table. Simply counting the rows in this db will take a long time (tens of millions of rows across each of ~10 tables), so I wondered if this is the source of the problem.)

Any help on a workaround would be appreciated.

The text was updated successfully, but these errors were encountered:

gavinband · 2018-06-18T12:00:43Z

I should add that I'm using datasette version 0.22, Python 2.7.10 on Mac OS X. Happy to send more info if helpful.

simonw · 2018-06-18T15:49:35Z

Wow, I've gone as high as 7GB but I've never tried it against 600GB.

datasette inspect is indeed expected to take a long time for large databases. That's why it's available as a separate command: by running datasette inspect to generate inspect-data.json you can execute it just once against a large database and then have datasette serve take advantage of that cached metadata (hence avoiding datasette serve hanging on startup).

As you spotted, most of the time is spent in those counts. I imagine you don't need those row counts in order for the rest of Datasette to function correctly (they are mainly used for display purposes - on the https://latest.datasette.io/fixtures index page for example).

If your database changes infrequently, for the moment I recommend running datasette inspect once to generate the inspect-data.json file (let me know how long it takes) and then passing that file to datasette serve mydb.db --inspect-file=inspect-data.json

If your database DOES change frequently then this workaround won't help you much. Let me know and I'll see how much work it would take to have those row counts be optional rather than required.

gavinband · 2018-06-18T16:12:45Z

Hi Simon,
Thanks for the response. Ok I'll try running datasette inspect up front.
In principle the db won't change. However, the site's in development and it's likely I'll need to add views and some auxiliary (smaller) tables as I go along. I will need to be careful with this if it involves an inspect step in each iteration, though.
g.

simonw · 2018-06-18T17:29:59Z

For #271 I've been contemplating having Datasette work against an on-disk database that gets modified without needing to restart the server.

For that to work, I'll have to dramatically change the inspect() mechanism. It may be that inspect becomes an optional optimization in the future.

simonw · 2019-05-11T18:26:25Z

This will be fixed by #419

simonw mentioned this issue Jun 18, 2018

Mechanism for automatically picking up changes when on-disk .db file changes #271

Closed

simonw closed this as completed May 11, 2019

jeansaad mentioned this issue Nov 10, 2023

Is the redirect slow for anyone else? jeansaad/chalet#42

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datasette inspect takes a very long time on large dbs #316

datasette inspect takes a very long time on large dbs #316

gavinband commented Jun 18, 2018

gavinband commented Jun 18, 2018

simonw commented Jun 18, 2018 •

edited

Loading

gavinband commented Jun 18, 2018

simonw commented Jun 18, 2018 •

edited

Loading

simonw commented May 11, 2019

datasette inspect takes a very long time on large dbs #316

datasette inspect takes a very long time on large dbs #316

Comments

gavinband commented Jun 18, 2018

gavinband commented Jun 18, 2018

simonw commented Jun 18, 2018 • edited Loading

gavinband commented Jun 18, 2018

simonw commented Jun 18, 2018 • edited Loading

simonw commented May 11, 2019

simonw commented Jun 18, 2018 •

edited

Loading

simonw commented Jun 18, 2018 •

edited

Loading