
internal.db database is written to WAY too often #2290

Closed
simonw opened this issue Mar 1, 2024 · 6 comments

simonw commented Mar 1, 2024

I'm seeing Litestream back up that database (when stored on disk using the --internal option) constantly, even when it shouldn't have had any changes.

simonw commented Mar 1, 2024

I applied this change:

diff --git a/datasette/utils/internal_db.py b/datasette/utils/internal_db.py
index dbfcceb4..6a8c1eb8 100644
--- a/datasette/utils/internal_db.py
+++ b/datasette/utils/internal_db.py
@@ -66,9 +66,11 @@ async def init_internal_db(db):
 
 
 async def populate_schema_tables(internal_db, db):
+    print("populate_schema_tables")
     database_name = db.name
 
     def delete_everything(conn):
+        print("  delete_everything")
         conn.execute(
             "DELETE FROM catalog_tables WHERE database_name = ?", [database_name]
         )

And confirmed that the populate_schema_tables() function is called once at startup and then only after a schema change has been made.

simonw commented Mar 1, 2024

Which led me to this piece of code:

datasette/app.py, lines 486 to 505 in 86335dc:

for database_name, db in self.databases.items():
    schema_version = (await db.execute("PRAGMA schema_version")).first()[0]
    # Compare schema versions to see if we should skip it
    if schema_version == current_schema_versions.get(database_name):
        continue
    placeholders = "(?, ?, ?, ?)"
    values = [database_name, str(db.path), db.is_memory, schema_version]
    if db.path is None:
        placeholders = "(?, null, ?, ?)"
        values = [database_name, db.is_memory, schema_version]
    await internal_db.execute_write(
        """
        INSERT OR REPLACE INTO catalog_databases (database_name, path, is_memory, schema_version)
        VALUES {}
        """.format(
            placeholders
        ),
        values,
    )
    await populate_schema_tables(internal_db, db)

I thought for a moment that the INSERT OR REPLACE INTO there meant there would be constant changes to the internal.db database, but it actually looks like it only runs if the schema version has changed.
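
As a sanity check on that comparison, here's a minimal standalone sketch (mine, not code from Datasette) showing that PRAGMA schema_version only changes on schema modifications, not on ordinary inserts:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
print(conn.execute("PRAGMA schema_version").fetchone()[0])  # e.g. 1

conn.execute("INSERT INTO t VALUES (1)")  # data change only
print(conn.execute("PRAGMA schema_version").fetchone()[0])  # unchanged

conn.execute("ALTER TABLE t ADD COLUMN name TEXT")  # schema change
print(conn.execute("PRAGMA schema_version").fetchone()[0])  # incremented

So the schema_version check really should gate out all of the routine writes.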

simonw commented Mar 1, 2024

To help investigate, I upgraded this plugin:

It doesn't seem to show the _internal database being updated much though - it's at version 36 and sticking there.

simonw commented Mar 1, 2024

Added this debug code:

diff --git a/datasette/app.py b/datasette/app.py
index 8591af6a..ea61ba75 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -486,8 +486,12 @@ class Datasette:
         for database_name, db in self.databases.items():
             schema_version = (await db.execute("PRAGMA schema_version")).first()[0]
             # Compare schema versions to see if we should skip it
+            print("schema_version = {}, current_schema_version = {}".format(
+                schema_version, current_schema_versions.get(database_name)
+            ))
             if schema_version == current_schema_versions.get(database_name):
                 continue
+            print("  Did not skip, gonna INSERT OR REPLACE INTO")
             placeholders = "(?, ?, ?, ?)"
             values = [database_name, str(db.path), db.is_memory, schema_version]
             if db.path is None:

And it looks like that INSERT OR REPLACE INTO is only run when the schema changes, so it shouldn't be running very often at all.

simonw commented Mar 1, 2024

I'm not at all sure that internal.db is being updated often - I think this bug I'm seeing may relate to a specific configuration of the way I'm running it in Datasette Cloud.

I'll try one last thing: I'm going to run datasette --internal internal.db locally with some plugins, and have a watch command check the MD5 hash of the database to see when it changes.
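
Something equivalent to that watch loop, as a rough Python sketch (the filename and the two-second interval are just what I'd use - the interval matches watch's default):

import hashlib
import time

def md5_of(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

last = None
while True:
    current = md5_of("internal.db")
    if current != last:
        print(time.strftime("%H:%M:%S"), current)  # report each time the hash changes
        last = current
    time.sleep(2)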

simonw commented Mar 1, 2024

I ran watch md5sum internal.db and fired up Datasette, then messed around with inserting rows and altering the schema.

The MD5 of that file only updated after I altered the table schemas, which is as it should be.

So this isn't a Datasette bug after all.
