Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

500 "attempt to write a readonly database" error caused by "PRAGMA schema_version" #2058

Open
simonw opened this issue Apr 11, 2023 · 9 comments
Labels

Comments

@simonw
Copy link
Owner

simonw commented Apr 11, 2023

I've not been able to replicate this myself yet, but I've seen log files from a user affected by it.

File "/usr/local/lib/python3.11/site-packages/datasette/views/base.py", line 89, in dispatch_request
await self.ds.refresh_schemas()
File "/usr/local/lib/python3.11/site-packages/datasette/app.py", line 371, in refresh_schemas
await self._refresh_schemas()
File "/usr/local/lib/python3.11/site-packages/datasette/app.py", line 386, in _refresh_schemas
schema_version = (await db.execute("PRAGMA schema_version")).first()[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasette/database.py", line 267, in execute
results = await self.execute_fn(sql_operation_in_thread)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasette/database.py", line 213, in execute_fn
return await asyncio.get_event_loop().run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasette/database.py", line 211, in in_thread
return fn(conn)
^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasette/database.py", line 237, in sql_operation_in_thread
cursor.execute(sql, params if params is not None else {})
sqlite3.OperationalError: attempt to write a readonly database

That's running the official Datasette Docker image on https://fly.io/ - it's causing 500 errors on every page of their site.

@simonw simonw added the bug label Apr 11, 2023
@simonw
Copy link
Owner Author

simonw commented Apr 11, 2023

I thought it might relate to the "defensive mode" issue described here:

But I have since determined that the Datasette official Docker image does NOT run anything in defensive mode, so I don't think it's related to that.

@simonw
Copy link
Owner Author

simonw commented Apr 11, 2023

Asked on the SQLite Forum if anyone has seen this before: https://sqlite.org/forum/forumpost/793a2ed75b

@simonw
Copy link
Owner Author

simonw commented Apr 12, 2023

Here's the relevant code:

datasette/datasette/app.py

Lines 421 to 437 in 5890a20

async def _refresh_schemas(self):
internal_db = self.databases["_internal"]
if not self.internal_db_created:
await init_internal_db(internal_db)
self.internal_db_created = True
current_schema_versions = {
row["database_name"]: row["schema_version"]
for row in await internal_db.execute(
"select database_name, schema_version from databases"
)
}
for database_name, db in self.databases.items():
schema_version = (await db.execute("PRAGMA schema_version")).first()[0]
# Compare schema versions to see if we should skip it
if schema_version == current_schema_versions.get(database_name):
continue

This function is called on almost every request (everything that subclasses BaseView at least - need to remember that for the refactor in #2053 etc).

async def dispatch_request(self, request):
if self.ds:
await self.ds.refresh_schemas()

It uses PRAGMA schema_version as a cheap way to determine if the schema has changed, in which case it needs to refresh the internal schema tables.

This was already the cause of a subtle bug here:

@simonw
Copy link
Owner Author

simonw commented Apr 12, 2023

Here's a potential workaround: when I store the schema versions, I could also score an MD5 hash of the full schema (select group_concat(sql) from sqlite_master). When I read the schema version with PRAGMA schema_version I could catch that exception and, if I see it, I could calculate that MD5 hash again as a fallback and use that to determine if the schema has changed instead.

The performance overhead of this needs investigating - how much more expensive is md5(... that SQL query result) compared to just PRAGMA schema_version, especially on a database with a lot of tables?

@simonw
Copy link
Owner Author

simonw commented Apr 12, 2023

I got ChatGPT (code execution alpha) to run a micro-benchmark for me. This was the conclusion:

The benchmark using PRAGMA schema_version is approximately 1.36 times faster than the benchmark using hashlib.md5 for the case with 100 tables. For the case with 200 tables, the benchmark using PRAGMA schema_version is approximately 2.33 times faster than the benchmark using hashlib.md5.

Here's the chart it drew me:

image

(It's a pretty rubbish chart though, it only took measurements at 100 and 200 and drew a line between the two, I should have told it to measure every 10 and plot that)

And the full transcript: https://gist.github.com/simonw/2fc46effbfbe49e6de0bcfdc9e31b235

The benchmark looks good enough on first glance that I don't feel the need to be more thorough with it. PRAGMA schema_version is faster, but not so fast that I feel like the MD5 hack is worth worrying about too much.

I'm tempted to add something to the /-/versions page that tries to identify if this is a problem or not though.

@simonw
Copy link
Owner Author

simonw commented Apr 12, 2023

Here's a much better chart, which shows that MD5 performance unsurprisingly gets worse as the number of tables increases while schema_version remains constant:

image

@simonw
Copy link
Owner Author

simonw commented Apr 12, 2023

I tightened up the benchmark (it was measuring the time taken to create the tables too) and got this:

image

@cephillips
Copy link

Really interesting how you are using ChatGPT in this.

@esagara
Copy link

esagara commented Apr 13, 2023

I tried deploying the fix you submitted, but still getting the same errors. I can past the logs here if needed, but I really don't see anything new in them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants