Server hang on parallel execution of queries to named in-memory databases #2189
I've started to encounter a bug where queries to tables inside named in-memory databases sometimes trigger server hangs.

I'm still trying to figure out what's going on here - on one occasion I managed to Ctrl+C the server and saw an exception that mentioned a thread lock, but usually hitting Ctrl+C does nothing and I have to kill -9 the PID instead. This is all running on my M2 Mac.

I've seen the bug in the Datasette 1.0 alphas and in Datasette 0.64.3 - but reverting to 0.61 appeared to fix it.

Comments
I need reliable steps to reproduce, then I can bisect and figure out which exact version of Datasette introduced the problem. I have a hunch that it relates to changes made to the …
The good news is that this bug is currently unlikely to affect most users, since named in-memory databases (created using …) are a relatively niche feature.
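For context, here is a minimal sketch of working with a named in-memory database, using the same add_memory_database() API that appears in the test further down (the table and query are made up for illustration):

```python
# Minimal sketch: create and query a named in-memory database.
# Uses the add_memory_database() API seen in the test below; the
# table name and query here are illustrative only.
import asyncio

from datasette.app import Datasette


async def main():
    ds = Datasette()
    db = ds.add_memory_database("mem")  # a named, shared in-memory database
    await db.execute_write("create table t (id integer primary key, name text)")
    result = await db.execute("select count(*) from t")
    print(result.first())


asyncio.run(main())
```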
Just managed to get this exception trace:

…
The bug exhibits when I try to add a facet. I think it's caused by the parallel query execution I added to facets at some point. Hitting http://127.0.0.1:8045/airtable_refs/airtable_refs on its own produces no error.

The crucial line in the traceback is from datasette/views/table.py (line 568 in 917272c).
That line was added in 942411e, which first shipped in 0.62a0.
I wrote this test, but it passes:

```python
import pytest
from datasette.app import Datasette


@pytest.mark.asyncio
async def test_facet_against_in_memory_database():
    ds = Datasette()
    db = ds.add_memory_database("mem")
    await db.execute_write("create table t (id integer primary key, name text)")
    await db.execute_write_many(
        "insert into t (name) values (?)", [["one"], ["one"], ["two"]]
    )
    response1 = await ds.client.get("/mem/t.json")
    assert response1.status_code == 200
    response2 = await ds.client.get("/mem/t.json?_facet=name")
    assert response2.status_code == 200
    assert response2.json() == {
        "ok": True,
        "next": None,
        "facet_results": {
            "results": {
                "name": {
                    "name": "name",
                    "type": "column",
                    "hideable": True,
                    "toggle_url": "/mem/t.json",
                    "results": [
                        {
                            "value": "one",
                            "label": "one",
                            "count": 2,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=one",
                            "selected": False,
                        },
                        {
                            "value": "two",
                            "label": "two",
                            "count": 1,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=two",
                            "selected": False,
                        },
                    ],
                    "truncated": False,
                }
            },
            "timed_out": [],
        },
        "rows": [
            {"id": 1, "name": "one"},
            {"id": 2, "name": "one"},
            {"id": 3, "name": "two"},
        ],
        "truncated": False,
    }
```
Landing a version of that test anyway.

This is meant to illustrate a crashing bug, but it does not trigger it.

The test now executes two facets, in the hope that parallel facet execution would exhibit the bug - but it still did not.
Maybe it's not related to faceting - I just got it on a hit to …
Sometimes it takes a few clicks for the bug to occur, but it does always seem to be within the in-memory database.
I managed to trigger it by loading …

I switched that particular implementation to an on-disk database instead of an in-memory one and could no longer recreate the bug.
OK, I can trigger the bug like this:

```bash
datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs
```

Can I write a bash script that fails (and terminates the process) if it takes longer than X seconds?
This worked, including on macOS, even though GPT-4 thought it wouldn't work there:

```bash
#!/bin/bash
# Run the command with a timeout of 5 seconds
timeout 5s datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs

# Check the exit code from timeout
if [ $? -eq 124 ]; then
    echo "Error: Command timed out after 5 seconds."
    exit 1
fi
```
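For comparison, here is a sketch of the same check using Python's standard library instead of the timeout command (same command and 5-second budget; my illustration, not part of the original thread):

```python
# Sketch: the same timeout check via subprocess.run().
# If the command takes longer than 5 seconds, subprocess.run() kills it
# and raises TimeoutExpired; a non-zero exit would raise CalledProcessError.
import subprocess
import sys

cmd = [
    "datasette", "pottery2.db", "-p", "8045",
    "--get", "/airtable_refs/airtable_refs",
]
try:
    subprocess.run(cmd, timeout=5, check=True)
except subprocess.TimeoutExpired:
    print("Error: Command timed out after 5 seconds.")
    sys.exit(1)
```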
I'm now trying this test script:

```bash
#!/bin/bash
port=8064

# Start datasette server in the background and get its PID
datasette pottery2.db -p $port &
server_pid=$!

# Wait for a moment to ensure the server has time to start up
sleep 2

# Initialize counters and parameters
retry_count=0
max_retries=3
success_count=0
path="/airtable_refs/airtable_refs"

# Function to run curl with a timeout
function test_curl {
    # Run the curl command with a timeout of 3 seconds
    timeout 3s curl -s "http://localhost:${port}${path}" > /dev/null
    if [ $? -eq 0 ]; then
        # Curl was successful
        ((success_count++))
    fi
}

# Try three parallel curl requests
while [[ $retry_count -lt $max_retries ]]; do
    # Reset the success counter
    success_count=0
    # Run the curls in parallel
    echo "  Running curls"
    test_curl
    test_curl
    test_curl  # & test_curl & test_curl &
    # Wait for all curls to finish
    # wait
    # Check the success count
    if [[ $success_count -eq 3 ]]; then
        # All curls succeeded, break out of the loop
        echo "  All curls succeeded"
        break
    fi
    ((retry_count++))
done

# Kill the datasette server
echo "Killing datasette server with PID $server_pid"
kill -9 $server_pid
sleep 2

# Print result
if [[ $success_count -eq 3 ]]; then
    echo "All three curls succeeded."
    exit 0
else
    echo "Error: Not all curls succeeded after $retry_count attempts."
    exit 1
fi
```

I run it like this:

```bash
git bisect reset
git bisect start
git bisect good 0.59.4
git bisect bad 1.0a6
git bisect run ../airtable-export/testit.sh
```

But... it's not having the desired result - I think because the bug is intermittent, so each time I run it the bisect spits out a different commit as the one to blame.
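Given that intermittency, one way to make the bisect more reliable would be a stricter check that hits the path many times and fails on the first hang. A sketch of that idea (my own, not the script used above), using httpx, which Datasette already depends on:

```python
# Sketch: a stricter reproduction check for use with git bisect run.
# Assumes a datasette server is already running on port 8064, as in the
# script above. Any request that hangs past the timeout exits 1, which
# git bisect run treats as "bad"; exit 0 marks the commit "good".
import sys

import httpx

URL = "http://localhost:8064/airtable_refs/airtable_refs"

for attempt in range(20):
    try:
        response = httpx.get(URL, timeout=3.0)
        response.raise_for_status()
    except httpx.HTTPError:
        print(f"Hang or error on attempt {attempt + 1}")
        sys.exit(1)

print("All requests succeeded")
sys.exit(0)
```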
Output while it is running looks like this:

…
I tried it with a path of …
I knocked it down to 1 retry just to see what happened. |
Turned out I wasn't running the … Fixed that with …

Now I'm seeing some passes, which look like this:

…
OK, it looks like it found it!

942411e does look like the cause of this problem.

I also confirmed that …
Now that I've confirmed that parallel query execution of the kind introduced in 942411e can cause hangs (presumably some kind of locking issue) against in-memory databases, here are some options:

…
The parallel execution work is something I was playing with last year in the hope of speeding up Datasette pages like the table page, which need to execute a bunch of queries: one for each facet, plus one for each column to see if it should be suggested as a facet. I wrote about this at the time here: https://simonwillison.net/2022/May/6/weeknotes/

My hope was that despite Python's GIL this optimization would still help, because the SQLite C module releases the GIL once it gets to SQLite. But that didn't hold up: it looked like enough work was happening in Python land with the GIL held that the optimization didn't improve things.

Running the … which it now has! But it will still be a year or two before it fully lands: https://discuss.python.org/t/a-steering-council-notice-about-pep-703-making-the-global-interpreter-lock-optional-in-cpython/30474

So I'm not particularly concerned about dropping the parallel execution. If I do drop it, though, do I leave in the potentially complex code that relates to it?
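For reference, the general shape of that optimization (my sketch of the pattern, not Datasette's exact code) is to fan facet queries out with asyncio.gather(), relying on the SQLite C module releasing the GIL while each query runs:

```python
# Sketch of the pattern: run several facet queries concurrently.
# Each db.execute() call dispatches its query to a thread, so any
# speed-up depends on SQLite releasing the GIL during execution.
import asyncio


async def execute_facets(db, queries):
    # Fan out all facet SQL queries and wait for every result
    return await asyncio.gather(*(db.execute(sql) for sql in queries))
```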
Looking again at this code: datasette/database.py, lines 87 to 117 in 6ed7908.
Relevant Python docs: https://docs.python.org/3/library/sqlite3.html

I think I'm playing with fire by allowing multiple threads to access the same connection without doing my own serialization of those requests. I do do that for the write connection - and in this particular case the bug isn't coming from write queries, it's coming from read queries - but perhaps SQLite has issues with threading for reads, too.
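To make the hazard concrete, here is a sketch (my construction, not code from Datasette) of several threads reading through one shared connection to a named in-memory database; depending on how SQLite was compiled, this may work, error, or hang:

```python
# Sketch: multiple threads sharing one sqlite3 connection to a named
# in-memory database. check_same_thread=False makes Python permit this,
# but nothing serializes access to the underlying C connection object.
import sqlite3
import threading

conn = sqlite3.connect(
    "file:mem_demo?mode=memory&cache=shared",  # named in-memory database
    uri=True,
    check_same_thread=False,
)
conn.execute("create table t (id integer primary key)")


def reader():
    # Hammer the shared connection with reads from this thread
    for _ in range(1000):
        conn.execute("select count(*) from t").fetchall()


threads = [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("done (did not hang this time)")
```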
The SQLite documentation on Using SQLite In Multi-Threaded Applications indicates that there's a SQLite "Serialized" mode in which it's safe to access anything SQLite provides from multiple threads. But as far as I can tell, Python doesn't give you an option to turn that mode on or off for a connection - you can read … On my Mac …
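The attribute being read there is presumably sqlite3.threadsafety (my assumption; the exact identifier was lost above). Checking it is a one-liner:

```python
# DB-API 2.0 threadsafety level reported by the sqlite3 module:
# 0 = threads may not share the module,
# 1 = threads may share the module, but not connections,
# 3 = threads may share the module, connections and cursors.
import sqlite3

print(sqlite3.threadsafety)
```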
My current hunch is that SQLite gets unhappy if multiple threads access the same underlying C connection object - which sometimes happens with in-memory connections in Datasette, presumably because they are faster than file-backed databases. I'm going to remove the …
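An alternative to removing the sharing would be the "own serialization" mentioned earlier. A sketch of that idea (mine, not what Datasette ships): guard every use of the shared connection with a lock, so only one thread reaches the C layer at a time.

```python
# Sketch: serialize all access to one shared connection with a lock.
import sqlite3
import threading


class SerializedConnection:
    def __init__(self, path):
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        # Only one thread at a time may touch the underlying connection
        with self._lock:
            return self._conn.execute(sql, params).fetchall()
```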
The one other thing affected by this change is this documentation, which suggests a not-actually-safe pattern: lines 1292 to 1321 in 6ed7908.

Added a note to that example in the documentation: line 1320 in 4e6a341.
I'm going to release this in 0.64.4.
Release 0.64.4: https://docs.datasette.io/en/stable/changelog.html#v0-64-4
We're planning a breaking change in … Since that's a breaking change, I'm going to ship 1.0a7 right now with this fix, then ship that breaking change as …
1.0a7 is out with this fix as well now: https://docs.datasette.io/en/1.0a7/changelog.html#a7-2023-09-21 |