Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invoke_startup() is not run in some conditions, e.g. gunicorn/uvicorn workers, breaking lots of things #1955

Closed
Rik-de-Kort opened this issue Dec 14, 2022 · 36 comments
Labels

Comments

@Rik-de-Kort
Copy link

Rik-de-Kort commented Dec 14, 2022

In the past (pre-september 14, #1809) I had a running deployment of Datasette on Azure WebApps by emulating the call in cli.py to Gunicorn: gunicorn -w 2 -k uvicorn.workers.UvicornWorker app:app.

My most recent deployment, however, fails loudly by shouting that Datasette.invoke_startup() was not called. It does not seem to be possible to call invoke_startup when running using a uvicorn command directly like this (I've reproduced this locally using uvicorn). Two candidates that I have tried:

  • Uvicorn has a --factory option, but the app factory has to be synchronous, so no await invoke_startup there
  • asyncio.get_event_loop().run_until_complete is also not an option because uvicorn already has the event loop running.

One additional option is:

  • Use Gunicorn's server hooks to call invoke_startup. These are also synchronous, but I might be able to get ahead of the event loop starting here.

In my current deployment setup, it does not appear to be possible to use datasette serve directly, so I'm stuck either

  • Trying to rework my complete deployment setup, for instance, using Azure functions as described here)
  • Or dig into the ASGI spec and write a wrapper for the sole purpose of launching Datasette using a direct Uvicorn invocation.

Questions for the maintainers:

  • Is this intended behaviour/will not support/etc.? If so, I'd be happy to add a PR with a couple lines in the documentation.
  • if this is not intended behaviour, what is a good way to fix it? I could have a go at the ASGI spec thing (I think the Azure Functions thing is related) and provide a PR with the wrapper here, but I'm all ears!

Almost forgot, minimal reproducer:

from datasette import Datasette

ds = Datasette(files=['./global-power-plants.db'])]
app = ds.app()

Save as app.py in the same folder as global-power-plants.db, and then try running
uvicorn app:app.

Opening the resulting Datasette instance in the browser will show the error message.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

This is definitely a regression: Datasette is meant to work in those environments, and I didn't think to test them when I added the invoke_startup() hook.

Coincidentally I actually built a plugin for running Datasette with Gunicorn just a couple of months ago:

https://datasette.io/plugins/datasette-gunicorn

And I just tested and it has the same bug you describe here! Filed:

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

Datasette 0.63 is the release that broke this, thanks to this issue:

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

It's possible the fix for this might be for the first incoming HTTP request to trigger invoke_startup() if it hasn't been called yet - similar to the hack I put in place for datasette.client.get() in tests:

datasette/datasette/app.py

Lines 1728 to 1731 in e054704

async def get(self, path, **kwargs):
await self.ds.invoke_startup()
async with httpx.AsyncClient(app=self.app) as client:
return await client.get(self._fix(path), **kwargs)

This would be a much more elegant fix, I could remove those multiple invoke_startup() calls entirely - and remove this tip from the documentation too: https://docs.datasette.io/en/0.63.2/testing_plugins.html#setting-up-a-datasette-test-instance

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

Wow, just spotted this in the code - it turns out I solved this problem a different (and better) way long before i introduced invoke_startup()!

datasette/datasette/app.py

Lines 1416 to 1440 in e054704

def app(self):
"""Returns an ASGI app function that serves the whole of Datasette"""
routes = self._routes()
self._register_custom_units()
async def setup_db():
# First time server starts up, calculate table counts for immutable databases
for dbname, database in self.databases.items():
if not database.is_mutable:
await database.table_counts(limit=60 * 60 * 1000)
asgi = asgi_csrf.asgi_csrf(
DatasetteRouter(self, routes),
signing_secret=self._secret,
cookie_name="ds_csrftoken",
skip_if_scope=lambda scope: any(
pm.hook.skip_csrf(datasette=self, scope=scope)
),
)
if self.setting("trace_debug"):
asgi = AsgiTracer(asgi)
asgi = AsgiLifespan(
asgi,
on_startup=setup_db,
)

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

That may not be the best fix here. It turns out this pattern:

    async def get(self, path, **kwargs):
        async with httpx.AsyncClient(app=self.app) as client:
            return await client.get(self._fix(path), **kwargs)

Doesn't trigger that AsgiLifespan class.

I wrote about that previously in this TIL: https://til.simonwillison.net/asgi/lifespan-test-httpx

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

So actually that setup_db() function I wrote back in 2019 has not been executing for most of Datasette's tests. Which seems bad.

I'm inclined to ditch AsgiLifespan entirely in favour of the mechanism I described above, where invoke_startup() is called for every request on the first request processed by the server.

@simonw simonw changed the title Running Datasette using direct Uvicorn/Gunicorn command is not possible invoke_startup() is not run in some conditions, e.g. gunicorn/uvicorn workers, breaking lots of things Dec 15, 2022
@simonw
Copy link
Owner

simonw commented Dec 15, 2022

It looks like that fix almost works... except it seems to push the tests into an infinite loop or similar? They're not finishing their runs from what I can see.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

Running:

pytest -n auto -x -v

On may laptop to see if I can replicate.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

I added this to conftest.py:

@pytest.fixture(autouse=True)
def log_name_of_test_before_test(request):
    # To help identify tests that are hanging
    name = str(request.node)
    with open("/tmp/test.log", "a") as f:
        f.write(name + "\n")
    yield

This logs out the name of each test to /tmp/test.log before running the test - so I can wait until it hangs and see which test it was that caused that.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

This is surprising!

Image

The logs suggest that the test suite hung running this test here:

def test_path_with_added_args(path, added_args, expected):
request = Request.fake(path)
actual = utils.path_with_added_args(request, added_args)
assert expected == actual

I find that very hard to believe.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

I added return to the first line of that test to disable it, then ran again - and now it's hanging at about the same progress point through the tests but in a different test:

Image

So this time it was hanging at test_urlsafe_components().

So it's clearly not the individual tests themselves that are the problem - something about running the entire test suite in one go is incompatible with this change for some reason.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

When I hit Ctr+C here's the traceback I get:

^C^CException ignored in: <module 'threading' from '/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py", line 1530, in _shutdown
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py:324: KeyboardInterrupt
(to show a full traceback on KeyboardInterrupt use --full-trace)
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/bin/pytest", line 8, in <module>
    atexit_call()
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/concurrent/futures/thread.py", line 31, in _python_exit
    sys.exit(console_main())
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187, in console_main
    t.join()
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py", line 1089, in join
    self._wait_for_tstate_lock()
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py", line 1109, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt: 
    code = main()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164, in main
    ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_callers.py", line 60, in _multicall
    return outcome.get_result()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/_pytest/main.py", line 315, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/_pytest/main.py", line 303, in wrap_session
    config.hook.pytest_sessionfinish(
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
    gen.send(outcome)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/_pytest/terminal.py", line 798, in pytest_sessionfinish
    outcome.get_result()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/xdist/dsession.py", line 88, in pytest_sessionfinish
    nm.teardown_nodes()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/xdist/workermanage.py", line 79, in teardown_nodes
    self.group.terminate(self.EXIT_TIMEOUT)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/execnet/multi.py", line 215, in terminate
    safe_terminate(
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/execnet/multi.py", line 311, in safe_terminate
    reply.get()
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/execnet/gateway_base.py", line 206, in get
    self.waitfinish(timeout)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/execnet/gateway_base.py", line 213, in waitfinish
    if not self._result_ready.wait(timeout):
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py", line 600, in wait
    signaled = self._cond.wait(timeout)
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt

It looks to me like this relates to pytest-xdist istelf - it's waiting on some locks but site-packages/xdist/workermanage.py shows up in that track.

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

Just noticed this: https://github.com/simonw/datasette/actions/runs/3706504228/jobs/6281796135

image

This suggests that the regular tests passed in CI fine, but the non-serial ones failed.

I'm going to try running everything using pytest -n auto without splitting serial and non-serial tests. Maybe the serial thing isn't needed any more?

@simonw
Copy link
Owner

simonw commented Dec 15, 2022

OK, I've broken the test suite here.

I'm going to revert these two commits:

Then I'll do a bunch of work making the test suite more robust before I try this again.

@simonw simonw added the blocked Awaiting something else to happen first label Dec 15, 2022
simonw added a commit that referenced this issue Dec 15, 2022
simonw added a commit that referenced this issue Dec 15, 2022
@simonw
Copy link
Owner

simonw commented Dec 17, 2022

I'm trying this fix again, after a bunch of work on the test suite in:

@simonw
Copy link
Owner

simonw commented Dec 17, 2022

No, it still causes the tests to hang (I let them run for 12 minutes):

image

Interesting that the regular tests passed an then the pytest -m serial ones seem to have failed.

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

pytest -m serial on my Mac laptop also freezes:

(datasette) datasette % pytest -m serial
======================================================= test session starts ========================================================
platform darwin -- Python 3.10.3, pytest-7.1.3, pluggy-1.0.0
SQLite: 3.39.4
rootdir: /Users/simon/Dropbox/Development/datasette, configfile: pytest.ini
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, asyncio-0.19.0, timeout-2.1.0, profiling-1.7.0
asyncio: mode=strict
collected 1295 items / 1264 deselected / 31 selected                                                                               

tests/test_package.py .                                                                                                      [  3%]
tests/test_cli_serve_server.py 

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Hitting Ctrl+C while using --full-trace gave me more clues:

% pytest -m serial tests/test_cli_serve_server.py --full-trace
======================================================= test session starts ========================================================
platform darwin -- Python 3.10.3, pytest-7.1.3, pluggy-1.0.0
SQLite: 3.39.4
rootdir: /Users/simon/Dropbox/Development/datasette, configfile: pytest.ini
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, asyncio-0.19.0, timeout-2.1.0, profiling-1.7.0
asyncio: mode=strict
collected 3 items                                                                                                                  

tests/test_cli_serve_server.py ^C^C

====================================================== no tests ran in 3.49s =======================================================
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/httpcore/_exceptions.py", line 8, in map_exceptions
    yield
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/httpcore/backends/sync.py", line 86, in connect_tcp
    sock = socket.create_connection(
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/Users/simon/.pyenv/versions/3.10.3/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
[...]

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

I think that's this test:

@pytest.mark.serial
def test_serve_localhost_http(ds_localhost_http_server):
response = httpx.get("http://localhost:8041/_memory.json")
assert {
"database": "_memory",
"path": "/_memory",
"tables": [],
}.items() <= response.json().items()

Using this fixture:

datasette/tests/conftest.py

Lines 155 to 175 in 63fb750

@pytest.fixture(scope="session")
def ds_localhost_http_server():
ds_proc = subprocess.Popen(
["datasette", "--memory", "-p", "8041"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
# Avoid FileNotFoundError: [Errno 2] No such file or directory:
cwd=tempfile.gettempdir(),
)
# Loop until port 8041 serves traffic
while True:
try:
httpx.get("http://localhost:8041/")
break
except httpx.ConnectError:
time.sleep(0.1)
# Check it started successfully
assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
yield ds_proc
# Shut it down at the end of the pytest session
ds_proc.terminate()

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

This bit here looks like it could hang!

     # Loop until port 8041 serves traffic 
     while True: 
         try: 
             httpx.get("http://localhost:8041/") 
             break 
         except httpx.ConnectError: 
             time.sleep(0.1) 

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Improved version of that fixture:

diff --git a/tests/conftest.py b/tests/conftest.py
index 44c44f87..69dee68b 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -27,6 +27,17 @@ UNDOCUMENTED_PERMISSIONS = {
 _ds_client = None
 
 
+def wait_until_responds(url, timeout=5.0, client=httpx, **kwargs):
+    start = time.time()
+    while time.time() - start < timeout:
+        try:
+            client.get(url, **kwargs)
+            return
+        except httpx.ConnectError:
+            time.sleep(0.1)
+    raise AssertionError("Timed out waiting for {} to respond".format(url))
+
+
 @pytest_asyncio.fixture
 async def ds_client():
     from datasette.app import Datasette
@@ -161,13 +172,7 @@ def ds_localhost_http_server():
         # Avoid FileNotFoundError: [Errno 2] No such file or directory:
         cwd=tempfile.gettempdir(),
     )
-    # Loop until port 8041 serves traffic
-    while True:
-        try:
-            httpx.get("http://localhost:8041/")
-            break
-        except httpx.ConnectError:
-            time.sleep(0.1)
+    wait_until_responds("http://localhost:8041/")
     # Check it started successfully
     assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
     yield ds_proc
@@ -202,12 +207,7 @@ def ds_localhost_https_server(tmp_path_factory):
         stderr=subprocess.STDOUT,
         cwd=tempfile.gettempdir(),
     )
-    while True:
-        try:
-            httpx.get("https://localhost:8042/", verify=client_cert)
-            break
-        except httpx.ConnectError:
-            time.sleep(0.1)
+    wait_until_responds("http://localhost:8042/", verify=client_cert)
     # Check it started successfully
     assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
     yield ds_proc, client_cert
@@ -231,12 +231,7 @@ def ds_unix_domain_socket_server(tmp_path_factory):
     # Poll until available
     transport = httpx.HTTPTransport(uds=uds)
     client = httpx.Client(transport=transport)
-    while True:
-        try:
-            client.get("http://localhost/_memory.json")
-            break
-        except httpx.ConnectError:
-            time.sleep(0.1)
+    wait_until_responds("http://localhost/_memory.json", client=client)
     # Check it started successfully
     assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
     yield ds_proc, uds

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

... and it turns out those tests saved me. Because I forgot to check if datasette would actually start a server correctly!

 % datasette fixtures.db -p 8852
INFO:     Started server process [3538]
INFO:     Waiting for application startup.
ERROR:    Exception in 'lifespan' protocol
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/uvicorn/lifespan/on.py", line 86, in main
    await app(scope, self.receive, self.send)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/Users/simon/Dropbox/Development/datasette/datasette/utils/asgi.py", line 437, in __call__
    return await self.asgi(scope, receive, send)
  File "/Users/simon/.local/share/virtualenvs/datasette-AWNrQs95/lib/python3.10/site-packages/asgi_csrf.py", line 39, in app_wrapped_with_csrf
    await app(scope, receive, send)
  File "/Users/simon/Dropbox/Development/datasette/datasette/app.py", line 1457, in __call__
    path = scope["path"]
KeyError: 'path'
ERROR:    Application startup failed. Exiting.

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Now the only failure is in the https test - which fails like this (in CI and on my laptop):

            message = str(exc)
>           raise mapped_exc(message) from exc
E           httpx.RemoteProtocolError: Server disconnected without sending a response.

/opt/hostedtoolcache/Python/3.11.1/x64/lib/python3.11/site-packages/httpx/_transports/default.py:77: RemoteProtocolError
=========================== short test summary info ============================
ERROR tests/test_cli_serve_server.py::test_serve_localhost_https - httpx.RemoteProtocolError: Server disconnected without sending a response.
================= 30 passed, 1264 deselected, 1 error in 6.15s =================

That's this test:

@pytest.mark.serial
def test_serve_localhost_https(ds_localhost_https_server):
_, client_cert = ds_localhost_https_server
response = httpx.get("https://localhost:8042/_memory.json", verify=client_cert)
assert {
"database": "_memory",
"path": "/_memory",
"tables": [],
}.items() <= response.json().items()

And this fixture:

datasette/tests/conftest.py

Lines 178 to 215 in 63fb750

@pytest.fixture(scope="session")
def ds_localhost_https_server(tmp_path_factory):
cert_directory = tmp_path_factory.mktemp("certs")
ca = trustme.CA()
server_cert = ca.issue_cert("localhost")
keyfile = str(cert_directory / "server.key")
certfile = str(cert_directory / "server.pem")
client_cert = str(cert_directory / "client.pem")
server_cert.private_key_pem.write_to_path(path=keyfile)
for blob in server_cert.cert_chain_pems:
blob.write_to_path(path=certfile, append=True)
ca.cert_pem.write_to_path(path=client_cert)
ds_proc = subprocess.Popen(
[
"datasette",
"--memory",
"-p",
"8042",
"--ssl-keyfile",
keyfile,
"--ssl-certfile",
certfile,
],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
cwd=tempfile.gettempdir(),
)
while True:
try:
httpx.get("https://localhost:8042/", verify=client_cert)
break
except httpx.ConnectError:
time.sleep(0.1)
# Check it started successfully
assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
yield ds_proc, client_cert
# Shut it down at the end of the pytest session
ds_proc.terminate()

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

During the polling loop it constantly raises:

httpx.RemoteProtocolError: Server disconnected without sending a response

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Maybe the reason the ASGI lifespan stuff broke was this line:

datasette/datasette/cli.py

Lines 630 to 632 in 8b73fc6

uvicorn_kwargs = dict(
host=host, port=port, log_level="info", lifespan="on", workers=1
)

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

I added the TLS support here:

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

I used the steps to test manually from this comment: #1221 (comment)

In one terminal:

cd /tmp
python -m trustme
datasette --memory --ssl-keyfile=/tmp/server.key --ssl-certfile=/tmp/server.pem -p 8003

Then in another terminal:

curl --cacert /tmp/client.pem 'https://localhost:8003/_memory.json'

This worked correctly, outputting the expected JSON.

So the feature still works, it's just the test that is broken for some reason.

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

This issue might be relevant, but I tried the suggested fix in there (Connection: close on the incoming requests) and it didn't fix my problem:

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Rather than continue to bang my head against this, I'm tempted to rewrite this test to happen outside of Python world - in a bash script run by GitHub Actions, for example.

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Various attempts at a fix which didn't work:

diff --git a/tests/conftest.py b/tests/conftest.py
index 69dee68b..899d36fd 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1,4 +1,3 @@
-import asyncio
 import httpx
 import os
 import pathlib
@@ -6,6 +5,7 @@ import pytest
 import pytest_asyncio
 import re
 import subprocess
+import sys
 import tempfile
 import time
 import trustme
@@ -27,13 +27,23 @@ UNDOCUMENTED_PERMISSIONS = {
 _ds_client = None
 
 
-def wait_until_responds(url, timeout=5.0, client=httpx, **kwargs):
+def wait_until_responds(url, timeout=5.0, client=None, **kwargs):
+    client = client or httpx.Client(**kwargs)
     start = time.time()
     while time.time() - start < timeout:
         try:
-            client.get(url, **kwargs)
+            if "verify" in kwargs:
+                print(kwargs["verify"])
+                print(
+                    "Contents of verify file: {}".format(
+                        open(kwargs.get("verify")).read()
+                    )
+                )
+                print("client = {}, kwargs = {}".format(client, kwargs))
+            client.get(url)
             return
-        except httpx.ConnectError:
+        except (httpx.ConnectError, httpx.RemoteProtocolError) as ex:
+            print(ex)
             time.sleep(0.1)
     raise AssertionError("Timed out waiting for {} to respond".format(url))
 
@@ -166,7 +176,7 @@ def check_permission_actions_are_documented():
 @pytest.fixture(scope="session")
 def ds_localhost_http_server():
     ds_proc = subprocess.Popen(
-        ["datasette", "--memory", "-p", "8041"],
+        [sys.executable, "-m", "datasette", "--memory", "-p", "8041"],
         stdout=subprocess.PIPE,
         stderr=subprocess.STDOUT,
         # Avoid FileNotFoundError: [Errno 2] No such file or directory:
@@ -180,7 +190,7 @@ def ds_localhost_http_server():
     ds_proc.terminate()
 
 
-@pytest.fixture(scope="session")
+@pytest.fixture
 def ds_localhost_https_server(tmp_path_factory):
     cert_directory = tmp_path_factory.mktemp("certs")
     ca = trustme.CA()
@@ -194,6 +204,8 @@ def ds_localhost_https_server(tmp_path_factory):
     ca.cert_pem.write_to_path(path=client_cert)
     ds_proc = subprocess.Popen(
         [
+            sys.executable,
+            "-m",
             "datasette",
             "--memory",
             "-p",
@@ -207,7 +219,11 @@ def ds_localhost_https_server(tmp_path_factory):
         stderr=subprocess.STDOUT,
         cwd=tempfile.gettempdir(),
     )
-    wait_until_responds("http://localhost:8042/", verify=client_cert)
+    wait_until_responds(
+        "http://localhost:8042/_memory.json",
+        verify=client_cert,
+        headers={"Connection": "close"},
+    )
     # Check it started successfully
     assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
     yield ds_proc, client_cert
diff --git a/tests/test_cli_serve_server.py b/tests/test_cli_serve_server.py
index 1c31e2a3..9320b623 100644
--- a/tests/test_cli_serve_server.py
+++ b/tests/test_cli_serve_server.py
@@ -16,7 +16,11 @@ def test_serve_localhost_http(ds_localhost_http_server):
 @pytest.mark.serial
 def test_serve_localhost_https(ds_localhost_https_server):
     _, client_cert = ds_localhost_https_server
-    response = httpx.get("https://localhost:8042/_memory.json", verify=client_cert)
+    response = httpx.get(
+        "https://localhost:8042/_memory.json",
+        verify=client_cert,
+        headers={"Connection": "close"},
+    )
     assert {
         "database": "_memory",
         "path": "/_memory",

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

Asked ChatGPT:

Write a bash script which starts a server in the background using "datasette -p 8002", then uses curl to make a test request against it, then shuts the server down again at the end

It gave me:

#!/bin/bash

# Start the server in the background
datasette -p 8002 &

# Store the background process ID in a variable
server_pid=$!

# Make a test request using curl
curl http://localhost:8002

# Shut down the server
kill $server_pid

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

This bash script does the job:

#!/bin/bash

# Generate certificates
python -m trustme
# This creates server.pem, server.key, client.pem

# Start the server in the background
datasette --memory \
    --ssl-keyfile=server.key \
    --ssl-certfile=server.pem \
    -p 8152 &

# Store the background process ID in a variable
server_pid=$!

# Wait for the server to start
sleep 2

# Make a test request using curl
curl -f --cacert client.pem 'https://localhost:8152/_memory.json'

# Save curl's exit code (-f option causes it to return one on HTTP errors)
curl_exit_code=$?

# Shut down the server
kill $server_pid
sleep 1

# Clean up the certificates
rm server.pem server.key client.pem

echo $curl_exit_code
exit $curl_exit_code

simonw added a commit that referenced this issue Dec 18, 2022
simonw added a commit that referenced this issue Dec 18, 2022
@simonw
Copy link
Owner

simonw commented Dec 18, 2022

https://github.com/simonw/datasette/actions/runs/3722908296/jobs/6314093163 shows that new test passing in CI:

Generated a certificate for 'localhost', '127.0.0.1', '::1'
Configure your server to use the following files:
  cert=/home/runner/work/datasette/datasette/server.pem
  key=/home/runner/work/datasette/datasette/server.key
Configure your client to use the following files:
  cert=/home/runner/work/datasette/datasette/client.pem
INFO:     Started server process [4036]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on https://127.0.0.1:8152/ (Press CTRL+C to quit)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

INFO:     127.0.0.1:56726 - "GET /_memory.json HTTP/1.1" 200 OK
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   213    0   213    0     0  11542      0 --:--:-- --:--:-- --:--:-- 11833
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [4036]
{"database": "_memory", "private": false, "path": "/_memory", "size": 0, "tables": [], "hidden_count": 0, "views": [], "queries": [], "allow_execute_sql": true, "table_columns": {}, "query_ms": 1.4545189999921604}0

@simonw
Copy link
Owner

simonw commented Dec 18, 2022

... and with this change, the following now works correctly:

% datasette install datasette-gunicorn
% datasette gunicorn fixtures.db -p 8855
[2022-12-17 18:44:29 -0800] [7651] [INFO] Starting gunicorn 20.1.0
[2022-12-17 18:44:29 -0800] [7651] [INFO] Listening at: http://127.0.0.1:8855 (7651)
[2022-12-17 18:44:29 -0800] [7651] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-12-17 18:44:29 -0800] [7653] [INFO] Booting worker with pid: 7653
[2022-12-17 18:44:29 -0800] [7653] [INFO] Started server process [7653]
[2022-12-17 18:44:29 -0800] [7653] [INFO] Waiting for application startup.
[2022-12-17 18:44:29 -0800] [7653] [INFO] Application startup complete.

So this issue is now fixed!

@simonw simonw closed this as completed Dec 18, 2022
@simonw simonw removed the blocked Awaiting something else to happen first label Dec 18, 2022
simonw added a commit that referenced this issue Dec 18, 2022
simonw added a commit that referenced this issue Dec 18, 2022
simonw added a commit that referenced this issue Dec 18, 2022
simonw added a commit to simonw/datasette-gunicorn that referenced this issue Dec 18, 2022
@andrewdotn
Copy link

You were super-close on the python version of the test here, changing http to https on 8b73fc6 is enough to pass the test:

diff --git a/tests/conftest.py b/tests/conftest.py
index 69dee68b4a3f..ba07a11d37f6 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -207,7 +207,7 @@ def ds_localhost_https_server(tmp_path_factory):
         stderr=subprocess.STDOUT,
         cwd=tempfile.gettempdir(),
     )
-    wait_until_responds("http://localhost:8042/", verify=client_cert)
+    wait_until_responds("https://localhost:8042/", verify=client_cert)
     # Check it started successfully
     assert not ds_proc.poll(), ds_proc.stdout.read().decode("utf-8")
     yield ds_proc, client_cert

My speculation about what was happening here: when wait_until_responds() would time out due to SSL connection problems, because .terminate() isn’t in a finally, the datasette process wouldn’t get killed. That could (1) hang CI and (2) cause all your future local test runs to mysteriously fail because they’d be secretly talking to that old datasette process still hanging around from a past test run with an old temporary server certificate, and that old server cert wouldn’t validate against your newly-created ca cert.

A finally for .terminate() would help; a fancier version could be a context manager for running the external datasette process that could:

  • ensure the process always exited when no longer needed
    • if you want to be fancy, call terminate(), wait() for a short timeout for the process to exit, then try kill() and wait() again; raise an exception complaining about the seemingly-unkillable process if all that fails
  • raise an error if the process exited with a non-zero error code; here it’s likely that some datasette executions were secretly failing with [Errno 48] error while attempting to bind on address ('127.0.0.1', 8042): address already in use

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants