Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Datasette from Sanic to ASGI + Uvicorn #518

Merged
merged 34 commits into from Jun 24, 2019

Conversation

Projects
None yet
1 participant
@simonw
Copy link
Owner

commented Jun 23, 2019

Most of the code here was fleshed out in comments on #272 (Port Datasette to ASGI) - this pull request will track the final pieces:

  • Update test harness to more correctly simulate the raw_path issue
  • Use raw_path so table names containing / can work correctly
  • Bug: JSON not served with correct content-type
  • Get ?_trace=1 working again
  • Replacement for @app.listener("before_server_start")
  • Bug: /fixtures/table%2Fwith%2Fslashes.csv?_format=json downloads as CSV
  • Replace Sanic request and response objects with my own classes, so I can remove Sanic dependency
  • Final code tidy-up before merging to master

simonw added some commits Jun 15, 2019

Revert "New encode/decode_path_component functions"
Refs #272

This reverts commit 9fdb47c.

Now that ASGI supports raw_path we don't need our own encoding scheme!
First partially working version of ASGI-powered Datasette #272
Lots still to do:

* Static files are not being served
* Streaming CSV files don't work
* Tests all fail
* Some URLs (e.g. the 'next' link on tables) are incorrect

But... the server does start up and you can browse databases/tables
All API tests now pass, refs #272
CSV tests still all fail.

Also I marked test_trace() as skip because I have not yet re-implemented trace.
Basic static files now work, refs #272
Not yet using aiofiles so will not correctly handle larger static assets.

Still needs security tightening.

Still needs tests.

But the CSS and JS now work
Include "asgi": "3.0" in /-/versions, refs #272
Mainly so you can tell if a Datasette instance is running
on ASGI or not.
Database download works again, refactored utils.py #272
Refactored utils.py into a datasette/utils package, refactored some of
the ASGI helper code into datasette/utils/asgi.py
@simonw

This comment has been minimized.

Copy link
Owner Author

commented on 3bd5e14 Jun 23, 2019

Forgot to say this refs #272

Show resolved Hide resolved datasette/app.py Outdated
Show resolved Hide resolved datasette/app.py Outdated
Show resolved Hide resolved tests/test_csv.py
Show resolved Hide resolved tests/test_api.py Outdated
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

This is strange: on my local machine http://127.0.0.1:8001/fixtures/table%2Fwith%2Fslashes.csv is returning a 404 BUT there's a test for that which is passing under pytest:

datasette/tests/test_api.py

Lines 721 to 727 in d60fbfc

def test_table_with_slashes_in_name(app_client):
response = app_client.get(
"/fixtures/table%2Fwith%2Fslashes.csv?_shape=objects&_format=json"
)
assert response.status == 200
data = response.json
assert data["rows"] == [{"pk": "3", "content": "hey"}]

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

Mystery solved: that's because I'm constructing my own scope object and testing via ApplicationCommunicator rather than exercising Uvicorn directly.

async def _get(self, path, allow_redirects=True, redirect_count=0):
query_string = b""
if "?" in path:
path, _, query_string = path.partition("?")
query_string = query_string.encode("utf8")
instance = ApplicationCommunicator(
self.asgi_app,
{
"type": "http",
"http_version": "1.0",
"method": "GET",
"path": path,
"query_string": query_string,
"headers": [[b"host", b"localhost"]],
},
)

I don't want to introduce the complexity of launching a real Uvicorn as part of the tests, so I guess I'll have to carefully update my ApplicationCommunicator test harness to more correctly emulate real life.

@simonw simonw self-assigned this Jun 23, 2019

@simonw simonw added this to the Datasette 1.0 milestone Jun 23, 2019

@simonw simonw changed the title Use ASGI internally, serve using Uvicorn instead of Sanic #272 Port Datasette from Sanic to ASGI + Uvicorn Jun 23, 2019

Test harness simulates raw_path/path properly
This causes tests to fail.
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

Another bug: JSON is being served without a content-type header:

~ $ curl -i 'http://127.0.0.1:8001/fivethirtyeight/ahca-polls%2Fahca_polls.json'
HTTP/1.1 200 OK
date: Sun, 23 Jun 2019 16:04:01 GMT
server: uvicorn
referrer-policy: no-referrer
transfer-encoding: chunked

{"database": "fivethirtyeight", "table": "ahca-polls/ahca_polls", ...
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

OK, for Get ?_trace=1 working again. The old code lives in two places:

datasette/datasette/app.py

Lines 546 to 560 in 35429f9

class TracingSanic(Sanic):
async def handle_request(self, request, write_callback, stream_callback):
if request.args.get("_trace"):
request["traces"] = []
request["trace_start"] = time.time()
with capture_traces(request["traces"]):
await super().handle_request(
request, write_callback, stream_callback
)
else:
await super().handle_request(
request, write_callback, stream_callback
)
app = TracingSanic(__name__)

And then:

datasette/datasette/app.py

Lines 653 to 672 in 35429f9

@app.middleware("response")
async def add_traces_to_response(request, response):
if request.get("traces") is None:
return
traces = request["traces"]
trace_info = {
"request_duration_ms": 1000 * (time.time() - request["trace_start"]),
"sum_trace_duration_ms": sum(t["duration_ms"] for t in traces),
"num_traces": len(traces),
"traces": traces,
}
if "text/html" in response.content_type and b"</body>" in response.body:
extra = json.dumps(trace_info, indent=2)
extra_html = "<pre>{}</pre></body>".format(extra).encode("utf8")
response.body = response.body.replace(b"</body>", extra_html)
elif "json" in response.content_type and response.body.startswith(b"{"):
data = json.loads(response.body.decode("utf8"))
if "_trace" not in data:
data["_trace"] = trace_info
response.body = json.dumps(data).encode("utf8")

So it's stashing something on the request to tell the rest of the code it should be tracing, then using that collected data from the request to add information to the final body.

One possible shape for the replacement is a new ASGI middleware that wraps everything else. We don't have a mutable request object here though, so we will need to untangle this entirely from the request object.

Also tricky is that in ASGI land we handle streams - we don't usually wait around for the entire response body to be compiled for us. This means the code that modifies the response (adding to the JSON or appending inside the </body>) needs to be reconsidered.

As usual, Starlette seems to have figured this out: https://github.com/encode/starlette/blob/8c8cc2ec0a5cb834a9a15b871ae8b480503abb67/starlette/middleware/gzip.py

simonw added some commits Jun 23, 2019

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

Replacement for @app.listener("before_server_start") - this is what the ASGI lifespan protocol is for.

I know Uvicorn supports this because it keeps saying ASGI 'lifespan' protocol appears unsupported on the console.

I think the solution here will be to introduce another ASGI wrapper class similar to AsgiTracer. I'll model this on the example in the ASGI lifespan spec.

simonw added some commits Jun 23, 2019

Implemented ASGI lifespan #272
Also did a little bit of lint cleanup
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

The big one: Replace Sanic request and response objects with my own classes, so I can remove Sanic dependency

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

The InvalidUsage exception is thrown by Sanic when it gets an unhandled request - usually a HEAD. It was added in efbb4e8

I'm going to replace it with specific handling for HEAD requests plus a unit test.

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

The Sanic stuff I import at the moment is:

@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

For the request object.... what are the fields of it I actually use?

  • request.url
  • request.query_string
  • request.path
  • request.method
  • request.args
  • request.raw_args

ALL of those are things that can be derived from the scope - so I think my new Request class (in utils/asgi.py) is just going to be a wrapper around a scope.

Show resolved Hide resolved datasette/utils/asgi.py Outdated
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 23, 2019

Last thing is to replace sanic.response:

  • response.text("")
  • response.html()
  • response.redirect(path)
  • response.HTTPResponse

Implementations here: https://github.com/huge-success/sanic/blob/0.7.0/sanic/response.py#L175-L285

@simonw simonw merged commit ba8db96 into master Jun 24, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@simonw

This comment has been minimized.

Copy link
Owner Author

commented Jun 24, 2019

simonw added a commit that referenced this pull request Jul 7, 2019

Port Datasette from Sanic to ASGI + Uvicorn (#518)
Datasette now uses ASGI internally, and no longer depends on Sanic.

It now uses Uvicorn as the underlying HTTP server.

This was thirteen months in the making... for full details see the issue:

#272

And for a full sequence of commits plus commentary, see the pull request:

#518
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.