No longer possible to tell if path had escaped characters in it #261

Closed
simonw opened this issue Dec 31, 2018 · 3 comments

@simonw
Contributor

simonw commented Dec 31, 2018

This commit introduced behaviour which is causing me problems: b420242

For my application (Datasette) I need to be able to tell the difference between the following two URLs:

This is because Datasette's URL design allows for the name of a SQLite database table to be incorporated into the URL path - and SQLite tables can (bizarrely) include slashes in their names. As you can see in the above example, this works: you can %2F encode those slashes and Datasette will know that they are part of the table name.

Prior to b420242, uvicorn handled this in a way that works for my application. Here's a debugging tool I deployed using an older version of uvicorn that illustrates the behaviour I need:

https://asgi-scope.now.sh/fixtures/table%2Fwith%2Fslashes.csv

This tool ( https://github.com/simonw/asgi-scope ) simply outputs the current ASGI scope. As you can see, the path component looks like this:

 'method': 'GET',
 'path': '/fixtures/table%2Fwith%2Fslashes.csv',
 'query_string': b'',

But... here's a new version of the above tool I just deployed using the latest uvicorn release:

https://asgi-scope-uvicorn-0-3-23.now.sh/fixtures/table%2Fwith%2Fslashes.csv

 'method': 'GET',
 'path': '/fixtures/table/with/slashes.csv',
 'query_string': b'',

The path has been decoded for me! This means that my application code has no way of telling if the initial input had %2F or / - that information has been lost entirely.
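
To make the loss concrete, here is a minimal sketch. It assumes the server's decoding is equivalent to `urllib.parse.unquote`, which may not match uvicorn's implementation exactly; the variable names are illustrative only.

```python
from urllib.parse import unquote

# The two request paths an application like Datasette needs to tell apart.
encoded = "/fixtures/table%2Fwith%2Fslashes.csv"
literal = "/fixtures/table/with/slashes.csv"

# After percent-decoding, both paths collapse to the same string,
# so the application can no longer tell which one was requested.
decoded_encoded = unquote(encoded)
decoded_literal = unquote(literal)
paths_collide = decoded_encoded == decoded_literal

# Splitting on "/" BEFORE decoding keeps the table name in one segment.
segments = [unquote(part) for part in encoded.split("/")]
```

Splitting first and decoding each segment afterwards only works if the application still has access to the undecoded path.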

Unfortunately, this appears to be a behaviour that is encoded in the ASGI spec itself:

https://github.com/django/asgiref/blob/5fe2e535e64f85ada8078ad6aabf5e418b77d26b/specs/www.rst#L63-L65

I'm going to open a similar bug against the spec, but assuming the spec doesn't change, how about adding a new key to the scope called something like raw_path which exposes the original bytes?

Alternatively, how about making it so I can easily subclass the relevant part of uvicorn and get back the behaviour that I need?
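
As a rough sketch of how an application might consume such a key: the code below assumes a hypothetical `raw_path` entry in the scope that holds only the undecoded path bytes (no query string). Neither the key nor the `path_segments` helper exists in the uvicorn release discussed here; both are illustrations of the proposal.

```python
from urllib.parse import unquote


def path_segments(scope):
    """Return the decoded path segments for an ASGI HTTP scope.

    Prefers the hypothetical 'raw_path' key (undecoded bytes) so that an
    encoded slash (%2F) stays inside its segment. Falls back to the
    already-decoded 'path', where that distinction is lost.
    """
    raw = scope.get("raw_path")  # hypothetical key proposed in this issue
    if raw is not None:
        # A well-formed request path is ASCII; split first, decode second.
        encoded = raw.decode("ascii")
        return [unquote(part) for part in encoded.split("/")[1:]]
    return scope["path"].split("/")[1:]


with_raw = path_segments({
    "path": "/fixtures/table/with/slashes.csv",
    "raw_path": b"/fixtures/table%2Fwith%2Fslashes.csv",
})
without_raw = path_segments({"path": "/fixtures/table/with/slashes.csv"})
```

With the raw bytes available, the table name survives as the single segment `table/with/slashes.csv`. Without them, the same request yields three separate segments.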

@simonw
Contributor Author

simonw commented Dec 31, 2018

Huh... turns out this has been discussed in the asgi repo previously: django/asgiref#51 (comment)

@tomchristie
Member

I’m going to defer any resolution/behaviour here to the ASGI spec, tho open to discussing it there.

A useful point of comparison might be Node. Does their API allow developers access to the unescaped path?

Does Datasette ever use the bit of the path component after the table name? (If not, then why not change the URL rules so that you don’t have to care whether it’s escaped or not, and so that https://asgi-scope.now.sh/fixtures/table/with/slashes.csv is equally valid?)

@tomchristie
Member

tomchristie commented Jan 4, 2019

Okay, given CGI, WSGI, and what I can see of Node's API, I'm gonna close this in deference to the ASGI spec.
