Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of the no-longer necessary ?_format=json hack for tables called x.json #1651

Closed
simonw opened this issue Mar 7, 2022 · 8 comments
Closed

Comments

@simonw
Copy link
Owner

simonw commented Mar 7, 2022

Tidy up from:

@simonw simonw added the refactor label Mar 7, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 7, 2022

Here's the relevant code:

def path_with_format(
*, request=None, path=None, format=None, extra_qs=None, replace_format=None
):
qs = extra_qs or {}
path = request.path if request else path
if replace_format and path.endswith(f".{replace_format}"):
path = path[: -(1 + len(replace_format))]
if "." in path:
qs["_format"] = format
else:
path = f"{path}.{format}"
if qs:
extra = urllib.parse.urlencode(sorted(qs.items()))
if request and request.query_string:
path = f"{path}?{request.query_string}&{extra}"
else:
path = f"{path}?{extra}"
elif request and request.query_string:
path = f"{path}?{request.query_string}"
return path

async def get_format(self, request, database, args):
"""Determine the format of the response from the request, from URL
parameters or from a file extension.
`args` is a dict of the path components parsed from the URL by the router.
"""
# If ?_format= is provided, use that as the format
_format = request.args.get("_format", None)
if not _format:
_format = (args.pop("as_format", None) or "").lstrip(".")
else:
args.pop("as_format", None)
if "table_and_format" in args:
db = self.ds.databases[database]
async def async_table_exists(t):
return await db.table_exists(t)
table, _ext_format = await resolve_table_and_format(
table_and_format=dash_decode(args["table_and_format"]),
table_exists=async_table_exists,
allowed_formats=self.ds.renderers.keys(),
)
_format = _format or _ext_format
args["table"] = table
del args["table_and_format"]
elif "table" in args:
args["table"] = dash_decode(args["table"])
return _format, args

@simonw
Copy link
Owner Author

simonw commented Mar 7, 2022

I found an obscure bug in #1650 which I can fix with this too. The following test should pass:

@pytest.mark.parametrize(
    "path,expected",
    (
        (
            "/fivethirtyeight/twitter-ratio%2Fsenators",
            "/fivethirtyeight/twitter-2Dratio-2Fsenators",
        ),
        (
            "/fixtures/table%2Fwith%2Fslashes.csv",
            "/fixtures/table-2Fwith-2Fslashes-2Ecsv",
        ),
        # query string should be preserved
        ("/foo/bar%2Fbaz?id=5", "/foo/bar-2Fbaz?id=5"),
    ),
)
def test_redirect_percent_encoding_to_dash_encoding(app_client, path, expected):
    response = app_client.get(path)
    assert response.status == 302
    assert response.headers["location"] == expected

It currently fails like this:

>       assert response.headers["location"] == expected
E       AssertionError: assert '/fixtures/table-2Fwith-2Fslashes.csv?_nofacet=1&_nocount=1' == '/fixtures/table-2Fwith-2Fslashes-2Ecsv'
E         - /fixtures/table-2Fwith-2Fslashes-2Ecsv
E         + /fixtures/table-2Fwith-2Fslashes.csv?_nofacet=1&_nocount=1

Because the logic in that get_format() function notices that the table exists, and then weird things happen here:

extra_parameters = [
"{}=1".format(key)
for key in ("_nofacet", "_nocount")
if not request.args.get(key)
]
if extra_parameters:
if not request.query_string:
new_query_string = "&".join(extra_parameters)
else:
new_query_string = (
request.query_string + "&" + "&".join(extra_parameters)
)
new_scope = dict(
request.scope, query_string=new_query_string.encode("latin-1")
)
request.scope = new_scope

@simonw
Copy link
Owner Author

simonw commented Mar 7, 2022

Wow, this code is difficult to follow! Look at this bit inside the get_format() method:

table, _ext_format = await resolve_table_and_format(
table_and_format=dash_decode(args["table_and_format"]),
table_exists=async_table_exists,
allowed_formats=self.ds.renderers.keys(),
)
_format = _format or _ext_format
args["table"] = table
del args["table_and_format"]
elif "table" in args:
args["table"] = dash_decode(args["table"])

That's modifying the arguments that were extracted from the path by the routing regular expressions to have table as dash-decoded value! So calling.get_format()` has the side effect of decoding the table names for you. Nasty.

@simonw
Copy link
Owner Author

simonw commented Mar 7, 2022

My attempts to simplify get_format() keep resulting in errors like this one:

  File "/Users/simon/Dropbox/Development/datasette/datasette/views/base.py", line 474, in view_get
    response_or_template_contexts = await self.data(
TypeError: TableView.data() missing 1 required positional argument: 'table'

I really need to clean this up.

@simonw
Copy link
Owner Author

simonw commented Mar 7, 2022

I'm going to do a review of how URL routing works at the moment for the various views.

I edited down the full list a bit - these are the most relevant:

add_route(IndexView.as_view(self), r"/(?P<as_format>(\.jsono?)?$)")
add_route(
    DatabaseView.as_view(self),
    r"/(?P<db_name>[^/]+?)(?P<as_format>"
    + renderer_regex
    + r"|.jsono|\.csv)?$",
)
add_route(
    TableView.as_view(self),
    r"/(?P<db_name>[^/]+)/(?P<table_and_format>[^/]+?$)",
)
add_route(
    RowView.as_view(self),
    r"/(?P<db_name>[^/]+)/(?P<table>[^/]+?)/(?P<pk_path>[^/]+?)(?P<as_format>"
    + renderer_regex
    + r")?$",
)

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2022

A lot of the code complexity here is caused by DataView (here), which has the logic for CSV streaming and plugin formats such that it can be shared between tables and custom queries.

It would be good to get rid of that subclassed shared code, figure out how to do it via a utility function instead.

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

This work is now blocked on:

@simonw simonw closed this as completed Mar 15, 2022
@simonw simonw added this to the Datasette 1.0 milestone Mar 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant