Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tilde encoding: use ~ instead of - for dash-encoding #1657

Closed
simonw opened this issue Mar 14, 2022 · 12 comments
Closed

Tilde encoding: use ~ instead of - for dash-encoding #1657

simonw opened this issue Mar 14, 2022 · 12 comments

Comments

@simonw
Copy link
Owner

simonw commented Mar 14, 2022

Refs #1439

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

The problem with the dash encoding mechanism is that it turns out dashes are used in a LOT of existing Datasette instances - much of https://fivethirtyeight.datasettes.com/fivethirtyeight for example, and even https://datasette.io/ itself: https://datasette.io/dogsheep-index

It's pretty ugly to force all of those to change to their dash-encoded equivalent - and in fact it broke https://datasette.io/ in a subtle way:

I'm going to try using ~ instead and see if that works as well and causes less breakage to existing sites.

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

Asked about this on Twitter:

Anyone ever seen a proxy or other URL handling system do anything surprising with the tilde "~" character?

I'm considering it as an escaping character, in place of "-" as described in

Replies so far seem like it should be OK - Apache has supported this for home directories for a couple of decades now without any problems.

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

Relevant: https://datatracker.ietf.org/doc/html/rfc3986#section-2.1


      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Notably ~ is not in either of those lists.

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

And in https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 "Unreserved Characters":

  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

@simonw
Copy link
Owner Author

simonw commented Mar 14, 2022

Updated test:

@pytest.mark.parametrize(
    "original,expected",
    (
        ("abc", "abc"),
        ("/foo/bar", "~2Ffoo~2Fbar"),
        ("/-/bar", "~2F-~2Fbar"),
        ("-/db-/table.csv", "-~2Fdb-~2Ftable~2Ecsv"),
        (r"%~-/", "~25~7E-~2F"),
        ("~25~7E~2D~2F", "~7E25~7E7E~7E2D~7E2F"),
    ),
)
def test_tilde_encoding(original, expected):
    actual = utils.tilde_encode(original)
    assert actual == expected
    # And test round-trip
    assert original == utils.tilde_decode(actual)

@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

I've made a real mess of this. I'm going to revert Datasettemain back to the last commit that passed the tests and try this again in a branch.

simonw added a commit that referenced this issue Mar 15, 2022
Still a ton of broken tests though
@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

The state I had got to prior to that revert is in https://github.com/simonw/datasette/tree/issue-1657-wip

@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

The thing that broke everything was this change:

image

I'm going to bring back the horrible get_format() method for the moment, with its weird mutations of the args object, then try and get rid of it again later.

@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

Moving this to a PR.

@simonw simonw changed the title Try using ~ instead of - for dash-encoding (I guess tilde-encoding) Tilde encoding: use ~ instead of - for dash-encoding Mar 15, 2022
@simonw simonw added this to the Datasette 1.0 milestone Mar 15, 2022
@simonw simonw mentioned this issue Mar 15, 2022
simonw added a commit that referenced this issue Mar 15, 2022
@simonw simonw closed this as completed in a35393b Mar 15, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

simonw added a commit to simonw/datasette.io that referenced this issue Mar 15, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 15, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant