Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dash encoding for table names and row primary keys in URLs #1648

Merged
merged 8 commits into from Mar 7, 2022

Conversation

simonw
Copy link
Owner

@simonw simonw commented Mar 5, 2022

Refs #1439.

  • Build dash_encode / dash_decode functions
  • Use dash encoding for row primary keys
  • Use dash encoding for ?_next= pagination tokens
  • Use dash encoding for table names in URLs
  • Use dash encoding for database name
  • Implement redirects from previous % URLs that replace those with - - separate issue: Implement redirects from old % encoding to new dash encoding #1650

@codecov
Copy link

codecov bot commented Mar 5, 2022

Codecov Report

Merging #1648 (32548b8) into main (7d24fd4) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1648      +/-   ##
==========================================
+ Coverage   92.03%   92.05%   +0.02%     
==========================================
  Files          34       34              
  Lines        4557     4570      +13     
==========================================
+ Hits         4194     4207      +13     
  Misses        363      363              
Impacted Files Coverage Δ
datasette/url_builder.py 100.00% <100.00%> (ø)
datasette/utils/__init__.py 94.97% <100.00%> (+0.10%) ⬆️
datasette/views/base.py 95.49% <100.00%> (ø)
datasette/views/table.py 96.19% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d24fd4...32548b8. Read the comment docs.

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

Change of plan: based on extensive conversations on Twitter - see #1439 (comment) - I'm going to try a variant of this which is basically percent-encoding but with a hyphen instead of a percent symbol.

Reason being that the current scheme doesn't handle the case of % being part of the table name, which could cause weird breakage due to some proxies decoding percent encoding before it gets to Datasette.

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

Updated documentation:
image

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

For consistency, I'm going to change how ?_next= tokens work too. Right now they work like this:

if _next:
sort_value = None
if is_view:
# _next is an offset
offset = f" offset {int(_next)}"
else:
components = urlsafe_components(_next)

def urlsafe_components(token):
"""Splits token on commas and URL decodes each component"""
return [urllib.parse.unquote_plus(b) for b in token.split(",")]

I'm going to change those to use dash-encoding instead.

I considered looking for % in those values and replacing that as - too, but since Datasette isn't 1.0 yet I'm going to risk breaking any pagination tokens that people might have saved away somewhere!

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

Just spotted this:

async def resolve_db_name(self, request, db_name, **kwargs):
hash = None
name = None
db_name = urllib.parse.unquote_plus(db_name)
if db_name not in self.ds.databases and "-" in db_name:
# No matching DB found, maybe it's a name-hash?
name_bit, hash_bit = db_name.rsplit("-", 1)
if name_bit not in self.ds.databases:
raise NotFound(f"Database not found: {name}")
else:
name = name_bit
hash = hash_bit
else:
name = db_name

Maybe the db name should use dash encoding too?

If so, relevant code includes this bit:

def database(self, database, format=None):
db = self.ds.databases[database]
if self.ds.setting("hash_urls") and db.hash:
path = self.path(
f"{urllib.parse.quote(database)}-{db.hash[:HASH_LENGTH]}", format=format
)
else:
path = self.path(urllib.parse.quote(database), format=format)
return path

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

  • Maybe use dash encoding for database name too?

Yes, I'm going to do this. At the moment if a DB file is called fixx%tures.db when you run it in Datasette the path is /fix%2525tures - which is liable to break.

@simonw
Copy link
Owner Author

simonw commented Mar 6, 2022

I may have to do extra work here

    def database(self, database, format=None):
        db = self.ds.databases[database]
        if self.ds.setting("hash_urls") and db.hash:
            path = self.path(
                f"{dash_encode(database)}-{db.hash[:HASH_LENGTH]}", format=format
            )
        else:
            path = self.path(dash_encode(database), format=format)
        return path

The URLs that incorporate a hash have a dbname-hash format - will that - in the middle there mess up the dash decoding mechanism? I think it will.

Might be able to solve that like so:

    async def resolve_db_name(self, request, db_name, **kwargs):
        hash = None
        name = None
        decoded_name = dash_decode(db_name)
        if decoded_name not in self.ds.databases and "-" in db_name:
            # No matching DB found, maybe it's a name-hash?
            name_bit, hash_bit = db_name.rsplit("-", 1)
            if dash_decode(name_bit) not in self.ds.databases:
                raise NotFound(f"Database not found: {name}")
            else:
                name = dash_decode(name_bit)
                hash = hash_bit
        else:
            name = decoded_name

@simonw simonw merged commit 1baa030 into main Mar 7, 2022
@simonw simonw deleted the dash-encoding branch March 7, 2022 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant