Introduce concept of a database `route`, separate from its name #1668

simonw · 2022-03-19T16:48:28Z

Some issues came up in the new datasette-hashed-urls plugin relating to the way it renames databases on startup to achieve unique URLs that depend on the database SHA-256 content:

Renaming the database names makes crossdb queries very brittle datasette-hashed-urls#10
On database page, have the database name without the -HASH suffix datasette-hashed-urls#9
Restore behavior where the link text is the unhashed db names datasette-hashed-urls#8

All three of these could be addressed by making the "path" concept for a database (the /foo bit where it is served) work independently of the database's name, which would be used for default display and also as the alias when configuring cross-database aliases.

The text was updated successfully, but these errors were encountered:

simonw · 2022-03-19T16:54:26Z

The Database class already has a path property but it means something else - it's the path to the .db file on disk:

datasette/datasette/database.py

Lines 29 to 50 in 6141938

    
           class Database: 
        
               def __init__( 
        
                   self, ds, path=None, is_mutable=False, is_memory=False, memory_name=None 
        
               ): 
        
                   self.name = None 
        
                   self.ds = ds 
        
                   self.path = path 
        
                   self.is_mutable = is_mutable 
        
                   self.is_memory = is_memory 
        
                   self.memory_name = memory_name 
        
                   if memory_name is not None: 
        
                       self.is_memory = True 
        
                       self.is_mutable = True 
        
                   self.hash = None 
        
                   self.cached_size = None 
        
                   self._cached_table_counts = None 
        
                   self._write_thread = None 
        
                   self._write_queue = None 
        
                   if not self.is_mutable and not self.is_memory: 
        
                       p = Path(path) 
        
                       self.hash = inspect_hash(p) 
        
                       self.cached_size = p.stat().st_size

So need a different name for the path-that-is-used-in-the-URL.

simonw · 2022-03-19T16:54:55Z

Options:

route_path
url_path
route

I like route_path, or maybe route.

simonw · 2022-03-19T16:56:19Z

Worth noting that the name right now is picked automatically to avoid conflicts:

datasette/datasette/app.py

Lines 397 to 413 in 6141938

    
           def add_database(self, db, name=None): 
        
               new_databases = self.databases.copy() 
        
               if name is None: 
        
                   # Pick a unique name for this database 
        
                   suggestion = db.suggest_name() 
        
                   name = suggestion 
        
               else: 
        
                   suggestion = name 
        
               i = 2 
        
               while name in self.databases: 
        
                   name = "{}_{}".format(suggestion, i) 
        
                   i += 1 
        
               db.name = name 
        
               new_databases[name] = db 
        
               # don't mutate! that causes race conditions with live import 
        
               self.databases = new_databases 
        
               return db

simonw · 2022-03-19T20:06:07Z

Implementing this is a little tricky because there's a whole lot of code that expects the database captured by the URL routing to be the name used to look up the database in datasette.databases - or via .get_database().

The DataView.get() method is a good example of the trickyness here. It even has code that dispatches out to plugin hooks that take database as a parameter.

datasette/datasette/views/base.py

Lines 383 to 555 in 6141938

    
               async def get(self, request): 
        
                   db_name = request.url_vars["database"] 
        
                   database = tilde_decode(db_name) 
        
                   _format = self.get_format(request) 
        
                   data_kwargs = {} 
        
                   if _format == "csv": 
        
                       return await self.as_csv(request, database) 
        
                   if _format is None: 
        
                       # HTML views default to expanding all foreign key labels 
        
                       data_kwargs["default_labels"] = True 
        
                   extra_template_data = {} 
        
                   start = time.perf_counter() 
        
                   status_code = None 
        
                   templates = [] 
        
                   try: 
        
                       response_or_template_contexts = await self.data(request, **data_kwargs) 
        
                       if isinstance(response_or_template_contexts, Response): 
        
                           return response_or_template_contexts 
        
                       # If it has four items, it includes an HTTP status code 
        
                       if len(response_or_template_contexts) == 4: 
        
                           ( 
        
                               data, 
        
                               extra_template_data, 
        
                               templates, 
        
                               status_code, 
        
                           ) = response_or_template_contexts 
        
                       else: 
        
                           data, extra_template_data, templates = response_or_template_contexts 
        
                   except QueryInterrupted: 
        
                       raise DatasetteError( 
        
                           """ 
        
                           SQL query took too long. The time limit is controlled by the 
        
                           <a href="https://docs.datasette.io/en/stable/settings.html#sql-time-limit-ms">sql_time_limit_ms</a> 
        
                           configuration option. 
        
                       """, 
        
                           title="SQL Interrupted", 
        
                           status=400, 
        
                           message_is_html=True, 
        
                       ) 
        
                   except (sqlite3.OperationalError, InvalidSql) as e: 
        
                       raise DatasetteError(str(e), title="Invalid SQL", status=400) 
        
                   except sqlite3.OperationalError as e: 
        
                       raise DatasetteError(str(e)) 
        
                   except DatasetteError: 
        
                       raise 
        
                   end = time.perf_counter() 
        
                   data["query_ms"] = (end - start) * 1000 
        
                   for key in ("source", "source_url", "license", "license_url"): 
        
                       value = self.ds.metadata(key) 
        
                       if value: 
        
                           data[key] = value 
        
                   # Special case for .jsono extension - redirect to _shape=objects 
        
                   if _format == "jsono": 
        
                       return self.redirect( 
        
                           request, 
        
                           path_with_added_args( 
        
                               request, 
        
                               {"_shape": "objects"}, 
        
                               path=request.path.rsplit(".jsono", 1)[0] + ".json", 
        
                           ), 
        
                           forward_querystring=False, 
        
                       ) 
        
                   if _format in self.ds.renderers.keys(): 
        
                       # Dispatch request to the correct output format renderer 
        
                       # (CSV is not handled here due to streaming) 
        
                       result = call_with_supported_arguments( 
        
                           self.ds.renderers[_format][0], 
        
                           datasette=self.ds, 
        
                           columns=data.get("columns") or [], 
        
                           rows=data.get("rows") or [], 
        
                           sql=data.get("query", {}).get("sql", None), 
        
                           query_name=data.get("query_name"), 
        
                           database=database, 
        
                           table=data.get("table"), 
        
                           request=request, 
        
                           view_name=self.name, 
        
                           # These will be deprecated in Datasette 1.0: 
        
                           args=request.args, 
        
                           data=data, 
        
                       ) 
        
                       if asyncio.iscoroutine(result): 
        
                           result = await result 
        
                       if result is None: 
        
                           raise NotFound("No data") 
        
                       if isinstance(result, dict): 
        
                           r = Response( 
        
                               body=result.get("body"), 
        
                               status=result.get("status_code", status_code or 200), 
        
                               content_type=result.get("content_type", "text/plain"), 
        
                               headers=result.get("headers"), 
        
                           ) 
        
                       elif isinstance(result, Response): 
        
                           r = result 
        
                           if status_code is not None: 
        
                               # Over-ride the status code 
        
                               r.status = status_code 
        
                       else: 
        
                           assert False, f"{result} should be dict or Response" 
        
                   else: 
        
                       extras = {} 
        
                       if callable(extra_template_data): 
        
                           extras = extra_template_data() 
        
                           if asyncio.iscoroutine(extras): 
        
                               extras = await extras 
        
                       else: 
        
                           extras = extra_template_data 
        
                       url_labels_extra = {} 
        
                       if data.get("expandable_columns"): 
        
                           url_labels_extra = {"_labels": "on"} 
        
                       renderers = {} 
        
                       for key, (_, can_render) in self.ds.renderers.items(): 
        
                           it_can_render = call_with_supported_arguments( 
        
                               can_render, 
        
                               datasette=self.ds, 
        
                               columns=data.get("columns") or [], 
        
                               rows=data.get("rows") or [], 
        
                               sql=data.get("query", {}).get("sql", None), 
        
                               query_name=data.get("query_name"), 
        
                               database=database, 
        
                               table=data.get("table"), 
        
                               request=request, 
        
                               view_name=self.name, 
        
                           ) 
        
                           it_can_render = await await_me_maybe(it_can_render) 
        
                           if it_can_render: 
        
                               renderers[key] = self.ds.urls.path( 
        
                                   path_with_format( 
        
                                       request=request, format=key, extra_qs={**url_labels_extra} 
        
                                   ) 
        
                               ) 
        
                       url_csv_args = {"_size": "max", **url_labels_extra} 
        
                       url_csv = self.ds.urls.path( 
        
                           path_with_format(request=request, format="csv", extra_qs=url_csv_args) 
        
                       ) 
        
                       url_csv_path = url_csv.split("?")[0] 
        
                       context = { 
        
                           **data, 
        
                           **extras, 
        
                           **{ 
        
                               "renderers": renderers, 
        
                               "url_csv": url_csv, 
        
                               "url_csv_path": url_csv_path, 
        
                               "url_csv_hidden_args": [ 
        
                                   (key, value) 
        
                                   for key, value in urllib.parse.parse_qsl(request.query_string) 
        
                                   if key not in ("_labels", "_facet", "_size") 
        
                               ] 
        
                               + [("_size", "max")], 
        
                               "datasette_version": __version__, 
        
                               "settings": self.ds.settings_dict(), 
        
                           }, 
        
                       } 
        
                       if "metadata" not in context: 
        
                           context["metadata"] = self.ds.metadata 
        
                       r = await self.render(templates, request=request, context=context) 
        
                       if status_code is not None: 
        
                           r.status = status_code 
        
                   ttl = request.args.get("_ttl", None) 
        
                   if ttl is None or not ttl.isdigit(): 
        
                       ttl = self.ds.setting("default_cache_ttl") 
        
                   return self.set_response_headers(r, ttl)

All the more reason to get rid of that BaseView -> DataView -> TableView hierarchy entirely:

Refactor and simplify Datasette routing and views #1660

simonw · 2022-03-19T20:06:27Z

Marking this as blocked until #1660 is done.

simonw · 2022-03-19T20:24:06Z

Right now if a database has a . in its name e.g. fixtures.dot the URL to that database is:

/fixtures~2Edot

But the output on /-/databases doesn't reflect that, it still shows the name with the dot.

simonw · 2022-03-19T20:25:46Z

The output of /.json DOES use path to mean the URL path, not the path to the file on disk:

{
  "fixtures.dot": {
    "name": "fixtures.dot",
    "hash": null,
    "color": "631f11",
    "path": "/fixtures~2Edot",

So that's a problem already: having db.path refer to something different from that JSON is inconsistent.

simonw · 2022-03-19T20:26:32Z

I'm inclined to redefine ds.path to ds.file_path to fix this. Or ds.filepath.

simonw · 2022-03-19T20:27:22Z

The docs do currently describe path as the filesystem path here: https://docs.datasette.io/en/stable/internals.html#database-class

Good thing I'm not at 1.0 yet so I can change that!

simonw · 2022-03-19T20:27:44Z

Pretty sure changing it will break some existing plugins though, including likely Datasette Desktop.

simonw · 2022-03-19T20:28:20Z

I'm going to keep path as the path to the file on disk. I'll pick a new name for what is currently path in that undocumented JSON API.

simonw · 2022-03-19T20:56:35Z

I'm trying to think if there's any reason not to use route for this. Would I possibly want to use that noun for something else in the future? I like it more than route_path because it has no underscore.

Decision made: I'm going with route.

simonw · 2022-03-19T21:08:21Z

I think I've got this working but I need to write a test for it that covers the rare case when the route is not the same thing as the database name.

I'll do that with a new test.

simonw · 2022-03-19T22:53:55Z

Need to update documentation in a few places - e.g. https://docs.datasette.io/en/stable/internals.html#remove-database-name

This removes a database that has been previously added. name= is the unique name of that database, used in its URL path.

simonw · 2022-03-19T22:59:30Z

Also need to update the datasette.urls methods that construct the URL to a database/table/row - they take the database name but they need to know to look for the route.

Need to add tests that check the links in the HTML and can confirm this is working correctly.

simonw · 2022-03-20T00:12:03Z

I'd like to have a live demo of this up on latest.datasette.io too.

simonw · 2022-03-20T00:16:22Z

I'm going to add a fixtures2.db database which has that as the name but alternative-route as the route. I'll set that up using a custom plugin in the plugins/ folder that gets deployed by https://github.com/simonw/datasette/blob/main/.github/workflows/deploy-latest.yml

simonw · 2022-03-20T00:20:36Z

Building this plugin instantly revealed that all of the links - on the homepage and the database page and so on - are incorrect:

from datasette import hookimpl

@hookimpl
def startup(datasette):
    db = datasette.get_database("fixtures2")
    db.route = "alternative-route"

simonw · 2022-03-20T00:31:13Z

That demo is now live:

simonw · 2022-03-20T00:33:23Z

I'm going to release this as a 0.61 alpha so I can more easily depend on it from datasette-hashed-urls.

Refs #957, #1533, #1545, #1576, #1577, #1587, #1601, #1603, #1607, #1612, #1621, #1649, #1654, #1657, #1661, #1668

Refs simonw/datasette#1668

Refs #957, #1228, #1533, #1545, #1576, #1577, #1587, #1601, #1603, #1607, #1612, #1621, #1649, #1654, #1657, #1661, #1668, #1675, #1678

simonw added the enhancement label Mar 19, 2022

simonw added this to the Datasette 1.0 milestone Mar 19, 2022

simonw changed the title ~~Introduce concept of a database path, separate from its name~~ Introduce concept of a database route path, separate from its name Mar 19, 2022

simonw added the blocked Awaiting something else to happen first label Mar 19, 2022

simonw mentioned this issue Mar 19, 2022

Refactor and simplify Datasette routing and views #1660

Closed

simonw changed the title ~~Introduce concept of a database route path, separate from its name~~ Introduce concept of a database route, separate from its name Mar 19, 2022

simonw added a commit that referenced this issue Mar 20, 2022

Databases can now have a .route separate from their .name, refs #1668

7a6654a

simonw added a commit that referenced this issue Mar 20, 2022

alternative-route demo, refs #1668

e10da9a

simonw added a commit that referenced this issue Mar 20, 2022

Fixed internal links to respect db.route, refs #1668

cdbae2b

simonw mentioned this issue Mar 20, 2022

Release 0.61 alpha #1669

Closed

simonw added a commit that referenced this issue Mar 20, 2022

Release 0.61a0

5471e3c

Refs #957, #1533, #1545, #1576, #1577, #1587, #1601, #1603, #1607, #1612, #1621, #1649, #1654, #1657, #1661, #1668

simonw added a commit to simonw/datasette-hashed-urls that referenced this issue Mar 20, 2022

Use the new db.route mechanism, closes #8, closes #9

b7ec370

Refs simonw/datasette#1668

simonw mentioned this issue Mar 20, 2022

Ship Datasette 0.61 #1670

Closed

simonw closed this as completed Mar 20, 2022

simonw mentioned this issue Mar 20, 2022

Tweak design of /.json #1674

Open

simonw added a commit that referenced this issue Mar 23, 2022

Release 0.61

d7c793d

Refs #957, #1228, #1533, #1545, #1576, #1577, #1587, #1601, #1603, #1607, #1612, #1621, #1649, #1654, #1657, #1661, #1668, #1675, #1678

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce concept of a database `route`, separate from its name #1668

Introduce concept of a database `route`, separate from its name #1668

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022 •

edited

Loading

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022 •

edited

Loading

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

Introduce concept of a database route, separate from its name #1668

Introduce concept of a database route, separate from its name #1668

Comments

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022 • edited Loading

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022 • edited Loading

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 19, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

simonw commented Mar 20, 2022

Introduce concept of a database `route`, separate from its name #1668

Introduce concept of a database `route`, separate from its name #1668

simonw commented Mar 19, 2022 •

edited

Loading

simonw commented Mar 19, 2022 •

edited

Loading