Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce concept of a database route, separate from its name #1668

Closed
simonw opened this issue Mar 19, 2022 · 20 comments
Closed

Introduce concept of a database route, separate from its name #1668

simonw opened this issue Mar 19, 2022 · 20 comments
Labels
blocked Awaiting something else to happen first enhancement
Milestone

Comments

@simonw
Copy link
Owner

simonw commented Mar 19, 2022

Some issues came up in the new datasette-hashed-urls plugin relating to the way it renames databases on startup to achieve unique URLs that depend on the database SHA-256 content:

All three of these could be addressed by making the "path" concept for a database (the /foo bit where it is served) work independently of the database's name, which would be used for default display and also as the alias when configuring cross-database aliases.

@simonw simonw added this to the Datasette 1.0 milestone Mar 19, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

The Database class already has a path property but it means something else - it's the path to the .db file on disk:

class Database:
def __init__(
self, ds, path=None, is_mutable=False, is_memory=False, memory_name=None
):
self.name = None
self.ds = ds
self.path = path
self.is_mutable = is_mutable
self.is_memory = is_memory
self.memory_name = memory_name
if memory_name is not None:
self.is_memory = True
self.is_mutable = True
self.hash = None
self.cached_size = None
self._cached_table_counts = None
self._write_thread = None
self._write_queue = None
if not self.is_mutable and not self.is_memory:
p = Path(path)
self.hash = inspect_hash(p)
self.cached_size = p.stat().st_size

So need a different name for the path-that-is-used-in-the-URL.

@simonw simonw changed the title Introduce concept of a database path, separate from its name Introduce concept of a database route path, separate from its name Mar 19, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Options:

  • route_path
  • url_path
  • route

I like route_path, or maybe route.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Worth noting that the name right now is picked automatically to avoid conflicts:

datasette/datasette/app.py

Lines 397 to 413 in 6141938

def add_database(self, db, name=None):
new_databases = self.databases.copy()
if name is None:
# Pick a unique name for this database
suggestion = db.suggest_name()
name = suggestion
else:
suggestion = name
i = 2
while name in self.databases:
name = "{}_{}".format(suggestion, i)
i += 1
db.name = name
new_databases[name] = db
# don't mutate! that causes race conditions with live import
self.databases = new_databases
return db

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Implementing this is a little tricky because there's a whole lot of code that expects the database captured by the URL routing to be the name used to look up the database in datasette.databases - or via .get_database().

The DataView.get() method is a good example of the trickyness here. It even has code that dispatches out to plugin hooks that take database as a parameter.

async def get(self, request):
db_name = request.url_vars["database"]
database = tilde_decode(db_name)
_format = self.get_format(request)
data_kwargs = {}
if _format == "csv":
return await self.as_csv(request, database)
if _format is None:
# HTML views default to expanding all foreign key labels
data_kwargs["default_labels"] = True
extra_template_data = {}
start = time.perf_counter()
status_code = None
templates = []
try:
response_or_template_contexts = await self.data(request, **data_kwargs)
if isinstance(response_or_template_contexts, Response):
return response_or_template_contexts
# If it has four items, it includes an HTTP status code
if len(response_or_template_contexts) == 4:
(
data,
extra_template_data,
templates,
status_code,
) = response_or_template_contexts
else:
data, extra_template_data, templates = response_or_template_contexts
except QueryInterrupted:
raise DatasetteError(
"""
SQL query took too long. The time limit is controlled by the
<a href="https://docs.datasette.io/en/stable/settings.html#sql-time-limit-ms">sql_time_limit_ms</a>
configuration option.
""",
title="SQL Interrupted",
status=400,
message_is_html=True,
)
except (sqlite3.OperationalError, InvalidSql) as e:
raise DatasetteError(str(e), title="Invalid SQL", status=400)
except sqlite3.OperationalError as e:
raise DatasetteError(str(e))
except DatasetteError:
raise
end = time.perf_counter()
data["query_ms"] = (end - start) * 1000
for key in ("source", "source_url", "license", "license_url"):
value = self.ds.metadata(key)
if value:
data[key] = value
# Special case for .jsono extension - redirect to _shape=objects
if _format == "jsono":
return self.redirect(
request,
path_with_added_args(
request,
{"_shape": "objects"},
path=request.path.rsplit(".jsono", 1)[0] + ".json",
),
forward_querystring=False,
)
if _format in self.ds.renderers.keys():
# Dispatch request to the correct output format renderer
# (CSV is not handled here due to streaming)
result = call_with_supported_arguments(
self.ds.renderers[_format][0],
datasette=self.ds,
columns=data.get("columns") or [],
rows=data.get("rows") or [],
sql=data.get("query", {}).get("sql", None),
query_name=data.get("query_name"),
database=database,
table=data.get("table"),
request=request,
view_name=self.name,
# These will be deprecated in Datasette 1.0:
args=request.args,
data=data,
)
if asyncio.iscoroutine(result):
result = await result
if result is None:
raise NotFound("No data")
if isinstance(result, dict):
r = Response(
body=result.get("body"),
status=result.get("status_code", status_code or 200),
content_type=result.get("content_type", "text/plain"),
headers=result.get("headers"),
)
elif isinstance(result, Response):
r = result
if status_code is not None:
# Over-ride the status code
r.status = status_code
else:
assert False, f"{result} should be dict or Response"
else:
extras = {}
if callable(extra_template_data):
extras = extra_template_data()
if asyncio.iscoroutine(extras):
extras = await extras
else:
extras = extra_template_data
url_labels_extra = {}
if data.get("expandable_columns"):
url_labels_extra = {"_labels": "on"}
renderers = {}
for key, (_, can_render) in self.ds.renderers.items():
it_can_render = call_with_supported_arguments(
can_render,
datasette=self.ds,
columns=data.get("columns") or [],
rows=data.get("rows") or [],
sql=data.get("query", {}).get("sql", None),
query_name=data.get("query_name"),
database=database,
table=data.get("table"),
request=request,
view_name=self.name,
)
it_can_render = await await_me_maybe(it_can_render)
if it_can_render:
renderers[key] = self.ds.urls.path(
path_with_format(
request=request, format=key, extra_qs={**url_labels_extra}
)
)
url_csv_args = {"_size": "max", **url_labels_extra}
url_csv = self.ds.urls.path(
path_with_format(request=request, format="csv", extra_qs=url_csv_args)
)
url_csv_path = url_csv.split("?")[0]
context = {
**data,
**extras,
**{
"renderers": renderers,
"url_csv": url_csv,
"url_csv_path": url_csv_path,
"url_csv_hidden_args": [
(key, value)
for key, value in urllib.parse.parse_qsl(request.query_string)
if key not in ("_labels", "_facet", "_size")
]
+ [("_size", "max")],
"datasette_version": __version__,
"settings": self.ds.settings_dict(),
},
}
if "metadata" not in context:
context["metadata"] = self.ds.metadata
r = await self.render(templates, request=request, context=context)
if status_code is not None:
r.status = status_code
ttl = request.args.get("_ttl", None)
if ttl is None or not ttl.isdigit():
ttl = self.ds.setting("default_cache_ttl")
return self.set_response_headers(r, ttl)

All the more reason to get rid of that BaseView -> DataView -> TableView hierarchy entirely:

@simonw simonw added the blocked Awaiting something else to happen first label Mar 19, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Marking this as blocked until #1660 is done.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Right now if a database has a . in its name e.g. fixtures.dot the URL to that database is:

/fixtures~2Edot

But the output on /-/databases doesn't reflect that, it still shows the name with the dot.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

The output of /.json DOES use path to mean the URL path, not the path to the file on disk:

{
  "fixtures.dot": {
    "name": "fixtures.dot",
    "hash": null,
    "color": "631f11",
    "path": "/fixtures~2Edot",

So that's a problem already: having db.path refer to something different from that JSON is inconsistent.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

I'm inclined to redefine ds.path to ds.file_path to fix this. Or ds.filepath.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

The docs do currently describe path as the filesystem path here: https://docs.datasette.io/en/stable/internals.html#database-class

image

Good thing I'm not at 1.0 yet so I can change that!

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Pretty sure changing it will break some existing plugins though, including likely Datasette Desktop.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

I'm going to keep path as the path to the file on disk. I'll pick a new name for what is currently path in that undocumented JSON API.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

I'm trying to think if there's any reason not to use route for this. Would I possibly want to use that noun for something else in the future? I like it more than route_path because it has no underscore.

Decision made: I'm going with route.

@simonw simonw changed the title Introduce concept of a database route path, separate from its name Introduce concept of a database route, separate from its name Mar 19, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

I think I've got this working but I need to write a test for it that covers the rare case when the route is not the same thing as the database name.

I'll do that with a new test.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Need to update documentation in a few places - e.g. https://docs.datasette.io/en/stable/internals.html#remove-database-name

This removes a database that has been previously added. name= is the unique name of that database, used in its URL path.

@simonw
Copy link
Owner Author

simonw commented Mar 19, 2022

Also need to update the datasette.urls methods that construct the URL to a database/table/row - they take the database name but they need to know to look for the route.

Need to add tests that check the links in the HTML and can confirm this is working correctly.

@simonw
Copy link
Owner Author

simonw commented Mar 20, 2022

I'd like to have a live demo of this up on latest.datasette.io too.

@simonw
Copy link
Owner Author

simonw commented Mar 20, 2022

I'm going to add a fixtures2.db database which has that as the name but alternative-route as the route. I'll set that up using a custom plugin in the plugins/ folder that gets deployed by https://github.com/simonw/datasette/blob/main/.github/workflows/deploy-latest.yml

@simonw
Copy link
Owner Author

simonw commented Mar 20, 2022

Building this plugin instantly revealed that all of the links - on the homepage and the database page and so on - are incorrect:

from datasette import hookimpl

@hookimpl
def startup(datasette):
    db = datasette.get_database("fixtures2")
    db.route = "alternative-route"

simonw added a commit that referenced this issue Mar 20, 2022
@simonw
Copy link
Owner Author

simonw commented Mar 20, 2022

@simonw
Copy link
Owner Author

simonw commented Mar 20, 2022

I'm going to release this as a 0.61 alpha so I can more easily depend on it from datasette-hashed-urls.

simonw added a commit that referenced this issue Mar 20, 2022
simonw added a commit to simonw/datasette-hashed-urls that referenced this issue Mar 20, 2022
@simonw simonw closed this as completed Mar 20, 2022
simonw added a commit that referenced this issue Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Awaiting something else to happen first enhancement
Projects
None yet
Development

No branches or pull requests

1 participant