-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql cells, backed by DuckDB #844
Conversation
Neat! I wonder if sql is the right name for the list of data sources; it could be interesting to make it more generic, since these data sources could be consumed by something else than sql in the future (maybe… a smart data table)? |
28e8947
to
89ddd07
Compare
I want to use the name ---
sql:
engine: sqlite
tables: chinook.db
--- As for the display, I think we would change |
880f771
to
59cc841
Compare
d638eef
to
498f9d3
Compare
The interactive view tends to demultiply the number of tables sql-view.mp4 |
Feature requests (probably for future iterations):
We'll probably need to add some helper code similar to this snippet?
For example: ```yaml
sql:
adresses: https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet
``` (this currently fails with Error: Invalid Error: Opening file 'table-adresses-reu.parquet' failed with error: Failed to open file: table-adresses-reu.parquet)
|
Yes, I mentioned the race condition in Slack. Is because there promises resolve out of order, and you can get a display from an older query after an earlier query. We need to handle this race condition in the display function (by ignoring the call). I can work on this later today but feel free to take a crack at it if you’re interested. |
On your requested enhancements, don’t forget that you can just override the definition of const db = await DuckDBClient.of({gaia: FileAttachment("./lib/gaia-sample.parquet")});
const sql = db.sql.bind(db); Seems reasonable to support external URLs in front matter, though. I think the challenge is that we’ll have to guess the format from the file extension. Perhaps it’s better to just use another SQL cell to insert the remote data: CREATE TABLE addresses
AS SELECT *
FROM read_parquet('https://static.data.gouv.fr/resources/bureaux-de-vote-et-adresses-de-leurs-electeurs/20230626-135723/table-adresses-reu.parquet')
LIMIT 100 But then we’d need the other SQL cells to block on the SQL cell that defines the |
On the race condition, I have filed a bug and a simple reproduction in #995 (which confirms that it’s a pre-existing problem, so we could ignore it in this PR for now — say by switching to a radio button instead of a range slider). It’ll also be a bit tricky to fix, I expect; see the linked issue for discussion. |
I could imagine passing it as an option in a long-form description of the source, for when it's opaque: sql:
- tableName:
- source: https://<something opaque>
- format: parquet |
I would not have guessed this invocation, but we can add syntactic sugar later. |
```sql id=[{min}] | ||
SELECT MIN(phot_g_mean_mag) AS min FROM gaia | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have liked this to be extensible to define multiple values, but it doesn't seem to work:
```sql id=[{min, max}]
SELECT MIN(phot_g_mean_mag) AS min, MAX(phot_g_mean_mag) AS max FROM gaia
```
defines only min
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that’s surprising. I would expect that to work and I’ll investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact it works, I was editing the wrong code block! 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could update that example in the documentation to include min and max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so the attribute gotcha applies, which means you either need quotes
```sql id="[{min, max}]"
SELECT MIN(phot_g_mean_mag) AS min, MAX(phot_g_mean_mag) AS max FROM gaia
```
or to remove spaces
```sql id=[{min,max}]
SELECT MIN(phot_g_mean_mag) AS min, MAX(phot_g_mean_mag) AS max FROM gaia
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is truly magnificent
We could add this shorthand in the future: const sql = DuckDBClient.sql({gaia: FileAttachment("./lib/gaia-sample.parquet")}); |
Fixes #48. TODO
sql
tagged-template literal implemented bynpm:@observablehq/duckdb
sql
needs implicit dependency on duckdbsql
literal needs a sharedDuckDBClient
under the hood (not a new one for each query!)sql
frontmatter option to declare which tables should be available tosql
registerTable
calls, or some such, to register these tables forsql
id
) the output of asql
cell for access in JavaScript?id
is invalidid=[{foo}]
)getResolvers
sql
when front matter is edited during previewdisplay
directivesupport DuckDB files as source?[future]I’m imagining the frontmatter would look something like this:
But possibly you can declare the tables individually:
And in any case we’ll need to make sure you can use a data loader to generate the backing files (e.g.,
customers.csv.js
). And ideally we’d correctly handle incremental edits to the frontmatter during preview, un-registering and re-registering tables, and hot module replacement to the tables 😅 with file watching. But if we have to reload the page in the interim I think this would still be a great start.A challenge with
sql.init
will be avoiding the race condition — if we start to process asql
query before we’ve registered the tables, the results will be wrong. But if we’re going to do reactive updates to thesql
definition anyway, maybe the answer is thatsql
should be defined as a generator, and any call to register or unregister a table will cause this generator to yield a new value, and then reactively reevaluate all the downstream queries. That sounds like the right path since it leverages reactivity (though it might mean some brief flashes of “wrong” results?).One way to implement a better display for
sql
cells would be to use asql.table
tagged-template literal instead ofsql
(like we do fortex.block
). We could also change the generated code under the hood so instead of this:it’s more like this:
As for naming the output, that could be something like: