Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuckDBClient.of should support the native file format #1057

Closed
bayre opened this issue Mar 14, 2024 · 4 comments · Fixed by #1065
Closed

DuckDBClient.of should support the native file format #1057

bayre opened this issue Mar 14, 2024 · 4 comments · Fixed by #1065
Labels
enhancement New feature or request

Comments

@bayre
Copy link

bayre commented Mar 14, 2024

DuckDB has a native file format which can be used to create a persistent database. In my case I have a data pipeline in dbt-duckdb and the output is a single named file that I'd like to analyse with framework. I've got it working locally with the following addition to stdlib/duckdb.js:

if (/\.duckdb$/i.test(file.name)) {
    return await connection.query(`ATTACH '${file.name}' AS ${name} (READ_ONLY)`);
}

It's worth noting that the file extension appears to be arbitrary, and this approach would not support .db or .ddb files. It's also possible that users may want to attach other types of database to the client, but that's probably worth a separate issue.

@bayre bayre added the enhancement New feature or request label Mar 14, 2024
@Fil
Copy link
Contributor

Fil commented Mar 15, 2024

About the db file format: ideally DuckDB should auto-detect the type (based on the file's magic number, I guess), and we would not need to tell it explicitly.

The magic number for sqlite3 files is "53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00" (zero-terminated ASCII "SQLite format 3").

The magic number for duckdb files (reference) looks a bit strange to me, as it does not start at the first byte of the file.

Of course we could imagine detecting this server-side and saving the information in the file reference mimeType, or loading the first bytes in the client and checking the magic number—but I think it's more an issue for DuckDB than for Framework.

In the meantime we should support this only for explicit file names (.duckdb and .sqlite3), and not support .db.

PS: to support sqlite3 in DuckDB, we'd also need to load the sqlite3 extension.

@Fil
Copy link
Contributor

Fil commented Mar 15, 2024

A second remark is that this imports a complete database—as opposed to a single table, as we do for parquet and arrow files. It's not a problem, it works perfectly, but it means we have to change our idea that DuckDBClient.of({ x : …}) or the equivalent sql front matter maps x to a table. In this case it will also map x to a table prefix (the actual table name being x."TABLENAME".)

For instance, with your patch, we can import the database.duckdb file under the base name:

const db = DuckDBClient.of({base: FileAttachment("database.duckdb")});

And thanks to duckdb’s "hierarchical" handling of table names, we retrieve the data with:

display([...await db.sql`SELECT * FROM base.TEST`]) // [Row, Row…]

Similarly (and also, much simpler), your patch enables this:

---
sql:
  base: database.duckdb
---

```sql
SELECT * FROM base.TEST
```

@bayre
Copy link
Author

bayre commented Mar 15, 2024

Thanks @Fil. The latter is how I'm using it currently. Very satisfying, but a clear departure from the client creating an in memory database from a list of sources. My sense is that it might be worth being more explicit about attachments, even if my current solution does work.

@Fil
Copy link
Contributor

Fil commented Mar 15, 2024

In fact checking the magic number was not that hard, and allows to support any file extension. In #1065 I still limit the scope a bit to well-known extensions, but it's not a requirement and we could trust the user instead. (Since anyway it's the last thing we try before erroring.)

mbostock added a commit that referenced this issue Mar 18, 2024
* Add support for DuckDB database files to sql and DuckDBClient.of

closes #1057

* let DuckDB handle any other file as a database file

* document

* a bit more doc

* clarify attach, append an example database and the associated (but inert) data loader.

* doc edits

* .{db,ddb,duckdb}

---------

Co-authored-by: Mike Bostock <mbostock@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants