Data loaders #38
Comments
It would be great if these could support directories as well. E.g. DuckDB's import/export database requires multiple files.
We could do that using a zip file, I imagine.
That would negate the benefits of using DuckDB to stream data.
A static file server won’t support file listing, so you’ll need to statically list the files anyway. Therefore I expect you’ll need multiple data loaders so that the set of generated files is statically analyzable by the CLI.
Here is an example for a set of files produced by DuckDB's
They are loaded by specifying the parent folder (here
const db = await DuckDBClient.of();
const conn = await db._db.connect();
await conn.query(`IMPORT DATABASE 'http://localhost:3000/_file/cache/merge-queue/db'`);
Can you create three data loaders?
You can share code between them, of course.
They would need to share state. The build process goes roughly like this:
A practical example can be found in this BI branch (WIP but working). Ideally we would just produce a persistent database file as an artifact (also demoed in that branch), but DuckDB's storage format is still in flux and very finicky about mismatched DuckDB versions.
We should formalize the concept of a data loader: a script that runs during build (and as needed during preview) to materialize a file attachment.
As an initial sketch, it could be something like this:
[…] package.json), but we want to support arbitrary executables for a polyglot workflow, e.g. writing a data loader in R, Python, Julia, Zig, etc.
[…] FileAttachment or fetch); nothing on the client indicates any difference between a “static” file and a file that is generated by a data loader.
[…] docs/foo.csv, the corresponding data loader would be named docs/foo.csv.ts or docs/foo.csv.py.
[…] .json.ts) is used to distinguish data loaders from static files.
[…] docs/foo.csv.sql. For this we’d need metadata to specify which database to use, and we’d infer the output format (such as CSV) from the file name.
During preview:
During build:
The file-based routing approach described above isn’t a requirement. But it does have some nice properties that we should seek to maintain:
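To make the naming convention from the proposal concrete (a data loader for docs/foo.csv named docs/foo.csv.ts or docs/foo.csv.py, with the output format inferred from the file name), here is a minimal sketch. The function names `loaderCandidates` and `parseLoaderPath` and the `LOADER_EXTENSIONS` list are hypothetical, chosen only for illustration. They are not part of any existing CLI, and the extension list simply mirrors the examples mentioned in this issue.

```javascript
// Hypothetical sketch of the double-extension convention; not an existing API.
// Assumed set of loader extensions, taken from the examples in this issue.
const LOADER_EXTENSIONS = [".ts", ".py", ".sql"];

// Given a requested file (e.g. "docs/foo.csv"), list the data loader paths
// that could generate it under the double-extension convention.
function loaderCandidates(targetPath) {
  return LOADER_EXTENSIONS.map((ext) => targetPath + ext);
}

// Given a path, decide whether it looks like a data loader. If so, recover
// the file it would generate and the output format implied by its name.
function parseLoaderPath(path) {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const parts = name.split(".");
  // A data loader needs a base name plus two extensions, e.g. foo.csv.ts
  if (parts.length < 3 || parts[0] === "") return null;
  const loaderExt = "." + parts[parts.length - 1];
  if (!LOADER_EXTENSIONS.includes(loaderExt)) return null;
  return {
    target: path.slice(0, -loaderExt.length), // e.g. "docs/foo.csv"
    format: parts[parts.length - 2],          // e.g. "csv"
    loaderExt,                                // e.g. ".sql"
  };
}

console.log(loaderCandidates("docs/foo.csv"));
console.log(parseLoaderPath("docs/foo.csv.sql"));
console.log(parseLoaderPath("docs/foo.csv")); // a static file, not a loader
```

Because the mapping depends only on file names, the set of generated files can be derived without running any loader, which is what the static-analyzability point raised in the comments calls for.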