File uploads
The fastest path from "I have a CSV" to "I'm querying it."
Drag a file anywhere on the app → it lands as a queryable DuckDB
table. Drop `customers.csv` → `SELECT * FROM customers` works
immediately, no `read_csv_auto('…')` calls required.
CSV · TSV · TXT · JSON · JSONL · NDJSON · Parquet
The connector picks the right DuckDB reader based on file extension:
| Extension | Reader |
|---|---|
| `.csv` / `.tsv` / `.txt` | `read_csv_auto` |
| `.json` / `.jsonl` / `.ndjson` | `read_json_auto` |
| `.parquet` | `read_parquet` |
Excel (.xlsx) is not supported yet — DuckDB needs its excel
extension autoloaded. See Roadmap.
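
The extension-to-reader mapping in the table above could be expressed as a simple lookup. This is a minimal sketch, not the connector's actual code: the names `READERS` and `reader_for` are hypothetical, and only the mapping itself comes from the table.

```python
from pathlib import Path

# Extension -> DuckDB reader, copied from the table above.
# READERS and reader_for are hypothetical names used for illustration only.
READERS = {
    ".csv": "read_csv_auto",
    ".tsv": "read_csv_auto",
    ".txt": "read_csv_auto",
    ".json": "read_json_auto",
    ".jsonl": "read_json_auto",
    ".ndjson": "read_json_auto",
    ".parquet": "read_parquet",
}


def reader_for(filename: str) -> str:
    """Return the DuckDB reader function name for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext not in READERS:
        # Covers .xlsx and anything else that is not in the table.
        raise ValueError(f"unsupported file type: {filename!r}")
    return READERS[ext]
```

Matching on the lowercased suffix is an assumption here; it means `Orders.CSV` and `orders.csv` would pick the same reader.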
- You drop a file. The frontend POSTs it to `/api/files/upload` as multipart form data.
- The backend (`rednotebook/uploads/store.py`) saves it under `local_data/uploads/<user-id>/<uuid>.<ext>` and adds a manifest entry.
- The table name is sanitised: lowercase, alnum + underscore, leading digits prefixed, collisions resolved (`customers_2`, `customers_3`).
- On every subsequent query, the DuckDB connector reads the manifest and emits `CREATE OR REPLACE VIEW <table_name> AS SELECT * FROM read_<ext>('/abs/path')` per file before running the user's SQL.
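
The name sanitisation and view statement described in the steps above might look roughly like the following. It is a sketch under stated assumptions, not the code in `store.py`: the function names, the `t_` prefix for leading digits, and the choice to replace disallowed characters with underscores are all hypothetical. Only the rules themselves (lowercase, alnum + underscore, digit prefixing, `_2` / `_3` collision suffixes) and the shape of the `CREATE OR REPLACE VIEW` statement come from the text.

```python
import re
from pathlib import Path


def sanitise_table_name(filename: str, taken: set[str]) -> str:
    """Derive a table name from a filename, avoiding names already in `taken`."""
    stem = Path(filename).stem.lower()
    # Assumption: anything outside [a-z0-9_] becomes an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", stem)
    # Assumption: "t_" is the prefix used when the name is empty or starts with a digit.
    if not name or name[0].isdigit():
        name = f"t_{name}"
    candidate, suffix = name, 2
    while candidate in taken:
        candidate = f"{name}_{suffix}"  # customers_2, customers_3, ...
        suffix += 1
    return candidate


def view_sql(table_name: str, reader: str, abs_path: str) -> str:
    """Build the per-file view statement emitted before the user's SQL."""
    escaped = abs_path.replace("'", "''")  # double single quotes inside the SQL literal
    return f"CREATE OR REPLACE VIEW {table_name} AS SELECT * FROM {reader}('{escaped}')"
```

For instance, `sanitise_table_name("customers.csv", {"customers"})` yields `customers_2` in this sketch, matching the collision pattern listed above.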
The views are session-local to each query, so:
- They're always fresh.
- A renamed / deleted file disappears on the next query without a reconnect.
- The connector layer never holds a long-lived connection.
- 200 MB per file. Streams in 1 MiB chunks during upload so memory doesn't double.
- Per-user. The manifest is scoped to the request's user; one user's uploads are not visible to another.
- DuckDB connections only. Postgres / Snowflake / etc. don't auto-register the views — those engines can't query a local file the way DuckDB can.
- The Files panel in the left sidebar lists every uploaded file with its current table name + original filename + size.
- Hover a row → click the trash icon to delete.
- A rename endpoint exists (`PATCH /api/files/<id>`); UI for it is on the Issue #X follow-up list.
- Drag `orders.csv` onto the canvas. Toast confirms ``Ready: `orders` ``.
- In a SQL cell:

      SELECT date_trunc('week', order_date)::date AS week,
             region,
             COUNT(*) AS orders,
             ROUND(SUM(revenue), 2) AS revenue
      FROM orders
      GROUP BY 1, 2
      ORDER BY 1, 2

- Hit Run. Result populates. Click Profile for distributions.
- Click Summarize result for an AI brief grounded in the actual rows.
- Click Publish to share the notebook + result snapshot publicly.

That's the analyst-favourite five-minute loop.