File uploads

Sanni heruwala edited this page Jun 12, 2026 · 1 revision

The fastest path from "I have a CSV" to "I'm querying it."

Drag a file anywhere on the app → it lands as a queryable DuckDB table. Drop customers.csv and SELECT * FROM customers works immediately, with no read_csv_auto('…') call required.


What's supported

CSV · TSV · TXT · JSON · JSONL · NDJSON · Parquet

The connector picks the right DuckDB reader based on file extension:

Extension                   Reader
.csv / .tsv / .txt          read_csv_auto
.json / .jsonl / .ndjson    read_json_auto
.parquet                    read_parquet

Excel (.xlsx) is not supported yet — DuckDB needs its excel extension autoloaded. See Roadmap.
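
The extension-to-reader mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not the connector's actual code: the names READERS and reader_for are hypothetical, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path

# Mapping copied from the table above. READERS and reader_for are
# illustrative names, not the connector's real identifiers.
READERS = {
    ".csv": "read_csv_auto",
    ".tsv": "read_csv_auto",
    ".txt": "read_csv_auto",
    ".json": "read_json_auto",
    ".jsonl": "read_json_auto",
    ".ndjson": "read_json_auto",
    ".parquet": "read_parquet",
}


def reader_for(filename: str) -> str:
    """Return the DuckDB reader function name for a file, chosen by extension."""
    # Assumption: extensions are compared case-insensitively.
    ext = Path(filename).suffix.lower()
    if ext not in READERS:
        # Unsupported types (for example .xlsx) are rejected up front.
        raise ValueError(f"unsupported file type: {filename!r}")
    return READERS[ext]
```

With this sketch, reader_for("customers.csv") yields read_csv_auto, while an .xlsx file raises ValueError, matching the "not supported yet" note.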


How it works under the hood

  1. You drop a file. The frontend POSTs it to /api/files/upload as multipart form data.
  2. The backend (rednotebook/uploads/store.py) saves it under local_data/uploads/<user-id>/<uuid>.<ext> and adds a manifest entry.
  3. The table name is sanitised: lowercased, restricted to alphanumerics and underscores, prefixed if it starts with a digit, and de-duplicated with a numeric suffix on collision (customers_2, customers_3).
  4. On every subsequent query, the DuckDB connector reads the manifest and emits CREATE OR REPLACE VIEW <table_name> AS SELECT * FROM read_<ext>('/abs/path') per file before running the user's SQL.
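
The sanitisation rules in step 3 might look roughly like the sketch below. Several details are assumptions, because the page does not specify them: disallowed characters are replaced with an underscore, the prefix for a leading digit is "t_", and an empty result falls back to "file". The function name is hypothetical as well.

```python
import re
from pathlib import Path


def sanitise_table_name(filename: str, taken: set[str]) -> str:
    """Derive a table name from a filename, avoiding names already in `taken`."""
    # Lowercase the filename without its extension.
    stem = Path(filename).stem.lower()
    # Assumption: anything outside [a-z0-9_] becomes an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", stem) or "file"
    # Assumption: "t_" is the prefix used when the name starts with a digit.
    if name[0].isdigit():
        name = "t_" + name
    # Resolve collisions as customers_2, customers_3, ...
    candidate, n = name, 2
    while candidate in taken:
        candidate = f"{name}_{n}"
        n += 1
    return candidate
```

Under these assumptions, uploading customers.csv twice gives customers and then customers_2, and "2024 Sales-Q1.csv" becomes t_2024_sales_q1.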

The views are session-local to each query, so:

  • They're always fresh.
  • A renamed / deleted file disappears on the next query without a reconnect.
  • The connector layer never holds a long-lived connection.
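
As a rough sketch of the per-query view registration described in step 4, the statements could be generated from manifest entries like this. The ManifestEntry shape and the view_statements name are hypothetical; the real manifest format is not shown on this page. Only the statement text follows the CREATE OR REPLACE VIEW pattern given above.

```python
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    """Hypothetical manifest record; the real manifest format may differ."""
    table_name: str  # already sanitised
    path: str        # absolute path of the stored upload
    reader: str      # e.g. "read_csv_auto"


def view_statements(entries: list[ManifestEntry]) -> list[str]:
    """Build one CREATE OR REPLACE VIEW statement per uploaded file."""
    statements = []
    for entry in entries:
        # Double any single quotes so the path is a valid SQL string literal.
        quoted_path = entry.path.replace("'", "''")
        statements.append(
            f"CREATE OR REPLACE VIEW {entry.table_name} AS "
            f"SELECT * FROM {entry.reader}('{quoted_path}')"
        )
    return statements
```

Each statement would be executed on the query's own DuckDB connection before the user's SQL runs. This is why a deleted file simply produces no view on the next query.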

Limits

  • 200 MB per file. Uploads are streamed in 1 MiB chunks, so the server never buffers the whole file in memory.
  • Per-user. The manifest is scoped to the request's user; one user's uploads are not visible to another.
  • DuckDB connections only. Postgres / Snowflake / etc. don't auto-register the views — those engines can't query a local file the way DuckDB can.
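
The size limit and chunked streaming could be combined along the following lines. This is a sketch under stated assumptions, not the code in store.py: the names are hypothetical, and "200 MB" is interpreted here as 200 × 1024 × 1024 bytes, which the page does not confirm.

```python
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024       # 1 MiB, as stated above
MAX_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" read as 200 MiB


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the per-file limit."""


def copy_with_limit(
    src: BinaryIO,
    dst: BinaryIO,
    max_bytes: int = MAX_BYTES,
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Copy src to dst chunk by chunk; return the number of bytes written."""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return written
        written += len(chunk)
        # Check before writing so nothing past the limit reaches dst.
        if written > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        dst.write(chunk)
```

Only one chunk is held in memory at a time. A caller would be expected to delete the partially written file when UploadTooLarge is raised.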

Renaming and removing

  • The Files panel in the left sidebar lists every uploaded file with its current table name + original filename + size.
  • Hover a row → click the trash icon to delete.
  • A rename endpoint exists (PATCH /api/files/<id>) — UI for it is on the Issue #X follow-up list.
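
Until the rename UI lands, the endpoint can be called directly. The sketch below only builds the request. The JSON body with a "table_name" field is an assumption, since this page documents the route but not its payload, and the helper name and base URL are hypothetical too. Authentication is omitted.

```python
import json
import urllib.request


def build_rename_request(
    base_url: str, file_id: str, new_table_name: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH /api/files/<id> request."""
    # Assumption: the endpoint accepts JSON with a "table_name" field.
    body = json.dumps({"table_name": new_table_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/files/{file_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to urllib.request.urlopen would send it. Check the backend's route handler for the real payload shape before relying on this.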

Example flow

  1. Drag orders.csv onto the canvas. A toast confirms Ready: `orders`.

  2. In a SQL cell:

    SELECT
      date_trunc('week', order_date)::date AS week,
      region,
      COUNT(*)                              AS orders,
      ROUND(SUM(revenue), 2)                AS revenue
    FROM orders
    GROUP BY 1, 2
    ORDER BY 1, 2
  3. Hit Run. Result populates. Click Profile for distributions.

  4. Click Summarize result — AI brief grounded in the actual rows.

  5. Click Publish to share the notebook + result snapshot publicly.

That's the analyst-favourite five-minute loop.
