Duckle v0.2.0
Duckle v0.2.0 - local-first, embedded ETL on DuckDB.
Highlights
Universal upsert + CDC delete propagation
- mode = upsert (MERGE) on every relational sink: PostgreSQL, MySQL/MariaDB,
CockroachDB, SQL Server, Oracle, Snowflake, Databricks, DuckDB and SQLite,
plus MongoDB (replace_one).
- Optional delete propagation: a flag column (e.g. a CDC change type) removes
matched rows from the target instead of upserting them. Verified live in
Docker for SQL Server, Oracle, MySQL, Snowflake and DuckDB.
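To make the upsert and delete-propagation semantics concrete, here is a minimal Python sketch against an in-memory SQLite table. It illustrates the delete-by-key + re-insert idea only and is not Duckle's engine code. The function name, the change_type column and the "insert" / "update" / "delete" flag values are assumptions chosen for the example.

```python
import sqlite3


def upsert_with_deletes(conn, table, rows, key,
                        flag_col="change_type", delete_value="delete"):
    """Delete-by-key + re-insert. Rows flagged as deletes are removed only."""
    if not rows:
        return
    # The flag column steers the write; it is not stored in the target.
    data_cols = [c for c in rows[0] if c != flag_col]
    keys = [(r[key],) for r in rows]
    keep = [tuple(r[c] for c in data_cols)
            for r in rows if r[flag_col] != delete_value]
    placeholders = ", ".join("?" for _ in data_cols)
    with conn:  # one transaction: either the whole batch lands or none of it
        conn.executemany(f"DELETE FROM {table} WHERE {key} = ?", keys)
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(data_cols)}) "
            f"VALUES ({placeholders})",
            keep,
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Bob")])
    batch = [
        {"id": 1, "name": "Ada L.", "change_type": "update"},  # overwrite
        {"id": 2, "name": "Bob", "change_type": "delete"},     # propagate delete
        {"id": 3, "name": "Cy", "change_type": "insert"},      # new row
    ]
    upsert_with_deletes(conn, "customers", batch, key="id")
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

Calling demo() leaves ids 1 and 3 in the table: row 1 is rewritten, row 2 is removed because of its flag, and row 3 is added.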
Change data capture + incremental
- DuckLake CDC change-feed source (src.ducklake.changes): replays
table_changes() since the last consumed snapshot, emitting a change_type
column - pairs with delete propagation for a true mirror.
- Watermark incremental load (xf.incremental): processes only rows past the
last successful run's high-water mark, saved to workspace state.
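The watermark pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not the xf.incremental implementation: the JSON state file, its high_water_mark key and the run_incremental name are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def run_incremental(rows, column, state_path, process):
    """Process only rows past the saved high-water mark, then advance it."""
    state_path = Path(state_path)
    last = None
    if state_path.exists():
        last = json.loads(state_path.read_text())["high_water_mark"]
    fresh = [r for r in rows if last is None or r[column] > last]
    if fresh:
        process(fresh)  # if this raises, the mark below is never advanced
        new_mark = max(r[column] for r in fresh)
        state_path.write_text(json.dumps({"high_water_mark": new_mark}))
    return fresh


def demo():
    rows = [
        {"id": 1, "updated_at": "2026-06-01"},
        {"id": 2, "updated_at": "2026-06-02"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "watermark.json"
        first = run_incremental(rows, "updated_at", state, lambda batch: None)
        rows.append({"id": 3, "updated_at": "2026-06-03"})
        second = run_incremental(rows, "updated_at", state, lambda batch: None)
    return [r["id"] for r in first], [r["id"] for r in second]
```

The first run sees every row; the second run sees only the row added after the saved mark. Writing the mark only after process() returns is what ties it to the last successful run.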
Orchestration + canvas
- Run Job (parent calls a child pipeline with context variables) and
Parallelize (independent downstream branches run concurrently; 0 = auto, one
branch per CPU core).
- Control-flow nodes: Log, Warn, Die.
- Undo/redo across nodes, edges and settings; Ctrl/Cmd+S save; component-level
run logs written under the workspace.
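The Parallelize setting (0 = auto, one branch per CPU core) could be modeled roughly as below. This is a sketch with a thread pool, not how Duckle schedules branches; resolve_workers and run_branches are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(setting):
    """Map the setting to a worker count: 0 means one per CPU core."""
    if setting < 0:
        raise ValueError("parallelism must be 0 (auto) or a positive number")
    return setting if setting > 0 else (os.cpu_count() or 1)


def run_branches(branches, setting=0):
    """Run independent branches concurrently and collect results by name."""
    workers = min(resolve_workers(setting), max(len(branches), 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in branches.items()}
        # result() re-raises a branch's exception, so a failure is not lost.
        return {name: future.result() for name, future in futures.items()}
```

For example, run_branches({"orders": load_orders, "customers": load_customers}) would run both callables at once, provided neither depends on the other's output.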
Look and feel
- DuckDB-aligned theme in light and dark modes: lemon-yellow / orange brand
fills for primary actions and selection.
Install
Download the raw executable for your OS below and run it - no installer.
Enterprise / corporate networks (v0.2.0, refreshed 2026-06-05)
Duckle's HTTP clients now trust the operating-system certificate store in
addition to the bundled Mozilla roots, so it works behind a TLS-inspecting
corporate proxy (Zscaler, Netskope, ...) whose CA is installed in the OS
store, instead of failing with invalid peer certificate: UnknownIssuer.
This covers the engine + model downloads and the REST / cloud-API / warehouse
connectors plus the update and CI checks. The trust set is a strict superset
of the previous one, so machines without a corporate CA are unaffected.
- New DUCKLE_CA_CERT env var: point it at a PEM bundle to trust an extra CA
explicitly (split-tunnel setups, or a CA handed out as a file).
- DuckDB's own extension downloads (extensions.duckdb.org) and cloud reads
(S3 / GCS / Azure) run inside the DuckDB engine with its own TLS, so also
allow / exempt extensions.duckdb.org from inspection for those.
- The Duckie AI assistant remains optional - only it needs huggingface.co.
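The "system trust plus an optional extra bundle" idea looks roughly like this in Python. It is an analogue for illustration only, not Duckle's HTTP client setup. The only name taken from the notes above is DUCKLE_CA_CERT; build_tls_context is hypothetical.

```python
import os
import ssl


def build_tls_context(env=os.environ):
    """Trust the system's default CAs, plus an extra PEM bundle if configured."""
    # create_default_context() loads the platform's default trusted CAs and
    # enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    extra_bundle = env.get("DUCKLE_CA_CERT")
    if extra_bundle:
        # Adds to the trust set; nothing already trusted is removed.
        ctx.load_verify_locations(cafile=extra_bundle)
    return ctx
```

Because the extra bundle is only ever added, a machine without the variable set behaves exactly as before, which mirrors the "strict superset" property described above.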
Thanks to @DarekDan
Fixes (v0.2.0, refreshed 2026-06-06)
- CSV / TSV reject port (#15): a delimited source with a declared typed
column now routes rows that fail to parse (e.g. an invalid date) to its
reject output as raw text, so they can be written straight to a separate
CSV for review. Valid rows flow on, typed, instead of the whole read
aborting on a single bad value. Wire the source's reject output to a sink's
main input. Pipelines that do not wire the reject port are unchanged.
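A minimal Python sketch of the reject-port behavior: rows whose typed column fails to parse go to a reject list as raw text, and valid rows continue with the parsed value. The split_rejects name and the sample data are made up for the example; Duckle's own parsing runs inside DuckDB.

```python
import csv
import io
from datetime import date


def split_rejects(text, column, parse=date.fromisoformat):
    """Return (header, typed valid rows, raw rejected rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    idx = header.index(column)
    valid, rejects = [], []
    for raw in reader:
        try:
            typed = list(raw)
            typed[idx] = parse(raw[idx])
        except (ValueError, IndexError):
            rejects.append(raw)  # untouched raw text, ready for a review CSV
            continue
        valid.append(typed)
    return header, valid, rejects


SAMPLE = "id,signup\n1,2026-06-01\n2,not-a-date\n3,2026-06-03\n"
```

With SAMPLE, rows 1 and 3 come back with a real date in the signup column, and row 2 lands in the reject list unchanged instead of aborting the read.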
Fixes (v0.2.0, refreshed 2026-06-07)
- SFTP support (#16): the File Transfer source's Protocol dropdown now does
real SFTP (SSH) via russh + russh-sftp, alongside FTP / FTPS. Password or
OpenSSH private-key auth, plus an optional host-key fingerprint (SHA256) pin.
Pure-Rust, no extra system dependencies.
- Parquet / CSV partition guard: a partitioned file sink now fails fast with a
clear message if "Partition by columns" would create more than "Max
partitions" files (default 10,000; 0 = unlimited), instead of silently
writing tens of thousands of tiny files. Stops a high-cardinality partition
key (e.g. country pairs) from turning a write into a multi-minute file storm.
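The partition guard amounts to counting distinct partition keys before writing. A sketch, assuming rows are plain dicts; check_partition_count is a hypothetical name, and the real check presumably runs as a query inside the engine.

```python
def check_partition_count(rows, partition_by, max_partitions=10_000):
    """Fail fast if the write would create too many partition files."""
    distinct = {tuple(row[col] for col in partition_by) for row in rows}
    # 0 disables the guard, matching "0 = unlimited".
    if max_partitions and len(distinct) > max_partitions:
        raise ValueError(
            f"partitioning by {partition_by} would create {len(distinct)} "
            f"files, more than the limit of {max_partitions}"
        )
    return len(distinct)
```

A country-pair key shows why the guard matters: roughly 200 origin countries times 200 destination countries is already up to 40,000 partitions, four times the default limit.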
Fixes (v0.2.0, refreshed 2026-06-08)
- Schema autodetect (#18): "Autodetect from source" returned a generic
col_1 / col_2 / col_3 placeholder for Excel, DuckLake, Avro, Iceberg, Delta,
Spatial and Fixed-Width sources, even though running the node read the real
schema. Autodetect now builds the exact same query as a run, so it reports
the real columns for every file and embedded source.
- Excel multi-file reads (#18): an Excel source pointed at a folder or a
wildcard (e.g. data/*.xlsx) now reads every matching workbook instead of
silently loading only the first one. The file picker also filters on
.xlsx / .xls instead of .excel.
- Embedded + DuckLake upsert (#19): the SQLite and DuckDB sinks now expose the
Upsert write mode (set-based delete-by-key + re-insert) that the engine
already supported, and the DuckLake sink gains the same Upsert mode with
conflict columns.
Build + deploy: standalone pipelines (v0.2.0, refreshed 2026-06-08)
- Build Pipeline: right-click a pipeline (in the project tree or on the
canvas) and pick "Build pipeline" to produce ONE self-contained executable
named after the pipeline - the equivalent of a Talend "Build Job". The file
embeds the resolved pipeline, its contexts and routines, DuckDB, and only
the DuckDB extensions that pipeline's components actually need, plus its
secrets. There is no folder, no run script, and nothing extra to download.
- Run it anywhere: on the server it self-extracts to a temp cache, uses its
own embedded DuckDB and extensions (so the host needs no DuckDB install),
runs the pipeline, and exits with the pipeline's status code. Schedule it
with whatever the OS already has - cron, systemd timers, or Windows Task
Scheduler. See docs/current/scheduler.md.
- Secrets, your choice per build: Environment mode replaces every secret with
a ${ENV:KEY} placeholder and ships a secrets.env.example, so nothing
sensitive is written into the artifact; the runner resolves real env vars
first, then a secrets.env beside the exe. Passphrase mode encrypts secrets
with AES-256-GCM, unlocked at run time with DUCKLE_BUNDLE_PASSPHRASE.
- Lean by design: only the needed extensions are bundled, so a CSV-to-CSV
pipeline builds to about 28 MB instead of hundreds of MB.
- File Transfer sink: a new sink uploads a pipeline's output over FTP, FTPS or
SFTP (SSH), the write-side counterpart to the existing File Transfer source.
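The Environment-mode lookup order (real env vars first, then secrets.env) can be sketched as follows. The KEY=VALUE file format and both function names are assumptions; only the ${ENV:KEY} placeholder syntax comes from the notes above.

```python
import re

PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_placeholders(text, env, file_values):
    """Replace each ${ENV:KEY}: process environment wins, then the file."""
    def lookup(match):
        key = match.group(1)
        if key in env:
            return env[key]
        if key in file_values:
            return file_values[key]
        raise KeyError(f"secret {key} is set neither in the environment nor in secrets.env")
    return PLACEHOLDER.sub(lookup, text)
```

In practice env would be os.environ and file_values the parsed contents of the secrets.env beside the executable. Raising on a missing key keeps an unresolved placeholder from reaching a connection string.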
Connect to Claude / any MCP client (v0.2.0, refreshed 2026-06-09)
- Connect to Claude: a new button in the designer top bar opens a popup that
wires Duckle into Claude Code, Claude Desktop, Cursor, or any MCP client in
one click. Duckle now ships a Model Context Protocol (MCP) server, so an AI
assistant can browse the component catalog, generate a pipeline straight into
your working directory, validate it, run it, and build a standalone artifact - all in your workspace, on your machine.
- One click: "Connect to Claude Code" runs the registration for you;
"Add to Claude Desktop" / "Add to Cursor" write the server into that client's
config (both the Microsoft Store / MSIX and standalone Claude Desktop layouts
are handled); or copy the command / config for any other client.
- Bundled, no extra download: the MCP server ships inside the app and reuses the
DuckDB engine, so there is nothing else to install. Headless usage + the full
tool reference are in docs/current/mcp.md. @SouravRoy-ETL
Fixes (v0.2.0, refreshed 2026-06-09)
- Source schema preserved on run (#18 follow-up): running a pipeline no longer
overwrites a source's autodetected / declared schema. Re-running keeps the
schema you set (the engine uses it, e.g. CSV column types) and only refreshes
the preview rows. Covers CSV, Excel, DuckDB and DuckLake sources.
- Connector username in ATTACH (#20): PostgreSQL / MySQL / CockroachDB /
Redshift / pgvector connections now map the username field to the DB user in
the generated connection string, so a connection that sets a username
authenticates correctly. Thanks to @kyounghoonJang (#21).
Reroll - 2026-06-09 (logo, smaller downloads, Snowflake + SQLite/DuckDB fixes)
- New brand: a geometric "D." mark replaces the old logo across the app icon /
taskbar, the README hero, and the in-app top bar (theme-aware - brand yellow
on slate in dark mode, orange on a pale disc in light mode).
- Snowflake key-pair auth (#22): fixed 390144 "JWT token is invalid" on
regional / PrivateLink accounts. The JWT now uses the account locator only in
its iss / sub claims (the full account is still used for the REST URL).
- SQLite / DuckDB sinks (#19): the Write mode dropdown now offers Append,
Truncate, and Upsert (delete-by-key + re-insert, with optional delete-flag
propagation), not just Create or replace.
- MCP popup: the "Connect to Claude" action buttons now use Claude's orange to
match its color scheme.
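The Snowflake key-pair fix comes down to which part of the account identifier goes into the JWT claims. The sketch below shows that split; it is not Duckle's code. The function name and sample account are made up, and the claim layout follows Snowflake's documented key-pair format as best understood here (ACCOUNT.USER for sub, with the key fingerprint appended for iss).

```python
def jwt_claims(account, user, fingerprint, now, lifetime=3600):
    """Build iss / sub from the account locator only, dropping region parts."""
    # "xy12345.eu-central-1.privatelink" -> "XY12345"
    locator = account.split(".")[0].upper()
    qualified_user = f"{locator}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{fingerprint}",  # fingerprint: "SHA256:<base64>"
        "sub": qualified_user,
        "iat": now,
        "exp": now + lifetime,
    }
```

The full identifier, region and PrivateLink segments included, is still what belongs in the REST hostname. Only the claims use the shortened form.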
Reroll - 2026-06-10 (Snowflake, DuckDB/SQLite, Excel fixes)
- Snowflake source (#24): result sets that split into multiple partitions
(roughly n>300 wide rows) no longer fail with "response not JSON" - the
gzip-compressed partition bodies are now decoded, and result columns are
typed from the result metadata (real timestamps / dates / numbers instead
of VARCHAR).
- SQLite / DuckDB sinks (#19): selecting Upsert without conflict columns now
errors clearly ("upsert needs at least one conflict column") instead of
silently falling back to DROP TABLE + CREATE.
- Excel source (#25): the Schema panel is now respected - retyped and removed
columns are applied on read instead of being ignored.
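The "response not JSON" failure is what happens when a gzip-compressed body is handed straight to a JSON parser. A minimal sketch of the decode step follows. Detecting compression by the gzip magic bytes is an assumption for the example; the actual fix may rely on response headers instead.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_partition(body: bytes):
    """Decompress a partition body if it is gzip, then parse it as JSON."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body)
```

Plain JSON bodies pass through unchanged, so the same function can handle both the first response and the later partitions.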
Reroll - 2026-06-10 (canvas quick-add + Iterate / For Each fix)
- Quick-add on the canvas: start typing on the designer to fuzzy-search every
component (sources, transforms, sinks, connectors, control, quality, code)
and drop the match where your cursor is - Enter to add, Esc to close.
- Iterate / For Each (#26): these run a child pipeline, but the panel never let
you pick one, so a run failed with "pipelineRef required". You can now select
the pipeline to run (plus an iteration count for Iterate).
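For a feel of how quick-add style fuzzy search can work, here is a simple subsequence matcher in Python. It is one common approach and not necessarily the one the canvas uses. The component names in the usage note are examples, and the ranking rule (tightest match first) is an assumption.

```python
def fuzzy_match(query, name):
    """Return positions of the query's characters in name, in order, or None."""
    positions, start = [], 0
    lowered = name.lower()
    for ch in query.lower():
        found = lowered.find(ch, start)
        if found < 0:
            return None
        positions.append(found)
        start = found + 1
    return positions


def search(query, names):
    """Rank matches: tightest span first, then earliest start, then name."""
    hits = []
    for name in names:
        positions = fuzzy_match(query, name)
        if positions:
            span = positions[-1] - positions[0]
            hits.append((span, positions[0], name))
    return [name for _, _, name in sorted(hits)]
```

Typing "snk" against "CSV Source", "Parquet Sink", "Snowflake Sink" and "Log" keeps only the two sinks, with "Parquet Sink" ranked first because its matched letters sit closer together.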