Duckle v0.2.0

@github-actions released this 04 Jun 14:35

Duckle v0.2.0 - local-first, embedded ETL on DuckDB.

Highlights

Universal upsert + CDC delete propagation

  • mode = upsert (MERGE) on every relational sink: PostgreSQL, MySQL/MariaDB,
    CockroachDB, SQL Server, Oracle, Snowflake, Databricks, DuckDB and SQLite,
    plus MongoDB (replace_one).
  • Optional delete propagation: a flag column (e.g. a CDC change type) removes
    matched rows from the target instead of upserting them. Verified live in
    Docker for SQL Server, Oracle, MySQL, Snowflake and DuckDB.
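
The upsert and delete-propagation behaviour above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Duckle's implementation: the table name target, the key column id, and the change_type values ("insert", "update", "delete") are assumptions made for the example.

```python
import sqlite3

def upsert_with_deletes(conn, rows, flag_value="delete"):
    """Apply (id, name, change_type) rows to a hypothetical `target` table.

    Rows whose flag equals `flag_value` are deleted from the target;
    every other row is inserted or updated, keyed on `id`.
    """
    deletes = [(r[0],) for r in rows if r[2] == flag_value]
    upserts = [(r[0], r[1]) for r in rows if r[2] != flag_value]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany("DELETE FROM target WHERE id = ?", deletes)
        conn.executemany(
            "INSERT INTO target (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            upserts,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "gone")])

changes = [(1, "new", "update"), (2, "gone", "delete"), (3, "added", "insert")]
upsert_with_deletes(conn, changes)
result = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(result)
```

Row 1 is updated, row 2 is removed because its flag matches, and row 3 is inserted, so the target mirrors the source after one pass.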

Change data capture + incremental

  • DuckLake CDC change-feed source (src.ducklake.changes): replays
    table_changes() since the last consumed snapshot, emitting a change_type
    column - pairs with delete propagation for a true mirror.
  • Watermark incremental load (xf.incremental): processes only rows past the
    last successful run's high-water mark, saved to workspace state.
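
A watermark load of the kind described above might look like the following sketch. The state file name, its JSON layout, and the updated_at column are hypothetical; they only illustrate the idea of persisting a high-water mark between runs.

```python
import json
import tempfile
from pathlib import Path

def load_watermark(state_file):
    """Return the saved high-water mark, or None on the first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())["watermark"]
    return None

def run_incremental(rows, key, state_file):
    """Return only rows past the saved mark, then advance the mark."""
    last = load_watermark(state_file)
    fresh = [r for r in rows if last is None or r[key] > last]
    if fresh:
        # Persist the new mark only once the batch has been selected,
        # so a failed run would not skip rows on the next attempt.
        new_mark = max(r[key] for r in fresh)
        state_file.write_text(json.dumps({"watermark": new_mark}))
    return fresh

state = Path(tempfile.mkdtemp()) / "watermark.json"
rows = [
    {"id": 1, "updated_at": "2026-06-01"},
    {"id": 2, "updated_at": "2026-06-03"},
]
first = run_incremental(rows, "updated_at", state)

rows.append({"id": 3, "updated_at": "2026-06-05"})
second = run_incremental(rows, "updated_at", state)
print(len(first), [r["id"] for r in second])
```

The first run sees every row; the second run sees only the row added after the saved mark.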

Orchestration + canvas

  • Run Job (parent calls a child pipeline with context variables) and
    Parallelize (independent downstream branches run concurrently; 0 = auto, one
    branch per CPU core).
  • Control-flow nodes: Log, Warn, Die.
  • Undo/redo across nodes, edges and settings; Ctrl/Cmd+S save; component-level
    run logs written under the workspace.
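
The "0 = auto" rule for Parallelize can be illustrated with a small thread-pool sketch. The function names are hypothetical and the branches are plain callables; how Duckle schedules branches internally is not shown in these notes.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def resolve_workers(setting):
    """Map the setting to a worker count: 0 means one per CPU core."""
    if setting < 0:
        raise ValueError("worker setting must be 0 or a positive number")
    return setting if setting > 0 else (os.cpu_count() or 1)

def run_branches(branches, setting=0):
    """Run independent branches concurrently; results keep branch order."""
    with ThreadPoolExecutor(max_workers=resolve_workers(setting)) as pool:
        futures = [pool.submit(branch) for branch in branches]
        return [f.result() for f in futures]

results = run_branches([lambda: "a", lambda: "b", lambda: "c"], setting=0)
print(results)
```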

Look and feel

  • DuckDB-aligned theme in light and dark modes: lemon-yellow / orange brand
    fills for primary actions and selection.

Install

Download the raw executable for your OS below and run it - no installer.

Enterprise / corporate networks (v0.2.0, refreshed 2026-06-05)

Duckle's HTTP clients now trust the operating-system certificate store in
addition to the bundled Mozilla roots, so it works behind a TLS-inspecting
corporate proxy (Zscaler, Netskope, ...) whose CA is installed in the OS
store, instead of failing with invalid peer certificate: UnknownIssuer.
This covers the engine + model downloads and the REST / cloud-API / warehouse
connectors plus the update and CI checks. The trust set is a strict superset
of the previous one, so machines without a corporate CA are unaffected.

  • New DUCKLE_CA_CERT env var: point it at a PEM bundle to trust an extra CA
    explicitly (split-tunnel setups, or a CA handed out as a file).
  • DuckDB's own extension downloads (extensions.duckdb.org) and cloud reads
    (S3 / GCS / Azure) run inside the DuckDB engine, which uses its own TLS
    stack. For those, also allow extensions.duckdb.org or exempt it from
    inspection.
  • The Duckie AI assistant remains optional - only it needs huggingface.co.
    Thanks to @DarekDan
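
The "strict superset" idea can be sketched with Python's ssl module: start from the platform's default trust store and add an extra PEM bundle only when DUCKLE_CA_CERT is set. This is an analogy in Python, not Duckle's own client code, and Python does not bundle Mozilla roots the way the notes describe for Duckle.

```python
import os
import ssl

def build_tls_context(env=os.environ):
    """Build a verifying TLS context with an optional extra CA bundle."""
    # Loads the platform's default CA certificates and enables
    # certificate and hostname verification.
    ctx = ssl.create_default_context()
    extra = env.get("DUCKLE_CA_CERT")
    if extra:
        # Adds the CAs in the PEM file on top of the defaults;
        # nothing already trusted is removed.
        ctx.load_verify_locations(cafile=extra)
    return ctx

ctx = build_tls_context({})  # no extra CA configured in this demo
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Because the extra bundle is only ever added, a machine without a corporate CA gets exactly the default behaviour.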

Fixes (v0.2.0, refreshed 2026-06-06)

  • CSV / TSV reject port (#15): a delimited source with a declared typed
    column now routes rows that fail to parse (e.g. an invalid date) to its
    reject output as raw text, so they can be written straight to a separate
    CSV for review. Valid rows flow on, typed, instead of the whole read
    aborting on a single bad value. Wire the source's reject output to a sink's
    main input. Pipelines that do not wire the reject port are unchanged.
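
The reject routing above can be sketched with the standard csv module. The column names and the ISO date format are assumptions for the example; the point is that a bad value diverts one row, as raw text, rather than aborting the read.

```python
import csv
import io
from datetime import date

def split_rows(text, date_column):
    """Return (valid, rejects): typed rows and raw rows that failed to parse."""
    valid, rejects = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            typed = dict(row)
            typed[date_column] = date.fromisoformat(row[date_column])
            valid.append(typed)
        except ValueError:
            rejects.append(row)  # kept as the original strings
    return valid, rejects

data = "id,signup\n1,2026-06-01\n2,not-a-date\n3,2026-06-03\n"
valid, rejects = split_rows(data, "signup")
print([r["id"] for r in valid], rejects)
```

The rejects list still holds plain strings, so it could be written straight back out with csv.DictWriter for review.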

Fixes (v0.2.0, refreshed 2026-06-07)

  • SFTP support (#16): the File Transfer source's Protocol dropdown now does
    real SFTP (SSH) via russh + russh-sftp, alongside FTP / FTPS. Password or
    OpenSSH private-key auth, plus an optional host-key fingerprint (SHA256) pin.
    Pure-Rust, no extra system dependencies.
  • Parquet / CSV partition guard: a partitioned file sink now fails fast with a
    clear message if "Partition by columns" would create more than "Max
    partitions" files (default 10,000; 0 = unlimited), instead of silently
    writing tens of thousands of tiny files. Stops a high-cardinality partition
    key (e.g. country pairs) from turning a write into a multi-minute file storm.
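
A fail-fast partition guard along these lines could be sketched as follows. The function name and error text are hypothetical; the default of 10,000 and the "0 = unlimited" rule come from the note above.

```python
def check_partitions(rows, partition_by, max_partitions=10_000):
    """Count distinct partition keys and fail before any file is written."""
    keys = {tuple(row[col] for col in partition_by) for row in rows}
    # max_partitions == 0 disables the guard entirely.
    if max_partitions and len(keys) > max_partitions:
        raise ValueError(
            f"partitioning by {partition_by} would create {len(keys)} files, "
            f"more than the limit of {max_partitions}"
        )
    return len(keys)

rows = [
    {"country": "DE", "amount": 1},
    {"country": "FR", "amount": 2},
    {"country": "PL", "amount": 3},
    {"country": "DE", "amount": 4},
]
print(check_partitions(rows, ["country"]))                    # under the limit
print(check_partitions(rows, ["country"], max_partitions=0))  # guard disabled
```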

Fixes (v0.2.0, refreshed 2026-06-08)

  • Schema autodetect (#18): "Autodetect from source" returned a generic
    col_1 / col_2 / col_3 placeholder for Excel, DuckLake, Avro, Iceberg, Delta,
    Spatial and Fixed-Width sources, even though running the node read the real
    schema. Autodetect now builds the exact same query as a run, so it reports
    the real columns for every file and embedded source.
  • Excel multi-file reads (#18): an Excel source pointed at a folder or a
    wildcard (e.g. data/*.xlsx) now reads every matching workbook instead of
    silently loading only the first one. The file picker also filters on
    .xlsx / .xls instead of .excel.
  • Embedded + DuckLake upsert (#19): the SQLite and DuckDB sinks now expose the
    Upsert write mode (set-based delete-by-key + re-insert) that the engine
    already supported, and the DuckLake sink gains the same Upsert mode with
    conflict columns.

Build + deploy: standalone pipelines (v0.2.0, refreshed 2026-06-08)

  • Build Pipeline: right-click a pipeline (in the project tree or on the
    canvas) and pick "Build pipeline" to produce ONE self-contained executable
    named after the pipeline - the equivalent of a Talend "Build Job". The file
    embeds the resolved pipeline, its contexts and routines, DuckDB, and only
    the DuckDB extensions that pipeline's components actually need, plus its
    secrets. There is no folder, no run script, and nothing extra to download.
  • Run it anywhere: on the server it self-extracts to a temp cache, uses its
    own embedded DuckDB and extensions (so the host needs no DuckDB install),
    runs the pipeline, and exits with the pipeline's status code. Schedule it
    with whatever the OS already has - cron, systemd timers, or Windows Task
    Scheduler. See docs/current/scheduler.md.
  • Secrets, your choice per build: Environment mode replaces every secret with
    a ${ENV:KEY} placeholder and ships a secrets.env.example, so nothing
    sensitive is written into the artifact; the runner resolves real env vars
    first, then a secrets.env beside the exe. Passphrase mode encrypts secrets
    with AES-256-GCM, unlocked at run time with DUCKLE_BUNDLE_PASSPHRASE.
  • Lean by design: only the needed extensions are bundled, so a CSV-to-CSV
    pipeline builds to about 28 MB instead of hundreds of MB.
  • File Transfer sink: a new sink uploads a pipeline's output over FTP, FTPS or
    SFTP (SSH), the write-side counterpart to the existing File Transfer source.
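
The Environment-mode lookup order (real env vars first, then a secrets.env beside the exe) can be sketched as below. The KEY=VALUE file format and the function names are assumptions; only the ${ENV:KEY} placeholder syntax and the lookup order come from the notes.

```python
import re

PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")

def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values

def resolve(value, env, file_values):
    """Replace each ${ENV:KEY}: environment wins, the file is the fallback."""
    def lookup(match):
        key = match.group(1)
        if key in env:
            return env[key]
        if key in file_values:
            return file_values[key]
        raise KeyError(f"secret {key} is not set")
    return PLACEHOLDER.sub(lookup, value)

env = {"PG_PASSWORD": "from-env"}  # stands in for os.environ
file_values = parse_env_file("# demo file\nPG_PASSWORD=from-file\nAPI_TOKEN=tok\n")

password = resolve("${ENV:PG_PASSWORD}", env, file_values)
header = resolve("Bearer ${ENV:API_TOKEN}", env, file_values)
print(password, header)
```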

Connect to Claude / any MCP client (v0.2.0, refreshed 2026-06-09)

  • Connect to Claude: a new button in the designer top bar opens a popup that
    wires Duckle into Claude Code, Claude Desktop, Cursor, or any MCP client in
    one click. Duckle now ships a Model Context Protocol (MCP) server, so an AI
    assistant can browse the component catalog, generate a pipeline straight into
    your working directory, validate it, run it, and build a standalone artifact
    - all in your workspace, on your machine.
  • One click: "Connect to Claude Code" runs the registration for you;
    "Add to Claude Desktop" / "Add to Cursor" write the server into that client's
    config (both the Microsoft Store / MSIX and standalone Claude Desktop layouts
    are handled); or copy the command / config for any other client.
  • Bundled, no extra download: the MCP server ships inside the app and reuses the
    DuckDB engine, so there is nothing else to install. Headless usage + the full
    tool reference are in docs/current/mcp.md. @SouravRoy-ETL

Fixes (v0.2.0, refreshed 2026-06-09)

  • Source schema preserved on run (#18 follow-up): running a pipeline no longer
    overwrites a source's autodetected / declared schema. Re-running keeps the
    schema you set (the engine uses it, e.g. CSV column types) and only refreshes
    the preview rows. Covers CSV, Excel, DuckDB and DuckLake sources.
  • Connector username in ATTACH (#20): PostgreSQL / MySQL / CockroachDB /
    Redshift / pgvector connections now map the username field to the DB user in
    the generated connection string, so a connection that sets a username
    authenticates correctly. Thanks to @kyounghoonJang (#21).

Reroll - 2026-06-09 (logo, smaller downloads, Snowflake + SQLite/DuckDB fixes)

  • New brand: a geometric "D." mark replaces the old logo across the app icon /
    taskbar, the README hero, and the in-app top bar (theme-aware - brand yellow
    on slate in dark mode, orange on a pale disc in light mode).
  • Snowflake key-pair auth (#22): fixed 390144 "JWT token is invalid" on
    regional / PrivateLink accounts. The JWT now uses the account locator only in
    its iss / sub claims (the full account is still used for the REST URL).
  • SQLite / DuckDB sinks (#19): the Write mode dropdown now offers Append,
    Truncate, and Upsert (delete-by-key + re-insert, with optional delete-flag
    propagation), not just Create or replace.
  • MCP popup: the "Connect to Claude" action buttons now use Claude's orange to
    match its color scheme.
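
The Snowflake key-pair fix above concerns how the JWT identity claims are built. A minimal sketch, assuming the claim layout from Snowflake's key-pair documentation (upper-cased ACCOUNT.USER, with the public-key fingerprint appended for iss) and a deliberately simple rule that takes the text before the first dot as the account locator:

```python
def jwt_identity(account, user, fingerprint):
    """Build iss / sub from the account locator only, plus the REST URL.

    `fingerprint` is expected in the form "SHA256:<base64 digest>".
    """
    # Drop region / PrivateLink suffixes, e.g. "xy12345.us-east-1.privatelink".
    locator = account.split(".")[0].upper()
    qualified = f"{locator}.{user.upper()}"
    return {
        "iss": f"{qualified}.{fingerprint}",
        "sub": qualified,
        # The full account identifier is still used for the endpoint.
        "url": f"https://{account}.snowflakecomputing.com",
    }

claims = jwt_identity("xy12345.us-east-1.privatelink", "etl_user", "SHA256:abc")
print(claims["iss"], claims["sub"])
```

The account, user, and fingerprint values are placeholders; signing the token and the remaining claims (iat, exp) are left out of the sketch.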

Reroll - 2026-06-10 (Snowflake, DuckDB/SQLite, Excel fixes)

  • Snowflake source (#24): result sets that split into multiple partitions
    (roughly more than 300 wide rows) no longer fail with "response not JSON" - the
    gzip-compressed partition bodies are now decoded, and result columns are
    typed from the result metadata (real timestamps / dates / numbers instead
    of VARCHAR).
  • SQLite / DuckDB sinks (#19): selecting Upsert without conflict columns now
    errors clearly ("upsert needs at least one conflict column") instead of
    silently falling back to DROP TABLE + CREATE.
  • Excel source (#25): the Schema panel is now respected - retyped and removed
    columns are applied on read instead of being ignored.
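
The partition-body fix for the Snowflake source can be illustrated with a small decoder that checks for the gzip magic bytes before parsing JSON. The payload shape (a JSON array of row arrays) is an assumption for the example, not a statement about Duckle's internals.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"

def decode_partition(body):
    """Decode a partition body that may or may not be gzip-compressed."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body)

rows = [["1", "2026-06-10"], ["2", "2026-06-11"]]
plain = json.dumps(rows).encode("utf-8")
compressed = gzip.compress(plain)

from_plain = decode_partition(plain)
from_gzip = decode_partition(compressed)
print(from_plain == from_gzip)
```

Parsing the compressed bytes directly as JSON is what produces an error of the "response not JSON" kind; sniffing the first two bytes avoids depending on response headers.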

Reroll - 2026-06-10 (canvas quick-add + Iterate / For Each fix)

  • Quick-add on the canvas: start typing on the designer to fuzzy-search every
    component (sources, transforms, sinks, connectors, control, quality, code)
    and drop the match where your cursor is - Enter to add, Esc to close.
  • Iterate / For Each (#26): these run a child pipeline, but the panel never let
    you pick one, so a run failed with "pipelineRef required". You can now select
    the pipeline to run (plus an iteration count for Iterate).