v0.99.45

@github-actions github-actions released this 15 Jun 02:31
· 5 commits to main since this release
9f35b61

sluice v0.99.45

New: sluice sync now forwards source schema changes by default — your CDC stream stays online through routine column adds/drops/type changes instead of refusing or crash-looping. Three changes from the first real PlanetScale soak: (1) default-on single-stream schema-change forwarding via a new tristate --schema-changes=forward|refuse (default forward); (2) source-side Vitess schema-resolution gaps (DDL cutover / historian off) now ride out in-process instead of killing the stream; (3) target schema-drift apply errors now self-heal instead of a tight restart crash-loop. This is a behavior change on upgrade — see Compatibility. No on-disk/format changes; migrate and the cold-start copy path are untouched.

Features

  • Default-on schema-change forwarding for single-stream CDC (ADR-0091, F7a). A new tristate flag --schema-changes=forward|refuse (default forward) replaces the old opt-in ADD-COLUMN-only path. When the source applies an unambiguous DDL change, sluice now retargets it to the target dialect and applies it in-line on the CDC apply boundary, so the sync stays online instead of refusing and forcing a manual drain-and-DDL. The per-shape forwarding matrix (ADR-0091 §1d is the source of truth) is:
    • ADD COLUMN, DROP COLUMN, ALTER COLUMN TYPE — forward on both source engines, same-engine and cross-engine (MySQL↔PG). DROP COLUMN auto-applies on the target.
    • ALTER NULLABILITY — forwards on a MySQL source only (the reader's emission gate is widened to a separate nullability-delta signature; the value-fidelity decode signature is left untouched). PG-source nullability is not forwarded — pgoutput's wire carries no nullability flag.
    • REORDER — a no-op (sluice decodes by column name, not ordinal).
    • RENAME COLUMN: refuses loudly on both engines. From the stream alone a rename is indistinguishable from a same-type drop+add, and guessing wrong risks silent data loss. (PG attnum-proven rename forwarding is a planned follow-up, F7b.)
    • Multi-shape combos and ADD COLUMN with a volatile/computed DEFAULT (NOW()/nextval/random) — refuse loudly, preserved verbatim from ADR-0058.
    • Documented limitations — not forwarded (the wire doesn't carry the metadata): PG-source nullability / index / check; MySQL-source index / check. These produce no boundary and so are invisible to forwarding. This is not silent corruption: any resulting incompatibility (e.g. a source DROP NOT NULL the target still enforces) surfaces as a loud apply error on the next affected row; a benign one (a missing secondary index) simply leaves the target without that object.
    • Safety against phantom-destructive forwards: a seed-guard never forwards a destructive shape classified against the cold-start baseline — only on a genuine CDC→CDC boundary — and the PG normalizer strips the generated columns + secondary indexes that pgoutput omits from the cold-start seed, so a phantom DROP can't be synthesized and applied (this was a CRITICAL regression caught by CI on the flip and fixed before ship).
    • --forward-schema-add-column is deprecated: it still forwards (now subsumed by --schema-changes=forward) and emits a one-time WARN. Pin the old conservative behavior with --schema-changes=refuse.
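
The forwarding matrix above can be read as a small decision function. The following Go sketch is illustrative only: every type and function name in it (Engine, Shape, Change, decideBoundary) is hypothetical and not taken from sluice's code, and ADR-0091 §1d remains the source of truth. It also models PG-source nullability as an explicit "not detected" outcome, whereas in practice that change produces no boundary at all.

```go
package main

import "fmt"

// All names here are hypothetical; they exist only for this sketch.
type Engine string
type Shape string
type Decision string

const (
	MySQL Engine = "mysql"
	PG    Engine = "postgres"

	AddColumn        Shape = "ADD COLUMN"
	DropColumn       Shape = "DROP COLUMN"
	AlterType        Shape = "ALTER COLUMN TYPE"
	AlterNullability Shape = "ALTER NULLABILITY"
	Reorder          Shape = "REORDER"
	RenameColumn     Shape = "RENAME COLUMN"

	Forward     Decision = "forward"
	NoOp        Decision = "no-op"
	Refuse      Decision = "refuse"
	NotDetected Decision = "not detected"
)

// Change is one detected schema delta at a CDC boundary.
type Change struct {
	Shape           Shape
	VolatileDefault bool // only meaningful for ADD COLUMN (NOW()/nextval/random)
}

// decideBoundary applies the release-note matrix to the changes seen at one boundary.
func decideBoundary(src Engine, changes []Change) Decision {
	if len(changes) == 0 {
		return NoOp
	}
	if len(changes) > 1 {
		return Refuse // multi-shape combos refuse loudly
	}
	c := changes[0]
	switch c.Shape {
	case AddColumn:
		if c.VolatileDefault {
			return Refuse
		}
		return Forward
	case DropColumn, AlterType:
		return Forward
	case AlterNullability:
		if src == MySQL {
			return Forward
		}
		return NotDetected // pgoutput carries no nullability flag
	case Reorder:
		return NoOp // decoding is by column name, not ordinal
	default:
		return Refuse // RENAME COLUMN and anything unrecognized
	}
}

func main() {
	cases := []struct {
		src     Engine
		changes []Change
	}{
		{PG, []Change{{Shape: DropColumn}}},
		{MySQL, []Change{{Shape: AlterNullability}}},
		{PG, []Change{{Shape: RenameColumn}}},
		{MySQL, []Change{{Shape: AddColumn, VolatileDefault: true}}},
	}
	for _, tc := range cases {
		fmt.Printf("%-8s %v -> %s\n", tc.src, tc.changes, decideBoundary(tc.src, tc.changes))
	}
}
```

Refusing by default in the final case mirrors the notes' stated preference for failing loudly over guessing at an ambiguous change.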

Fixed

  • Source-side Vitess schema-resolution errors are now retriable, not terminal (F9). The source vstreamer resolves each row event against the table schema for the replay position; right after a DDL cutover — or when the Vitess schema historian is off (track_schema_versions is disabled by default on PlanetScale) — that lookup transiently misses with unknown table <t> in schema / no schema found for table <t>. These arrive as free-text VStream errors with no gRPC status or MySQL-error wrapper, so they fell through to terminal and killed the stream on a window that clears itself once the historian catches up. They are now classified ir.RetriableError, so the ADR-0038 backoff rides out the cutover window in-process. Substring-matched and pinned, with a near-miss guard so a bare "unknown table" (a real DROP/typo) stays terminal. Affected releases: the reader-error classification shipped in v0.46.0, so every CDC release v0.46.0 through v0.99.44 carried the terminal misclassification (introduced by e320a49, internal/engines/mysql/reader_errors.go); it became materially more likely to bite on PlanetScale (historian off) and on self-hosted Vitess once warm-resume landed in v0.99.44.

  • Target schema-drift apply errors no longer tight-restart crash-loop; the sync self-heals (F8). A source ADD COLUMN (or new table) that the operator hasn't yet created on the target made the apply fail terminal — PG 42703 undefined_column / 42P01 undefined_table, MySQL 1054 ER_BAD_FIELD_ERROR / 1146 ER_NO_SUCH_TABLE — exiting the process; under a supervisor that became a ~6s tight-restart loop (the soak observed NRestarts=1821). These codes are now classified ir.RetriableError with a remedy-named message, so the ADR-0038 exponential backoff rides them out in-process and the sync self-heals the moment the operator adds the missing column/table on the target (verified live on the soak). The wrap keeps the underlying *PgError/*MySQLError reachable via errors.As, so the offending column stays named on every (loud) retry; a genuine sluice bug producing these still fails loud after the retry budget — just not in a tight loop. Covers MySQL→MySQL (incl. PlanetScale→PlanetScale) and PG targets symmetrically. Scope: ADD COLUMN / missing-table only; DROP COLUMN, rename, and reorder drift are tracked separately and intentionally out of scope. Affected releases: the default-deny (terminal) classification of these codes has existed since the bounded-retry applier framework shipped in v0.42.0 (introduced by 008f2f2, internal/engines/postgres/applier_errors.go), so every CDC release v0.42.0 through v0.99.44 carried the crash-loop behavior.

Compatibility

  • Drop-in on disk, but a deliberate runtime behavior change on upgrade — read this. With the new default --schema-changes=forward, a continuous sync that previously refused loudly on a source DDL change now forwards it by default — including DROP COLUMN, which auto-applies (drops the column) on the target. If your operational model depended on sluice halting on source DDL so you could coordinate the change by hand, set --schema-changes=refuse to restore the exact pre-v0.99.45 conservative behavior. The behavior change applies only to the shapes that actually forward (ADR-0091 §1d's ✅ rows); refused shapes (RENAME, multi-shape, volatile DEFAULT) still refuse loudly as before.
  • --forward-schema-add-column is deprecated (still works, warns). No action required; to silence the warning, remove the flag (forwarding is now the default) or pass --schema-changes=forward explicitly.
  • No new required flags; no on-disk/format changes. Existing backups restore unchanged. migrate, the cold-start bulk-copy path, and cross-engine value translation are untouched. The F8/F9 retriability fixes need no configuration.
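
To make the flag semantics above concrete, here is a minimal Go sketch of how a validated --schema-changes value with a default of forward and a deprecated alias could be resolved using the standard flag package. It is an illustration, not sluice's actual flag handling: resolveSchemaChanges is a hypothetical helper, and the choice to let an explicit --schema-changes value win over the deprecated flag is an assumption these notes do not confirm.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// resolveSchemaChanges (hypothetical) parses args and returns the effective
// mode plus any deprecation warnings to log once at startup.
func resolveSchemaChanges(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("sync", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors

	mode := fs.String("schema-changes", "forward", "forward|refuse")
	legacy := fs.Bool("forward-schema-add-column", false, "deprecated")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}

	switch *mode {
	case "forward", "refuse":
		// valid values
	default:
		return "", nil, fmt.Errorf("invalid --schema-changes=%q (want forward or refuse)", *mode)
	}

	var warnings []string
	if *legacy {
		// Assumption: the deprecated flag only warns; the explicit mode decides behavior.
		warnings = append(warnings,
			"--forward-schema-add-column is deprecated; use --schema-changes=forward")
	}
	return *mode, warnings, nil
}

func main() {
	for _, args := range [][]string{
		nil,
		{"--schema-changes=refuse"},
		{"--forward-schema-add-column"},
	} {
		mode, warnings, err := resolveSchemaChanges(args)
		fmt.Println(args, mode, warnings, err)
	}
}
```

Rejecting unknown values at startup keeps a typo such as --schema-changes=refused from silently falling back to the forwarding default.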

Who needs this — action required

  • Anyone running live sluice sync (CDC) who does routine schema evolution on the source — review the new default before upgrading. After upgrade, source column add/drop/type changes (and MySQL nullability changes) forward to the target automatically. Action: decide whether you want that (the new default forward, recommended for uptime) or the old halt-on-DDL behavior (--schema-changes=refuse). If you relied on the stream stopping so you could apply DDL manually, you must set --schema-changes=refuse or your sync will now forward — including target column drops. No data is silently lost either way (refused/unforwardable shapes fail loud), but the forwarding is real and applies on upgrade.
  • PlanetScale / Vitess sync users: upgrade recommended (F9). A DDL cutover, or running with the schema historian off (the default on PlanetScale), no longer kills the stream during the transient unknown table … in schema window; the stream rides it out in-process. No action beyond upgrading.
  • Anyone who hit the schema-drift restart crash-loop (F8) — upgrade fixes it; no re-verification needed. A missing target column/table no longer tight-restart-loops the process; the sync self-heals once you add the column/table on the target. This fix changes failure handling only — it does not affect already-applied data, so no count re-verification is required.

Install: brew install sluicesync/tap/sluice · go install sluicesync.dev/sluice/cmd/sluice@v0.99.45 · Container: ghcr.io/sluicesync/sluice:0.99.45
Full changelog: https://github.com/sluicesync/sluice/blob/main/CHANGELOG.md