History / sharding design

Revisions

  • design: codify Phase G keychain-backed shard endpoints (deferred from R4)

    Records the design decision made 2026-06-04: R4 ships with the flat `NOETL_SHARDS` env-var DSN model; the keychain-backed DB-resident shard model becomes Phase G work.

    Two reasons captured:
    1. Cross-cloud heterogeneity (GCP workload identity vs AWS IAM vs static password) needs a connection record with auth_kind + credential_alias, not a flat URL.
    2. `execution-model.md` § "Secrets and credentials rule" lists tenant DB DSNs as business-logic credentials; the same applies to shard DSNs at the cross-cloud boundary.

    Phase G shape sketched: `noetl.shard_endpoint` table on the cluster master with credential_alias references into `noetl.keychain`; same model extended to `noetl.result_ref` so external storage pointers carry their own keychain alias.

    R4 continues unchanged: env-var DSN through R4-5 + kind validation. R4-1's env-var path documented as kind-dev fallback in the long-term layering.

    @kadyapam kadyapam committed Jun 4, 2026
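
To illustrate the first reason in the commit above, here is a minimal sketch of what a connection record carrying `auth_kind` and `credential_alias` might look like. Only those two field names and the table names `noetl.shard_endpoint` and `noetl.keychain` come from the commit message; the remaining fields, the enum values, and the `resolve_credential` helper are hypothetical, and a plain dict stands in for the keychain.

```python
from dataclasses import dataclass
from enum import Enum


class AuthKind(Enum):
    # Hypothetical values, modelled on the three cases named in the commit message.
    GCP_WORKLOAD_IDENTITY = "gcp_workload_identity"
    AWS_IAM = "aws_iam"
    STATIC_PASSWORD = "static_password"


@dataclass(frozen=True)
class ShardEndpoint:
    """One row of a hypothetical noetl.shard_endpoint table."""

    shard_id: int
    host: str
    port: int
    database: str
    auth_kind: AuthKind
    credential_alias: str  # reference into the keychain, never the secret itself


def resolve_credential(endpoint: ShardEndpoint, keychain: dict[str, str]) -> str:
    """Look up the secret material for an endpoint by its alias.

    `keychain` is a plain dict standing in for noetl.keychain (assumption).
    """
    try:
        return keychain[endpoint.credential_alias]
    except KeyError:
        raise LookupError(
            f"shard {endpoint.shard_id}: no keychain entry for alias "
            f"{endpoint.credential_alias!r}"
        ) from None


# Usage: the endpoint row names an alias; the secret lives only in the keychain.
keychain = {"shard-0-aws": "token-from-iam"}
endpoint = ShardEndpoint(
    shard_id=0,
    host="shard0.example.internal",
    port=5432,
    database="noetl",
    auth_kind=AuthKind.AWS_IAM,
    credential_alias="shard-0-aws",
)
secret = resolve_credential(endpoint, keychain)
```

The point of the shape is that a flat URL has nowhere to say how to authenticate, whereas the record separates where the shard lives from which credential to fetch and how to use it.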
  • wiki: Phase F R1 - sharding design doc (noetl/server#40)

    R1 of Phase F under noetl/ai-meta#49. Codifies the survey findings into a durable design doc the next rounds (R2 server-side shard_id; R3 gateway-side dispatch; R4 DB sharding; R5 cutover) refer to.

    New page `sharding-design.md` covering:
    - TL;DR with the 5 design decisions.
    - Why the architecture is already ready (NATS subjects already execution-aware; orchestrator is stateless re: execution; advisory locks are command-scoped; transient variables already keyed by execution_id).
    - Shard assignment: `hash(execution_id) % N` on the full i64, with rationale for rejecting `timestamp % N` (time hotspots) and `machine_id % N` (clusters by generating machine).
    - Routing strategy: gateway-aware for F; server-aware proxy as Phase G fallback if needed.
    - Per-execution table partition list (5 tables) with citations.
    - Cluster-wide table list (4 tables) + sync strategy (single-master for F; per-shard read replicas as G).
    - Full endpoint inventory: per-execution (route by execution_id) vs cluster-wide (any shard answers); the awkward GET /api/executions list case called out.
    - Load-bearing prerequisite: app-side snowflake IDs. Per observability.md Principle 3, execution_id should be generated in the application, not by the DB. Must land before R4; otherwise per-shard DB sequences cluster IDs to one shard. Migration shape laid out as R1.5.
    - Phase F decomposition table (R1–R5).
    - Open question NOT decided in R1: DB partition scheme (Citus vs per-shard schemas vs per-shard DBs). R4 decides.
    - Out-of-scope for F: multi-region, per-tenant sharding, dynamic re-sharding.

    Sidebar + Home cross-linked.

    Closes noetl/server#40
    Refs noetl/ai-meta#49

    @kadyapam kadyapam committed Jun 4, 2026
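
The shard-assignment rule named in the commit above, `hash(execution_id) % N` on the full i64, can be sketched as follows. The commit does not say which hash is used, so the splitmix64 finalizer here is a stand-in. The snowflake bit layout (timestamp, then 10 machine bits, then 12 sequence bits) is likewise an assumption borrowed from the common convention, and `make_snowflake`, `mix64`, and `shard_for` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def make_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack an ID as timestamp | 10-bit machine | 12-bit sequence (assumed layout)."""
    return (timestamp_ms << 22) | ((machine_id & 0x3FF) << 12) | (sequence & 0xFFF)


def mix64(x: int) -> int:
    """splitmix64 finalizer: a stand-in 64-bit mixer, not necessarily noetl's hash."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def shard_for(execution_id: int, num_shards: int) -> int:
    """Hash the whole ID, then reduce modulo the shard count."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return mix64(execution_id) % num_shards


# A burst of 1000 IDs from one machine within a single millisecond.
NUM_SHARDS = 4
TIMESTAMP_MS = 1_780_000_000_000
MACHINE_ID = 7
burst = [make_snowflake(TIMESTAMP_MS, MACHINE_ID, seq) for seq in range(1000)]

# The two rejected schemes send the whole burst to a single shard...
by_timestamp = {(eid >> 22) % NUM_SHARDS for eid in burst}
by_machine = {((eid >> 12) & 0x3FF) % NUM_SHARDS for eid in burst}

# ...while hashing the full ID uses every bit, including the sequence.
by_hash = {shard_for(eid, NUM_SHARDS) for eid in burst}
```

Under these assumptions `by_timestamp` and `by_machine` each contain exactly one shard, which is the hotspot and clustering behaviour the commit cites as the reason for rejecting them, while `by_hash` spreads the same burst across shards.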