Data Access Boundary

Kadyapam edited this page Jun 2, 2026 · 3 revisions

Authoritative rule: agents/rules/data-access-boundary.md · Pairs with: agents/rules/execution-model.md

TL;DR

NoETL platform data is accessible via the NoETL server API only. Workers — including the system worker pool — call the server's HTTP API for any read or write to NoETL-owned tables.

NoETL-owned tables: noetl.event, noetl.command, noetl.execution, noetl.outbox, noetl.catalog, noetl.credential, noetl.keychain, noetl.runtime, any future noetl.*.

Visual

   ┌──────────────────┐    HTTP     ┌──────────────────────┐
   │ Rust worker      │────────────►│  NoETL server        │
   │ (user pool)      │             │  (Python today,      │
   └──────────────────┘             │   Rust eventually)   │
                                    │                      │
   ┌──────────────────┐    HTTP     │  ▲                   │
   │ noetl-worker-    │────────────►│  │ ONLY this talks   │
   │ system-pool      │             │  │ to Postgres       │
   └──────────────────┘             │  │                   │
                                    └──┼───────────────────┘
                                       │
                                       │ SQL
                                       ▼
                                    ┌──────────────────────┐
                                    │  Postgres            │
                                    │  noetl.event         │
                                    │  noetl.command       │
                                    │  noetl.outbox        │
                                    │  noetl.catalog       │
                                    │  noetl.credential    │
                                    │  ...                 │
                                    └──────────────────────┘


                       Exception — external subsystems
                       (auth, credential rotation, alerting)
                                       │
                                       │ direct via tool kinds
                                       ▼
                                    ┌──────────────────────┐
                                    │  External systems    │
                                    │  (Auth0, Vault,      │
                                    │   PagerDuty, ...)    │
                                    └──────────────────────┘

Calls to /api/internal/* must carry the system pool's service-account token; user-pool workers that call those routes get a 403. Everything else (/api/events, /api/catalog/*, etc.) is open to all worker pools with appropriate auth.
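The route split above can be modelled as a small gate function. This is a minimal sketch, not the server's actual code: the function name `internal_gate_status`, its signature, and the convention of returning `None` for "no objection" are all assumptions made for illustration. Only the /api/internal/ prefix, the 403 response, and the constant-time comparison come from this page.

```python
import secrets
from typing import Optional

INTERNAL_PREFIX = "/api/internal/"


def internal_gate_status(path: str, presented: Optional[str], expected: str) -> Optional[int]:
    """Hypothetical gate: return 403 to reject, or None if this gate has no objection."""
    if not path.startswith(INTERNAL_PREFIX):
        # Non-internal routes are left to the regular auth layer (not modelled here).
        return None
    # Compare as bytes in constant time so the check does not leak how many
    # leading characters of the token matched.
    if presented is not None and secrets.compare_digest(presented.encode(), expected.encode()):
        return None
    return 403
```

A user-pool worker, which does not hold the token, would get 403 on any /api/internal/* path, while the same worker's call to /api/events is passed through to the normal auth checks.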

Auth flow detail (kind-validated 2026-06-02)

The bearer-token plumbing reuses only existing NoETL primitives — no new code:

+------------------------------------------+
|  K8s Secret  noetl-internal-api-token    |
|    key=token, value=<32-byte hex>        |
+--------------------+---------------------+
                     |
                     | valueFrom.secretKeyRef
                     v
+------------------------------------------+
|  Pod env (worker-system-pool)            |
|    NOETL_INTERNAL_API_TOKEN=<token>      |
|    NOETL_KEYCHAIN_ENV_VARS=NOETL_INTERNAL_API_TOKEN
+--------------------+---------------------+
                     |
                     | worker startup
                     | load_keychain_env_allowlist()
                     v
+------------------------------------------+
|  ctx.secrets["NOETL_INTERNAL_API_TOKEN"] |
|    = "<token>"                           |
+--------------------+---------------------+
                     |
                     | playbook command dispatch
                     v
+------------------------------------------+
|  tool: http                              |
|    auth:                                 |
|      type: bearer                        |
|      credential: NOETL_INTERNAL_API_TOKEN|
|  AuthResolver.resolve_bearer():          |
|    ctx.get_secret("NOETL_INTERNAL_..")   |
+--------------------+---------------------+
                     |
                     | HTTPS
                     v
+------------------------------------------+
|  /api/internal/* + Authorization header  |
|    server: secrets.compare_digest        |
|  ✅ 200 OK                               |
+------------------------------------------+

Five hops, each an existing primitive: K8s Secret, env mount, NOETL_KEYCHAIN_ENV_VARS allow-list, AuthResolver, server's constant-time comparison gate.

Phase 2.a.3 turned out to be configuration-only — no Rust code change. The deployment manifest adds two env vars; the playbook references the alias via the standard auth: block.
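The worker-side hops (env var, allow-list, secrets map, bearer header) can be sketched in a few lines. This is an illustrative Python model only, since the worker itself is Rust: the function bodies are assumptions, and the comma-separated format of NOETL_KEYCHAIN_ENV_VARS is assumed rather than confirmed by this page. The names `load_keychain_env_allowlist` and `resolve_bearer` are borrowed from the diagram above.

```python
import os
from typing import Dict, Mapping


def load_keychain_env_allowlist(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy only the allow-listed env vars into a secrets mapping.

    Assumption: NOETL_KEYCHAIN_ENV_VARS is a comma-separated list of names.
    """
    raw = environ.get("NOETL_KEYCHAIN_ENV_VARS", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return {name: environ[name] for name in names if name in environ}


def resolve_bearer(secrets_map: Mapping[str, str], credential: str) -> Dict[str, str]:
    """Turn a credential alias from the playbook's auth block into an HTTP header."""
    token = secrets_map.get(credential)
    if token is None:
        raise KeyError(f"credential {credential!r} is not in the keychain allow-list")
    return {"Authorization": f"Bearer {token}"}
```

The point of the allow-list is that only the named variables become playbook-visible secrets; any other variable in the pod environment stays out of the secrets map.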

Why

Three load-bearing reasons:

  1. Connection pool isolation: workers scale from 1 to 50+ on backlog; direct DB access would exhaust the platform connection pool and deadlock the server's own API.
  2. Sharding readiness: the server may shard by execution_id; the API boundary makes shard routing transparent to workers.
  3. Single point of consistency: schema migrations, audit logging, RBAC, and scrubbing are all enforced at the server. Workers that access the database directly bypass them.

The exception — external-subsystem playbooks

When a playbook's target is an external system (Auth0, Vault, GCS, PagerDuty, a tenant's Postgres), it uses tool kinds directly. That is simply NoETL acting as a client to an external system, which is the normal playbook pattern.

Examples of external-subsystem system playbooks (exempt from the rule):

  • system/auth — talks to Auth0 / Okta / SAML.
  • system/credential_rotate — talks to GCP Secret Manager / AWS Secrets Manager / Vault.
  • system/notify_alert — talks to PagerDuty / OpsGenie / Slack.

Examples of NoETL-state system playbooks (rule applies):

  • system/outbox_publisher — reads noetl.outbox → HTTP to /api/internal/outbox/*.
  • system/projector — writes noetl.event → HTTP to /api/internal/events/project.
  • system/scheduled_cleanup — TTL on noetl.* tables → HTTP to /api/internal/cleanup/*.

Server-side implications

The Python server's API surface doesn't yet expose endpoints for the operations the system pool's playbooks need. These get added in Phase 1 of #46.

Initial inventory:

Endpoint                                     Replaces direct DB access by
-------------------------------------------  ------------------------------------
POST /api/internal/outbox/claim?limit=N      Python claim_outbox_batch
POST /api/internal/outbox/mark-published     Python mark_outbox_published
POST /api/internal/outbox/mark-failed        Python mark_outbox_failed
GET  /api/internal/outbox/pending-count      (new) KEDA HTTP scaler source
POST /api/internal/events/project            Python projector batch INSERT

Auth gate: a service-account token that only the system pool's K8s ServiceAccount carries. User playbooks calling /api/internal/* get 403.
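To make the inventory concrete, here is a hedged sketch of how a client might build requests for two of the outbox endpoints. The paths, methods, and the `limit` query parameter come from the inventory; the base URL, the helper names, and especially the JSON body shape (`{"ids": [...]}`) are hypothetical, since this page does not specify request or response schemas. The helpers only construct requests and send nothing.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode


def claim_request(base_url: str, token: str, limit: int) -> urllib.request.Request:
    """Build POST /api/internal/outbox/claim?limit=N with the bearer token."""
    url = f"{base_url}/api/internal/outbox/claim?{urlencode({'limit': limit})}"
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Bearer {token}"}
    )


def mark_published_request(base_url: str, token: str, ids: List[int]) -> urllib.request.Request:
    """Build POST /api/internal/outbox/mark-published.

    The {"ids": [...]} body is an assumed shape, not the documented schema.
    """
    body = json.dumps({"ids": ids}).encode()
    return urllib.request.Request(
        f"{base_url}/api/internal/outbox/mark-published",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; in a playbook the same request would instead be expressed through the http tool with a bearer auth block.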

Decision tree for a new playbook

  1. Does it touch noetl.* tables?
    • Yes → API only.
    • No → tool kinds directly.
  2. Does the API have the endpoints needed?
    • Yes → use them.
    • No → file a sub-issue on noetl/noetl / noetl/server to add the endpoint before writing the playbook.
  3. Auth? Always — /api/internal/* requires the system pool's service account token.
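The decision tree above can be written down as a small function, which is sometimes handy for review checklists. This is only a sketch: `PlaybookPlan`, its fields, and the wording of the returned steps are invented for illustration and are not part of NoETL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlaybookPlan:
    """Hypothetical description of a playbook under design."""
    touches_noetl_tables: bool
    needed_endpoints_exist: bool = True


def decide(plan: PlaybookPlan) -> List[str]:
    """Return the actions the decision tree prescribes, in order."""
    if not plan.touches_noetl_tables:
        # External-subsystem playbook: the boundary rule does not apply.
        return ["use tool kinds directly"]
    steps = ["use the server API only"]
    if not plan.needed_endpoints_exist:
        steps.append("file a sub-issue to add the endpoint before writing the playbook")
    steps.append("authenticate /api/internal/* calls with the system pool's service-account token")
    return steps
```

For example, a cleanup playbook that needs an endpoint the server lacks yields three steps, with the sub-issue ahead of any playbook work.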

History

Codified 2026-06-02 (afternoon) per standing instruction. Full rationale + endpoint inventory in the authoritative rule above.
