
feat(011): Model Management API — validate, refresh, delete, single-cube, diff, rollback#45

Merged
acmeguy merged 6 commits into main from 011-model-mgmt-api
Apr 21, 2026

Conversation


@acmeguy acmeguy commented Apr 20, 2026

Summary

Six authenticated REST endpoints that let an agent own the full author-to-publish lifecycle of a cube model without operator assistance:

| Endpoint | Story |
| --- | --- |
| `POST /api/v1/validate-in-branch` | US1 — contextual compile against a branch's cubes, three modes (append / replace / preview-delete) |
| `POST /api/v1/internal/refresh-compiler` | US2 — branch-scoped compiler cache eviction, owner/admin only |
| `DELETE /api/v1/dataschema/:dataschemaId` | US3 — dataschema delete with seven-kind cross-cube reference blocker |
| `GET /api/v1/meta/cube/:cubeName` | US4 — single-cube compiled-metadata envelope, honours `x-hasura-branch-id` |
| `POST /api/v1/version/diff` | US5 — structured diff between two versions on the same branch |
| `POST /api/v1/version/rollback` | US5 — insert new `origin=rollback` version whose dataschemas clone the target |

Delete and rollback emit durable audit rows via Hasura event triggers into a new audit_logs table with 90-day retention via a daily cron. Refresh is cache-only and emits a non-durable structured log line.

Migration (critical — read before merging)

Adds versions.origin + versions.is_current to the existing versions table. is_current is maintained by a statement-level AFTER INSERT trigger using a NEW TABLE transition so bulk inserts (INSERT ... SELECT) cannot break the invariant. A transaction-scoped advisory lock keyed on the hash of affected branches serialises concurrent inserts on the same branch while different branches still proceed in parallel. Backfill runs in 1 000-row batches inside a DO block.
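The trigger mechanics described above can be sketched roughly as follows. This is an illustrative sketch only — the actual table, function, and trigger names live in the `1713600000000_dataschemas_delete_permission` migration, and details (e.g. the lock key derivation) are assumptions:

```sql
-- Sketch only; real names and lock-key derivation are in the migration.
CREATE FUNCTION versions_maintain_is_current() RETURNS trigger AS $$
DECLARE
  r record;
BEGIN
  -- Transaction-scoped advisory lock per affected branch, taken in a
  -- deterministic order: same-branch inserts serialise, other branches
  -- proceed in parallel, and the fixed order avoids deadlocks.
  FOR r IN SELECT DISTINCT branch_id FROM new_rows ORDER BY branch_id LOOP
    PERFORM pg_advisory_xact_lock(hashtext(r.branch_id::text));
  END LOOP;

  -- Invariant: exactly one current row per affected branch (newest wins).
  UPDATE versions v
  SET is_current = (v.id = latest.id)
  FROM (
    SELECT DISTINCT ON (branch_id) branch_id, id
    FROM versions
    WHERE branch_id IN (SELECT branch_id FROM new_rows)
    ORDER BY branch_id, created_at DESC, id DESC
  ) latest
  WHERE v.branch_id = latest.branch_id
    AND v.is_current IS DISTINCT FROM (v.id = latest.id);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Statement-level, so a multi-row INSERT ... SELECT fires once with all
-- inserted rows visible in the transition table.
CREATE TRIGGER versions_is_current_trg
AFTER INSERT ON versions
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION versions_maintain_is_current();
```

The statement-level `REFERENCING NEW TABLE` clause is what makes bulk inserts safe: a row-level trigger would see each row in isolation and could leave two rows current after an `INSERT ... SELECT`.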

Rollback was tested round-trip against the live dev DB (up → down → up).

See specs/011-model-mgmt-api/DEPLOYMENT.md for the operator runbook, and services/hasura/migrations/1713600000000_dataschemas_delete_permission/README.md for migration-specific sizing + rollback guidance.

Spec artefacts

  • specs/011-model-mgmt-api/spec.md
  • specs/011-model-mgmt-api/plan.md
  • specs/011-model-mgmt-api/research.md
  • specs/011-model-mgmt-api/data-model.md
  • specs/011-model-mgmt-api/contracts/ — six OpenAPI files, all consistent on the shared ErrorCode enum
  • specs/011-model-mgmt-api/quickstart.md
  • specs/011-model-mgmt-api/tasks.md — every task marked with its real status
  • specs/011-model-mgmt-api/DEPLOYMENT.md

FR-017 error code hygiene

services/cubejs/src/utils/errorCodes.js is the single source of truth (15 codes). scripts/lint-error-codes.mjs fails CI if any contract's ErrorCode.enum drifts from it — wired as yarn lint:error-codes in services/cubejs/package.json.
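The core of the drift check can be sketched like this. The function name, error codes, and shapes below are illustrative — the real `scripts/lint-error-codes.mjs` reads `errorCodes.js` and the six OpenAPI contracts from disk:

```javascript
// Sketch of the enum-drift comparison; names and codes here are assumed.
function findDrift(sourceOfTruth, contractEnum) {
  const truth = new Set(sourceOfTruth);
  const contract = new Set(contractEnum);
  return {
    missing: sourceOfTruth.filter((c) => !contract.has(c)), // in truth, absent from contract
    extra: contractEnum.filter((c) => !truth.has(c)),       // in contract, absent from truth
  };
}

const truth = ["hasura_rejected", "invalid_branch", "not_found"]; // illustrative codes
const ok = findDrift(truth, ["hasura_rejected", "invalid_branch", "not_found"]);
const bad = findDrift(truth, ["hasura_rejected", "stale_code"]);

console.log(ok.missing.length === 0 && ok.extra.length === 0); // true → CI green
console.log(bad.missing); // ["invalid_branch", "not_found"] → CI fails
console.log(bad.extra);   // ["stale_code"]
```

CI fails when either list is non-empty for any contract, which is what keeps the six `ErrorCode.enum` definitions pinned to the single source of truth.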

Test plan

Already verified locally:

  • node --test on services/cubejs/src/{utils,routes}/__tests__/*.test.js → 49 / 49 Model Management tests pass (1 pre-existing unrelated failure: provisionFraiOS.test.js uses Node 22.3+ experimental mock.module)
  • node --test on services/actions/src/rpc/__tests__/*.test.js → 5 / 5 pass
  • scripts/lint-error-codes.mjs → 15 codes × 6 contracts, green
  • ./cli.sh hasura cli "migrate apply --version 1713600000000 --type up/down" round-trips cleanly on local dev DB
  • is_current backfill produces exactly one current per branch (6 branches, 130 total versions)
  • 20×20 multi-row INSERT ... SELECT holds invariant (statement-level trigger)
  • Two-session concurrent 10×2 bulk inserts on the same branch hold invariant (advisory lock)
  • Every pre-existing REST endpoint still works: /api/v1/test, /api/v1/meta, /api/v1/pre-aggregations, /api/v1/get-schema, /api/v1/validate, /api/v1/run-sql, /api/v1/column-values, /api/v1/discover-nested, /api/v1/pre-aggregation-preview, /api/v1/load, /api/v1/generate-models, /api/v1/profile-table, /api/v1/smart-generate, /api/v1/cubesql, /api/v1/discover, /api/v1/meta-all, /api/v1/version, /v1/graphql proxy, /api/v1/internal/invalidate-cache
  • All six new endpoints verified end-to-end with correct happy + sad paths + audit writes against the live dev stack
  • /meta-all latency baseline warm: 36–48 ms / 62 KB (15 datasources, 17 cubes)

Needed on staging before promotion to prod:

  • Measure SELECT count(*) FROM versions; on prod — if > 100 000, schedule backfill in a maintenance window
  • Apply hasura-migrations image + metadata on staging; watch the job log for backfill duration
  • Verify audit_logs is reachable: trigger a delete, confirm the success row lands within ≤5 s
  • /meta-all p95 on a real-scale tenant < 2 s
  • Client-v2 smoke: dashboard load, model editor save, datasource switch

Deferred (not blocking this PR):

  • Tychi skill doc update in cxs-agents repo (separate cross-repo PR)
  • rollback_source_columns_missing check (driver round-trip; Hasura errors surface via hasura_rejected in the interim)
  • Wire tests/workflows/model-management/ into the tests/stepci/workflow.yml include list (operator currently runs it standalone)

Deployment order

Per specs/011-model-mgmt-api/DEPLOYMENT.md: merge → CI builds quicklookup/synmetrix-{cube,actions,hasura-migrations} → bump newTag on all three in cxs repo's data/synmetrix/overlays/staging/kustomization.yaml → validate → promote to production overlay.

The Hasura migrations image must roll out before the cube/actions images — the new routes assume audit_logs exists and versions.origin / versions.is_current columns are present.

🤖 Generated with Claude Code

acmeguy and others added 6 commits April 20, 2026 16:13
Cube.js v1.6 invokes checkSqlAuth as (request, user, password) — three
positional args — see
@cubejs-backend/api-gateway/dist/src/sql-server.js:291,105.

Our implementation declared (_, user) and did:
  password = typeof user === "string" ? user : user?.password
  username = typeof user === "string" ? _     : user?.username

With the v1.6 wire server, user arrives as a plain string (the Postgres
username), so the code took the username as the password AND used the
request metadata object as the username. findSqlCredentials then
received the {protocol, method, apiType} object, and Hasura rejected
the query with:

  parsing Text failed, expected String, but encountered Object
  path: $.selectionSet.sql_credentials.args.where.username._eq

Every SQL API login failed before any password comparison ran
(reproduced via `psql -U <valid> -h <cubejs>` → 28P01).

Fix: match the documented v1.6 signature and keep a defensive branch
for the legacy object-shape call. Also reject non-string username
early so the Hasura GraphQL layer cannot receive a non-string variable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
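The signature fix above can be sketched as follows. This is a hedged illustration, not the actual handler — the real `checkSqlAuth` implementation also verifies the password against `sql_credentials`:

```javascript
// Sketch of normalising checkSqlAuth arguments across Cube.js call shapes.
function normalizeSqlAuthArgs(request, user, password) {
  // Defensive branch for the legacy object-shape call, where the second
  // argument carried { username, password }.
  if (typeof user === "object" && user !== null) {
    return { username: user.username, password: user.password };
  }
  // Cube.js v1.6 wire server: user is the plain Postgres username string
  // and the password arrives as the third positional argument.
  if (typeof user !== "string") {
    // Reject early so the Hasura GraphQL layer never receives a
    // non-string username variable.
    throw new Error("SQL auth: username must be a string");
  }
  return { username: user, password };
}
```

With the old `(_, user)` declaration, the v1.6 string-shaped call fell through to the object branch's fallbacks, which is exactly the username/password swap the commit describes.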
…gins

queryRewrite's rule-based row filtering relies on
`securityContext.userScope.teamProperties` and `.memberProperties` to
look up per-rule property values (e.g. `partition` from team settings).

defineUserScope populates both from the member's team settings and
member properties. buildSqlSecurityContext (the SQL API path) never
did, so userScope for SQL logins had no team/member properties. Every
rule whose property lookup returned undefined blocked the whole query
and queryRewrite replaced query.filters with:

  [{ member: allMembers[0], operator: "equals",
     values: ["__blocked_by_access_control__"] }]

When the first member was a numeric measure (e.g. `count`), ClickHouse
tried to cast the sentinel to Float64:

  Cannot parse string '__blocked_by_access_control__' as Float64

Fix: buildSqlSecurityContext now resolves the member for the
datasource's team and passes the team settings + member properties
into the scope (matching defineUserScope). Team settings also flow
into buildSecurityContext so the content hash includes them, keeping
cache isolation consistent between REST and SQL paths.

Reproduced via `SELECT count(*) FROM stockout_event` over the Postgres
wire — rule `semantic_events.partition` couldn't resolve
team.partition, blocked fired, sentinel filter crashed ClickHouse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
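The scope shape involved can be sketched like this. Function names and field shapes are illustrative — the real logic lives in `buildSqlSecurityContext`, `defineUserScope`, and `queryRewrite.js`:

```javascript
// Sketch: build the userScope fields rule-based row filtering reads.
function buildUserScope(member) {
  return {
    teamProperties: member?.team?.settings ?? {},   // e.g. { partition: "eu-1" }
    memberProperties: member?.properties ?? {},
  };
}

// queryRewrite-style lookup: an undefined result blocks the whole query.
function resolveRuleValue(scope, source, key) {
  const bag = source === "team" ? scope.teamProperties : scope.memberProperties;
  return bag[key];
}
```

Before the fix, the SQL API path built a scope with both bags empty, so every `resolveRuleValue`-style lookup returned undefined, the rewrite injected the `__blocked_by_access_control__` sentinel filter, and ClickHouse crashed trying to cast it to Float64 whenever the first member was numeric.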
…agment

Follow-up to #42. buildSqlSecurityContext now resolves teamProperties
and memberProperties from sqlCredentials.user.members, but the GraphQL
query used to load sql_credentials (sqlCredentialsQuery →
membersFragment) never selected those fields. At runtime teamMember
was found but team and properties were undefined, so teamProperties
stayed empty and queryRewrite rules that look up
teamProperties.<key> still blocked every query.

Observed via:
  SELECT count(*) FROM stockout_event
→ still rewritten to:
  HAVING count(*) = toFloat64('__blocked_by_access_control__')

Add team { id settings } and properties to membersFragment so the SQL
API path has the same shape defineUserScope consumes on the REST path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in 16 patch releases of upstream Cube.js fixes and features,
including cubesql improvements (Tableau format codes, Talend compat,
MEASURE function panic fix, LAG/LEAD pushdown, TO_TIMESTAMP formats,
SET TIMEZONE, FETCH directions, CASE/LIKE planning, pg_catalog.pg_collation,
SAVEPOINT/ROLLBACK TO/RELEASE, SET ROLE auth context, and more).

Does not close the DataGrip introspection gap (regclass in functions,
pg_get_userbyid coercion, OPERATOR(schema.~), CHAR[] arrays,
SHOW server_version, pg_description.objoid mapping, empty pg_index/
pg_constraint — none addressed upstream between 1.6.21 and 1.6.37),
but keeps us current before we build or wait on a JetBrains-friendly
fix.

yarn.lock will regenerate on CI build (Dockerfile runs `yarn --network-timeout 100000`).

No breaking changes relevant to us (server-core access-policy row
filtering breaking change doesn't affect our usage — we don't use
cube's access_policy feature; row filtering is via queryRewrite.js).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keeps local docker-compose in sync with the Cube.js backend bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ube, diff, rollback

Adds six authenticated REST endpoints that let an agent own the full
author-to-publish lifecycle of a cube model without operator assistance:

  POST   /api/v1/validate-in-branch        (US1)
  POST   /api/v1/internal/refresh-compiler (US2)
  DELETE /api/v1/dataschema/:dataschemaId  (US3)
  GET    /api/v1/meta/cube/:cubeName       (US4)
  POST   /api/v1/version/diff              (US5)
  POST   /api/v1/version/rollback          (US5)

Delete and rollback emit durable audit rows via Hasura event triggers
(`audit_dataschema_delete`, `audit_version_rollback`) into a new
`audit_logs` table with 90-day retention via a daily cron trigger. Refresh
is cache-only and emits a non-durable structured log line only.

The Hasura migration (`1713600000000_dataschemas_delete_permission`) adds
`versions.origin` + `versions.is_current` with a statement-level trigger
that uses a NEW TABLE transition table so multi-row inserts cannot break
the invariant. A transaction-scoped advisory lock keyed on affected branches
serialises concurrent inserts on the same branch while different branches
still proceed in parallel. Backfill runs in 1 000-row batches. `/meta-all`
now enriches every cube summary with `dataschema_id` + `file_name` by
parsing each schema once per call (cube-name keyed, since Cube.js v1.6
`metaConfig` omits `fileName`).

All five mutating handlers write a durable audit row on every failure path
(partition mismatch, insufficient role, historical version, blocking
references, Hasura rejection). Success is captured by the event triggers.

New utilities: `compilerCacheInvalidator`, `referenceScanner` (FR-008 seven
kinds), `directVerifyAuth`, `requireOwnerOrAdmin`, `mapHasuraErrorCode`,
`auditWriter`, `metaForBranch`, `versionDiff`, `errorCodes` (FR-017
single-source-of-truth enum). `graphql.js` gains a `preserveErrors` option
so handlers can surface Hasura extension codes as stable FR-017 codes.

Spec + runbook: `specs/011-model-mgmt-api/` (spec.md, plan.md, tasks.md,
research.md, data-model.md, contracts/, quickstart.md, DEPLOYMENT.md,
migration README). `scripts/lint-error-codes.mjs` fails the build if the
error-code enum drifts across `errorCodes.js` and any of the six contracts.

Tests: 49 Vitest-style `node:test` unit tests for the new utilities +
`summarizeCube` + versionDiff adapter + SC-003 fixture corpus; 5
integration tests for the Actions RPC handlers; 8 StepCI workflows under
`tests/workflows/model-management/` including an end-to-end flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

acmeguy commented Apr 20, 2026

CI + test run status

Container builds (manual workflow_dispatch). Note: the PR-triggered detect-changes job hits a pre-existing CI quirk: github.event.before is empty on pull_request events, so change detection returns nothing and the build matrix is skipped. Builds succeed normally on merge to main; these were triggered manually to prove the three Dockerfiles still assemble:

  • quicklookup/synmetrix-actions:8e9ffa7 — 36 s
  • quicklookup/synmetrix-cube:8e9ffa7 — 3 m 26 s
  • quicklookup/synmetrix-hasura-migrations:8e9ffa7 — 20 s

Run: https://github.com/smartdataHQ/synmetrix/actions/runs/24695941588

Local test run (node --test):

  • services/cubejs — 470 / 474 pass. Four pre-existing failures unrelated to this PR:
    • provisionFraiOS.test.js — uses Node 22.3+ experimental mock.module; runtime lacks it.
    • arrayJoin.test.js (1 failure), profiler.test.js (2 failures) — smart-generation tests that pre-date this branch.
    • All 49 Model Management tests pass.
  • services/actions — 5 / 5 pass.
  • scripts/lint-error-codes.mjs — green (15 codes × 6 contracts).

Merge readiness: needs code review only. Deploy readiness: see specs/011-model-mgmt-api/DEPLOYMENT.md — staging rehearsal not yet done.

@acmeguy acmeguy merged commit e1dd0bf into main Apr 21, 2026
8 checks passed
@acmeguy acmeguy deleted the 011-model-mgmt-api branch April 21, 2026 06:12